Working with spatial data in R

EVR 628- Intro to Environmental Data Science

Juan Carlos Villaseñor-Derbez (JC)

Rosenstiel School of Marine, Atmospheric, and Earth Science and Institute for Data Science and Computing

Announcements

Upcoming EVR Seminar

Travis Coan, Professor of Computational Social Science, University of Exeter

Last Week

  • Spatial data can be represented in a vector model, or a raster model
  • We focused on vector data, which is represented by points, lines, polygons (and combination of these)
  • vector data are often saved as shapefiles, geoJSON files, or geopackage files
  • In R, we use the simple features standard to work with vector data via {sf}
  • Regardless of the file extension (.shp, .gpkg…), we read them with sf::read_sf()
  • sf objects have three main components:
    • CRS: Where is the origin
    • geometry: The spatial portion of the data
    • attributes: The characteristics associated to each point, line, or polygon

Learning Objectives

By the end of this week, you should be able to

  • Identify whether a map was built with raster or vector data (or both)
  • Load and work with raster objects with {terra}
  • Visualize and manipulate raster data with {tidyterra}

Raster Data Model

Raster Data Model

  • The world is represented with a continuous grid
  • Grid is made up of grid cells or pixels
  • All pixels are the same size1
  • For most applications, grid cells will be “square”

Raster Data Model

Raster Data Model

Common Applications:

  • Usually used to represent continuous information:
    • Elevation / Depth
    • Temperature
    • Population density
    • Spectral data (in remote sensing applications)
  • Discrete data can also be used

Raster Data Model

Common extensions include:

  • .tiff / .tif:
    • “Tagged Image File Format”
  • .asc
  • .nc: Common in remote sensing and climate model output

TIFF or “GeoTiff” is simply an image made of pixels

Components of a Raster Object

  • Two main components:
    • Header: metadata with CRS, extent, and origin
    • Matrix: the data we want to represent
  • x-coordinates = columns
  • y-coordinates = rows
  • In terra the “origin” of the matrix is the top-left corner

Source: Wickham, Cetinkaya-Rundel, and Grolemund (2023)

Raster Is Faster. . . But Why?

  • There is a fundamental relationship between resolution, extent, and origin:

\[ \mathrm{resolution_x} = \frac{x_{max} - x_{min}}{n_{col}} \]

  • Raster data are typically “lighter” (and faster to work with) because we don’t need to store the coordinates for all corners of each cell

Q: How many coordinates does R need to store if you specify the origin, extent, and resolution?

A: One coordinate pair, lattitude and longitude for the origin

  • In contrast to vector data, the cell of one raster layer can only hold a single value

Raster Is Faster. . . But Why?

Source: Wickham, Cetinkaya-Rundel, and Grolemund (2023)
  • I just need to store the coordinates for the center of cell #1
  • Coordinates for cell number 2 are \((X_1 + resolution, Y_1)\)

Raster Data in R

The terra Package

{terra}: An R package to work with raster datasets

The {terra} package:

  • Replaces the older {raster} package
  • Provides functions to create, read, transform, manipulate, and export raster data
  • Has its own approach to vector data (not friendly with the tidyverse)
  • Still plays well with sf

The terra Package: Example

  • I can read a raster file into using the function rast()
library(terra) # Load terra
# Load in my raster
depth <- rast("data/raw/depth_raster.tif")
depth # Printing a raster reveals the header, not the data
class       : SpatRaster 
size        : 3600, 7200, 1  (nrow, ncol, nlyr)
resolution  : 0.05, 0.05  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : depth_raster.tif 
name        :       depth_m 
min value   : -10273.276367 
max value   :      0.985951 
  • Q: What is the resolution of this raster?
  • A: 0.05°
  • Q: What is the extent?
  • A: The world

Find info contained in the header with specific functions:

ext(depth)   # Gets the extent
SpatExtent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
res(depth)   # Gets the resolution
[1] 0.05 0.05
dim(depth)   # Returns nrow, ncol, nlayers
[1] 3600 7200    1
ncell(depth) # Number of pixels (values)
[1] 25920000
names(depth) # Gets the names for each layer
[1] "depth_m"

Visualizing Rasters in R

The fastest way is to use the base plot() function

plot(depth) # Single line of code gets me a decent map

Visualizing Rasters in R

As with vector data, I can layer different pieces of my plot

plot(depth) # Build plot
plot(FL_counties, add = T) # Add FL counties on top

  • Q: How are missing values represented?
  • A: As transparent (white) pixels

Manipulating Raster Objects

Task: Produce a map of water depth around FL

  • Q: In human language, what do I have to do?
  • A: “Zoom-in” on my global raster: crop()1
FL_depth <- crop(depth, FL_counties) # Crop the global depth raster to the extent of FL_counties
plot(FL_depth)
plot(FL_counties, add = T)

Manipulating Raster Objects

Task: Build the same map, but only show shallow (depth > -100 m) waters

  • Rasters are just matrices, so we can use our trusty binary operators
plot(FL_depth <= -100)

Manipulating Raster Objects

Task: Build the same map, but only show shallow (depth > -100 m) waters

  • In a data.frame (or sf) object, you can remove rows (observations) based on values
  • But a raster is not a tidy table. It’s a matrix, so removing rows wouldn’t be wise
  • Instead of filtering data, we turn them into NA
# Crop FL
FL_depth <- crop(depth, FL_counties)
# Find pixels deeper than 100 and replace them with NA
FL_depth[FL_depth <= -100] <- NA
# Plot it
plot(FL_depth)
plot(FL_counties, add = T)
  • This code is not very readable

Tidy rasters in R: {tidyterra}

Tidy Rasters in R: {tidyterra}

{tidyterra}: Common methods of the tidyverse for objects created with {terra}

The {tidyterra} package:

  • Developed by Diego Hernangómez (Hernangómez (2023))
  • Extends, but doesn’t replace, the terra package
  • Provides functions to manipulate and visualize the attribute portion of the raster
  • Brings ggplot capabilities (geom_spatraster())

Visualizing Rasters in R: {tidyterra}

library(tidyterra) # Load terra
ggplot() +         # Begin a ggplot
  geom_spatraster(data = depth, # specify object to plot
                  aes(fill = depth_m)) + # specify layer to plot
  geom_sf(data = FL_counties) # Add FL counties on top

  • Q: How are missing values represented?
  • A: As gray pixels

Manipulating Rasters with tidyterra

library(tidyterra)
FL_depth <- depth |> 
  terra::crop(FL_counties) |> 
  tidyterra::filter(depth_m >= -100)

plot(FL_depth)
plot(FL_counties, add = T)

Remember:

  • terra for the spatial part (e.g. crop())
  • tidyterra for the attribute part (e.g. filter())

Rasters with more than one attribute (layer)

Combining Rasters in terra

Source: Wasser et al., 2024

All layers must have the same extent and resolution

Combining Rasters in terra

# Let's read habitat suitability models for four sea turtle species
log <- rast("data/raw/AquaMaps/Caretta_caretta.tiff")
green <- rast("data/raw/AquaMaps/Chelonia_mydas.tiff")
leather <- rast("data/raw/AquaMaps/Dermochelys_coriacea.tiff")
hawk <- rast("data/raw/AquaMaps/Eretmochelys_imbricata.tiff")
# We can combine them together using the `c()` function
turtles <- c(log, green, leather, hawk)
turtles
class       : SpatRaster 
size        : 300, 720, 4  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : -180, 180, -75, 75  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
sources     : Caretta_caretta.tiff  
              Chelonia_mydas.tiff  
              Dermochelys_coriacea.tiff  
              Eretmochelys_imbricata.tiff  
names       : Caretta_caretta, Chelonia_mydas, Dermoch~oriacea, Eretmoc~bricata 
min values  :            0.01,           0.01,            0.01,            0.01 
max values  :            1.00,           1.00,            1.00,            1.00 

Multi-Layer Rasters

Calling plot() on a multi-layer raster shows us one sub-plot per layer

plot(turtles)

Multi-Layer Rasters

Task: Calculate the mean habitat suitability around FL waters

turtle_habitat <- turtles |> # Start from turtles
  # And pipe into filter()
  filter(Caretta_caretta > 0.5,
         Chelonia_mydas > 0.5,
         Dermochelys_coriacea > 0.5,
         Eretmochelys_imbricata > 0.5)
# Inspect the metadata
turtle_habitat
# And plot it
plot(turtle_habitat)
class       : SpatRaster 
size        : 300, 720, 4  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : -180, 180, -75, 75  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
names       : Caretta_caretta, Chelonia_mydas, Dermoch~oriacea, Eretmoc~bricata 
min values  :            0.51,           0.51,            0.55,            0.51 
max values  :            1.00,           1.00,            1.00,            1.00 

Multi-Layer Rasters

Task: Calculate the mean habitat suitability around FL waters

turtle_habitat <- turtles |> # Start from turtles
  # And pipe into filter()
  filter(Caretta_caretta > 0.5,
         Chelonia_mydas > 0.5,
         Dermochelys_coriacea > 0.5,
         Eretmochelys_imbricata > 0.5) |> 
  # Then crop it to FL counties
  crop(st_buffer(FL_counties, dist = 100000)) |>  # Any guesses on what st_buffer is doing?
  mean()

Wrap Up

  • Raster Data Model: Continuous grid representation with pixels/cells of uniform size
    • Ideal for continuous data (elevation, temperature, population density)
    • More storage-efficient than vector data
    • Each cell holds a single value per layer
  • The terra Package:
    • Modern replacement for {raster} package
    • Use rast() to read raster files (.tif, .tiff, .asc, .nc)
  • The tidyterra Package:
    • Extends terra with tidyverse-friendly functions
    • Use terra for spatial operations (e.g., crop())
    • Use tidyterra for attribute operations (e.g., filter(), mutate())
  • Multi-Layer Rasters:
    • Combine multiple rasters with c()
    • All layers must have same extent and resolution
    • Use mean(), sum(), etc. to aggregate across layers

Before Next Class:

  • Update RStudio
  • Install terra, tidyterra, and ggspatial
install.packages("terra")
install.packages("tidyterra")
install.packages("ggspatial")

References

Hernangómez, Diego. 2023. “Using the Tidyverse with Terra Objects: The Tidyterra Package.” Journal of Open Source Software 8 (91): 5751. https://doi.org/10.21105/joss.05751.
Wickham, Hadley, Mine Cetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 2nd ed. Sebastopol, CA: O’Reilly Media.