Geographic data models

Raster data

Juan Carlos Villaseñor-Derbez (JC)

Intro slide

We mentioned that spatial data can be represented with a vector model or a raster model
We focused on vector data, which are represented by points, lines, and polygons (and combinations of these)
In R, we use the simple features standard to work with vector data via the sf package
sf objects have three main components:
- CRS
- geometry
- attributes

Raster data model

Represents the world as a continuous grid of cells (gridcells, grid cells, grid-cells, or pixels)
All pixels are the same size¹
In about 99% of cases, grid cells are square

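library(sf)       # vector data (simple features)
library(mapview)  # interactive maps

# Polygon vertices as (longitude, latitude) pairs; the ring closes on its first vertex
rsmaes_poly_coords <- st_polygon(
  x = list(
    matrix(
      data = c(
        -80.163017, 25.733950,
        -80.164236, 25.732816,
        -80.163772, 25.732353,
        -80.163924, 25.732148,
        -80.163455, 25.731597,
        -80.162187, 25.731605,
        -80.160968, 25.732172,
        -80.163017, 25.733950),
      ncol = 2,
      byrow = TRUE)),
  dim = "XY") |>
  st_sfc(crs = 4326)  # WGS 84

# Attach an attribute column to the geometry
rsmaes_poly <- st_sf(id = "Rosenstiel (polygon)",
                     geometry = rsmaes_poly_coords)

mapview(rsmaes_poly)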

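library(terra)  # raster data

# Burn the polygon into a 0.0001-degree grid around it
rsmaes_rast <- rasterize(
  x = vect(rsmaes_poly),          # convert the sf object to a terra SpatVector
  y = rast(resolution = 0.0001,   # template raster; cell size in degrees
           crs = "EPSG:4326",
           vals = 0,
           xmin = -80.168, xmax = -80.155,
           ymin = 25.730, ymax = 25.735),
  field = 1)                      # cells whose center falls inside the polygon get 1

names(rsmaes_rast) <- "Rosenstiel"
rsmaes_rast[is.na(rsmaes_rast)] <- 0  # cells outside the polygon become 0

mapview(rsmaes_rast,
        legend = FALSE)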

Components of a raster

Typically two components:
- Header (“metadata” with CRS, extent, and origin)
- Matrix (the actual “data” we want to represent)
In the matrix, columns correspond to x-coordinates and rows correspond to y-coordinates
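
To make the header/matrix split concrete, here is a minimal base-R sketch. The toy extent and values are made up for illustration, and the `header` list is only a stand-in for what a real raster file stores (it is not how terra represents a SpatRaster internally).

```r
# Hypothetical toy raster: a header (plain list) plus a matrix of values
header <- list(
  xmin = 0, xmax = 4,   # extent in x
  ymin = 0, ymax = 3,   # extent in y
  crs  = "EPSG:4326"
)
values <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)

# Resolution follows from the extent and the matrix dimensions
res_x <- (header$xmax - header$xmin) / ncol(values)
res_y <- (header$ymax - header$ymin) / nrow(values)

# Cell-center coordinates: columns map to x, rows map to y.
# Row 1 is the top of the raster, so y decreases as the row index grows.
x_centers <- header$xmin + (seq_len(ncol(values)) - 0.5) * res_x
y_centers <- header$ymax - (seq_len(nrow(values)) - 0.5) * res_y

x_centers  # 0.5 1.5 2.5 3.5
y_centers  # 2.5 1.5 0.5
```

No coordinate is stored for any individual cell; all of them are derived from the header.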

GFW 1: Trees

Tree cover in the year 2000, defined as canopy closure for all vegetation taller than 5 m in height. Encoded as a percentage per output grid cell, in the range 0–100.

library(here)  # builds file paths relative to the project root

tc <- rast(here("data", "Hansen_GFC2015_treecover2000_00N_080W.tif"))

tc

class       : SpatRaster 
dimensions  : 40000, 40000, 1  (nrow, ncol, nlyr)
resolution  : 0.00025, 0.00025  (x, y)
extent      : -80, -70, -10, 0  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : Hansen_GFC2015_treecover2000_00N_080W.tif 
name        : Hansen_GFC2015_treecover2000_00N_080W

Let’s check the resolution (converted to meters, taking one degree as roughly 111.31 km at the equator) and the number of cells

0.00025 * 111.31 * 1000  # degrees * km per degree * m per km

[1] 27.8275
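
The number of cells follows directly from the dimensions printed for `tc`. The sketch below uses base R only and copies the numbers from that summary. The latitude adjustment at the end is an added approximation (spherical Earth), not part of the original slides: east-west cell width shrinks with the cosine of latitude, so the 27.8 m figure applies near the equator, which is the northern edge of this tile.

```r
# Dimensions and resolution copied from the printed summary of `tc`
n_row   <- 40000
n_col   <- 40000
res_deg <- 0.00025

n_cells <- n_row * n_col   # 1.6 billion cells
n_cells

# Same degree-to-meter conversion as above (111.31 km per degree)
res_m <- res_deg * 111.31 * 1000

# Approximate east-west cell width at the southern edge of the tile (10 degrees S)
res_m_10s <- res_m * cos(10 * pi / 180)
res_m_10s  # slightly smaller than res_m
```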

plot(tc)

Download data from GFW earthengine portal

Why are rasters faster?

There is a fundamental relationship between resolution, extent, and dimensions (the number of columns and rows):

\[ \mathrm{resolution} = \frac{x_{max} - x_{min}}{n_{col}} , \frac{y_{max} - y_{min}}{n_{row}} \]

Raster data are typically “lighter” (and faster to work with) because they don’t need to store all the coordinates
How many coordinates do you need to store if you know the origin, extent, and resolution?
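
As a rough check of the formula, the sketch below plugs in the extent and dimensions printed for the tree cover raster and then compares how many numbers each approach would need. It is base R only, and the "two numbers per cell" figure is a simplifying assumption about storing every cell center explicitly.

```r
# Extent and dimensions copied from the printed summary of `tc`
xmin <- -80; xmax <- -70
ymin <- -10; ymax <- 0
n_col <- 40000; n_row <- 40000

# Resolution from the formula above
res_x <- (xmax - xmin) / n_col
res_y <- (ymax - ymin) / n_row
c(res_x, res_y)  # both 0.00025, matching the printed resolution

# Storing every cell center explicitly: one x and one y per cell
explicit_numbers <- 2 * n_row * n_col

# Storing a header instead: the extent (4 numbers) plus the dimensions (2 numbers)
header_numbers <- 4 + 2

explicit_numbers / header_numbers  # how many times more numbers the explicit approach needs
```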

Accessing data in the raster

You can access values by their cell ID or by their row and column position in the matrix

Modified from GeoCompR
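
The two ways of addressing a cell are interchangeable. The base-R sketch below assumes cell IDs start at 1 in the top-left corner and increase left to right, then top to bottom. The helper functions `cell_to_rowcol()` and `rowcol_to_cell()` are hypothetical names written for this example; terra offers `rowColFromCell()` and `cellFromRowCol()` for the same purpose on a SpatRaster.

```r
# Hypothetical 3 x 4 raster stored as a matrix, filled row by row
# so that each value equals its own cell ID
n_row <- 3
n_col <- 4
m <- matrix(seq_len(n_row * n_col), nrow = n_row, ncol = n_col, byrow = TRUE)

# Cell ID -> (row, col)
cell_to_rowcol <- function(cell, n_col) {
  c(row = (cell - 1) %/% n_col + 1,
    col = (cell - 1) %% n_col + 1)
}

# (row, col) -> cell ID
rowcol_to_cell <- function(row, col, n_col) {
  (row - 1) * n_col + col
}

rc <- cell_to_rowcol(7, n_col)  # row 2, col 3
m[rc["row"], rc["col"]]         # 7, the value stored in cell 7
rowcol_to_cell(2, 3, n_col)     # 7
```

Note that R fills matrices column by column by default, which is why `byrow = TRUE` is needed to mimic the raster layout.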

GFW 2: Fishers

fe <- rast(here("data", "gfw_fishing_effort_2024.tif"))

fe

class       : SpatRaster 
dimensions  : 334, 720, 1  (nrow, ncol, nlyr)
resolution  : 0.5, 0.5  (x, y)
extent      : -180, 180, -77.8, 89.2  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : gfw_fishing_effort_2024.tif 
name        :      sum 
min value   :      0.0 
max value   : 778060.7

plot(log(fe))
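
One thing to keep in mind with the log scale: the summary shows a minimum value of 0, and in R `log(0)` is `-Inf`, so cells with zero effort have no finite value after the transformation. The base-R sketch below illustrates this with the minimum and maximum from the summary plus one made-up value in between. `log1p()` (the log of 1 + x) is a common alternative; terra lists it among its math methods for SpatRaster, but treat `plot(log1p(fe))` as a suggestion to verify rather than part of the original slides.

```r
# Min and max copied from the summary of `fe`; the middle value is made up
effort <- c(0, 1, 778060.7)

log(effort)    # first element is -Inf
log1p(effort)  # first element is 0, and the ordering of values is preserved
```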