Geographic data models

Vector data

Juan Carlos Villaseñor-Derbez (JC)

Introduction

GIS vs Geocomputation

GIS:

Powerful
Emphasize graphical user interface (GUI)
You can only do what the developers allow you to do
Difficult to reproduce

Geocomputation:

Just as powerful (same underlying code)
Command line interface (CLI)
Flexible
Reproducible

Academic research, software development and practical applications that use geographic data to solve problems, with a focus on reproducibility, flexibility and tool development. - Lovlace, Nowosad, Muenchow (GeocompR, 2024)

How do we get the world into a computer?

Imagine an attribute that occurs somewhere in the World:
- SST
- site identity
- hurricane path
- city boundaries
How do we represent it?
How do we work on it?
How do we store the data?

We need a model

Data can be represented in a vector model, or a raster model

In today’s class:

Introduce the why behind each model
Discuss where and when they are used
Focus on vector:
- It’s main attributes
- It’s implementation in R

Today’s goals

By the end of the lecture, you should be able to:

list the two main data models, and articulate their main characteristics
understand the principles of vector data
list the three main components of a spatial vector object
list the tree main types of a simple feature
given an sf object, plot it and inspect it

Two models

`vector`

World is represented using geometries: points, lines, and polygons
Discrete and well-defined borders = high precision
Computationally intensive

`raster`

Space is divided into equal-sized grid cells (or pixels)
Aggregate spatial features to a given resolution
You can lose some precision, but much faster to work with

Note

Spatial data can be represented in a vector or raster model

Code

# Build the "spatial" part
rsmaes_pt_coords <- st_point(x = c(-80.1632879, 25.7328129),
                             dim = "XY") |> 
  st_sfc(crs = 4326)
# Combine with the "attributes" part
rsmaes_pt <- st_sf(id = "Rosenstiel (point)",
                   geometry = rsmaes_pt_coords)
# Visualize
mapview(rsmaes_pt)

Code

rsmaes_line_coords <- st_linestring(
  x = matrix(data = c(-80.163017, 25.733950,
                      -80.164236, 25.732816,
                      -80.163772, 25.732353,
                      -80.163924, 25.732148,
                      -80.163455, 25.731597,
                      -80.162187, 25.731605,
                      -80.160968, 25.732172,
                      -80.163017, 25.733950),
             ncol = 2,
             byrow = T),
  dim = "XY") |> 
  list() |> 
  st_sfc(crs = 4326)

rsmaes_line <- st_sf(id = "Rosenstiel (line)",
                     geometry = rsmaes_line_coords)

mapview(rsmaes_line)

Code

rsmaes_poly_coords <- st_polygon(
  x = list(
    matrix(
      data = c(
        -80.163017, 25.733950,
        -80.164236, 25.732816,
        -80.163772, 25.732353,
        -80.163924, 25.732148,
        -80.163455, 25.731597,
        -80.162187, 25.731605,
        -80.160968, 25.732172,
        -80.163017, 25.733950),
      ncol = 2,
      byrow = T)),
  dim = "XY") |> 
  st_sfc(crs = 4326)

rsmaes_poly <- st_sf(id = "Rosenstiel (polygon)",
                     geometry = rsmaes_poly_coords)

mapview(rsmaes_poly)

Code

rsmaes_rast <- rasterize(
  x = vect(rsmaes_poly),
  y = rast(resolution = 0.0001,
           crs = "EPSG:4326",
           val = 0,
           xmin = -80.168, xmax = -80.155,
           ymin = 25.730, ymax = 25.735),
  field = 1)

names(rsmaes_rast) <- "Rosenstiel"
rsmaes_rast[is.na(rsmaes_rast)] <- 0

mapview(rsmaes_rast,
        legend = F)

Common uses

`vector`

Common in social sciences:
- Census tract boundaries
- Political boundaries
- Roads

`raster`

Common in natural sciences:
- Sea surface temperature
- Satellite imagery
- Species distribution models

As environmental scientists and practitioners we will typically use both
The “correct” type of data varies by use case
You’ll be able to make this choice in a few weeks

“raster is faster, but vector is correctr”

Vector data model

World is represented using geometries: points, lines, and polygons

Q: What is a line? What is a polygon?

A: A sequence of points

Q: How do we represent a point?

A: Coordinates

Q: What are coordinates?

A: Two numbers representing distance from an origin: a coordinate reference system

Note

The vector data model is based on points located within a coordinate reference system (CRS)

What is where? And what does “where” mean?

Coordinates, Coordinate Reference Systems, and Attributes

What does “where” mean?

Consider the following two coordinates:

(-80.1632879, 25.7328129)
(1186764, 2863941)

Where are they?

The importance of attributes

Coordinates and a CRS are enough to place the point on a map
Also enough to perform spatial analysis
But not enough to convey information about the location
Often not enough to derive insights
We also need attributes

Note

The three main components of a spatial vector object are attributes (what), geometry, and CRS (where)

Attributes

Are not inherently spatial
Can be a discrete or categorical variable representing a characteristic of interest:
- Identity
- Sea Surface Temperature
- Time
Note that you can have more than one attribute tied to a geometry
- 2 attributes: vessel-tracking data or hurricane path (id and time)
- 2+ attributes: Buoy weather data (id, time, and all variables measured)

Handling spatial data in a computer

How are these compotents handled in a computer?

We rely on four key low-level libraries for geocomputation:

GDAL: read, write, and manipulating geographic data
PROJ: coordinate system transformation
GEOS: planar geometry engine only to be used with data with a projected CRS)
S2: a spherical geometry engine, written in C++ (fast) and developed by Google (reliable)

All are incredibly powerful and open source, but not exactly user-friendly

Ways of interacting with these libraries

The `sf` package

From R, our main point of access will be the sf package
When we load sf, it automatically links to these low level libraries

library(sf)

Linking to GEOS 3.13.0, GDAL 3.10.0, PROJ 9.5.1; sf_use_s2() is TRUE

sf provides:
- A consistent command-line interface to communicate with the libraries
- R classes and objects to handle vector data using the Simple Features standard

Introduction to Simple Features

Simple Feature

Simple features is an official, open standard (i.e. registered at the ISO)
Hierarchical data model to represent types of geometries (i.e. points, lines, polygons, multi-*, and collections)

The standard says:

“A simple feature is defined by the OpenGIS Abstract specification to have both spatial and non-spatial attributes. Spatial attributes are geometry valued, and simple features are based on 2D geometry with linear interpolation between vertices.”

—Pebesma, 2024

Quick example 1: working with `sf` objects

# Read the data
milton <- read_sf(here("data", "milton.gpkg"), quiet = T)
# Inspect it
class(milton)

[1] "sf"         "tbl_df"     "tbl"        "data.frame"

# See its names
names(milton)

[1] "NAME"     "ISO_TIME" "USA_WIND" "USA_PRES" "USA_SSHS" "geom"

# Plot it (note you get one plot per column, except for geometry)
plot(milton)

Quick example 2: `sf` object is just a `data.frame`

You can quickly calculate summary statistics for any attribute variable

Code

summary(milton$USA_WIND)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  25.00   34.00   75.00   81.87  134.00  150.00

Keep only some columns

Code

milton_light <- milton[,c(2, 3)]
milton_light

Simple feature collection with 55 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -95.5 ymin: 20.6 xmax: -77.5 ymax: 29.5
Geodetic CRS:  WGS 84
# A tibble: 55 × 3
   ISO_TIME            USA_WIND         geom
   <dttm>                 <dbl>  <POINT [°]>
 1 2024-10-03 20:00:00       25 (-94.7 20.6)
 2 2024-10-03 23:00:00       25 (-94.8 20.7)
 3 2024-10-04 02:00:00       25 (-94.9 20.7)
 4 2024-10-04 05:00:00       25   (-95 20.7)
 5 2024-10-04 08:00:00       25   (-95 20.8)
 6 2024-10-04 11:00:00       25 (-95.1 20.8)
 7 2024-10-04 14:00:00       25 (-95.1 20.9)
 8 2024-10-04 17:00:00       28   (-95.1 21)
 9 2024-10-04 20:00:00       30 (-95.2 21.1)
10 2024-10-04 23:00:00       30 (-95.3 21.3)
# ℹ 45 more rows

Or plot the attribute variables

Code

ggplot(data = milton_light,
       mapping = aes(x = ISO_TIME,
                     y = USA_WIND)) +
  geom_line() +
  geom_point() +
  labs(x = "Date",
       y = "Maximum sustained wind speed (knots)")

Why simple features?

Why use simple features

STANDARD = transferability and reproducibility (important in science and policy making!)
Widely supported data model (e.g. QGIS and PostGIS)
Relatively quick I/O
Plotting performance (easy to use attributes to specify visualization characteristics)
Full R support: sf objects can be treated as data.frames (wrangling and visualization, pipe-friendly)

Visualizing simple features

(Just an overview for now)

Basic maps

You can quickly produce rough-and-ready maps using the base R function plot

plot(USA)

This will give you a multi-panel figure with one panel per attribute (default max of 9)

Other parameters to `plot()`

You can specify you only want one panel

plot(USA, max.plot = 1)

Other parameters to `plot()`

Or that you specificlaly want a column (either by name or by position)

plot(USA["woe_id"])

plot(USA[,18])

Adding layers

Use reset = FALSE and add = TRUE

plot(USA[,1], reset = F)
plot(FL, add = T, col = "orange")
plot(milton, add = T, col = "red")
plot(MEX, add = T, col = "gray")

Adding layers

reset = FALSE needed when plotting the first layer produces a key OR the extent needs to be updated as new layers come in

plot(USA$geometry)
plot(FL, add = T, col = "orange")
plot(milton, add = T, col = "red")
plot(MEX, add = T)

See ?sf::plot_sf() for more details

Basic = limitations

Base plot() is great for:

Rapidly visualizing your data (just a few lines of code)
Checking correspondence between layers

Base plot() is terrible for:

Building complex visualizations (including formal maps)

We will cover visualization in more detail in a few weeks. In the meantime, two quick ways

Visualizing `sf` objects with `ggplot`

p <- ggplot(data = USA) +
  geom_sf(fill = "gray90")

p

p +
  geom_sf(data = FL, fill = "orange") +
  geom_sf(data = milton, color = "red", size = 1) +
  geom_sf(data = MEX)

This is what I use to build 99% of my maps, including those for publications and proposals

Visualizing `sf` objects with `mapview`

Just like plot(), mapview produces quick visualizations
Customization is painful, but worth it (although leaflet may be better)

mapview(list(USA, FL, milton, MEX))

This is what I use to interactively explore my data

Note

For now, recall that you can quickly visualize sf objects using plot(), ggplot(), or mapview()

Geometry types

The main types of simple features

POINT zero-dimensional geometry containing a single point
LINESTRING sequence of points connected by straight, non-self intersecting line pieces; one-dimensional geometry
POLYGON geometry with a positive area (two-dimensional); sequence of points form a closed, non-self intersecting ring; the first ring denotes the exterior ring, zero or more subsequent rings denote holes in this exterior ring
MULTIPOINT set of points
MULTILINESTRING set of linestrings
MULTIPOLYGON set of polygons
GEOMETRYCOLLECTION set of geometries of any type except GEOMETRYCOLLECTION

`sf` objects

Remember the objects I built on the slides with tabs?

Point

class(rsmaes_pt)

[1] "sf"         "data.frame"

rsmaes_pt

Simple feature collection with 1 feature and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.16329 ymin: 25.73281 xmax: -80.16329 ymax: 25.73281
Geodetic CRS:  WGS 84
                  id                   geometry
1 Rosenstiel (point) POINT (-80.16329 25.73281)

Polygon

class(rsmaes_poly)

[1] "sf"         "data.frame"

rsmaes_poly

Simple feature collection with 1 feature and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -80.16424 ymin: 25.7316 xmax: -80.16097 ymax: 25.73395
Geodetic CRS:  WGS 84
                    id                       geometry
1 Rosenstiel (polygon) POLYGON ((-80.16302 25.7339...

Note

Both are sf objects, which are data.frames /tibbles with a special column called geometry

Let’s dive deeper into how an object of class sf is made

The `sf` class

An `sf` object

Two main parts: geometries (spatial) and attributes (non-geographic)

GeoCompR, 2024 - Rhomboids are functions, rectangles are objects

rsmaes_pt <- c(-80.1632879, 25.7328129) |>      # Coordinates
  st_point() |>                                 # Build sfg
  st_sfc(crs = "ESRI:4326") |>                  # Build sfc (an specify CRS)
  st_sf(                                        # Build sf
    data.frame(id = "Rosenstiel (point)",       # Add the data.frame portion
               foundation = 1943))

rsmaes_pt

Simple feature collection with 1 feature and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.16329 ymin: 25.73281 xmax: -80.16329 ymax: 25.73281
CRS:           NA
                  id foundation
1 Rosenstiel (point)       1943
  st_sfc.st_point.c..80.1632879..25.7328129....crs....ESRI.4326..
1                                      POINT (-80.16329 25.73281)

Simple feature geometries (`sfg`)

These are the essential building blocks
They represent the type of simple features that can be handled in R
There is one function to create each type
You will rarely use them
They take numbers, matrices, or lists as inputs
Do not hold information on CRS

Simple feature columns (`sfc`)

A collection of 1 + sfg objects
Its contents may be more than one type of geometry (e.g. points and polygons)
Can contain a CRS, but must be the same across all geometries

Spherical geometries

What are spherical geometries?

Most software used to perform spatial operations assumes planar (e.g. projected) geometries
But our world is not flat
Therefore, when working with planar geometries, your choice of projection can influence your calculations
Using spherical geometries can bypass this
Spherical geometries are turned on by default (sf_use_s2(TRUE))

What do spherical geometries do?

S2 geometry assumes that straight lines between points on the globe are not formed by straight lines in the equirectangular projection, but by great circles: the shortest path over the sphere.

mex <- st_point(c(-99.165852, 19.376395)) |>
  st_sfc(crs = 4326)
ams <- st_point(c(4.646219, 52.38739)) |>
  st_sfc(crs = 4326)

# What is the distance between these points?
st_distance(mex, ams)

Units: [m]
        [,1]
[1,] 9206771

sf_use_s2(F) # Turn off spherical geometries
st_distance(mex, ams)

Units: [m]
        [,1]
[1,] 9220537

Why are they different?

One tries to calculate the distance along a straight (red) line. The other does so alogn the “curved” (black) line. On a spahere, the black line is shorter.

Which one is right?

In reality

Wrap up

Main points

There are two models with which we represent spatial data (vector and raster)
The vector model relies on simple features: points, lines, and polygons (coordinates)
sf is the main R package for working with vector data
an sf object should have three things:
- attributes (what: data frame)
- features (where: geometry column)
- CRS (in relation to what?: in the header)
sf objects can be treated like tibbles / data.frames
Quickly visualize sf objects using plot(), ggplot(), or mapview()

Next class

Working with simple feature objects in R

Geographic data models

Introduction

GIS vs Geocomputation

GIS:

Geocomputation:

How do we get the world into a computer?

We need a model

Today’s goals

Two models

vector

raster

Example: Rosenstiel

Common uses

vector

raster

Vector data model

What is where? And what does “where” mean?

What does “where” mean?

The importance of attributes

Attributes

Handling spatial data in a computer

How are these compotents handled in a computer?

Ways of interacting with these libraries

The sf package

Introduction to Simple Features

Simple Feature

Quick example 1: working with sf objects

Quick example 2: sf object is just a data.frame

Why simple features?

Why use simple features

Visualizing simple features

Basic maps

Other parameters to plot()

Other parameters to plot()

Adding layers

Adding layers

Basic = limitations

Visualizing sf objects with ggplot

Visualizing sf objects with mapview

Geometry types

The main types of simple features

sf objects

The sf class

An sf object

Simple feature geometries (sfg)

Simple feature columns (sfc)

Spherical geometries

What are spherical geometries?

What do spherical geometries do?

Why are they different?

Which one is right?

Wrap up

Main points

Next class

Working with

`vector`

`raster`

`vector`

`raster`

The `sf` package

Quick example 1: working with `sf` objects

Quick example 2: `sf` object is just a `data.frame`

Other parameters to `plot()`

Other parameters to `plot()`

Visualizing `sf` objects with `ggplot`

Visualizing `sf` objects with `mapview`

`sf` objects

The `sf` class

An `sf` object

Simple feature geometries (`sfg`)

Simple feature columns (`sfc`)