Week 2: Data visualization
By the end of this week, you should be able to:
ggplot2
data_lionfish, from the EVR628tools package
Tip
Use ?data_lionfish to look at the documentation.
The type of data you have can influence how you visualize them
id | site | total_length_mm | size_class |
---|---|---|---|
001-Po-16/05/10 | Paraiso | 213 | large |
002-Po-29/05/10 | Paraiso | 124 | medium |
003-Pd-29/05/10 | Pared | 166 | medium |
004-Cs-12/06/10 | Canones | 203 | large |
005-Cs-12/06/10 | Canones | 212 | large |
006-Pl-21/06/10 | Paamul | 210 | large |
What type of data is shown in variable id?
How about total_length_mm?
And size_class?
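One way to check your answers is to ask R how it stores each column. The sketch below rebuilds the six rows shown above by hand, so it runs without EVR628tools; the object name lionfish_sample is made up for this example.

```r
# Rebuild the six rows shown in the table above (hypothetical object name)
lionfish_sample <- data.frame(
  id = c("001-Po-16/05/10", "002-Po-29/05/10", "003-Pd-29/05/10",
         "004-Cs-12/06/10", "005-Cs-12/06/10", "006-Pl-21/06/10"),
  site = c("Paraiso", "Paraiso", "Pared", "Canones", "Canones", "Paamul"),
  total_length_mm = c(213, 124, 166, 203, 212, 210),
  size_class = c("large", "medium", "medium", "large", "large", "large"),
  stringsAsFactors = FALSE
)

# How does R store each column?
sapply(lionfish_sample, class)

# id: a text label, different for every fish
length(unique(lionfish_sample$id)) == nrow(lionfish_sample)

# total_length_mm: a number you can do arithmetic on
mean(lionfish_sample$total_length_mm)

# size_class: text that repeats a small set of values
table(lionfish_sample$size_class)
```

How R stores a column and what kind of data it represents are not always the same thing, which is worth keeping in mind for the next table.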
name | iso_time | lat | lon | sshs |
---|---|---|---|---|
MILTON | 2024-10-05 09:00:00 | 21.7 | -95.5 | -3 |
MILTON | 2024-10-05 12:00:00 | 22.0 | -95.5 | -1 |
MILTON | 2024-10-05 15:00:00 | 22.3 | -95.5 | -1 |
MILTON | 2024-10-05 18:00:00 | 22.5 | -95.5 | 0 |
MILTON | 2024-10-05 21:00:00 | 22.6 | -95.5 | 0 |
What type of data is sshs?
Trick question!
Use ?data_milton to look at the documentation.
What would -2.5 in sshs mean?
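A small sketch of why this is a trick question, using only the five sshs values from the table above. It assumes the values are codes for storm categories rather than measurements; check ?data_milton to confirm before relying on that.

```r
# The five sshs values shown in the table above
sshs <- c(-3, -1, -1, 0, 0)

# R stores them as numbers, so arithmetic works...
class(sshs)
mean(sshs)  # ...but is an average of category codes meaningful?

# If the values are category codes (an assumption here), an ordered
# factor describes them more honestly than a number does
sshs_cat <- factor(sshs, levels = sort(unique(sshs)), ordered = TRUE)
levels(sshs_cat)
table(sshs_cat)
```

With a factor, a value like -2.5 is simply not one of the levels, which matches the idea that it would not describe any category.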
There are many, many, MANY types of visualizations:
But what really matters:
Remember our goal:
To communicate something to the viewer (or ourselves)
That something is usually one of 4:
Distribution: “Most of my fish are quite small” (example)
Relationship (or correlation): “Look, slightly larger fish are waaaay heavier!” (example)
Evolution: “The wind speed of a hurricane was highest the night of Oct 7th” (example)
Ranking / Part of a whole: “Largest fish comes from Castillo” (example)
Any visual can usually achieve more than one of these at a time,
but most visuals can only achieve one of them effectively at a time
Message: “Most lionfish are around 100 mm in length”
ggplot2::geom_histogram()
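A minimal sketch of a histogram, assuming ggplot2 is installed. To stay self-contained it uses only the first 11 lengths listed in the data_lionfish preview later in these notes, not the full 109 rows, so the shape will differ from the slide. The object names and the bin width are choices made for this example.

```r
library(ggplot2)

# First 11 lengths from the data_lionfish preview (the full data has 109 rows)
lengths_mm <- data.frame(
  total_length_mm = c(213, 124, 166, 203, 212, 210, 132, 122, 224, 117, 211)
)

p_hist <- ggplot(data = lengths_mm,
                 mapping = aes(x = total_length_mm)) +
  geom_histogram(binwidth = 25) +  # bin width is an arbitrary choice
  labs(x = "Total length (mm)",
       y = "Number of fish")

p_hist
```

Try a few values of binwidth: the message a histogram sends can change a lot with the bin size.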
Message: “Tzimin-Ha has the heaviest fish”
Tip
Use this one with care. It packs a lot of information and not everyone knows (remembers) how to read it.
Message: “Most fish are medium sized and come from Paamul”
ggplot2::geom_bin_2d()
Is my x-axis bothering you?
R does not know that there is a logical order (small -> medium -> large).
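One way to tell R about the order is to store the variable as a factor with explicit levels. This base R sketch uses the six size_class values from the table earlier; the object names are made up for the example.

```r
# size_class values from the six rows shown earlier
size_class <- c("large", "medium", "medium", "large", "large", "large")

# As plain text, R falls back on alphabetical order
sort(unique(size_class))

# As a factor, you choose the order of the levels yourself
size_class_ordered <- factor(size_class,
                             levels = c("small", "medium", "large"))
levels(size_class_ordered)
```

ggplot2 places discrete values on an axis in the order of the factor levels, so a plot built with the factor version follows small -> medium -> large. forcats::fct_relevel(), used later in these notes, is another way to set the order.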
Message: “Look, slightly larger fish are waaaay heavier!”
ggplot2::geom_point()
Message: “The wind speed of a hurricane was highest the night of Oct 7th”
ggplot2::geom_line()
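A minimal sketch of a line plot over time, assuming ggplot2 is installed. The Milton rows shown earlier do not include wind speed, so this example plots latitude instead. Treating iso_time as UTC is an assumption, and the object names are made up.

```r
library(ggplot2)

# Five rows from the Milton table shown earlier
milton <- data.frame(
  iso_time = as.POSIXct(c("2024-10-05 09:00:00", "2024-10-05 12:00:00",
                          "2024-10-05 15:00:00", "2024-10-05 18:00:00",
                          "2024-10-05 21:00:00"),
                        tz = "UTC"),  # time zone is an assumption
  lat = c(21.7, 22.0, 22.3, 22.5, 22.6)
)

p_line <- ggplot(data = milton,
                 mapping = aes(x = iso_time, y = lat)) +
  geom_line() +
  geom_point() +  # mark the individual observations
  labs(x = "Time",
       y = "Latitude")

p_line
```

Converting the text timestamps to a date-time class matters here: with plain text on the x-axis, ggplot2 would treat each timestamp as a separate category.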
Message: “Largest fish comes from Castillo”
ggplot2::geom_col(), but there is also ggplot2::geom_bar()
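The difference between the two, sketched with the six rows from the table earlier (assuming ggplot2 is installed; object names are made up): geom_bar() counts rows for you, while geom_col() draws bar heights from a column you supply.

```r
library(ggplot2)

# Site and length for the six fish shown earlier
fish <- data.frame(
  site = c("Paraiso", "Paraiso", "Pared", "Canones", "Canones", "Paamul"),
  total_length_mm = c(213, 124, 166, 203, 212, 210)
)

# geom_bar(): you give x only, and the bar height is the number of rows
p_bar <- ggplot(data = fish,
                mapping = aes(x = site)) +
  geom_bar()

# geom_col(): you give x and y, and the bar height is the y value as is.
# Here y is the longest fish at each site, computed beforehand.
max_length <- aggregate(total_length_mm ~ site, data = fish, FUN = max)

p_col <- ggplot(data = max_length,
                mapping = aes(x = site, y = total_length_mm)) +
  geom_col()

p_bar
p_col
```

A quick rule of thumb: if you have already summarized your data to one row per bar, reach for geom_col(); if you have one row per observation and want counts, reach for geom_bar().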
Which is best at showing the relationship between size and weight?
Which is best at showing the size and weight of most fish?
Which is best at showing me the number of samples by site and size?
The right visual is the one that gets the message across
Ask yourself some questions:
There are resources to help you brainstorm:
You know what type of graph you want. Let's make it effective
EVR628tools::palette_UM()
viridis colors
Which one is better? Why?
library(ggplot2)
library(EVR628tools) # provides data_lionfish

# Base scatter plot, reused by both versions below
p <- ggplot(data = data_lionfish,
            mapping = aes(x = total_length_mm,
                          y = total_weight_gr)) +
  geom_point()

# Version 1: crowded breaks, heavy grid lines, long title
p +
  theme_gray() +
  scale_x_continuous(breaks = seq(0, 350, by = 15),
                     limits = c(0, 350)) +
  scale_y_continuous(breaks = seq(0, 400, by = 20),
                     limits = c(0, 400)) +
  theme(axis.line = element_line(color = "black"),
        panel.grid = element_line(color = "black")) +
  labs(title = "Total length (mm) vs total weight (gr) for 109 lionfish sampled from Mexico",
       subtitle = "Note that the largest fish is not the heaviest fish")
library(dplyr)   # slice_max()
library(ggrepel) # geom_text_repel()

# Version 2: label only the two fish the message is about
longest <- data_lionfish |> slice_max(total_length_mm)
heaviest <- data_lionfish |> slice_max(total_weight_gr)

p +
  geom_text_repel(data = longest,
                  label = "Longest",
                  nudge_x = -5,
                  nudge_y = -150,
                  size = 5) +
  geom_text_repel(data = heaviest,
                  label = "Heaviest",
                  nudge_x = -50,
                  nudge_y = -10,
                  size = 5) +
  theme_minimal(base_size = 14) +
  theme(axis.text = element_text(color = "black", size = 10),
        axis.title = element_text(color = "black", size = 12)) +
  labs(x = "Total length (mm)",
       y = "Total weight (gr)") +
  labs(title = "There's always a bigger fish",
       subtitle = "The largest fish is not the heaviest fish")
library(ggtext) # element_markdown()

# Highlight one site with color and say why in the title
ggplot(data = data_lionfish,
       mapping = aes(x = site, fill = site == "Paamul")) +
  geom_bar() +
  scale_fill_manual(values = c("FALSE" = "gray",
                               "TRUE" = "darkred")) +
  coord_flip() +
  labs(x = "Site", y = "N",
       title = "N = 31 come from <span style='color:darkred;'>Paamul</span>") +
  theme(plot.title = element_markdown(),
        axis.title.y = element_markdown(),
        legend.position = "none") +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 32))
Do you really need to use color?
Use better colors
Avoid redundant use of your limited aesthetics (x, y, size, color, shape)
Use a discrete color palette for categorical data
It is difficult to track more than about 6 to 10 colors
These colors come from EVR628tools::palette_UM(), using UM’s visual style guide
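A minimal sketch of a discrete palette on a categorical variable, assuming ggplot2 is installed. It uses the six fish from the earlier tables and ggplot2's built-in discrete viridis scale as a stand-in for EVR628tools::palette_UM(), whose arguments are not shown in these notes. Object names are made up.

```r
library(ggplot2)

# Six fish from the tables shown earlier
fish <- data.frame(
  site = c("Paraiso", "Paraiso", "Pared", "Canones", "Canones", "Paamul"),
  total_length_mm = c(213, 124, 166, 203, 212, 210),
  total_weight_gr = c(112.70, 27.60, 52.30, 123.10, 129.00, 138.75)
)

p_color <- ggplot(data = fish,
                  mapping = aes(x = total_length_mm,
                                y = total_weight_gr,
                                color = site)) +  # categorical variable
  geom_point(size = 3) +
  scale_color_viridis_d() +  # discrete palette: one distinct color per site
  labs(x = "Total length (mm)",
       y = "Total weight (gr)",
       color = "Site")

p_color
```

With four sites the colors are easy to tell apart; with many more categories, consider highlighting only the one that carries your message.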
When was it warmer / colder / average?
Single hue palette
library(forcats) # fct_relevel()

# Too many aesthetics at once: color, size, and shape all compete
ggplot(data = data_lionfish,
       mapping = aes(x = total_length_mm,
                     y = total_weight_gr,
                     color = depth_m,
                     size = fct_relevel(size_class, c("small", "medium", "large")),
                     shape = site)) +
  geom_point() +
  labs(x = "Total length (mm)",
       y = "Total weight (gr)",
       color = "Depth (m)", # color is mapped to depth_m, not site
       shape = "Site",
       size = "Size class") +
  scale_shape_manual(values = c(1:10))
Use these as guidelines, not as absolute truths
Before Thursday: Read Chapter 1 of R4DS
ggplot2
Grammar:
Grammar of graphics:
ggplot2: aesthetic mappings (aes) and geometric objects (geom)
You will need to load two packages
Recall the data_lionfish data:
Rows: 109
Columns: 9
$ id <chr> "001-Po-16/05/10", "002-Po-29/05/10", "003-Pd-29/05/10…
$ site <chr> "Paraiso", "Paraiso", "Pared", "Canones", "Canones", "…
$ lat <dbl> 20.48361, 20.48361, 20.50167, 20.47694, 20.47694, 20.5…
$ lon <dbl> -87.22611, -87.22611, -87.21167, -87.23278, -87.23278,…
$ total_length_mm <dbl> 213, 124, 166, 203, 212, 210, 132, 122, 224, 117, 211,…
$ total_weight_gr <dbl> 112.70, 27.60, 52.30, 123.10, 129.00, 138.75, 50.29, 1…
$ size_class <chr> "large", "medium", "medium", "large", "large", "large"…
$ depth_m <dbl> 38.1, 27.9, 18.5, 15.5, 15.0, 22.7, 13.4, 18.5, 18.2, …
$ temperature_C <dbl> 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 28, 28, 28, 28…
ggplot2
aesthetic mappings
geometric representation
geoms as needed (optional)
Aesthetics
Geometric Representation
ggplot(data = data_lionfish,
       mapping = aes(x = depth_m, y = total_length_mm)) +
  geom_point(shape = 21, fill = "steelblue", size = 2) +
  labs(x = "Depth (m)",
       y = "Total length (mm)",
       title = "Body length and depth",
       subtitle = "Larger fish tend to live deeper",
       caption = "Source EVR628tools::data_lionfish")
ggplot2 cheatsheet
My guide for live coding