Final points

EVR 628: Intro to Environmental Data Science

Juan Carlos Villaseñor-Derbez (JC)

Rosenstiel School of Marine, Atmospheric, and Earth Science and Institute for Data Science and Computing

Student evaluations

Go to ce.miami.edu and complete your student evaluations

Final project

What You Have Learned

library(tidyverse)

# Three overlapping circles, one per skill; "Data Science" sits where they meet
tibble(skill = c("Coding", "Stats", "Domain"),
       x = c(-1, 1, 0),
       y = c(1, 1, -1)) |>
  ggplot(aes(x = x, y = y, color = skill)) +
  geom_point(size = 100, alpha = 0.5) +
  # Nudge each skill label outward, away from the shared center
  geom_text(mapping = aes(label = skill),
            color = "black",
            nudge_x = c(-2, 2, 0),
            nudge_y = c(0.5, 0.5, -2)) +
  # annotate() draws the central label once rather than once per data row
  annotate("text", x = 0, y = 0, label = "Data Science",
           color = "black") +
  coord_equal() +
  lims(x = c(-4, 4),
       y = c(-4, 4)) +
  theme_void() +
  theme(legend.position = "none")

Conceptual model of data science

Modified from Wickham, Cetinkaya-Rundel, and Grolemund (2023)

Your Data Science Toolkit

Core Foundations

  • R & RStudio: Projects, packages, and the R environment
  • Version Control: Git & GitHub for reproducible workflows
  • Coding Principles: Objects, classes, pipes (|>), and data structures

Data Management

  • File organization and project structure
  • Reading/writing: CSV, Excel, RDS files
  • Relative vs absolute paths
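
As a reminder of how these pieces fit together, here is a minimal sketch of writing and reading a CSV and an RDS file. The `sites` data and the file names are hypothetical, and `tempdir()` stands in for a project folder so the example runs anywhere; inside an RStudio Project you would typically use a relative path such as `"data/sites.csv"` instead.

```r
library(readr)

# Hypothetical example data (not from the course materials)
sites <- data.frame(site = c("A", "B", "C"),
                    temp_c = c(24.1, 25.3, 23.8))

# A temporary directory stands in for the project folder
project_dir <- tempdir()
csv_path <- file.path(project_dir, "sites.csv")
rds_path <- file.path(project_dir, "sites.rds")

write_csv(sites, csv_path)  # plain text, readable by any software
saveRDS(sites, rds_path)    # R-specific, preserves classes and column types

sites_csv <- read_csv(csv_path, show_col_types = FALSE)
sites_rds <- readRDS(rds_path)
```

Building paths with `file.path()` relative to the project folder keeps the script portable, whereas an absolute path only works on the computer where it was written.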

Data Transformation

  • {dplyr}: filter(), select(), mutate(), group_by(), summarize(), joins
  • {tidyr}: pivot_longer(), pivot_wider()
  • Data cleaning and wrangling pipelines
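
The verbs above can be chained into a single wrangling pipeline. The sketch below uses two small, made-up tables (`catch_wide` and `habitat`, both hypothetical) to reshape, clean, join, and summarize in one pass.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format data: one column per year
catch_wide <- tibble(
  species = c("snapper", "grouper", "tuna"),
  `2021` = c(10, 4, 25),
  `2022` = c(12, NA, 30)
)

# Hypothetical lookup table used for the join
habitat <- tibble(
  species = c("snapper", "grouper", "tuna"),
  habitat = c("reef", "reef", "pelagic")
)

catch_summary <- catch_wide |>
  # One row per species-year combination
  pivot_longer(cols = -species, names_to = "year", values_to = "catch_t") |>
  # Drop missing observations
  filter(!is.na(catch_t)) |>
  # Column names arrive as text, so convert year to an integer
  mutate(year = as.integer(year)) |>
  # Attach the habitat of each species
  left_join(habitat, by = "species") |>
  # Total catch per habitat
  group_by(habitat) |>
  summarize(total_catch_t = sum(catch_t), .groups = "drop")

catch_summary
```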

Visualization

  • {ggplot2}: Grammar of graphics, geoms, aesthetics
  • Advanced: Facets, layering, themes, color schemes
  • Visualization principles and best practices
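
A short sketch of layering and faceting, using the `airquality` dataset that ships with R purely for illustration (it was not necessarily used in class):

```r
library(ggplot2)

# Built-in dataset, used here only as an example
p <- ggplot(data = airquality,
            mapping = aes(x = Temp, y = Ozone)) +
  # Layer 1: raw observations (rows with missing Ozone are dropped silently)
  geom_point(alpha = 0.6, na.rm = TRUE) +
  # Layer 2: a linear trend fitted within each panel
  geom_smooth(method = "lm", se = FALSE, na.rm = TRUE) +
  # One panel per month
  facet_wrap(~ Month) +
  labs(x = "Temperature (°F)", y = "Ozone (ppb)") +
  theme_minimal()

p
```

Each `+` adds one component (a geom, a facet specification, labels, or a theme), which is the grammar-of-graphics idea in practice.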

Specialized Tools for Environmental Data

Spatial Data Analysis

  • {sf}: Vector data (points, lines, polygons)
  • {terra}: Raster data
  • Spatial operations and mapping
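
As a minimal sketch of working with vector data, the example below turns a table of coordinates into a point layer. The site names and coordinates are hypothetical, and the sketch assumes {sf} is installed.

```r
library(sf)

# Hypothetical monitoring sites (longitude/latitude in decimal degrees)
sites <- data.frame(
  name = c("Site A", "Site B"),
  lon = c(-80.16, -80.10),
  lat = c(25.73, 25.48)
)

# Convert the table to a point layer in WGS84 (EPSG:4326)
sites_sf <- st_as_sf(sites, coords = c("lon", "lat"), crs = 4326)

# Bounding box that encloses all points
st_bbox(sites_sf)
```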

Specialized Data Types

  • {forcats}: Working with factors and categorical data
  • {lubridate}: Dates and times
  • {stringr}: Text manipulation and regular expressions
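
The three packages often appear together when cleaning field records. The sketch below uses made-up values (`raw_date`, `raw_site`, and `raw_size` are hypothetical) to show one typical use of each.

```r
library(forcats)
library(lubridate)
library(stringr)

# Hypothetical raw field records
raw_date <- c("2024-03-15", "2024-07-04", "2024-11-30")
raw_site <- c("  key largo ", "Biscayne bay", "KEY LARGO")
raw_size <- c("small", "large", "medium")

# lubridate: parse text into Date objects, then extract a component
sample_date <- ymd(raw_date)
sample_month <- month(sample_date)

# stringr: remove extra whitespace and standardize capitalization
site <- raw_site |> str_squish() |> str_to_title()

# forcats: impose a meaningful order instead of the alphabetical default
size <- fct_relevel(factor(raw_size), "small", "medium", "large")
```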

Reproducible Research

  • Quarto: Documents, slides, websites
  • Integrating code, text, and outputs
  • Professional presentation of results

Programming Fundamentals

  • Writing custom functions
  • Iteration with for loops
  • Building reusable code
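
A small sketch combining a custom function with a for loop. The function name `c_to_f` and the temperatures are hypothetical, chosen only to illustrate the pattern.

```r
# Hypothetical helper: convert degrees Celsius to degrees Fahrenheit
c_to_f <- function(temp_c) {
  temp_c * 9 / 5 + 32
}

temps_c <- c(0, 25, 100)

# Pre-allocate the output vector, then fill it one element at a time
temps_f <- numeric(length(temps_c))
for (i in seq_along(temps_c)) {
  temps_f[i] <- c_to_f(temps_c[i])
}

temps_f
```

Because arithmetic in R is vectorized, `c_to_f(temps_c)` returns the same values without a loop; the loop is shown only to illustrate iteration.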

How to Continue Using These Skills

These tools are not just for this course. They are industry standards you’ll use throughout your career:

In Research & Academia

  • Reproducible analyses for publications
  • Data management for long-term projects
  • Collaborative research with version control

In Industry & Government

  • Data analysis and reporting
  • Environmental monitoring and assessment
  • Policy support and decision-making

Building Your Career

  • Showcase projects on GitHub
  • Create impactful documents with Quarto
  • Demonstrate data science skills to employers

Continue learning

Practice

Mastery comes through practice. Keep using these tools in your research, internships, and personal projects!

Community

Other resources

References

Wickham, Hadley, Mine Cetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 2nd ed. Sebastopol, CA: O’Reilly Media.