mex_fisheries

Mexican Fisheries Data

DOI

Use the DOI link provided above to cite this version (recommended). You can cite all versions by using the DOI 10.5281/zenodo.10641018. This DOI represents all versions, and will always resolve to the latest one.

This repository contains the code to clean and maintain what I call the Mexican fisheries data set. The data set contains tables on Mexico's Vessel Monitoring System (VMS) tracking data, a vessel registry, landings data, and some other things. The data themselves are NOT archived in the repository due to GitHub's size constraints. I use git-lfs for the files that are tracked, so you will have to install it if you want to clone or fork the repository.

I am more than happy to share any and all of these data and see them put to good use. If you can think of a way to automate data delivery, please reach out to me; I simply don't have the time to set it up myself.

Please submit an issue or send me an email if you run into any problems.

1) Mexican VMS data (2007 - 2025 [partial])

Raw sources

“Clean” data availability, with two different levels of processing

Note that the BigQuery tables follow a standard versioning scheme: a new version is released every time the tables undergo a major change, such as fixing bugs, adding data, or modifying the underlying cleaning code. Past versions include:

You should be able to access the entire data set using BigQuery (SQL) or R. The following code snippet shows how you might connect to the database:

# Load packages ----------------------------------------------------------------
pacman::p_load(
  bigrquery,
  DBI,
  tidyverse
)

bq_auth("juancarlos.villader@gmail.com") # You'll need to authenticate using your own email

# Establish a connection -------------------------------------------------------
con <- dbConnect(bigquery(),
                 project = "mex-fisheries", # This is the name of the project, leave it as-is
                 dataset = "mex_vms",       # This is the name of the dataset, leave it as-is
                 billing = "your-billing-id-here", # And this is the billing ID. You will need to use yours here.
                 use_legacy_sql = FALSE, 
                 allowLargeResults = TRUE)
  
mex_vms <- tbl(con, "mex_vms_processed_latest") # This object now contains a tbl that points at mex_vms_processed_v_20250319

# That's it, you can now use dplyr verbs to work with the data.
# For example, get latitude, longitude, and vessel id for the first 1000 rows in the data
mex_vms |> 
    select(vessel_rnpa, lat, lon) |> 
    head(1000) |> 
    collect()
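The same table can also be queried directly with SQL, either in the BigQuery console or through `DBI::dbGetQuery()` on the connection above. A sketch of the equivalent query, using the fully qualified table name implied by the project and dataset names in the R snippet:

```sql
-- Equivalent of the dplyr pipeline above, written in standard SQL.
-- Fully qualified name: `project.dataset.table`
SELECT
  vessel_rnpa,  -- vessel identifier
  lat,          -- latitude
  lon           -- longitude
FROM `mex-fisheries.mex_vms.mex_vms_processed_latest`
LIMIT 1000;
```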

NOTE: For details on the data cleaning, next steps, and known issues, see the dedicated README. The file may not be up to date.

2) Vessel registry

Raw data sources

“Clean” data availability

3) Landings data [~2000-2025 (partial)]

Raw data sources

“Clean” data availability

I use the CONAPESCA Avisos (2000-2019) and CONAPESCA (2018-present) data to build the data sets listed below. The pipeline is found under scripts/mex_landings/.

Data by economic unit RNPA

Data by vessel RNPA

4) Subsidy data (coming soon)

Sources

Availability

5) Mexican TURFs (coming soon, but see Ere’s paper here)

Sources

Availability

6) Misc spatial features

Sources

How do all these pieces come together?

There is a Makefile outlining dependencies and the order of operations; the resulting DAG is shown here:
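For readers unfamiliar with Make, a hypothetical fragment in the same spirit looks like the one below. The target, script, and file names here are illustrative, not the repository's actual ones; see the real Makefile for the actual targets and dependencies.

```make
# Illustrative only: a landings-cleaning target that is rebuilt whenever
# the cleaning script or the raw input changes.
data/mex_landings/clean_landings.csv: scripts/mex_landings/01_clean_landings.R data/raw/landings_raw.csv
	Rscript scripts/mex_landings/01_clean_landings.R
```

Running `make` then rebuilds only the targets whose dependencies have changed, which is what keeps the order of operations in the DAG consistent.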