Use the DOI link provided above to cite this version (recommended). You can cite all versions by using the DOI 10.5281/zenodo.10641018. This DOI represents all versions, and will always resolve to the latest one.
This repository contains the code to clean and maintain what I call the Mexican fisheries data set. This data set contains tables on Mexico's Vessel Monitoring System (VMS) tracking data, a vessel registry, landings data, and some other things. Data themselves are NOT archived in the repository due to GitHub's size constraints. I am using git-lfs, which you will have to install if you want to clone or fork the repository.
I am more than happy to share any and all of these data and to see them put to good use. If you want to come up with a way to automate the delivery of the data, please reach out to me. I simply don't have the time.
Please open an issue or send me an email if you run into any problems.
The VMS tracking data come from SISMEP (Sistema de Monitoreo Satelital de Embarcaciones Pesqueras, the Satellite Monitoring System for Fishing Vessels). It reports the geolocation and timestamp of Mexican fishing vessels that comply with Mexico's fisheries regulation on the matter. Simply put, vessels longer than 10.5 m in length overall, with an on-board engine of more than 80 hp, and with a roof must carry a transponder.
The clean data are available under data/mex_vms/clean/ and on Google Cloud Storage at gs://mex_fisheries/MEX_VMS/*. They are also in BigQuery at mex-fisheries.mex_vms.mex_vms_latest, stored as a table partitioned by year, with some additional processing and added features.
Note that the BigQuery tables are versioned: a new version is created every time they undergo a major change, such as a bug fix, new data, or a change to the underlying cleaning code. Available versions include:
mex-fisheries.mex_vms.mex_vms_processed_v_20250623 (current version; this is what mex-fisheries.mex_vms.mex_vms_latest points to)
mex-fisheries.mex_vms.mex_vms_processed_v_20250613
mex-fisheries.mex_vms.mex_vms_processed_v_20250319
mex-fisheries.mex_vms.mex_vms_processed_v_20240615
mex-fisheries.mex_vms.mex_vms_processed_v_20240515
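For reproducible analyses you may prefer to pin a specific version instead of the latest view, so results do not change when a new version is published. The sketch below assumes the versioned tables keep the `mex_vms_processed_v_YYYYMMDD` naming pattern listed above; `vms_table_name()` is a hypothetical helper, not part of this repository.

```r
# Hypothetical helper (not part of this repository): returns the name of a VMS
# table inside the mex_vms dataset. With no version it returns the view that
# tracks the current release; with a version date (YYYYMMDD) it returns the
# corresponding frozen table, assuming the naming pattern listed above.
vms_table_name <- function(version = NULL) {
  if (is.null(version)) {
    return("mex_vms_latest")
  }
  version <- as.character(version)
  if (!grepl("^[0-9]{8}$", version)) {
    stop("`version` must be a date string in YYYYMMDD format, e.g. \"20250623\"")
  }
  paste0("mex_vms_processed_v_", version)
}

vms_table_name()            # "mex_vms_latest"
vms_table_name("20250319")  # "mex_vms_processed_v_20250319"

# With a connection `con` opened on the mex_vms dataset (as in the snippet
# below), a pinned version could then be read with:
# mex_vms_pinned <- dplyr::tbl(con, vms_table_name("20250319"))
```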
You should be able to access the entire data set using BigQuery (SQL) or R. The following code snippet shows how you might connect to the database from R:
```r
# Load packages ----------------------------------------------------------------
pacman::p_load(
  bigrquery,
  DBI,
  tidyverse
)

bq_auth(email = "your-email@example.com") # Authenticate with your own Google account email

# Establish a connection -------------------------------------------------------
con <- dbConnect(bigquery(),
                 project = "mex-fisheries",        # Name of the project, leave it as-is
                 dataset = "mex_vms",              # Name of the dataset, leave it as-is
                 billing = "your-billing-id-here", # Billing project: replace with your own
                 use_legacy_sql = FALSE,
                 allowLargeResults = TRUE)

# Lazy tbl pointing at the latest version (currently mex_vms_processed_v_20250623)
mex_vms <- tbl(con, "mex_vms_latest")

# That's it, you can now use dplyr verbs to work with the data.
# For example, get latitude, longitude, and vessel id for 1,000 rows of the data
# (without arrange(), BigQuery does not guarantee which rows are returned)
mex_vms |>
  select(vessel_rnpa, lat, lon) |>
  head(1000) |>
  collect()
```
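Because `mex_vms` is a lazy table, dplyr verbs applied before `collect()` are translated to SQL by dbplyr and run in BigQuery, so filtering and aggregating there keeps downloads small. A minimal sketch, using only the `vessel_rnpa`, `lat`, and `lon` columns shown above; `pings_in_box()` and the bounding-box values are hypothetical. The same function also runs on a local data frame, which makes it easy to check before querying the full table.

```r
library(dplyr)

# Hypothetical helper (not part of this repository): keep pings inside a
# latitude/longitude bounding box and count them per vessel. It only uses
# dplyr verbs, so it works on a local data frame and, via dbplyr, on the
# remote `mex_vms` table, where the filtering and counting run in BigQuery.
pings_in_box <- function(vms, lat_min, lat_max, lon_min, lon_max) {
  vms |>
    filter(
      between(lat, lat_min, lat_max),
      between(lon, lon_min, lon_max)
    ) |>
    count(vessel_rnpa, name = "n_pings")
}

# Quick local check with made-up pings (hypothetical values, not real data):
toy <- tibble(
  vessel_rnpa = c("A", "A", "B", "C"),
  lat = c(25, 26, 40, 24),
  lon = c(-110, -111, -110, -130)
)
pings_in_box(toy, lat_min = 20, lat_max = 32, lon_min = -118, lon_max = -105)
# Only vessel "A" remains, with 2 pings

# Against BigQuery (using the `mex_vms` tbl from the snippet above); the
# bounds are arbitrary placeholders:
# pings_in_box(mex_vms, lat_min = 20, lat_max = 32, lon_min = -118, lon_max = -105) |>
#   collect()
```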
NOTE: For details on the data cleaning, next steps, and known issues, see the dedicated README (it may not be up to date).
mex-fisheries.mex_vms.vessel_info_v_*
I use the CONAPESCA Avisos (2000-2019) and CONAPESCA (2018-present) data to build the data sets listed below. The pipeline lives in scripts/mex_landings/.
There is a Makefile outlining dependencies and order of operations, and the DAG is shown here: