- Motivation: the problem
- Common solutions
- GNU Make
- Work together
4/27/2021
We’ll simulate everyone’s nightmare:
The solution lies in the problem
The problem lies in the solution to the original problem
```
data
|_raw_data.csv
|_clean_data.csv
scripts
|_clean_data.R
|_plot_data.R
results
|_figure1.png
```
lapply
```r
# Identify all scripts ("\\.R$" is a regex; full.names = TRUE keeps the "scripts/" prefix)
scripts <- list.files(path = "scripts", pattern = "\\.R$", full.names = TRUE)
# Run them all
lapply(scripts, source)
```
purrr + furrr
```r
library(furrr) # furrr attaches future, which provides plan() and multisession
# Identify all scripts
scripts <- list.files(path = "scripts", pattern = "\\.R$", full.names = TRUE)
# Run them all... in parallel
plan(multisession, workers = 4) # Use four cores
future_walk(scripts, source)    # Walk through files
```
run_all.R
Have a script:
```r
# Run all scripts
# This script runs all the code in my project, from scratch
source("scripts/clean_data.R") # To clean the data
source("scripts/plot_data.R")  # To plot the data
```
And either call source("run_all.R") or manually source the ones that we think we need to run.
Do I even need to run everything?
What if variables / values are left in my environment?
It worked when I wrote it, but not anymore
What if the timing (the order in which scripts run) isn’t correct?
`make` and `Makefile`
From GNU’s website:
“GNU Make is a tool which controls the generation of executables and other non-source files of a program from the program’s source files.”
`make` “looks” for a file called `Makefile` in the current directory.
A `Makefile` lists all the good stuff:
```make
target: prerequisites
	command
```
Command (recipe) lines must start with a tab character, not spaces.
```make
taco: recipe fridge/tortilla fridge/meat fridge/salsa
	follow recipe

happiness: taco # A target can be a prerequisite
	eat taco
```
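The rule that make applies to each target is simple. It first brings the prerequisites up to date. It then reruns the command only if the target is missing or is older than one of its prerequisites. The toy Python model below illustrates that rule on the taco example. It is a sketch, not GNU make. `make`, `rules` and `mtimes` are hypothetical names, and integer timestamps stand in for real file modification times.

```python
# Toy model of make's rebuild rule (illustration only, not GNU make).
# Hypothetical inputs:
#   rules  : target -> list of prerequisites (files with no rule are sources)
#   mtimes : file -> integer modification time (a missing key = file absent)


def make(target, rules, mtimes, log):
    """Bring `target` up to date, appending every rebuilt target to `log`."""
    if target not in rules:
        # A source file: there is nothing to build, but it has to exist.
        if target not in mtimes:
            raise FileNotFoundError(f"No rule to make target '{target}'")
        return
    prereqs = rules[target]
    for p in prereqs:
        make(p, rules, mtimes, log)  # update prerequisites first (depth-first)
    missing = target not in mtimes
    outdated = any(mtimes[p] > mtimes.get(target, float("-inf")) for p in prereqs)
    if missing or outdated:
        log.append(target)  # this is where make would run the recipe
        mtimes[target] = max(mtimes.values(), default=0) + 1  # stamp it "now"


rules = {
    "happiness": ["taco"],  # a target can be a prerequisite
    "taco": ["recipe", "fridge/tortilla", "fridge/meat", "fridge/salsa"],
}
mtimes = {"recipe": 1, "fridge/tortilla": 2, "fridge/meat": 3, "fridge/salsa": 4}

first = []
make("happiness", rules, mtimes, first)       # nothing built yet: build both
again = []
make("happiness", rules, mtimes, again)       # everything up to date: no-op
mtimes["fridge/salsa"] = 10                   # fresh salsa in the fridge
after_edit = []
make("happiness", rules, mtimes, after_edit)  # taco is stale, so both rebuild
print(first, again, after_edit)
```

The second call does nothing because every target is already newer than its prerequisites. That "only redo what is stale" behaviour is what the run-everything approaches above lack.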
What would a `Makefile` for this project look like? Following the `target: prerequisites` / `command` structure:
```make
results/figure1.png: scripts/plot_data.R data/clean_data.csv
	Rscript scripts/plot_data.R

data/clean_data.csv: scripts/clean_data.R data/raw_data.csv
	Rscript scripts/clean_data.R
```
And one command does it all: make
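Part of what that one command does is work out the order of the steps. data/clean_data.csv has to exist before results/figure1.png can be drawn. Ordering a dependency graph this way is a topological sort. The sketch below reproduces the ordering for this project with Python's standard `graphlib` module (Python 3.9+). It only illustrates the idea, since make computes the order internally while walking its rules.

```python
from graphlib import TopologicalSorter

# The project's rules from the Makefile: target -> prerequisites.
rules = {
    "results/figure1.png": {"scripts/plot_data.R", "data/clean_data.csv"},
    "data/clean_data.csv": {"scripts/clean_data.R", "data/raw_data.csv"},
}

# static_order() yields every node after all of its prerequisites.
order = list(TopologicalSorter(rules).static_order())

# Keep only the files that have a recipe, i.e. the ones make would build.
build_order = [node for node in order if node in rules]
print(build_order)
```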
To visualize the dependency graph defined by a `Makefile`, we need graphviz and makefile2graph:
```bash
make -Bnd | make2graph | dot -Tpng -o makefile-dag.png
```
- `-Bnd` tells `make` to “B: unconditionally make all targets, n: just print the commands (don’t run them), d: print debug info”
- `|` is just the OG pipe
- `-Tpng` tells `dot` “T: use the following output format: png”
- `-o makefile-dag.png` tells `dot` which file to write the image to
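makefile2graph builds its graph from make's debug output. As a rough illustration of the end product, the Python sketch below skips that step. It writes a Graphviz DOT description directly from a hand-written rules dictionary. `rules_to_dot` is a hypothetical helper and is not part of makefile2graph. Each edge points from a prerequisite to the target that needs it.

```python
def rules_to_dot(rules):
    """Render a target -> prerequisites mapping as a Graphviz DOT digraph."""
    lines = ["digraph G {"]
    for target in sorted(rules):
        for prereq in sorted(rules[target]):
            # One edge per dependency: prerequisite -> target.
            lines.append(f'  "{prereq}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


rules = {
    "results/figure1.png": ["scripts/plot_data.R", "data/clean_data.csv"],
    "data/clean_data.csv": ["scripts/clean_data.R", "data/raw_data.csv"],
}
dot_source = rules_to_dot(rules)
print(dot_source)
```

If this were saved as, say, `dag.py` (a hypothetical name), its output could be rendered with the same `dot` call: `python dag.py | dot -Tpng -o makefile-dag.png`.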
- `$@`: the file name of the target
- `$<`: the name of the first prerequisite
- `$^`: the names of all prerequisites
- `$(@D)`: the directory part of the target
- `$(@F)`: the file part of the target
- `$(<D)`: the directory part of the first prerequisite
- `$(<F)`: the file part of the first prerequisite

`$(<D)` and `$(<F)`
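```make
results/figure1.png: scripts/plot_data.R data/clean_data.csv
	Rscript scripts/plot_data.R
```
Can be written as
```make
results/figure1.png: scripts/plot_data.R data/clean_data.csv
	cd $(<D);Rscript $(<F)
```
Here `$(<D)` expands to `scripts` and `$(<F)` to `plot_data.R`. The `cd` and `Rscript` calls share one line because make runs each recipe line in its own shell. As a result, the script runs with `scripts/` as its working directory, so relative paths inside it are resolved from there.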
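To make the automatic variables concrete, the sketch below computes what each one would hold for the figure rule. It is a Python illustration that follows the GNU make manual's definitions. For example, the `D` variants drop the trailing slash and give `.` when the name has no directory part. `automatic_vars` is a hypothetical helper, and other automatic variables such as `$?` and `$*` are left out.

```python
import posixpath


def automatic_vars(target, prereqs):
    """Return what make's automatic variables would expand to for one rule."""

    def dir_part(path):
        # $(@D)/$(<D): directory without trailing slash, "." if there is none
        return posixpath.dirname(path) or "."

    first = prereqs[0]
    return {
        "$@": target,
        "$<": first,
        "$^": " ".join(dict.fromkeys(prereqs)),  # $^ lists each prerequisite once
        "$(@D)": dir_part(target),
        "$(@F)": posixpath.basename(target),
        "$(<D)": dir_part(first),
        "$(<F)": posixpath.basename(first),
    }


v = automatic_vars("results/figure1.png",
                   ["scripts/plot_data.R", "data/clean_data.csv"])
recipe = f"cd {v['$(<D)']};Rscript {v['$(<F)']}"
print(recipe)  # cd scripts;Rscript plot_data.R
```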
Imagine you have something like this:
```make
results/figure1.png: scripts/figure1.R
	Rscript scripts/figure1.R
results/figure2.png: scripts/figure2.R
	Rscript scripts/figure2.R
results/figure3.png: scripts/figure3.R
	Rscript scripts/figure3.R
results/figure4.png: scripts/figure4.R
	Rscript scripts/figure4.R
results/figure5.png: scripts/figure5.R
	Rscript scripts/figure5.R
# ...
```
Writing out this chunk \(n\) times presents an opportunity for \(\alpha n\) errors:
```make
results/figure1.png: scripts/figure1.R
	cd $(<D);Rscript $(<F)
```
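Instead, we use pattern rules with `%`, which works much like the `*` wildcard. It matches any non-empty stem, and the same stem is substituted into the prerequisites:
```make
results/%.png: scripts/%.R
	cd $(<D);Rscript $(<F)
```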
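The way `%` works can be sketched in a few lines of Python. The part of the target that `%` matches (the stem) must be non-empty, and that same stem is substituted into the prerequisite. This is an illustration, not make's implementation. `match_stem` and `prereqs_for` are hypothetical names. The sketch also does not model make's special handling of patterns without a slash, where directory names are stripped before matching.

```python
def match_stem(pattern, target):
    """Return the stem that '%' matches in `target`, or None if no match."""
    prefix, _, suffix = pattern.partition("%")
    long_enough = len(target) > len(prefix) + len(suffix)  # stem must be non-empty
    if long_enough and target.startswith(prefix) and target.endswith(suffix):
        return target[len(prefix):len(target) - len(suffix)]
    return None


def prereqs_for(target, target_pattern, prereq_patterns):
    """Instantiate a pattern rule's prerequisites for one concrete target."""
    stem = match_stem(target_pattern, target)
    if stem is None:
        return None  # the pattern rule does not apply to this target
    return [p.replace("%", stem, 1) for p in prereq_patterns]


# results/%.png: scripts/%.R
print(prereqs_for("results/figure3.png", "results/%.png", ["scripts/%.R"]))
```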
(I think it’s cool that the workflow induces standardization)
You can also define variables and use them in recipes:
```make
R_OPTS=--no-save --no-restore --no-init-file --no-site-file

results/%.png: scripts/%.R
	cd $(<D);Rscript $(R_OPTS) $(<F)
```
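When make runs the recipe, `$(R_OPTS)` is replaced by the variable's value. Below is a small Python sketch of that substitution. `expand` is a hypothetical helper with three properties. It handles only named `$(NAME)` references. It expands values recursively, as make does for variables defined with `=`. It treats undefined variables as empty, which is also what make does. Automatic variables such as `$(<D)` are deliberately left untouched.

```python
import re

# Matches named references like $(R_OPTS); automatic ones like $(<D) don't match.
VAR_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand(text, variables):
    """Replace $(NAME) with its value, recursively; undefined names become ''."""
    return VAR_REF.sub(lambda m: expand(variables.get(m.group(1), ""), variables), text)


variables = {"R_OPTS": "--no-save --no-restore --no-init-file --no-site-file"}
print(expand("cd $(<D);Rscript $(R_OPTS) $(<F)", variables))
```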
The repo is here: https://github.com/jcvdav/make_tutorial
```
draft.Rmd
draft.html
scripts
|_00_clean_data.R
|_01_figure_1.R
|_02_figure_2.R
|_03_regression.R
results
|_img
| |_ first_year.png
| |_ time_sereies.png
|_tab
  |_ reg_table.html
```
`make` the project
`make`-ing the project (end-to-end update)
`make`-ing the project once again (end-to-end update)