- Motivation: the problem
- Common solutions
- GNU Make
- Work together
4/27/2021
We’ll simulate everyone’s nightmare:
The solution lies in the problem
The problem lies in the solution to the original problem
data
|_raw_data.csv
|_clean_data.csv
scripts
|_clean_data.R
|_plot_data.R
results
|_figure1.png
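lapply
# Identify all scripts
scripts <- list.files(path = "scripts",
                      pattern = "\\.R$",   # a regular expression, not a glob
                      full.names = TRUE)   # keep the "scripts/" prefix so source() finds the files
# Run them all
lapply(scripts, source)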
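purrr + furrr
library(future)  # provides plan() and multisession
library(furrr)   # provides future_walk()
# Identify all scripts
scripts <- list.files(path = "scripts",
                      pattern = "\\.R$",
                      full.names = TRUE)
# Run them all... in parallel (no guarantee clean_data.R finishes before plot_data.R)
plan(multisession, workers = 4) # Use four cores
future_walk(scripts, source) # Walk through files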
run_all.R
Have a script:
# Run all scripts
# This script runs all the code in my project, from scratch
source("scripts/clean_data.R") # To clean the data
source("scripts/plot_data.R") # To plot the data
And either call source("run_all.R") or manually source only the scripts we think need to run.
Do I even need to actually run all?
What if variables / values are left in my environment?
It worked when I wrote it, but not anymore
What if the timing isn’t correct?
make
make and Makefile
From GNU’s website:
“GNU Make is a tool which controls the generation of executables and other non-source files of a program from the program’s source files.”
make “looks” for a file called Makefile in the current directory
Makefile, listing all the good stuff
target: prerequisite
	command
taco: recipe fridge/tortilla fridge/meat fridge/salsa
	follow recipe

happiness: taco # A target can be a prerequisite
	eat taco
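A minimal, runnable sketch of the taco Makefile, assuming a POSIX shell, mktemp, and GNU Make. The ingredient files and the cat/echo commands standing in for "follow recipe" and "eat taco" are made up for the demo. It shows that make only reruns a recipe when a prerequisite is newer than its target.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"    # work in a throwaway directory

# Hypothetical ingredients
mkdir fridge
echo "tortilla" > fridge/tortilla
echo "meat" > fridge/meat
echo "salsa" > fridge/salsa
echo "fold everything together" > recipe

# Write the Makefile; \t is the tab every recipe line must start with
printf 'taco: recipe fridge/tortilla fridge/meat fridge/salsa\n' > Makefile
printf '\tcat fridge/tortilla fridge/meat fridge/salsa > taco\n\n' >> Makefile
printf 'happiness: taco\n' >> Makefile
printf '\techo "eating taco" > happiness\n' >> Makefile

make happiness    # builds taco first, then happiness
make happiness    # reports happiness is up to date: nothing is newer than it
sleep 1           # ensure the next change gets a strictly newer timestamp
echo "extra hot salsa" > fridge/salsa
make happiness    # salsa changed, so taco and then happiness are rebuilt
```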
Makefile for this project?
target: prerequisites
	command
(Each command line must start with a tab character, not spaces.)

results/figure1.png: scripts/plot_data.R data/clean_data.csv
	Rscript scripts/plot_data.R

data/clean_data.csv: scripts/clean_data.R data/raw_data.csv
	Rscript scripts/clean_data.R
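And one command does it all: make
Since results/figure1.png is the first target in the Makefile, make builds it by default, first rebuilding data/clean_data.csv if it is missing or older than its prerequisites.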
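To watch the chain work without R, here is a sketch of the same Makefile where cp stands in for Rscript (an assumption for the demo; the empty .R files only serve as timestamps). It assumes a POSIX shell, mktemp, and GNU Make, and shows make building data/clean_data.csv before results/figure1.png, then rebuilding only the figure after the plotting script changes.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Same layout as the project; the scripts are empty placeholders
mkdir data scripts results
echo "x,y" > data/raw_data.csv
touch scripts/clean_data.R scripts/plot_data.R

# Same rules as the project Makefile, with cp standing in for Rscript
printf 'results/figure1.png: scripts/plot_data.R data/clean_data.csv\n' > Makefile
printf '\tcp data/clean_data.csv results/figure1.png\n\n' >> Makefile
printf 'data/clean_data.csv: scripts/clean_data.R data/raw_data.csv\n' >> Makefile
printf '\tcp data/raw_data.csv data/clean_data.csv\n' >> Makefile

make                         # runs the cleaning step first, then the plotting step
sleep 1                      # ensure the edit below gets a strictly newer timestamp
touch scripts/plot_data.R    # pretend we edited the plotting script
second_run=$(make)           # only the plotting step is out of date now
printf '%s\n' "$second_run"
```

This is the answer to "Do I even need to actually run all?": make compares timestamps and skips the cleaning step because nothing it depends on has changed.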
To visualize the dependency graph (the DAG), we need graphviz and makefile2graph
make -Bnd | make2graph | dot -Tpng -o makefile-dag.png
-Bnd tells make “B: Unconditionally make all targets, n: just print, d: print debug info”
| is just the OG pipe
-Tpng tells dot “T: use the following output format: png”, and -o makefile-dag.png names the output file
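The -B and -n flags are also handy on their own. A sketch, assuming a POSIX shell, mktemp, and GNU Make, with a stand-in version of the project (empty placeholder scripts, cp instead of Rscript): make -n previews the plan without running anything, and make -Bn prints every command even when all targets are up to date.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Stand-in project: empty placeholder scripts, cp instead of Rscript
mkdir data scripts results
echo "x,y" > data/raw_data.csv
touch scripts/clean_data.R scripts/plot_data.R
printf 'results/figure1.png: scripts/plot_data.R data/clean_data.csv\n' > Makefile
printf '\tcp data/clean_data.csv results/figure1.png\n\n' >> Makefile
printf 'data/clean_data.csv: scripts/clean_data.R data/raw_data.csv\n' >> Makefile
printf '\tcp data/raw_data.csv data/clean_data.csv\n' >> Makefile

make -n              # -n: print both commands, but run neither
ls data              # still only raw_data.csv: the dry run built nothing
make                 # now build for real
plan=$(make -Bn)     # -B -n: treat every target as out of date, print only
printf '%s\n' "$plan"
```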
$@: the file name of the target
$<: the name of the first prerequisite
$^: the names of all prerequisites
$(@D): the directory part of the target
$(@F): the file part of the target
$(<D): the directory part of the first prerequisite
$(<F): the file part of the first prerequisite

$(<D) and $(<F)
results/figure1.png: scripts/plot_data.R data/clean_data.csv
	Rscript scripts/plot_data.R
Can be written as
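results/figure1.png: scripts/plot_data.R data/clean_data.csv
	cd $(<D);Rscript $(<F)
(Note that the script now runs from inside scripts/, so relative paths inside it resolve from that directory.)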
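A quick way to check what each automatic variable expands to is a throwaway rule that just echoes them. A sketch assuming a POSIX shell, mktemp, and GNU Make; the placeholder files are empty, and the leading @ stops make from printing each command before running it.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
mkdir scripts data
touch scripts/plot_data.R data/clean_data.csv   # empty placeholder prerequisites

# A rule whose recipe only prints the automatic variables
printf 'results/figure1.png: scripts/plot_data.R data/clean_data.csv\n' > Makefile
printf '\t@echo "target: $@"\n' >> Makefile
printf '\t@echo "first prerequisite: $<"\n' >> Makefile
printf '\t@echo "all prerequisites: $^"\n' >> Makefile
printf '\t@echo "target dir, file: $(@D) $(@F)"\n' >> Makefile
printf '\t@echo "first prerequisite dir, file: $(<D) $(<F)"\n' >> Makefile

out=$(make)   # the target is never created, so the recipe runs every time
printf '%s\n' "$out"
```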
Imagine you have something like this:
results/figure1.png: scripts/figure1.R
	Rscript scripts/figure1.R
results/figure2.png: scripts/figure2.R
	Rscript scripts/figure2.R
results/figure3.png: scripts/figure3.R
	Rscript scripts/figure3.R
results/figure4.png: scripts/figure4.R
	Rscript scripts/figure4.R
results/figure5.png: scripts/figure5.R
	Rscript scripts/figure5.R
...
Writing out this chunk \(n\) times presents opportunity for \(\alpha n\) errors:
results/figure1.png: scripts/figure1.R
	cd $(<D);Rscript $(<F)
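Instead, we use a pattern rule with %, which works much like the * wildcard: it matches any nonempty stem in the target name, and the same stem is substituted into the prerequisites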
results/%.png: scripts/%.R
	cd $(<D);Rscript $(<F)
(I think it’s cool that the workflow induces standardization)
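A pattern rule only says how to build any results/X.png; make still has to be asked for specific files. One common way to do that (not in the original project; it assumes GNU Make, a POSIX shell, and mktemp) is an all target whose prerequisites are computed with GNU Make's wildcard and patsubst functions. In this sketch, touch stands in for cd $(<D);Rscript $(<F) so it runs without R, and the figureN.R files are empty placeholders.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
mkdir scripts results
touch scripts/figure1.R scripts/figure2.R scripts/figure3.R   # empty placeholders

# In printf, %% produces a literal %, so the Makefile gets real % patterns
printf 'FIGURES := $(patsubst scripts/%%.R,results/%%.png,$(sort $(wildcard scripts/*.R)))\n\n' > Makefile
printf '.PHONY: all\n' >> Makefile
printf 'all: $(FIGURES)\n\n' >> Makefile
printf 'results/%%.png: scripts/%%.R\n' >> Makefile
printf '\t@echo "building $@ from $<"\n' >> Makefile
printf '\t@touch $@\n' >> Makefile

make                       # builds all three figures
touch scripts/figure4.R    # add a new figure script...
new_run=$(make)            # ...and only the new figure gets built
printf '%s\n' "$new_run"
```

Adding a script is enough to add a figure: the wildcard picks it up on the next run, and the existing figures are left alone because they are already up to date.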
You can also store options in a variable and include them in your recipes:
R_OPTS=--no-save --no-restore --no-init-file --no-site-file
results/%.png: scripts/%.R
	cd $(<D);Rscript $(R_OPTS) $(<F)
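Because R_OPTS is a make variable, it can also be overridden from the command line without editing the Makefile. A sketch assuming GNU Make, a POSIX shell, and mktemp; make -n only prints the commands, so R does not need to be installed. The placeholder script and the use of R's --vanilla flag for the override are illustrative choices, not part of the original project.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
mkdir scripts results
touch scripts/figure1.R   # empty placeholder script

# The rule from above; %% prints a literal % in printf
printf 'R_OPTS=--no-save --no-restore --no-init-file --no-site-file\n\n' > Makefile
printf 'results/%%.png: scripts/%%.R\n' >> Makefile
printf '\tcd $(<D);Rscript $(R_OPTS) $(<F)\n' >> Makefile

default=$(make -n results/figure1.png)                        # uses R_OPTS from the Makefile
with_vanilla=$(make -n R_OPTS=--vanilla results/figure1.png)  # the command-line value wins
printf '%s\n' "$default" "$with_vanilla"
```

R's --vanilla bundles the four options above plus --no-environ, which makes it a convenient one-word override.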
The repo is here: https://github.com/jcvdav/make_tutorial
draft.Rmd
draft.html
scripts
|_00_clean_data.R
|_01_figure_1.R
|_02_figure_2.R
|_03_regression.R
results
|_img
| |_ first_year.png
| |_ time_sereies.png
|_tab
|_ reg_table.html
make the project
make-ing the project (end-to-end update)
make-ing the project once again (end-to-end update)