Nov 1st, 2017

Outline

  • What and Why
  • Getting started
    • Syntax and basic elements
  • Hands-on
    • Using code to produce output
  • Show and tell

What and why?

What is this?

  • Markdown is:
    • a lightweight markup language
    • easy-to-read, easy-to-write plain text that gets converted into HTML
  • R Markdown is:
    • The above + R code embedded to create graphs, tables, values… any type of R output
  • knitr, which is an R package, transforms *.Rmd into *.md, which is then processed by pandoc
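
The two steps above can be sketched in R. This is a minimal sketch, assuming the knitr package is installed; the three-line document and the temporary file names are made up for illustration.

```r
library(knitr)

# Hypothetical R Markdown source, written to a temporary file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "# Cars",
  "",
  "There were `r nrow(mtcars)` cars studied."
), rmd_file)

# Step 1: knitr runs the R code and writes plain Markdown
md_file <- knit(rmd_file, output = tempfile(fileext = ".md"), quiet = TRUE)
md_lines <- readLines(md_file)
print(md_lines)

# Step 2 (not run here): pandoc converts the Markdown to HTML, PDF, or Word.
# rmarkdown::render() performs both steps in one call.
```

In practice, the Knit button in RStudio runs both steps for you.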

Keep in mind

"But remember, Markdown was designed for HTML, and LaTeX was for PDF and related output formats."

Yihui Xie

Why?

  • It is great for research
    • Keeps everything together (analysis, references, text, data…)
    • Easier to iterate over your work (e.g. easier to incorporate suggestions by Reviewer #2)
  • It fosters reproducibility

Getting started (just text)

New Rmd document and types of output

  • Slides
    • ioslides
    • slidy presentations
    • beamer*
  • HTML
    • HTML notebook
  • PDF
  • Word
  • Shiny (interactive documents)
    • Document
    • Presentation

YAML

  • Set of key: value pairs
  • Contains variables/parameters passed to knitr and pandoc to control output
  • The default one looks like this:
---
title: "Introduction to R Markdown"
author: "Villaseñor-Derbez, J.C."
date: "November 1, 2017"
output: ioslides_presentation
---

YAML

  • You can also specify many other things:
    • fig_caption: yes (nested under the output format) adds figure captions from chunks
    • toc: yes (nested under the output format) adds a table of contents
    • subtitle: "Fancy subtitle" adds a subtitle
    • bibliography: references.bib specifies where to look for BibTeX entries
    • csl: plos.csl specifies citation format (in this case, PLoS ONE's)

See options for HTML and options for PDF output

Basic syntax

  • Use # to indicate headers
  • *single asterisks* give me italics
  • **double asterisks** give me bold
  • m s^-2^ is then m s⁻² (superscript)
  • You can write LaTeX equations by wrapping an expression in $:
    • $E = mc^2$ is \(E = mc^2\)
  • Write Greek letters with $\letter$:
    • $\beta$ gives you \(\beta\)
  • Same for fancier equations:
    • $$\hat{Y} = \sum_{i = 1}^N\frac{\beta_a^4}{\phi \times \Omega} + \beta_b$$ is just: \[\hat{Y} = \sum_{i = 1}^N\frac{\beta_a^4}{\phi \times \Omega} + \beta_b\]

Basic syntax: bullets

  • Bullets can be specified by -, +, or *

This:

* Item 1
* Item 2
    + Item 2a
    + Item 2b

Gives:

  • Item 1
  • Item 2
    • Item 2a
    • Item 2b

Basic syntax: numbered bullets

This:

1. Item 1
2. Item 2
3. Item 3
    + Item 3a
    + Item 3b

Gives:

  1. Item 1
  2. Item 2
  3. Item 3
    • Item 3a
    • Item 3b

Links and Images

Hands-on

Code chunks and in-line code

  • In-line code: There were `r nrow(mtcars)` cars studied
  • Gives: There were 32 cars studied

  • Code chunks isolate code from plain text
  • They allow you to execute R, Python, Rcpp, SQL, and Stan within your file
  • Use Ctrl + Alt + I (Windows) or Cmd + Option + I (Mac) to insert one
  • Always name your chunks!

References

You need:

  • A BibTeX file (provided in the CourseMaterials folder)
  • bibliography: and csl: entries in the header
  • That's it…
---
bibliography: references.bib
csl: Citation_styles/ieee.csl
---

"A tidy dataset has one column per variable" [@wickham_2014]

"A tidy dataset has one column per variable" [1]

References

[1] H. Wickham, “Tidy data,” J Stat Softw, vol. 59, no. 10, 2014.

Load packages

suppressPackageStartupMessages({
  library(stargazer)
  library(knitr)
  library(kableExtra)
})

Code chunk options

  • echo show the code in the output? (echo = FALSE hides it)
  • eval evaluate the chunk?
  • fig.width figure width, in inches
  • fig.height figure height, in inches
  • fig.cap figure caption
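
The options above can be set per chunk in the chunk header, or once for the whole document. A minimal sketch of the document-wide route, assuming knitr is installed; the particular values (6 by 4 inches, echo off) are illustrative, not recommendations from the slides.

```r
library(knitr)

# Set document-wide defaults; individual chunks can still override them
opts_chunk$set(
  echo = FALSE,    # hide the source code, keep the results
  eval = TRUE,     # run the code
  fig.width = 6,   # figure width in inches (illustrative value)
  fig.height = 4   # figure height in inches (illustrative value)
)

# A per-chunk override goes in the chunk header, for example:
#   {r model-summary, echo = TRUE}
print(opts_chunk$get("echo"))
```

This call usually lives in a first "setup" chunk, so every later chunk inherits the defaults.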

echo = TRUE

model <- lm(mpg ~ disp, data = mtcars) # Fit a linear model
summary(model) # Look at the summary of the model
## 
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8922 -2.2022 -0.9631  1.6272  7.2305 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
## disp        -0.041215   0.004712  -8.747 9.38e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.251 on 30 degrees of freedom
## Multiple R-squared:  0.7183, Adjusted R-squared:  0.709 
## F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10

echo = FALSE

## 
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8922 -2.2022 -0.9631  1.6272  7.2305 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
## disp        -0.041215   0.004712  -8.747 9.38e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.251 on 30 degrees of freedom
## Multiple R-squared:  0.7183, Adjusted R-squared:  0.709 
## F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10
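
Printing the whole summary is rarely what you want in a manuscript. A minimal sketch of pulling single values out of the model so they can be reported with in-line code; the object names slope and r_squared are illustrative choices, not part of the original slides.

```r
# Fit the same model as above
model <- lm(mpg ~ disp, data = mtcars)

# Extract single values and round them for reporting
slope <- round(coef(model)[["disp"]], 3)
r_squared <- round(summary(model)$r.squared, 3)

# In the text of the document these could be used as in-line code, e.g.
# "mpg changed by `r slope` per cubic inch (R2 = `r r_squared`)"
print(c(slope = slope, r_squared = r_squared))
```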

Fancier way to report models with stargazer

stargazer::stargazer(model, type = "html") # Create a regression table of the model

                    | Dependent variable: mpg
------------------- | -----------------------
disp                | -0.041***
                    | (0.005)
Constant            | 29.600***
                    | (1.230)
Observations        | 32
R²                  | 0.718
Adjusted R²         | 0.709
Residual Std. Error | 3.251 (df = 30)
F Statistic         | 76.513*** (df = 1; 30)

Note: *p<0.1; **p<0.05; ***p<0.01

Fancier way to report models with stargazer

# Create a customized regression table of the model
stargazer::stargazer(model,
                     title = "Results of regressing miles per gallon on displacement",
                     type = "html",
                     single.row = TRUE,
                     covariate.labels = "Displacement (cu. in.)",
                     omit.stat = "adj.rsq")

Results of regressing miles per gallon on displacement

                       | Dependent variable: mpg
---------------------- | -----------------------
Displacement (cu. in.) | -0.041*** (0.005)
Constant               | 29.600*** (1.230)
Observations           | 32
R²                     | 0.718
Residual Std. Error    | 3.251 (df = 30)
F Statistic            | 76.513*** (df = 1; 30)

Note: *p<0.1; **p<0.05; ***p<0.01

Figures with code

plot(mtcars$disp, mtcars$mpg, col = mtcars$cyl, pch = 20, xlab = "Disp (cu. in.)", ylab = "mpg") # Plot mpg against displacement, colored by number of cylinders

This is my caption
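
The scatterplot shows the raw data only. A minimal sketch of overlaying the fitted line from the linear model, using base R graphics; the dashed line type is an arbitrary choice.

```r
# Fit the same model as in the earlier slides
model <- lm(mpg ~ disp, data = mtcars)

# Scatterplot of the data, colored by number of cylinders
plot(mtcars$disp, mtcars$mpg,
     col = mtcars$cyl, pch = 20,
     xlab = "Disp (cu. in.)", ylab = "mpg")

# Overlay the fitted regression line (dashed)
abline(model, lty = 2)
```

In a chunk, fig.cap, fig.width, and fig.height would control the caption and size of this figure.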

Tables

  • You can create raw markdown tables:

You type this:

First Header  | Second Header
------------- | -------------
Content Cell  | Content Cell
Content Cell  | Content Cell

It looks like this:

First Header    Second Header
Content Cell    Content Cell
Content Cell    Content Cell

Tables

  • Raw tables are not great for summarizing data; instead, use knitr::kable()
  • We will use the taxa_table.csv file in the CourseMaterials folder

Tables

taxa_table <- read.csv("Data/taxa_table.csv") #Load the data

knitr::kable(taxa_table) # Create a table of the data

Family        | Genus       | Species     | Count | Surfers
------------- | ----------- | ----------- | ----- | -------
Pomacentridae | Hypsypops   | rubicundus  | 1     | 20
Polyprionidae | Stereolepis | gigas       | 2     | 18
Scorpaenidae  | Sebastes    | hopkinsi    | 30    | 10
Scorpaenidae  | Sebastes    | paucispinis | 45    | 1
Scorpaenidae  | Pterois     | volitans    | 62    | 2
Serranidae    | Paralabrax  | clathratus  | 12    | 8
Serranidae    | Paralabrax  | nebulifer   | 15    | 12
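
If you do not have the CSV file at hand, the same table can be typed in directly. A minimal sketch, assuming knitr is installed; the values are copied from the table above, and building the data frame by hand replaces the read.csv() step.

```r
library(knitr)

# Same data as taxa_table.csv, entered by hand
taxa_table <- data.frame(
  Family  = c("Pomacentridae", "Polyprionidae", "Scorpaenidae", "Scorpaenidae",
              "Scorpaenidae", "Serranidae", "Serranidae"),
  Genus   = c("Hypsypops", "Stereolepis", "Sebastes", "Sebastes",
              "Pterois", "Paralabrax", "Paralabrax"),
  Species = c("rubicundus", "gigas", "hopkinsi", "paucispinis",
              "volitans", "clathratus", "nebulifer"),
  Count   = c(1, 2, 30, 45, 62, 12, 15),
  Surfers = c(20, 18, 10, 1, 2, 8, 12)
)

# Create a table of the data
kable(taxa_table)
```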

California's State Fish

Juvenile Garibaldi (Hypsypops rubicundus)

  • October 15, 1995, Adopted as the California State Marine Fish

Fancier tables with kableExtra

But scientific names (Genus species) are supposed to be in italics!

  • Specify format by columns
knitr::kable(taxa_table, format = "html") %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::column_spec(column = 2, italic = TRUE) %>% # Specify column styles
  kableExtra::column_spec(column = 3, italic = TRUE) %>% 
  kableExtra::column_spec(column = 4, bold = TRUE)

Family        | Genus         | Species       | Count  | Surfers
------------- | ------------- | ------------- | ------ | -------
Pomacentridae | *Hypsypops*   | *rubicundus*  | **1**  | 20
Polyprionidae | *Stereolepis* | *gigas*       | **2**  | 18
Scorpaenidae  | *Sebastes*    | *hopkinsi*    | **30** | 10
Scorpaenidae  | *Sebastes*    | *paucispinis* | **45** | 1
Scorpaenidae  | *Pterois*     | *volitans*    | **62** | 2
Serranidae    | *Paralabrax*  | *clathratus*  | **12** | 8
Serranidae    | *Paralabrax*  | *nebulifer*   | **15** | 12
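
A different route to italic scientific names, not the kableExtra approach shown above: wrap the names in Markdown emphasis before building the table. This is a minimal sketch that assumes the default Markdown table output of knitr::kable() (it will not italicize with format = "html"); the two-row data frame and the Scientific_name column are made up for illustration.

```r
library(knitr)

# Small illustrative subset of the taxa table
taxa <- data.frame(
  Genus   = c("Hypsypops", "Stereolepis"),
  Species = c("rubicundus", "gigas"),
  Count   = c(1, 2)
)

# Combine genus and species into one column wrapped in Markdown emphasis
taxa$Scientific_name <- paste0("*", taxa$Genus, " ", taxa$Species, "*")

kable(taxa[, c("Scientific_name", "Count")])
```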

Fancier tables with kableExtra

  • Collapse rows
knitr::kable(taxa_table, format = "html") %>% 
  kableExtra::kable_styling() %>% 
  kableExtra::column_spec(column = 2, italic = TRUE) %>% # Specify column styles
  kableExtra::column_spec(column = 3, italic = TRUE) %>% 
  kableExtra::column_spec(column = 4, bold = TRUE) %>% 
  kableExtra::collapse_rows(columns = c(1, 2)) # Collapse repeated values in columns 1 and 2

Family        | Genus         | Species       | Count  | Surfers
------------- | ------------- | ------------- | ------ | -------
Pomacentridae | *Hypsypops*   | *rubicundus*  | **1**  | 20
Polyprionidae | *Stereolepis* | *gigas*       | **2**  | 18
Scorpaenidae  | *Sebastes*    | *hopkinsi*    | **30** | 10
              |               | *paucispinis* | **45** | 1
              | *Pterois*     | *volitans*    | **62** | 2
Serranidae    | *Paralabrax*  | *clathratus*  | **12** | 8
              |               | *nebulifer*   | **15** | 12

If there is time

Should be less than ~7 minutes

  • Create an html document:
    • Title
    • Subtitle
    • Author
    • Headers and subheaders
    • A list with bullets
    • Text with bold and italics
    • 1 plot (nothing meaningful, invent something or use existing data)
    • 1 knitr + kableExtra table
  • Extra:
    • Include an equation
    • Code doesn't show
    • Include at least 1 reference from the BibTeX file

Resources

Show and tell

  • <div id="refs"></div> to specify placement of references (e.g. when you are writing a manuscript)
  • To add line numbers and double-spacing to your manuscript (PDF output only, since these are LaTeX packages), add to the YAML:
header-includes:
- \usepackage{setspace}
- \doublespacing
- \usepackage{lineno}
- \linenumbers