Day 2

Quick Comment: Re-compiling Lecture Slides

  • While click-through slides are useful for presentations, they may not be the best for review
  • You can Knit the .Rmd file for the slides into a more convenient format
    • You can turn incremental off
    • You can output it to an md file
    • You can output it as beamer_presentation
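
A minimal sketch of re-knitting a deck for review. The deck written here is a tiny stand-in with hypothetical content and file names; in practice you would point render() at the real slide .Rmd. The rendering steps are skipped if pandoc is not installed.

```r
library(rmarkdown)

# Stand-in slide deck written to a temporary folder (hypothetical content)
deck <- file.path(tempdir(), "slides.Rmd")
writeLines(c(
  "---",
  "title: \"Day 2\"",
  "output:",
  "  ioslides_presentation:",
  "    incremental: true",
  "---",
  "",
  "## Replicability vs. Reproducibility",
  "",
  "- Scientific findings should be replicable",
  "- Statistical findings should be reproducible"
), deck)

if (pandoc_available()) {
  # Same slides, but with incremental bullets turned off
  static_html <- render(deck, ioslides_presentation(incremental = FALSE),
                        output_file = "slides-static.html", quiet = TRUE)
  # One continuous Markdown file, easier to scroll through and search
  review_md <- render(deck, md_document(),
                      output_file = "slides-review.md", quiet = TRUE)
  # A PDF deck is also possible if LaTeX is installed:
  # render(deck, beamer_presentation())
}
```

Passing a format to render() overrides the output setting in the file's header, so the original .Rmd does not need to be edited.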

Quick Comment: Files on GitHub

Log onto your test-assignment GitHub repo

  • Last ditch homework submission:
    • If you have problems pushing your work, upload it to the appropriate repo with the Upload Files button
  • “I don’t want to clone your repository but I want your code”:
    • Click on test-activity.Rmd. GitHub recognizes basic Markdown formatting, but can’t run R code chunks
    • To view .Rmd code: Click the Raw button on right side
    • To download: Right click and select “Save As” and save as .Rmd (not .txt) file.
    • OR, just copy and paste into a new R Markdown file.
  • Don’t edit submitted assignment files after submission!

Replicability vs. Reproducibility

  • Scientific findings should be replicable
    • Ann repeats Bob’s lab experiment and gets different data but makes the same conclusions as Bob
  • Statistical findings should be reproducible
    • Ann takes Bob’s data and gets the exact same statistical results as Bob
  • Statistical findings should be easily reproducible
    • Ann only needs to hit one button to reproduce Bob’s results.
    • Ann only needs to hit one button to reproduce Bob’s analysis on a new data set

Email from your boss:

Hi [Insert your name]
Your report was very interesting, but you used an old version of the data.  
Our newest loan data has 1000 loan cases, not 700 like you used.  

I'm really sorry you did all that work on the wrong dataset, but I expect you will 
easily be able to rerun your analysis on the newest data and get me a new report shortly. 

LOL, 
Barb

Open up RStudio

  • Open your test-assignment RStudio Project
    • Click on the Files tab (lower right window)
    • Check the box by test-assignment.Rmd
    • Select menu More > Copy and rename it Barb.Rmd
  • Make this change to your first R chunk then knit your .Rmd:
> loans <- read.csv("https://raw.githubusercontent.com/mgelman/data/master/CreditData.csv")
  • This loads the correct (1000 case) data set.
  • Do your written answers match the R results?!

Full disclosure to avoid plagiarism charges

  • Some of my ideas and comments in these slides were taken from Karl Broman’s talk on “Steps toward Reproducible Research” that he presented at JSM 2016 in Chicago.
  • And from Yihui Xie’s (creator of knitr) evil classroom assignment idea.

Making your work reproducible

  • You need a scriptable program (e.g. R, Python)
    • forces you to record the linear sequence of events in an analysis
    • should be able to retrace your steps
    • avoid point-n-click!
    • avoid any “by hand” actions (e.g. data cleaning in Excel)

But scriptable doesn’t always mean reproducible

  • You should make your workflow (“soup to nuts”) transparent and easily followed
    • meaningful file and variable names
    • don’t overly complicate code; use packages only when needed (the fewer dependencies the better)
    • only relevant code included
    • written description of your analysis process and results alongside your code

Why should we strive for reproducibility?

  • Karl Broman paraphrasing Mark Holder:
    • Your closest collaborator is you six months ago, but you don’t reply to emails.
    • You need to document your workflow for both yourself and current/future collaborators
  • Can you open one of your old stats homework scripts and understand
    • the data used?
    • the questions you were trying to answer?
    • what your results mean?
  • Statistical Consulting
    • from term to term we can continue, repeat or redo projects!

Reproducibility using R Markdown

  • R Markdown is a literate programming format that integrates R code, results and write-up.
    • literate = code and narrative sit together in one document that is readable and easy to learn
  • There are other programs, like Sweave, that also integrate R code, results and write-up.
    • not so readable and steeper learning curve

R Markdown: how it works

  1. You create a .Rmd file
  2. You click the knit button in RStudio (or run the rmarkdown package command render("my.Rmd")).
  3. A compiled html/pdf/word/… document magically appears
  • Behind the scenes:
    • rmarkdown uses knitr package to create a .md (Markdown) file
    • the document converter pandoc takes the .md file and creates an html/pdf/word… document!
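
The two behind-the-scenes steps can be run by hand. This is a rough sketch, not what render() does line for line: the file names and document content are made up, and the pandoc step only runs if pandoc is installed.

```r
library(knitr)

# Tiny stand-in document (hypothetical content) in a temporary folder
rmd_file <- file.path(tempdir(), "my.Rmd")
md_file  <- file.path(tempdir(), "my.md")
writeLines(c(
  "# Demo",
  "",
  "```{r}",
  "mean(c(2, 4, 6))",
  "```"
), rmd_file)

# Step 1: knitr runs the R chunks and writes code + results to a Markdown file
knit(rmd_file, output = md_file, quiet = TRUE)
md_lines <- readLines(md_file)
cat(md_lines, sep = "\n")

# Step 2: pandoc converts the Markdown file into the final document
if (rmarkdown::pandoc_available()) {
  rmarkdown::pandoc_convert(md_file, to = "html",
                            output = "my.html", wd = tempdir())
}
```

Looking at the intermediate .md file is a handy way to debug a document that knits but does not look right.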

Writing an R Markdown document

1. Formatting text

  • References: Hadley’s page, RStudio page, RStudio cheatsheet
  • Simple rules for
    • section headers (#,##,etc)
    • lists (need ~2 tabs to create sublists)
    • formatting (bold **, italics *)
    • tables
    • R syntax (use backticks `)
    • web links [linked text](url)
    • LaTeX math equations, e.g. $\beta_1 + \beta_2$
    • in HTML docs, you can use HTML commands (in pdf, latex commands)

Writing an R Markdown document

2. R chunks

  • Begin with ```{r} and end with ```
  • Can run code within chunk in the R console
    • as an entire chunk
    • line by line

Writing an R Markdown document

2. R chunk options

  • many options control the output produced/shown by a chunk
```{r, options}
code stuff
```
  • echo=FALSE omits R code but not results
  • include=FALSE omits R code and results
  • eval=FALSE code chunk is not run, but code displayed
  • results='hide' or fig.show='hide' runs code but suppresses output
  • message=FALSE or warning=FALSE suppresses messages/warnings in output
  • error=TRUE knits doc even with a code error
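
One way to see what these options do is to knit the same chunk several times with different options and compare the Markdown that comes out. The helper knit_chunk() below is a hypothetical convenience function written for this sketch, not part of knitr.

```r
library(knitr)

# Hypothetical helper: knit a one-chunk document with the given chunk
# options and return the resulting Markdown as a string
knit_chunk <- function(options) {
  rmd <- c(sprintf("```{r, %s}", options), "x <- 1 + 1", "x", "```")
  knit(text = rmd, quiet = TRUE, envir = new.env())
}

with_echo    <- knit_chunk("echo=TRUE")      # code and result
no_echo      <- knit_chunk("echo=FALSE")     # result only
no_eval      <- knit_chunk("eval=FALSE")     # code only, nothing is run
not_included <- knit_chunk("include=FALSE")  # code is run, nothing is shown

cat(with_echo, no_echo, no_eval, not_included, sep = "\n\n---\n\n")
```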

Writing an R Markdown document

2. R chunk global options

  • Set global document chunk options at the start of your file:
```{r setup, include=FALSE}
knitr::opts_chunk$set(collapse=TRUE, prompt=TRUE, comment=NULL, 
eval=TRUE, message=FALSE, include=TRUE)
```
  • evaluate R chunks and include results, suppress (package) messages
  • prompt=TRUE shows > prompt in front of commands
  • comment=NULL no comment ## in front of output
  • collapse=TRUE compresses code/results in the chunk output

Writing an R Markdown document

2. R chunk names

  • Name your code chunks:
```{r ChunkName1} 
my fancy code chunk
```
  • can see in drop-down menu (lower left)
  • can reuse named chunks by just referencing chunk name
```{r ChunkName1} 
```

Writing an R Markdown document

2. Inline code

  • You can embed one R code command in your text by wrapping it in `r followed by `
    • .Rmd file: The square root of 77 is `r sqrt(77)`
    • .md file: The square root of 77 is 8.7749644
  • Steps towards reproducibility
    • Use inline commands when reporting results/stats in case your data changes…
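
A small sketch of why inline code helps when the data changes. The loans data frame here is made-up stand-in data, not the real loan file; the point is that the sentence is filled in from whatever data the document loads, so nothing has to be retyped by hand.

```r
library(knitr)

# A two-part document: a hidden chunk that loads (stand-in) data,
# then a sentence whose numbers come from inline R code
report <- c(
  "```{r, include=FALSE}",
  "loans <- data.frame(amount = c(1000, 2000, 3000))  # made-up data",
  "```",
  "Our loan data has `r nrow(loans)` cases."
)

knitted <- knit(text = report, quiet = TRUE, envir = new.env())
cat(knitted)
```

If the data frame gains rows, re-knitting updates the sentence automatically.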

Writing an R Markdown document

3. Header

  • The YAML header is always positioned at the top of a .Rmd, surrounded by ---
  • Basic elements are title, author and date.

Writing an R Markdown document

3. Header

  • output tells pandoc what form the final document should take
    • html_document (can’t view in GitHub)
    • pdf_document (need MiKTeX or MacTeX installed)
    • github_document (creates a .md Markdown doc)
    • ioslides_presentation, beamer_presentation
  • each output type can be further refined with formatting options
    • e.g. for these slides I specify widescreen: true, incremental: true, and keep_md: false
    • for documents, you might use fig_caption: true
  • for help: in RStudio select Help > Cheatsheets > R Markdown Reference Guide
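
For illustration, here is what a header with these pieces might look like, written to a temporary file and read back with rmarkdown's yaml_front_matter(). The title, author and date values are placeholders, and the file name is hypothetical.

```r
library(rmarkdown)

# Write a small .Rmd whose YAML header sets title, author, date and
# an output format with a few formatting options (placeholder values)
rmd_file <- file.path(tempdir(), "header-demo.Rmd")
writeLines(c(
  "---",
  "title: \"Day 2\"",
  "author: \"Your Name\"",
  "date: \"today\"",
  "output:",
  "  ioslides_presentation:",
  "    widescreen: true",
  "    incremental: true",
  "    keep_md: false",
  "---",
  "",
  "Body text goes here."
), rmd_file)

# Read the header back as an R list to check how it was parsed
header <- yaml_front_matter(rmd_file)
str(header)
```

Indentation matters in YAML: the formatting options must sit one level below the output type they belong to.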

Final comments

  • Start working towards writing reproducible analyses
    • doesn’t happen overnight
    • using R Markdown is a good start!
    • I don’t expect you to use inline code for all your assignments