2 Using R Markdown for reproducible research
- Opening and saving an R Notebook
- Basic layout in R Markdown
2.1 Reproducibility in research
Usually, analysing data and generating the report are two separate tasks. First, you analyse your data (hopefully in R), and then you describe your methods, results and conclusions in a text document. However, this procedure is error-prone and not reproducible. You have to copy-paste results from R into tables or include figures in your word processor. The connection between the analysis code, the results and the report is lost.
Donald Knuth, the creator of TeX, suggested the idea of literate programming, where analysis code and the report are combined in one document (Knuth 1984). This kind of document is human-centred and makes the analysis easier to understand. It also helps to make data analyses completely reproducible.
2.2 Combining code and report in one document
We will use R Markdown to combine analysis code and report in one reproducible document. In general, R Markdown can produce different output documents (HTML, Word, PDF, or slides). However, in this course we will concentrate on HTML output and mostly use so-called R Notebooks.
2.2.1 Create a new R Notebook
To create a new R Notebook, click on the little green plus or on File in the upper left-hand corner and select R Notebook, as in the image below. Save your notebook in the subfolder notebooks.
In contrast to a new R script, a new notebook has some template text and example R code in grey boxes called chunks. Have a look at this template text. It provides basic examples of layout and R code chunks.
2.2.2 Customize the header
Every R Markdown document starts with a header. It is enclosed between two lines of --- signs. Inside the header, you find some (blue) keywords like title: and output:. Let's customize the header for our needs:
- Change the title at the top to "Getting to know R Notebooks". Be sure to keep the quotation marks.
- Add an author line and put your name there in quotation marks.
- Additionally, you might want to add the date. The syntax is date: "some date here".
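As a rough sketch, the customized header could look like the lines printed by the R code below. The author name and the date are placeholders, and output: html_notebook is the entry that a new R Notebook template normally contains; your own header may have different entries.

```r
# Build the header line by line; author and date are placeholders
header <- c(
  "---",
  'title: "Getting to know R Notebooks"',
  'author: "Your Name"',
  'date: "some date here"',
  "output: html_notebook",
  "---"
)

# Print the header as it would appear at the top of the notebook
cat(header, sep = "\n")
```

The code only prints the text; in your notebook you type these lines directly at the very top of the file.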
2.2.3 Structure your notebook
Structuring your notebook with headers and subheaders helps to organize content and ideas. To add a header, put # followed by the header title. Be sure to include a space between # and the text! A subheader is produced with ## and a subsubheader with ###.
Delete the template text and structure your notebook. Your final result should look something like this:
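To illustrate the header syntax, the following sketch assembles a few header lines in R and prints them. make_header() is a hypothetical helper written only for this example and is not part of R Markdown; the exercise titles are made up.

```r
# Hypothetical helper: repeat "#" `level` times, add a space, then the title
make_header <- function(level, title) {
  paste(strrep("#", level), title)
}

outline <- c(
  make_header(1, "Exercises"),
  make_header(2, "Exercise 1"),
  make_header(3, "Part a")
)

# Each printed line is what you would type into the text part of the notebook
cat(outline, sep = "\n")
```

Note that paste() puts the space between the # signs and the title, which is exactly the space that R Markdown needs to recognize a header.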
2.2.4 Preview
Notebooks have the great advantage of offering a preview of your work. Just click the Preview button. The preview is refreshed every time you save your notebook.
Inspect the preview of your notebook to see how your formatting with headers and subheaders affects the output. There are more layout elements, and you will experiment with them in the exercises.
2.2.5 Other output options
You can also produce different outputs from your R Notebook because it is a normal R Markdown file and supports different output formats. However, if you produce an .html output, the Preview button will disappear! To bring it back, change the output entry in the header of your R Notebook file back to output: html_notebook.
Note that there is now an R Notebook file (.Rmd) and an HTML file (.nb.html) in the notebooks folder.
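The relation between the two file names can be sketched in R. The file name my_notebook.Rmd is made up for illustration; the preview file carries the same base name with the ending .nb.html.

```r
# Hypothetical notebook file name
notebook_file <- "my_notebook.Rmd"

# Replace the .Rmd ending with .nb.html to get the name of the preview file
preview_file <- sub("\\.Rmd$", ".nb.html", notebook_file)

preview_file
```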
2.3 Entering and running commands
In contrast to text, headings etc., R code is typed in special boxes called chunks. To create an empty chunk for your code, click the little green C at the top or press Ctrl + Alt + I. On some keyboards the Control key is labelled Str; on a Mac, use the Command key instead.
Using your first code chunk, type the following command to create a new variable called x with the value of 42.
x <- 42
Remember that the arrow <- is the assignment operator. It generates the object x (or overwrites it, if it already exists) and assigns it the value of 42.
Note the direction of the arrow! It points from the value to the object name.
To run this command in your console, you can either:
- click on the green triangle at the right of the code chunk or
- highlight the code in the chunk and hit Ctrl + Enter (as in an R script; the Control key is labelled Str on some keyboards).
Note that you now have a new object in your workspace, called x!
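The behaviour of the assignment operator described above can be tried out in a separate chunk. This is a minimal sketch; the object name answer is chosen only for this example so that x stays untouched.

```r
# Use a separate name so that x from above keeps its value
answer <- 42
answer

# Assigning again overwrites the old value without any warning
answer <- 43
answer

# exists() checks whether an object with this name is in the workspace
exists("answer")
```

After running the chunk, answer holds 43, and it is listed in the Environment pane next to x.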
2.4 A brief recap of data types
You have created a numeric variable x. However, you are not restricted to numbers. R can also handle other types of objects, like characters, for example. To tell R that you want to generate a variable containing characters rather than numbers, you need to enclose the assigned content in quotes.
Create the following chunk in your notebook and let it run.
day_of_week <- "Sunday"
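The effect of the quotes can be checked with class(), which reports the type of an object. The object names below are made up for this sketch.

```r
number_example <- 42      # no quotes: stored as a number
text_example <- "42"      # quotes: stored as characters

class(number_example)     # "numeric"
class(text_example)       # "character"

# Arithmetic works on numbers only
number_example + 1        # 43
# text_example + 1 would stop with an error, because "42" is text
```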
To generate a more complicated object, namely a numeric vector, we use the command c() to concatenate several numbers.
v <- c(4.5, 6.234, 10)
Note in the Environment pane that your vector v contains numbers (listed as num). The information [1:3] shows you that your vector has three elements, indexed from 1 to 3. Indices indicate the place of an element in the vector. To access and change a particular element, we use its index like
v[2] <- 4.5
Now the second element of your vector equals 4.5. Remember that R will not warn you when changing your objects!
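Reading and overwriting elements by index can be sketched with a small example vector. The vector w is hypothetical and separate from v, so your v is not changed by running it.

```r
# Hypothetical example vector, separate from v
w <- c(10, 20, 30)

w[1]          # read the first element: 10
length(w)     # number of elements: 3

w[3] <- 99    # overwrite the third element, no warning is given
w             # the elements are now 10, 20 and 99
```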
You can calculate with objects just as you can with numbers. Let's divide every single element of v by 2.
v / 2
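As a small sketch of what happens here: the division is applied to each element separately, and the vector itself stays as it was unless the result is assigned. The names v_copy and halves are made up for this example.

```r
# Same values as v after its second element was changed to 4.5
v_copy <- c(4.5, 4.5, 10)

# Each element is divided by 2; the result is stored in a new object
halves <- v_copy / 2
halves      # the elements are 2.25, 2.25 and 5

# The original vector is unchanged because the result was not assigned back
v_copy      # still 4.5, 4.5 and 10
```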
2.5 Practice on your own!
When you work on your exercises, please structure your R Notebook, e.g. with headers and subheaders for each exercise. Some of the exercises require both code and explanation!
Remember to save your work as you go along! Click the save button in the upper left hand corner of the R Markdown window.
- Answer the following with code in a code chunk (no text necessary). Remember that the code is just instructions for R. You need to run the code chunk to make R execute those instructions!
  - Create a variable called y with the value of 13.
  - Multiply x by y, and store the answer in a variable named z like so: z <- x * y
  - You should now see day_of_week, x, v, y, and z all in your Environment pane.
- Run the following mathematical operation in a code chunk: 6 + 3
  - Where does the answer appear?
- Now add a code chunk, and save the results of 6 + 3 as a variable called a.
  - Does the answer appear?
  - Where does the object a show up?
  - Next type a into the code chunk and re-run the code chunk. What happens?
- Run the following command in a new code chunk: a^2
  - What does the ^ operator do?
- Type the following command into a new code chunk: sum(a, x, y)
  - sum is a function. Based on the output, what do you think the sum function does?
- Click the little broom icon in the upper right-hand corner of the Environment pane. Click yes on the window that opens.
  - What happened?
- Go to the Run button at the top right of the R Markdown pane, and choose Run All (the last option).
  - What happened?
- Recall the vector v we created earlier. Copy, paste and run the following in a code chunk. What does this code accomplish?
  v + 2
- Copy, paste, and run the following code to make a vector called music that contains music genres. Recall that a vector is a data object that has multiple elements of the same type. Here the data type is character. Look in the Environment pane. How does R tell us that this vector contains characters, not numbers?
  music <- c("bluegrass", "funk", "folk")
- Now let's practice some basic formatting. Using this formatting tips page, figure out how to put the following into your lab report. These can all be typed into the white section, where text goes. Hint: To put each of these on its own line, hit a hard return after each line of text.
  Italicize like this
  Bold like this
  A superscript: R²