1 Setup & intro
1 Bootcamp Introduction (first day with R
)
This page is intended to guide people during their first installation and use of R and RStudio.
1.1 Objectives
Here is what we will work on:
How the
R
Stats Bootcamp worksR motivation
Install
R
andRStudio
or set upRStudio Cloud
RStudio
components and setupWorkflow for scripts in
R
Practice exercises
2 How the R
Stats Bootcamp works
The R
Stats Bootcamp aims to provide practical, open instructional materials to support learning the R
programming language, to review simple statistics in R
, and to introduce reproducibility and collaboration tools. The content is a blend of practical, referenced material with videos and self-assessment.
We also a have a friendly community Slack channel - go over and introduce yourself and say hi!
3 R
motivation
The motivation for using R
is that it is designed to help people with no programming experience to perform sophisticated statistical analysis with minimum effort. R
has grown in popularity recently and is used extensively by universities, companies, and researchers everywhere. Because of this, there is a very large community of users and a demand in business and academia for skills using R
.
R
is free and open source. R
is easy to learn and works the same for folks with fast and slow computers, no matter what kind of operating system or computer they like to use, and it is easy to use via the web on any device.
4 Install R
and RStudio
or set up RStudio Cloud
You have two options for following along with these materials as they are intended.
Option 1 Download and install R from CRAN and then download and install RStudio desktop.
Install R
first, then RStudio
. It is probably a good idea to go ahead and install the latest version of each if you have older versions installed. If you have a PC or laptop you regularly use, this option is probably best and will work for almost all hardware and operating systems.
Option 2 If you can’t install R
or do not wish to, or if you prefer to work in “the cloud” from a web browser, you may wish to start a free account at RStudio Cloud and follow along that way.
5 RStudio
components and setup
RStudio desktop is an environment to write R
code, perform statistical analysis, organize big or small projects with multiple files, and view and organize outputs. There are many features of RStudio
, but we are only going to point out a few. One of the most useful features is syntax highlighting
, that gives visual cues to help you write computer code.
Be aware (beware?) of:
The Script window
The script window is located in the upper left of the RStudio
console by default. You may need to open a script or start a new one: File > New File > R Script (hotkey Ctrl+Shift+N
).
The script window is where you are likely to spend most of your time building scripts and executing commands you write. You can have many scripts open at the same time (in “tabs”), and you can have different kinds of scripts, e.g., for different parts of a project or even for programming languages.
The Console window
The Console window is in the lower left by default. Notice there are several other tabs visible, but we will only mention the Console for now. The Console is the place where text outputs will be printed (e.g. the results of statistical tests), and also is a place where R will print Warning and Error messages.
The Global Environment
The Global Environment is in the Environment tab in the upper right of RStudio by default. This pane is useful in displaying data objects that you have loaded and available.
The Plots window
The Plots window is a tab in the lower right by default. This is the place where graphics output is displayed and where plots can be named, resized, copied and saved. There are some other important tabs here as well, which you can also explore. When a new plot is produced, the Plots tab will become active.
6 Workflow for R scripts
Script setup
An R
script is a plain text file where the file name ends in “dot R” (.R
) by default.
An R
script serves several purposes:
First, it documents your analysis allowing it to be reproduced exactly by yourself (your future self!) or by others like collaborators, friends, colleagues, your professor, your student, etc.
Second, it is the interface between your commands and R software.
A goal is that your scripts should contain only important R
commands and information, in an organized and logical way that has meaning for other people, maybe for people you have never spoken to. A typical way to achieve this is to organize every script according to the same plan.
Your
R
script should be a file good enough to show to a person in the future (like a supervisor, or even your future self). Someone who can help you, but also someone who you may not be able to explain the contents to. The script should be documented and complete. Think of this future person as a friend you respect.
Although there are many ways to achieve this, for the purposes of the Bootcamp we strongly encourage you to organize you scripts like this:
Header
Contents
One separate section for each item of contents
Header
Start every script with a Header, that contains your name, the date of the most recent edit, and a short description of the PURPOSE of the script.
# A typical script Header
## HEADER ####
## Who: <your name>
## What: My first script
## Last edited: yyyy-mm-dd (ISO 8601 date format... Google it!)
####
6.1 Contents
A Contents section should also be present near the top, to provide a road map for the analysis.
# A typical script Contents section
## CONTENTS ####
## 00 Setup
## 01 Graphs
## 02 Analysis
## 03 Etc
6.2 Section for each item of contents
Finally, code chunk breaks should be used to aid the readability of the script and to provide a section for each item in your table of contents. A code chunk is just a section of code set off from other sections.
Below is the beginning of a typical code chunk in an R script.
Code chunks must start with at least one hash sign “#”,
should have a title descriptive of code chunk contents,
and end with (at least) four hash signs “####”
consecutively numbered titles makes things very tidy
## 01 This here is the first line of MY CODE CHUNK ####
We will practice each of these components.
6.4 Submitting commands
A final thing that must be mentioned here is how to actually submit commands in your R script for R to execute. There are a few ways to do this.
run the whole line of code your cursor rests on (no selection)
Ctrl+Enter
(Cmd+Return in Macs)run code you have selected with your cursor
Ctrl+Enter
(Cmd+Return in Macs).Use the “Run” button along the top of the Script window
Run code from the menu
Code > Run Selected Line(s)
.
7 Practice exercises
7.1
Download this script and open it with RStudio. Save the script in a specific folder on your computer that you can find again and where you will save other scripts for the Bootcamp.
Read the script comments and examine the structure of the code chunks. Run the code in the script using one of the methods above, and examine the output in the Console window.
7.2
Add a code chunk title to your CONTENTS section and to your script. Make sure to write brief comments for your code. Add the following code to your chunk run it and examine the output:
Don’t worry about understanding the code for now. We are just working on interfacing with R
and submitting commands.
# Create a new variable
<- c(6.5, 1.35, 3.5)
my_variable
# Calculate the mean of my_variable
mean(my_variable)
# Calculate the standard deviation of my_variable
sd(my_variable)
Harper Adams Data Science
This website supports students and staff at Harper Adams University and the MSc in Data Science for Global Agriculture, Food, and Environment led by Ed Harris.
6.3 Comments
Comments are messages that explain code in your script, and they should be used throughout every script. You can think of comments like the methods section of a scientific paper - there should be enough detail to exactly replicate and understand the script, but it should also be concise.
Comment lines begin with the
#
character and are not treated as “code” byR
.