This tutorial is part of a series designed to accompany a course using The Analysis of Biological Data. The rest of the tutorials can be found here.
Learn how to start with R and RStudio
Use the command line
Use functions in R
Data, R scripts, and other resources for these tutorials can be downloaded from here as a .zip file. Please open the ABDLabs folder created by the .zip file in a location on your computer that you can come back to use repeatedly.
R is a computer program that allows an extraordinary range of statistical calculations. It is a free program, mainly written by voluntary contributions from statisticians around the world. R is available on most operating systems, including Windows, Mac OS, and Linux.
R can make graphics and do statistical calculations. It is also a full-fledged computing language. In this series of tutorials, we will only scratch the surface of what R can do.
RStudio is a separate program, also free, that provides a more elegant front end for R. RStudio allows you to easily organize separate windows for R commands, graphics, help, etc. in one place.
If your computer does not already have a version of R and RStudio installed, look at the instructions about getting set up.
If you haven’t done so already, download the folder called ABDLabs. Inside this folder are all the data sets you will need for these tutorials. Also, there is a file in that folder called ABDLabs.Rproj. Double-click on this file to start R and RStudio. If you start R from this file, it will automatically load some packages that will add some useful functionality, and it will tell R to look for files inside the ABDLabs folder. Both of these will let you skip some steps later.
You can also start R and RStudio directly by opening the RStudio application from its icon.
When you start RStudio, it will automatically start R as well. You run R inside RStudio.
After you have started RStudio, you should see a new window with a menu bar at the top and three main sections. One of the sections is called the “Console” – this is where you type commands to give instructions to R and typically where you see R’s answers to you.
Another important corner of this window can show a variety of information. Most importantly to us, this is where graphics will appear, under the tab marked “Plots”.
When you start RStudio, you’ll see a corner of the window called the “Console.” By default, the Console is in the bottom left of the RStudio screen.
You can type commands in this window where there is a prompt (which will look like a > sign at the bottom of the window). The Console has to be the selected window. (Clicking anywhere in the Console selects it.)
The > prompt is R’s way of inviting you to give it instructions. You communicate with R by typing commands after the > prompt.
Type “2+2” at the > prompt, and hit return. You’ll see that R can work like a calculator (among its many other powers). It will give you the answer, 4, and it will label that answer with [1] to indicate that it is the first element in the answer. (This is sort of annoying when the answers are simple like this, but can be very valuable when the answers become more complex.)
In these tutorials, the input will show up in a gray box and the output, if any, will follow in a white box.
2+2
## [1] 4
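To see why the [1] label can be valuable, here is a small sketch with a longer answer. The range 101 to 130 is an arbitrary choice for illustration, and the colon is R's shorthand for "all whole numbers from ... to ...". When the answer wraps over several lines, each line of output starts with the position of its first element in square brackets. Where the lines break depends on how wide your Console is.

```r
# Ask for the whole numbers from 101 to 130 (an arbitrary example).
# The answer has 30 elements, so it may wrap over more than one line;
# each printed line begins with the index of its first element.
101:130
```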
You can use a wide variety of math functions to make calculations here, e.g., log() calculates the log of a number:
log(42)
## [1] 3.73767
(By default, this gives the natural log with base e.)
Parentheses are used both as a way to group elements of the calculation and also as a way to denote the arguments of functions. (The “arguments” of a function are the set of values given to it as input.) For example, log(3) is applying the function log() to the argument 3.
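The two uses of parentheses can be compared directly. This is a minimal sketch with made-up numbers, not part of the course scripts:

```r
# Parentheses for grouping: the sum is calculated first, then multiplied.
(2 + 3) * 4
## [1] 20

# Without the parentheses, multiplication happens before addition.
2 + 3 * 4
## [1] 14

# Parentheses for arguments: sqrt() is applied to the result of 9 + 16.
sqrt(9 + 16)
## [1] 5
```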
Another mathematical function that often comes in handy is the square root function, sqrt(). For example, the square root of 4 is:
sqrt(4)
## [1] 2
To calculate a value with an exponent, use the ^ symbol. For example, \(4^3\) is written as:
4^3
## [1] 64
Of course, many math functions can be combined to give an almost infinite possibility of mathematical expressions. For example,
\[\frac{1}{\sqrt{2\pi (3.1)^2}} e^{-\frac{(12-10.7)^2}{2(3.1) }}\]
can be calculated with
(1/(sqrt(2 * pi * (3.1)^2))) * exp(-(12-10.7)^2/(2*3.1))
## [1] 0.09798692
When you analyze your own data, we strongly recommend that you keep a record of all commands used, along with copious notes, so that weeks or years later you can retrace the steps of your earlier analysis.
In RStudio, you can create a text file (sometimes called a script), which contains R commands that can be reloaded and used at a later date. Under the menu at the top, choose “File”, then “New File”, and then “R Script”. This will create a new section in RStudio with the temporary name “Untitled1” (or similar). You can copy and paste any commands that you want from the Console, or type directly here. (When you copy and paste, it’s better to not include the > prompt in the script.)
If you want to keep this script for later, just hit Save under the File menu. In the future you can open this file in all the normal ways to have those commands available for use again.
It is best to type all your commands in the script window and run them from there, rather than typing directly into the console. This lets you save a record of your session so that you can more easily re-create what you have done later.
Most of the work in R is done by functions. A function has a name and usually takes one or more arguments. For example, log(4) calls the function log(), which calculates the log in base e of the value 4 given as input.
Sometimes functions have optional input arguments. For the function log(), for example, we can specify the optional input argument base to tell the function what base to use for the logarithm. If we don’t specify the base argument, it takes its default value, e (which R writes as exp(1)). To get a log in base 10, for example, we would use:
log(4, base = 10)
## [1] 0.60206
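A few related calls may help make the optional argument concrete. This sketch uses 100 as an arbitrary input; log10() and exp() are standard R functions, but they are not otherwise covered in this tutorial.

```r
# Naming the optional argument, as in the example above.
log(100, base = 10)
## [1] 2

# The same calculation using R's built-in shortcut for base 10.
log10(100)
## [1] 2

# Leaving out base gives the natural log; exp(1) is e, so the answer is 1.
log(exp(1))
## [1] 1
```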
In R, we can store information of various sorts by assigning them to variables. For example, if we want to create a variable called x and give it a value of 4, we would write
x <- 4
The middle bit of this—a less than sign and a hyphen typed together to make something that looks a little like a left-facing arrow—tells R to assign the value on the right to the variable on the left. After running the command above, whenever we use x in a command it would be replaced by its value 4. For example, if we add 3 to x, we would expect to get 7.
x + 3
## [1] 7
Variables in R can store more than just simple numbers. They can store lists of numbers, functions, graphics, etc., depending on what values get assigned to the variable.
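As a small illustration of a variable that holds more than one number, the sketch below uses the function c() to combine several values. The variable name heights and its values are made up for this example.

```r
# c() combines several values into one variable (the values are made up).
heights <- c(12.5, 14.0, 9.5, 11.0)

# Functions can then work on the whole collection at once.
mean(heights)
## [1] 11.75
length(heights)
## [1] 4
```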
We can always reassign a new value to a variable. If we now tell R that x is equal to 32
x <- 32
then x takes its new value:
x
## [1] 32
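Reassignment can also use the variable's own current value. A minimal sketch, repeating the assignment of 32 so that it runs on its own:

```r
x <- 32

# The right-hand side is evaluated first, using the current value of x (32);
# the result is then stored back in x.
x <- x + 1
x
## [1] 33
```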
Naming variables and functions in R is pretty flexible.
A name has to start with a letter, which can be followed by letters, numbers, periods, or underscores. There can’t be any spaces, though.
Names in R are case-sensitive, which means that Weights and weights are completely different things to R. This is a common and incredibly frustrating source of errors in R.
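The sketch below shows that R really does keep the two spellings apart. The values are made up. If only one of the two had been created, typing the other would typically produce an "object not found" error.

```r
# Two variables whose names differ only in the case of the first letter.
Weights <- 70.5
weights <- 58.9

# R keeps them separate, so each name returns its own value.
Weights
## [1] 70.5
weights
## [1] 58.9
```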
It’s a good idea to have your names be as descriptive as possible, so that you will know what you meant later on when looking at it. (However, if they get too long, it becomes painful and error prone to type them each time we use them, so this, as with all things, requires moderation.)
Sometimes clear naming means that it is best to have multiple words in the name, but we can’t have spaces. Therefore a common approach, as we saw in the previous section, is to chain the words with underscores (not hyphens!), as in weights_before_hospital. (Another way to make separate words stand out in a variable name is to vary the case: weightsBeforeHospital.)
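Both naming styles can be tried out directly. In this sketch the values are invented for illustration; only the two names come from the text.

```r
# Words joined with underscores (values are made up).
weights_before_hospital <- c(61.2, 70.5, 58.9)

# The same data stored under a mixed-case name.
weightsBeforeHospital <- weights_before_hospital

# A hyphen would not work as a separator: R would read
# weights-before as "weights minus before".
weightsBeforeHospital
## [1] 61.2 70.5 58.9
```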
Using the Student data sheet 1, record the requested information about yourself. This is optional; if you have any reason to not want to record this (relatively innocuous) data about yourself, you do not have to. If you feel that you would like to skip just one of the bits of information and fill in the rest, that is fine too. The data sheets do not identify students by name. Pass the sheet to the instructor when you are finished. We’ll use these data in the tutorial on Graphics.
1. For each week, create an R script that captures the commands that you use to answer the questions. Use a # at the beginning of each comment line.
Open a new R script file. Start by adding comments with your name and the week (Week 1) at the top.
For each of the questions below, write the question number as a comment, followed by any R code you use to do the question, and give the answers as comments.
2. All of the commands used in the “Learning the Tools” section for this lab are in a script called “LearningToolsWeek1.R” in the ABDLabs folder that you should have downloaded. (A similar file will be available for each week of these labs.)
Load this script into RStudio.
Run most or all of the commands in R. Did you get the same answers as shown in the text?
3. For each of the following, come up with a variable name that would be appropriate to use in R for the listed variable:
Body temperature in Celsius
How much aspirin is given per dose for a patient
Number of televisions per person
Height (including neck and extended legs) of giraffes
4. Use R to calculate:
15 x 17
\(13^3\)
log(14) (use the natural log)
log(100) (use base 10)
\(\sqrt{81}\).
Comments
In scripts, it can be very useful to save a bit of text which is not to be evaluated by R. You can leave a note to yourself (or a colleague) about what the next line is supposed to do, what its strengths and limitations are, or anything else you want to remember later. To leave a note, we use “comments”: text that starts with the hash symbol #. Anything on a line after a # will be ignored by R.
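A short sketch of how comments might look in a script for these labs. The layout and the "Question 0" entry are made up for illustration and are not one of the assigned questions.

```r
# Name: (your name here)
# Week 1

# Question 0 (a made-up example)
sqrt(49)   # text after a # on the same line as a command is also ignored
# Answer: 7
```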