1 ::: R Basics
R is a powerful, free statistical software package that is making headway into academic, government, and industry use. It may be downloaded from r-project.org at no cost.
working with directories
accessing and saving files
closing R
comments
variables
vectors
more on vectors
matrices
data frames
workspace clean-up
At all times, R looks in a certain folder (also called a 'directory') on the computer. To find which folder R is currently looking in, type getwd(). The folder R looks in may be changed to ease access to files. For instance, if R is looking at the Desktop and all R files are kept in a folder on the Desktop called R files, the following command will make R look in that folder:
setwd("R files")
When working on a Mac, to go to the desktop, type setwd("~/Desktop"). To move back or up one folder (for instance, back to the Desktop from a folder on the Desktop), use
setwd("../")
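As a minimal sketch of the commands above, the following moves into a subfolder and back out again. It uses R's per-session scratch folder from tempdir() so that nothing on the Desktop is touched; the subfolder name "R files" is borrowed from the example above, and the variable names are chosen here for illustration only.

```r
# Remember the starting folder so it can be restored at the end.
start_dir <- getwd()

# tempdir() is a scratch folder that R creates for each session.
# The subfolder "R files" is created here purely for illustration.
demo_dir <- file.path(tempdir(), "R files")
dir.create(demo_dir, showWarnings = FALSE)

setwd(tempdir())                   # look in the scratch folder
setwd("R files")                   # move down into the subfolder (relative path)
in_subfolder <- basename(getwd())  # name of the folder R is now looking in

setwd("../")                       # move back up one folder
back_up <- basename(getwd())

setwd(start_dir)                   # restore the original working directory
```

Saving the result of getwd() before changing folders makes it easy to return to the starting point.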
Once R is looking in the folder of interest, files (data, R code, etc) in that folder may be accessed or created. Functions are used to access .R, .txt, or .csv files.
source("file.R") is used to access a file file.R, which will run the code contained in the file (this file should be a plain text file and saved with the extension .R).
read.delim("file.txt", header=T) reads a data file file.txt that is tab-delimited. header=T tells R that the first row contains the names of the columns of the table.
read.table("file.txt", header=T, sep="\t") is an alternative to read.delim. It offers more control over the file format, since it can read .txt or .csv files. sep="\t" tells R that the file is tab-delimited (use " " for space-delimited files and "," for comma-delimited files such as .csv files).
Files may also be accessed in folders outside of the current folder. For example,
source("R folder/code.R")
read.table("../data.txt",header=T,sep="\t")
The second line looks "back" in another folder (up in the folder tree).
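The reading functions above can be tried without any existing files. The sketch below first writes small files to temporary locations and then reads them back. The file contents and the variable names (txt_file, d1, and so on) are made up for illustration.

```r
# Write a small tab-delimited file to a temporary location.
# The data values here are invented for the example.
txt_file <- tempfile(fileext = ".txt")
writeLines(c("age\theight", "23\t62", "31\t60", "27\t69"), txt_file)

# read.delim() expects tab-delimited input.
d1 <- read.delim(txt_file, header = TRUE)

# read.table() needs the separator spelled out.
d2 <- read.table(txt_file, header = TRUE, sep = "\t")

# The same data written as a comma-delimited (.csv) file.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("age,height", "23,62", "31,60", "27,69"), csv_file)
d3 <- read.table(csv_file, header = TRUE, sep = ",")

# A .R file is plain text containing R code; source() runs it.
r_file <- tempfile(fileext = ".R")
writeLines("z <- 2 + 3", r_file)
source(r_file)   # z now exists in the workspace
```

Here header = TRUE is the spelled-out form of header=T. The long form is a little safer because T is an ordinary variable that can be overwritten.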
When done working with R, either close the window or type
q()
R will then prompt whether the workspace (i.e., all the variables created) should be saved for the next time R is opened.
When writing code in a file, sometimes it is nice to make comments to clarify what the code is doing. This can be done using the comment character, #. For example, typing
x <- 5
produces the same result as typing
x <- 5 # these words after the # are ignored by R
# will make R ignore everything after it on that line. To write more R code, just go to the next line.
As with most programs, R can do simple computations:
> 3 + 8
[1] 11
(The character > is found at the beginning of every R input line and is not typed but provided automatically in R. This will become clear after working briefly in R.) Other typical calculator-like operations may also be carried out in R. To assign a variable in R, use the assignment operator, <-
> x <- 5
> x
[1] 5
= (the equals sign) may be used in place of <- in recent versions of R. Variables may be used together:
> x <- 4
> y <- 3
> x^y
[1] 64
Variables can also be overwritten, as was done above with x. This is just the beginning. Variables may be used to hold all types of objects in R and they come in handy to make R code easier to follow.
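The ideas in this section can be collected in one short script, written here without the > prompt so it can be pasted into a file and run with source(). The names power and average are chosen here for illustration only.

```r
x <- 4                   # assign with <-
y = 3                    # = also assigns when used at the top level
power <- x^y             # 4 raised to the power 3

x <- x + 1               # overwrite x using its old value; x is now 5
average <- (x + y) / 2   # variables may be combined in larger expressions
```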
The most basic vectors are numbers. The variables x and y above are vectors of length 1. Look back at the output above. There is a [1]. That means that the first element shown on that line is the first element of x. A bracketed number will be displayed on each line of the output from a vector and symbolizes the vector position. Vectors of length greater than 1 may be created by
> c(1, 5, 4)
[1] 1 5 4
> x <- c(1, 5, 4) # vectors may be assigned to a variable
> x
[1] 1 5 4
The length of a vector is only limited by the memory on the computer (although it does take more time to handle longer vectors). Using x above, different parts of x may be looked at separately:
> x[2]
[1] 5
Above, the second element of x was called and printed. Using a vector inside the brackets, more than one element of a vector may be called:
> x[c(2,3)]
[1] 5 4
Vectors are not limited to being just a set of numbers. They may contain TRUE/FALSE values or character strings:
> c(TRUE, TRUE, TRUE, FALSE)
[1]  TRUE  TRUE  TRUE FALSE
> c("these", "are", "each", "character", "strings", "in",
+ "this", "vector")
[1] "these"     "are"       "each"      "character" "strings"   "in"
[7] "this"      "vector"
However, only one type of entry may be used in a vector. For example, if a vector contains both numbers and character strings, then the numbers will be treated as if they were character strings.
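A quick way to see this conversion is to ask R for the class of a vector. The sketch below is illustrative only and the variable names are made up. The third vector shows a related rule that is not covered above: TRUE and FALSE become 1 and 0 when mixed with numbers.

```r
nums  <- c(1, 5, 4)          # all numbers
mixed <- c(1, "five", 4)     # the numbers are converted to character strings
flags <- c(TRUE, FALSE, 2)   # TRUE/FALSE are converted to 1/0

class_nums  <- class(nums)   # "numeric"
class_mixed <- class(mixed)  # "character"
class_flags <- class(flags)  # "numeric"
```

Note that mixed[1] is the string "1" and not the number 1, so arithmetic on it will fail.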
There are several ways to call elements of a vector. One of the more useful options is to use a vector that is of the same length but only contains the values TRUE and FALSE:
> x[c(FALSE, TRUE, TRUE)]
[1] 5 4
Only the elements that correspond to a TRUE value are returned. A TRUE/FALSE vector is known as a Boolean vector (R itself calls it a logical vector).
There are a few tricks to working with vectors that will ease manipulation. For instance, pick two integers and call them integer1 and integer2. Then integer1:integer2 returns a vector listing integer1 to integer2 with steps of 1:
> 1:30
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
[20] 20 21 22 23 24 25 26 27 28 29 30
> 1:-5
[1]  1  0 -1 -2 -3 -4 -5
Some other useful functions include seq() (sequence) and rep() (repeat):
> seq(0, 4, 0.5)
[1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0
> rep(5, 3)
[1] 5 5 5
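Both functions accept named arguments, which can make the intent clearer. The following sketch shows a few common variations; the variable names are chosen here for illustration.

```r
by_half   <- seq(0, 4, by = 0.5)        # same as seq(0, 4, 0.5)
five_pts  <- seq(0, 1, length.out = 5)  # 5 evenly spaced values from 0 to 1
three_5s  <- rep(5, times = 3)          # same as rep(5, 3)
whole_rep <- rep(c(1, 2), times = 2)    # repeat the whole vector: 1 2 1 2
each_rep  <- rep(c(1, 2), each = 2)     # repeat each element:     1 1 2 2
n_values  <- length(1:30)               # number of elements in 1:30
```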
Finally, logical expressions will return a vector:
> x > 2
[1] FALSE TRUE TRUE
These logical expressions can come in very handy when only a part of a vector is needed. For example, if only elements that are greater than 2 in x are of interest, then use the following command:
> x[x > 2]
[1] 5 4
Because the second and third elements were TRUE, their values were returned.
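Logical expressions can be stored, combined, and counted. The sketch below reuses the vector x from above and adds a few related operations (&, !, which(), and sum()) that are not covered in the text; the variable names are chosen for illustration.

```r
x <- c(1, 5, 4)

big       <- x > 2            # FALSE TRUE TRUE
kept      <- x[big]           # elements where big is TRUE: 5 4
middle    <- x[x > 2 & x < 5] # & requires both conditions to hold: 4
small     <- x[!big]          # ! flips TRUE and FALSE: 1
positions <- which(big)       # positions of the TRUE values: 2 3
how_many  <- sum(big)         # TRUE counts as 1, so this counts matches: 2
```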
The section on matrices has not been completed.
A data frame is essentially a matrix that is allowed to have a different variable type in each column (a single column must contain only one variable type). Data is usually stored as a data frame after it is imported. To load a data frame from a file in the current folder, recall that the following command may be used (this command also stores the data frame into the variable dt):
> dt <- read.table("data_file.txt",header=T,sep="\t")
It is often convenient to have access to each column of the data separately. Suppose the column names of dt are age and height. To make these two columns available, the data must be attached:
> attach(dt)
Now the columns can be accessed by their names.
> height
[1] 62 60 69 67 71 66 68
(In this example, there are only 7 rows in the data frame so there are only 7 entries in height.) Notice that height is treated like a vector now. To detach data, use the function detach() and specify the data frame to be detached.
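Columns can also be reached without attaching the data frame. The sketch below builds a small data frame directly instead of reading a file: the height values are the ones printed above, while the age values are invented for illustration. The $ operator, with(), and row selection shown here are alternatives to attach(), not the approach used in the text.

```r
# Heights are taken from the output above; ages are made up.
dt <- data.frame(
  age    = c(23, 31, 27, 45, 38, 29, 52),
  height = c(62, 60, 69, 67, 71, 66, 68)
)

heights     <- dt$height             # one column, returned as a vector
ages        <- dt[["age"]]           # the same idea, using the column name as a string
tall        <- dt[dt$height > 66, ]  # rows where height is over 66, all columns
mean_height <- with(dt, mean(height))  # use column names inside one expression
```

Using dt$height avoids a possible mix-up when a variable named height already exists in the workspace.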
To look at all of the variables that have been created, use the command ls(). This returns the names of the variables in the workspace. To remove a variable that is no longer needed, use rm():
> x
[1] 1 5 4
> rm(x)
> x
Error: object "x" not found
To clear all variables from R's memory, use rm(list=ls()).
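The same commands can be tried safely in a separate environment, so that nothing in a real session is deleted. In the sketch below, the environment ws is a stand-in for the workspace and is an assumption made for illustration. In everyday use, ls() and rm() are called without the envir argument.

```r
# A separate environment stands in for the workspace in this sketch.
ws <- new.env()
assign("x", c(1, 5, 4), envir = ws)
assign("y", 3, envir = ws)

before <- ls(envir = ws)             # names of the variables: "x" "y"

rm("x", envir = ws)                  # remove one variable
x_exists <- exists("x", envir = ws, inherits = FALSE)  # FALSE after removal

rm(list = ls(envir = ws), envir = ws)  # remove everything that is left
after <- ls(envir = ws)                # empty: no variables remain
```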