If you haven’t already, please install **both R and
RStudio** following the instructions in this video

This setting will cause you issues in the future if you don’t change it now. Set R/RStudio to not automatically save or load an .RData file.

For an extremely detailed introduction, please see

`help.start()`

In this documentation, commands such as the one above are meant to be executed at the command prompt (see below).

From `help.start()`:

R is a free software environment for statistical computing and graphics.

and from https://www.rstudio.com/products/RStudio/:

RStudio is an integrated development environment (IDE) for R.

In contrast to many other statistical software packages that use a
point-and-click interface, e.g. SPSS, JMP, Stata, etc., R has a
command-line interface. The command line has a command prompt,
e.g. `>`, see below.

`>`

This means that you will be entering commands on this command line and hitting enter to execute them, e.g.

`help()`

Use the **up arrow** to recover past commands.

```
hepl()
help() # Use up arrow and fix
```

Most likely, you are using a graphical user interface (GUI). Therefore, in addition to the command line, you also have a windowed version of R with some point-and-click options, e.g. File, Edit, and Help.

In particular, there is an editor to create a new R script. So rather
than entering commands on the command line, you will write commands in a
script and then send those commands to the command line using
`Ctrl-R` (PC) or `Command-Enter` (Mac).

```
a = 1
b = 2
a + b
```

`## [1] 3`

Multiple lines can be run in sequence by selecting them and then
using `Ctrl-R` (PC) or `Command-Enter` (Mac).

One of the most effective ways to use this documentation is to copy-and-paste the commands into a script and then execute them.

Copy-and-paste the following commands into a **new
script** and then run those commands directly from the script
using `Ctrl-R` (PC) or `Command-Enter` (Mac).

```
x <- 1:10
y <- rep(c(1,2), each=5)
m <- lm(y~x)
s <- summary(m)
```

Now, look at the result of each line:

```
x
y
m
s
s$r.squared
```

`x`

`## [1] 1 2 3 4 5 6 7 8 9 10`

`y`

`## [1] 1 1 1 1 1 2 2 2 2 2`

`m`

```
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept)            x
##      0.6667       0.1515
```

`s`

```
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.4242 -0.1667  0.0000  0.1667  0.4242
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.6667     0.1880   3.546  0.00756 **
## x             0.1515     0.0303   5.000  0.00105 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2752 on 8 degrees of freedom
## Multiple R-squared:  0.7576, Adjusted R-squared:  0.7273
## F-statistic:    25 on 1 and 8 DF,  p-value: 0.001053
```

`s$r.squared`

`## [1] 0.7575758`
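As a small sketch of what else can be pulled out of these objects, the fitted coefficients can be extracted with the standard `coef()` function. The names `estimates` and `r_squared` below are made up for this illustration and are not part of the original example.

```r
# Refit the model from above so this block is self-contained
x <- 1:10
y <- rep(c(1, 2), each = 5)
m <- lm(y ~ x)
s <- summary(m)

estimates <- coef(m)        # named vector with elements "(Intercept)" and "x"
estimates["x"]              # the slope only
r_squared <- s$r.squared    # the same value that was printed above
round(r_squared, 4)
```

Storing these quantities in variables lets you reuse them in later calculations instead of copying numbers from the printed output.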

All basic calculator operations can be performed in R.

`1+2`

`## [1] 3`

`1-2`

`## [1] -1`

`1/2`

`## [1] 0.5`

`1*2`

`## [1] 2`

`2^3 # same as 2**3`

`## [1] 8`

For now, you can ignore the [1] at the beginning of the line; we’ll learn about that when we get to vectors.

Many advanced calculator operations are also available.

`(1+3)*2 + 100^2 # standard order of operations (PEMDAS)`

`## [1] 10008`

`sin(2*pi) # the result is in scientific notation, i.e. -2.449294 x 10^-16 `

`## [1] -2.449294e-16`

`sqrt(4)`

`## [1] 2`

`log(10) # the default is base e`

`## [1] 2.302585`

`log(10, base = 10)`

`## [1] 1`
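The value of `sin(2*pi)` above is mathematically 0; the tiny number that R prints is floating-point rounding error. The following sketch shows one way to compare such results using `all.equal()`, along with a few more logarithm calls. The specific numbers used here are only illustrative.

```r
sin(2 * pi) == 0                    # exact comparison; typically FALSE due to rounding
isTRUE(all.equal(sin(2 * pi), 0))   # TRUE: equal up to a small tolerance
log10(100)                          # base-10 logarithm
log(8, base = 2)                    # base-2 logarithm
exp(log(10))                        # exp() undoes the natural logarithm
```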

A real advantage to using R rather than a calculator (or calculator app) is the ability to store quantities using variables.

```
a = 1
b = 2
a + b
```

`## [1] 3`

`a - b`

`## [1] -1`

`a / b`

`## [1] 0.5`

`a * b`

`## [1] 2`

`b ^ 3`

`## [1] 8`

When assigning values to variables, you can also use the arrows `<-` and `->`; you will often see these in code, e.g.

```
a <- 1 # recommended
2 -> b # uncommon, but sometimes useful
c = 3 # similar to other languages
```

Now print them.

`a`

`## [1] 1`

`b`

`## [1] 2`

`c`

`## [1] 3`

While using variables alone is useful, it is much more useful to use informative variable names.

```
# Rectangle
length <- 4
width <- 3
area <- length * width
area
```

`## [1] 12`

```
perimeter <- 2 * (length + width)
# Circle
radius <- 2
area <- pi*radius^2 # this overwrites the previous `area` variable
circumference <- 2*pi*radius
area
```

`## [1] 12.56637`

`circumference`

`## [1] 12.56637`

```
# (Right) Triangle
opposite <- 1
angleDegrees <- 30
angleRadians <- angleDegrees * pi/180
(adjacent <- opposite / tan(angleRadians)) # = sqrt(3)
```

`## [1] 1.732051`

`(hypotenuse <- opposite / sin(angleRadians)) # = 2`

`## [1] 2`
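One way to check the triangle calculation is the Pythagorean theorem. This is only a sanity-check sketch that recomputes the same quantities as above and compares them with `all.equal()`, which allows for floating-point rounding.

```r
opposite     <- 1
angleRadians <- 30 * pi / 180
adjacent     <- opposite / tan(angleRadians)
hypotenuse   <- opposite / sin(angleRadians)

# hypotenuse^2 should equal opposite^2 + adjacent^2
all.equal(hypotenuse^2, opposite^2 + adjacent^2)
```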

Suppose an individual tests positive for a disease. What is the probability that the individual has the disease? Let

- \(D\) indicate that the individual has the disease,
- \(N\) indicate that the individual does not have the disease,
- \(+\) indicate a positive test result, and
- \(-\) indicate a negative test result.

The above probability can be calculated using Bayes’ Rule:

\[ P(D|+) = \frac{P(+|D)P(D)}{P(+|D)P(D)+P(+|N)P(N)} = \frac{P(+|D)P(D)}{P(+|D)P(D)+(1-P(-|N))\times(1-P(D))} \]

where

- \(P(+|D)\) is the sensitivity of the test
- \(P(-|N)\) is the specificity of the test
- \(P(D)\) is the prevalence of the disease

Calculate the probability the individual has the disease if the test is positive when

- the specificity of the test is 0.95,
- the sensitivity of the test is 0.99, and
- the prevalence of the disease is 0.001.

```
specificity <- 0.95
sensitivity <- 0.99
prevalence <- 0.001
probability <- (sensitivity*prevalence) / (sensitivity*prevalence + (1-specificity)*(1-prevalence))
probability
```

`## [1] 0.01943463`
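To see how the answer changes with the inputs, the same formula can be wrapped in a function. This is a minimal sketch; the function name `posterior_positive` is hypothetical and not part of the original notes.

```r
# Hypothetical helper: P(D | +) computed with Bayes' Rule
posterior_positive <- function(sensitivity, specificity, prevalence) {
  true_positive  <- sensitivity * prevalence              # P(+|D) P(D)
  false_positive <- (1 - specificity) * (1 - prevalence)  # P(+|N) P(N)
  true_positive / (true_positive + false_positive)
}

# Same inputs as above
posterior_positive(sensitivity = 0.99, specificity = 0.95, prevalence = 0.001)

# Same test, but a much more common disease
posterior_positive(sensitivity = 0.99, specificity = 0.95, prevalence = 0.1)
```

The first call reproduces the value above (about 0.019), while the second gives 0.6875, which shows how strongly the result depends on the prevalence.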

Objects in R can be broadly classified according to their dimensions:

- scalar
- vector
- matrix
- array (higher dimensional matrix)

and according to the type of variable they contain:

- logical
- integer
- numeric
- character (string)

Scalars have a single value assigned to the object in R.

```
a <- 3.14159265
b <- "STAT 587 (Eng)"
c <- TRUE
```

Print the objects.

`a`

`## [1] 3.141593`

`b`

`## [1] "STAT 587 (Eng)"`

`c`

`## [1] TRUE`

The `c()` function creates a vector in R.

```
a <- c(1, 2, -5, 3.6)
b <- c("STAT", "587", "(Eng)")
c <- c(TRUE, FALSE, TRUE, TRUE)
```

To determine the length of a vector in R, use `length()`.

`length(a)`

`## [1] 4`

`length(b)`

`## [1] 3`

`length(c)`

`## [1] 4`

To determine the type of a vector in R, use `class()`.

`class(a)`

`## [1] "numeric"`

`class(b)`

`## [1] "character"`

`class(c)`

`## [1] "logical"`
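The list of types above includes integer, which the examples so far have not shown. The short sketch below illustrates the difference between integer and numeric vectors; the variable name `whole` is just for illustration.

```r
whole <- 1:3          # the colon operator produces integers
class(whole)          # "integer"
class(c(1, 2, 3))     # "numeric": plain numbers are stored as doubles
class(3L)             # "integer": the L suffix requests an integer
is.numeric(whole)     # TRUE: integers still count as numeric data
```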

Create a numeric vector that is a sequence using `:` or `seq()`.

`1:10`

`## [1] 1 2 3 4 5 6 7 8 9 10`

`5:-2`

`## [1] 5 4 3 2 1 0 -1 -2`

`seq(from = 2, to = 5, by = .05)`

```
## [1] 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50 2.55 2.60 2.65 2.70
## [16] 2.75 2.80 2.85 2.90 2.95 3.00 3.05 3.10 3.15 3.20 3.25 3.30 3.35 3.40 3.45
## [31] 3.50 3.55 3.60 3.65 3.70 3.75 3.80 3.85 3.90 3.95 4.00 4.05 4.10 4.15 4.20
## [46] 4.25 4.30 4.35 4.40 4.45 4.50 4.55 4.60 4.65 4.70 4.75 4.80 4.85 4.90 4.95
## [61] 5.00
```

Another useful function to create vectors is `rep()`.

`rep(1:4, times = 2)`

`## [1] 1 2 3 4 1 2 3 4`

`rep(1:4, each = 2)`

`## [1] 1 1 2 2 3 3 4 4`

`rep(1:4, each = 2, times = 2)`

`## [1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4`

Arguments to functions in R can be referenced by position, by name, or both. The safest and easiest-to-read approach is to name all your arguments. I will often name all but the first argument.
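As a brief sketch of the difference, the three `seq()` calls below produce the same vector; the variable names are made up for this illustration.

```r
by_position <- seq(2, 5, 0.5)                    # arguments matched by position
by_name     <- seq(from = 2, to = 5, by = 0.5)   # arguments matched by name
reordered   <- seq(by = 0.5, to = 5, from = 2)   # named arguments can come in any order

identical(by_position, by_name)
identical(by_name, reordered)

# Naming all but the first argument
rep(1:4, times = 2)
```

Matching by position is shorter, but the reader has to remember the argument order; naming the arguments makes the call self-explanatory.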

Elements of a vector can be accessed using brackets, e.g. [index].

```
a <- c("one","two","three","four","five")
a[1]
```

`## [1] "one"`

`a[2:4]`

`## [1] "two" "three" "four"`

`a[c(3,5)]`

`## [1] "three" "five"`

`a[rep(3,4)]`

`## [1] "three" "three" "three" "three"`

Alternatively, we can access elements using a logical vector; only the elements where the logical vector is TRUE are returned.

`a[c(TRUE, TRUE, FALSE, FALSE, FALSE)]`

`## [1] "one" "two"`
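In practice, the logical vector is usually produced by a comparison rather than typed out by hand. The following sketch assumes a small numeric vector called `values`, which is not part of the original example.

```r
values <- c(1, 2, -5, 3.6)

is_positive <- values > 0           # TRUE TRUE FALSE TRUE
values[is_positive]                 # keeps 1, 2 and 3.6

values[values > 1 & values < 4]     # combine conditions with &: keeps 2 and 3.6
```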

You can also see all elements except some using a negative sign `-`.

`a[-1]`

`## [1] "two" "three" "four" "five"`

`a[-(2:3)]`

`## [1] "one" "four" "five"`

You can assign new values to elements in a vector using = or <-.

```
a[2] <- "twenty-two"
a
```

`## [1] "one" "twenty-two" "three" "four" "five"`

```
a[3:4] <- "three-four" # assigns "three-four" to both the 3rd and 4th elements
a
```

`## [1] "one" "twenty-two" "three-four" "three-four" "five"`

```
a[c(3,5)] <- c("thirty-three","fifty-five")
a
```

`## [1] "one" "twenty-two" "thirty-three" "three-four" "fifty-five"`

Matrices can be constructed using `cbind()`, `rbind()`, and `matrix()`:

```
m1 <- cbind(c(1,2), c(3,4)) # Column bind
m2 <- rbind(c(1,3), c(2,4)) # Row bind
m1
```

```
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```

`all.equal(m1, m2)`

`## [1] TRUE`

```
m3 <- matrix(1:4, nrow = 2, ncol = 2)
all.equal(m1, m3)
```

`## [1] TRUE`

```
m4 <- matrix(1:4, nrow = 2, ncol = 2, byrow = TRUE)
all.equal(m3, m4)
```

`## [1] "Mean relative difference: 0.4"`

`m3`

```
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```

`m4`

```
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
```
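The difference between `m3` and `m4` is only the order in which the values are filled in, so one is the transpose of the other. A short sketch using the standard `dim()`, `nrow()`, `ncol()`, and `t()` functions:

```r
m3 <- matrix(1:4, nrow = 2, ncol = 2)                 # filled column by column
m4 <- matrix(1:4, nrow = 2, ncol = 2, byrow = TRUE)   # filled row by row

dim(m3)                # number of rows and number of columns
nrow(m3)
ncol(m3)
all.equal(t(m3), m4)   # transposing m3 gives m4
```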

Elements of a matrix can be accessed using brackets separated by a comma, e.g. [row index, column index].

```
m <- matrix(1:12, nrow=3, ncol=4)
m
```

```
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
```

`m[2,3]`

`## [1] 8`

Multiple elements can be accessed at once.

`m[1:2,3:4]`

```
##      [,1] [,2]
## [1,]    7   10
## [2,]    8   11
```

If no row (column) index is provided, then the whole row (column) is accessed.

`m[1:2,]`

```
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
```

As with vectors, you can exclude rows (or columns) using a negative sign.

`m[-c(3,4),]`

```
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
```

Be careful not to forget the comma; without it, the matrix is indexed as if it were a vector, e.g.

`m[1:4]`

`## [1] 1 2 3 4`
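The reason is that R stores a matrix column by column, so a single index counts down the first column, then the second, and so on. A minimal sketch using the same matrix:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)

m[4]                       # 4th element, counting down the columns
m[1, 2]                    # row 1, column 2
identical(m[4], m[1, 2])   # TRUE: both refer to the same element
```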

When you install R, you actually install several R packages. When you initially start R, the following packages are loaded:

- stats
- graphics
- grDevices
- utils
- datasets
- methods
- base

You can find this information by running the following code and looking at the attached base packages.

`sessionInfo()`

```
## R version 4.2.2 (2022-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.29 R6_2.5.1 jsonlite_1.8.0 magrittr_2.0.3
## [5] evaluate_0.15 stringi_1.7.6 rlang_1.0.2 cli_3.3.0
## [9] rstudioapi_0.13 jquerylib_0.1.4 bslib_0.3.1 rmarkdown_2.14
## [13] tools_4.2.2 stringr_1.4.0 xfun_0.31 yaml_2.3.5
## [17] fastmap_1.1.0 compiler_4.2.2 htmltools_0.5.2 knitr_1.39
## [21] sass_0.4.1
```

These packages provide a lot of functionality and have been in existence (almost as they currently are) from the early days of R.
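If you only want the list of attached base packages rather than the full printout, the following sketch shows two options: the `basePkgs` element of the object returned by `sessionInfo()`, and the `search()` function. The variable name `info` is just for illustration.

```r
info <- sessionInfo()
info$basePkgs   # character vector of the attached base packages
search()        # the search path; attached packages appear as "package:<name>"
```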

While a lot of functionality exists in these packages, much more functionality exists in user-contributed packages. On the Comprehensive R Archive Network (CRAN), there are (as of 2023/01/29) 19,122 packages available for download. On Bioconductor, there are an additional 2,183. There are also additional packages that exist outside of these repositories.

To install packages from CRAN, use the `install.packages()` function. For example,

`install.packages("tidyverse")`

or, to install all the packages needed for this class,

```
install.packages(c("tidyverse",
"gridExtra",
"rmarkdown",
"knitr"))
```

R packages almost always depend on other packages. When you use the
`install.packages()` function, R will automatically install
these dependencies.

You may run into problems during this installation process. Sometimes
the dependencies will fail to install. If this occurs, try to install just that
dependency using `install.packages()`.

Sometimes you will be asked whether you want to install a newer
version of a package from *source*. My general advice (for those
new to R) is to say no and instead install the older version of the
package. If you want to install from source, you will need Rtools
(Windows) or Xcode
(Mac). Alternatively, you can wait a couple of days for the newer
version to be pre-compiled.
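If you are unsure which packages are already installed, one possible approach is to check before installing. The sketch below uses `requireNamespace()`, which returns `FALSE` when a package cannot be loaded; note that it loads the package namespace as a side effect when the package is available. The helper name `missing_packages` is hypothetical, and the install line is commented out so that nothing is installed unless you choose to.

```r
# Hypothetical helper: which of these packages are not installed yet?
missing_packages <- function(packages) {
  installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  packages[!installed]
}

needed <- c("tidyverse", "gridExtra", "rmarkdown", "knitr")
to_install <- missing_packages(needed)
to_install

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```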

The installation only needs to be done once. But we will need to load the packages in every R session where we want to use them. To load the packages, use

```
library("dplyr")
library("tidyr")
library("ggplot2")
```

Alternatively, you can load the entire (not very big) `tidyverse`.

`library("tidyverse")`

```
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
```
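The conflicts listed above mean that, once the tidyverse is loaded, `filter()` and `lag()` refer to the dplyr versions. The stats versions are still available through the `package::function` syntax. As a small sketch, the call below uses `stats::filter()` to compute a 3-point moving average; the variable name `smoothed` is made up for this illustration.

```r
# The package prefix works whether or not dplyr is loaded
smoothed <- stats::filter(1:5, rep(1/3, 3))   # moving average of each value and its neighbors
smoothed                                      # the first and last values are NA
```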

To learn R, you may want to try the swirl package. To install, use

`install.packages("swirl")`

After installation, use the following to get started

```
library("swirl")
swirl()
```

Also, the R for Data Science book is extremely helpful.

As you work with R, there will be many times when you need to get help.

My basic approach is

- Use the help contained within R.
- Perform an internet search for an answer.
- Find somebody else who knows.

In all cases, knowing the R keywords, e.g. a function name, will be extremely helpful.

If you know the function name, then you can use `?<function>`, e.g.

`?mean`

The structure of help is

- Description: quick description of what the function does
- Usage: the arguments, their order, and default values (if any)
- Arguments: more thorough description about the arguments
- Value: what the function returns
- See Also: similar functions
- Examples: examples of how to use the function
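Some of this information is also available directly at the command prompt. The sketch below uses `args()` and `formals()` to show a function's arguments and defaults, and `example()` to run the Examples section of a help page. The `interactive()` guard is only there so that the help viewer is not opened when the code is run from a script.

```r
args(sd)              # prints the usage: function (x, na.rm = FALSE)
names(formals(sd))    # argument names only

if (interactive()) {
  help("mean")        # same as ?mean
  example("mean")     # runs the examples from the help page
}
```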

If you cannot remember the function name, then you can use `help.search("<something>")`, e.g.

`help.search("mean")`

Depending on how many packages you have installed, you will find a lot or a little here.
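A related option, shown here only as a sketch, is `apropos()`, which lists objects on the search path whose names contain a given string. The variable name `matches` is just for illustration.

```r
matches <- apropos("mean")   # objects whose names contain "mean"
head(matches)
"colMeans" %in% matches      # matching ignores case by default

if (interactive()) {
  ??mean                     # shorthand for help.search("mean")
}
```

Unlike `help.search()`, `apropos()` only looks at packages that are currently loaded, so it usually returns a shorter list.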

I google for `<something> R`, e.g.

`calculate mean R`

Some useful sites are

For ggplot2, the general R help can still be used, e.g.

```
?ggplot
?geom_point
```

but it is much more helpful to google for an answer, e.g.

```
geom_point
ggplot2 line colors
```

The top hits will all have the code along with what the code produces.

These sites all provide code. The first two also provide the plots that are produced.