We will start with loading up the necessary packages using the library() command.

library("tidyverse")

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Read data

Before, we can start reading data in from our file system, we need to 1) understand the concept of a working directory and 2) have a file that we can read.

Working directory

getwd()

## [1] "C:/Users/niemi/Documents/git/jarad.github.com/courses/stat486/slides/07-wrangling"

Set working directory

You can set the working directory in RStudio by going to

Session > Set Working Directory > Choose Working Directory

Alternatively, we can use the setwd() function.

?setwd

Use RStudio projects!! This will set your working directory to the same folder every time you open that project.

Write csv

In order to read a file, we need an appropriate file. Comma-separated value (csv) files are a de-facto standard for transporting data. In this file type, each column is separated by a comma and each row is separated by a newline.

readr::write_csv(ToothGrowth, file = "ToothGrowth.csv")

Read csv

d <- readr::read_csv("ToothGrowth.csv")

## Rows: 60 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): supp
## dbl (2): len, dose
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

all.equal(ToothGrowth, d)

## [1] "Attributes: < Names: 1 string mismatch >"                                              
## [2] "Attributes: < Length mismatch: comparison on first 2 components >"                     
## [3] "Attributes: < Component \"class\": Lengths (1, 4) differ (string compare on first 1) >"
## [4] "Attributes: < Component \"class\": 1 string mismatch >"                                
## [5] "Attributes: < Component 2: Modes: numeric, externalptr >"                              
## [6] "Attributes: < Component 2: Lengths: 60, 1 >"                                           
## [7] "Attributes: < Component 2: target is numeric, current is externalptr >"                
## [8] "Component \"supp\": 'current' is not a factor"

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

str(d)

## spc_tbl_ [60 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ len : num [1:60] 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: chr [1:60] "VC" "VC" "VC" "VC" ...
##  $ dose: num [1:60] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   len = col_double(),
##   ..   supp = col_character(),
##   ..   dose = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

Options

d <- readr::read_csv("ToothGrowth.csv",
                     col_types = cols(
                       len = col_double(),
                       supp = col_factor(),
                       dose = col_double()
                     ))

Read Excel

Check out

install.packages("readxl")
?readxl::read_excel

Databases

To read other types of files, including databases, start with these suggestions in R4DS.

CSV file format

There is no official standard for CSV file formats, but there are attempts to define a standard or at least outline what is standard.

Library of Congress

The Internet Society

Formatting

commas (or semi-colons)
line return (CR or LF, mabye CRLF)
header (optional)
quotes (optional)

Bad formatting

Additional rows

The file format only suggests an (optional) header row. Otherwise all rows should strictly be data.

Some (Canvas….ahem!) add additional row(s) between the header and the data. This should not be done as it makes it difficult for standard software including R to read it since data is assumed to immediately come after the header.

If you want to include additional rows, you can include rows above the header row. See the skip argument of readr::read_csv().

Subtotals

A common use of spreadsheets is to calculate subtotals either by adding additional rows or columns. Although these can be removed, it is annoying and therefore should be avoided. Simply calculate subtotals in a different Excel sheet or copy the data over to a new Excel sheet.

Data wrangling

Data often come in multiple files and need to be wrangled to get it into the format necessary for analysis.

Bind

A common issue is that data are created on a periodic basis and each period has its own file. In these circumstances, you need to bind the data together.

Bind rows

dA <- ToothGrowth %>% filter(supp == "VC")
dB <- ToothGrowth %>% filter(supp == "OJ")
bind_rows(dA, dB)

##     len supp dose
## 1   4.2   VC  0.5
## 2  11.5   VC  0.5
## 3   7.3   VC  0.5
## 4   5.8   VC  0.5
## 5   6.4   VC  0.5
## 6  10.0   VC  0.5
## 7  11.2   VC  0.5
## 8  11.2   VC  0.5
## 9   5.2   VC  0.5
## 10  7.0   VC  0.5
## 11 16.5   VC  1.0
## 12 16.5   VC  1.0
## 13 15.2   VC  1.0
## 14 17.3   VC  1.0
## 15 22.5   VC  1.0
## 16 17.3   VC  1.0
## 17 13.6   VC  1.0
## 18 14.5   VC  1.0
## 19 18.8   VC  1.0
## 20 15.5   VC  1.0
## 21 23.6   VC  2.0
## 22 18.5   VC  2.0
## 23 33.9   VC  2.0
## 24 25.5   VC  2.0
## 25 26.4   VC  2.0
## 26 32.5   VC  2.0
## 27 26.7   VC  2.0
## 28 21.5   VC  2.0
## 29 23.3   VC  2.0
## 30 29.5   VC  2.0
## 31 15.2   OJ  0.5
## 32 21.5   OJ  0.5
## 33 17.6   OJ  0.5
## 34  9.7   OJ  0.5
## 35 14.5   OJ  0.5
## 36 10.0   OJ  0.5
## 37  8.2   OJ  0.5
## 38  9.4   OJ  0.5
## 39 16.5   OJ  0.5
## 40  9.7   OJ  0.5
## 41 19.7   OJ  1.0
## 42 23.3   OJ  1.0
## 43 23.6   OJ  1.0
## 44 26.4   OJ  1.0
## 45 20.0   OJ  1.0
## 46 25.2   OJ  1.0
## 47 25.8   OJ  1.0
## 48 21.2   OJ  1.0
## 49 14.5   OJ  1.0
## 50 27.3   OJ  1.0
## 51 25.5   OJ  2.0
## 52 26.4   OJ  2.0
## 53 22.4   OJ  2.0
## 54 24.5   OJ  2.0
## 55 24.8   OJ  2.0
## 56 30.9   OJ  2.0
## 57 26.4   OJ  2.0
## 58 27.3   OJ  2.0
## 59 29.4   OJ  2.0
## 60 23.0   OJ  2.0

d <- list(
  A = ToothGrowth %>% filter(supp == "VC", dose == 0.5),
  B = ToothGrowth %>% filter(supp == "VC", dose == 1),
  C = ToothGrowth %>% filter(supp == "VC", dose == 2),
  D = ToothGrowth %>% filter(supp == "OJ", dose == 0.5),
  E = ToothGrowth %>% filter(supp == "OJ", dose == 1),
  F = ToothGrowth %>% filter(supp == "OJ", dose == 2)
)

class(d)

## [1] "list"

d[[1]]

##     len supp dose
## 1   4.2   VC  0.5
## 2  11.5   VC  0.5
## 3   7.3   VC  0.5
## 4   5.8   VC  0.5
## 5   6.4   VC  0.5
## 6  10.0   VC  0.5
## 7  11.2   VC  0.5
## 8  11.2   VC  0.5
## 9   5.2   VC  0.5
## 10  7.0   VC  0.5

d$B

##     len supp dose
## 1  16.5   VC    1
## 2  16.5   VC    1
## 3  15.2   VC    1
## 4  17.3   VC    1
## 5  22.5   VC    1
## 6  17.3   VC    1
## 7  13.6   VC    1
## 8  14.5   VC    1
## 9  18.8   VC    1
## 10 15.5   VC    1

bind_rows(d)

##     len supp dose
## 1   4.2   VC  0.5
## 2  11.5   VC  0.5
## 3   7.3   VC  0.5
## 4   5.8   VC  0.5
## 5   6.4   VC  0.5
## 6  10.0   VC  0.5
## 7  11.2   VC  0.5
## 8  11.2   VC  0.5
## 9   5.2   VC  0.5
## 10  7.0   VC  0.5
## 11 16.5   VC  1.0
## 12 16.5   VC  1.0
## 13 15.2   VC  1.0
## 14 17.3   VC  1.0
## 15 22.5   VC  1.0
## 16 17.3   VC  1.0
## 17 13.6   VC  1.0
## 18 14.5   VC  1.0
## 19 18.8   VC  1.0
## 20 15.5   VC  1.0
## 21 23.6   VC  2.0
## 22 18.5   VC  2.0
## 23 33.9   VC  2.0
## 24 25.5   VC  2.0
## 25 26.4   VC  2.0
## 26 32.5   VC  2.0
## 27 26.7   VC  2.0
## 28 21.5   VC  2.0
## 29 23.3   VC  2.0
## 30 29.5   VC  2.0
## 31 15.2   OJ  0.5
## 32 21.5   OJ  0.5
## 33 17.6   OJ  0.5
## 34  9.7   OJ  0.5
## 35 14.5   OJ  0.5
## 36 10.0   OJ  0.5
## 37  8.2   OJ  0.5
## 38  9.4   OJ  0.5
## 39 16.5   OJ  0.5
## 40  9.7   OJ  0.5
## 41 19.7   OJ  1.0
## 42 23.3   OJ  1.0
## 43 23.6   OJ  1.0
## 44 26.4   OJ  1.0
## 45 20.0   OJ  1.0
## 46 25.2   OJ  1.0
## 47 25.8   OJ  1.0
## 48 21.2   OJ  1.0
## 49 14.5   OJ  1.0
## 50 27.3   OJ  1.0
## 51 25.5   OJ  2.0
## 52 26.4   OJ  2.0
## 53 22.4   OJ  2.0
## 54 24.5   OJ  2.0
## 55 24.8   OJ  2.0
## 56 30.9   OJ  2.0
## 57 26.4   OJ  2.0
## 58 27.3   OJ  2.0
## 59 29.4   OJ  2.0
## 60 23.0   OJ  2.0

Bind columns

Hmm…maybe don’t bother??

Join

Data sets are often in multiple files with each file contain some of the information.

Left_join

The most common join operation I use is left_join().

Rather than refactoring the supp column in ToothGrowth, I would probably just create a new tibble that contains the necessary information.

suppTranslation <- tibble(
  supp = c("VC","OJ"),
  `Delivery Method` = c("Ascorbic Acid","Orange Juice")
)

Now, I can simply left_join() these two data sets:

left_join(ToothGrowth, suppTranslation, by = "supp")

##     len supp dose Delivery Method
## 1   4.2   VC  0.5   Ascorbic Acid
## 2  11.5   VC  0.5   Ascorbic Acid
## 3   7.3   VC  0.5   Ascorbic Acid
## 4   5.8   VC  0.5   Ascorbic Acid
## 5   6.4   VC  0.5   Ascorbic Acid
## 6  10.0   VC  0.5   Ascorbic Acid
## 7  11.2   VC  0.5   Ascorbic Acid
## 8  11.2   VC  0.5   Ascorbic Acid
## 9   5.2   VC  0.5   Ascorbic Acid
## 10  7.0   VC  0.5   Ascorbic Acid
## 11 16.5   VC  1.0   Ascorbic Acid
## 12 16.5   VC  1.0   Ascorbic Acid
## 13 15.2   VC  1.0   Ascorbic Acid
## 14 17.3   VC  1.0   Ascorbic Acid
## 15 22.5   VC  1.0   Ascorbic Acid
## 16 17.3   VC  1.0   Ascorbic Acid
## 17 13.6   VC  1.0   Ascorbic Acid
## 18 14.5   VC  1.0   Ascorbic Acid
## 19 18.8   VC  1.0   Ascorbic Acid
## 20 15.5   VC  1.0   Ascorbic Acid
## 21 23.6   VC  2.0   Ascorbic Acid
## 22 18.5   VC  2.0   Ascorbic Acid
## 23 33.9   VC  2.0   Ascorbic Acid
## 24 25.5   VC  2.0   Ascorbic Acid
## 25 26.4   VC  2.0   Ascorbic Acid
## 26 32.5   VC  2.0   Ascorbic Acid
## 27 26.7   VC  2.0   Ascorbic Acid
## 28 21.5   VC  2.0   Ascorbic Acid
## 29 23.3   VC  2.0   Ascorbic Acid
## 30 29.5   VC  2.0   Ascorbic Acid
## 31 15.2   OJ  0.5    Orange Juice
## 32 21.5   OJ  0.5    Orange Juice
## 33 17.6   OJ  0.5    Orange Juice
## 34  9.7   OJ  0.5    Orange Juice
## 35 14.5   OJ  0.5    Orange Juice
## 36 10.0   OJ  0.5    Orange Juice
## 37  8.2   OJ  0.5    Orange Juice
## 38  9.4   OJ  0.5    Orange Juice
## 39 16.5   OJ  0.5    Orange Juice
## 40  9.7   OJ  0.5    Orange Juice
## 41 19.7   OJ  1.0    Orange Juice
## 42 23.3   OJ  1.0    Orange Juice
## 43 23.6   OJ  1.0    Orange Juice
## 44 26.4   OJ  1.0    Orange Juice
## 45 20.0   OJ  1.0    Orange Juice
## 46 25.2   OJ  1.0    Orange Juice
## 47 25.8   OJ  1.0    Orange Juice
## 48 21.2   OJ  1.0    Orange Juice
## 49 14.5   OJ  1.0    Orange Juice
## 50 27.3   OJ  1.0    Orange Juice
## 51 25.5   OJ  2.0    Orange Juice
## 52 26.4   OJ  2.0    Orange Juice
## 53 22.4   OJ  2.0    Orange Juice
## 54 24.5   OJ  2.0    Orange Juice
## 55 24.8   OJ  2.0    Orange Juice
## 56 30.9   OJ  2.0    Orange Juice
## 57 26.4   OJ  2.0    Orange Juice
## 58 27.3   OJ  2.0    Orange Juice
## 59 29.4   OJ  2.0    Orange Juice
## 60 23.0   OJ  2.0    Orange Juice

Right_join

Of course, if the tibbles were in a different order (perhaps due to a data pipeline), I could use right_join().

right_join(suppTranslation, ToothGrowth, by = "supp")

## # A tibble: 60 × 4
##    supp  `Delivery Method`   len  dose
##    <chr> <chr>             <dbl> <dbl>
##  1 VC    Ascorbic Acid       4.2   0.5
##  2 VC    Ascorbic Acid      11.5   0.5
##  3 VC    Ascorbic Acid       7.3   0.5
##  4 VC    Ascorbic Acid       5.8   0.5
##  5 VC    Ascorbic Acid       6.4   0.5
##  6 VC    Ascorbic Acid      10     0.5
##  7 VC    Ascorbic Acid      11.2   0.5
##  8 VC    Ascorbic Acid      11.2   0.5
##  9 VC    Ascorbic Acid       5.2   0.5
## 10 VC    Ascorbic Acid       7     0.5
## # … with 50 more rows

Anti-join

One way to filter observations is to use the dplyr::filter() function. If the observations you want to remove are already part of a data.frame, then you can use the anti_join() function.

len10 <- filter(ToothGrowth, len < 10)

anti_join(ToothGrowth, len10, by = c("len", "supp", "dose"))

##     len supp dose
## 1  11.5   VC  0.5
## 2  10.0   VC  0.5
## 3  11.2   VC  0.5
## 4  11.2   VC  0.5
## 5  16.5   VC  1.0
## 6  16.5   VC  1.0
## 7  15.2   VC  1.0
## 8  17.3   VC  1.0
## 9  22.5   VC  1.0
## 10 17.3   VC  1.0
## 11 13.6   VC  1.0
## 12 14.5   VC  1.0
## 13 18.8   VC  1.0
## 14 15.5   VC  1.0
## 15 23.6   VC  2.0
## 16 18.5   VC  2.0
## 17 33.9   VC  2.0
## 18 25.5   VC  2.0
## 19 26.4   VC  2.0
## 20 32.5   VC  2.0
## 21 26.7   VC  2.0
## 22 21.5   VC  2.0
## 23 23.3   VC  2.0
## 24 29.5   VC  2.0
## 25 15.2   OJ  0.5
## 26 21.5   OJ  0.5
## 27 17.6   OJ  0.5
## 28 14.5   OJ  0.5
## 29 10.0   OJ  0.5
## 30 16.5   OJ  0.5
## 31 19.7   OJ  1.0
## 32 23.3   OJ  1.0
## 33 23.6   OJ  1.0
## 34 26.4   OJ  1.0
## 35 20.0   OJ  1.0
## 36 25.2   OJ  1.0
## 37 25.8   OJ  1.0
## 38 21.2   OJ  1.0
## 39 14.5   OJ  1.0
## 40 27.3   OJ  1.0
## 41 25.5   OJ  2.0
## 42 26.4   OJ  2.0
## 43 22.4   OJ  2.0
## 44 24.5   OJ  2.0
## 45 24.8   OJ  2.0
## 46 30.9   OJ  2.0
## 47 26.4   OJ  2.0
## 48 27.3   OJ  2.0
## 49 29.4   OJ  2.0
## 50 23.0   OJ  2.0

Other joins

There are also inner_join(), full_join(), semi_join(), and nest_join(). I don’t use these, but probably I should.

Transform

Another very common need is to convert data from a relatively wide format to a long format for the purpose of visualization or modeling. Wide is generally a more efficient storage structure while most statistical software expects a long format.

For example, the AirPassengers data.frame contains monthly air passengers across years.

AirPassengers

##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432

Formally, AirPassengers is a time series object and thus R treats it a bit differently. Let’s convert it to a data.frame. Can you understand the piped code below?

months <- c("Jan","Feb","Mar","Apr","May","Jun",
             "July","Aug","Sep","Oct","Nov","Dec")

d <- AirPassengers %>% 
  matrix(ncol=12) %>%
  as.data.frame() %>%
  setNames(months) %>%
  mutate(year = 1949:1960) %>%
  relocate(year)

Now we have a wide data frame, but we will typically want a long data.frame for visualization or analysis.

Longer

To convert to a longer data.frame where each row is a single observation, we can use pivot_longer().

dl <- d %>%
  pivot_longer(Jan:Dec, names_to = "month", values_to = "passengers") %>%
  mutate(month = factor(month, levels = months)) # to order the months

table(dl$month)

## 
##  Jan  Feb  Mar  Apr  May  Jun July  Aug  Sep  Oct  Nov  Dec 
##   12   12   12   12   12   12   12   12   12   12   12   12

Create monthly totals

dl %>%
  group_by(month) %>%
  summarize(
    n = n(),
    mean = mean(passengers),
    sd = sd(passengers)
  )

## # A tibble: 12 × 4
##    month     n  mean    sd
##    <fct> <int> <dbl> <dbl>
##  1 Jan      12  127.  13.7
##  2 Feb      12  140.  19.1
##  3 Mar      12  170.  18.4
##  4 Apr      12  197   23.0
##  5 May      12  225   28.5
##  6 Jun      12  239.  34.9
##  7 July     12  284   42.1
##  8 Aug      12  328.  47.9
##  9 Sep      12  368.  57.9
## 10 Oct      12  381   64.5
## 11 Nov      12  428.  69.8
## 12 Dec      12  476.  77.7

Plot yearly comparisons

ggplot(dl, aes(x = month, y = passengers, 
               group = year, color = year)) +
  geom_line() + 
  scale_y_log10() +
  theme_bw()

Fit a regression model

m <- lm(log(passengers) ~ year + month, data = dl)
summary(m)

## 
## Call:
## lm(formula = log(passengers) ~ year + month, data = dl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24772 -0.09147 -0.02467  0.08133  0.28331 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -9.055371   6.447154  -1.405   0.1625    
## year         0.007107   0.003299   2.155   0.0330 *  
## monthFeb     0.094646   0.055784   1.697   0.0921 .  
## monthMar     0.295231   0.055784   5.292 4.94e-07 ***
## monthApr     0.441052   0.055784   7.906 9.62e-13 ***
## monthMay     0.572518   0.055784  10.263  < 2e-16 ***
## monthJun     0.630359   0.055784  11.300  < 2e-16 ***
## monthJuly    0.803090   0.055784  14.397  < 2e-16 ***
## monthAug     0.948250   0.055784  16.999  < 2e-16 ***
## monthSep     1.062194   0.055784  19.041  < 2e-16 ***
## monthOct     1.094404   0.055784  19.619  < 2e-16 ***
## monthNov     1.212189   0.055784  21.730  < 2e-16 ***
## monthDec     1.318037   0.055784  23.628  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1366 on 131 degrees of freedom
## Multiple R-squared:  0.9122, Adjusted R-squared:  0.9042 
## F-statistic: 113.5 on 12 and 131 DF,  p-value: < 2.2e-16

Wider

Less often, we want to convert back to a wide format.

dl %>% pivot_wider(id_cols = year, 
                   names_from = "month", 
                   values_from = "passengers")

## # A tibble: 12 × 13
##     year   Jan   Feb   Mar   Apr   May   Jun  July   Aug   Sep   Oct   Nov   Dec
##    <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  1949   112   115   145   171   196   204   242   284   315   340   360   417
##  2  1950   118   126   150   180   196   188   233   277   301   318   342   391
##  3  1951   132   141   178   193   236   235   267   317   356   362   406   419
##  4  1952   129   135   163   181   235   227   269   313   348   348   396   461
##  5  1953   121   125   172   183   229   234   270   318   355   363   420   472
##  6  1954   135   149   178   218   243   264   315   374   422   435   472   535
##  7  1955   148   170   199   230   264   302   364   413   465   491   548   622
##  8  1956   148   170   199   242   272   293   347   405   467   505   559   606
##  9  1957   136   158   184   209   237   259   312   355   404   404   463   508
## 10  1958   119   133   162   191   211   229   274   306   347   359   407   461
## 11  1959   104   114   146   172   180   203   237   271   305   310   362   390
## 12  1960   118   140   166   194   201   229   278   306   336   337   405   432

Binary file formats

Generally, I don’t suggest story data in binary formats, but these formats can be useful to store intermediate data. For example, if there is some important results from a statistical analysis that takes a long time to perform (I’m looking at you Bayesians) you might want to store the results in a binary format.

RData

There are two functions that will save RData files: save() and save.image(). The latter will save everything in the environment while the former will only save what it is specifically told to save. Both of these are read using the load() function.

a <- 1
save(a, file = "a.RData")

RDS

An RDS file contains a single R object. These files can be written using saveRDS() and read using readRDS().

saveRDS(a, file = "a.RDS")
rm(a)

When you read this file, you need to save it into an R object.

b <- readRDS("a.RDS")

Cleanup

unlink("ToothGrowth.csv")
unlink("a.RData")
unlink("a.RDS")

Other resources

Data transformation cheatsheet

STAT 486/586

Jarad Niemi

2023-02-08