To follow along, use the lab08 code and make sure the following package is installed:
You can use the following code to perform the installation:
install.packages("Sleuth3")
Now load the packages
library("tidyverse")
library("Sleuth3")
set.seed(20210413) # for reproducibility
Categorical variables in R are represented either by a
A factor is a special character variable that
Characters is how R represents any non-numeric data.
is.character("a")
## [1] TRUE
is.character(1)
## [1] FALSE
is.character(pi)
## [1] FALSE
is.character("a123")
## [1] TRUE
R is case sensitive so
unique(c("a","a","A","A"))
## [1] "a" "A"
If you sort a character vector, you will get alphabetical ordering with the lowercase version of a character appearing before the uppercase version.
sort(sample(c(letters,LETTERS), 64, replace = TRUE))
## [1] "a" "a" "a" "B" "c" "C" "d" "e" "E" "E" "f" "g" "g" "G" "h" "i" "i" "i" "i" "j" "J"
## [22] "J" "J" "l" "L" "L" "m" "M" "n" "N" "N" "o" "O" "O" "p" "p" "P" "r" "r" "R" "s" "S"
## [43] "S" "t" "T" "T" "u" "U" "v" "v" "w" "W" "W" "x" "x" "x" "X" "X" "y" "y" "y" "y" "Y"
## [64] "z"
In real data sets, explanatory variables will typically not be represented by single letters but instead will be strings (although these are still “characters” in terms of R’s data type).
Let’s take a look at the case0501
data set, but suppose that it is available from a file, just like it would be for your research.
write.csv(Sleuth3::case0501, file = "case0501.csv",
row.names = FALSE)
Now we will read this file
d = read.csv("case0501.csv")
str(d)
## 'data.frame': 349 obs. of 2 variables:
## $ Lifetime: num 35.5 35.4 34.9 34.8 33.8 33.5 32.6 32.4 31.8 31.6 ...
## $ Diet : chr "NP" "NP" "NP" "NP" ...
Note that Lifetime
is read in as a numeric variable while Diet
is a character. Let’s take a look at Diet
:
sort(unique(d$Diet))
## [1] "lopro" "N/N85" "N/R40" "N/R50" "NP" "R/R50"
This tells us the levels of this explanatory variable and the ordering that R gives it. Importantly, the level lopro
is the first alphabetically. This will come in later when we start running regressions.
Let’s understand a bit more about this variable
summary(d$Diet) # really uninformative
## Length Class Mode
## 349 character character
table(d$Diet) # number of observations of each type
##
## lopro N/N85 N/R40 N/R50 NP R/R50
## 56 57 60 71 49 56
Unfortunately, the summary()
function is really unhelpful. Typically table()
is what you will want as this provides the number of observations of each type.
Use the ex0518
data set from the Sleuth3
package to practice writing to a csv file and reading from the csv file. For this activity, assume Protein
is the response and Treatment
is the only explanatory variable.
Treatment
is read in as a characterTreatment
Treatment
comes first alphabeticallyTreatment
write.csv(Sleuth3::ex0518, file = "ex0518.csv", row.names = FALSE)
d2 = read.csv("ex0518.csv")
is.character(d2$Treatment)
## [1] TRUE
unique(d2$Treament)
## NULL
sort(unique(d2$Treatment)) # first alphabetically
## [1] "Control" "CPFA150" "CPFA300" "CPFA450" "CPFA50" "CPFA600"
table(d2$Treatment)
##
## Control CPFA150 CPFA300 CPFA450 CPFA50 CPFA600
## 15 3 3 3 3 3
A character variable can be converted to a factor.
d = read.csv(file = "case0501.csv") # Making sure we are all on the same page
d$Diet = factor(d$Diet)
is.character(d$Diet)
## [1] FALSE
is.factor(d$Diet)
## [1] TRUE
Diet
is now a factor, but it almost looks like nothing has changed.
d$Diet
## [1] NP NP NP NP NP NP NP NP NP NP NP NP NP
## [14] NP NP NP NP NP NP NP NP NP NP NP NP NP
## [27] NP NP NP NP NP NP NP NP NP NP NP NP NP
## [40] NP NP NP NP NP NP NP NP NP NP N/N85 N/N85 N/N85
## [53] N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85
## [66] N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85
## [79] N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85
## [92] N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85 N/N85
## [105] N/N85 N/N85 lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro
## [118] lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro
## [131] lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro
## [144] lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro lopro
## [157] lopro lopro lopro lopro lopro lopro N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50
## [170] N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50
## [183] N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50
## [196] N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50
## [209] N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50
## [222] N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 N/R50 R/R50
## [235] R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50
## [248] R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50
## [261] R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50
## [274] R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50 R/R50
## [287] R/R50 R/R50 R/R50 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40
## [300] N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40
## [313] N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40
## [326] N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40
## [339] N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40 N/R40
## Levels: lopro N/N85 N/R40 N/R50 NP R/R50
Except there are no quotes and at the bottom there is the line that indicates the levels for this factor.
There is actually an internal representation via integers:
as.numeric(d$Diet)
## [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [42] 5 5 5 5 5 5 5 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [83] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [124] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4
## [165] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [206] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6
## [247] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## [288] 6 6 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [329] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
This representation can be accessed using levels()
:
nlevels(d$Diet)
## [1] 6
levels(d$Diet)
## [1] "lopro" "N/N85" "N/R40" "N/R50" "NP" "R/R50"
The default ordering is alphabetical.
When you run summary()
or table()
, R uses this ordering for displaying results:
summary(case0501$Diet)
## N/N85 N/R40 N/R50 NP R/R50 lopro
## 57 60 71 49 56 56
table( case0501$Diet)
##
## N/N85 N/R40 N/R50 NP R/R50 lopro
## 57 60 71 49 56 56
You can convert factor variables back into character variables
d$Diet_char <- as.character(d$Diet)
str(d)
## 'data.frame': 349 obs. of 3 variables:
## $ Lifetime : num 35.5 35.4 34.9 34.8 33.8 33.5 32.6 32.4 31.8 31.6 ...
## $ Diet : Factor w/ 6 levels "lopro","N/N85",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ Diet_char: chr "NP" "NP" "NP" "NP" ...
and characters back to factors
d$Diet_fact <- factor(d$Diet_char)
str(d)
## 'data.frame': 349 obs. of 4 variables:
## $ Lifetime : num 35.5 35.4 34.9 34.8 33.8 33.5 32.6 32.4 31.8 31.6 ...
## $ Diet : Factor w/ 6 levels "lopro","N/N85",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ Diet_char: chr "NP" "NP" "NP" "NP" ...
## $ Diet_fact: Factor w/ 6 levels "lopro","N/N85",..: 5 5 5 5 5 5 5 5 5 5 ...
all.equal(d$Diet, d$Diet_fact)
## [1] TRUE
When creating factors you can choose the ordering using the levels
argument of the factor()
function.
d$Diet_fact <- factor(d$Diet_char,
levels = c("N/N85", "lopro", "N/R40",
"N/R50", "NP", "R/R50"))
str(d)
## 'data.frame': 349 obs. of 4 variables:
## $ Lifetime : num 35.5 35.4 34.9 34.8 33.8 33.5 32.6 32.4 31.8 31.6 ...
## $ Diet : Factor w/ 6 levels "lopro","N/N85",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ Diet_char: chr "NP" "NP" "NP" "NP" ...
## $ Diet_fact: Factor w/ 6 levels "N/N85","lopro",..: 5 5 5 5 5 5 5 5 5 5 ...
all.equal(d$Diet, d$Diet_fact) # no longer equal
## [1] "Attributes: < Component \"levels\": 2 string mismatches >"
Alternatively, you can move one of the levels to be first. This will be helpful when running regression analyses. We can use the relevel()
function to choose a new “reference” level.
d$Diet_fact = relevel(d$Diet_fact, ref = "NP")
levels(d$Diet_fact)
## [1] "NP" "N/N85" "lopro" "N/R40" "N/R50" "R/R50"
Notice that “NP” is now first.
Read in the ex0518
data set from the file you created in the previous activity. Convert Treatment
into a factor variable.
Treatment
is now a factorTreatment
Treatment
comes firstTreatment
Here is what I would do
d2 = read.csv("ex0518.csv") %>%
mutate(Treatment = factor(Treatment))
is.factor(d2$Treatment)
levels(d2$Treatment)
levels(d2$Treatment)[1] # first alphabetically
summary(d2$Treatment)
d2$Treatment = relevel(d2$Treatment, ref = "CPFA50")
levels(d2$Treatment)
Let’s make sure we have the correct data.frame
for the following.
d <- Sleuth3::case0501 %>%
mutate(Diet_char = as.character(Diet),
Diet_fact = as.factor(Diet_char))
Recall that the approach to including categorical explanatory variables in a regression involves
Let’s suppose choose “N/N85” as our reference level because this is the standard Diet
that mice get. Now we need to create dummy variables for all other levels.
d = d %>%
mutate(X1 = Diet == "N/R40",
X2 = Diet == "N/R50",
X3 = Diet == "NP",
X4 = Diet == "R/R50",
X5 = Diet == "lopro")
m = lm(Lifetime ~ X1 + X2 + X3 + X4 + X5, data = d)
summary(m)
##
## Call:
## lm(formula = Lifetime ~ X1 + X2 + X3 + X4 + X5, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.5167 -3.3857 0.8143 5.1833 10.0143
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.6912 0.8846 36.958 < 2e-16 ***
## X1TRUE 12.4254 1.2352 10.059 < 2e-16 ***
## X2TRUE 9.6060 1.1877 8.088 1.06e-14 ***
## X3TRUE -5.2892 1.3010 -4.065 5.95e-05 ***
## X4TRUE 10.1945 1.2565 8.113 8.88e-15 ***
## X5TRUE 6.9945 1.2565 5.567 5.25e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.678 on 343 degrees of freedom
## Multiple R-squared: 0.4543, Adjusted R-squared: 0.4463
## F-statistic: 57.1 on 5 and 343 DF, p-value: < 2.2e-16
By construction, 32.7 is the expected lifetime on the N/N85 Diet and 12.4 is the expected difference in lifetimes between the N/N85 and lopro Diets (since it is positive expected lifetime on lopro is higher).
This was pretty tedious. Fortunately, R can do all the work for us.
Let’s perform the Lifetime-Diet regression using the character vector for Diet.
m <- lm(Lifetime ~ Diet_char, data = d)
summary(m)
##
## Call:
## lm(formula = Lifetime ~ Diet_char, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.5167 -3.3857 0.8143 5.1833 10.0143
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.6857 0.8924 44.470 < 2e-16 ***
## Diet_charN/N85 -6.9945 1.2565 -5.567 5.25e-08 ***
## Diet_charN/R40 5.4310 1.2409 4.377 1.60e-05 ***
## Diet_charN/R50 2.6115 1.1936 2.188 0.0293 *
## Diet_charNP -12.2837 1.3064 -9.403 < 2e-16 ***
## Diet_charR/R50 3.2000 1.2621 2.536 0.0117 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.678 on 343 degrees of freedom
## Multiple R-squared: 0.4543, Adjusted R-squared: 0.4463
## F-statistic: 57.1 on 5 and 343 DF, p-value: < 2.2e-16
Notice that it automatically constructed all our indicator variables. It automatically chose the lopro diet as the reference level (because it came first alphabetically).
If you leave Diet as a character vector then you cannot control the reference level.
Let’s perform the Lifetime-Diet regression using the factor vector for Diet.
m <- lm(Lifetime ~ Diet_fact, data = d)
summary(m)
##
## Call:
## lm(formula = Lifetime ~ Diet_fact, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.5167 -3.3857 0.8143 5.1833 10.0143
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.6857 0.8924 44.470 < 2e-16 ***
## Diet_factN/N85 -6.9945 1.2565 -5.567 5.25e-08 ***
## Diet_factN/R40 5.4310 1.2409 4.377 1.60e-05 ***
## Diet_factN/R50 2.6115 1.1936 2.188 0.0293 *
## Diet_factNP -12.2837 1.3064 -9.403 < 2e-16 ***
## Diet_factR/R50 3.2000 1.2621 2.536 0.0117 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.678 on 343 degrees of freedom
## Multiple R-squared: 0.4543, Adjusted R-squared: 0.4463
## F-statistic: 57.1 on 5 and 343 DF, p-value: < 2.2e-16
These results are identical to the results using the character vector.
Now suppose you want to run the regression using N/N85 (normal diet) as the reference level. Do this using the relevel()
function and its ref
argument.
d$Diet_fact <- relevel(d$Diet_fact, ref = "N/N85")
table(d$Diet_fact)
##
## N/N85 lopro N/R40 N/R50 NP R/R50
## 57 56 60 71 49 56
Notice how N/N85 is now the first level.
Now when you run the regression the reference level will be N/N85.
m <- lm(Lifetime ~ Diet_fact, data = d)
summary(m)
##
## Call:
## lm(formula = Lifetime ~ Diet_fact, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.5167 -3.3857 0.8143 5.1833 10.0143
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.6912 0.8846 36.958 < 2e-16 ***
## Diet_factlopro 6.9945 1.2565 5.567 5.25e-08 ***
## Diet_factN/R40 12.4254 1.2352 10.059 < 2e-16 ***
## Diet_factN/R50 9.6060 1.1877 8.088 1.06e-14 ***
## Diet_factNP -5.2892 1.3010 -4.065 5.95e-05 ***
## Diet_factR/R50 10.1945 1.2565 8.113 8.88e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.678 on 343 degrees of freedom
## Multiple R-squared: 0.4543, Adjusted R-squared: 0.4463
## F-statistic: 57.1 on 5 and 343 DF, p-value: < 2.2e-16
The authors of the Sleuth3
package had already made N/N85 the reference level so if you run the regression with the original data, N/N85 will be the reference level.
m <- lm(Lifetime ~ Diet, data = Sleuth3::case0501)
summary(m)
##
## Call:
## lm(formula = Lifetime ~ Diet, data = Sleuth3::case0501)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.5167 -3.3857 0.8143 5.1833 10.0143
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.6912 0.8846 36.958 < 2e-16 ***
## DietN/R40 12.4254 1.2352 10.059 < 2e-16 ***
## DietN/R50 9.6060 1.1877 8.088 1.06e-14 ***
## DietNP -5.2892 1.3010 -4.065 5.95e-05 ***
## DietR/R50 10.1945 1.2565 8.113 8.88e-15 ***
## Dietlopro 6.9945 1.2565 5.567 5.25e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.678 on 343 degrees of freedom
## Multiple R-squared: 0.4543, Adjusted R-squared: 0.4463
## F-statistic: 57.1 on 5 and 343 DF, p-value: < 2.2e-16
Use the ex0518
data.frame
from the Sleuth3
package.
Adjust the order of the levels of treatment to have increasing CPFA and make “Control” the reference level. Create a plot of protein versus treatment. Run a regression of protein on treatment to determine which CPFA levels have a mean protein level that is significantly different from the control. Look at the diagnostic plots to determine if there are any clear violations of model assumptions.
Get the data and adjust the levels of treatment.
d <- Sleuth3::ex0518 %>%
mutate(Treatment = factor(Treatment,
levels = c("Control","CPFA50","CPFA150",
"CPFA300","CPFA450","CPFA600")))
Now create the plot.
library(ggplot2)
ggplot(d, aes(Treatment, Protein)) +
geom_jitter(height=0, width=.25) +
theme_bw()
Notice that the ordering on the x-axis is according to the ordering of the levels of treatment.
Now run the regression.
m <- lm(Protein ~ Treatment, data = d)
summary(m)
##
## Call:
## lm(formula = Protein ~ Treatment, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.600 -9.417 3.400 9.300 30.400
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 185.600 4.441 41.792 < 2e-16 ***
## TreatmentCPFA50 -17.267 10.878 -1.587 0.12554
## TreatmentCPFA150 -13.933 10.878 -1.281 0.21249
## TreatmentCPFA300 -38.933 10.878 -3.579 0.00151 **
## TreatmentCPFA450 -34.600 10.878 -3.181 0.00402 **
## TreatmentCPFA600 -33.267 10.878 -3.058 0.00540 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.2 on 24 degrees of freedom
## Multiple R-squared: 0.5043, Adjusted R-squared: 0.401
## F-statistic: 4.883 on 5 and 24 DF, p-value: 0.003196
Based on this regression analysis, CPFA levels at 300 or above have significantly different mean protein levels.
Finally look at the diagnostic plots.
opar = par(mfrow=c(2,3))
plot(m, 1:6, ask=FALSE)
par(opar)
There is no clear violation here.
Use the previous data to construct a continuous explanatory variable that has the CPFA level. Then run a regression using this explanatory variable and assess model assumptions using a diagnostic plot.
I’m not necessarily expecting you to come up with this approach that uses gsub()
and as.numeric()
.
d <- d %>%
mutate(cpfa = as.numeric(gsub("CPFA","", Treatment)),
cpfa = ifelse(Treatment == "Control", 0, cpfa))
## Warning: Problem with `mutate()` input `cpfa`.
## i NAs introduced by coercion
## i Input `cpfa` is `as.numeric(gsub("CPFA", "", Treatment))`.
summary(d)
## Protein Treatment Day TrtDayGroup cpfa
## Min. :124.0 Control:15 Day1:6 Group1 : 3 Min. : 0
## 1st Qu.:157.0 CPFA50 : 3 Day2:6 Group10: 3 1st Qu.: 0
## Median :164.5 CPFA150: 3 Day3:6 Group2 : 3 Median : 25
## Mean :171.8 CPFA300: 3 Day4:6 Group3 : 3 Mean :155
## 3rd Qu.:190.8 CPFA450: 3 Day5:6 Group4 : 3 3rd Qu.:300
## Max. :216.0 CPFA600: 3 Group5 : 3 Max. :600
## (Other):12
Run the regression
m <- lm(Protein ~ cpfa, data = d)
opar = par(mfrow=c(2,3))
plot(m, 1:6, ask = FALSE)
par(opar)
plot(d$cpfa, m$residuals)
If you want to clean up your directory, run the following lines
if (file.exists("case0501.csv")) file.remove("case0501.csv")
## [1] TRUE
if (file.exists("ex0518.csv")) file.remove("ex0518.csv")
## [1] TRUE