R is a functional programming language.
class(log)
## [1] "function"
Functions take in some input and return some output. The inputs are the arguments to the function, and the output is the return value.
log(10)
## [1] 2.302585
log(x = 10)
## [1] 2.302585
log(10, base = exp(1))
## [1] 2.302585
log(10, base = 10)
## [1] 1
log(x = 10, base = 10)
## [1] 1
Take a look at the arguments.
args(log)
## function (x, base = exp(1))
## NULL
In the log function, the default value for the base argument is exp(1).
all.equal(
  log(10),
  log(10, base = exp(1))
)
## [1] TRUE
log(10, exp(1))
## [1] 2.302585
log(exp(1), 10)
## [1] 0.4342945
log(x = 10, base = exp(1))
## [1] 2.302585
log(base = exp(1), x = 10)
## [1] 2.302585
log(10, b = exp(1))
## [1] 2.302585
log(10, ba = exp(1))
## [1] 2.302585
log(10, bas = exp(1))
## [1] 2.302585
log(10, base = exp(1))
## [1] 2.302585
y <- 100
log(y)
## [1] 4.60517
class(log(10))
## [1] "numeric"
class(as.data.frame(10))
## [1] "data.frame"
class(all.equal(1,1))
## [1] "logical"
class(all.equal(1,2))
## [1] "character"
m <- lm(len ~ dose, data = ToothGrowth)
class(m)
## [1] "lm"
class(summary(m))
## [1] "summary.lm"
# Create a function
add <- function(x, y) {
  x + y
}
add(1,2)
## [1] 3
add(x = 1, y = 2)
## [1] 3
add(1:2, 3:4)
## [1] 4 6
add(1:2, 3)
## [1] 4 5
add(1:2, 3:5)
## Warning in x + y: longer object length is not a multiple of shorter object
## length
## [1] 4 6 6
add <- function(x = 1, y = 2) {
  x + y
}
add()
## [1] 3
add(3)
## [1] 5
add(y = 5)
## [1] 6
R functions return the value of the last evaluated expression. You can also make the return value explicit with return().
add <- function(x, y) {
  return(x + y)
}
add(1,2)
## [1] 3
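To see that the explicit return() is optional in a function like this, the short sketch below compares two versions. The names add_implicit and add_explicit are made up for this illustration; the first relies on R returning the value of the last evaluated expression.

```r
# Hypothetical names, used only for this illustration
add_implicit <- function(x, y) {
  x + y            # last evaluated expression becomes the return value
}

add_explicit <- function(x, y) {
  return(x + y)    # same result, stated explicitly
}

identical(add_implicit(1, 2), add_explicit(1, 2))
## [1] TRUE
```

An explicit return() matters most when you want to leave a function early, before its last line.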
Suppose you want to return TRUE or FALSE depending on whether a specific character is in a string. As soon as you find the character, you can immediately return TRUE. If you don’t find the character, you can return FALSE.
is_char_in_string <- function(string, char) {
  # seq_len() gives an empty loop for an empty string, unlike 1:nchar(string)
  for (i in seq_len(nchar(string))) {
    if (char == substr(string, i, i)) {
      return(TRUE)   # stop at the first match
    }
  }
  return(FALSE)      # no match found
}
is_char_in_string("this is my string", "a")
## [1] FALSE
is_char_in_string("this is my string", "s")
## [1] TRUE
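The loop is a good illustration of an early return(), but the same check can probably be written without a loop. A minimal sketch, assuming base R's grepl() with fixed = TRUE; the name is_char_in_string_v is hypothetical:

```r
# Hypothetical loop-free alternative, for illustration only
is_char_in_string_v <- function(string, char) {
  # fixed = TRUE treats char as literal text, not as a regular expression
  grepl(char, string, fixed = TRUE)
}

is_char_in_string_v("this is my string", "a")
## [1] FALSE
is_char_in_string_v("this is my string", "s")
## [1] TRUE
```

Unlike the loop, this version also returns TRUE for longer substrings such as "my", so it is not an exact replacement if char must be a single character.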
add(1, "a")
## Error in x + y: non-numeric argument to binary operator
add <- function(x, y) {
  message("Here is a message!")
  return(x + y)
}
add(1,2)
## Here is a message!
## [1] 3
add <- function(x, y) {
  warning("Here is a warning!")
  return(x + y)
}
add(1,2)
## Warning in add(1, 2): Here is a warning!
## [1] 3
add <- function(x, y) {
  stop("Here is an error!")
  return(x + y)  # never reached: stop() exits the function
}
add(1,2)
## Error in add(1, 2): Here is an error!
add <- function(x, y) {
  stopifnot(x < 0)  # raise an error unless x is negative
  return(x + y)
}
add(1,2)
## Error in add(1, 2): x < 0 is not TRUE
add <- function(x, y) {
  stopifnot(is.numeric(x))
  stopifnot(is.numeric(y))
  return(x + y)
}
add(1,2)
## [1] 3
add(1,"a")
## Error in add(1, "a"): is.numeric(y) is not TRUE
add("b")
## Error in add("b"): is.numeric(x) is not TRUE
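The default stopifnot() messages repeat the failing expression, which is not always helpful to the person calling the function. A minimal sketch of one alternative, assuming R 4.0.0 or later, where stopifnot() uses the names of its arguments as error messages. The name add_checked is hypothetical.

```r
# Hypothetical variant of add() with custom error messages.
# Assumes R >= 4.0.0, where named stopifnot() arguments become the message.
add_checked <- function(x, y) {
  stopifnot(
    "x must be numeric" = is.numeric(x),
    "y must be numeric" = is.numeric(y)
  )
  x + y
}

add_checked(1, 2)
## [1] 3

# Capture the error message instead of stopping the script
msg <- tryCatch(add_checked(1, "a"), error = function(e) conditionMessage(e))
msg
```

On older versions of R, an if statement combined with stop() gives the same control over the message.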
These are some issues I want you to be aware of so that you can (I hope) avoid them in the future.
my_fancy_function <- function(x, y) {
  return(x + y*100)
}
What is the result of the following?
my_fancy_function(y <- 5, x <- 4)
## [1] 405
What happened? We assigned y the value 5 and x the value 4 outside the function. Then, we passed y (5) as the first argument of the function and x (4) as the second argument of the function.
This was equivalent to
y <- 5
x <- 4
my_fancy_function(x = y, y = x)
## [1] 405
So, when assigning function arguments, use =. Also, it is probably helpful to avoid giving objects the same names as the arguments.
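The difference between = and <- inside a call can be checked directly. The sketch below repeats the definition of my_fancy_function so it runs on its own, and wraps the <- call in local() so the objects it creates as a side effect do not linger in your workspace.

```r
my_fancy_function <- function(x, y) {
  x + y * 100
}

# With =, arguments are matched by name, so their order does not matter
my_fancy_function(x = 4, y = 5)
## [1] 504
my_fancy_function(y = 5, x = 4)
## [1] 504

# With <-, values are matched by position, and the objects y and x
# are created in the calling environment as a side effect
local({
  result <- my_fancy_function(y <- 5, x <- 4)
  c(result = result, y = y, x = x)
})
## result      y      x 
##    405      5      4
```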
Here is a function
f <- function() {
  return(y)
}
What is the result of the following?
f()
## [1] 5
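Basically, R searches through a series of environments to find the variable called y. It is not defined inside f, so R finds the y that was assigned the value 5 earlier, outside the function.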
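A small sketch of why this lookup can cause surprises: a function that relies on an outside variable changes its result whenever that variable changes. The names f_free and f_arg are hypothetical, chosen for this illustration.

```r
# Hypothetical names f_free and f_arg, for illustration only
y <- 5
f_free <- function() {
  y          # y is not defined inside the function, so R looks outside it
}
f_free()
## [1] 5

y <- 10      # changing the outside variable changes the result
f_free()
## [1] 10

f_arg <- function(y) {
  y          # y is an argument, so the result depends only on what is passed in
}
f_arg(5)
## [1] 5
```

Passing everything a function needs as arguments, as in f_arg, is usually the safer habit.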
Sometimes you get baffling error messages that mention a 'closure' or a 'special'.
mean[1]
## Error in mean[1]: object of type 'closure' is not subsettable
log[1]
## Error in log[1]: object of type 'special' is not subsettable
This is related to functions having a typeof of "closure" or "special".
typeof(mean)
## [1] "closure"
typeof(log)
## [1] "special"
You will see closure errors much more commonly than special errors.
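One common way to run into a closure error, sketched below, is to subset a name that you meant to be your own object but that R resolves to a function. Here df is assumed never to have been created, so R finds the function df() from the stats package instead.

```r
# df was never assigned, so R finds the function stats::df() instead.
# tryCatch() captures the error message rather than stopping the script.
msg <- tryCatch(df$len, error = function(e) conditionMessage(e))
msg
## [1] "object of type 'closure' is not subsettable"
```

When you see this message, check that the object you are subsetting was actually created and is not just the name of a function.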
print(mean)
## function (x, ...)
## UseMethod("mean")
## <bytecode: 0x00000243fc48e350>
## <environment: namespace:base>
Take a look at the help file
?mean
Notice the words “Generic function”. This means what the function does will depend on the class of the first argument to the function.
mean(1:5)
## [1] 3
mean(as.Date(c("2023-01-01","2022-01-01")))
## [1] "2022-07-02"
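To make the dispatch on class more concrete, here is a minimal sketch of a home-made generic function. The generic describe and its methods are hypothetical and not part of base R; they only mirror the UseMethod() pattern seen in print(mean).

```r
# Hypothetical generic, not part of base R
describe <- function(x, ...) {
  UseMethod("describe")
}

# Methods are named <function>.<class>
describe.numeric <- function(x, ...) {
  paste("a numeric vector of length", length(x))
}
describe.Date <- function(x, ...) {
  paste("a Date vector of length", length(x))
}
# Used when no method matches the class of x
describe.default <- function(x, ...) {
  paste("an object of class", class(x)[1])
}

describe(c(1.5, 2.5))
## [1] "a numeric vector of length 2"
describe(as.Date("2023-01-01"))
## [1] "a Date vector of length 1"
describe("text")
## [1] "an object of class character"
```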
I bring up generic functions primarily to point out that it can be hard to track down the appropriate help file. Generally you will look up <function>.<class>.
For example,
# Determine the class
class(as.Date(c("2023-01-01","2022-01-01")))
## [1] "Date"
# Look up the function
?mean.Date
This didn’t provide the actual help information for the method. Because the lookup went somewhere, this appears to be the intended behavior. Why the method itself isn’t documented, I have no idea.
class(1:5)
## [1] "integer"
?mean.integer
There is typically a default method that will be used if a specific method can’t be found.
?mean.default
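Rather than guessing which help pages exist, you can ask R which methods are defined. A minimal sketch using getS3method() from the utils package; with optional = TRUE it returns NULL instead of raising an error when no method is found.

```r
# A method for class "Date" exists
is.function(utils::getS3method("mean", "Date"))
## [1] TRUE

# No method for class "integer" exists
is.null(utils::getS3method("mean", "integer", optional = TRUE))
## [1] TRUE

# ... so mean(1:5) falls back to the default method
is.function(utils::getS3method("mean", "default"))
## [1] TRUE
```

Calling methods(mean) lists all available methods for a generic at once.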
summary(ToothGrowth$len)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20   13.07   19.25   18.81   25.27   33.90
summary(ToothGrowth$supp)
## OJ VC
## 30 30
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
summary(lm(len ~ supp, data = ToothGrowth))
##
## Call:
## lm(formula = len ~ supp, data = ToothGrowth)
##
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.7633  -5.7633   0.4367   5.5867  16.9367 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   20.663      1.366  15.127   <2e-16 ***
## suppVC        -3.700      1.932  -1.915   0.0604 .  
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.482 on 58 degrees of freedom
## Multiple R-squared: 0.05948, Adjusted R-squared: 0.04327
## F-statistic: 3.668 on 1 and 58 DF, p-value: 0.06039
?summary
?summary.numeric
?summary.factor
?summary.data.frame
?summary.lm
?sum
sum(1,2,3)
## [1] 6
sum(5:6)
## [1] 11
sum(1,2,3,5:6)
## [1] 17
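Typos in argument names get silently ignored when a function accepts ... arguments.
sum(c(1,2,NA), na.mr = TRUE) # typo: na.mr is treated as one more value to sum
## [1] NA
sum(c(1,2,NA), na.rm = TRUE) # correct spelling: the NA is removed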
## [1] 3
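The effect of the typo is easier to see without an NA in the data. In the sketch below, the misspelled na.mr does not match the na.rm argument, so TRUE is passed along with the other values and summed as 1.

```r
sum(c(1, 2), na.mr = TRUE)   # typo: TRUE is added to the total as 1
## [1] 4
sum(c(1, 2), na.rm = TRUE)   # correct spelling: na.rm is an option, not a value
## [1] 3
```

No error or warning is raised in either case, so a result that is off by one can be the only sign of the typo.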