R code

R is a functional programming language: functions are objects, just like vectors or data frames.

class(log)
## [1] "function"
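Because functions are objects, you can assign them to new names and pass them to other functions. A minimal sketch; my_sqrt and apply_twice are hypothetical names made up for this illustration:

```r
# A function can be stored under a new name like any other object
my_sqrt <- sqrt
my_sqrt(9)
## [1] 3

# A function can be passed as an argument to another function
# (apply_twice is a hypothetical helper)
apply_twice <- function(f, x) {
  f(f(x))
}
apply_twice(sqrt, 16)
## [1] 2

# Functions do not even need a name (anonymous functions)
sapply(1:3, function(i) i^2)
## [1] 1 4 9
```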

Function basics

Functions take in some input and return some output. The input is a collection of arguments to the function, and the output is the return value.

Arguments

log(10)
## [1] 2.302585
log(x = 10)
## [1] 2.302585
log(10, base = exp(1))
## [1] 2.302585
log(10, base = 10)
## [1] 1
log(x = 10, base = 10)
## [1] 1

Take a look at the arguments.

args(log)
## function (x, base = exp(1)) 
## NULL

Default arguments

In the log function, the default value for the base argument is exp(1), so leaving out base gives the natural logarithm.

all.equal(
  log(10),
  log(10, base = exp(1))
)
## [1] TRUE

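Positional matching

Without names, arguments are matched by position: the first value goes to x and the second to base. Swapping the values therefore changes the result.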

log(10, exp(1))
## [1] 2.302585
log(exp(1), 10)
## [1] 0.4342945

Name matching

log(x = 10, base = exp(1))
## [1] 2.302585
log(base = exp(1), x = 10)
## [1] 2.302585

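Partial matching

R also accepts an abbreviated argument name, as long as it is the start of exactly one argument name.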

log(10, b = exp(1))
## [1] 2.302585
log(10, ba = exp(1))
## [1] 2.302585
log(10, bas = exp(1))
## [1] 2.302585
log(10, base = exp(1))
## [1] 2.302585
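Partial matching only works when the abbreviation is unambiguous. A sketch with a hypothetical function, show_value(), whose two arguments both start with "v"; try() is used so the example keeps running after the error:

```r
# Hypothetical function: both arguments start with "v"
show_value <- function(value, verbose = FALSE) {
  if (verbose) message("Returning the value")
  value
}
show_value(val = 3)        # "val" can only be the start of "value"
## [1] 3
try(show_value(v = 3))     # "v" fits both: argument 1 matches multiple formal arguments
```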

R objects as input

y <- 100
log(y)
## [1] 4.60517

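Return value

What a function returns depends on the function. Note that all.equal() returns TRUE when the objects are equal but a character string describing the difference when they are not.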

class(log(10))
## [1] "numeric"
class(as.data.frame(10))
## [1] "data.frame"
class(all.equal(1,1))
## [1] "logical"
class(all.equal(1,2))
## [1] "character"
m <- lm(len ~ dose, data = ToothGrowth)
class(m)
## [1] "lm"
class(summary(m))
## [1] "summary.lm"

Building functions

Function definition

# Create a function
add <- function(x, y) {
  x + y
}
add(1,2)
## [1] 3
add(x = 1, y = 2)
## [1] 3
add(1:2, 3:4)
## [1] 4 6
add(1:2, 3)
## [1] 4 5
add(1:2, 3:5)
## Warning in x + y: longer object length is not a multiple of shorter object
## length
## [1] 4 6 6
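The warning comes from R's recycling rule: the shorter vector is repeated to match the length of the longer one, and R only warns when the longer length is not a multiple of the shorter. A minimal sketch with the same add():

```r
add <- function(x, y) {
  x + y
}
# Length 2 divides length 4, so 1:2 is silently recycled to 1, 2, 1, 2
add(1:4, 1:2)
## [1] 2 4 4 6
# With 1:2 and 3:5, 1:2 is recycled to 1, 2, 1; since 3 is not a
# multiple of 2, R warns
```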

Default arguments

add <- function(x = 1, y = 2) {
  x + y
}
add()
## [1] 3
add(3)
## [1] 5
add(y = 5)
## [1] 6

Explicit return

R functions return the value of the last evaluated expression. You can also return a value explicitly with return():

add <- function(x, y) {
  return(x + y)
}
add(1,2)
## [1] 3
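Because the last evaluated expression is returned, a function can end in an if/else and return whichever branch runs, with no return() call. A sketch with a hypothetical sign_word() function:

```r
# Hypothetical example: no return() needed, the if/else is the last expression
sign_word <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}
sign_word(-3)
## [1] "negative"
```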

Suppose you want to return TRUE or FALSE depending on whether a specific character appears in a string. As soon as you find the character, you can immediately return TRUE; if you reach the end of the string without finding it, return FALSE.

is_char_in_string <- function(string, char) {
  for (i in seq_len(nchar(string))) {  # seq_len(0) is empty; 1:0 would be c(1, 0)
    if (char == substr(string, i, i))
      return(TRUE)
  }
  return(FALSE)
}
is_char_in_string("this is my string", "a")
## [1] FALSE
is_char_in_string("this is my string", "s")
## [1] TRUE
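The loop is useful for showing an early return(). For this particular check, base R's grepl() does the same job without a loop; fixed = TRUE treats the pattern as literal text rather than a regular expression. Unlike the loop version, grepl() also accepts multi-character patterns. A sketch:

```r
# fixed = TRUE treats "a" and "s" as literal text, not regular expressions
grepl("a", "this is my string", fixed = TRUE)
## [1] FALSE
grepl("s", "this is my string", fixed = TRUE)
## [1] TRUE
```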

Error handling

add(1, "a")
## Error in x + y: non-numeric argument to binary operator

message()

add <- function(x, y) {
  message("Here is a message!")
  return(x + y)
}
add(1,2)
## Here is a message!
## [1] 3

warning()

add <- function(x, y) {
  warning("Here is a warning!")
  return(x + y)
}
add(1,2)
## Warning in add(1, 2): Here is a warning!
## [1] 3
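A warning does not stop the function; it still returns x + y. If a caller decides a particular warning is harmless, suppressWarnings() can hide it. A minimal sketch with the same add():

```r
add <- function(x, y) {
  warning("Here is a warning!")
  return(x + y)
}
# The warning is muffled, and the return value is unaffected
suppressWarnings(add(1, 2))
## [1] 3
```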

stop()

add <- function(x, y) {
  stop("Here is an error!")
  return(x + y)
}
add(1,2)
## Error in add(1, 2): Here is an error!
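The caller can catch an error raised by stop() with tryCatch() and decide what to do instead of halting. A minimal sketch; this version of add() is written for illustration, and returning NA as the fallback is an arbitrary choice:

```r
# This version of add() checks its inputs before adding
add <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("x and y must be numeric")
  }
  x + y
}

# tryCatch() evaluates the expression; if an error is signalled,
# the error handler runs and its value is returned instead
result <- tryCatch(
  add(1, "a"),
  error = function(e) {
    message("Caught: ", conditionMessage(e))
    NA
  }
)
## Caught: x and y must be numeric
result
## [1] NA
```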

stopifnot()

add <- function(x, y) {
  stopifnot(x < 0)
  return(x + y)
}
add(1,2)
## Error in add(1, 2): x < 0 is not TRUE
add <- function(x, y) {
  stopifnot(is.numeric(x))
  stopifnot(is.numeric(y))
  return(x + y)
}
add(1,2)
## [1] 3
add(1,"a")
## Error in add(1, "a"): is.numeric(y) is not TRUE
add("b")
## Error in add("b"): is.numeric(x) is not TRUE
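Since R 4.0.0, stopifnot() also accepts named expressions, and the name is used as the error message, which is usually friendlier than "is.numeric(y) is not TRUE". A sketch assuming R 4.0.0 or later; try() lets the example continue after the error:

```r
# Requires R >= 4.0.0: each name becomes the error message
add <- function(x, y) {
  stopifnot(
    "x must be numeric" = is.numeric(x),
    "y must be numeric" = is.numeric(y)
  )
  x + y
}
add(1, 2)
## [1] 3
try(add(1, "a"))   # error message: y must be numeric
```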

Function issues

Here are some pitfalls to be aware of so that you can (I hope) avoid them in the future.

Argument vs object assignment

my_fancy_function <- function(x, y) {
  return(x + y*100)
}

What is the result of the following?

my_fancy_function(y <- 5, x <- 4)
## [1] 405

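What happened? The expressions y <- 5 and x <- 4 assigned y the value 5 and x the value 4 in your workspace, outside the function. Because <- does not name an argument, the two values were then matched by position: y (5) became the first argument, x, and x (4) became the second argument, y. The result is 5 + 4*100 = 405.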

This was equivalent to

y <- 5
x <- 4
my_fancy_function(x = y, y = x)
## [1] 405

So, when passing arguments by name, use =, not <-. It also helps to avoid giving objects the same names as a function's arguments.
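For comparison, passing the same values by name with = matches them to the intended arguments, regardless of order:

```r
my_fancy_function <- function(x, y) {
  return(x + y*100)
}
# = inside the call matches by name: x is 4 and y is 5, so 4 + 5*100
my_fancy_function(y = 5, x = 4)
## [1] 504
```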

Scoping

Here is a function that uses y without defining it:

f <- function() {
  return(y)
}

What is the result of the following?

f()
## [1] 5

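y is not defined inside f(), so R looks in the environment where f() was defined, here the global environment, where y is still 5 from the previous example. More generally, R searches a chain of enclosing environments (lexical scoping) until it finds an object called y.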
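Lexical scoping means R looks up y where f() was defined, not where it is called. A minimal sketch; g() and make_adder() are hypothetical functions for illustration:

```r
y <- 5
f <- function() {
  return(y)
}

# g() has its own local y, but f() was defined at the top level,
# so f() still finds the global y
g <- function() {
  y <- 100
  f()
}
g()
## [1] 5

# A function created inside another function remembers the environment
# it was created in; a function plus its environment is a "closure"
make_adder <- function(n) {
  function(x) x + n
}
add_ten <- make_adder(10)
add_ten(1)
## [1] 11
```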

Closure errors

Sometimes you get baffling error messages that mention a 'closure' or a 'special'. They usually mean you used a function as if it were data, for example by subsetting it.

mean[1]
## Error in mean[1]: object of type 'closure' is not subsettable
log[1]
## Error in log[1]: object of type 'special' is not subsettable

These messages refer to the type of the function: most R functions have type closure, while some built-in primitives, such as log(), have type special.

typeof(mean)
## [1] "closure"
typeof(log)
## [1] "special"

You will see closure errors much more commonly than special errors, because most functions, including every function you write yourself, are closures.
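A common way to hit the closure error by accident is to use a name that is already a function. For example, df() is the F-distribution density function in the stats package, so if you forget to create a data frame called df, R finds the function instead. A sketch, assuming no object named df exists in your workspace:

```r
# Assuming no data frame called df has been created,
# df refers to stats::df, the F-distribution density function
typeof(df)
## [1] "closure"
# try() reports the error and lets the code continue:
# object of type 'closure' is not subsettable
try(df$x)
```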

Generic functions

mean()

print(mean)
## function (x, ...) 
## UseMethod("mean")
## <bytecode: 0x00000243fc48e350>
## <environment: namespace:base>

Take a look at the help file

?mean

Notice the words “Generic function”. This means what the function does will depend on the class of the first argument to the function.

mean(1:5)
## [1] 3
mean(as.Date(c("2023-01-01","2022-01-01")))
## [1] "2022-07-02"

I bring up generic functions mainly to point out that it can be hard to track down the appropriate help page. Generally, the page to look up is <function>.<class>.

For example,

# Determine the class
class(as.Date(c("2023-01-01","2022-01-01")))
## [1] "Date"
# Look up the function
?mean.Date

This opened a help page, but not one that explains how mean() works on dates. Because the lookup found a page instead of failing, that appears to be the intended behavior; why the method isn't documented in more detail, I have no idea.

class(1:5)
## [1] "integer"
?mean.integer

There is no mean.integer method, so there is no help page for it. When no method matches the class of the first argument, R uses the default method, if one exists.

?mean.default
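To see how dispatch works, here is a minimal sketch of a generic of your own. describe() and its methods are hypothetical; UseMethod() is the base R mechanism that mean() uses. Methods are named <generic>.<class>, and the .default method is the fallback:

```r
# The generic: UseMethod() dispatches on the class of the first argument
describe <- function(x, ...) {
  UseMethod("describe")
}

# Method for Date objects, named <generic>.<class>
describe.Date <- function(x, ...) {
  paste("Dates from", min(x), "to", max(x))
}

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("An object of class", class(x)[1])
}

describe(as.Date(c("2023-01-01", "2022-01-01")))
## [1] "Dates from 2022-01-01 to 2023-01-01"
describe(1:5)   # no describe.integer, so describe.default is used
## [1] "An object of class integer"
```

To list the methods that exist for a real generic, methods("mean") is one option; the list depends on which packages are loaded.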

summary()

summary(ToothGrowth$len)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20   13.07   19.25   18.81   25.27   33.90
summary(ToothGrowth$supp)
## OJ VC 
## 30 30
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
summary(lm(len ~ supp, data = ToothGrowth))
## 
## Call:
## lm(formula = len ~ supp, data = ToothGrowth)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.7633  -5.7633   0.4367   5.5867  16.9367 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   20.663      1.366  15.127   <2e-16 ***
## suppVC        -3.700      1.932  -1.915   0.0604 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.482 on 58 degrees of freedom
## Multiple R-squared:  0.05948,    Adjusted R-squared:  0.04327 
## F-statistic: 3.668 on 1 and 58 DF,  p-value: 0.06039
?summary              # the generic
?summary.default      # used for numeric vectors; there is no summary.numeric
?summary.factor
?summary.data.frame
?summary.lm

... argument

?sum
sum(1,2,3)
## [1] 6
sum(5:6)
## [1] 11
sum(1,2,3,5:6)
## [1] 17
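The help page shows the signature sum(..., na.rm = FALSE): every unnamed argument is collected into .... Your own functions can use ... to accept any number of inputs, or to pass extra arguments on to another function. A sketch with hypothetical count_args() and my_mean() functions:

```r
# count_args() is hypothetical: ... collects any number of arguments
count_args <- function(...) {
  length(list(...))
}
count_args(1, 2, 3)
## [1] 3

# my_mean() is hypothetical: it forwards ... to mean(),
# so callers can pass options such as na.rm
my_mean <- function(x, ...) {
  mean(x, ...)
}
my_mean(c(1, 2, NA), na.rm = TRUE)
## [1] 1.5
```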

Typos go unnoticed

sum(c(1,2,NA), na.mr = TRUE) # typo: na.mr is not an argument of sum()
## [1] NA
sum(c(1,2,NA), na.rm = TRUE) # correct spelling: the NA is removed
## [1] 3
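The misspelled na.mr is not really ignored. Because na.mr does not match na.rm, sum() collects na.mr = TRUE into ... and adds it as one more value (TRUE counts as 1). The NA hides that above, but it is easy to see without one:

```r
sum(c(1, 2), na.mr = TRUE)   # na.mr = TRUE is added as 1
## [1] 4
sum(c(1, 2), na.rm = TRUE)
## [1] 3
```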

Suggestions

  • Define functions for tasks that you do 2 or more times
  • Use informative names (verbs)
  • Use consistent return values (for example, always return the same class)
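As an illustration of the last suggestion, all.equal() returns TRUE or a character string, so if (all.equal(x, y)) fails with an error when the values differ. Wrapping it in isTRUE() gives a function that always returns a single TRUE or FALSE. A sketch with a hypothetical is_close() helper:

```r
# Hypothetical helper: always returns a single TRUE or FALSE
is_close <- function(x, y) {
  isTRUE(all.equal(x, y))
}
is_close(1, 1)
## [1] TRUE
is_close(1, 2)
## [1] FALSE
```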