R is a functional programming language.

class(log)

## [1] "function"

Function basics

Functions take some input and return some output. The input is a collection of arguments to the function, and the output is the return value.

Arguments

log(10)

## [1] 2.302585

log(x = 10)

## [1] 2.302585

log(10, base = exp(1))

## [1] 2.302585

log(10, base = 10)

## [1] 1

log(x = 10, base = 10)

## [1] 1

Take a look at the arguments.

args(log)

## function (x, base = exp(1)) 
## NULL

Default arguments

In the log function, the default value for the base argument is exp(1).

all.equal(
  log(10),
  log(10, base = exp(1))
)

## [1] TRUE

Positional matching

log(10, exp(1))

## [1] 2.302585

log(exp(1), 10)

## [1] 0.4342945

Name matching

log(x = 10, base = exp(1))

## [1] 2.302585

log(base = exp(1), x = 10)

## [1] 2.302585

Partial matching

log(10, b = exp(1))

## [1] 2.302585

log(10, ba = exp(1))

## [1] 2.302585

log(10, bas = exp(1))

## [1] 2.302585

log(10, base = exp(1))

## [1] 2.302585
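
Partial matching only works when the abbreviation identifies exactly one argument. The sketch below uses a hypothetical function, describe_value(), whose two arguments both start with "v" (the function and argument names are assumptions made for illustration). A unique abbreviation is accepted, while an ambiguous one raises an error.

```r
# Hypothetical function with two arguments that share the prefix "v"
describe_value <- function(value, verbose = FALSE) {
  if (verbose) message("value is ", value)
  value
}

# "val" matches only `value`, so partial matching succeeds
ok <- describe_value(val = 10)

# "v" matches both `value` and `verbose`, so R signals an error;
# tryCatch() captures the error message instead of stopping
ambiguous <- tryCatch(
  describe_value(v = 10),
  error = function(e) conditionMessage(e)
)
ambiguous
```

Writing argument names out in full avoids this ambiguity.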

R objects as input

y <- 100
log(y)

## [1] 4.60517

Return value

class(log(10))

## [1] "numeric"

class(as.data.frame(10))

## [1] "data.frame"

class(all.equal(1,1))

## [1] "logical"

class(all.equal(1,2))

## [1] "character"

m <- lm(len ~ dose, data = ToothGrowth)
class(m)

## [1] "lm"

class(summary(m))

## [1] "summary.lm"
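
The two all.equal() calls above return different classes: TRUE when the values match and a character description when they do not. One common way to get a single logical back is to wrap the call in isTRUE(). This is a minimal sketch, and the object names are illustrative.

```r
same <- all.equal(1, 1)    # logical TRUE
differ <- all.equal(1, 2)  # character string describing the difference

# isTRUE() is TRUE only for a single TRUE value, so it is safe inside if()
same_flag <- isTRUE(same)
differ_flag <- isTRUE(differ)
```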

Building functions

Function definition

# Create a function
add <- function(x, y) {
  x + y
}

add(1,2)

## [1] 3

add(x = 1, y = 2)

## [1] 3

add(1:2, 3:4)

## [1] 4 6

add(1:2, 3)

## [1] 4 5

add(1:2, 3:5)

## Warning in x + y: longer object length is not a multiple of shorter object
## length

## [1] 4 6 6
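
The last two calls rely on R's recycling rule: the shorter vector is repeated until it matches the length of the longer one, with a warning when the lengths do not divide evenly. As a sketch, the recycling can be written out by hand with rep_len(). The add function is redefined here only so the block runs on its own.

```r
add <- function(x, y) {
  x + y
}

# add(1:2, 3): the 3 is recycled to c(3, 3)
recycled_scalar <- add(1:2, rep_len(3, 2))

# add(1:2, 3:5): 1:2 is recycled to c(1, 2, 1);
# writing the recycling out explicitly avoids the warning
recycled_vector <- add(rep_len(1:2, 3), 3:5)
```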

Default arguments

add <- function(x = 1, y = 2) {
  x + y
}

add()

## [1] 3

add(3)

## [1] 5

add(y = 5)

## [1] 6

Explicit return

R functions return the value of the last expression evaluated, but you can also return a value explicitly with return().

add <- function(x, y) {
  return(x + y)
}

add(1,2)

## [1] 3

Suppose you want to return TRUE or FALSE depending on whether a specific character is in a string. As soon as you find the character, you can immediately return TRUE. If you don’t find the character, you can return FALSE.

is_char_in_string <- function(string, char) {
  for (i in seq_len(nchar(string))) {
    if (char == substr(string, i, i))
      return(TRUE)
  }
  return(FALSE)
}

is_char_in_string("this is my string", "a")

## [1] FALSE

is_char_in_string("this is my string", "s")

## [1] TRUE
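
The loop above illustrates early return. For this particular task, base R's grepl() with fixed = TRUE performs the same check without an explicit loop. The wrapper name below is hypothetical, and the block is offered as an alternative sketch rather than a replacement for the example.

```r
# Hypothetical wrapper: TRUE if `char` occurs anywhere in `string`
is_char_in_string_grepl <- function(string, char) {
  # fixed = TRUE treats `char` as literal text rather than a regular expression
  grepl(char, string, fixed = TRUE)
}

found_a <- is_char_in_string_grepl("this is my string", "a")
found_s <- is_char_in_string_grepl("this is my string", "s")
```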

Error handling

add(1, "a")

## Error in x + y: non-numeric argument to binary operator

message()

add <- function(x, y) {
  message("Here is a message!")
  return(x + y)
}

add(1,2)

## Here is a message!

## [1] 3

warning()

add <- function(x, y) {
  warning("Here is a warning!")
  return(x + y)
}

add(1,2)

## Warning in add(1, 2): Here is a warning!

## [1] 3

stop()

add <- function(x, y) {
  stop("Here is an error!")
  return(x + y)
}

add(1,2)

## Error in add(1, 2): Here is an error!

stopifnot()

add <- function(x, y) {
  stopifnot(x < 0)
  return(x + y)
}

add(1,2)

## Error in add(1, 2): x < 0 is not TRUE

add <- function(x, y) {
  stopifnot(is.numeric(x))
  stopifnot(is.numeric(y))
  return(x + y)
}

add(1,2)

## [1] 3

add(1,"a")

## Error in add(1, "a"): is.numeric(y) is not TRUE

add("b")

## Error in add("b"): is.numeric(x) is not TRUE
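
The functions above signal conditions; calling code can also respond to them. A minimal sketch using base R's tryCatch() follows. The function name safe_add and the error text are assumptions chosen for illustration.

```r
# Hypothetical variant of add() with an explicit, readable error message
safe_add <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("x and y must both be numeric")
  }
  x + y
}

# No error is signaled, so the result of safe_add() is returned
good <- tryCatch(safe_add(1, 2), error = function(e) NA)

# An error is signaled, so the handler runs and returns the error message
bad <- tryCatch(safe_add(1, "a"), error = function(e) conditionMessage(e))
```

Here the handler returns the message as a string; it could equally return a fallback value or re-signal the error.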

Function issues

These are some issues I want you to be aware of so that you can (I hope) avoid them in the future.

Argument vs object assignment

my_fancy_function <- function(x, y) {
  return(x + y*100)
}

What is the result of the following?

my_fancy_function(y <- 5, x <- 4)

## [1] 405

What happened? We assigned y the value 5 and x the value 4 outside the function. Then, we passed y (5) as the first argument of the function and x (4) as the second argument of the function.

This was equivalent to

y <- 5
x <- 4
my_fancy_function(x = y, y = x)

## [1] 405

So, when assigning function arguments, use =. Also, it is probably helpful to avoid giving objects the same names as function arguments.
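
The difference between = and <- inside a call can be seen side by side. In this sketch the function is redefined so the block is self-contained, and p and q are hypothetical object names.

```r
my_fancy_function <- function(x, y) {
  return(x + y * 100)
}

# `=` matches by name, so order does not matter: 4 + 5 * 100
by_name <- my_fancy_function(y = 5, x = 4)

# `<-` creates objects p and q, then passes their values by position: 5 + 4 * 100
by_position <- my_fancy_function(p <- 5, q <- 4)

# The assignments leave p and q behind in the calling environment
p_exists <- exists("p")
```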

Scoping

Here is a function

f <- function() {
  return(y)
}

What is the result of the following?

f()

## [1] 5

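Because y is not defined inside f, R searches through a series of enclosing environments to find a variable called y. Here it finds the y that was assigned the value 5 in the global environment above.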
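
As a sketch of the search order: a function looks for a name among its own local variables first and only then in the environment where the function was defined. The function names g and h are hypothetical.

```r
y <- 5

# No local y, so R finds the y defined outside the function
g <- function() {
  y
}

# A local y shadows the outer one; the outer y is left unchanged
h <- function() {
  y <- 10
  y
}

from_outer <- g()
from_local <- h()
```

Passing values in as arguments, rather than relying on objects found outside the function, makes this behavior easier to reason about.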

Closure errors

Sometimes you get baffling error messages due to closure errors or special errors.

mean[1]

## Error in mean[1]: object of type 'closure' is not subsettable

log[1]

## Error in log[1]: object of type 'special' is not subsettable

This is related to functions having a typeof closure or special.

typeof(mean)

## [1] "closure"

typeof(log)

## [1] "special"

You will see closure errors much more commonly than special errors.
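
A common way to hit the closure error is to subset a name that you meant to be a data object but that still refers to a function. In the sketch below it is assumed that df has not been assigned, so the name refers to the stats function df(). tryCatch() is used only to capture the message.

```r
# df has not been assigned in this session, so the name refers to stats::df()
closure_msg <- tryCatch(
  df$len,
  error = function(e) conditionMessage(e)
)
closure_msg
```

When you see this message, check whether the object you are subsetting was actually created.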

Generic functions

mean()

print(mean)

## function (x, ...) 
## UseMethod("mean")
## <bytecode: 0x00000243fc48e350>
## <environment: namespace:base>

Take a look at the help file

?mean

Notice the words “Generic function”. This means what the function does will depend on the class of the first argument to the function.

mean(1:5)

## [1] 3

mean(as.Date(c("2023-01-01","2022-01-01")))

## [1] "2022-07-02"

I bring up generic functions primarily to point out that it can be hard to track down the appropriate help file. Generally you will look up <function>.<class>.

For example,

# Determine the class
class(as.Date(c("2023-01-01","2022-01-01")))

## [1] "Date"

# Look up the function
?mean.Date

This didn’t provide help specific to the method. Because the lookup did land on a help page, this appears to be the intended behavior. Why the method isn’t documented separately, I have no idea.

class(1:5)

## [1] "integer"

?mean.integer

There is typically a default method that will be used if a specific method can’t be found.

?mean.default
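
To see how dispatch works, it can help to build a small generic of your own. The sketch below defines a hypothetical generic describe() with two methods and a default; none of these names exist in base R.

```r
# Hypothetical generic: UseMethod() picks a method based on the class of x
describe <- function(x, ...) {
  UseMethod("describe")
}

# Methods follow the <function>.<class> naming pattern
describe.numeric <- function(x, ...) "a number"
describe.character <- function(x, ...) "text"

# The default method is used when no class-specific method is found
describe.default <- function(x, ...) "something else"

for_numeric <- describe(1.5)
for_character <- describe("a")
for_logical <- describe(TRUE)
```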

summary()

summary(ToothGrowth$len)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20   13.07   19.25   18.81   25.27   33.90

summary(ToothGrowth$supp)

## OJ VC 
## 30 30

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

summary(lm(len ~ supp, data = ToothGrowth))

## 
## Call:
## lm(formula = len ~ supp, data = ToothGrowth)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.7633  -5.7633   0.4367   5.5867  16.9367 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   20.663      1.366  15.127   <2e-16 ***
## suppVC        -3.700      1.932  -1.915   0.0604 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.482 on 58 degrees of freedom
## Multiple R-squared:  0.05948,    Adjusted R-squared:  0.04327 
## F-statistic: 3.668 on 1 and 58 DF,  p-value: 0.06039

?summary
?summary.numeric
?summary.factor
?summary.data.frame
?summary.lm

… argument

?sum

sum(1,2,3)

## [1] 6

sum(5:6)

## [1] 11

sum(1,2,3,5:6)

## [1] 17

Typos get ignored

sum(c(1,2,NA), na.mr = TRUE) # vs

## [1] NA

sum(c(1,2,NA), na.rm = TRUE)

## [1] 3
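
The typo is not reported because ... accepts any extra arguments, named or not. In the sketch below, show_dots() is a hypothetical helper that returns the names captured by ... so you can see where the misspelled argument went. The last line shows that sum() then treats the misspelled argument's value as one more number to add.

```r
# Hypothetical helper: report the names of everything captured by ...
show_dots <- function(..., na.rm = FALSE) {
  names(list(...))
}

# The misspelled na.mr does not match na.rm, so it ends up inside ...
captured <- show_dots(c(1, 2, NA), na.mr = TRUE)

# sum() adds the stray TRUE as 1: 1 + 2 + 1
typo_total <- sum(c(1, 2), na.mr = TRUE)
```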

Suggestions

Define functions for tasks that you do 2 or more times
Use informative names (verbs)
Use consistent return values

STAT 486/586

Jarad Niemi

2023-02-16
