Preparation

To follow along, use the lab11 code and make sure the following packag is installed:

You can use the following code to perform the installation:

install.packages("emmeans")

Now load the packages

library("tidyverse")
library("emmeans")

One explanatory variables

Consider the fiber data set in the emmeans package where the strength of fibers produced by 3 different machines is tested.

ggplot(emmeans::fiber, aes(machine, strength)) + 
  geom_jitter() + 
  theme_bw()

Means

We might be interested in the mean strength of fibers produced by each machine. One approach would be to fit a regression model and predict a new observation for each machine type.

m <- lm(strength ~ machine, data = emmeans::fiber)
nd <- data.frame(machine = c("A","B","C"))
p <- predict(m, 
             newdata = nd, 
             interval = "confidence")
bind_cols(nd, p %>% as.data.frame)
##   machine  fit      lwr      upr
## 1       A 41.4 37.36282 45.43718
## 2       B 43.2 39.16282 47.23718
## 3       C 36.0 31.96282 40.03718

Alternatively, use the emmeans function in the emmeans package.

emmeans(m, ~machine)
##  machine emmean   SE df lower.CL upper.CL
##  A         41.4 1.85 12     37.4     45.4
##  B         43.2 1.85 12     39.2     47.2
##  C         36.0 1.85 12     32.0     40.0
## 
## Confidence level used: 0.95

Means activity

Consider the ex0518 data set in the Sleuth3 package.

ex0518 <- Sleuth3::ex0518 %>%
  mutate(Treatment = relevel(Treatment, ref="Control"))

ggplot(ex0518, aes(Treatment, Protein)) + 
  geom_jitter() + 
  theme_bw()

Fit a regession model of protein on treatment and compute a point estimate as well as a 95% CI for the mean protein level in each of the treatment (diet) levels.

Click for solution
m <- lm(Protein ~ Treatment, data = ex0518)
emmeans(m, ~Treatment)
##  Treatment emmean   SE df lower.CL upper.CL
##  Control      186 4.44 24      176      195
##  CPFA150      172 9.93 24      151      192
##  CPFA300      147 9.93 24      126      167
##  CPFA450      151 9.93 24      131      171
##  CPFA50       168 9.93 24      148      189
##  CPFA600      152 9.93 24      132      173
## 
## Confidence level used: 0.95

Comparison of means

Typically we are more interested in saying something about differences in means. We can either try to specify the contrasts of interest, or we can use pre-packaged analyses to extract those contrasts we are interested in. Suppose that machine C was really the control and we are interested primarily in comparing

  • machine A to machine C and
  • machine B to machine C.

We can use the contrast function in the emmeans package to perform the comparison.

# First let's make C the reference level
fiber <- emmeans::fiber %>% 
  mutate(machine = relevel(machine, ref="C"))
  
m <- lm(strength ~ machine, data = fiber)
ls <- emmeans(m, ~ machine)
(co <- contrast(ls, method = "pairwise"))
##  contrast estimate   SE df t.ratio p.value
##  C - A        -5.4 2.62 12 -2.061  0.1402 
##  C - B        -7.2 2.62 12 -2.748  0.0434 
##  A - B        -1.8 2.62 12 -0.687  0.7754 
## 
## P value adjustment: tukey method for comparing a family of 3 estimates

To get the CIs, use confint:

confint(co)
##  contrast estimate   SE df lower.CL upper.CL
##  C - A        -5.4 2.62 12   -12.39    1.591
##  C - B        -7.2 2.62 12   -14.19   -0.209
##  A - B        -1.8 2.62 12    -8.79    5.191
## 
## Confidence level used: 0.95 
## Conf-level adjustment: tukey method for comparing a family of 3 estimates

By default, these functions use a Tukey multiple comparison adjustment which is the most appropriate adjustment for performing all pairwise comparisons.

You can turn off the adjustment by setting the adjust argument to “none”, e.g.

(co <- contrast(ls, method = "pairwise", adjust="none"))
##  contrast estimate   SE df t.ratio p.value
##  C - A        -5.4 2.62 12 -2.061  0.0617 
##  C - B        -7.2 2.62 12 -2.748  0.0177 
##  A - B        -1.8 2.62 12 -0.687  0.5052
confint(co)
##  contrast estimate   SE df lower.CL upper.CL
##  C - A        -5.4 2.62 12   -11.11    0.309
##  C - B        -7.2 2.62 12   -12.91   -1.491
##  A - B        -1.8 2.62 12    -7.51    3.909
## 
## Confidence level used: 0.95

Also, we weren’t really interested in looking at all pairwise comparisons. Instead, we were really mainly interested in looking at machine A/B versus machine C (the control).

We can specify just these comparisons by changing the method argument to “trt.vs.ctrl” (which stands for treatment vs control).

(co <- contrast(ls, method = "trt.vs.ctrl"))
##  contrast estimate   SE df t.ratio p.value
##  A - C         5.4 2.62 12 2.061   0.1123 
##  B - C         7.2 2.62 12 2.748   0.0333 
## 
## P value adjustment: dunnettx method for 2 tests
confint(co)
##  contrast estimate   SE df lower.CL upper.CL
##  A - C         5.4 2.62 12    -1.21     12.0
##  B - C         7.2 2.62 12     0.59     13.8
## 
## Confidence level used: 0.95 
## Conf-level adjustment: dunnettx method for 2 estimates

Notice that the adjustment method used here is the dunnettx method which is an approximation to the exact Dunnett method. The exact Dunnett method is an appropriate choice when comparing treatments to a control and can be obtained using adjust = "mvt", but this requires more computing time.

Contrast activity

Reconsider the ex0518 data set and provide an estimate and 95% CI for the difference in mean protein level for all treatments compared to control.

Click for solution
m <- lm(Protein ~ Treatment, data = ex0518)
ls <- emmeans(m, ~ Treatment)
(co <- contrast(ls, method = "trt.vs.ctrl"))
##  contrast          estimate   SE df t.ratio p.value
##  CPFA150 - Control    -13.9 10.9 24 -1.281  0.5708 
##  CPFA300 - Control    -38.9 10.9 24 -3.579  0.0068 
##  CPFA450 - Control    -34.6 10.9 24 -3.181  0.0174 
##  CPFA50 - Control     -17.3 10.9 24 -1.587  0.3907 
##  CPFA600 - Control    -33.3 10.9 24 -3.058  0.0232 
## 
## P value adjustment: dunnettx method for 5 tests
confint(co)
##  contrast          estimate   SE df lower.CL upper.CL
##  CPFA150 - Control    -13.9 10.9 24    -43.4    15.58
##  CPFA300 - Control    -38.9 10.9 24    -68.4    -9.42
##  CPFA450 - Control    -34.6 10.9 24    -64.1    -5.09
##  CPFA50 - Control     -17.3 10.9 24    -46.8    12.25
##  CPFA600 - Control    -33.3 10.9 24    -62.8    -3.75
## 
## Confidence level used: 0.95 
## Conf-level adjustment: dunnettx method for 5 estimates

Two categorical explanatory variables

These data are taken from a wood glue experiment. For simplicity, the data here are taken to be balanced.

wood_glue <- data.frame(weight = c(185,170,210,240,245,190,210,250,
                                  290,280,260,270,200,280,350,350),
                        wood = rep(c("spruce","maple"), each = 8),
                        glue = rep(c("carpenter's","weldbond",
                                     "gorilla","titebond"), 
                                   each=2, times=2))
ggplot(wood_glue, aes(wood, weight, color=glue, shape=glue)) +
  geom_jitter() +
  theme_bw()

We can fit the regression model with wood, glue, and their interaction.

m <- lm(weight ~ wood*glue, data = wood_glue)
anova(m)
## Analysis of Variance Table
## 
## Response: weight
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## wood       1 21025.0 21025.0 27.2389 0.0008035 ***
## glue       3  9687.5  3229.2  4.1835 0.0468482 *  
## wood:glue  3  7037.5  2345.8  3.0391 0.0927400 .  
## Residuals  8  6175.0   771.9                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Compare mean weight averaged across wood type

(ls <- emmeans(m, ~ glue))
## NOTE: Results may be misleading due to involvement in interactions
##  glue        emmean   SE df lower.CL upper.CL
##  carpenter's    231 13.9  8      199      263
##  gorilla        229 13.9  8      197      261
##  titebond       290 13.9  8      258      322
##  weldbond       245 13.9  8      213      277
## 
## Results are averaged over the levels of: wood 
## Confidence level used: 0.95

The note here indicates that there may be issues with comparing glues when averaged over wood since, by construction of the model, the glue’s effect on the weight may be different depending on the wood type.

If we want to compare the glues, we can use contrast as before.

(co <- contrast(ls, "pairwise"))
##  contrast               estimate   SE df t.ratio p.value
##  carpenter's - gorilla       2.5 19.6  8  0.127  0.9992 
##  carpenter's - titebond    -58.8 19.6  8 -2.991  0.0674 
##  carpenter's - weldbond    -13.8 19.6  8 -0.700  0.8943 
##  gorilla - titebond        -61.2 19.6  8 -3.118  0.0563 
##  gorilla - weldbond        -16.2 19.6  8 -0.827  0.8403 
##  titebond - weldbond        45.0 19.6  8  2.291  0.1794 
## 
## Results are averaged over the levels of: wood 
## P value adjustment: tukey method for comparing a family of 4 estimates
confint(co)
##  contrast               estimate   SE df lower.CL upper.CL
##  carpenter's - gorilla       2.5 19.6  8    -60.4    65.41
##  carpenter's - titebond    -58.8 19.6  8   -121.7     4.16
##  carpenter's - weldbond    -13.8 19.6  8    -76.7    49.16
##  gorilla - titebond        -61.2 19.6  8   -124.2     1.66
##  gorilla - weldbond        -16.2 19.6  8    -79.2    46.66
##  titebond - weldbond        45.0 19.6  8    -17.9   107.91
## 
## Results are averaged over the levels of: wood 
## Confidence level used: 0.95 
## Conf-level adjustment: tukey method for comparing a family of 4 estimates

Compare mean weight within each wood type

Since the glue’s effect may depend on wood type, we may be interested in comparing the glues within each wood type. To estimate the mean weight required to break the joint for every wood-glue combination use the emmeans function with A|B denoting we want to calculate the means for A by B.

(em <- emmeans(m, ~ glue | wood))
## wood = maple:
##  glue        emmean   SE df lower.CL upper.CL
##  carpenter's    285 19.6  8      240      330
##  gorilla        240 19.6  8      195      285
##  titebond       350 19.6  8      305      395
##  weldbond       265 19.6  8      220      310
## 
## wood = spruce:
##  glue        emmean   SE df lower.CL upper.CL
##  carpenter's    178 19.6  8      132      223
##  gorilla        218 19.6  8      172      263
##  titebond       230 19.6  8      185      275
##  weldbond       225 19.6  8      180      270
## 
## Confidence level used: 0.95

We would have gotten the exact same result if we switched glue and wood, but this provides a table in the more directly comparable order.

If we want to compare glues within the wood type, we again use contrast.

(co <- contrast(em, "pairwise"))
## wood = maple:
##  contrast               estimate   SE df t.ratio p.value
##  carpenter's - gorilla      45.0 27.8  8  1.620  0.4204 
##  carpenter's - titebond    -65.0 27.8  8 -2.340  0.1678 
##  carpenter's - weldbond     20.0 27.8  8  0.720  0.8865 
##  gorilla - titebond       -110.0 27.8  8 -3.959  0.0176 
##  gorilla - weldbond        -25.0 27.8  8 -0.900  0.8055 
##  titebond - weldbond        85.0 27.8  8  3.059  0.0612 
## 
## wood = spruce:
##  contrast               estimate   SE df t.ratio p.value
##  carpenter's - gorilla     -40.0 27.8  8 -1.440  0.5116 
##  carpenter's - titebond    -52.5 27.8  8 -1.890  0.3039 
##  carpenter's - weldbond    -47.5 27.8  8 -1.710  0.3787 
##  gorilla - titebond        -12.5 27.8  8 -0.450  0.9678 
##  gorilla - weldbond         -7.5 27.8  8 -0.270  0.9926 
##  titebond - weldbond         5.0 27.8  8  0.180  0.9978 
## 
## P value adjustment: tukey method for comparing a family of 4 estimates
confint(co)
## wood = maple:
##  contrast               estimate   SE df lower.CL upper.CL
##  carpenter's - gorilla      45.0 27.8  8   -43.97    134.0
##  carpenter's - titebond    -65.0 27.8  8  -153.97     24.0
##  carpenter's - weldbond     20.0 27.8  8   -68.97    109.0
##  gorilla - titebond       -110.0 27.8  8  -198.97    -21.0
##  gorilla - weldbond        -25.0 27.8  8  -113.97     64.0
##  titebond - weldbond        85.0 27.8  8    -3.97    174.0
## 
## wood = spruce:
##  contrast               estimate   SE df lower.CL upper.CL
##  carpenter's - gorilla     -40.0 27.8  8  -128.97     49.0
##  carpenter's - titebond    -52.5 27.8  8  -141.47     36.5
##  carpenter's - weldbond    -47.5 27.8  8  -136.47     41.5
##  gorilla - titebond        -12.5 27.8  8  -101.47     76.5
##  gorilla - weldbond         -7.5 27.8  8   -96.47     81.5
##  titebond - weldbond         5.0 27.8  8   -83.97     94.0
## 
## Confidence level used: 0.95 
## Conf-level adjustment: tukey method for comparing a family of 4 estimates

Two categorical explanatory variables activity

Consider the ex1321 data set in the Sleuth3 package. Estimate the average treatment effect on the intelligence test across all the classes.

Click for solution

We should always look at the data first.

ggplot(Sleuth3::ex1321, aes(Class, Gain, color=Treatment, shape = Treatment)) +
  geom_jitter() + 
  theme_bw()

Then we can estimate the average treatment effect across the classes.

m <- lm(Gain ~ Treatment*Class, data = Sleuth3::ex1321)
(em <- emmeans(m, ~ Treatment))
## NOTE: Results may be misleading due to involvement in interactions
##  Treatment emmean    SE  df lower.CL upper.CL
##  control     8.52 0.773 286      7.0     10.0
##  pygmalion  13.67 1.749 286     10.2     17.1
## 
## Results are averaged over the levels of: Class 
## Confidence level used: 0.95
(co <- contrast(em, "pairwise"))
##  contrast            estimate   SE  df t.ratio p.value
##  control - pygmalion    -5.15 1.91 286 -2.693  0.0075 
## 
## Results are averaged over the levels of: Class
confint(co)
##  contrast            estimate   SE  df lower.CL upper.CL
##  control - pygmalion    -5.15 1.91 286    -8.92    -1.39
## 
## Results are averaged over the levels of: Class 
## Confidence level used: 0.95

ANCOVA - Categorical explanatory variable with continuous explanatory variable

Reconsider the fiber data set in the emmeans package where the strength of fibers produced by 3 different machines is tested. The machines do not actually produce a uniform size fiber. So in addition to recording the strength, the diameter of the fiber is also recorded. Since we expect there to be differences in strength depending on the diameter, we should compare machine after adjusting for diameter.

ggplot(emmeans::fiber, aes(diameter, strength, color=machine, shape=machine)) + 
  geom_jitter() + 
  theme_bw()

Comparing treatments at a particular level of the continuous explanatory variable

We will fit the model with an interaction between machine and diameter to allow the possibility that the effect of diameter depends on the machine.

m <- lm(strength ~ diameter*machine, data=fiber)
em <- emmeans(m, ~ machine)
## NOTE: Results may be misleading due to involvement in interactions
co <- contrast(em, method = "pairwise")
confint(co)
##  contrast estimate   SE df lower.CL upper.CL
##  C - A       -1.69 1.24  9    -5.15    1.775
##  C - B       -3.07 1.29  9    -6.67    0.542
##  A - B       -1.38 1.16  9    -4.61    1.853
## 
## Confidence level used: 0.95 
## Conf-level adjustment: tukey method for comparing a family of 3 estimates

Again, we are warned that these results may not make sense due to the interaction. Perhaps we should evaluate these at a particular diameter.

em <- emmeans(m, ~ machine | diameter)
co <- contrast(em, method = "pairwise")
confint(co)
## diameter = 24.1:
##  contrast estimate   SE df lower.CL upper.CL
##  C - A       -1.69 1.24  9    -5.15    1.775
##  C - B       -3.07 1.29  9    -6.67    0.542
##  A - B       -1.38 1.16  9    -4.61    1.853
## 
## Confidence level used: 0.95 
## Conf-level adjustment: tukey method for comparing a family of 3 estimates

We can see that, in fact, the analysis provided above was at a diameter of 24.1333 which happens to be the mean diameter in the data set.

mean(fiber$diameter)
## [1] 24.13333

To set your own value(s) for diameter use the at argument, e.g.

em <- emmeans(m, ~ machine | diameter, at = list(diameter = c(20,30)))
co <- contrast(em, method = "pairwise")
confint(co)
## diameter = 20:
##  contrast estimate   SE df lower.CL upper.CL
##  C - A      -0.695 1.48  9    -4.83     3.45
##  C - B      -3.094 1.73  9    -7.92     1.73
##  A - B      -2.399 1.98  9    -7.94     3.14
## 
## diameter = 30:
##  contrast estimate   SE df lower.CL upper.CL
##  C - A      -3.096 2.31  9    -9.55     3.36
##  C - B      -3.024 2.30  9    -9.44     3.39
##  A - B       0.072 1.67  9    -4.59     4.73
## 
## Confidence level used: 0.95 
## Conf-level adjustment: tukey method for comparing a family of 3 estimates

Comparing the effect of the continuous variable across machines

We can also look at the effect of the continuous variable across machines and see if this is the same, i.e. is the slope the same for all machines. To do this we use the lstrends function.

( lst <- lstrends(m, "machine", var = "diameter") )
##  machine diameter.trend    SE df lower.CL upper.CL
##  C                0.864 0.208  9    0.394     1.33
##  A                1.104 0.194  9    0.666     1.54
##  B                0.857 0.224  9    0.351     1.36
## 
## Confidence level used: 0.95
(co <- contrast(lst, method = "pairwise"))
##  contrast estimate    SE df t.ratio p.value
##  C - A    -0.24008 0.284  9 -0.845  0.6863 
##  C - B     0.00705 0.306  9  0.023  0.9997 
##  A - B     0.24714 0.296  9  0.835  0.6919 
## 
## P value adjustment: tukey method for comparing a family of 3 estimates
confint(co)
##  contrast estimate    SE df lower.CL upper.CL
##  C - A    -0.24008 0.284  9   -1.034    0.554
##  C - B     0.00705 0.306  9   -0.846    0.860
##  A - B     0.24714 0.296  9   -0.579    1.074
## 
## Confidence level used: 0.95 
## Conf-level adjustment: tukey method for comparing a family of 3 estimates

ANCOVA activity

Take a look at the bat energy expenditure data set in case1002 and determine the relationship between echo-locating ability and energy expenditure after adjusting for mass. In particular, compare the differences in mean energy expenditure between echolocating bats and non-echolocating bats as well as echolocating bats and birds at the mean mass. Does comparing at the mean mass make sense? Also compare the effects of mass on mean energy expenditure between those same groups.

Click for solution

As always, plot the data.

ggplot(Sleuth3::case1002, aes(Mass, Energy, color = Type, shape = Type)) +
  geom_jitter() + 
  theme_bw()

Compare energy expenditure at mean mass.

m <- lm(Energy ~ Mass*Type, data = Sleuth3::case1002)
ls <- emmeans(m, ~ Type | Mass)
co <- contrast(ls, "trt.vs.ctrl") 
confint(co)
## Mass = 263:
##  contrast                                     estimate   SE df lower.CL upper.CL
##  (non-echolocating bats) - echolocating bats     -2.29 16.5 14    -43.2     38.6
##  (non-echolocating birds) - echolocating bats    -2.92 16.2 14    -43.0     37.1
## 
## Confidence level used: 0.95 
## Conf-level adjustment: dunnettx method for 2 estimates

This doesn’t really make sense since there are no echolocating bats whose mass is anywhere near this range.

To compare the slopes, use

( lst <- lstrends(m, "Type", var = "Mass") )
##  Type                   Mass.trend     SE df lower.CL upper.CL
##  echolocating bats          0.0896 0.0680 14  -0.0563   0.2356
##  non-echolocating bats      0.0400 0.0117 14   0.0150   0.0651
##  non-echolocating birds     0.0678 0.0092 14   0.0480   0.0875
## 
## Confidence level used: 0.95
(co <- contrast(lst, method = "pairwise"))
##  contrast                                           estimate     SE df t.ratio p.value
##  echolocating bats - (non-echolocating bats)          0.0496 0.0690 14  0.718  0.7569 
##  echolocating bats - (non-echolocating birds)         0.0219 0.0687 14  0.318  0.9459 
##  (non-echolocating bats) - (non-echolocating birds)  -0.0277 0.0149 14 -1.867  0.1848 
## 
## P value adjustment: tukey method for comparing a family of 3 estimates
confint(co)
##  contrast                                           estimate     SE df lower.CL
##  echolocating bats - (non-echolocating bats)          0.0496 0.0690 14  -0.1311
##  echolocating bats - (non-echolocating birds)         0.0219 0.0687 14  -0.1579
##  (non-echolocating bats) - (non-echolocating birds)  -0.0277 0.0149 14  -0.0666
##  upper.CL
##    0.2303
##    0.2016
##    0.0111
## 
## Confidence level used: 0.95 
## Conf-level adjustment: tukey method for comparing a family of 3 estimates

Custom contrasts

Although pairwise comparisons are common for contrasts, we may be interested in more sophisticated contrasts. For example, in the mice lifetime diet, we may be interested in comparing the standard N/N85 diet to the low calorie diets. In addition, we may be interested in comparing N/R50 vs R/R50 to look specifically at pre-weaning calorie restriction. To perform these analysis, we use custom contrasts (although the second one is technically part of the pairwise comarisons).

m <- lm(Lifetime ~ Diet, data = Sleuth3::case0501)
ls <- emmeans(m, "Diet")
#                                           N/N85 N/R40 N/R50  NP R/R50 lopro
co <- contrast(ls, list(`High - Low`    = c(    4,   -1,   -1,  0,   -1,   -1) /4,
                        `Pre-wean: R-N` = c(    0,    0,   -1,  0,    1,    0) ))
confint(co)
##  contrast      estimate    SE  df lower.CL upper.CL
##  High - Low      -9.805 0.984 343   -11.74    -7.87
##  Pre-wean: R-N    0.589 1.194 343    -1.76     2.94
## 
## Confidence level used: 0.95