To follow along, use the lab09 code and make sure the following packages are installed:
You can use the following code to perform the installation:
install.packages("GGally")
Now load packages
library("tidyverse")
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.2 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0.9000
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library("GGally")
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library("Sleuth3")
Consider the case1001
data set.
ggplot(case1001, aes(Height, Distance)) +
geom_point() + theme_bw()
There are a number of ways to run the cubic model.
# First create the variables yourself
case1001_tmp <- case1001 %>%
mutate(Height2 = Height * Height,
Height3 = Height * Height2)
# Then run the regression
m1 <- lm(Distance ~ Height + Height2 + Height3, data = case1001_tmp)
I don’t prefer this method because
data.frame
with unnecessary columns andnew <- data.frame(Height = 500) %>%
mutate(Height2 = Height * Height, # Need to manually create the
Height3 = Height * Height2) # quadratic and cubic terms
predict(m1, new)
## 1
## 470.6527
An alternative approach is to use a formula to construct the higher order terms. e.g.
m2 <- lm(Distance ~ Height + I(Height^2) + I(Height^3), data = case1001) # or
m3 <- lm(Distance ~ poly(Height, 3, raw = TRUE), data = case1001)
Now, your original data.frame
is unchanged and your new data.frame
is simpler.
new <- data.frame(Height = 500)
predict(m2, new) # or
## 1
## 470.6527
predict(m3, new)
## 1
## 470.6527
Run the following code to construct the longnosedace data
# From http://www.biostathandbook.com/multipleregression.html
# Modified from http://rcompanion.org/rcompanion/e_05.html
Input = ("
stream count acreage dO2 maxdepth no3 so4 temp
BASIN_RUN 13 2528 9.6 80 2.28 16.75 15.3
BEAR_BR 12 3333 8.5 83 5.34 7.74 19.4
BEAR_CR 54 19611 8.3 96 0.99 10.92 19.5
BEAVER_DAM_CR 19 3570 9.2 56 5.44 16.53 17
BEAVER_RUN 37 1722 8.1 43 5.66 5.91 19.3
BENNETT_CR 2 583 9.2 51 2.26 8.81 12.9
BIG_BR 72 4790 9.4 91 4.1 5.65 16.7
BIG_ELK_CR 164 35971 10.2 81 3.2 17.53 13.8
BIG_PIPE_CR 18 25440 7.5 120 3.53 8.2 13.7
BLUE_LICK_RUN 1 2217 8.5 46 1.2 10.85 14.3
BROAD_RUN 53 1971 11.9 56 3.25 11.12 22.2
BUFFALO_RUN 16 12620 8.3 37 0.61 18.87 16.8
BUSH_CR 32 19046 8.3 120 2.93 11.31 18
CABIN_JOHN_CR 21 8612 8.2 103 1.57 16.09 15
CARROLL_BR 23 3896 10.4 105 2.77 12.79 18.4
COLLIER_RUN 18 6298 8.6 42 0.26 17.63 18.2
CONOWINGO_CR 112 27350 8.5 65 6.95 14.94 24.1
DEAD_RUN 25 4145 8.7 51 0.34 44.93 23
DEEP_RUN 5 1175 7.7 57 1.3 21.68 21.8
DEER_CR 26 8297 9.9 60 5.26 6.36 19.1
DORSEY_RUN 8 7814 6.8 160 0.44 20.24 22.6
FALLS_RUN 15 1745 9.4 48 2.19 10.27 14.3
FISHING_CR 11 5046 7.6 109 0.73 7.1 19
FLINTSTONE_CR 11 18943 9.2 50 0.25 14.21 18.5
GREAT_SENECA_CR 87 8624 8.6 78 3.37 7.51 21.3
GREENE_BR 33 2225 9.1 41 2.3 9.72 20.5
GUNPOWDER_FALLS 22 12659 9.7 65 3.3 5.98 18
HAINES_BR 98 1967 8.6 50 7.71 26.44 16.8
HAWLINGS_R 1 1172 8.3 73 2.62 4.64 20.5
HAY_MEADOW_BR 5 639 9.5 26 3.53 4.46 20.1
HERRINGTON_RUN 1 7056 6.4 60 0.25 9.82 24.5
HOLLANDS_BR 38 1934 10.5 85 2.34 11.44 12
ISRAEL_CR 30 6260 9.5 133 2.41 13.77 21
LIBERTY_RES 12 424 8.3 62 3.49 5.82 20.2
LITTLE_ANTIETAM_CR 24 3488 9.3 44 2.11 13.37 24
LITTLE_BEAR_CR 6 3330 9.1 67 0.81 8.16 14.9
LITTLE_CONOCOCHEAGUE_CR 15 2227 6.8 54 0.33 7.6 24
LITTLE_DEER_CR 38 8115 9.6 110 3.4 9.22 20.5
LITTLE_FALLS 84 1600 10.2 56 3.54 5.69 19.5
LITTLE_GUNPOWDER_R 3 15305 9.7 85 2.6 6.96 17.5
LITTLE_HUNTING_CR 18 7121 9.5 58 0.51 7.41 16
LITTLE_PAINT_BR 63 5794 9.4 34 1.19 12.27 17.5
MAINSTEM_PATUXENT_R 239 8636 8.4 150 3.31 5.95 18.1
MEADOW_BR 234 4803 8.5 93 5.01 10.98 24.3
MILL_CR 6 1097 8.3 53 1.71 15.77 13.1
MORGAN_RUN 76 9765 9.3 130 4.38 5.74 16.9
MUDDY_BR 25 4266 8.9 68 2.05 12.77 17
MUDLICK_RUN 8 1507 7.4 51 0.84 16.3 21
NORTH_BR 23 3836 8.3 121 1.32 7.36 18.5
NORTH_BR_CASSELMAN_R 16 17419 7.4 48 0.29 2.5 18
NORTHWEST_BR 6 8735 8.2 63 1.56 13.22 20.8
NORTHWEST_BR_ANACOSTIA_R 100 22550 8.4 107 1.41 14.45 23
OWENS_CR 80 9961 8.6 79 1.02 9.07 21.8
PATAPSCO_R 28 4706 8.9 61 4.06 9.9 19.7
PINEY_BR 48 4011 8.3 52 4.7 5.38 18.9
PINEY_CR 18 6949 9.3 100 4.57 17.84 18.6
PINEY_RUN 36 11405 9.2 70 2.17 10.17 23.6
PRETTYBOY_BR 19 904 9.8 39 6.81 9.2 19.2
RED_RUN 32 3332 8.4 73 2.09 5.5 17.7
ROCK_CR 3 575 6.8 33 2.47 7.61 18
SAVAGE_R 106 29708 7.7 73 0.63 12.28 21.4
SECOND_MINE_BR 62 2511 10.2 60 4.17 10.75 17.7
SENECA_CR 23 18422 9.9 45 1.58 8.37 20.1
SOUTH_BR_CASSELMAN_R 2 6311 7.6 46 0.64 21.16 18.5
SOUTH_BR_PATAPSCO 26 1450 7.9 60 2.96 8.84 18.6
SOUTH_FORK_LINGANORE_CR 20 4106 10.0 96 2.62 5.45 15.4
TUSCARORA_CR 38 10274 9.3 90 5.45 24.76 15
WATTS_BR 19 510 6.7 82 5.25 14.19 26.5
")
longnosedace = read.table(textConnection(Input),header=TRUE)
Take a quick look at the data
head(longnosedace)
## stream count acreage dO2 maxdepth no3 so4 temp
## 1 BASIN_RUN 13 2528 9.6 80 2.28 16.75 15.3
## 2 BEAR_BR 12 3333 8.5 83 5.34 7.74 19.4
## 3 BEAR_CR 54 19611 8.3 96 0.99 10.92 19.5
## 4 BEAVER_DAM_CR 19 3570 9.2 56 5.44 16.53 17.0
## 5 BEAVER_RUN 37 1722 8.1 43 5.66 5.91 19.3
## 6 BENNETT_CR 2 583 9.2 51 2.26 8.81 12.9
summary(longnosedace)
## stream count acreage dO2
## Length:68 Min. : 1.00 Min. : 424 Min. : 6.400
## Class :character 1st Qu.: 12.00 1st Qu.: 2156 1st Qu.: 8.300
## Mode :character Median : 23.00 Median : 4748 Median : 8.600
## Mean : 38.81 Mean : 7565 Mean : 8.762
## 3rd Qu.: 40.50 3rd Qu.: 8992 3rd Qu.: 9.425
## Max. :239.00 Max. :35971 Max. :11.900
## maxdepth no3 so4 temp
## Min. : 26.00 Min. :0.250 Min. : 2.500 Min. :12.00
## 1st Qu.: 51.00 1st Qu.:1.198 1st Qu.: 7.397 1st Qu.:16.98
## Median : 64.00 Median :2.375 Median :10.220 Median :18.60
## Mean : 72.56 Mean :2.672 Mean :11.650 Mean :18.87
## 3rd Qu.: 90.25 3rd Qu.:3.533 3rd Qu.:14.270 3rd Qu.:20.85
## Max. :160.00 Max. :7.710 Max. :44.930 Max. :26.50
ggpairs(longnosedace %>% select(-stream),
progress = interactive()) +
theme_bw()
Adding additional explanatory variables is accomplished by separating the explanatory variables by +
.
m <- lm(sqrt(count) ~ no3 + maxdepth, data = longnosedace)
summary(m)
##
## Call:
## lm(formula = sqrt(count) ~ no3 + maxdepth, data = longnosedace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3896 -1.9651 -0.5409 1.3018 7.8564
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5683 1.0285 1.525 0.13217
## no3 0.6005 0.1882 3.191 0.00218 **
## maxdepth 0.0308 0.0117 2.633 0.01057 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.804 on 65 degrees of freedom
## Multiple R-squared: 0.2145, Adjusted R-squared: 0.1903
## F-statistic: 8.874 on 2 and 65 DF, p-value: 0.0003914
You can run a regression with all variables using .
m <- lm(sqrt(count) ~ ., data = longnosedace)
summary(m)
##
## Call:
## lm(formula = sqrt(count) ~ ., data = longnosedace)
##
## Residuals:
## ALL 68 residuals are 0: no residual degrees of freedom!
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6056 NA NA NA
## streamBEAR_BR -0.1414 NA NA NA
## streamBEAR_CR 3.7429 NA NA NA
## streamBEAVER_DAM_CR 0.7533 NA NA NA
## streamBEAVER_RUN 2.4772 NA NA NA
## streamBENNETT_CR -2.1913 NA NA NA
## streamBIG_BR 4.8797 NA NA NA
## streamBIG_ELK_CR 9.2007 NA NA NA
## streamBIG_PIPE_CR 0.6371 NA NA NA
## streamBLUE_LICK_RUN -2.6056 NA NA NA
## streamBROAD_RUN 3.6746 NA NA NA
## streamBUFFALO_RUN 0.3944 NA NA NA
## streamBUSH_CR 2.0513 NA NA NA
## streamCABIN_JOHN_CR 0.9770 NA NA NA
## streamCARROLL_BR 1.1903 NA NA NA
## streamCOLLIER_RUN 0.6371 NA NA NA
## streamCONOWINGO_CR 6.9775 NA NA NA
## streamDEAD_RUN 1.3944 NA NA NA
## streamDEEP_RUN -1.3695 NA NA NA
## streamDEER_CR 1.4935 NA NA NA
## streamDORSEY_RUN -0.7771 NA NA NA
## streamFALLS_RUN 0.2674 NA NA NA
## streamFISHING_CR -0.2889 NA NA NA
## streamFLINTSTONE_CR -0.2889 NA NA NA
## streamGREAT_SENECA_CR 5.7218 NA NA NA
## streamGREENE_BR 2.1390 NA NA NA
## streamGUNPOWDER_FALLS 1.0849 NA NA NA
## streamHAINES_BR 6.2939 NA NA NA
## streamHAWLINGS_R -2.6056 NA NA NA
## streamHAY_MEADOW_BR -1.3695 NA NA NA
## streamHERRINGTON_RUN -2.6056 NA NA NA
## streamHOLLANDS_BR 2.5589 NA NA NA
## streamISRAEL_CR 1.8717 NA NA NA
## streamLIBERTY_RES -0.1414 NA NA NA
## streamLITTLE_ANTIETAM_CR 1.2934 NA NA NA
## streamLITTLE_BEAR_CR -1.1561 NA NA NA
## streamLITTLE_CONOCOCHEAGUE_CR 0.2674 NA NA NA
## streamLITTLE_DEER_CR 2.5589 NA NA NA
## streamLITTLE_FALLS 5.5596 NA NA NA
## streamLITTLE_GUNPOWDER_R -1.8735 NA NA NA
## streamLITTLE_HUNTING_CR 0.6371 NA NA NA
## streamLITTLE_PAINT_BR 4.3317 NA NA NA
## streamMAINSTEM_PATUXENT_R 11.8541 NA NA NA
## streamMEADOW_BR 11.6915 NA NA NA
## streamMILL_CR -1.1561 NA NA NA
## streamMORGAN_RUN 5.1122 NA NA NA
## streamMUDDY_BR 1.3944 NA NA NA
## streamMUDLICK_RUN -0.7771 NA NA NA
## streamNORTH_BR 1.1903 NA NA NA
## streamNORTH_BR_CASSELMAN_R 0.3944 NA NA NA
## streamNORTHWEST_BR -1.1561 NA NA NA
## streamNORTHWEST_BR_ANACOSTIA_R 6.3944 NA NA NA
## streamOWENS_CR 5.3387 NA NA NA
## streamPATAPSCO_R 1.6860 NA NA NA
## streamPINEY_BR 3.3227 NA NA NA
## streamPINEY_CR 0.6371 NA NA NA
## streamPINEY_RUN 2.3944 NA NA NA
## streamPRETTYBOY_BR 0.7533 NA NA NA
## streamRED_RUN 2.0513 NA NA NA
## streamROCK_CR -1.8735 NA NA NA
## streamSAVAGE_R 6.6901 NA NA NA
## streamSECOND_MINE_BR 4.2685 NA NA NA
## streamSENECA_CR 1.1903 NA NA NA
## streamSOUTH_BR_CASSELMAN_R -2.1913 NA NA NA
## streamSOUTH_BR_PATAPSCO 1.4935 NA NA NA
## streamSOUTH_FORK_LINGANORE_CR 0.8666 NA NA NA
## streamTUSCARORA_CR 2.5589 NA NA NA
## streamWATTS_BR 0.7533 NA NA NA
## acreage NA NA NA NA
## dO2 NA NA NA NA
## maxdepth NA NA NA NA
## no3 NA NA NA NA
## so4 NA NA NA NA
## temp NA NA NA NA
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 67 and 0 DF, p-value: NA
But you need to make sure you want to include all the variables. In this case, we probably wanted to exclude the stream
variable
m <- lm(sqrt(count) ~ ., data = longnosedace %>% select(-stream))
summary(m)
##
## Call:
## lm(formula = sqrt(count) ~ ., data = longnosedace %>% select(-stream))
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3635 -1.9071 -0.3538 1.3078 8.2489
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.259e+00 4.084e+00 -2.267 0.02694 *
## acreage 1.428e-04 4.153e-05 3.440 0.00106 **
## dO2 7.157e-01 3.311e-01 2.162 0.03456 *
## maxdepth 2.277e-02 1.097e-02 2.075 0.04223 *
## no3 5.580e-01 1.787e-01 3.123 0.00274 **
## so4 7.571e-03 4.756e-02 0.159 0.87406
## temp 2.165e-01 1.042e-01 2.077 0.04198 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.524 on 61 degrees of freedom
## Multiple R-squared: 0.4026, Adjusted R-squared: 0.3438
## F-statistic: 6.852 on 6 and 61 DF, p-value: 1.367e-05
or
m <- lm(sqrt(count) ~ .-stream, data = longnosedace)
summary(m)
##
## Call:
## lm(formula = sqrt(count) ~ . - stream, data = longnosedace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3635 -1.9071 -0.3538 1.3078 8.2489
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.259e+00 4.084e+00 -2.267 0.02694 *
## acreage 1.428e-04 4.153e-05 3.440 0.00106 **
## dO2 7.157e-01 3.311e-01 2.162 0.03456 *
## maxdepth 2.277e-02 1.097e-02 2.075 0.04223 *
## no3 5.580e-01 1.787e-01 3.123 0.00274 **
## so4 7.571e-03 4.756e-02 0.159 0.87406
## temp 2.165e-01 1.042e-01 2.077 0.04198 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.524 on 61 degrees of freedom
## Multiple R-squared: 0.4026, Adjusted R-squared: 0.3438
## F-statistic: 6.852 on 6 and 61 DF, p-value: 1.367e-05
You can also use this technique to eliminate the intercept and perform “regression through the origin”.
m <- lm(sqrt(count) ~ no3 + acreage - 1, data = longnosedace)
summary(m)
##
## Call:
## lm(formula = sqrt(count) ~ no3 + acreage - 1, data = longnosedace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9745 -1.3463 0.5319 2.3545 9.5786
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## no3 1.143e+00 1.273e-01 8.976 4.80e-13 ***
## acreage 2.431e-04 3.806e-05 6.386 1.97e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.849 on 66 degrees of freedom
## Multiple R-squared: 0.797, Adjusted R-squared: 0.7909
## F-statistic: 129.6 on 2 and 66 DF, p-value: < 2.2e-16
This is not a common model but there may be a time when you want to force the intercept to be 0.
Interactions can be added to a model by using a colon (:) between two explanatory variables, e.g.
m1 <- lm(sqrt(count) ~ no3 + maxdepth + no3:maxdepth, data = longnosedace)
summary(m1)
##
## Call:
## lm(formula = sqrt(count) ~ no3 + maxdepth + no3:maxdepth, data = longnosedace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.369 -1.848 -0.567 1.251 7.213
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.017978 1.532259 1.970 0.0532 .
## no3 -0.006920 0.513095 -0.013 0.9893
## maxdepth 0.007891 0.021448 0.368 0.7142
## no3:maxdepth 0.009373 0.007372 1.272 0.2081
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.791 on 64 degrees of freedom
## Multiple R-squared: 0.2338, Adjusted R-squared: 0.1979
## F-statistic: 6.511 on 3 and 64 DF, p-value: 0.0006502
But because we have a convention of including main effects (lower order terms) whenever an interaction (higher order term) is included in the model, R provides a convenient way of including both the main effects as well as the interaction by using an asterisk (*).
m2 <- lm(sqrt(count) ~ no3*maxdepth, data = longnosedace)
summary(m2)
##
## Call:
## lm(formula = sqrt(count) ~ no3 * maxdepth, data = longnosedace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.369 -1.848 -0.567 1.251 7.213
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.017978 1.532259 1.970 0.0532 .
## no3 -0.006920 0.513095 -0.013 0.9893
## maxdepth 0.007891 0.021448 0.368 0.7142
## no3:maxdepth 0.009373 0.007372 1.272 0.2081
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.791 on 64 degrees of freedom
## Multiple R-squared: 0.2338, Adjusted R-squared: 0.1979
## F-statistic: 6.511 on 3 and 64 DF, p-value: 0.0006502
Alternatively, you can use the caret symbol (^) and specifying what order interactions you want. For example ^2
will include all two-way interactions.
m3 <- lm(sqrt(count) ~ (no3 + maxdepth)^2, data = longnosedace)
summary(m3)
##
## Call:
## lm(formula = sqrt(count) ~ (no3 + maxdepth)^2, data = longnosedace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.369 -1.848 -0.567 1.251 7.213
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.017978 1.532259 1.970 0.0532 .
## no3 -0.006920 0.513095 -0.013 0.9893
## maxdepth 0.007891 0.021448 0.368 0.7142
## no3:maxdepth 0.009373 0.007372 1.272 0.2081
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.791 on 64 degrees of freedom
## Multiple R-squared: 0.2338, Adjusted R-squared: 0.1979
## F-statistic: 6.511 on 3 and 64 DF, p-value: 0.0006502
If you want to include more terms, you have a couple of options, e.g.
m4 <- lm(sqrt(count) ~ (no3 + maxdepth + acreage)^2, data = longnosedace) # main effects and two-way interactions
summary(m4)
##
## Call:
## lm(formula = sqrt(count) ~ (no3 + maxdepth + acreage)^2, data = longnosedace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.9715 -1.7287 -0.6365 1.3585 7.4738
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.771e-01 1.891e+00 0.517 0.6072
## no3 3.596e-01 4.977e-01 0.723 0.4727
## maxdepth 1.277e-02 2.575e-02 0.496 0.6218
## acreage 2.893e-04 1.585e-04 1.825 0.0729 .
## no3:maxdepth 7.511e-03 7.100e-03 1.058 0.2943
## no3:acreage -2.060e-05 2.156e-05 -0.956 0.3431
## maxdepth:acreage -1.200e-06 1.859e-06 -0.646 0.5210
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.613 on 61 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.2968
## F-statistic: 5.714 on 6 and 61 DF, p-value: 9.178e-05
m5 <- lm(sqrt(count) ~ (no3 + maxdepth + acreage)^3, data = longnosedace) # main effects, two-way interactions, and three-way interaction
summary(m5)
##
## Call:
## lm(formula = sqrt(count) ~ (no3 + maxdepth + acreage)^3, data = longnosedace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1123 -1.5007 -0.3734 1.1348 7.5601
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.540e+00 2.492e+00 1.822 0.0734 .
## no3 -9.005e-01 7.668e-01 -1.174 0.2449
## maxdepth -4.593e-02 3.735e-02 -1.230 0.2236
## acreage -2.026e-04 2.787e-04 -0.727 0.4700
## no3:maxdepth 2.856e-02 1.210e-02 2.361 0.0215 *
## no3:acreage 1.679e-04 9.141e-05 1.837 0.0712 .
## maxdepth:acreage 6.495e-06 4.057e-06 1.601 0.1147
## no3:maxdepth:acreage -2.895e-06 1.366e-06 -2.119 0.0383 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.541 on 60 degrees of freedom
## Multiple R-squared: 0.4044, Adjusted R-squared: 0.3349
## F-statistic: 5.819 on 7 and 60 DF, p-value: 3.555e-05
m5 <- lm(sqrt(count) ~ no3 * maxdepth * acreage, data = longnosedace) # main effects, two-way interactions, and three-way interaction
summary(m5)
##
## Call:
## lm(formula = sqrt(count) ~ no3 * maxdepth * acreage, data = longnosedace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1123 -1.5007 -0.3734 1.1348 7.5601
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.540e+00 2.492e+00 1.822 0.0734 .
## no3 -9.005e-01 7.668e-01 -1.174 0.2449
## maxdepth -4.593e-02 3.735e-02 -1.230 0.2236
## acreage -2.026e-04 2.787e-04 -0.727 0.4700
## no3:maxdepth 2.856e-02 1.210e-02 2.361 0.0215 *
## no3:acreage 1.679e-04 9.141e-05 1.837 0.0712 .
## maxdepth:acreage 6.495e-06 4.057e-06 1.601 0.1147
## no3:maxdepth:acreage -2.895e-06 1.366e-06 -2.119 0.0383 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.541 on 60 degrees of freedom
## Multiple R-squared: 0.4044, Adjusted R-squared: 0.3349
## F-statistic: 5.819 on 7 and 60 DF, p-value: 3.555e-05
We can combine the .
with this notation for higher order terms.
m <- lm(sqrt(count) ~ .^2, data = longnosedace %>% select(-stream))
summary(m)
##
## Call:
## lm(formula = sqrt(count) ~ .^2, data = longnosedace %>% select(-stream))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.6901 -1.5280 -0.2438 1.4632 6.0108
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.920e+01 2.715e+01 -0.707 0.48300
## acreage -5.081e-05 8.671e-04 -0.059 0.95352
## dO2 2.193e+00 2.572e+00 0.853 0.39825
## maxdepth 7.802e-02 1.856e-01 0.420 0.67626
## no3 7.466e-01 4.068e+00 0.184 0.85520
## so4 -2.789e-01 9.418e-01 -0.296 0.76846
## temp 4.744e-01 1.062e+00 0.447 0.65715
## acreage:dO2 -2.020e-05 6.624e-05 -0.305 0.76172
## acreage:maxdepth 1.063e-06 2.399e-06 0.443 0.65993
## acreage:no3 -1.860e-05 3.016e-05 -0.617 0.54057
## acreage:so4 1.568e-05 1.315e-05 1.192 0.23931
## acreage:temp 7.263e-06 1.697e-05 0.428 0.67071
## dO2:maxdepth -1.478e-02 1.478e-02 -1.000 0.32248
## dO2:no3 -5.095e-02 2.867e-01 -0.178 0.85974
## dO2:so4 1.021e-01 8.758e-02 1.166 0.24966
## dO2:temp -6.416e-02 1.034e-01 -0.621 0.53785
## maxdepth:no3 6.118e-03 9.920e-03 0.617 0.54046
## maxdepth:so4 -7.029e-03 2.448e-03 -2.872 0.00615 **
## maxdepth:temp 6.953e-03 6.417e-03 1.084 0.28422
## no3:so4 6.959e-03 3.333e-02 0.209 0.83553
## no3:temp -7.861e-03 9.412e-02 -0.084 0.93380
## so4:temp -1.306e-02 2.748e-02 -0.475 0.63673
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.533 on 46 degrees of freedom
## Multiple R-squared: 0.5462, Adjusted R-squared: 0.3391
## F-statistic: 2.637 on 21 and 46 DF, p-value: 0.003049
Another approach that might be useful is to construct a model and then add or remove variables from that model.
m <- lm(sqrt(count) ~ no3, data = longnosedace)
summary(m)
##
## Call:
## lm(formula = sqrt(count) ~ no3, data = longnosedace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3751 -1.9991 -0.1519 0.9806 9.6578
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7547 0.6335 5.927 1.24e-07 ***
## no3 0.6185 0.1963 3.150 0.00245 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.927 on 66 degrees of freedom
## Multiple R-squared: 0.1307, Adjusted R-squared: 0.1175
## F-statistic: 9.924 on 1 and 66 DF, p-value: 0.002452
mA <- update(m, ~ . + acreage)
summary(mA)
##
## Call:
## lm(formula = sqrt(count) ~ no3 + acreage, data = longnosedace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.9154 -1.8807 -0.3148 1.3880 9.4343
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3053896 0.6785796 3.397 0.001166 **
## no3 0.6890353 0.1782946 3.865 0.000259 ***
## acreage 0.0001667 0.0000419 3.978 0.000178 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.645 on 65 degrees of freedom
## Multiple R-squared: 0.3009, Adjusted R-squared: 0.2794
## F-statistic: 13.99 on 2 and 65 DF, p-value: 8.872e-06
mR <- update(mA, ~ . - no3)
summary(mR)
##
## Call:
## lm(formula = sqrt(count) ~ acreage, data = longnosedace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8406 -2.0205 -0.2371 1.5548 10.3053
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.269e+00 4.951e-01 8.622 2.04e-12 ***
## acreage 1.505e-04 4.588e-05 3.281 0.00165 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.911 on 66 degrees of freedom
## Multiple R-squared: 0.1402, Adjusted R-squared: 0.1272
## F-statistic: 10.77 on 1 and 66 DF, p-value: 0.001654
You can find out more information by looking at
?formula
Practice these skills by taking a look at the data sets in the Sleuth3
package.
help(package="Sleuth3")
The data sets are organized by
The Statistical Sleuth 3rd ed by Ramsey and Schafer begins discussing regression in Chapter 7 (although ANOVA is discussed in Chapter 5). Thus, data sets in Chapters 5 through 14 are relevant for exploration. When you get a new data set ask yourself these questions in this order