Model evaluation

Author

Prof. Maria Tackett

Published

Feb 04, 2025

Announcements

  • Lab 02 due Thursday at 11:59pm

  • HW 02 released later today

  • Click here to learn more about the Academic Resource Center

  • Statistics experience due Tuesday, April 15

Office hours poll

🔗 https://forms.office.com/r/QTxJvhGSFF

Topics

  • Recap of MLR and predictor types
  • Interaction terms
  • ANOVA and sum of squares
  • \(R^2\) and RMSE

Computational setup

# load packages
library(tidyverse)
library(tidymodels)
library(patchwork)
library(knitr)
library(kableExtra)
library(colorblindr)

# set default theme and larger font size for ggplot2
ggplot2::theme_set(ggplot2::theme_bw())

Recap: Multiple linear regression

Data: Palmer penguins

The penguins data set contains data for penguins found on three islands in the Palmer Archipelago, Antarctica. Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network. These data can be found in the palmerpenguins R package.

. . .

# A tibble: 342 × 4
   body_mass_g flipper_length_mm bill_length_mm species
         <int>             <int>          <dbl> <fct>  
 1        3750               181           39.1 Adelie 
 2        3800               186           39.5 Adelie 
 3        3250               195           40.3 Adelie 
 4        3450               193           36.7 Adelie 
 5        3650               190           39.3 Adelie 
 6        3625               181           38.9 Adelie 
 7        4675               195           39.2 Adelie 
 8        3475               193           34.1 Adelie 
 9        4250               190           42   Adelie 
10        3300               186           37.8 Adelie 
# ℹ 332 more rows

Variables

Predictors:

  • bill_length_mm: Bill length in millimeters
  • flipper_length_mm: Flipper length in millimeters
  • species: Adelie, Gentoo, or Chinstrap species

Response: body_mass_g: Body mass in grams


The goal of this analysis is to use the bill length, flipper length, and species to predict body mass.

Response vs. predictors

Model fit

penguin_fit <- lm(body_mass_g ~ flipper_length_mm + species + 
                bill_length_mm, data = penguins)

tidy(penguin_fit) |>
  kable(digits = 3)
term estimate std.error statistic p.value
(Intercept) -3904.387 529.257 -7.377 0.000
flipper_length_mm 27.429 3.176 8.638 0.000
speciesChinstrap -748.562 81.534 -9.181 0.000
speciesGentoo 90.435 88.647 1.020 0.308
bill_length_mm 61.736 7.126 8.664 0.000

Interpreting \(\hat{\beta}_j\)

  • The estimated coefficient \(\hat{\beta}_j\) is the expected change in the mean of \(Y\) when \(X_j\) increases by one unit, holding the values of all other predictor variables constant.

. . .

  • Example: The estimated coefficient for flipper_length_mm is 27.429. This means for each additional millimeter in a penguin’s flipper length, its body mass is expected to be greater by 27.429 grams, on average, holding species and bill length constant.

Indicator variables

term estimate std.error statistic p.value conf.low conf.high
(Intercept) -3904.387 529.257 -7.377 0.000 -4945.450 -2863.324
flipper_length_mm 27.429 3.176 8.638 0.000 21.182 33.675
speciesChinstrap -748.562 81.534 -9.181 0.000 -908.943 -588.182
speciesGentoo 90.435 88.647 1.020 0.308 -83.937 264.807
bill_length_mm 61.736 7.126 8.664 0.000 47.720 75.753


Interpret the coefficient of Gentoo in the context of the data.

Centering

  • Centering a quantitative predictor means shifting every value by some constant \(C\)

\[ X_{cent} = X - C \]

  • One common type of centering is mean-centering, in which every value of a predictor is shifted by its mean

  • Only quantitative predictors are centered

  • Center all quantitative predictors in the model for ease of interpretation

How does centering change the model and/or interpretations?

Standardizing

  • Standardizing a quantitative predictor mean shifting every value by the mean and dividing by the standard deviation of that variable

\[ X_{std} = \frac{X - \bar{X}}{S_X} \]

  • Only quantitative predictors are standardized

  • Standardize all quantitative predictors in the model for ease of interpretation

How does standardizing change the model and/or interpretations?

Interaction terms

Interaction terms

  • Sometimes the relationship between a predictor variable and the response depends on the value of another predictor variable.
  • This is an interaction effect.
  • To account for this, we can include interaction terms in the model.

Bill length versus species

If the lines are not parallel, there is indication of a potential interaction effect, i.e., the slope of bill length may differ based on the species.

Interaction term in model

penguin_int_fit <- lm(body_mass_g ~ flipper_length_mm + species + bill_length_mm + species * bill_length_mm,
      data = penguins)
term estimate std.error statistic p.value
(Intercept) -4297.905 645.054 -6.663 0.000
flipper_length_mm 27.263 3.175 8.586 0.000
speciesChinstrap 1146.287 726.217 1.578 0.115
speciesGentoo 54.716 619.934 0.088 0.930
bill_length_mm 72.692 10.642 6.831 0.000
speciesChinstrap:bill_length_mm -41.035 16.104 -2.548 0.011
speciesGentoo:bill_length_mm -1.163 14.436 -0.081 0.936

Interaction terms in the model

term estimate std.error statistic p.value
(Intercept) -4297.905 645.054 -6.663 0.000
flipper_length_mm 27.263 3.175 8.586 0.000
speciesChinstrap 1146.287 726.217 1.578 0.115
speciesGentoo 54.716 619.934 0.088 0.930
bill_length_mm 72.692 10.642 6.831 0.000
speciesChinstrap:bill_length_mm -41.035 16.104 -2.548 0.011
speciesGentoo:bill_length_mm -1.163 14.436 -0.081 0.936


  • Write the model equation for penguins in the Adelie species.
  • Write the model equation for penguins in the Chinstrap species.

Interpreting interaction terms

  • What the interaction means: The effect of bill length on the body mass is 41.035 less when the penguin is from the Chinstrap species compared to the effect for the Adelie species, holding all else constant.
  • Interpreting bill_length_mm for Chinstrap: For each additional millimeter in bill length, we expect the body mass of Chinstrap penguins to increase by 31.657 grams (72.692 - 41.035), holding all else constant.

Summary


In general, how do

  • indicators for categorical predictors impact the model equation?

  • interaction terms impact the model equation?

Model evaluation

Data: Restaurant tips

Which variables help us predict the amount customers tip at a restaurant? To answer this question, we will use data collected in 2011 by a student at St. Olaf who worked at a local restaurant.

# A tibble: 8 × 4
    Tip Party Meal   Age   
  <dbl> <dbl> <chr>  <chr> 
1  2.99     1 Dinner Yadult
2  2        1 Dinner Yadult
3  5        1 Dinner SenCit
4  4        3 Dinner Middle
5 10.3      2 Dinner SenCit
6  4.85     2 Dinner Middle
7  5        4 Dinner Yadult
8  4        3 Dinner Middle

Dahlquist, Samantha, and Jin Dong. 2011. “The Effects of Credit Cards on Tipping.” Project for Statistics 212-Statistics for the Sciences, St. Olaf College

Variables

Predictors:

  • Party: Number of people in the party
  • Meal: Time of day (Lunch, Dinner, Late Night)
  • Age: Age category of person paying the bill (Yadult, Middle, SenCit)

Response: Tip: Amount of tip in US dollars

Response: Tip

Min Q1 Median Q3 Max Mean SD
0 3 4.5 6 19.46 4.98 3.37

Predictors

Relevel categorical predictors

tips <- tips |>
  mutate(
    Meal = fct_relevel(Meal, "Lunch", "Dinner", "Late Night"),
    Age  = fct_relevel(Age, "Yadult", "Middle", "SenCit")
  )

Predictors, again

Response vs. predictors

Fit and summarize model

tip_fit <- lm(Tip ~ Party + Age, data = tips)

tidy(tip_fit) |>
  kable(digits = 3)
term estimate std.error statistic p.value
(Intercept) -0.170 0.366 -0.465 0.643
Party 1.837 0.124 14.758 0.000
AgeMiddle 1.009 0.408 2.475 0.014
AgeSenCit 1.388 0.485 2.862 0.005

. . .


Overall, how well does this model help us understand variability n tips?

Two statistics

  • Root mean square error, RMSE: A measure of the average error (average difference between observed and predicted values of the outcome)

  • R-squared, \(R^2\) : Percentage of variability in the outcome explained by the regression model

What indicates a good model fit? Higher or lower RMSE? Higher or lower \(R^2\)?

RMSE

\[ RMSE = \sqrt{\frac{\sum_{i=1}^n(y_i - \hat{y}_i)^2}{n}} = \sqrt{\frac{\sum_{i=1}^ne_i^2}{n}} \]

. . .

  • Ranges between 0 (perfect predictor) and infinity (terrible predictor)

  • Same units as the response variable

  • The value of RMSE is more useful for comparing across models than evaluating a single model

Compute RMSE: Augmented data frame

Use the augment() function from the broom package to add columns for predicted values, residuals, and other observation-level model statistics

tip_aug <- augment(tip_fit)

. . .

# A tibble: 169 × 9
     Tip Party Age    .fitted .resid   .hat .sigma   .cooksd .std.resid
   <dbl> <dbl> <fct>    <dbl>  <dbl>  <dbl>  <dbl>     <dbl>      <dbl>
 1  2.99     1 Yadult    1.67  1.32  0.0247   2.04 0.00274        0.657
 2  2        1 Yadult    1.67  0.333 0.0247   2.05 0.000173       0.165
 3  5        1 SenCit    3.05  1.95  0.0371   2.04 0.00911        0.972
 4  4        3 Middle    6.35 -2.35  0.0111   2.04 0.00376       -1.16 
 5 10.3      2 SenCit    4.89  5.45  0.0301   2.00 0.0571         2.71 
 6  4.85     2 Middle    4.51  0.337 0.0126   2.05 0.0000881      0.166
 7  5        4 Yadult    7.18 -2.18  0.0471   2.04 0.0148        -1.09 
 8  4        3 Middle    6.35 -2.35  0.0111   2.04 0.00376       -1.16 
 9  5        2 Middle    4.51  0.487 0.0126   2.05 0.000184       0.240
10  1.58     1 SenCit    3.05 -1.47  0.0371   2.04 0.00524       -0.737
# ℹ 159 more rows

Finding RMSE in R

Use the rmse() function from the yardstick package (part of tidymodels)

rmse(tip_aug, truth = Tip, estimate = .fitted)
# A tibble: 1 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard        2.02


. . .

Is the model a good fit for the data? What information do you need to make this determination?

Analysis of variance (ANOVA)

Analysis of variance (ANOVA)

Analysis of Variance (ANOVA): Technique to partition variability in Y by the sources of variability

ANOVA

  • Main Idea: Decompose the total variation in the response into
    • the variation that can be explained by the each of the variables in the model

    • the variation that can’t be explained by the model (left in the residuals)

  • If the variation that can be explained by the variables in the model is greater than the variation in the residuals, this signals that the model might be “valuable” (at least one of the \(\beta\)’s not equal to 0)

Sum of Squares


\[ \begin{aligned} \color{#407E99}{SST} \hspace{5mm}&= &\color{#993399}{SSM} &\hspace{5mm} + &\color{#8BB174}{SSR} \\[10pt] \color{#407E99}{\sum_{i=1}^n(y_i - \bar{y})^2} \hspace{5mm}&= &\color{#993399}{\sum_{i = 1}^{n}(\hat{y}_i - \bar{y})^2} &\hspace{5mm}+ &\color{#8BB174}{\sum_{i = 1}^{n}(y_i - \hat{y}_i)^2} \end{aligned} \]

Click here to see why this equality holds.

ANOVA output in R1

anova(tip_fit) |>
  tidy() |>
  kable(digits = 2)
term df sumsq meansq statistic p.value
Party 1 1188.64 1188.64 285.71 0.00
Age 2 38.03 19.01 4.57 0.01
Residuals 165 686.44 4.16 NA NA

ANOVA output, with totals

term df sumsq meansq statistic p.value
Party 1 1188.64 1188.64 285.71 0
Age 2 38.03 19.01 4.57 0.01
Residuals 165 686.44 4.16
Total 168 1913.11

Sum of squares

term df sumsq
Party 1 1188.64
Age 2 38.03
Residuals 165 686.44
Total 168 1913.11
  • \(SST\): Sum of squares total, variability of the response, \(\sum_{i = 1}^n (y_i - \bar{y})^2\)
  • \(SSR\): Sum of squares residuals, variability of residuals, \(\sum_{i = 1}^n (y_i - \hat{y}_i)^2\)
  • \(SSM = SST - SSR\): Sum of squares model, variability explained by the model

Sum of squares: \(SST\)

term df sumsq
Party 1 1188.64
Age 2 38.03
Residuals 165 686.44
Total 168 1913.11


\(SST\): Sum of squares total, variability of the response


\(\sum_{i = 1}^n (y_i - \bar{y})^2\) = 1913.11

Sum of squares: \(SSR\)

term df sumsq
Party 1 1188.64
Age 2 38.03
Residuals 165 686.44
Total 168 1913.11


\(SSR\): Sum of squares residuals, variability of residuals


\(\sum_{i = 1}^n (y_i - \hat{y}_i)^2\) = 686.44

Sum of squares: \(SSM\)

term df sumsq
Party 1 1188.64
Age 2 38.03
Residuals 165 686.44
Total 168 1913.11


\(SSM\): Sum of squares model, Variability explained by the model


\(SST - SSR\) = 1226.67

\(R^2\)

The coefficient of determination \(R^2\) is the proportion of variation in the response, \(Y\), that is explained by the regression model

. . .

\[ R^2 = \frac{SSM}{SST} = 1 - \frac{SSR}{SST} = 1 - \frac{686.44}{1913.11} = 0.641 \]

. . .

What is the range of \(R^2\)? Does \(R^2\) have units?

Interpreting \(R^2\)

glance(tip_fit)$r.squared
[1] 0.6411891

. . .

Select the best interpretation for \(R^2\) .

  1. Party and age correctly predicts 64.12% of tips.
  2. 64.12% of the variability in tips can be explained by party and age.
  3. 64.12% of the variability in party and age can be explained by tips.
  4. 64.12% of the time tips can be predicted by party and age.

Recap

  • Introduced interaction terms

  • Introduced ANOVA and sum of squares

  • Introduced \(R^2\) and RMSE

Next class

  • Model comparison

Footnotes

  1. Click here for explanation about the way R calculates sum of squares for each variable.↩︎