library(tidyverse)
library(tidymodels)
library(knitr)
AE 06: Model comparison
Restaurant tips
Render, commit, and push your responses to GitHub by the end of class.
Data
Which variables help us predict the amount customers tip at a restaurant? To answer this question, we will use data collected in 2011 by a student at St. Olaf who worked at a local restaurant.
The variables we’ll focus on for this analysis are:
- Tip: amount of the tip
- Party: number of people in the party
- Meal: Time of day (Lunch, Dinner, Late Night)
- Age: Age category of person paying the bill (Yadult, Middle, SenCit)
- Day: Day of the week (includes every day but Monday)
View the data set to see the remaining variables.
tips <- read_csv("data/tip-data.csv")
Exercise 1
Split the data into training (80%) and testing (20%) sets. Use seed 2025.
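One way to make the split is with `initial_split()` from rsample, which is loaded with tidymodels. The sketch below runs on a small simulated stand-in, `tips_demo` (made-up values, not the restaurant data), so that it is self-contained. The object names `tips_split`, `tips_train`, and `tips_test` are illustrative choices as well. In your own document you would pass `tips` instead.

```r
library(tidymodels)

# Simulated stand-in for the tips data (made-up values, for illustration only)
set.seed(1)
tips_demo <- tibble(
  Party = sample(1:6, 200, replace = TRUE),
  Tip   = 1 + 1.5 * Party + rnorm(200)
)

# Set the seed so the random split is reproducible,
# then keep 80% for training and hold out 20% for testing
set.seed(2025)
tips_split <- initial_split(tips_demo, prop = 0.8)
tips_train <- training(tips_split)
tips_test  <- testing(tips_split)

# Check the sizes of the two sets
nrow(tips_train)
nrow(tips_test)
```

Setting the seed immediately before `initial_split()` is what makes the same rows land in the training set each time the document is rendered.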
Exercise 2
Use the training data to fit a model using Party, Age, and Meal to predict tips. Compute \(R^2\) and adjusted \(R^2\) for this model.
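A minimal sketch of the fit-then-summarize pattern, assuming a parsnip `linear_reg()` model and `glance()` for the model-level statistics. The data frame `train_demo` is simulated (made-up values) and stands in for your training set. The names `tip_fit` and `fit_stats` are hypothetical.

```r
library(tidymodels)

# Simulated stand-in for the training data (made-up values, for illustration only)
set.seed(1)
n <- 200
train_demo <- tibble(
  Party = sample(1:6, n, replace = TRUE),
  Age   = sample(c("Yadult", "Middle", "SenCit"), n, replace = TRUE),
  Meal  = sample(c("Lunch", "Dinner", "Late Night"), n, replace = TRUE),
  Tip   = 1 + 1.5 * Party + rnorm(n)
)

# Main-effects model: Tip as a linear function of Party, Age, and Meal
tip_fit <- linear_reg() |>
  fit(Tip ~ Party + Age + Meal, data = train_demo)

# glance() returns one row of model-level summaries,
# including r.squared and adj.r.squared
fit_stats <- glance(tip_fit) |>
  select(r.squared, adj.r.squared)
fit_stats
```

Adjusted \(R^2\) is never larger than \(R^2\), because it applies a penalty for each predictor in the model.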
Exercise 3
Now fit a model predicting tips using Party, Age, and Meal, such that the effect of Party can differ by Meal. Compute \(R^2\) and adjusted \(R^2\) for this model.
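Letting the effect of one predictor differ by the level of another means adding an interaction term, written `Party:Meal` in an R formula. The sketch below shows this on simulated data (made-up values, not the restaurant data). The names `train_demo`, `tip_int_fit`, and `int_terms` are hypothetical.

```r
library(tidymodels)

# Simulated stand-in for the training data (made-up values, for illustration only).
# The slope for Party is made steeper at Dinner so there is an interaction to find.
set.seed(1)
n <- 200
train_demo <- tibble(
  Party = sample(1:6, n, replace = TRUE),
  Age   = sample(c("Yadult", "Middle", "SenCit"), n, replace = TRUE),
  Meal  = sample(c("Lunch", "Dinner", "Late Night"), n, replace = TRUE),
  Tip   = 1 + 1.5 * Party + 0.5 * Party * (Meal == "Dinner") + rnorm(n)
)

# Main effects plus a Party-by-Meal interaction:
# each level of Meal gets its own slope for Party
tip_int_fit <- linear_reg() |>
  fit(Tip ~ Party + Age + Meal + Party:Meal, data = train_demo)

# The interaction coefficients are the rows whose term starts with "Party:Meal"
int_terms <- tidy(tip_int_fit) |>
  filter(str_detect(term, "^Party:Meal"))
int_terms

# Model-level summaries for comparison with the main-effects model
glance(tip_int_fit) |>
  select(r.squared, adj.r.squared)
```

Each interaction coefficient is the difference in the Party slope between that level of Meal and the baseline level.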
Exercise 4
Which model do you choose - the model from Exercise 2 or Exercise 3? Why?
Compute RMSE for the selected model.
Exercise 5
Now let’s use the testing data to assess the performance of the model selected in Exercise 4.
Compute the predicted tips for the testing data. Add the predictions to the testing data set.
Compute RMSE and \(R^2\) for the testing data.
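The two steps above can be sketched as follows, assuming `predict()` for the predictions and the yardstick functions `rmse()` and `rsq()` for the metrics. Everything runs on simulated data (made-up values), and the main-effects model here is only a placeholder for whichever model you selected in Exercise 4. All object names are hypothetical.

```r
library(tidymodels)

# Simulated stand-in for the tips data (made-up values, for illustration only)
set.seed(1)
n <- 200
tips_demo <- tibble(
  Party = sample(1:6, n, replace = TRUE),
  Age   = sample(c("Yadult", "Middle", "SenCit"), n, replace = TRUE),
  Meal  = sample(c("Lunch", "Dinner", "Late Night"), n, replace = TRUE),
  Tip   = 1 + 1.5 * Party + rnorm(n)
)

set.seed(2025)
tips_split <- initial_split(tips_demo, prop = 0.8)
tips_train <- training(tips_split)
tips_test  <- testing(tips_split)

# Fit on the training data only
tip_fit <- linear_reg() |>
  fit(Tip ~ Party + Age + Meal, data = tips_train)

# predict() returns a tibble with a .pred column;
# bind it to the testing data so observed and predicted tips sit side by side
tips_test_aug <- predict(tip_fit, new_data = tips_test) |>
  bind_cols(tips_test)

# RMSE is in the units of Tip; rsq() is the squared correlation
# between observed and predicted values
test_rmse <- rmse(tips_test_aug, truth = Tip, estimate = .pred)
test_rsq  <- rsq(tips_test_aug, truth = Tip, estimate = .pred)
bind_rows(test_rmse, test_rsq)
```

Passing `tips_train` as `new_data` instead gives the training-data versions of the same metrics, which is what the comparison in the next exercise needs.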
# add code here
Exercise 6
How does RMSE compare between the training and testing data?
How does \(R^2\) compare between the training and testing data?
Is this what you expect? Why?
Exercise 7
Why can we use \(R^2\) as an assessment of performance on the testing data even if we can’t use it to compare models?
To submit the AE:
- Render the document to produce the PDF with all of your work from today’s class.
- Push all your work to your repo on GitHub. (You do not submit AEs on Gradescope.)