Mar 27, 2025
Project presentations in the March 31 lab
Statistics experience due April 15
Comparing logistic regression models using:

- Drop-in-deviance test
- AIC
- BIC
This data set is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. We want to examine the relationship between various health characteristics and the risk of having heart disease.
high_risk: 1 = high risk of heart disease (developed coronary heart disease within ten years), 0 = not high risk
age: Age at exam time (in years)
education: 1 = Some High School, 2 = High School or GED, 3 = Some College or Vocational School, 4 = College
currentSmoker: 0 = nonsmoker, 1 = smoker
totChol: Total cholesterol (in mg/dL)
# A tibble: 4,086 × 6
age education TenYearCHD totChol currentSmoker high_risk
<dbl> <fct> <dbl> <dbl> <fct> <fct>
1 39 4 0 195 0 0
2 46 2 0 250 0 0
3 48 1 0 245 1 0
4 61 3 1 225 1 1
5 46 3 0 285 1 0
6 43 2 0 228 0 0
7 63 1 1 205 0 1
8 45 2 0 313 1 0
9 52 1 0 260 0 0
10 43 1 0 225 1 0
# ℹ 4,076 more rows
Using age, totChol, and currentSmoker
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | -6.673 | 0.378 | -17.647 | 0.000 | -7.423 | -5.940 |
| age | 0.082 | 0.006 | 14.344 | 0.000 | 0.071 | 0.094 |
| totChol | 0.002 | 0.001 | 1.940 | 0.052 | 0.000 | 0.004 |
| currentSmoker1 | 0.443 | 0.094 | 4.733 | 0.000 | 0.260 | 0.627 |
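A table like the one above comes from fitting a logistic regression of high_risk on these three predictors. As a minimal sketch, here is the same kind of fit with base R's glm() on simulated data (the data frame name and all simulated values are assumptions for illustration, not the actual Framingham data):

```r
# Sketch: fit a logistic regression of the same form on simulated data
set.seed(42)
heart_disease <- data.frame(
  age           = rnorm(500, mean = 50, sd = 8),
  totChol       = rnorm(500, mean = 240, sd = 45),
  currentSmoker = factor(rbinom(500, 1, 0.5))
)
# Simulate the outcome from a model of this form (coefficients are made up)
eta <- -6.7 + 0.08 * heart_disease$age + 0.002 * heart_disease$totChol +
  0.44 * (heart_disease$currentSmoker == "1")
heart_disease$high_risk <- factor(rbinom(500, 1, plogis(eta)))

high_risk_fit <- glm(high_risk ~ age + totChol + currentSmoker,
                     data = heart_disease, family = binomial)
summary(high_risk_fit)$coefficients  # estimate, std. error, z statistic, p-value
```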
high_risk_aug <- augment(high_risk_fit)
roc_curve_data <- high_risk_aug |>
roc_curve(high_risk, .fitted, event_level = "second")
# calculate AUC
high_risk_aug |>
  roc_auc(high_risk, .fitted, event_level = "second")

# A tibble: 1 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 roc_auc binary 0.697
We will use a threshold of 0.2 to classify observations
Compute the misclassification rate.
Compute sensitivity and explain what it means in the context of the data.
Compute specificity and explain what it means in the context of the data.
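As a sketch of these three computations at the 0.2 threshold, using made-up probabilities (in the real analysis you would threshold the predicted probabilities from high_risk_aug, converting .fitted from the log-odds scale with plogis() if needed):

```r
# Toy predicted probabilities and true outcomes (made up for illustration)
prob  <- c(0.10, 0.25, 0.05, 0.40, 0.15, 0.30, 0.08, 0.22)
truth <- c(0,    1,    0,    1,    0,    0,    0,    1)

pred <- as.integer(prob >= 0.2)  # classify as high risk if probability >= 0.2

misclassification <- mean(pred != truth)                       # fraction wrong
sensitivity <- sum(pred == 1 & truth == 1) / sum(truth == 1)   # P(predict 1 | truly 1)
specificity <- sum(pred == 0 & truth == 0) / sum(truth == 0)   # P(predict 0 | truly 0)

c(misclassification = misclassification,
  sensitivity = sensitivity,
  specificity = specificity)
# misclassification = 0.125, sensitivity = 1, specificity = 0.8
```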
Reduced model:

| term | estimate |
|---|---|
| (Intercept) | -6.673 |
| age | 0.082 |
| totChol | 0.002 |
| currentSmoker1 | 0.443 |
Full model (adds education):

| term | estimate |
|---|---|
| (Intercept) | -6.456 |
| age | 0.080 |
| totChol | 0.002 |
| currentSmoker1 | 0.445 |
| education2 | -0.270 |
| education3 | -0.232 |
| education4 | -0.035 |
\[ \begin{aligned} \log L&(\hat{p}_i|x_1, \ldots, x_n, y_1, \dots, y_n) \\ &= \sum\limits_{i=1}^n[y_i \log(\hat{p}_i) + (1 - y_i)\log(1 - \hat{p}_i)] \end{aligned} \]
Measure of how well the model fits the data
Higher values of \(\log L\) are better
Deviance = \(-2 \log L\)
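A quick check of these definitions: for a 0/1 response, the residual deviance reported by glm() equals \(-2\log L\) computed directly from the fitted probabilities. A sketch on toy data:

```r
# Toy data: hand-compute log L and compare to glm()'s reported deviance
d <- data.frame(x = 1:6, y = c(0, 0, 1, 0, 1, 1))
fit <- glm(y ~ x, data = d, family = binomial)

p_hat <- fitted(fit)  # predicted probabilities p-hat_i
logL  <- sum(d$y * log(p_hat) + (1 - d$y) * log(1 - p_hat))

all.equal(-2 * logL, deviance(fit))  # TRUE: deviance = -2 log L
```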
Suppose there are two nested models: a reduced model with predictors \(x_1, \ldots, x_q\) and a full model with predictors \(x_1, \ldots, x_q, x_{q+1}, \ldots, x_p\).
We want to test the hypotheses
\[ \begin{aligned} H_0&: \beta_{q+1} = \dots = \beta_p = 0 \\ H_a&: \text{ at least one }\beta_j \text{ is not } 0 \end{aligned} \]
To do so, we will use the Drop-in-deviance test, also known as the Nested Likelihood Ratio test
Hypotheses:
\[ \begin{aligned} H_0&: \beta_{q+1} = \dots = \beta_p = 0 \\ H_a&: \text{ at least one }\beta_j \text{ is not } 0 \end{aligned} \]
Test Statistic: \[\begin{aligned}G &= \text{Deviance}_{reduced} - \text{Deviance}_{full} \\ &= (-2 \log L_{reduced}) - (-2 \log L_{full})\end{aligned}\]
P-value: \(P(\chi^2 > G)\), calculated using a \(\chi^2\) distribution with degrees of freedom equal to the difference in the number of parameters in the full and reduced models
Should we add education to the model? First model, the reduced model:
Calculate the deviance for each model:
[1] 3224.812
[1] 3217.6
Calculate the p-value using pchisq(), with degrees of freedom equal to the number of new model terms in the second model:
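Using the deviances shown above (reduced: 3224.812, full: 3217.6) and df = 3 for the three new education indicator terms, the test works out to:

```r
# Drop-in-deviance test by hand, using the deviances reported above
dev_reduced <- 3224.812
dev_full    <- 3217.6

G <- dev_reduced - dev_full            # test statistic: 7.212
p_value <- pchisq(G, df = 3, lower.tail = FALSE)
p_value                                # approximately 0.065
```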
What is your conclusion?
We can use the anova() function to conduct this test
Add test = "Chisq" to conduct the drop-in-deviance test
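A sketch of that call on toy fits (the data here are simulated, since the course data frame isn't reproduced in these notes):

```r
# Drop-in-deviance test via anova(), on simulated data for illustration
set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- rbinom(200, 1, plogis(0.5 * d$x1))

reduced <- glm(y ~ x1,      data = d, family = binomial)
full    <- glm(y ~ x1 + x2, data = d, family = binomial)

anova(reduced, full, test = "Chisq")  # compares deviances via a chi-squared test
```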
Estimators of prediction error and relative quality of models:
Akaike’s Information Criterion (AIC): \[AIC = -2\log L + 2(p+1)\]
Schwarz’s Bayesian Information Criterion (BIC): \[ BIC = -2\log L + \log(n)\times(p+1)\]
\[ \begin{aligned} & AIC = \color{blue}{-2\log L} \color{black}{+ 2(p+1)} \\ & BIC = \color{blue}{-2\log L} + \color{black}{\log(n)\times(p+1)} \end{aligned} \]
First term: decreases as \(p\) increases (adding predictors can only improve the fit, so \(-2\log L\) goes down)
\[ \begin{aligned} & AIC = -2\log L + \color{blue}{2(p+1)} \\ & BIC = -2\log L + \color{blue}{\log(n)\times(p+1)} \end{aligned} \]
Second term: increases as \(p\) increases (the penalty for model size)
\[ \begin{aligned} & AIC = -2\log L + \color{red}{2(p+1)} \\ & BIC = -2 \log L + \color{red}{\log(n)\times(p+1)} \end{aligned} \]
Choose model with the smaller value of AIC or BIC
If \(n \geq 8\), the penalty for BIC is larger than that of AIC, so BIC tends to favor more parsimonious models (i.e. models with fewer terms)
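These formulas can be checked against base R's AIC() and BIC() functions; a sketch on toy data (glm() counts the \(p+1\) coefficients automatically):

```r
# Verify AIC = -2 log L + 2(p+1) and BIC = -2 log L + log(n)(p+1)
set.seed(3)
n <- 50
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(d$x))
fit <- glm(y ~ x, data = d, family = binomial)

logL <- as.numeric(logLik(fit))
k <- length(coef(fit))            # p + 1 = 2 (intercept + slope)

all.equal(-2 * logL + 2 * k,      AIC(fit))  # TRUE
all.equal(-2 * logL + log(n) * k, BIC(fit))  # TRUE
```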
We can obtain AIC and BIC with the glance() function. Let’s look at the AIC for the model that includes age, totChol, and currentSmoker
Let’s compare the full and reduced models using AIC.
Based on AIC, which model would you choose?
Let’s compare the full and reduced models using BIC.
Based on BIC, which model would you choose?
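As a sketch of such a comparison, here are AIC() and BIC() on toy reduced and full fits (the slides do this with glance(), which reports the same AIC and BIC columns); smaller is better in both cases:

```r
# Compare a reduced and a full model by AIC and BIC (simulated data)
set.seed(4)
d <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
d$y <- rbinom(300, 1, plogis(0.6 * d$x1))   # x2 is pure noise

reduced <- glm(y ~ x1,      data = d, family = binomial)
full    <- glm(y ~ x1 + x2, data = d, family = binomial)

c(AIC_reduced = AIC(reduced), AIC_full = AIC(full))
c(BIC_reduced = BIC(reduced), BIC_full = BIC(full))
# With a noise predictor, both criteria will typically favor the reduced model
```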