# load packages
library(tidyverse)
library(tidymodels)
library(openintro)
library(knitr)
library(kableExtra) # for table embellishments
library(Stat2Data) # for empirical logit
# set default theme and larger font size for ggplot2
ggplot2::theme_set(ggplot2::theme_bw(base_size = 20))

Logistic Regression: Model comparison + inference
Announcements
Next project milestone: Analysis and draft in April 14 lab
Team Feedback (email from TEAMMATES) due Tuesday, April 8 at 11:59pm (check email)
HW 04 due Tuesday, April 8 - released after class
Statistics experience due April 15
Topics
Comparing logistic regression models using
Drop-in-deviance test
AIC
BIC
Inference for a single model coefficient
Computational setup
Data: Risk of coronary heart disease
This data set is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. We want to examine the relationship between various health characteristics and the risk of having heart disease.
high_risk:
- 1: High risk of having heart disease in next 10 years
- 0: Not high risk of having heart disease in next 10 years

age: Age at exam time (in years)

education: 1 = Some High School, 2 = High School or GED, 3 = Some College or Vocational School, 4 = College

currentSmoker: 0 = nonsmoker, 1 = smoker

totChol: Total cholesterol (in mg/dL)
Review: Drop-in-deviance test
Which model do we choose?
Reduced model (without education):

| term | estimate |
|---|---|
| (Intercept) | -6.673 |
| age | 0.082 |
| totChol | 0.002 |
| currentSmoker1 | 0.443 |
Full model (with education):

| term | estimate |
|---|---|
| (Intercept) | -6.456 |
| age | 0.080 |
| totChol | 0.002 |
| currentSmoker1 | 0.445 |
| education2 | -0.270 |
| education3 | -0.232 |
| education4 | -0.035 |
We will use the Drop-in-deviance test to compare the two models
Drop-in-deviance test
Hypotheses
\[ \begin{aligned} H_0&: \beta_{ed2} = \beta_{ed3} = \beta_{ed4} = 0 \\ H_a&: \text{ at least one }\beta_j \text{ is not } 0 \end{aligned} \]
. . .
Test in R
## reduced and full models
high_risk_fit_reduced <- glm(high_risk ~ age + totChol + currentSmoker,
data = heart_disease, family = "binomial")
high_risk_fit_full <- glm(high_risk ~ age + totChol + currentSmoker +
education,
data = heart_disease, family = "binomial")
## drop-in-deviance test
anova(high_risk_fit_reduced, high_risk_fit_full, test = "Chisq") |>
tidy() |>
  kable(digits = 3)

Drop-in-deviance test
Test output:
| term | df.residual | residual.deviance | df | deviance | p.value |
|---|---|---|---|---|---|
| high_risk ~ age + totChol + currentSmoker | 4082 | 3224.812 | NA | NA | NA |
| high_risk ~ age + totChol + currentSmoker + education | 4079 | 3217.600 | 3 | 7.212 | 0.065 |
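The p-value in this output can also be computed directly: the drop-in-deviance statistic is compared to a \(\chi^2\) distribution with degrees of freedom equal to the number of coefficients being tested (here, 3). A minimal sketch, using the deviance and df from the anova() output above:

```r
# p-value for the drop-in-deviance test by hand:
# P(chi-square with 3 df > 7.212), using values from the anova() output
pchisq(7.212, df = 3, lower.tail = FALSE)
```

This matches the p-value of 0.065 reported in the table.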
Model comparison using AIC and BIC
AIC & BIC
Estimators of prediction error and relative quality of models:
Akaike’s Information Criterion (AIC)¹: \[AIC = -2\log L + 2(p+1)\]
Schwarz’s Bayesian Information Criterion (BIC)²: \[ BIC = -2\log L + \log(n)\times(p+1)\]
AIC & BIC
\[ \begin{aligned} & AIC = \color{blue}{-2\log L} \color{black}{+ 2(p+1)} \\ & BIC = \color{blue}{-2\log L} + \color{black}{\log(n)\times(p+1)} \end{aligned} \]
. . .
First term: Decreases as \(p\) increases, because adding predictors can only increase the maximized likelihood \(L\)
AIC & BIC
\[ \begin{aligned} & AIC = -2\log L + \color{blue}{2(p+1)} \\ & BIC = -2\log L + \color{blue}{\log(n)\times(p+1)} \end{aligned} \]
Second term: Increases as p increases
Using AIC & BIC
\[ \begin{aligned} & AIC = -2\log L + \color{red}{2(p+1)} \\ & BIC = -2 \log L + \color{red}{\log(n)\times(p+1)} \end{aligned} \]
Choose model with the smaller value of AIC or BIC
If \(n \geq 8\), the penalty for BIC is larger than that of AIC, so BIC tends to favor more parsimonious models (i.e. models with fewer terms)
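A quick check of this threshold: the per-parameter penalty is 2 for AIC and \(\log(n)\) for BIC, and \(\log(n)\) exceeds 2 once \(n \geq 8\) (since \(e^2 \approx 7.39\)).

```r
# BIC's per-parameter penalty log(n) versus AIC's penalty of 2
log(7)    # just under 2
log(8)    # just over 2
```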
AIC from the glance() function
Let’s look at the AIC for the model that includes age, totChol, and currentSmoker
glance(high_risk_fit_reduced)$AIC

[1] 3232.812
. . .
Calculating AIC
-2 * glance(high_risk_fit_reduced)$logLik + 2 * (3 + 1)

[1] 3232.812
Comparing the models using AIC
Let’s compare the full and reduced models using AIC.
glance(high_risk_fit_reduced)$AIC

[1] 3232.812

glance(high_risk_fit_full)$AIC

[1] 3231.6
Based on AIC, which model would you choose?
Comparing the models using BIC
Let’s compare the full and reduced models using BIC
glance(high_risk_fit_reduced)$BIC

[1] 3258.074

glance(high_risk_fit_full)$BIC

[1] 3275.807
Based on BIC, which model would you choose?
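Like AIC, BIC can be computed by hand from the log likelihood. A sketch for the reduced model, assuming \(n = 4086\) observations (the residual df from the anova() output, 4082, plus the 4 estimated parameters):

```r
# BIC by hand for the reduced model: -2 log L + log(n) * (p + 1)
neg2_loglik <- 3224.812   # -2 log L (residual deviance) for the reduced model
n <- 4086                 # observations used to fit the model
p <- 3                    # age, totChol, currentSmoker1
neg2_loglik + log(n) * (p + 1)
```

This matches the glance() value of 3258.074 above.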
Application exercise
Estimating coefficients
Statistical model
The form of the statistical model for logistic regression is
\[ \log\Big(\frac{\pi}{1-\pi}\Big) = \beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_pX_p \]
where \(\pi\) is the probability \(Y = 1\).
. . .
Notice there is no error term when writing the statistical model for logistic regression. Why?
- Recall that the statistical model is the “data-generating” model
- Each individual observed \(Y\) is generated from a Bernoulli distribution, \(Bernoulli(\pi)\) (similarly, we can think of the sum of \(n\) independent observed \(Y\)’s as generated from a \(Binomial(n, \pi)\) distribution)
- Therefore, the randomness is not produced by an error term but rather in the distribution used to generate \(Y\)
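This data-generating view can be simulated directly. A minimal sketch with hypothetical coefficients (not the fitted Framingham model): the randomness enters through rbinom(), not an additive error term.

```r
set.seed(210)
x <- c(40, 45, 50, 55, 60)                   # hypothetical ages
log_odds <- -6.7 + 0.08 * x                  # hypothetical linear predictor
pi <- exp(log_odds) / (1 + exp(log_odds))    # P(Y = 1) for each observation
y <- rbinom(length(x), size = 1, prob = pi)  # each Y_i ~ Bernoulli(pi_i)
```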
Estimating coefficients
Recall the log likelihood function
\[ \log L = \sum\limits_{i=1}^n[y_i \log(\hat{\pi}_i) + (1 - y_i)\log(1 - \hat{\pi}_i)] \]
where
\(\hat{\pi}_i = \frac{\exp\{\hat{\beta}_0 + \hat{\beta}_1X_1 + \dots + \hat{\beta}_pX_p\}}{1 + \exp\{\hat{\beta}_0 + \hat{\beta}_1X_1 + \dots + \hat{\beta}_pX_p\}}\)
. . .
The coefficients \(\hat{\beta}_0, \ldots, \hat{\beta}_p\) are estimated using maximum likelihood estimation
Basic idea: Find the values of \(\hat{\beta}_0, \ldots, \hat{\beta}_p\) that give the observed data the maximum probability of occurring
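A sketch of what maximization means, using toy data and a hypothetical single-predictor model \(\log(\pi/(1-\pi)) = b_0 + b_1 x\): coefficients that track the data give a higher log likelihood than coefficients that ignore it.

```r
y <- c(0, 0, 1, 1, 1)
x <- c(1, 2, 3, 4, 5)
log_lik <- function(b0, b1) {
  pi_hat <- plogis(b0 + b1 * x)  # inverse logit
  sum(y * log(pi_hat) + (1 - y) * log(1 - pi_hat))
}
log_lik(-5, 2)   # candidate that tracks the data
log_lik(0, 0)    # ignores the data: pi_hat = 0.5 for everyone
```

glm() searches for the values of \((b_0, b_1)\) that maximize this function, using the iterative algorithm shown next.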
Iterative model fitting in R
high_risk_fit_reduced <- glm(high_risk ~ age + totChol + currentSmoker,
                             data = heart_disease, family = "binomial",
                             # print each iteration
                             control = glm.control(trace = TRUE))

Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... on entry
Deviance = 3268.418 Iterations - 1
NULL
Deviance = 3225.648 Iterations - 2
[1] -4.4798060788 0.0496514919 0.0008626177 0.2457937103
Deviance = 3224.813 Iterations - 3
[1] -6.394448772 0.078826360 0.001808306 0.417970457
Deviance = 3224.812 Iterations - 4
[1] -6.665531647 0.082366229 0.001992166 0.442315330
Deviance = 3224.812 Iterations - 5
[1] -6.673085394 0.082464718 0.001997547 0.442980128
# stop printing for future models
untrace(glm.fit)

Inference for coefficients
Inference for coefficients
There are two approaches for testing coefficients in logistic regression
Drop-in-deviance test. Use to test…
- a single coefficient
- a categorical predictor with 3+ levels
- a group of predictor variables
(Wald) hypothesis test. Use to test…
- a single coefficient
Hypothesis test for \(\beta_j\)
Hypotheses: \(H_0: \beta_j = 0 \hspace{2mm} \text{ vs } \hspace{2mm} H_a: \beta_j \neq 0\), given the other variables in the model
. . .
(Wald) Test Statistic: \[z = \frac{\hat{\beta}_j - 0}{SE(\hat{\beta}_j)}\]
. . .
P-value: \(P(|Z| > |z|)\), where \(Z \sim N(0, 1)\), the Standard Normal distribution
Confidence interval for \(\beta_j\)
We can calculate the C% confidence interval for \(\beta_j\) as the following:
\[ \hat{\beta}_j \pm z^* \times SE(\hat{\beta}_j) \]
where \(z^*\) is calculated from the \(N(0,1)\) distribution
. . .
This is an interval for the change in the log-odds for every one unit increase in \(x_j\)
Interpretation in terms of the odds
Exponentiating the interval endpoints gives a confidence interval for the multiplicative change in the odds for every one unit increase in \(x_j\):
\[ \exp\{\hat{\beta}_j \pm z^* \times SE(\hat{\beta}_j)\} \]
. . .
Interpretation: We are \(C\%\) confident that for every one unit increase in \(x_j\), the odds multiply by a factor of \(\exp\{\hat{\beta}_j - z^*\times SE(\hat{\beta}_j)\}\) to \(\exp\{\hat{\beta}_j + z^* \times SE(\hat{\beta}_j)\}\), holding all else constant.
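A sketch of this calculation for the age coefficient from the fitted model, using the Wald interval \(\hat{\beta} \pm z^* \times SE(\hat{\beta})\); note that tidy() reports a profile-likelihood interval, so its endpoints differ slightly.

```r
# 95% Wald CI for the age coefficient, exponentiated to the odds scale
# (estimate and SE taken from the fitted model summary)
beta_age <- 0.08246
se_age   <- 0.00575
z_star   <- qnorm(0.975)
exp(beta_age + c(-1, 1) * z_star * se_age)
```

Interpretation: for each additional year of age, the odds of being high risk multiply by a factor between roughly 1.074 and 1.098, holding all else constant.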
Coefficient for age
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | -6.67309 | 0.37815 | -17.64688 | 0.00000 | -7.42283 | -5.94008 |
| age | 0.08246 | 0.00575 | 14.34353 | 0.00000 | 0.07127 | 0.09382 |
| totChol | 0.00200 | 0.00103 | 1.94024 | 0.05235 | -0.00003 | 0.00401 |
| currentSmoker1 | 0.44298 | 0.09359 | 4.73305 | 0.00000 | 0.25998 | 0.62699 |
. . .
Hypotheses:
\[ H_0: \beta_{age} = 0 \hspace{2mm} \text{ vs } \hspace{2mm} H_a: \beta_{age} \neq 0 \], given total cholesterol and smoking status are in the model
Coefficient for age
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | -6.67309 | 0.37815 | -17.64688 | 0.00000 | -7.42283 | -5.94008 |
| age | 0.08246 | 0.00575 | 14.34353 | 0.00000 | 0.07127 | 0.09382 |
| totChol | 0.00200 | 0.00103 | 1.94024 | 0.05235 | -0.00003 | 0.00401 |
| currentSmoker1 | 0.44298 | 0.09359 | 4.73305 | 0.00000 | 0.25998 | 0.62699 |
Test statistic:
\[z = \frac{0.08246 - 0}{0.00575} = 14.34 \]
Coefficient for age
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | -6.67309 | 0.37815 | -17.64688 | 0.00000 | -7.42283 | -5.94008 |
| age | 0.08246 | 0.00575 | 14.34353 | 0.00000 | 0.07127 | 0.09382 |
| totChol | 0.00200 | 0.00103 | 1.94024 | 0.05235 | -0.00003 | 0.00401 |
| currentSmoker1 | 0.44298 | 0.09359 | 4.73305 | 0.00000 | 0.25998 | 0.62699 |
P-value:
\[ P(|Z| > |14.34|) \approx 0 \]
. . .
2 * pnorm(14.34, lower.tail = FALSE)

[1] 1.230554e-46
Coefficient for age
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | -6.67309 | 0.37815 | -17.64688 | 0.00000 | -7.42283 | -5.94008 |
| age | 0.08246 | 0.00575 | 14.34353 | 0.00000 | 0.07127 | 0.09382 |
| totChol | 0.00200 | 0.00103 | 1.94024 | 0.05235 | -0.00003 | 0.00401 |
| currentSmoker1 | 0.44298 | 0.09359 | 4.73305 | 0.00000 | 0.25998 | 0.62699 |
Conclusion:
The p-value is very small, so we reject \(H_0\). The data provide sufficient evidence that age is a statistically significant predictor of whether someone is at high risk of having heart disease, after accounting for total cholesterol and smoking status.
CI for age
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | -6.673 | 0.378 | -17.647 | 0.000 | -7.423 | -5.940 |
| age | 0.082 | 0.006 | 14.344 | 0.000 | 0.071 | 0.094 |
| totChol | 0.002 | 0.001 | 1.940 | 0.052 | 0.000 | 0.004 |
| currentSmoker1 | 0.443 | 0.094 | 4.733 | 0.000 | 0.260 | 0.627 |
Interpret the 95% confidence interval for age in terms of the odds of being high risk for heart disease.
Recap
Comparing logistic regression models using
Drop-in-deviance test
AIC
BIC
Inference for a single model coefficient
Footnotes
1. Akaike, Hirotugu. “A new look at the statistical model identification.” IEEE Transactions on Automatic Control 19.6 (1974): 716-723.
2. Schwarz, Gideon. “Estimating the dimension of a model.” The Annals of Statistics (1978): 461-464.