class: center, middle, inverse, title-slide

# Introduction to logistic regression
### Dr. Olanrewaju Michael Akande
### Sept 17, 2019

---

## Announcements

- Final project outline now online; check the course schedule on the course website

- Modified office hours today

## Outline

- Questions from last lecture

- Relative risk and odds ratio

- Logistic regression

- Interpreting coefficients

---

class: center, middle

# Relative risk and odds ratio

---

## Introduction

- So far in this course, our response variables have been continuous.

--

- Sometimes, we would also like to build models for binary outcome variables. For example,
  + `\(Y = 1\)`: healthy, `\(Y = 0\)`: not healthy
  + `\(Y = 1\)`: employed, `\(Y = 0\)`: not employed
  + `\(Y = 1\)`: win, `\(Y = 0\)`: lose

--

- Often, we want to predict or explain the binary outcome variable from several predictors.

--

- Linear regression is NOT appropriate, because normality for the response variable (and errors) makes no sense in this case.

--

- This brings us to .hlight[logistic regression], the most popular model for binary outcomes.

--

- First, let's review relative risk, odds, and odds ratios.

---

## Absolute risk and relative risk

- `\(Y\)`: binary response variable, `\(X\)`: binary predictor

<br />

|              | `\(Y=1\)`   | `\(Y=0\)`   |
| :----------- | :---------: | :---------: |
| `\(X=1\)`    | a           | b           |
| `\(X=0\)`    | c           | d           |

- .hlight[Absolute risk] of `\(Y=1\)` for level `\(X=1\)`: `\(\dfrac{a}{(a+b)}\)`

- .hlight[Absolute risk] of `\(Y=1\)` for level `\(X=0\)`: `\(\dfrac{c}{(c+d)}\)`

- .hlight[Relative risk (RR)]: `\(\dfrac{a/(a+b)}{c/(c+d)}\)`

--

<div class="question">
In which fields do you think relative risk might be useful?
</div>

---

## Odds and odds ratio

- `\(Y\)`: binary response variable, `\(X\)`: binary predictor

<br />

|              | `\(Y=1\)`   | `\(Y=0\)`   |
| :----------- | :---------: | :---------: |
| `\(X=1\)`    | a           | b           |
| `\(X=0\)`    | c           | d           |

- .hlight[Odds] of `\(Y=1\)` for level `\(X=1\)`: `\(\dfrac{a}{b}\)`

- .hlight[Odds] of `\(Y=1\)` for level `\(X=0\)`: `\(\dfrac{c}{d}\)`

- .hlight[Odds ratio (OR)]: `\(\dfrac{a/b}{c/d}\)`

--

<div class="question">
In which fields do you think odds or odds ratio might be useful?
</div>

---

## Probabilities and odds: motivating example

- Physicians' Health Study (1989): randomized experiment with 22,071 male physicians at least 40 years old.

--

- Half the subjects were assigned to take aspirin every other day.

--

- The other half were assigned to take a placebo pill.

--

- Here are the numbers of people in each cell of the contingency table.

<br />

|                | Heart attack  | No heart attack |
| :------------- | :-----------: | :-------------: |
| Aspirin        | 104           | 10933           |
| Placebo        | 189           | 10845           |
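---

## Aside: computing these measures in R

Before working the formulas out by hand on the next slide, here is a minimal R sketch computing the absolute risks, relative risk, and odds ratio directly from the counts in the table above (the variable names are ours, chosen only for illustration):

```r
ha_asp <- 104; no_asp <- 10933          # aspirin row of the table
ha_pla <- 189; no_pla <- 10845          # placebo row of the table

risk_asp <- ha_asp / (ha_asp + no_asp)  # absolute risk on aspirin
risk_pla <- ha_pla / (ha_pla + no_pla)  # absolute risk on placebo

risk_asp / risk_pla                     # relative risk, approximately 0.55
(ha_asp / no_asp) / (ha_pla / no_pla)   # odds ratio, approximately 0.546
```

These should match the hand calculations on the next slide.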
---

## Absolute risk and relative risk for the Physicians' Health Study

- Physicians' Health Study

<br />

|                | Heart attack  | No heart attack |
| :------------- | :-----------: | :-------------: |
| Aspirin        | 104           | 10933           |
| Placebo        | 189           | 10845           |

- .block[Relative risk of a heart attack when taking aspirin versus when taking a placebo equals]

`$$\textrm{RR} = \dfrac{104/(104+10933)}{189/(189+10845)} = 0.55$$`

--

- .block[Odds of having a heart attack when taking aspirin over odds of a heart attack when taking a placebo (odds ratio)]

`$$\textrm{OR} = \dfrac{104/10933}{189/10845} = 0.546$$`

---

## Interpreting odds ratios and relative risks

<br />

|              | `\(Y=1\)`   | `\(Y=0\)`   |
| :----------- | :---------: | :---------: |
| `\(X=1\)`    | a           | b           |
| `\(X=0\)`    | c           | d           |

- When the variables `\(X\)` and `\(Y\)` are independent,

`$$OR = 1; \ \ \ \ \ \ \ \ \ RR = 1$$`

--

- When subjects with level `\(X=1\)` are more likely to have `\(Y=1\)` than subjects with level `\(X=0\)`, then

`$$OR > 1; \ \ \ \ \ \ \ \ \ RR > 1$$`

--

- When subjects with level `\(X=1\)` are less likely to have `\(Y=1\)` than subjects with level `\(X=0\)`, then

`$$OR < 1; \ \ \ \ \ \ \ \ \ RR < 1$$`

---

## Relative risk vs. absolute risk: smoking and lung cancer

- Small or large values of relative risk may or may not be meaningful, depending on the base rate.

--

- Thus, it can be more helpful and meaningful to present both the absolute risk and the RR.

--

- For example,
  + Percentage of smokers who get lung cancer: 8% (conservative estimate)
  + Relative risk of lung cancer for smokers: 800%
  + That is, getting lung cancer is not commonplace, even for smokers, but smokers' chances of getting lung cancer are much, much higher than non-smokers' chances.
  + The absolute risk helps place the RR in context.

---

class: center, middle

# Logistic regression

---

## What if we want to use regression to predict binary outcomes?

- RR and OR can be useful, but it would be great to be able to estimate either or both in a model setting, especially when we have multiple predictors.

--

- Let's start small: suppose we want to use linear regression to predict binary `\(y\)` from some predictor `\(x\)`.

--

- Recall that the simple linear regression model is
.block[
.small[
`$$y_i = \beta_0 + \beta_1 x_{i} + \epsilon_i; \ \ \epsilon_i \overset{iid}{\sim} N(0, \sigma^2).$$`
]
]

- Also recall that this model implies `\(y\)` could be any continuous value, when in fact a binary outcome has to be exactly zero or one.

--

- Therefore, linear regression is not a reasonable model here.

--

<div class="question">
What distribution do you think would be more ideal for y?
</div>

---

## More appropriate distribution for `\(y\)`

- Assume for any observation `\(i\)` that
.block[
.small[
$$\Pr[y_i = 1 | x_i] = \pi_i \ \ \textrm{and} \ \ \Pr[y_i = 0 | x_i] = 1-\pi_i, $$
]
]
where `\(\pi_i\)` is some function of `\(x_i\)`.

--

- Notice that this is simply a .hlight[Bernoulli distribution], or a .hlight[Binomial distribution] (with number of trials = 1), where the probability `\(\pi_i\)` is allowed to be potentially different for each observation `\(i\)`.

--

- What then should the function that connects `\(\pi_i\)` to `\(x_i\)` look like?

--

- Some "not so ideal" options (sketched on the next slide) could be:
  + .hlight[Linear]:
  .block[
  .small[
  `$$\pi_i = \beta_0 + \beta_1 x_{i}; \ \ \ \ \ \ \textrm{But } \pi_i \textrm{ can be outside } [0,1]!$$`
  ]
  ]
  + .hlight[Log-linear]:
  .block[
  .small[
  `$$\textrm{log}(\pi_i) = \beta_0 + \beta_1 x_{i} \ \ \Rightarrow \ \ \pi_i = e^{\beta_0 + \beta_1 x_{i}}; \ \ \ \ \ \ \textrm{But } \pi_i \textrm{ can be } > 1!$$`
  ]
  ]
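---

## Aside: visualizing the "not so ideal" options

To see the problem concretely, here is a small R sketch (the intercept and slope are hypothetical, chosen only for illustration) plotting both options alongside the curve we define on the next slide; only the third one stays between 0 and 1:

```r
b0 <- -1; b1 <- 0.8                       # hypothetical coefficients
x  <- seq(-4, 4, length.out = 200)

plot(x, b0 + b1 * x, type = "l", col = "red4", ylim = c(-1.5, 2),
     ylab = "pi", main = "Candidate functions for pi")    # linear: leaves [0,1]
lines(x, exp(b0 + b1 * x), col = "darkgreen")             # log-linear: exceeds 1
lines(x, exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x)),
      col = "navy")                                       # logistic: stays in [0,1]
abline(h = c(0, 1), lty = 2)   # valid probabilities must lie between these lines
```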
---

## Logistic regression model

- From the log-linear function, we can already see a potential solution to the `\(\pi_i>1\)` problem: we can divide `\(e^{\beta_0 + \beta_1 x_{i}}\)` by a denominator that will always be greater than it.

--

- Thus, we can use the function
.block[
.small[
`$$\pi_i = \dfrac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} \ \ \Rightarrow \ \ \textrm{log} \left(\dfrac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_i.$$`
]
]

  `\(\textrm{log} \left(\dfrac{\pi_i}{1-\pi_i}\right)\)` is called the .hlight[logit function], also written as `\(\textrm{logit}(\pi_i)\)`. Notice that the logit function is essentially the .hlight[log-odds], i.e., the log of the odds.

--

- We can then formally write the .hlight[logistic regression model] as
.block[
.small[
$$
`\begin{split}
\Pr[y_i = 1 | x_i] = \pi_i \ \ \textrm{and} \ \ \Pr[y_i = 0 | x_i] = 1-\pi_i; \ \ \ & \textrm{log}\left(\dfrac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_i, \\
\textrm{OR } \ \ \ y_i | x_i \sim \textrm{Bernoulli}(\pi_i); \ \ \ & \textrm{log}\left(\dfrac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_i.
\end{split}`
$$
]
]

---

## Solving the logit equation

- Let's see how to go from `\(\textrm{logit}(\pi_i)\)` back to `\(\pi_i\)`.
.block[
.small[
$$
`\begin{split}
\textrm{logit}(\pi_i) = \textrm{log}\left(\dfrac{\pi_i}{1-\pi_i}\right) & = \beta_0 + \beta_1 x_i \\
\Rightarrow \ e^{\textrm{log}\left(\dfrac{\pi_i}{1-\pi_i}\right)} & = e^{\beta_0 + \beta_1 x_i} \\
\Rightarrow \ \dfrac{\pi_i}{1-\pi_i} & = e^{\beta_0 + \beta_1 x_i} \\
\Rightarrow \ \pi_i & = e^{\beta_0 + \beta_1 x_i} (1-\pi_i) \\
\Rightarrow \ \pi_i & = e^{\beta_0 + \beta_1 x_i} - \pi_i e^{\beta_0 + \beta_1 x_i} \\
\Rightarrow \ \pi_i + \pi_i e^{\beta_0 + \beta_1 x_i} & = e^{\beta_0 + \beta_1 x_i} \\
\Rightarrow \ \pi_i (1+e^{\beta_0 + \beta_1 x_i}) & = e^{\beta_0 + \beta_1 x_i} \\
\therefore \ \pi_i & = \dfrac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} \\
\end{split}`
$$
]
]

--

- By the way, another function that works well for linking `\(\pi_i\)` to `\(x_i\)` is the .hlight[probit function]: the quantile function (or inverse of the cumulative distribution function) of the standard normal distribution.

--

- We will formally explore the .hlight[probit regression model] later in the course.

---

## Interpreting coefficients

- From
.block[
.small[
`$$\textrm{log}\left(\dfrac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_i$$`
]
]
we can see that "*as we increase `\(x\)` by 1 unit, we increase the log-odds of `\(y\)` being 1 by `\(\beta_1\)`*".

--

- Equivalently, from
.block[
.small[
`$$\dfrac{\pi_i}{1-\pi_i} = e^{\beta_0 + \beta_1 x_i} = e^{\beta_0} e^{\beta_1 x_i}$$`
]
]
we can see that "*as we increase `\(x\)` by 1 unit, we multiply the odds of `\(y\)` being 1 by `\(e^{\beta_1}\)`*".

--

- With mean-centered `\(x\)`, `\(\beta_0\)` is the log-odds of `\(y\)` being 1 at the mean of `\(x\)`, and `\(e^{\beta_0}\)` is the corresponding odds.

--

- It is often much easier to interpret results by graphing the (predicted) probabilities for values of `\(x\)`.
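---

## Aside: checking the algebra numerically

R ships with the logit and inverse logit built in: `qlogis()` is the logit and `plogis()` is its inverse. A quick numerical check of the identities above, using hypothetical coefficients (not from any fitted model):

```r
b0 <- -1; b1 <- 0.5                          # hypothetical coefficients

plogis(b0 + b1 * 2)                          # pi at x = 2, via the inverse logit
exp(b0 + b1 * 2) / (1 + exp(b0 + b1 * 2))    # same value, from the solved equation

odds <- function(x) exp(b0 + b1 * x)         # odds of y = 1 at a given x
odds(3) / odds(2)                            # a one-unit increase in x ...
exp(b1)                                      # ... multiplies the odds by exp(b1)
```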
---

## Interpreting coefficients: categorical predictors

When `\(x\)` is binary,

- .hlight[Odds] of `\(y=1\)` for level `\(x=1\)`: `\(e^{\beta_0} e^{\beta_1 (1)} = e^{\beta_0 + \beta_1}\)`

--

- .hlight[Odds] of `\(y=1\)` for level `\(x=0\)`: `\(e^{\beta_0} e^{\beta_1 (0)} = e^{\beta_0}\)`

--

- .hlight[Odds ratio (OR)]: `\(\dfrac{e^{\beta_0} e^{\beta_1}}{e^{\beta_0}} = e^{\beta_1}\)`

--

- Thus, `\(e^{\beta_1}\)` has a nice interpretation as the odds ratio for `\(x=1\)` versus `\(x=0\)`, and `\(\beta_1\)` is the corresponding log odds ratio.

--

When `\(x\)` is categorical with `\(K > 2\)` levels, the corresponding `\(e^{\beta_{1k}}\)` is the odds ratio for `\(x=k\)` versus whichever level is set as the baseline.

--

It is also easy to calculate relative risk from the results of the logistic model.

---

## Estimation of coefficients

- Use maximum likelihood estimation.

--

- The basic idea is to find the values of `\((\beta_0, \beta_1)\)` that make the observed values of `\(y\)` most likely to have arisen.

--

- Requires multivariate calculus and numerical methods (the Newton-Raphson algorithm) for estimation; see the toy sketch on the next slide.

--

- R to the rescue yet again: R does it for us!!!

<img src="img/phew.gif" height="250px" style="display: block; margin: auto;" />
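---

## Aside: maximum likelihood, by hand

To make the idea concrete, here is a toy sketch (simulated data, not part of any real analysis) that maximizes the Bernoulli log-likelihood with a generic numerical optimizer and compares the result to `glm()`, which uses Fisher scoring, a Newton-Raphson-type algorithm:

```r
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(-0.5 + 1.2 * x))  # simulate from a known logistic model

negloglik <- function(beta) {                # negative Bernoulli log-likelihood
  eta <- beta[1] + beta[2] * x
  -sum(y * eta - log(1 + exp(eta)))
}

optim(c(0, 0), negloglik)$par                # numerical maximum likelihood
coef(glm(y ~ x, family = binomial))          # glm agrees (up to optimizer tolerance)
```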
---

## Intervals and significance tests

- As with all coefficients, the standard errors represent chance deviations of the estimated values `\((\hat{\beta}_0, \hat{\beta}_1)\)` from the actual values `\((\beta_0, \beta_1)\)`.

--

- Confidence intervals are usually based on large-sample normal approximations. For example,
  + `\(95\%\)` CI for `\(\beta_1\)`:
  .block[
  .small[
  `$$\hat{\beta}_1 \pm 1.96 \times \textrm{SE}_{\hat{\beta}_1}$$`
  ]
  ]
  + `\(95\%\)` CI for `\(e^{\beta_1}\)`:
  .block[
  .small[
  `$$e^{\left[\hat{\beta}_1 \pm 1.96 \times \textrm{SE}_{\hat{\beta}_1}\right]}$$`
  ]
  ]

--

- Confidence intervals can also be computed using the profile-likelihood approach.

--

- Although both methods can yield similar CIs with large sample sizes, the profile-likelihood limits often have better small-sample properties than the asymptotic approximations. R can compute both.

---

## Logistic regression with one predictor: The Challenger analysis

- Let's start with a very small dataset to illustrate logistic regression with one predictor.

--

- On January 28, 1986, the space shuttle Challenger exploded in the early stages of its flight, killing all crew members.

--

- The cause of the explosion was a failure of O-rings, seals on the booster rockets that prevent hot gases from escaping the rockets.

--

- When the O-rings failed, one booster rocket caught fire. The fire eventually burned a hole in the main fuel tank, allowing the fuel components to mix and produce the catastrophic explosion.

--

- The ambient temperature was 36 degrees Fahrenheit on the morning of the launch.

---

## Logistic regression with one predictor: The Challenger analysis

- Some of NASA's scientists worried that the O-rings might not perform well in low temperatures; they had no experience with launches in such temperatures.

--

- To aid their decision-making process, the NASA scientists examined data from previous shuttle flights.

--

- They graphed the number of O-ring failures versus the ambient temperature for all flights where there was at least one O-ring failure.

--

- They obtained these data by examining the booster rockets that fell back to Earth after being jettisoned during a flight.

---

## The Challenger analysis

The data is in the file `challenger.csv` on Sakai.

```r
challenger <- read.csv("data/challenger.csv", header = T,
                       colClasses = c("numeric", "factor"))
challenger
```

```
##    TEMP FAILURE
## 1    53     Yes
## 2    56     Yes
## 3    57     Yes
## 4    63      No
## 5    66      No
## 6    67      No
## 7    67      No
## 8    67      No
## 9    68      No
## 10   69      No
## 11   70      No
## 12   70     Yes
## 13   70     Yes
## 14   70     Yes
## 15   72      No
## 16   73      No
## 17   75      No
## 18   75     Yes
## 19   76      No
## 20   76      No
## 21   78      No
## 22   79      No
## 23   80      No
## 24   81      No
```

---

## The Challenger analysis

The data is in the file `challenger.csv` on Sakai.

```r
colnames(challenger) <- c("temp", "failure")
challenger$fail01 <- rep(0, 24)
challenger$fail01[challenger$failure == "Yes"] <- 1
plot(y = challenger$fail01, x = challenger$temp,
     xlab = "Temperature", ylab = "Failure", col = "navy", pch = 17)
```

<img src="07-logistic-reg-intro_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

This is not easy to interpret. We can try a box plot.

---

## The Challenger analysis

This doesn't really portray a prediction of failure from temperature, but it is somewhat useful for getting an overall sense of the data.

```r
boxplot(challenger$temp ~ challenger$fail01, ylab = "Temperature", xlab = "Failure")
```

<img src="07-logistic-reg-intro_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

Be careful; we don't have that many data points. Let's continue nonetheless.

---

## The Challenger analysis

Let's fit the following logistic regression to the data.
.block[
.small[
`$$\textrm{failure}_i | \textrm{temp}_i \sim \textrm{Bernoulli}(\pi_i); \ \ \ \textrm{log}\left(\dfrac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 \textrm{temp}_i$$`
]
]

```r
logreg <- glm(fail01 ~ temp, family = binomial(link = logit), data = challenger)
summary(logreg)
```

```
## 
## Call:
## glm(formula = fail01 ~ temp, family = binomial(link = logit), 
##     data = challenger)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2125  -0.8253  -0.4706   0.5907   2.0512  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) 10.87535    5.70291   1.907   0.0565 .
## temp        -0.17132    0.08344  -2.053   0.0400 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 28.975  on 23  degrees of freedom
## Residual deviance: 23.030  on 22  degrees of freedom
## AIC: 27.03
## 
## Number of Fisher Scoring iterations: 4
```

---

## The Challenger analysis

Let's mean-center the temperature for interpretation.

```r
challenger$tempcent <- challenger$temp - mean(challenger$temp)
logreg <- glm(fail01 ~ tempcent, family = binomial(link = logit), data = challenger)
summary(logreg)
```

```
## 
## Call:
## glm(formula = fail01 ~ tempcent, family = binomial(link = logit), 
##     data = challenger)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2125  -0.8253  -0.4706   0.5907   2.0512  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -1.10281    0.53999  -2.042   0.0411 *
## tempcent    -0.17132    0.08344  -2.053   0.0400 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 28.975  on 23  degrees of freedom
## Residual deviance: 23.030  on 22  degrees of freedom
## AIC: 27.03
## 
## Number of Fisher Scoring iterations: 4
```

---

## The Challenger analysis

Confidence intervals for the coefficients. Remember that these are on the log-odds scale.

```r
confint.default(logreg) #Asymptotic
```

```
##                  2.5 %       97.5 %
## (Intercept) -2.1611649 -0.044453551
## tempcent    -0.3348573 -0.007783746
```

```r
confint(logreg) #Based on the profile-likelihood
```

```
## Waiting for profiling to be done...
```

```
##                  2.5 %     97.5 %
## (Intercept) -2.3117892 -0.1267476
## tempcent    -0.3779407 -0.0305637
```

--

<div class="question">
Can you interpret the intervals?
</div>
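---

## Aside: reproducing the asymptotic intervals by hand

As a sanity check, the asymptotic intervals on the previous slide can be reproduced from the estimates and standard errors (up to rounding, since `confint.default()` uses `qnorm(0.975)` rather than 1.96):

```r
est <- coef(summary(logreg))     # matrix of estimates, SEs, z values, p-values
cbind(lower = est[, "Estimate"] - 1.96 * est[, "Std. Error"],
      upper = est[, "Estimate"] + 1.96 * est[, "Std. Error"])
```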
---

## The Challenger analysis

Let's transform to the odds scale.

```r
exp(confint.default(logreg)) #Asymptotic
```

```
##                 2.5 %    97.5 %
## (Intercept) 0.1151909 0.9565200
## tempcent    0.7154402 0.9922465
```

```r
exp(confint(logreg)) #Based on the profile-likelihood
```

```
## Waiting for profiling to be done...
```

```
##                  2.5 %    97.5 %
## (Intercept) 0.09908381 0.8809560
## tempcent    0.68527115 0.9698986
```

--

<div class="question">
Can you interpret the intervals?
</div>

---

## The Challenger analysis

We can get the predicted probabilities for the observed cases.

```r
predprobs <- predict(logreg, type = "response")
cbind(predprobs, challenger) # use predict(logreg, type="link") for the logit scale
```

```
##     predprobs temp failure fail01     tempcent
## 1  0.85758349   53     Yes      1 -16.91666667
## 2  0.78268818   56     Yes      1 -13.91666667
## 3  0.75214414   57     Yes      1 -12.91666667
## 4  0.52052785   63      No      0  -6.91666667
## 5  0.39369565   66      No      0  -3.91666667
## 6  0.35362920   67      No      0  -2.91666667
## 7  0.35362920   67      No      0  -2.91666667
## 8  0.35362920   67      No      0  -2.91666667
## 9  0.31551837   68      No      0  -1.91666667
## 10 0.27973723   69      No      0  -0.91666667
## 11 0.24655222   70      No      0   0.08333333
## 12 0.24655222   70     Yes      1   0.08333333
## 13 0.24655222   70     Yes      1   0.08333333
## 14 0.24655222   70     Yes      1   0.08333333
## 15 0.18850910   72      No      0   2.08333333
## 16 0.16368693   73      No      0   3.08333333
## 17 0.12199327   75      No      0   5.08333333
## 18 0.12199327   75     Yes      1   5.08333333
## 19 0.10479854   76      No      0   6.08333333
## 20 0.10479854   76      No      0   6.08333333
## 21 0.07672851   78      No      0   8.08333333
## 22 0.06543827   79      No      0   9.08333333
## 23 0.05570909   80      No      0  10.08333333
## 24 0.04735313   81      No      0  11.08333333
```

---

## The Challenger analysis

It is useful to examine a plot of the predicted probabilities against `\(x\)`.

```r
plot(y = predprobs, x = challenger$temp, xlab = "Temperature",
     ylab = "Pred. Probability of Failure", type = "o", col = c("red4", "navy"))
```

<img src="07-logistic-reg-intro_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

## The Challenger analysis

Actually, let's compute predictions across a grid of temperatures to get a smooth curve for the predicted probabilities.

```r
tempseq <- seq(50, 85, length.out = 50)
newdata <- data.frame(tempcent = tempseq - mean(challenger$temp))
newpreds <- predict(logreg, newdata, type = "response")
plot(y = newpreds, x = tempseq, xlab = "Temperature",
     ylab = "Pred. Probability of Failure", type = "o", col = c("red4", "navy"))
```

<img src="07-logistic-reg-intro_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

---

## Final notes

- Let's stop here today.

- In the next class, we will pick up from here and fit a logistic regression with multiple predictors.

- We will see how to do EDA when dealing with a binary response variable.

- We will also learn to do model assessment and validation for logistic models.
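---

## Appendix: prediction at the launch temperature

As an optional aside, recall that the launch-morning temperature was 36 degrees Fahrenheit, well below every temperature in the data. Here is a sketch of what the fitted model predicts there; note that this is an extrapolation far outside the observed range, so treat the number with caution:

```r
predict(logreg, data.frame(tempcent = 36 - mean(challenger$temp)),
        type = "response")   # predicted probability of failure at 36 F, roughly 0.99
```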