class: center, middle, inverse, title-slide

# MLR: model building and selection

### Dr. Olanrewaju Michael Akande

### Sept 12, 2019

---

## Announcements

- More office hours
  + Mon 3:00 - 4:00pm
  + Tue 2:00 - 4:00pm
  + Thur 2:00 - 3:00pm

## Outline

- Questions from last lecture

- Prediction vs. interpretation

- Model selection criteria: AIC, BIC

- Common selection strategies

---

class: center, middle

# Prediction vs. interpretation

---

## Which predictors should be in your model?

- This is a very hard question and one of intense statistical research.

- Different people have different opinions on how to answer it.

- It also depends on the goal of your analysis: prediction vs. interpretation or association.

- We will not focus on deciding which approach is best "overall".

- Instead, we will focus on how to approach the problem and on the most common methods used.

- See Section 6.1 of [An Introduction to Statistical Learning with Applications in R](http://faculty.marshall.usc.edu/gareth-james/ISL/) for more details on the methods we will cover.

---

## What variables should you include?

- .hlight[Goal]: prediction
  + Include only variables that are strong predictors of the outcome.
  + Excluding irrelevant variables can reduce the widths of the prediction intervals.

- .hlight[Goal]: interpretation and association
  + Include all variables that you thought a priori were related to the outcome of interest, even if they are not statistically significant.
  + This improves interpretation of the coefficients of interest.

---

class: center, middle

# Model selection criteria

---

## Model selection criteria

The most common are:

- Adjusted R-squared:
.block[
.small[
$$\textrm{Adj.}R^2 = 1 - (1-R^2) \left[\dfrac{n-1}{n-p-1} \right]$$
]
]

- Akaike's Information Criterion (AIC):
.block[
.small[
$$\textrm{AIC} = n \textrm{ln}(\textrm{RSS}) - n \textrm{ln}(n) + 2(p+1)$$
]
]

- Bayesian Information Criterion (BIC) or Schwarz Criterion:
.block[
.small[
`$$\textrm{BIC} = n \textrm{ln}(\textrm{RSS}) - n \textrm{ln}(n) + (p+1) \textrm{ln}(n)$$`
]
]

where `\(n\)` is the number of observations, `\(p\)` is the number of variables (or parameters) excluding the intercept, and RSS is the residual sum of squares, that is,
.block[
.small[
`$$\textrm{RSS} = \sum^n_{i=1} \left(y_i - \hat{y}_i \right)^2.$$`
]
]

---

## Model selection criteria

- Note:
  + Large `\(\textrm{Adj.}R^2\)` = .hlight[good!]
  + Small AIC = .hlight[good!]
  + Small BIC = .hlight[good!]

- Notice that BIC generally places a heavier penalty on models with many variables for `\(n \geq 8\)`, since
.block[
.small[
$$\textrm{ln}(n) (p+1) > 2(p+1)$$
]
]
for fixed `\(p\)` and `\(n \geq 8\)`.

- Thus, BIC can result in the selection of smaller models than AIC.

- *Note: the formulas for `\(\textrm{Adj.}R^2\)`, AIC and BIC in Section 6.1 of [An Introduction to Statistical Learning with Applications in R](http://faculty.marshall.usc.edu/gareth-james/ISL/) take slightly different forms but are equivalent to those given here when comparing models.*
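---

## Model selection criteria in R

To connect these formulas to R, here is a minimal sketch that computes each criterion by hand for an arbitrary illustrative model fit to the built-in .hlight[mtcars] data (not our diamonds example).

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
n   <- nrow(mtcars)
p   <- 2                           # number of predictors, excluding the intercept
RSS <- sum(residuals(fit)^2)

1 - (1 - summary(fit)$r.squared) * (n - 1) / (n - p - 1)  # Adj. R-squared
summary(fit)$adj.r.squared                                # matches R's value

n * log(RSS) - n * log(n) + 2 * (p + 1)        # AIC as defined on the previous slide
n * log(RSS) - n * log(n) + (p + 1) * log(n)   # BIC as defined on the previous slide
# R's AIC(fit) and BIC(fit) differ from these by an additive constant (they keep
# the Gaussian likelihood constants and count sigma^2 as a parameter), but that
# constant is the same for every model fit to the same n observations, so model
# comparisons are unaffected.
```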
---

class: center, middle

# Common selection strategies

---

## Backward selection

- Start with the full model that includes all `\(p\)` available predictors.

- Drop variables one at a time that are deemed irrelevant, based on some criterion.
  + Drop the variable with the largest p-value.
  + Drop variables (possibly all at once) with p-values over some threshold (for example, 0.10).
  + Drop the variable that leads to the smallest value of AIC or BIC, or the largest value of `\(\textrm{Adj.}R^2\)`.

  *You might even consider using average MSE from k-fold cross-validation if the goal is prediction (see the sketch after the Model Selection in R slide).*

- Stop when removing variables no longer improves the model, based on the chosen criterion.

---

## Forward selection

- Start with the model that only includes the intercept.

- Add variables one at a time based on some criterion.
  + Add the variable with the smallest p-value, using some threshold (for example, 0.10).
  + Add the variable that leads to the smallest value of AIC or BIC, or the largest value of `\(\textrm{Adj.}R^2\)`.

  *Again, you might consider using average MSE from k-fold cross-validation if the goal is prediction.*

- Stop when adding variables no longer improves the model, based on the chosen criterion.

---

## Stepwise selection

- Start with the model that only includes the intercept.

- Potentially do one forward step to enter a variable in the model, using some criterion to decide if it is worth including the variable.

- From the current model, potentially do one backward step, using some criterion to decide if it is worth dropping one of the variables in the model.

- Repeat these steps until the model does not change.

---

## Model Selection in R

- .hlight[step] function: forward, backward, and stepwise selection using AIC/BIC.

- .hlight[regsubsets] function (.hlight[leaps] package): forward, backward, and stepwise selection using `\(\textrm{Adj.}R^2\)` or BIC.
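---

## Aside: average CV MSE as a criterion

If prediction is the goal, you can compare candidate models by their average MSE from k-fold cross-validation instead of AIC or BIC. The sketch below is illustrative only; it uses the built-in .hlight[mtcars] data and an arbitrary candidate model rather than the diamonds example.

```r
set.seed(123)
k <- 5
n <- nrow(mtcars)
# randomly assign each observation to one of k folds
folds <- sample(rep(1:k, length.out = n))
cv_mse <- numeric(k)
for (j in 1:k) {
  train <- mtcars[folds != j, ]
  test  <- mtcars[folds == j, ]
  fit   <- lm(mpg ~ wt + hp, data = train)   # one candidate model
  cv_mse[j] <- mean((test$mpg - predict(fit, newdata = test))^2)
}
mean(cv_mse)  # average CV MSE; repeat for each candidate model and pick the smallest
```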
---

## Other options: shrinkage methods

- Fit a model containing all `\(p\)` available predictors, then use a technique that shrinks the coefficient estimates towards zero.

- The two most common methods are:
  + Ridge regression
  + Lasso regression (performs variable selection)

- We will not cover these methods in this course.

- Consider taking STA521 if you are interested in learning about how they work!

---

## Back to the diamonds data

Let's try model selection for our example. First, forward selection using AIC

```r
diamonds <- read.csv("data/diamonds.csv", header = T,
                     colClasses = c("numeric","factor","factor","factor","numeric"))
diamonds$CaratsCent <- diamonds$Carats - mean(diamonds$Carats)
diamonds$CaratsCent2 <- diamonds$CaratsCent^2
NullModel <- lm(log(Price)~1,data=diamonds)
FullModel <- lm(log(Price)~CaratsCent+CaratsCent2+Color*Clarity+Color*Certification
                +Clarity*Certification,data=diamonds)
Model_forward <- step(NullModel, scope = formula(FullModel),direction="forward",trace=0)
# Remove the trace=0 option if you want the function to print the entire process
# Let's see the variables the model selected
Model_forward$call
```

```
## lm(formula = log(Price) ~ CaratsCent + CaratsCent2 + Color + 
##     Clarity + Certification + Color:Clarity + Color:Certification, 
##     data = diamonds)
```

```r
# run summary(Model_forward) to see the results of the final model
```

---

## Back to the diamonds data

Let's do the same using BIC

```r
# use k = log(n) to use BIC instead.
n <- nrow(diamonds)
Model_forward <- step(NullModel, scope = formula(FullModel),direction="forward",trace=0,
                      k = log(n))
# Let's see the variables the model selected
Model_forward$call
```

```
## lm(formula = log(Price) ~ CaratsCent + CaratsCent2 + Color + 
##     Clarity, data = diamonds)
```

```r
# run summary(Model_forward) to see the results of the final model
```

---

## Back to the diamonds data

Backward selection using AIC

```r
Model_backward <- step(FullModel,direction="backward",trace=0)
# Let's see the variables the model selected
Model_backward$call
```

```
## lm(formula = log(Price) ~ CaratsCent + CaratsCent2 + Color + 
##     Clarity + Certification + Color:Clarity + Color:Certification, 
##     data = diamonds)
```

```r
# run summary(Model_backward) to see the results of the final model
```

Same result as forward selection using AIC.

---

## Back to the diamonds data

Backward selection using BIC

```r
Model_backward <- step(FullModel,direction="backward",trace=0,k = log(n))
# Let's see the variables the model selected
Model_backward$call
```

```
## lm(formula = log(Price) ~ CaratsCent + CaratsCent2 + Color + 
##     Clarity, data = diamonds)
```

```r
# run summary(Model_backward) to see the results of the final model
```

Same result as forward selection using BIC.

---

## Back to the diamonds data

Stepwise selection using AIC

```r
Model_stepwise <- step(NullModel, scope = formula(FullModel),direction="both",trace=0)
# Let's see the variables the model selected
Model_stepwise$call
```

```
## lm(formula = log(Price) ~ CaratsCent + CaratsCent2 + Color + 
##     Clarity + Certification + Color:Clarity + Color:Certification, 
##     data = diamonds)
```

```r
# run summary(Model_stepwise) to see the results of the final model
```

Same result as the previous selections using AIC.

---

## Back to the diamonds data

Stepwise selection using BIC

```r
Model_stepwise <- step(NullModel, scope = formula(FullModel),direction="both",trace=0,
                       k = log(n))
# Let's see the variables the model selected
Model_stepwise$call
```

```
## lm(formula = log(Price) ~ CaratsCent + CaratsCent2 + Color + 
##     Clarity, data = diamonds)
```

```r
# run summary(Model_stepwise) to see the results of the final model
```

Same result as the previous selections using BIC.
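---

## Back to the diamonds data

AIC and BIC disagree here, so it can help to compare the two selected models directly. The sketch below refits both under names introduced just for this comparison (.hlight[Model_aic], .hlight[Model_bic]); since the BIC model is nested in the AIC model, a partial F-test applies.

```r
Model_bic <- lm(log(Price) ~ CaratsCent + CaratsCent2 + Color + Clarity,
                data = diamonds)
Model_aic <- lm(log(Price) ~ CaratsCent + CaratsCent2 + Color + Clarity +
                  Certification + Color:Clarity + Color:Certification,
                data = diamonds)
anova(Model_bic, Model_aic)   # partial F-test for the extra terms
AIC(Model_bic, Model_aic)     # compare AIC directly
BIC(Model_bic, Model_aic)     # compare BIC directly
```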
---

## Back to the diamonds data

Let's use the .hlight[regsubsets] function.

```r
library(leaps)
Model_forward <- regsubsets(log(Price)~CaratsCent+CaratsCent2+Color*Clarity+
                              Color*Certification+Clarity*Certification,data=diamonds,
                            method="forward")
Select_results <- summary(Model_forward)
coef(Model_forward, which.max(Select_results$adjr2)) # Adj R-sq
```

```
## (Intercept)  CaratsCent CaratsCent2      ColorG      ColorH      ColorI 
##   8.6185951   3.0050895  -2.0109553  -0.1275071  -0.2147009  -0.3185926 
##  ClarityVS1  ClarityVS2 ClarityVVS2 
##  -0.1688242  -0.2525954  -0.1116575
```

```r
coef(Model_forward, which.min(Select_results$bic)) # BIC
```

```
## (Intercept)  CaratsCent CaratsCent2      ColorG      ColorH      ColorI 
##   8.6185951   3.0050895  -2.0109553  -0.1275071  -0.2147009  -0.3185926 
##  ClarityVS1  ClarityVS2 ClarityVVS2 
##  -0.1688242  -0.2525954  -0.1116575
```

---

## Back to the diamonds data

```r
Model_backward <- regsubsets(log(Price)~CaratsCent+CaratsCent2+Color*Clarity+
                               Color*Certification+Clarity*Certification,data=diamonds,
                             method="backward")
Select_results <- summary(Model_backward)
coef(Model_backward, which.max(Select_results$adjr2)) # Adj R-sq
```

```
## (Intercept)  CaratsCent CaratsCent2      ColorG      ColorH      ColorI 
##   8.6185951   3.0050895  -2.0109553  -0.1275071  -0.2147009  -0.3185926 
##  ClarityVS1  ClarityVS2 ClarityVVS2 
##  -0.1688242  -0.2525954  -0.1116575
```

```r
coef(Model_backward, which.min(Select_results$bic)) # BIC
```

```
## (Intercept)  CaratsCent CaratsCent2      ColorG      ColorH      ColorI 
##   8.6185951   3.0050895  -2.0109553  -0.1275071  -0.2147009  -0.3185926 
##  ClarityVS1  ClarityVS2 ClarityVVS2 
##  -0.1688242  -0.2525954  -0.1116575
```
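---

## Back to the diamonds data

One caveat with the .hlight[regsubsets] results above: by default it only searches subsets of up to 8 terms (.hlight[nvmax = 8]), far fewer than the number of dummy and interaction terms in this full model. Below is a sketch of how you might allow larger subsets; if the full model matrix has linear dependencies, expect a warning and consider dropping some interactions.

```r
# number of terms in the full model, excluding the intercept (an assumption:
# FullModel from the earlier step() slides is still in your workspace)
nvmax_full <- ncol(model.matrix(FullModel)) - 1
Model_forward <- regsubsets(log(Price)~CaratsCent+CaratsCent2+Color*Clarity+
                              Color*Certification+Clarity*Certification,
                            data=diamonds, method="forward", nvmax = nvmax_full)
Select_results <- summary(Model_forward)
coef(Model_forward, which.max(Select_results$adjr2))  # best size by Adj R-sq
coef(Model_forward, which.min(Select_results$bic))    # best size by BIC
plot(Model_forward, scale = "bic")  # which terms are included at each size
```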