5  Variable Selection – STAT 508 (2024)

Variable Selection

5.1 Variable Selection for the Linear Model

So in linear regression, the more features \(X_j\) the better (since RSS keeps going down)? NO!

Carefully selected features can improve model accuracy. But adding too many can lead to overfitting:

  • Overfitted models describe random error or noise instead of any underlying relationship;

  • They generally have poor predictive performance on test data;

  • For instance, we can use a 15-degree polynomial function to fit a handful of data points so that the fitted curve goes nicely through every one of them. However, a brand new dataset collected from the same population may not fit this particular curve well at all (a small simulation illustrating this appears after this list).

  • Sometimes when we do prediction we may not want to use all of the predictor variables (sometimes p is too big). For example, one DNA microarray expression dataset has a sample size (N) of 96 but a dimension (p) of over 4000!
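
As a minimal sketch of the 15-degree polynomial point above (simulated data; the dataset from the original figure is not reproduced here): the polynomial goes through the training points almost exactly, but predicts a fresh sample from the same population poorly.

set.seed(1)
train <- data.frame(x = runif(20))
train$y <- sin(2 * pi * train$x) + rnorm(20, sd = 0.2)
test <- data.frame(x = runif(20))
test$y <- sin(2 * pi * test$x) + rnorm(20, sd = 0.2)

fit15 <- lm(y ~ poly(x, 15), data = train)      # 15-degree polynomial fit
mean(residuals(fit15)^2)                        # near-zero training error
mean((test$y - predict(fit15, test))^2)         # much larger error on new data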

In such cases, we would select a subset of the predictor variables to perform regression or classification, e.g., choosing k predictor variables out of the total of p that yield the minimum \(RSS(\hat{\beta})\).

Variable Selection for the Linear Regression Model

When the association of Y and \(X_j\) conditioning on other features is of interest, we are interested in testing \(H_0 : \beta_j = 0\) versus \(H_a : \beta_j \ne 0\).

  • Under the normal error (residual) assumption, \(z_j = \frac{\hat{\beta}_j}{\hat{\sigma}\sqrt{v_j}}\), where \(v_j\) is the jth diagonal element of \((X^{'} X)^{-1}\).

  • Under \(H_0\), \(z_j\) is distributed as \(t_{N-p-1}\) (a Student's t-distribution with \(N - p - 1\) degrees of freedom).
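
As a minimal sketch (simulated data; the design matrix X below includes an intercept column and p = 3 predictors), the \(z_j\) statistics can be computed directly and compared with the t values reported by summary(lm):

set.seed(1)
N <- 50; p <- 3
X <- cbind(1, matrix(rnorm(N * p), N, p))       # design matrix with intercept column
beta <- c(1, 2, 0, -1)
y <- drop(X %*% beta + rnorm(N))

fit <- lm(y ~ X[, -1])
sigma.hat <- sqrt(sum(residuals(fit)^2) / (N - p - 1))
v <- diag(solve(t(X) %*% X))                    # v_j: jth diagonal element of (X'X)^{-1}
z <- coef(fit) / (sigma.hat * sqrt(v))          # z_j = beta_j-hat / (sigma-hat * sqrt(v_j))
cbind(manual = z, lm = summary(fit)$coefficients[, "t value"])
2 * pt(-abs(z), df = N - p - 1)                 # two-sided p-values from t_{N-p-1}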

When the prediction is of interest:

  • F-test;

  • Likelihood ratio test;

  • AIC, BIC, etc.;

  • Cross-validation.
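
The first three of these are discussed below. As a minimal sketch of the last, K-fold cross-validation can compare candidate models by estimated prediction error (hypothetical data frame dat with response y and predictors x1, ..., x4):

cv_mse <- function(formula, data, K = 5) {
  folds <- sample(rep(1:K, length.out = nrow(data)))   # random fold assignment
  mean(sapply(1:K, function(k) {
    fit <- lm(formula, data = data[folds != k, ])      # train on the other K - 1 folds
    mean((data$y[folds == k] - predict(fit, data[folds == k, ]))^2)  # held-out fold error
  }))
}
set.seed(1)
cv_mse(y ~ x1 + x2, dat)
cv_mse(y ~ x1 + x2 + x3 + x4, dat)   # prefer the model with the smaller CV error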

F-test

The residual sum-of-squares \(RSS(\beta)\) is defined as:

\[RSS(\beta)=\sum_{i=1}^{N}(y_i-\hat{y}_i)^2 = \sum_{i=1}^{N}(y_i-X_i\beta)^2\]

Let \(RSS_1\) correspond to the bigger model with \(p_1 + 1\) parameters, and \(RSS_0\) correspond to the nested smaller model with \(p_0 + 1\) parameters.

The F statistic measures the reduction of RSS per additional parameter in the bigger model:

\[F=\frac{(RSS_0-RSS_1)/(p_1-p_0)}{RSS_1/(N-p_1-1)}\]

Under the normal error assumption, and under the null hypothesis that the smaller model is correct, the F statistic has an \(F_{(p_1-p_0), (N-p_1-1)}\) distribution.

For linear regression models, an individual t-test is equivalent to an F-test for dropping a single coefficient \(\beta_j\) from the model.
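
As a minimal sketch (hypothetical data frame dat with response y and predictors x1, ..., x4), anova compares the nested fits, and the same F statistic can also be computed from the formula above:

fit0 <- lm(y ~ x1 + x2, data = dat)             # smaller model, p0 = 2
fit1 <- lm(y ~ x1 + x2 + x3 + x4, data = dat)   # bigger model,  p1 = 4
anova(fit0, fit1)                               # F-test for dropping x3 and x4

RSS0 <- sum(residuals(fit0)^2)
RSS1 <- sum(residuals(fit1)^2)
N <- nrow(dat); p0 <- 2; p1 <- 4
Fstat <- ((RSS0 - RSS1) / (p1 - p0)) / (RSS1 / (N - p1 - 1))
pf(Fstat, p1 - p0, N - p1 - 1, lower.tail = FALSE)   # p-value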

Likelihood Ratio Test (LRT)

Let \(L_1\) be the maximum value of the likelihood of the bigger model.

Let \(L_0\) be the maximum value of the likelihood of the nested smaller model.

The likelihood ratio \(\lambda = L_{0} / L_{1}\) is always between 0 and 1; the less plausible the restrictive assumptions underlying the smaller model, the smaller \(\lambda\) will be.

The likelihood ratio test statistic (deviance), \(-2\log(\lambda)\), approximately follows a \(\chi_{p_1-p_0}^{2}\) distribution.

So we can test the fit of the ‘null’ model \(M_0\) against a more complex model \(M_1\).

Note that, for large \(N\), the quantiles of the \(F_{(p_1-p_0), (N-p_1-1)}\) distribution approach those of the \(\chi_{p_1-p_0}^{2}/(p_1-p_0)\) distribution.
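
As a minimal sketch, using the nested fits fit0 and fit1 from the F-test sketch above, the deviance and its approximate chi-square p-value can be computed from the maximized log-likelihoods:

dev <- -2 * (as.numeric(logLik(fit0)) - as.numeric(logLik(fit1)))   # -2 log(lambda)
df <- attr(logLik(fit1), "df") - attr(logLik(fit0), "df")           # p1 - p0
pchisq(dev, df = df, lower.tail = FALSE)                            # approximate p-value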

Akaike Information Criterion (AIC)

Use of the LRT requires that our models are nested. Akaike (1971/74) proposed a more general measure of “model badness:”

\[AIC=-2\log L(\hat{\beta}) + 2p \]

where p is the number of parameters.

Faced with a collection of putative models, the ‘best’ (or ‘least bad’) one can be chosen by seeing which has the lowest AIC.

The scale is statistical, not scientific, but the trade-off is clear: we must improve the log-likelihood by one unit for every extra parameter.

AIC is asymptotically equivalent to leave-one-out cross-validation.
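
As a minimal sketch, AIC() compares the fits from the sketches above (these two models happen to be nested, but AIC does not require it); note that R's parameter count for a linear model includes the error variance:

AIC(fit0, fit1)                                                 # lower AIC is preferred
-2 * as.numeric(logLik(fit1)) + 2 * attr(logLik(fit1), "df")    # same value, by hand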

Bayes Information Criterion (BIC)

AIC tends to overfit models (see Good and Hardin Chapter 12 for how to check this).

Another information criterion which penalizes complex models more severely is:

\[BIC=-2\log L(\hat{\beta})+p\log(n)\]

also known as Schwarz's criterion, due to Schwarz (1978), where an approximate Bayesian derivation is given.

Lowest BIC is taken to identify the ‘best model’, as before.

BIC tends to favor simpler models than those chosen by AIC.
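
As a minimal sketch, BIC() applies the log(n) penalty to the same fits:

BIC(fit0, fit1)                                                       # lower BIC is preferred
n <- nobs(fit1)
-2 * as.numeric(logLik(fit1)) + attr(logLik(fit1), "df") * log(n)     # same value, by hand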

Stepwise Selection

AIC and BIC also allow stepwise model selection.

An exhaustive search for the subset may not be feasible if p is very large. There are two main alternatives:

  • Forward stepwise selection:

    • First, we approximate the response variable y with a constant (i.e., an intercept-only regression model).

    • Then we gradually add one more variable at a time (or add main effects first, then interactions).

    • At each step, we choose from the remaining variables the one that yields the best prediction accuracy when added to the pool of already selected variables. This accuracy can be measured by the F-statistic, LRT, AIC, BIC, etc.

    • For example, if we have 10 predictor variables, we first approximate y with a constant and then fit 10 simple regressions, each using a different predictor variable; the variable whose regression gives the smallest residual sum of squares is chosen and put in the pool of selected variables. We then choose the next variable from the 9 that remain, and so on (a sketch of this loop in R appears below).

  • Backward stepwise selection: This is similar to forward stepwise selection, except that we start with the full model using all the predictors and gradually delete variables one at a time.

There are various methods developed to choose the number of predictors, for instance, the F-ratio test. We stop forward or backward stepwise selection when no predictor produces an F-ratio statistic greater than some threshold.
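
A minimal sketch of the forward loop described in the example above (hypothetical data frame dat with response y; for simplicity the loop runs until all predictors are added, whereas in practice it would stop once the chosen criterion, e.g. the F-ratio, no longer clears the threshold):

remaining <- setdiff(names(dat), "y")
selected <- character(0)
while (length(remaining) > 0) {
  rss <- sapply(remaining, function(v) {
    f <- reformulate(c(selected, v), response = "y")    # add candidate v to the current pool
    sum(residuals(lm(f, data = dat))^2)
  })
  best <- names(which.min(rss))                         # variable giving the smallest RSS
  selected <- c(selected, best)
  remaining <- setdiff(remaining, best)
  cat("added", best, "RSS =", round(min(rss), 3), "\n")
}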

5.2 R Scripts

Continuation from Lesson 4.5.

  1. Subset Selection. The R function step performs forward stepwise addition and backward stepwise deletion. For forward stepwise selection, baseModel gives the initial model for the stepwise search, and scope defines the range of models examined. In the example below, the search starts from the base model and may expand up to the full model.
step(baseModel, scope = list(upper = fullModel, lower = ~1), direction = "forward")

The result shows the details for the predictor variable selected at each step. In this case \(Y \sim X_2 + X_6 + X_1 + X_7 + X_3 + X_8\) is the selected model, with the variables listed in the order in which forward selection added them.
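
For concreteness, the objects in the call above might be set up as follows (hypothetical data frame dat with response Y and predictors X1, ..., X8; the upper scope is given here as a formula):

baseModel <- lm(Y ~ 1, data = dat)                        # intercept-only starting model
fullModel <- Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8    # scope of the largest model
step(baseModel, scope = list(upper = fullModel, lower = ~1), direction = "forward")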

Similarly, backward stepwise deletion of variables can be executed by the following code:

step(fullModel, direction="backward")

It is also possible to use the regsubsets function in the leaps library to perform the best subset selection and stepwise selection (this is covered in the R Lab for this lesson).
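
A brief sketch of that approach, again assuming the hypothetical data frame dat with response Y:

library(leaps)
best <- regsubsets(Y ~ ., data = dat, nvmax = 8)     # best subset of each size up to 8
summary(best)$bic                                    # BIC for the best model of each size
coef(best, which.min(summary(best)$bic))             # coefficients of the BIC-best subset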
