Hi,

I am trying to understand the alternative methods that are available for selecting variables in a regression without simply imposing my own bias (having "good judgement"). The methods implemented in leaps and step and stepAIC seem to fall into the general class of stepwise procedures. But these are commonly condemned for inducing overfitting.

In Hastie, Tibshirani and Friedman, "The Elements of Statistical Learning", chapter 3, they describe a number of procedures that seem better. The use of cross-validation in the training stage presumably helps guard against overfitting. They seem particularly favorable to shrinkage through ridge regression, and to the "lasso". This may not be too surprising, given the authorship. Is the lasso "generally accepted" as being a pretty good approach? Has it proved its worth on a variety of problems? Or is it at the "interesting idea" stage? What, if anything, would be widely accepted as being sensible -- apart from having "good judgement"?

In econometrics there is a school (the "LSE methodology") which argues for what amounts to stepwise regressions combined with repeated tests of the properties of the error terms. (It is actually a bit more complex than that.) This has been coded in the program PcGets:
http://www.pcgive.com/pcgets/index.html?content=/pcgets/main.html
If anyone knows how this compares in terms of effectiveness to the methods discussed in Hastie et al., I would really be very interested.

Cheers,
Murray

Murray Z. Frank
B.I. Ghert Family Foundation Professor
Strategy & Business Economics
Faculty of Commerce
University of British Columbia
Vancouver, B.C.
Canada V6T 1Z2

phone: 604-822-8480
fax: 604-822-8477
e-mail: Murray.Frank at commerce.ubc.ca

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject!)
To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
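[As a minimal illustration of the stepwise procedures named above, stepAIC() from the MASS package can be run on a standard dataset. The Boston housing data used here is just an example, not something from the thread.]

```r
## Backward AIC-based stepwise selection with MASS::stepAIC().
## Boston is an example dataset shipped with MASS, chosen only
## for illustration.
library(MASS)

full <- lm(medv ~ ., data = Boston)    # start from all 13 predictors
sel  <- stepAIC(full, trace = FALSE)   # drop terms while AIC improves

## Which variables survived the search?
names(coef(sel))
```

The selected model can never have a worse AIC than the starting model, but, as the replies below note, that says nothing about whether it is the "true" or smallest adequate model.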
On Thu, 28 Feb 2002, Frank, Murray wrote:

> Hi,
>
> I am trying to understand the alternative methods that are available
> for selecting variables in a regression without simply imposing my own
> bias (having "good judgement"). The methods implemented in leaps and
> step and stepAIC seem to fall into the general class of stepwise
> procedures. But these are commonly condemned for inducing overfitting.

There are big differences between regression with only continuous variates, and regression involving hierarchies of factors. step/stepAIC include the latter, the rest do not.

A second difference is the purpose of selecting a model. AIC is intended to select a model which is large enough to include the `true' model, and hence to give good predictions. There over-fitting is not a real problem. (There are variations on AIC which do not assume some model considered is true.) This is a different aim from trying to find the `true' model or trying to find the smallest adequate model, both aims for explanation not prediction. AIC is often criticised (`condemned') for not being good at what it does not intend to do. [Sometimes R is, too.]

Shrinkage methods have their advocates for good predictions (including me), but they are a different class of statistical methods, that is *not* regression. They too have issues of selection, usually how much to shrink and often how to calibrate equal shrinkage across predictors. In ridge regression choosing the ridge coefficient is not easy, and depends on the scaling of the variables. In the neural networks field, shrinkage is widely used.

> In Hastie, Tibshirani and Friedman, "The Elements of Statistical
> Learning", chapter 3, they describe a number of procedures that seem
> better. The use of

I think that is a quite selective account.

> cross-validation in the training stage presumably helps guard against
> overfitting. They seem particularly favorable to shrinkage through
> ridge regression, and to the "lasso". This may not be too surprising,
> given the authorship. Is the lasso "generally accepted" as being a
> pretty good approach? Has it proved its worth on a variety of
> problems? Or is it at the "interesting idea" stage? What, if anything,
> would be widely accepted as being sensible -- apart from having "good
> judgement"?

Depends on the aim. If you look at the account in Venables & Ripley you will see many caveats about any automated method: all statistical problems (outside textbooks) come with a context which should be used in selecting variables if the aim is explanation, and perhaps also if it is prediction. You should use what you know about the variables and the possible mechanisms, especially to select derived variables. But generally model averaging (which you have not mentioned, and is for regression a form of shrinkage) seems to have most support for prediction.

> In econometrics there is a school (the "LSE methodology") which argues
> for what amounts to stepwise regressions combined with repeated tests
> of the properties of the error terms. (It is actually a bit more
> complex than that.) This has been coded in the program PcGets:
> http://www.pcgive.com/pcgets/index.html?content=/pcgets/main.html

Lots of hyperbolic claims, no references. But I suspect this is `ex-LSE' methodology, associated with Hendry's group (as PcGive and Ox are), and there is a link to Hendry (who is in Oxford).

> If anyone knows how this compares in terms of effectiveness to the
> methods discussed in Hastie et al., I would really be very interested.

It has a different aim, I believe. Certainly `effectiveness' has to be assessed relative to a clear aim, and simulation studies with true models don't seem to me to have the right aim. Statisticians of the Box/Cox/Tukey generation would say that effectiveness in deriving scientific insights was the real test (and I recall hearing that from those I named).
Chapter 2 of my `Pattern Recognition and Neural Networks' takes a much wider view of the methods available for model selection, and their philosophies. Specifically for regression, you might take a look at Frank Harrell's book.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
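[The point above, that the ridge coefficient is not easy to choose and depends on the scaling of the variables, can be seen with lm.ridge() from MASS, which standardises the predictors internally. The dataset and lambda grid here are arbitrary, chosen only for illustration.]

```r
## Ridge regression via MASS::lm.ridge() over a grid of lambda values.
library(MASS)

rr <- lm.ridge(medv ~ ., data = Boston, lambda = seq(0, 20, by = 0.5))

## select() reports lambda as chosen by the HKB and L-W estimators and
## by generalised cross-validation -- three criteria, three answers.
select(rr)

## Coefficients shrink toward zero as lambda grows.
matplot(rr$lambda, t(rr$coef), type = "l",
        xlab = "lambda", ylab = "standardised coefficients")
```

That the different criteria typically disagree on lambda illustrates why the choice is considered part of the selection problem rather than a solved step.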
Thanks for the most informative and helpful feedback. Professor Ripley wrote (most of his message has been edited out):

> There are big differences between regression with only continuous
> variates, and regression involving hierarchies of factors. step/stepAIC
> include the latter, the rest do not.

In much of Venables and Ripley, bootstrapping keeps popping up. Is there a reason not to run step/stepAIC repeatedly on bootstrapped samples from the original data? On the face of it, bootstrapping seems intuitively appealing in this context. (Would some form of cross-validation on subsamples be better?)

> But generally model averaging (which you have not mentioned, and is for
> regression a form of shrinkage) seems to have most support for
> prediction.

What do you mean by model averaging? It does not seem to match the discussion of model selection that I found in Venables and Ripley (i.e. pages 186-188).

> Lots of hyperbolic claims, no references. But I suspect this is
> `ex-LSE' methodology, associated with Hendry's group (as PcGive and Ox
> are), and there is a link to Hendry (who is in Oxford).

Quite right. It is the Hendry group. As far as I can figure out, the main specific references are to:

Hoover, K. D., and Perez, S. J. (1999). Data mining reconsidered: encompassing and the general-to-specific approach to specification search. Econometrics Journal, 2, 167-191.

Hoover, K. D., and Perez, S. J. (2001). Truth and robustness in cross-country growth regressions. Unpublished paper, Economics Department, University of California, Davis.

> It has a different aim, I believe. Certainly `effectiveness' has to be
> assessed relative to a clear aim, and simulation studies with true
> models don't seem to me to have the right aim.

As suggested, the Hoover and Perez papers are basically simulation studies where finding a true model was the aim.
The working paper on growth regressions tries to go further, and seems to have reasonable-sounding economic conclusions.

> Statisticians of the Box/Cox/Tukey generation would say that
> effectiveness in deriving scientific insights was the real test (and I
> recall hearing that from those I named).

It is hard to argue with that claim. But it is equally hard to see it as complete. How do we define "scientific insight"? Or is it one of those cases of: "I don't know how to define it, but I know it when I see it"?

Murray Z. Frank
B.I. Ghert Family Foundation Professor
Strategy & Business Economics
Faculty of Commerce
University of British Columbia
Vancouver, B.C.
Canada V6T 1Z2

phone: 604-822-8480
fax: 604-822-8477
e-mail: Murray.Frank at commerce.ubc.ca
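[The bootstrap question raised above -- running step/stepAIC repeatedly on resampled data -- can be sketched as follows. This tabulates how often each variable is selected across resamples, which measures selection stability; it does not by itself cure the biases of stepwise selection. The dataset and number of resamples are arbitrary choices for illustration.]

```r
## Rerun stepAIC() on bootstrap resamples and count how often each
## predictor survives the search.
library(MASS)
set.seed(42)

B <- 50  # number of bootstrap resamples (arbitrary)
picked <- unlist(lapply(seq_len(B), function(b) {
    boot <- Boston[sample(nrow(Boston), replace = TRUE), ]
    fit  <- stepAIC(lm(medv ~ ., data = boot), trace = FALSE)
    attr(terms(fit), "term.labels")   # names of the selected variables
}))

## Selection frequency of each predictor across the B resamples
sort(table(picked) / B, decreasing = TRUE)
```

Predictors selected in nearly every resample are comparatively stable choices; those selected in roughly half the resamples are the ones where the stepwise answer depends heavily on the particular sample drawn.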