Mike Wolfgang
2006-Apr-11 21:49 UTC
[R] variable selection when categorical variables are available
Dear All,

Probably it is not a highly relevant question: Why do stepwise regression functions in R (step() or stepAIC()) add/delete categorical variables as a set? For example, I have a four-level factor variable d, so the dummies are d1, d2, d3. When stepwise regression operates on d, adding or removing it, d1, d2, d3 are simultaneously added/removed. What's the concern here if operating on the dummies individually? Model interpretability or anything else? (It seems shrinkage methods can operate on them one by one.)

Thanks
mike
Frank E Harrell Jr
2006-Apr-11 21:59 UTC
[R] variable selection when categorical variables are available
Mike Wolfgang wrote:
> Dear All,
>
> Probably it is not a highly relevant question: Why do stepwise regression
> functions in R (step() or stepAIC()) add/delete categorical variables as a
> set? For example, I have a four-level factor variable d, so dummies are
> d1,d2,d3, as stepwise regression operates d, adding or removing, d1,d2,d3
> are simultaneously added/removed. What's the concern here if operating
> dummies individually? Model interpretability or anything else? (it seems
> shrinkage methods can operate them one by one)
>
> Thanks
> mike

You would be on shaky ground statistically and interpretation-wise to break up the variables. Stepwise regression causes enough problems (invalidating most of the statistics from the final model) without doing that. Shrinkage methods do not operate on them one by one; they shrink the estimates to the mean of all 4 groups (see for example the ols function in the Design package).

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
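A minimal sketch of the grouping behaviour discussed in this exchange, on simulated data. The data set, the variable names (dat, fit, tab, sel) and the effect sizes are assumptions made up for illustration. It shows that a four-level factor enters the design matrix as three dummy columns, while drop1() and step() treat it as a single term with 3 degrees of freedom.

```r
# Simulated data: a four-level factor d and a numeric predictor x.
# All names and effect sizes here are illustrative assumptions.
set.seed(1)
n <- 200
d <- factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
x <- rnorm(n)
y <- 1 + 0.5 * x + 1.0 * (d == "d") + rnorm(n)
dat <- data.frame(y, x, d)

fit <- lm(y ~ x + d, data = dat)

# The design matrix holds one column per non-reference level of d.
dummy_cols <- colnames(model.matrix(fit))
print(dummy_cols)

# drop1() reports one row per model term; the row for d has Df = 3,
# i.e. the three dummies are tested and dropped together.
tab <- drop1(fit)
print(tab)

# step() works on the same term-level scope, so d is kept or removed
# as a whole in the selected model.
sel <- step(fit, trace = 0)
kept_terms <- attr(terms(sel), "term.labels")
print(kept_terms)
```

Whatever the selected model turns out to be on a given data set, kept_terms can only contain whole terms such as "x" or "d", never an individual dummy.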
Prof Brian Ripley
2006-Apr-12 06:18 UTC
[R] variable selection when categorical variables are available
On Tue, 11 Apr 2006, Mike Wolfgang wrote:

> Probably it is not a highly relevant question: Why do stepwise regression
> functions in R (step() or stepAIC()) add/delete categorical variables as a
> set?

Yes, those two do. Others (e.g. in package leaps) may not.

> For example, I have a four-level factor variable d, so dummies are
> d1,d2,d3, as stepwise regression operates d, adding or removing, d1,d2,d3
> are simultaneously added/removed. What's the concern here if operating
> dummies individually? Model interpretability or anything else? (it seems
> shrinkage methods can operate them one by one)

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
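A rough sketch of the contrast with package leaps mentioned in this reply, assuming leaps is installed (the code is skipped otherwise). The simulated data and the names dat, rs and which_tab are illustrative assumptions. regsubsets() searches over columns of the design matrix, so the dummies for d appear as separate candidates rather than as one term.

```r
# Runs only if the leaps package is available; data are simulated and
# all object names are illustrative assumptions.
if (requireNamespace("leaps", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  d <- factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
  x <- rnorm(n)
  y <- 1 + 0.5 * x + 1.0 * (d == "d") + rnorm(n)
  dat <- data.frame(y, x, d)

  # Best-subset search over the columns of the design matrix.
  rs <- leaps::regsubsets(y ~ x + d, data = dat)

  # One row per subset size, one logical column per design-matrix column;
  # the dummies for d are listed individually, so a subset may contain
  # some of them without the others.
  which_tab <- summary(rs)$which
  print(which_tab)
}
```

A subset that includes only some of the dummies effectively merges the omitted levels with the reference level, which is the interpretability concern raised earlier in the thread.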