maechler@stat.math.ethz.ch
2003-Jul-16 12:02 UTC
[Rd] step.lm() fails to drop {many empty 2-way factor cells} (PR#3491)
Exec. Summary: step() basically ``fails'' whereas MASS' stepAIC() does work This may not be a bug in the strictest sense, but at least something for the wish list. Unfortunately I have no time currently to investigate further myself but want to be sure this won't be forgotten: The example is using a real data set with 216 observations on 9 variables -- where we have anonymized variable and factor level names. The 8 explanatory variables are 4 factors (with 3 / 2 / 4 / 4 levels) and 4 continuous covariates. The factor combinations observed are very far from balanced, and have actually quite a few 0 cells. The following is a reproducible R script (you need internet connection and our FTP server working) : ## step.lm() "fails" when 2-way factor interactions have (many) empty cells ## whereas stepAIC() works fine Dat <- read.table("ftp://stat.ethz.ch/U/maechler/R/step-ex.tab", header=TRUE) Dat$f3 <- ordered(Dat$f3) ## Full Model : y ~ (f1+f2+f3+f4 + C1+C2+C3+C4) ## ---------- where f1..f4 are FACTORS and C1..C4 are continuous covariates ## "Look at data" str(Dat) summary(Dat) ## several 2-way interactions have empty cells, e.g. with(Dat, table(f1,f4)) #! with(Dat, table(f1,f3)) ## see the full design: with(Dat, ftable(table(f1,f2,f3,f4))) ##--> many many 0 pairs(Dat, col = 1+as.integer(Dat$f1), pch = 1+as.integer(Dat$sex)) ## Fit a "full model" with all 2-way interactions : lmAll <- lm(formula = y ~ .^2, data = Dat) summary(lmAll) # many insignificant ones {BTW: why is it bad to have "*" here?} drop1(lmAll) ## clearly shows interaction terms that should be dropped (would decrease AIC) ## BUT: sr <- step(lmAll) ## nothing eliminated ?step claims it would work when drop1() does library(MASS) sAr <- stepAIC(lmAll) ## <<< stepAIC() *does* work ## but with instructive ## There were 19 warnings (use warnings() to see them) warnings() ##- Warning messages: ##- 1: 0 df terms are changing AIC in: stepAIC(lmAll) ##- 2: 0 df terms are changing AIC in: stepAIC(lmAll) ##- ... ##- ... summary(sAr)
Prof Brian Ripley
2003-Jul-21 12:12 UTC
[Rd] step.lm() fails to drop {many empty 2-way factor cells} (PR#3491)
Your example works on 2 of the 3 systems I tried (and in fact the warnings you got on stepAIC are spurious). It's all down to rounding errors. I've added a fuzz in the termination test. B On Wed, 16 Jul 2003 maechler@stat.math.ethz.ch wrote:> Exec. Summary: > step() basically ``fails'' whereas MASS' stepAIC() does work > > This may not be a bug in the strictest sense, but at least > something for the wish list. Unfortunately I have no time > currently to investigate further myself but want to be sure this > won't be forgotten: > > The example is using a real data set with 216 observations on 9 > variables -- where we have anonymized variable and factor level names. > The 8 explanatory variables are 4 factors (with 3 / 2 / 4 / 4 levels) > and 4 continuous covariates. The factor combinations observed > are very far from balanced, and have actually quite a few 0 cells. > > The following is a reproducible R script > (you need internet connection and our FTP server working) : > > > ## step.lm() "fails" when 2-way factor interactions have (many) empty cells > ## whereas stepAIC() works fine > > Dat <- read.table("ftp://stat.ethz.ch/U/maechler/R/step-ex.tab", header=TRUE) > Dat$f3 <- ordered(Dat$f3) > > ## Full Model : y ~ (f1+f2+f3+f4 + C1+C2+C3+C4) > ## ---------- where f1..f4 are FACTORS and C1..C4 are continuous covariates > > ## "Look at data" > str(Dat) > summary(Dat) > > ## several 2-way interactions have empty cells, e.g. > with(Dat, table(f1,f4)) #! > with(Dat, table(f1,f3)) > > ## see the full design: > with(Dat, ftable(table(f1,f2,f3,f4))) ##--> many many 0 > pairs(Dat, col = 1+as.integer(Dat$f1), pch = 1+as.integer(Dat$sex)) > > ## Fit a "full model" with all 2-way interactions : > lmAll <- lm(formula = y ~ .^2, data = Dat) > summary(lmAll) # many insignificant ones {BTW: why is it bad to have "*" here?} > > drop1(lmAll) > ## clearly shows interaction terms that should be dropped (would decrease AIC) > > ## BUT: > sr <- step(lmAll) > ## nothing eliminated ?step claims it would work when drop1() does > > library(MASS) > sAr <- stepAIC(lmAll) ## <<< stepAIC() *does* work > ## but with instructive > ## There were 19 warnings (use warnings() to see them) > warnings() > ##- Warning messages: > ##- 1: 0 df terms are changing AIC in: stepAIC(lmAll) > ##- 2: 0 df terms are changing AIC in: stepAIC(lmAll) > ##- ... > ##- ... > summary(sAr) > > ______________________________________________ > R-devel@stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-devel >-- Brian D. Ripley, ripley@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595