EDUARDO GARCIA PORTUGUES
2017-Dec-19 19:12 UTC
[Rd] lm considers removed predictors when finding complete cases
Dear R-devel list,

I realized that removing a predictor in lm through the "-" operator in formula() does not affect the complete cases that are considered. A minimal example is:

summary(lm(Wind ~ ., data = airquality))
# 42 observations deleted due to missingness

summary(lm(Wind ~ . - Ozone, data = airquality))
# still 42 observations deleted due to missingness, even if only 7 are
# missing for the response and the rest of the predictors

summary(lm(Wind ~ ., data = subset(airquality, select = -Ozone)))
# 7 observations deleted due to missingness

I find this behaviour somehow striking and I was wondering whether it is intended, or whether it would be appropriate to document it in lm's help.

Any insight on this issue is appreciated.

Best regards,
--
Eduardo García Portugués
Assistant professor
Department of Statistics
Carlos III University of Madrid

Office: 7.3.J21 (Leganés)
Phone: (+34) 91624 8836
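The counts quoted in the example can also be checked programmatically with nobs() instead of being read off the summary() output. The following is a minimal sketch, assuming only the built-in airquality data (153 rows); the object names fit_full, fit_minus and fit_subset are introduced here for illustration and do not appear in the original post.

```r
# Built-in dataset: 153 rows, with NAs in Ozone and Solar.R only.
print(colSums(is.na(airquality)))

# The three fits from the post (object names are illustrative).
fit_full   <- lm(Wind ~ ., data = airquality)
fit_minus  <- lm(Wind ~ . - Ozone, data = airquality)
fit_subset <- lm(Wind ~ ., data = subset(airquality, select = -Ozone))

# Number of rows actually used by each fit after NA handling.
n_used <- c(full   = nobs(fit_full),
            minus  = nobs(fit_minus),
            subset = nobs(fit_subset))
print(n_used)

# Rows dropped relative to the 153 rows of airquality.
print(nrow(airquality) - n_used)

# Ozone is not among the fitted coefficients of the "- Ozone" model.
# Whether it is still a column of the model frame (and so still takes
# part in the complete-case filtering) can be inspected directly:
print("Ozone" %in% names(coef(fit_minus)))
print(names(model.frame(fit_minus)))
```

Comparing the "minus" and "subset" entries of n_used shows whether the two ways of excluding Ozone end up using the same rows.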
David Winsemius
2017-Dec-20 00:22 UTC
[Rd] lm considers removed predictors when finding complete cases
> On Dec 19, 2017, at 11:12 AM, EDUARDO GARCIA PORTUGUES <edgarcia at est-econ.uc3m.es> wrote:
>
> Dear R-devel list,
>
> I realized that removing a predictor in lm through the "-" operator in
> formula() does not affect the complete cases that are considered. A minimal
> example is:
>
> summary(lm(Wind ~ ., data = airquality))
> # 42 observations deleted due to missingness
>
> summary(lm(Wind ~ . - Ozone, data = airquality))
> # still 42 observations deleted due to missingness, even if only 7 are
> # missing for the response and the rest of the predictors
>
> summary(lm(Wind ~ ., data = subset(airquality, select = -Ozone)))
> # 7 observations deleted due to missingness
>
> I find this behaviour somehow striking and I was wondering whether it is
> intended, or whether it would be appropriate to document it in lm's help.

The behavior in the second instance seems consistent with a desire to compare models (full versus reduced) based on the same data. Your expectation appears to be something else, but you have not really explained your rationale for a different expectation other than to call it "striking". If by "striking" you mean hitting your head and saying "Of course, I should have thought of that", then we would be in agreement.

--
David.

> Any insight on this issue is appreciated.
>
> Best regards,

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
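The point about comparing full and reduced models on the same data can be illustrated with anova(), which requires all models to be fitted to the same number of observations. This is a rough sketch under the assumption that the built-in airquality data is used as in the original post; the names fit_full, fit_reduced, fit_subset and msg are illustrative only.

```r
fit_full    <- lm(Wind ~ ., data = airquality)
fit_reduced <- lm(Wind ~ . - Ozone, data = airquality)

# When both fits use the same rows, the nested-model F test is available.
if (nobs(fit_full) == nobs(fit_reduced)) {
  print(anova(fit_reduced, fit_full))
}

# Dropping the Ozone column before fitting keeps more rows, so this
# model is no longer comparable with the full one via anova().
fit_subset <- lm(Wind ~ ., data = subset(airquality, select = -Ozone))

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    anova(fit_subset, fit_full)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If the goal is instead to use every row that is complete for the retained variables, removing the column from the data before calling lm(), as in the third example of the original post, does that at the cost of this comparability.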