thr3ads.net - R devel - [Rd] lm considers removed predictors when finding complete cases [Dec 2017]


EDUARDO GARCIA PORTUGUES

2017-Dec-19 19:12 UTC

[Rd] lm considers removed predictors when finding complete cases

Dear R-devel list,

I realized that removing a predictor in lm through the "-" operator in
formula() does not affect the complete cases that are considered. A minimal
example is:

summary(lm(Wind ~ ., data = airquality))
# 42 observations deleted due to missingness

summary(lm(Wind ~ . - Ozone, data = airquality))
# still 42 observations deleted due to missingness, even if only 7 are
# missing for the response and the rest of the predictors

summary(lm(Wind ~ ., data = subset(airquality, select = -Ozone)))
# 7 observations deleted due to missingness

I find this behaviour somehow striking and I was wondering whether it is
intended, or whether it would be appropriate to document it in lm's help.

Any insight on this issue is appreciated.

Best regards,
-- 
Eduardo García Portugués
Assistant professor
Department of Statistics
Carlos III University of Madrid

Office: 7.3.J21 (Leganés)
Phone: (+34) 91624 8836


David Winsemius

2017-Dec-20 00:22 UTC

[Rd] lm considers removed predictors when finding complete cases

> On Dec 19, 2017, at 11:12 AM, EDUARDO GARCIA PORTUGUES <edgarcia at est-econ.uc3m.es> wrote:
> 
> Dear R-devel list,
> 
> I realized that removing a predictor in lm through the "-" operator in
> formula() does not affect the complete cases that are considered. A minimal
> example is:
> 
> summary(lm(Wind ~ ., data = airquality))
> # 42 observations deleted due to missingness
> 
> summary(lm(Wind ~ . - Ozone, data = airquality))
> # still 42 observations deleted due to missingness, even if only 7 are
> # missing for the response and the rest of the predictors
> 
> summary(lm(Wind ~ ., data = subset(airquality, select = -Ozone)))
> # 7 observations deleted due to missingness
> 
> I find this behaviour somehow striking and I was wondering whether it is
> intended, or whether it would be appropriate to document it in lm's help.

The behavior in the second instance seems consistent with a desire to compare
models (full versus reduced) based on the same data. Your expectation appears to
be something else, but you have not really explained your rationale for a
different expectation other than to call it "striking". If by "striking" you
mean hitting your head and saying "Of course, I should have thought of that",
then we would be in agreement.

-- 
David.

> 
> Any insight on this issue is appreciated.
> 
> Best regards,
> -- 
> Eduardo García Portugués
> Assistant professor
> Department of Statistics
> Carlos III University of Madrid
> 
> Office: 7.3.J21 (Leganés)
> Phone: (+34) 91624 8836
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' 
-Gehm's Corollary to Clarke's Third Law
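
The point made in the reply above (that the full and reduced fits use the same rows) can be checked directly. The following is a minimal sketch using only base R and the built-in airquality data; the object names full, reduced and reduced_all are illustrative and not from the thread. It assumes, as the reported counts suggest, that a variable removed with "-" is still carried in the model frame, so its missing values still drive row deletion.

```r
# Fit the full model and the reduced model written with "- Ozone".
full    <- lm(Wind ~ ., data = airquality)
reduced <- lm(Wind ~ . - Ozone, data = airquality)

# Both fits are based on the same number of rows, which is what
# makes a full-versus-reduced comparison meaningful.
c(full = nobs(full), reduced = nobs(reduced))
anova(reduced, full)

# Ozone is not a term of the reduced model ...
"Ozone" %in% attr(terms(reduced), "term.labels")
# ... but it is still a column of the model frame, so rows where
# Ozone is NA are dropped by the default na.action.
"Ozone" %in% names(model.frame(reduced))

# Dropping the column before fitting keeps every row that is complete
# in the remaining variables (the third call in the original post).
reduced_all <- lm(Wind ~ ., data = subset(airquality, select = -Ozone))
nobs(reduced_all)
```

If the larger sample is what is wanted, removing the column from the data (as in the last call) appears to be the simplest route; the trade-off is that the resulting fit is no longer based on the same rows as the full model.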
