Markus Jäntti
2005-Feb-14 12:52 UTC
[R] using poly in a linear regression in the presence of NA fails (despite subsetting them out)
I ran into a to me surprising result on running lm with an orthogonal polynomial among the predictors. The lm command resulted in Error in qr(X) : NA/NaN/Inf in foreign function call (arg 1) Error during wrapup: despite my using a "subset" in the call to get rid of NA's. poly is apparently evaluated before any NA's are subsetted out of the data. Example code (attached to this e-mail as as a script): > ## generate some data > n <- 50 > x <- runif(n) > a0 <- 10 > a1 <- .5 > sigma.e <- 1 > y <- a0 + a1*x + rnorm(n)*sigma.e > tmp.d <- data.frame(y, x) > rm(list=c("n", "x", "a0", "a1", "sigma.e", "y")) > > print(lm.1 <- lm(y ~ poly(x, 2), data = tmp.d) + + ## now make a few NA's + + tmp.d$x[1:2] <- rep(NA, 2) Error: syntax error Error during wrapup: > > ## this fails, just as it should > print(lm.1 <- lm(y ~ poly(x, 2), data = tmp.d)) Call: lm(formula = y ~ poly(x, 2), data = tmp.d) Coefficients: (Intercept) poly(x, 2)1 poly(x, 2)2 10.380 -0.242 -1.441 > > ## these also fail, but should not? > > print(lm.2 <- lm(y ~ poly(x, 2), data = tmp.d, subset = !is.na(x))) Call: lm(formula = y ~ poly(x, 2), data = tmp.d, subset = !is.na(x)) Coefficients: (Intercept) poly(x, 2)1 poly(x, 2)2 10.380 -0.242 -1.441 > print(lm.3 <- lm(y ~ poly(x, 2), data = tmp.d, na.action = na.omit)) Call: lm(formula = y ~ poly(x, 2), data = tmp.d, na.action = na.omit) Coefficients: (Intercept) poly(x, 2)1 poly(x, 2)2 10.380 -0.242 -1.441 > > ## but this works > > print(lm.3 <- lm(y ~ poly(x, 2), data = subset(tmp.d, subset = !is.na(x)))) Call: lm(formula = y ~ poly(x, 2), data = subset(tmp.d, subset = !is.na(x))) Coefficients: (Intercept) poly(x, 2)1 poly(x, 2)2 10.380 -0.242 -1.441 -------------------- The documentation of lm is *not* misleading at this point, saying that subset an optional vector specifying a subset of observations to be used in the fitting process. which implies that data are subsetted once lm.fit is called. All the same, this behavior is a little unexpected to me. Is it to be considered a feature, that is, does it produce beneficial side effects which explain why it works as it does? Regards, Markus I am running R on a Debian testing system with kernel 2.6.10 and > version _ platform i386-pc-linux-gnu arch i386 os linux-gnu system i386, linux-gnu status major 2 minor 0.1 year 2004 month 11 day 15 language R -- Markus Jantti Abo Akademi University markus.jantti at iki.fi http://www.iki.fi/~mjantti -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: poly-example.R Url: https://stat.ethz.ch/pipermail/r-help/attachments/20050214/68cdf3f1/poly-example.pl
Maybe Matching Threads
- using poly in a linear regression in the presence of NA f ails (despite subsetting them out)
- using poly in a linear regression in the presence of NA f ails (despite subsetting them out)
- poly() in lm() leads to wrong coefficients (but correct residuals)
- poly() can exceed degree k - 1 for k distinct points (PR#11251)
- Bug in poly() (PR#11243)