Chris Wilkinson
2013-Dec-18 18:18 UTC
[R] Predicting response from fitted linear model with incomplete new sample data
I would like to predict a new response from a fitted linear model where the new data is a single case with a missing value. My reading of the help on predict() is inconclusive on whether this is possible.

Leaving out the missing value or setting it to NA both fail but differently, see example code below.

> y <- runif(50)
> x1 <- rnorm(50)
> x2 <- rnorm(50)
> dat <- data.frame(y, x1, x2)
> mod <- lm(y~.,data=dat)
> summary(mod)

Call:
lm(formula = y ~ ., data = dat)

Residuals:
     Min       1Q   Median       3Q      Max
-0.50467 -0.28997  0.01457  0.27970  0.47791

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.50098    0.04577  10.945  1.6e-14 ***
x1          -0.01762    0.04172  -0.422    0.675
x2          -0.02753    0.04920  -0.560    0.578
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3177 on 47 degrees of freedom
Multiple R-squared:  0.009301,  Adjusted R-squared:  -0.03286
F-statistic: 0.2206 on 2 and 47 DF,  p-value: 0.8028

> predict(mod, newdata=data.frame(x1=0.1, x2=0.3)) #OK as expected
        1
0.4909624

> predict(mod, newdata=data.frame(x1=0.1)) # x2 missing
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  variable lengths differ (found for 'x2')
In addition: Warning message:
'newdata' had 1 row but variables found have 50 rows

> predict(mod, newdata=data.frame(x1=0.1, x2=NA)) #x2=NA
Error: variable 'x2' was fitted with type "numeric" but type "logical" was supplied

Thanks
Chris
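A note on the two errors in the post above, offered as a minimal sketch rather than a definitive answer. In the first failing call, x2 is absent from newdata, so R appears to pick up the 50-value x2 still sitting in the workspace, which is what the "variables found have 50 rows" warning points to. In the second, a bare NA is of type logical, which does not match the numeric x2 the model was fitted with. Supplying the numeric missing value NA_real_ gets past the type check, but the prediction is then itself NA, because the default na.action of predict() for lm objects is na.pass. The set.seed() call is an assumption added only for reproducibility; it is not in the original post.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
y  <- runif(50)
x1 <- rnorm(50)
x2 <- rnorm(50)
dat <- data.frame(y, x1, x2)
mod <- lm(y ~ ., data = dat)

# A bare NA is logical; NA_real_ is the numeric missing value,
# so the type check against the fitted model passes.
new_case <- data.frame(x1 = 0.1, x2 = NA_real_)
pred <- predict(mod, newdata = new_case)

# The call now runs without error, but the prediction is missing.
print(is.na(pred))
```

So the call can be made to run, but it does not produce a usable number: the model has no value of x2 to plug in.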
Rolf Turner
2013-Dec-18 20:04 UTC
[R] Predicting response from fitted linear model with incomplete new sample data
As far as I can discern, your question makes no sense at all.

Suppose you *know* that y = 2 + 3*x1 + 4*x2. Now what should you predict when x1 = 6 (with x2 "missing"/unknown)?

See fortune("magic").

On 19/12/13 07:18, Chris Wilkinson wrote:
> I would like to predict a new response from a fitted linear model where the
> new data is a single case with a missing value. My reading of the help on
> predict() is inconclusive on whether this is possible.
>
> Leaving out the missing value or setting it to NA both fail but differently,
> see example code below.
>
>> y <- runif(50)
>> x1 <- rnorm(50)
>> x2 <- rnorm(50)
>> dat <- data.frame(y, x1, x2)
>> mod <- lm(y~.,data=dat)
>> summary(mod)
> Call:
> lm(formula = y ~ ., data = dat)
> Residuals:
>      Min       1Q   Median       3Q      Max
> -0.50467 -0.28997  0.01457  0.27970  0.47791
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.50098    0.04577  10.945  1.6e-14 ***
> x1          -0.01762    0.04172  -0.422    0.675
> x2          -0.02753    0.04920  -0.560    0.578
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.3177 on 47 degrees of freedom
> Multiple R-squared:  0.009301,  Adjusted R-squared:  -0.03286
> F-statistic: 0.2206 on 2 and 47 DF,  p-value: 0.8028
>
>> predict(mod, newdata=data.frame(x1=0.1, x2=0.3)) #OK as expected
>         1
> 0.4909624
>
>> predict(mod, newdata=data.frame(x1=0.1)) # x2 missing
> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
>   variable lengths differ (found for 'x2')
> In addition: Warning message:
> 'newdata' had 1 row but variables found have 50 rows
>> predict(mod, newdata=data.frame(x1=0.1, x2=NA)) #x2=NA
> Error: variable 'x2' was fitted with type "numeric" but type "logical" was
> supplied
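To make the point of the reply concrete, here is a small sketch using its known relationship y = 2 + 3*x1 + 4*x2 with x1 = 6. The function name true_y and the three candidate values for x2 are hypothetical, chosen only for illustration.

```r
# Coefficients taken from the example in the reply: y = 2 + 3*x1 + 4*x2.
true_y <- function(x1, x2) 2 + 3 * x1 + 4 * x2

# Hypothetical candidate values for the unknown x2.
x2_candidates <- c(-1, 0, 1)

# With x1 fixed at 6 the response reduces to 20 + 4*x2,
# so each candidate x2 yields a different answer.
preds <- true_y(6, x2_candidates)
print(preds)
```

Even with the coefficients known exactly, every assumed x2 gives a different y, so no single prediction follows from x1 alone.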