With R it is always possible to shoot yourself squarely in the foot, as you seem
keen to do, but R does at least often make it difficult.
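For example, with the model from the question (and library(splines) loaded so that ns() exists), predict() does not quietly drop the term for the absent variable. It stops with an error instead. A minimal reproduction, assuming only base R and the recommended splines package; the seed is arbitrary:

```r
library(splines)  # provides ns()

set.seed(1)  # arbitrary seed, for reproducibility only
x <- matrix(rnorm(100 * 3), 100, 3)
y <- sample(1:2, 100, replace = TRUE)
mydata <- data.frame(y, x)                    # columns y, X1, X2, X3
mymodel <- lm(y ~ ns(X1, df = 3) + X2 + X3, data = mydata)

mynewdata <- data.frame(matrix(rnorm(100 * 2), 100, 2))  # columns X1, X2 only

## predict() rebuilds the model frame from newdata, so it needs X3 as well;
## capture the resulting error instead of letting it stop the script
res <- tryCatch(predict(mymodel, mynewdata), error = function(e) e)
inherits(res, "error")
conditionMessage(res)
```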
When you predict, you need to have values for ALL variables used in the model.
Just leaving out the coefficients corresponding to absent predictors is
equivalent to assuming that those coefficients are zero, and there is no basis
whatever for so assuming. (In this constructed example things are different
because the missing variable is a nonsense variable and the coefficient should
be roughly zero, as it is, but in general that is not going to be the case.)
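The equivalence can be checked directly. The sketch below (seed and sizes are arbitrary choices) computes "ignore the X3 coefficient" by hand from the design matrix, compares it with predict() at X3 = 0, and shows that plugging in the training mean instead shifts every prediction by the same amount, coef(X3) times that mean:

```r
library(splines)

set.seed(1)  # arbitrary seed, for reproducibility only
x <- matrix(rnorm(100 * 3), 100, 3)
y <- sample(1:2, 100, replace = TRUE)
mydata <- data.frame(y, x)
mymodel <- lm(y ~ ns(X1, df = 3) + X2 + X3, data = mydata)
b <- coef(mymodel)

newX <- data.frame(matrix(rnorm(10 * 2), 10, 2))  # only X1 and X2 observed

## "Ignore the X3 coefficient": build the design matrix (model.matrix needs
## some X3 value to run, but that column is dropped straight away), then
## multiply by every coefficient except the one for X3.
tt <- delete.response(terms(mymodel))  # keeps the stored ns() knots
mm <- model.matrix(tt, cbind(newX, X3 = 1))
keep <- names(b) != "X3"
pred_ignore <- drop(mm[, keep] %*% b[keep])

## The same predictions via predict(), by explicitly supplying X3 = 0
pred_zero <- predict(mymodel, cbind(newX, X3 = 0))

## Supplying the training mean instead moves each prediction by b["X3"] * mean
pred_mean <- predict(mymodel, cbind(newX, X3 = mean(mydata$X3)))
b["X3"]  # close to zero here only because X3 is pure noise
```

In a real data set the X3 coefficient is not near zero, and the gap between pred_zero and pred_mean is correspondingly large.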
So you need to supply some value for each of the missing predictors if you are
going to use the standard prediction tools. An obvious plug-in value is the mean of that
variable in the training data, though more sophisticated alternatives would
often be available.
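One such alternative, shown here purely as an illustration and not as the method the reply recommends, is to predict the missing X3 from the predictors you do have, using an auxiliary regression fitted on the training data. The linear form X3 ~ X1 + X2 is an assumption; choose whatever suits your data:

```r
library(splines)

set.seed(1)  # arbitrary seed, for reproducibility only
x <- matrix(rnorm(100 * 3), 100, 3)
y <- sample(1:2, 100, replace = TRUE)
mydata <- data.frame(y, x)
mymodel <- lm(y ~ ns(X1, df = 3) + X2 + X3, data = mydata)

mynewdata <- data.frame(matrix(rnorm(100 * 2), 100, 2))  # X1, X2 only

## Auxiliary model for the missing predictor, fitted on the training data only
## (the linear specification is an illustrative assumption)
impute_X3 <- lm(X3 ~ X1 + X2, data = mydata)
mynewdata$X3 <- predict(impute_X3, mynewdata)  # one imputed value per row

mypred <- predict(mymodel, mynewdata)
```

Plugging in a single imputed value, like plugging in the mean, ignores the uncertainty about X3, so any intervals from predict() will be too narrow.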
Here is a suggestion for your case.
## fit some linear model to random data
x <- matrix(rnorm(100*3),100,3)
y <- sample(1:2, 100, replace = TRUE)
mydata <- data.frame(y, x)
library(splines) ## missing from your code.
mymodel <- lm(y ~ ns(X1, df = 3) + X2 + X3, data = mydata)
summary(mymodel)
## create new data with 1 missing input
mynewdata <- within(data.frame(matrix(rnorm(100*2), 100, 2)),
                    X3 <- mean(mydata$X3))  ## add in an X3 (its training mean)
mypred <- predict(mymodel, mynewdata)
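If several predictors may be absent, the same mean plug-in can be automated. The helper below, fill_missing_with_means(), is hypothetical (not part of base R or any package); it assumes the missing predictors are numeric and takes the list of required variables from the model's terms:

```r
library(splines)

## Hypothetical helper: add every predictor the model needs but newdata lacks,
## filled with that variable's mean in the training data (numeric only).
fill_missing_with_means <- function(model, newdata, train) {
  needed  <- all.vars(delete.response(terms(model)))  # e.g. "X1", "X2", "X3"
  missing <- setdiff(needed, names(newdata))
  for (v in missing) {
    newdata[[v]] <- mean(train[[v]], na.rm = TRUE)    # recycled to nrow(newdata)
  }
  newdata
}

## Usage with the example from the question
set.seed(1)  # arbitrary seed, for reproducibility only
x <- matrix(rnorm(100 * 3), 100, 3)
y <- sample(1:2, 100, replace = TRUE)
mydata <- data.frame(y, x)
mymodel <- lm(y ~ ns(X1, df = 3) + X2 + X3, data = mydata)

mynewdata <- fill_missing_with_means(mymodel,
                                     data.frame(matrix(rnorm(100 * 2), 100, 2)),
                                     mydata)
mypred <- predict(mymodel, mynewdata)
```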
________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On
Behalf Of Axel Urbiz [axel.urbiz at gmail.com]
Sent: 12 February 2011 11:51
To: R-help at r-project.org
Subject: [R] Predictions with missing inputs
Dear users,
I'd appreciate your help with this (hopefully) simple problem.
I have a model object which was fitted to inputs X1, X2, X3. Now, I'd like
to use this object to make predictions on a new data set where only X1 and
X2 are available (i.e. use the estimated coefficients for these variables when
making predictions and ignore the coefficient on X3). Here's my attempt,
which, of course, didn't work.
#fit some linear model to random data
x=matrix(rnorm(100*3),100,3)
y=sample(1:2,100,replace=TRUE)
mydata <- data.frame(y,x)
mymodel <- lm(y ~ ns(X1, df=3) + X2 + X3, data=mydata)
summary(mymodel)
#create new data with 1 missing input
mynewdata <- data.frame(matrix(rnorm(100*2),100,2))
mypred <- predict(mymodel, mynewdata)
Thanks in advance for your help!
Axel.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.