frauke
2012-Oct-03 14:37 UTC
[R] predict.lm if regression vector is longer than predicton vector
Hi everybody,

recently a member of the community pointed me to the useful predict.lm() command. While I was toying with it, I stumbled across the following problem. I fit the regression with data from five years, but I want to predict with predict.lm for only one year. Thus my data frame for predict.lm(mod, newdata=dataframe) is shorter than the original vector that I fit the regression with. It gives the following warning:

Warning message:
'newdata' had 365 rows but variable(s) found have 1825 rows

Of course I can extend the new data frame with a few thousand NAs, but is there a more elegant solution?

Thank you!
Frauke
S Ellison
2012-Oct-03 15:10 UTC
[R] predict.lm if regression vector is longer than predicton vector
> Of course I can extend the new dataframe with a few thousands
> NAs, but is there a more elegant solution?

That should not be necessary: predict.lm should work on any number of newdata rows, whether longer or shorter than the original data set.

However, the help page for predict.lm says (among other things)

"If the fit is rank-deficient, some of the columns of the design matrix will have been dropped. Prediction from such a fit only makes sense if 'newdata' is contained in the same subspace as the original data. That cannot be checked accurately, so a warning is issued."

Could that be the situation you are in? If it is, it's not the new data that causes the problem, but the original fit.

S Ellison
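To check whether a fit is rank-deficient, something like the following sketch may help. The toy data and the names d, new and p are made up for illustration; x2 is deliberately an exact multiple of x1 so that lm() drops a column. Whether predict() warns here, and with what wording, depends on the R version, so the warning is suppressed. In any case the rank-deficiency warning has a different text from the "'newdata' had ... rows" message in the original question.

```r
# Hypothetical toy data: x2 is an exact multiple of x1, so the
# design matrix is rank-deficient and lm() drops the x2 column.
set.seed(1)
d <- data.frame(x1 = 1:10)
d$x2 <- 2 * d$x1
d$y <- 3 + 0.5 * d$x1 + rnorm(10, sd = 0.1)

fit <- lm(y ~ x1 + x2, data = d)

# A dropped column shows up as an NA coefficient ...
coef(fit)

# ... and as a rank smaller than the number of coefficients.
fit$rank < length(coef(fit))

# Prediction on a shorter newdata still returns one value per newdata row.
new <- data.frame(x1 = 1:3, x2 = 2 * (1:3))
p <- suppressWarnings(predict(fit, newdata = new))
length(p)
```

If the rank check is FALSE for your own model, rank deficiency is unlikely to be the cause and the explanation probably lies in how newdata or the formula was specified.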
William Dunlap
2012-Oct-03 15:47 UTC
[R] predict.lm if regression vector is longer than predicton vector
This can happen if your newdata data.frame does not include all the predictors required by the formula in the model. In that case predict will look in the current evaluation environment to find the missing predictors, and those will generally not match what is in your newdata. E.g.,

  > x1 <- 1:6
  > x2 <- 1/(1:6)
  > y <- log(1:6)
  > fit <- lm(y ~ x1 + x2)
  > predict(fit)
             1            2            3            4            5            6
  -0.008176128  0.725397589  1.089747865  1.361792281  1.596914353  1.813575253
  > predict(fit, newdata=data.frame(x2=1:5)) # didn't supply x1
  Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
    variable lengths differ (found for 'x2')
  In addition: Warning message:
  'newdata' had 5 rows but variable(s) found have 6 rows

Put all the required variables into newdata and things are fine

  > predict(fit, newdata=data.frame(x2=1:5, x1=sin(1:5)))
           1          2          3          4          5
  -0.0366699 -1.1321492 -2.3778906 -3.6469522 -4.7909516

You can also get this problem if newdata is an environment or list instead of a data.frame, because only data.frame forces all of its components to have the same length.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
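One way to catch a missing predictor before calling predict() is to compare the variables named in the model formula with the columns of newdata. The sketch below reuses the x1, x2, y example from this message; missing_predictors is a hypothetical helper, not part of R. It relies on all.vars(), which also reports any non-data objects used in the formula, so treat its result as a hint rather than a definitive check.

```r
x1 <- 1:6
x2 <- 1/(1:6)
y <- log(1:6)
fit <- lm(y ~ x1 + x2)

# Hypothetical helper: names of right-hand-side variables in the model
# formula that are not columns of newdata.
missing_predictors <- function(fit, newdata) {
  needed <- all.vars(delete.response(terms(fit)))
  setdiff(needed, names(newdata))
}

# x1 is absent, so predict() would fall back to the x1 of length 6
missing_predictors(fit, data.frame(x2 = 1:5))

# Nothing is missing, so predict() uses only the columns of newdata
missing_predictors(fit, data.frame(x2 = 1:5, x1 = sin(1:5)))
```

A non-empty result means predict() will go looking for those variables outside newdata, which is the situation that produces the length mismatch.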
Greg Snow
2012-Oct-03 17:54 UTC
[R] predict.lm if regression vector is longer than predicton vector
The most common case in which I see that error is when someone fits their model using syntax like:

  fit <- lm( mydata$y ~ mydata$x )

instead of the preferred method:

  fit <- lm( y ~ x, data=mydata )

The fix (if this is what you did and why you are getting the error) is to not use the first way and instead use the second, preferred way.

--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
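The difference between the two ways of fitting can be seen in a small sketch. The data below are made up for illustration, and fit_bad, fit_good, p_bad and p_good are hypothetical names. With the first syntax the model's predictor is literally the expression mydata$x, so predict() cannot find it in newdata and falls back to the original 20 values. The warning that this produces is suppressed here so that the lengths can be compared.

```r
# Made-up data for illustration
set.seed(42)
mydata <- data.frame(x = 1:20)
mydata$y <- 2 * mydata$x + rnorm(20)
newdata <- data.frame(x = 21:25)

# First way: the predictor is the expression mydata$x, which newdata
# does not contain, so predict() silently reuses the original data.
fit_bad <- lm(mydata$y ~ mydata$x)
p_bad <- suppressWarnings(predict(fit_bad, newdata = newdata))
length(p_bad)    # 20: one value per row of mydata, newdata$x was ignored

# Preferred way: the predictor is x, which predict() finds in newdata.
fit_good <- lm(y ~ x, data = mydata)
p_good <- predict(fit_good, newdata = newdata)
length(p_good)   # 5: one value per row of newdata
```

Without suppressWarnings(), the first predict() call reports the same kind of "'newdata' had ... rows" warning as in the original question.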