Dear all,

I am stumped at what should be a painfully easy task: predicting from an lm object. A toy example would be this:

XX <- matrix(runif(8),ncol=2)
yy <- runif(4)
model <- lm(yy~XX)
XX.pred <- data.frame(matrix(runif(6),ncol=2))
colnames(XX.pred) <- c("XX1","XX2")
predict(model,newdata=XX.pred)

I would have expected the last line to give me the predictions from the model based on the new data given in XX.pred... but all I get are in-sample fits along with a warning "'newdata' had 3 rows but variable(s) found have 4 rows". Why would predict.lm worry about the number of rows in the model matrix?

Unfortunately, ?predict.lm does not seem to be helpful, and neither RSiteSearch nor rseek.org has been useful. I'm sure that I am making an elementary error somewhere (am I misunderstanding the lm(yy~XX) part?) and would appreciate a gentle nudge in the right direction.

Thank you,
Stephan
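A quick way to confirm what is going on (a sketch; the set.seed() call, the warning capture and the names msg and p are illustrative additions, not part of the original post) is to record the warning and compare the returned values with fitted(model). If predict() is silently reusing the original 4-row XX, the two should coincide:

```r
set.seed(1)  # added only so the run is reproducible
XX <- matrix(runif(8), ncol = 2)
yy <- runif(4)
model <- lm(yy ~ XX)
XX.pred <- data.frame(matrix(runif(6), ncol = 2))
colnames(XX.pred) <- c("XX1", "XX2")

# Capture the warning text instead of letting it print
msg <- NULL
p <- withCallingHandlers(
  predict(model, newdata = XX.pred),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(msg)                                         # the row-count warning
print(length(p))                                   # 4 values, not 3
print(all.equal(unname(p), unname(fitted(model)))) # same as the in-sample fits
```

The exact wording of the warning differs slightly between R versions, so only its "'newdata' had 3 rows" prefix should be relied on.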
Try it this way instead:

XX <- matrix(runif(8), ncol = 2)
DF <- as.data.frame(XX)          # columns are named V1, V2
DF$yy <- runif(4)
model <- lm(yy ~ ., DF)          # "." expands to V1 + V2
XX.pred <- as.data.frame(matrix(runif(6), ncol = 2))  # also named V1, V2
predict(model, XX.pred)

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
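As a rough check that this version really uses the new rows, the predictions can be recomputed by hand: a column of ones for the intercept plus the new predictors, multiplied by coef(model). This is only a sketch; the seed and the names p and manual are illustrative additions.

```r
set.seed(42)  # added only for reproducibility
XX <- matrix(runif(8), ncol = 2)
DF <- as.data.frame(XX)          # columns V1, V2
DF$yy <- runif(4)
model <- lm(yy ~ ., DF)          # coefficients: (Intercept), V1, V2

XX.pred <- as.data.frame(matrix(runif(6), ncol = 2))  # columns V1, V2 again
p <- predict(model, XX.pred)

# Same linear predictor computed by hand from the new data
manual <- drop(cbind(1, as.matrix(XX.pred)) %*% coef(model))

print(length(p))                                  # 3 predictions, one per new row
print(all.equal(unname(p), unname(manual)))
```

Because as.data.frame() names the columns V1 and V2 in both the training and the new data, predict() finds every variable it needs in newdata and never falls back to the global environment.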
Hi Stephan,

You'll get the expected result if you pass a data.frame to the lm() function when fitting the model.

> yy <- runif(4)
> XX <- matrix(runif(8),ncol=2)
> df <- data.frame(y=yy, XX)
> df
           y        X1        X2
1 0.52889284 0.8055476 0.6670006
2 0.09989951 0.2498907 0.8867955
3 0.17523284 0.8959978 0.2316362
4 0.82489564 0.4446880 0.1369342
> model <- lm(y~., data=df)
> fitted(model)
        1         2         3         4
0.2442617 0.2634438 0.4734091 0.6478062
> XX.pred <- data.frame(matrix(runif(6), ncol=2))
> names(XX.pred) <- c("X1", "X2")
> predict(model, XX.pred)
        1         2         3
0.1365312 0.2394404 0.3789291

Best regards,
Charlie Roosen
Mango Solutions

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Stephan Kolassa
Sent: 17 August 2010 14:25
To: R-help at r-project.org
Subject: [R] predict.lm, matrix in formula and newdata

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
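The key point in this approach is that the column names of the new data frame must match the predictor names stored in the fitted model. A minimal sketch of checking that before calling predict() (the seed and the names needed and p are illustrative, not from the thread) reads the names from the model's terms, where the "." has already been expanded:

```r
set.seed(7)  # added only for reproducibility
yy <- runif(4)
XX <- matrix(runif(8), ncol = 2)
df <- data.frame(y = yy, XX)           # the matrix columns become X1, X2
model <- lm(y ~ ., data = df)

# Predictor names the fitted terms expect (response dropped)
needed <- all.vars(delete.response(terms(model)))
print(needed)                          # "X1" "X2"

XX.pred <- data.frame(matrix(runif(6), ncol = 2))  # data.frame() also names these X1, X2
print(setdiff(needed, names(XX.pred))) # character(0): nothing is missing
p <- predict(model, XX.pred)
print(length(p))                       # 3
```

If setdiff() returned a name here, predict() would try to find that variable outside newdata, which is exactly the situation in the original question.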
> Why would predict.lm worry about the number of rows in
> the model matrix?

Note that the formula in the model is yy~XX, so predict() is going to look for a variable called 'XX', not 'XX1' and 'XX2' (those are only the names lm() gives to the coefficients). XX.pred doesn't have a variable called XX, so predict() falls back to the XX in the formula's environment, here the global environment, which still has the original 4 rows. That is where the warning and the in-sample fits come from. Put an XX of the appropriate type (a two-column matrix, wrapped in I() so that data.frame() keeps it as a single variable) in your newdata argument and it will work. E.g.,

> XX.predMatrix <- data.frame(XX=I(matrix(runif(6),ncol=2)))
> predict(model,newdata=XX.predMatrix)
        1         2         3
0.8864115 0.3825788 0.4744272
> # no warnings

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
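To see Bill's explanation directly, one can inspect the model's term labels (a single term, "XX") and then cross-check the I() predictions against a hand computation. This is a sketch; the seed and the names newXX, p and manual are illustrative additions.

```r
set.seed(123)  # added only for reproducibility
XX <- matrix(runif(8), ncol = 2)
yy <- runif(4)
model <- lm(yy ~ XX)

# One term, "XX", even though it contributes two coefficients (XX1, XX2)
print(attr(terms(model), "term.labels"))   # "XX"

newXX <- matrix(runif(6), ncol = 2)
# I() keeps the matrix as one variable named XX instead of splitting it into columns
XX.predMatrix <- data.frame(XX = I(newXX))
p <- predict(model, newdata = XX.predMatrix)

# Cross-check: intercept column plus the new matrix, times the coefficients
manual <- drop(cbind(1, newXX) %*% coef(model))
print(all.equal(unname(p), unname(manual)))
```

The comparison uses the coefficient order (Intercept), XX1, XX2, which matches the column order of cbind(1, newXX).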