Hi,

I have a question about using lm() on a matrix. I admit it is trivial, but I couldn't find the answer after searching the mailing list and other online tutorials. It would be great if you could help.

I have a matrix "trainx" of 492 rows by 220 columns that is my x, and trainy is 492 by 1. I also have the new data testx, which is 240 rows by 220 columns. Here is what I got:

    py <- predict(lm(trainy ~ trainx), data.frame(testx))
    Warning message:
    'newdata' had 240 rows but variable(s) found have 492 rows

The fitting formula I intended is: trainy ~ trainx[,1] + trainx[,2] + ... + trainx[,220].

Any help, please?

Best,
Baoqiang
On Wed, Oct 10, 2012 at 3:35 PM, Baoqiang Cao <bqcaomail at gmail.com> wrote:
> py <- predict(lm(trainy ~ trainx ), data.frame(testx))
> Warning message:
> 'newdata' had 240 rows but variable(s) found have 492 rows
>
> The fitting formula I intended is: trainy ~ trainx[,1] + trainx[,2] +
> .. +trainx[,220].

I think you want a formula like

    trainy ~ .

meaning "trainy" explained by everything else in the data frame you pass as data=. (Admittedly, I think any model with 220 regressors is going to be absolutely terrible, but that's a different email.)

What I think is happening here is that predict() looks for a variable named "trainx" in the new data you provide, can't find it, and then falls back to the "trainx" matrix in your workspace, which has 492 rows rather than the 240 you need.

Take a look at ?formula for more on how to use formula notation properly.

Cheers,
Michael
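The name mismatch behind that warning can be reproduced on a small scale. The following is only a sketch with made-up data: the dimensions (20 training rows, 8 test rows, 3 columns) and the random values are assumptions for illustration, not the original 492 x 220 data.

```r
# Small stand-in data; sizes and values are illustrative assumptions
set.seed(1)
trainx <- matrix(rnorm(20 * 3), ncol = 3)
trainy <- rnorm(20)
testx  <- matrix(rnorm(8 * 3), ncol = 3)

fit <- lm(trainy ~ trainx)

# The model has a single predictor variable, the matrix 'trainx'
model_vars <- all.vars(formula(fit))   # "trainy" "trainx"
coef_names <- names(coef(fit))         # "(Intercept)" "trainx1" "trainx2" "trainx3"

# The new data has no variable called 'trainx'
testdf <- data.frame(testx)
new_names <- names(testdf)             # "X1" "X2" "X3"

# predict() therefore falls back to the 20-row 'trainx' in the workspace,
# warns about the row mismatch, and returns 20 values instead of 8
py_wrong <- suppressWarnings(predict(fit, testdf))
length(py_wrong)
```

Because no column of the new data is named trainx, the prediction silently reuses the training matrix, which is why the result has as many rows as the training set.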
Baoqiang,

Here's an approach that should work:

(1) Make sure that the column names of trainx and testx are the same.
(2) Combine trainy and trainx into a data frame for fitting the model.
(3) Convert testx from matrix to data frame.
(4) Use the newdata= argument in the predict() function.

    # some example data (names chosen to avoid masking nrow(), ncol(), colnames())
    n_train <- 5
    n_test <- 8
    p <- 3
    xnames <- paste("x", seq(p), sep="")
    trainx <- matrix(rnorm(n_train*p), ncol=p, dimnames=list(NULL, xnames))
    trainy <- matrix(rnorm(n_train), ncol=1, dimnames=list(NULL, "y"))
    testx <- matrix(rnorm(n_test*p), ncol=p, dimnames=list(NULL, xnames))

    # create data frames for model fitting and prediction
    traindf <- data.frame(cbind(trainy, trainx))
    testdf <- data.frame(testx)

    # fit the model and make predictions for new data
    fit <- lm(y ~ ., data=traindf)
    py <- predict(fit, newdata=testdf)

Note that the model you fit to the two matrices,

    lm(trainy ~ trainx)

worked just fine, but the names it assigns to the predictor variables (trainxx1, trainxx2, etc.) make it inconvenient to predict on new data.

Jean
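As a sanity check on the data-frame approach, the predictions can be compared with a direct matrix computation from the fitted coefficients. This is a minimal sketch on simulated data; the sizes, the seed, and the helper names xnames and manual are assumptions for illustration only.

```r
# Simulated stand-in data; sizes and values are illustrative assumptions
set.seed(42)
xnames <- paste0("x", 1:3)
trainx <- matrix(rnorm(20 * 3), ncol = 3, dimnames = list(NULL, xnames))
trainy <- rnorm(20)
testx  <- matrix(rnorm(8 * 3), ncol = 3, dimnames = list(NULL, xnames))

# Same column names in both data frames, response stored as 'y'
traindf <- data.frame(y = trainy, trainx)
testdf  <- data.frame(testx)

fit <- lm(y ~ ., data = traindf)
py  <- predict(fit, newdata = testdf)

# Manual prediction: intercept column plus test predictors, times coefficients
manual <- drop(cbind(1, testx) %*% coef(fit))

length(py) == nrow(testx)              # one prediction per test row
all.equal(unname(py), unname(manual))  # predict() agrees with the manual result
```

If the column names of testdf did not match those used in the fit, predict() would fail to find the variables, so checking the length of py against nrow(testx) is a cheap guard.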