eitan lavi
2009-Mar-17 00:21 UTC
[R] - help - predicting with glmnet/lars for dataframes with different nrow then the train set
Hello,

I'm having trouble using the lars and glmnet functions to predict on a new data set with a different nrow than the original. For instance:

============
log.1 = glm(temp.data$TL~(.),temp.data,family = binomial,x=TRUE,y=TRUE)
nrow(test.data) != nrow(temp.data)   # == TRUE
Val.frame = model.frame(log.1,test.data)   # returns a data frame with the variables needed to use log.1
Val.model = model.matrix(log.1,Val.frame)   # creates a design matrix
glmnet.best <- glmnet(model.matrix(log.1,temp.data),temp.data$TL,family="binomial",alpha=best.alpha)
pred.glmnet.best <- predict.glmnet(glmnet.best,Val.model,type="response")[,best.lambda]
==============

Here the glm model and the glmnet model were both built from temp.data. test.data has a different number of rows than temp.data, and I want to use one of the glmnet models (the one for which lambda = best.lambda) to predict on test.data. test.data has the same variables as temp.data; it differs only in its number of rows.

The code above runs, but it either truncates test.data to the nrow of temp.data if test.data is longer, or copies rows from temp.data into test.data if temp.data is longer (I'm not sure exactly which rows, but I think they come from the end of temp.data).

How can I easily predict the model's responses for a new data frame with a different number of rows? (I'm having the same problem with the lars function; the code is almost identical.)

Thanks for the help :))

eitan
hadley wickham
2009-Mar-17 02:56 UTC
[R] - help - predicting with glmnet/lars for dataframes with different nrow then the train set
On Mon, Mar 16, 2009 at 7:21 PM, eitan lavi <lavi.eitan at gmail.com> wrote:
> Hello
>
> I'm having trouble using lars and glmnet functions to predict on a new data
> set with different nrow then the original :
>
> for instance:
> ============
>    log.1 = glm(temp.data$TL~(.),temp.data,family = binomial,x=TRUE,y=TRUE)

I don't know if this is the problem or not (you didn't supply a reproducible example), but I'd expect your call to be:

log.1 <- glm(TL ~ ., data = temp.data, family = binomial, x = TRUE, y = TRUE)

i.e. when you supply a data frame you don't explicitly use it in the formula.

Hadley

--
http://had.co.nz/
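For what it's worth, here is a minimal sketch of how that suggestion might carry over to the glmnet step. It is not tested against the original data: the simulated temp.data and test.data, the predictor names x1 and x2, and the values alpha = 0.5 and s = 0.01 are made-up stand-ins for the poster's data, best.alpha and chosen lambda. The idea is to build both design matrices from the same formula, naming the data frame only through the data argument, so that each matrix takes its row count from the frame it is given.

```r
library(glmnet)

set.seed(1)

# Simulated stand-ins for the poster's data (assumption: two numeric predictors)
temp.data <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
temp.data$TL <- rbinom(100, 1, plogis(temp.data$x1))
test.data <- data.frame(x1 = rnorm(37), x2 = rnorm(37))  # different nrow, no TL needed

# Training design matrix; drop the intercept column because glmnet fits its own
x.train <- model.matrix(TL ~ ., data = temp.data)[, -1]
fit <- glmnet(x.train, temp.data$TL, family = "binomial", alpha = 0.5)

# Expand "." against the training frame, then drop the response so that
# the new data only has to supply the predictors
tt <- delete.response(terms(TL ~ ., data = temp.data))
x.test <- model.matrix(tt, data = test.data)[, -1]

# s is a lambda value (placeholder here), not a column index into the path
pred <- predict(fit, newx = x.test, s = 0.01, type = "response")
dim(pred)
```

With this layout pred has one row per row of test.data, whatever the size of temp.data. Calling the generic predict() rather than predict.glmnet() directly, and picking lambda through s rather than by column index, are choices made for this sketch, not something taken from the original post.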