eitan lavi
2009-Mar-17 00:21 UTC
[R] - help - predicting with glmnet/lars for dataframes with different nrow than the train set
Hello
I'm having trouble using the lars and glmnet functions to predict on a new data
set whose number of rows differs from the original.
For instance:
============
log.1 = glm(temp.data$TL~(.),temp.data,family = binomial,x=TRUE,y=TRUE)
nrow(test.data) != nrow(temp.data) # == TRUE
Val.frame = model.frame(log.1,test.data) # returns a data frame with the variables needed to use log.1
Val.model = model.matrix(log.1,Val.frame) # creates a design matrix
glmnet.best <- glmnet(model.matrix(log.1,temp.data),temp.data$TL,family="binomial",alpha=best.alpha)
pred.glmnet.best <- predict.glmnet(glmnet.best,Val.model,type="response")[,best.lambda]
==============
Here the glm model and the glmnet model were both built from temp.data.
test.data has a different number of rows than temp.data, and I want to use
one of the glmnet models (the one for which lambda = best.lambda)
to predict on test.data.
test.data has the same variables as temp.data; it differs only in its number
of rows.
The current code (above) runs, but it either truncates the test.data rows to
nrow(temp.data) if test.data is longer,
or copies rows from temp.data (I'm not sure exactly which rows, the last ones
I think) into test.data if temp.data is longer.
How can I easily predict the model's responses for a new data frame with
a different number of rows?
(I'm having the same problem with the lars function; the code is almost
identical.)
Thanks for the help :))
eitan
hadley wickham
2009-Mar-17 02:56 UTC
[R] - help - predicting with glmnet/lars for dataframes with different nrow than the train set
On Mon, Mar 16, 2009 at 7:21 PM, eitan lavi <lavi.eitan at gmail.com> wrote:
> Hello
>
> I'm having trouble using lars and glmnet functions to predict on a new data
> set with different nrow than the original :
>
> for instance:
> ============
>    log.1 = glm(temp.data$TL~(.),temp.data,family = binomial,x=TRUE,y=TRUE)

I don't know if this is the problem or not (you didn't supply a
reproducible example), but I'd expect your call to be:

log.1 <- glm(TL ~ ., data = temp.data, family = binomial, x = TRUE, y = TRUE)

i.e. when you supply a data frame you don't explicitly use it in the formula.

Hadley

--
http://had.co.nz/
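
To illustrate the suggestion above, here is a minimal sketch of one way the new-data prediction could be set up. It is not taken from the thread: the simulated data, the helper make_data(), the alpha value of 0.5 and the way lambda.pick is chosen are all placeholders standing in for the poster's temp.data, test.data, best.alpha and best.lambda. The idea is that once the response is named as a plain column (TL rather than temp.data$TL), the fitted terms can be re-evaluated against test.data to get a design matrix with one row per test observation. The glmnet part only runs if that package happens to be installed.

```r
set.seed(1)

# Hypothetical stand-ins for temp.data and test.data (not from the thread):
# same columns, different numbers of rows.
make_data <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  data.frame(TL = rbinom(n, 1, plogis(0.5 * x1 - x2)), x1 = x1, x2 = x2)
}
temp.data <- make_data(100)
test.data <- make_data(30)

# Name the response by column, not as temp.data$TL, so the formula can be
# evaluated against any data frame that has the same columns.
log.1 <- glm(TL ~ ., data = temp.data, family = binomial)

# For the glm itself, predict() builds the design matrix from newdata.
pred.glm <- predict(log.1, newdata = test.data, type = "response")

# Design matrix for the new rows: reuse the fitted terms without the
# response and evaluate them on test.data (passed by name).
tt <- delete.response(terms(log.1))
Val.model <- model.matrix(tt, data = test.data)
stopifnot(nrow(Val.model) == nrow(test.data))

# glmnet part, guarded because the package may not be installed.
if (requireNamespace("glmnet", quietly = TRUE)) {
  # Drop the intercept column; glmnet fits its own intercept by default.
  x.train <- model.matrix(tt, data = temp.data)[, -1, drop = FALSE]
  x.test  <- Val.model[, -1, drop = FALSE]

  # alpha = 0.5 is a placeholder for best.alpha.
  fit <- glmnet::glmnet(x.train, temp.data$TL, family = "binomial", alpha = 0.5)

  # Placeholder for best.lambda: a value from the middle of the fitted path.
  lambda.pick <- fit$lambda[max(1, length(fit$lambda) %/% 2)]

  pred.glmnet <- predict(fit, newx = x.test, s = lambda.pick, type = "response")
  stopifnot(nrow(pred.glmnet) == nrow(test.data))
}
```

In this sketch the predictions have one entry per row of test.data, regardless of how many rows temp.data has. Whether the same change resolves the behaviour described in the original post cannot be confirmed without a reproducible example.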