Juliet Hannah
2012-Mar-21 18:35 UTC
[R] glmnet: obtain predictions using predict and also by extracting coefficients
All,

For my understanding, I wanted to see if I can get glmnet predictions using both the predict function and also by multiplying the coefficients by the variable matrix. This has not worked out. Could anyone suggest where I am going wrong? I understand that I may not have the mean/intercept correct, but the scaling is also off, which suggests a bigger mistake.

Thanks for your help.

Juliet Hannah

library(ElemStatLearn)
library(glmnet)

data(prostate)

# training data
data.train <- prostate[prostate$train,]
y <- data.train$lpsa

# isolate predictors
data.train <- as.matrix(data.train[,-c(9,10)])

# test data
data.test <- prostate[!prostate$train,]
data.test <- as.matrix(data.test[,-c(9,10)])

# scale test data by using means and sd from training data
trainMeans <- apply(data.train,2,mean)
trainSDs <- apply(data.train,2,sd)

# create standardized test data
data.test.std <- sweep(data.test, 2, trainMeans)
data.test.std <- sweep(data.test.std, 2, trainSDs, "/")

# fit training model
myglmnet <- cv.glmnet(data.train,y)

# predictions by using predict function
yhat_enet <- predict(myglmnet,newx=data.test, s="lambda.min")

# attempting to get predictions by using coefficients
beta <- as.vector( t(coef(myglmnet,s="lambda.min")))
testX <- cbind(1,data.test.std)
yhat2 <- testX %*% beta

# does not match
plot(yhat2,yhat_enet)
Juliet Hannah
2012-Mar-21 18:50 UTC
[R] glmnet: obtain predictions using predict and also by extracting coefficients
Oops. The coefficients are returned on the scale of the original data, so they should be multiplied by the unstandardized test matrix.

testX <- cbind(1,data.test)
yhat2 <- testX %*% beta

# works
plot(yhat2,yhat_enet)
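The fix above can be checked numerically rather than by eye. The following is a minimal sketch, assuming only that the glmnet package is installed; it uses simulated data instead of the prostate data, and the names x, y, train, fit, yhat_predict and yhat_manual are made up for illustration. The predictors are deliberately generated far from mean 0 and sd 1, so a scaling mistake would show up clearly.

```r
# Sketch with simulated data (not the prostate data from the thread).
library(glmnet)

set.seed(1)
n <- 100
p <- 5
# Predictors with mean 10 and sd 3, so standardization would matter
x <- matrix(rnorm(n * p, mean = 10, sd = 3), n, p)
y <- as.vector(x %*% c(1, -2, 0, 0, 0.5) + rnorm(n))

train <- 1:70
# glmnet standardizes internally by default (standardize = TRUE)
fit <- cv.glmnet(x[train, ], y[train])

# Predictions from predict(), passing the raw test matrix
yhat_predict <- as.vector(predict(fit, newx = x[-train, ], s = "lambda.min"))

# Predictions by hand: intercept column plus raw predictors, times coefficients
beta <- drop(as.matrix(coef(fit, s = "lambda.min")))  # intercept first, then slopes
yhat_manual <- as.vector(cbind(1, x[-train, ]) %*% beta)

# Should agree up to floating-point error
all.equal(yhat_predict, yhat_manual)
```

If the test matrix were standardized first, as in the original attempt, the two vectors would no longer agree, because the coefficients from coef() have already been converted back to the original scale.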
Weidong Gu
2012-Mar-21 20:42 UTC
[R] glmnet: obtain predictions using predict and also by extracting coefficients
Hi Juliet,

First of all, cv.glmnet is used to estimate lambda by cross-validation. To get a glmnet prediction, you should use the glmnet function, which uses all the data in the training set.

Second, you constructed testX from a different data set (data.test.std) than the one you passed to predict (data.test). It's no surprise that the predictions are different.

Weidong Gu
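On the first point, it may help to look at what a cv.glmnet object contains. As far as I know, it stores a glmnet fit to the full training data in its glmnet.fit component, and calling predict on the cv.glmnet object uses that stored fit at the chosen lambda. The sketch below, on simulated data with made-up names (cvfit, p_cv, p_fit), compares the two routes; treat it as a check to run, not as a statement about every glmnet version.

```r
# Sketch with simulated data; requires the glmnet package.
library(glmnet)

set.seed(2)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- as.vector(x %*% c(1, -1, 0, 0, 2) + rnorm(100))

cvfit <- cv.glmnet(x, y)

# The stored fit on the full training data
class(cvfit$glmnet.fit)

# Route 1: predict directly from the cv.glmnet object
p_cv <- predict(cvfit, newx = x, s = "lambda.min")

# Route 2: predict from the stored glmnet fit at the cross-validated lambda
p_fit <- predict(cvfit$glmnet.fit, newx = x, s = cvfit$lambda.min)

# Should agree up to floating-point error
all.equal(as.vector(p_cv), as.vector(p_fit))
```

If the two agree, refitting with glmnet is not needed just to obtain predictions; the mismatch in the original post comes from the standardized test matrix alone.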