Hi all,

I'm trying to get the prediction probabilities for a survival elastic net. When I try to predict on the test set using the model fitted on the train set, the result has the number of rows of the train data (6400 rows) instead of the test data (2400 rows). I really don't understand why, and it keeps me from checking performance with the c-index.

The code:

data<-read.csv("old4.csv", header=TRUE)
library(imputeMissings)
data<-impute(data,object = NULL ,method = "median/mode")

trainstatus<-train$DIED1095
trainTime<-train$TIME
y<-Surv(trainTime,trainstatus)

trainX<-train[-c(12,63,64,65,66,67,68,69,70,71)]
x<-data.matrix(trainX)

library(glmnet)
fit <- glmnet(x,Surv(trainTime,trainstatus),family="cox",alpha=0.1,
              maxit=10000)
max.dev.index <- which.max(fit$dev.ratio)
optimal.lambda <- fit$lambda[max.dev.index]
optimal.beta <- fit$beta[,max.dev.index]
nonzero.coef <- abs(optimal.beta)>0
selectedBeta <- optimal.beta[nonzero.coef]
selectedTrainX <- x[,nonzero.coef]

coxph.model<- coxph(Surv(train$TIME,train$DIED365) ~x,data=train,
                    init=selectedBeta,iter=0)
coxph.predict<-predict(coxph.model,test)

nrow(test)
2872

nrow(train)
6701

length(coxph.predict)
6701

On 11/15/19 10:49 AM, Amir Hadanny wrote:
> Hi all,
> I'm trying to get the prediction probabilities for a survival elastic net.
> When I try to predict on the test set using the model fitted on the train
> set, the result has the number of rows of the train data (6400 rows)
> instead of the test data (2400 rows). I really don't understand why, and
> it keeps me from checking performance with the c-index.

If you call most `predict` functions with a second argument that fails to contain the predictors in the model, they return the predictions on the original data. The only place where the `test` object appears prior to the predict operation is in your call to `predict.coxph`, so my guess is that it fails to meet the requirements of the function for a valid newdata argument. (Another thought was that maybe `test` didn't exist, but that should have thrown an error with the predict call and the nrow call.)

But since you don't provide code that creates `test` or even an unambiguous way of examining its structure, that is entirely a guess.

And finally ... Rhelp is a plain text mailing list, so please read the message at the bottom of every transmission from the mailserver, i.e., read the Posting Guide. (It is not at all difficult to get gmail.com to send plain text.)

--
David.

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
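
The behavior David describes can be reproduced in a few lines. This is a minimal sketch, not the poster's model: `lm` stands in for `coxph`, and the data frames and the columns `a`, `b`, `y` are made up for illustration. When the formula refers to a matrix `x` that lives in the workspace rather than to columns of `newdata`, `predict` finds the training matrix again and returns one prediction per training row. That may be what happened in the `coxph(... ~ x, ...)` call above, but that remains a guess.

```r
set.seed(42)

# Hypothetical stand-in data: 10 training rows, 4 test rows
train <- data.frame(a = rnorm(10), b = rnorm(10))
train$y <- train$a - train$b + rnorm(10)
test <- data.frame(a = rnorm(4), b = rnorm(4))

# Pattern from the thread: the predictors enter the formula as a matrix `x`
# that exists in the workspace, not as columns of the data frame
x <- data.matrix(train[c("a", "b")])
fit_matrix <- lm(y ~ x, data = train)

# `test` has no column called `x`, so the workspace `x` (training rows) is
# used instead; lm warns about the row mismatch
pred_matrix <- predict(fit_matrix, newdata = test)
length(pred_matrix)  # 10, the training row count

# Naming the columns in the formula lets predict() look them up in `test`
fit_named <- lm(y ~ a + b, data = train)
pred_named <- predict(fit_named, newdata = test)
length(pred_named)   # 4, the test row count
```

If this is the cause, refitting with a formula whose terms are columns present in both `train` and `test` should give one prediction per test row.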

Thank you. Both train and test originate from the same data object. Here is the missing code:

data<-read.csv("old4.csv", header=TRUE)
library(imputeMissings)
data<-impute(data,object = NULL ,method = "median/mode")

for (i in col[13:68]) {
  data[i]<-lapply(data[i], factor)
}
for (i in col[1:12]) {
  data[i]<-lapply(data[i], numeric)
}
data$TIME<-as.numeric(data$TIME)
data<-data[-c(61,62,64,65,66,67,68)]
data$TIME<-ceiling(data$TIME/12)
data$TIME[which(data$TIME==37)]<-36

data1 = sort(sample(nrow(data), nrow(data)*.7))
train<-data[data1,]
test<-data[-data1,]

So test should have exactly the same structure as train, and I still can't find the issue.

Thank you,
Amir

On Sat, Nov 16, 2019 at 12:00 AM David Winsemius <dwinsemius at comcast.net> wrote:
> If you call most `predict` functions with a second argument that fails
> to contain the predictors in the model, it returns the predictions on
> the original data. The only place where the `test` object appears prior
> to the predict operation is in your call to `predict.coxph`, so my guess
> is that it fails to meet the requirements of the function for a valid
> newdata argument.
>
> But since you don't provide code that creates `test` or even an
> unambiguous way of examining its structure, that is entirely a guess.
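
For the stated goal (a c-index on the test set), one option that sidesteps `predict.coxph` is to multiply the test predictor matrix by the coefficient vector. The following is a sketch under assumptions: the `lung` data shipped with the survival package stands in for the poster's file, and an ordinary `coxph` fit is used only to supply a named coefficient vector. In the thread that role would be played by `selectedBeta` from glmnet.

```r
library(survival)

set.seed(1)

# Stand-in data: four complete columns of survival::lung
d <- lung[, c("time", "status", "age", "sex")]

# Same 70/30 split pattern as in the thread
idx <- sort(sample(nrow(d), floor(nrow(d) * 0.7)))
train <- d[idx, ]
test <- d[-idx, ]

# Stand-in for the elastic-net coefficients: a named coefficient vector
fit <- coxph(Surv(time, status) ~ age + sex, data = train)
beta <- coef(fit)

# Linear predictor on the test rows: the test matrix, restricted to the
# columns named in `beta`, times the coefficient vector
test_x <- data.matrix(test[, names(beta), drop = FALSE])
test$lp <- drop(test_x %*% beta)

# A higher risk score should go with shorter survival, hence reverse = TRUE
cidx <- concordance(Surv(time, status) ~ lp, data = test, reverse = TRUE)
cidx$concordance
```

With glmnet itself, `predict(fit, newx = <test matrix>, s = <lambda>)` is the usual route to the same linear predictor. The test matrix would need the same columns, in the same order, as the training matrix `x`.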