On 04/21/2015 05:00 AM, r-help-request at r-project.org wrote:
> Dear All,
>
> I am having some difficulty predicting the 'expected time of survival' for
> each observation from a glmnet fit with the cox family and the LASSO.
>
> I have two data sets, 50000 * 450 (obs * var) and 8000 * 450 (obs * var). I
> treated the first one as the training set and the second one as the test set.
>
> I got the predict output and I am a bit lost here:
>
> pre <- predict(fit, type = "response", newx = selectedVar[1:20, ])
>
>          s0
> 1 0.9454985
> 2 0.6684135
> 3 0.5941740
> 4 0.5241938
> 5 0.5376783
>
> This is the output I am getting. I understand that type "response" gives
> the fitted relative risk for the "cox" family.
>
> I would like to know how I can convert the fitted relative risk
> to an 'expected time of survival'.
>
> Any help would be great, thanks for all your time and effort.
>
> Sincerely,

The answer is that you cannot predict survival time, in general. The reason is that most studies do not follow the subjects for a sufficiently long time.

For instance, say that the data set comes from a study that enrolled subjects and then followed them for up to 5 years, at which time 35% had died (by the usual Kaplan-Meier estimate). Fit a model to the data and ask "what is the predicted survival time for a low-risk subject?" The answer will at best be "greater than 5 years". The program cannot say whether it is 6 or 10 or even 1000 years. A bigger data set does not help.

Terry Therneau
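The follow-up argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated (exponential event times, a hypothetical 5-year administrative cutoff, and a rate chosen so that roughly 35% die within 5 years), and none of it comes from the poster's data. It uses only the survival package.

```r
library(survival)

set.seed(1)
n <- 2000

# Assumption: exponential event times, with a rate chosen so that about 35%
# of subjects die within 5 years
rate      <- -log(0.65) / 5
true_time <- rexp(n, rate = rate)

# Hypothetical study design: everyone is followed for at most 5 years
follow_up <- 5
time      <- pmin(true_time, follow_up)
status    <- as.integer(true_time <= follow_up)  # 1 = death observed, 0 = censored

km <- survfit(Surv(time, status) ~ 1)

# Kaplan-Meier survival at the end of follow-up (close to 0.65 by construction)
s5 <- min(km$surv)

# Median survival time: the curve never falls to 0.5, so this is NA
med <- quantile(km, probs = 0.5, conf.int = FALSE)

print(s5)
print(med)
```

Increasing `n` narrows the uncertainty around the curve up to 5 years but does not extend the curve beyond 5 years, so the median stays unestimable.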
Dear Terry,

Thank you for your reply. I understand that it is difficult to predict survival time, in general.

I have tried another approach and I would like to know whether it is correct. I clustered my data set based on some similarity and reduced the number of variables using the LASSO and some expert opinion. I then fitted an accelerated failure time model with a Weibull distribution (survreg from the survival package) and predicted the survival time.

The accuracy is somewhat low because of the uncertainty and complexity in the survival time of individual observations. I checked the 5% and 95% quantiles, and almost 95% of the observations fall within the confidence interval, even though the interval is rather wide.

   Actual Predicted     Lower     Upper
1      91  83.01901 10.497993 178.65750
2      90  62.66257  7.923863 134.85030
3     115  57.59236  7.282720 123.93918
4      20  50.72860  6.414777 109.16830
5      81  83.42176 10.548922 179.52423
6     113  57.10106  7.220593 122.88188
7       8  58.29399  7.371442 125.44907
8      88  53.19866  6.727124 114.48390
9      17  34.80713  4.401461  74.90518
10      5  45.90169  5.804401  98.78076
11     20  58.99832  7.460507 126.96480
12     34  64.05572  8.100031 137.84837
13     27  39.25003  4.963279  84.46635
14     56  41.03611  5.189134  88.31000
15     60  69.70944  8.814959 150.01520

Is my approach correct? Can I say this model is good? Will I be able to do some more testing so that I can get a survival probability curve?

Sincerely,

--
View this message in context: http://r.789695.n4.nabble.com/Predict-in-glmnet-for-cox-family-tp4706070p4706248.html
Sent from the R help mailing list archive at Nabble.com.
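For readers who want to try the workflow described in this message, here is a minimal sketch. It is not the poster's code: it assumes the `lung` data set bundled with the survival package as a stand-in, and the covariates are an arbitrary choice. It fits a Weibull AFT model with `survreg`, extracts per-subject 5%, 50% and 95% quantiles, checks coverage among observed events, and computes a fitted survival curve for one subject with `psurvreg`.

```r
library(survival)

# Stand-in data (assumption): 'lung' from the survival package, complete cases only
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

# Weibull accelerated failure time model
fit <- survreg(Surv(time, status) ~ age + sex + ph.ecog,
               data = d, dist = "weibull")

# 5%, 50% and 95% quantiles of each subject's fitted survival-time distribution
q <- predict(fit, newdata = d, type = "quantile", p = c(0.05, 0.5, 0.95))
colnames(q) <- c("lower", "median", "upper")

# Coverage can only be checked where the event time was actually observed
# (status == 2 means death in 'lung'); censored subjects are left out
dead     <- d$status == 2
coverage <- mean(d$time[dead] >= q[dead, "lower"] &
                 d$time[dead] <= q[dead, "upper"])

# Fitted survival curve for the first subject: S(t) = 1 - F(t)
lp     <- predict(fit, newdata = d[1, ], type = "lp")
t_grid <- seq(1, 1000, by = 1)
surv1  <- 1 - psurvreg(t_grid, mean = lp, scale = fit$scale,
                       distribution = "weibull")

print(head(q))
print(coverage)
```

Two cautions apply to this sketch. The 5% and 95% quantiles of the fitted distribution nominally span a 90% band for an individual survival time, which is not the same thing as a confidence interval for a parameter. Also, coverage computed only on subjects with observed events ignores the censored ones, so it is at best a rough check.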
Will I be able to make a prediction similar to the one above with a random forest, and then compare the predicted survival times from the AFT model with those from the survival random forest model?

Sincerely,

--
View this message in context: http://r.789695.n4.nabble.com/Predict-in-glmnet-for-cox-family-tp4706070p4706320.html