Dear All,

I would like to build a model, based on survival analysis of some data, that
is able to predict the expected time until death for a new data instance.

Data
For each individual in the population I have, for each unit of time, the
status information and several continuous covariates for that particular
time. The data are right-censored, since at the end of the time interval
analyzed an individual could still be alive and die later.

Model
I created the model in R with the survreg function:

    lfit <- survreg(Surv(time, status) ~ X)

where:
- time is the time vector
- status is the status vector (0 = alive, 1 = dead)
- X is a bind of multiple vectors of covariates

Predict time to death
Given a new individual with some covariate values, I would like to predict
the estimated time to death, in other words the number of time units the
individual will remain alive.

I think I can use this:

    ptime <- predict(lfit, newdata=data.frame(X=NEWDATA), type='response')

Is that correct? Am I going to get the expected time to death that I would
like to have?

In theory, I could also provide the time information (the time at which the
individual has those covariate values). Should I simply add that to the
newdata?

    ptime <- predict(lfit, newdata=data.frame(time=TIME, X=NEWDATA),
                     type='response')

Is that correct? Is this going to improve the prediction? (For my data, the
time already passed should be an important variable.)

Any other suggestions or comments?

Thank you!

--
View this message in context: http://r.789695.n4.nabble.com/Survival-Analysis-and-Predict-time-to-death-tp4711198.html
Sent from the R help mailing list archive at Nabble.com.
On Aug 17, 2015, at 12:10 PM, survivalUser wrote:

> Dear All,
>
> I would like to build a model, based on survival analysis on some data, that
> is able to predict the expected time until death for a new data
> instance.

Are you sure you want to use life expectancy as the outcome? In order to establish a mathematical expectation you need to know the risk at all times in the future, and, as pointed out in the print.survfit help page, that expectation is undefined unless the last observation is a death. Very few datasets support such an estimate. If, on the other hand, you have sufficient events in the future, then you may be able to more readily justify an estimate of median survival.

The print.survfit function does give choices of a "restricted mean survival" or time-to-median-survival as estimate options. See that function's help page.

> Data
> For each individual in the population I have, for each unit of time, the
> status information and several continuous covariates for that particular
> time. The data is right censored since at the end of the time interval
> analyzed, instances could be still alive and die later.
>
> Model
> I created the model using R and the survreg function:
>
>     lfit <- survreg(Surv(time, status) ~ X)
>
> where:
> - time is the time vector
> - status is the status vector (0 alive, 1 death)
> - X is a bind of multiple vectors of covariates
>
> Predict time to death
> Given a new individual with some covariates values, I would like to predict
> the estimated time to death. In other words, the number of time units for
> which the individual will be still alive till his death.
>
> I think I can use this:
>
>     ptime <- predict(lfit, newdata=data.frame(X=NEWDATA), type='response')

I don't see type="response" as a documented option in the `?predict.survreg` help page. Were you suggesting that code on the basis of some tutorial?

> Is that correct? Am I going to get the expected-time-to-death that I would
> like to have?

Most people would be using `survfit` to construct survival estimates.

> In theory, I could provide also the time information (the time when the
> individual has those covariates values), should I simply add that in the
> newdata:
>
>     ptime <- predict(lfit, newdata=data.frame(time=TIME, X=NEWDATA),
>                      type='response')
>
> Is that correct?

This sounds like you are considering time-varying predictors. Adding them as a 'newdata' argument is most definitely not the correct method. As such, I would ask whether you really wanted to use a parametric survival model in the first place. The coxph function has facilities for time-varying covariates.

> Is this going to improve the prediction?

It would most likely severely complicate prediction. Survival estimates may be more problematic in that case on theoretical grounds.

> (for my data, the
> time already passed should be an important variable).
>
> Any other suggestions or comments?
>
> Thank you!

R-help at r-project.org

The real Rhelp mailing list .... not the impostor Rhelp at Nabble

-- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
David Winsemius
Alameda, CA, USA
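
For readers unsure what "facilities for time-varying covariates" looks like in practice, here is a minimal sketch, not taken from the thread. It assumes the per-time-unit records are arranged in counting-process form, one row per individual per interval (tstart, tstop]. The data are simulated, and the names long, tstart, tstop, event, x and true_beta are hypothetical.

```r
library(survival)

set.seed(1)                 # simulated data, for illustration only
n_subjects <- 200
max_time   <- 10
true_beta  <- 0.7           # hypothetical log hazard ratio for covariate x

# One row per subject per unit of time: the interval (tstart, tstop],
# the covariate value observed during that interval, and an event flag.
rows <- list()
for (id in seq_len(n_subjects)) {
  for (tt in seq_len(max_time)) {
    x     <- rnorm(1)
    p_die <- 1 - exp(-0.05 * exp(true_beta * x))   # chance of dying in this interval
    event <- rbinom(1, 1, p_die)
    rows[[length(rows) + 1]] <- data.frame(id = id, tstart = tt - 1, tstop = tt,
                                           event = event, x = x)
    if (event == 1) break   # no further rows once the subject has died
  }
}
long <- do.call(rbind, rows)

# Cox model on (start, stop] intervals; each row carries the covariate
# value that applied during its own interval.
cfit <- coxph(Surv(tstart, tstop, event) ~ x, data = long)
print(coef(cfit))
```

Subjects still alive at max_time simply have no row with event = 1, which is how right censoring is represented in this layout. The time variable is carried by the Surv() response here rather than being passed as a predictor in newdata.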
Thank you David for your answer.

Some follow-up questions:

- So, do you think that trying to estimate the life expectancy would be
  risky and probably not justifiable? Is there some sort of 'confidence'
  that the model could give me for a prediction?

- type='response': I found it here:
  https://stat.ethz.ch/R-manual/R-devel/library/survival/html/predict.survreg.html
  I have not tried it yet, but I was planning to use it because the page
  says it predicts on the "original scale of the data".

- Yes, I think they are time-varying predictors. Would you suggest other
  models (coxph?)

Overall, do you think this analysis is feasible/correct? Is predicting how
long a new individual (with those covariates) will remain alive a reasonable
thing to do with a survival model?

Thank you again!
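
To make the type='response' question concrete, the sketch below (an illustration, not part of the original exchange) fits a Weibull survreg model to simulated data and compares three kinds of prediction. The data frame dat, the covariate x1 and the censoring time of 15 are assumptions made for the example. For the default Weibull fit, type = "response" is the linear predictor mapped back to the time scale, exp(lp). That is the Weibull scale parameter, which is neither the mean nor the median survival time.

```r
library(survival)

set.seed(42)                                   # simulated data, for illustration only
n  <- 300
x1 <- rnorm(n)
true_time <- rweibull(n, shape = 1.5, scale = exp(2 + 0.5 * x1))
cens_time <- 15                                # hypothetical end of the study
dat <- data.frame(
  time   = pmin(true_time, cens_time),
  status = as.integer(true_time <= cens_time), # 1 = death, 0 = censored
  x1     = x1
)

lfit <- survreg(Surv(time, status) ~ x1, data = dat)   # Weibull is the default

new  <- data.frame(x1 = 0.3)                   # covariates only, no time column
lp   <- as.numeric(predict(lfit, newdata = new, type = "lp"))
resp <- as.numeric(predict(lfit, newdata = new, type = "response"))
med  <- as.numeric(predict(lfit, newdata = new, type = "quantile", p = 0.5))

# resp equals exp(lp); the median is smaller by the factor log(2)^scale
print(c(lp = lp, response = resp, median = med))
```

A se.fit = TRUE argument to predict() returns standard errors alongside the predictions, which is one way to attach some measure of uncertainty to them.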
David:

I may have misunderstood you here, specifically:

"As such I would ask if you really wanted to use a parametric survival
model in the first place?"

The K-M curve is, of course, a **non-parametric** fit, and that is why
there can be no mean survival time unless the last point is a death. If
you use the sample data to estimate a **parametric** model, then, of
course, you can estimate mean survival time (at any covariate value) as
the mean of the predicted parameter estimates (e.g. through a link
function).

I would certainly agree that the OP seems pretty confused about all this.

And apologies if I have misunderstood.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll
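
The non-parametric side of this point can be sketched as follows; this is an illustration under stated assumptions, not something posted in the thread. Simulated survival times are censored at a hypothetical end of study (cens_time = 12), so the Kaplan-Meier curve never reaches zero. In that case only a restricted mean (up to a chosen horizon) and the median can be read off the curve.

```r
library(survival)

set.seed(7)                                   # simulated data, for illustration only
true_time <- rweibull(500, shape = 1.5, scale = 10)
cens_time <- 12                               # hypothetical end of the study
time      <- pmin(true_time, cens_time)
status    <- as.integer(true_time <= cens_time)   # 1 = death, 0 = censored

km <- survfit(Surv(time, status) ~ 1)

# The longest follow-up times are censored, so the curve stops above zero
# and the unrestricted mean survival time is not identified.
last_surv <- min(km$surv)
print(last_surv)

# Restricted mean up to cens_time, plus the median with its confidence limits
tab <- summary(km, rmean = cens_time)$table
print(tab)
km_median <- unname(tab["median"])
```

The rmean argument sets the upper limit of integration for the restricted mean, so the reported value should be read as "mean time alive during the first 12 time units", not as a life expectancy.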
On Aug 17, 2015, at 1:51 PM, David Winsemius wrote:

> On Aug 17, 2015, at 12:10 PM, survivalUser wrote:
>
>> Dear All,
>>
>> I would like to build a model, based on survival analysis on some data, that
>> is able to predict the expected time until death for a new data
>> instance.
>
> Are you sure you want to use life expectancy as the outcome? In order to establish a mathematical expectation you need to know the risk at all times in the future, and, as pointed out in the print.survfit help page, that expectation is undefined unless the last observation is a death. Very few datasets support such an estimate. If, on the other hand, you have sufficient events in the future, then you may be able to more readily justify an estimate of median survival.

Dear survivalUser;

I've been reminded that you later asked for a parametric model built with survreg. The above commentary applies to coxph models and objects and not to survreg objects. If you do have a parametric model, even with incomplete observation, then calculating life expectancy should be a simple matter of plugging the estimated parameters into the formula for the distribution's mean, since life expectancy is the statistical mean. So maybe you do want such a model. The default survreg distribution is "weibull", so just go to your mathematical statistics text and look up the formula for the mean of a Weibull distribution with the estimated parameters.

--
David.
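
The textbook step described above can be sketched as follows; this is a hedged illustration on simulated data, not code from the thread. A Weibull distribution with shape a and scale b has mean b * gamma(1 + 1/a). In survreg's parameterization the Weibull shape is 1/scale and the Weibull scale is exp(linear predictor). The names dat, x1, wb_shape and wb_scale are hypothetical.

```r
library(survival)

set.seed(123)                                  # simulated data, for illustration only
n  <- 400
x1 <- rnorm(n)
true_time <- rweibull(n, shape = 2, scale = exp(1.5 + 0.4 * x1))
cens_time <- 8                                 # hypothetical end of the study
dat <- data.frame(time   = pmin(true_time, cens_time),
                  status = as.integer(true_time <= cens_time),
                  x1     = x1)

lfit <- survreg(Surv(time, status) ~ x1, data = dat, dist = "weibull")

# Map survreg's parameters to the usual Weibull shape and scale
new      <- data.frame(x1 = c(-1, 0, 1))
wb_scale <- exp(as.numeric(predict(lfit, newdata = new, type = "lp")))
wb_shape <- 1 / lfit$scale

# Mean of a Weibull(shape, scale): scale * gamma(1 + 1/shape)
mean_time <- wb_scale * gamma(1 + 1 / wb_shape)

# Cross-check: the mean also equals the area under the fitted survival curve
mean_check <- sapply(wb_scale, function(b)
  integrate(function(t) exp(-(t / b)^wb_shape), 0, Inf)$value)

print(cbind(new, mean_time, mean_check))
```

The resulting mean relies on the fitted Weibull shape holding beyond the observed follow-up, so it is an extrapolation whose quality depends on how well the distributional assumption fits.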