Therneau, Terry M., Ph.D.
2015-Aug-18 13:19 UTC
[R] Survival analysis and predict time-to-death
I read this list a day late as a digest, so my answers are rarely the first. (Which is nice, as David W answers most of the survival questions for me!)

What you are asking is reasonable, and in fact is common practice in the realm of industrial reliability, e.g., Meeker and Escobar, Statistical Methods for Reliability Data. Extrapolation of the survival curve to obtain the mean and percentiles of the lifetime distribution for some device (e.g., a washing machine) is their bread and butter, used for instance to determine the right size for an inventory of spare parts. For most of us on this list who do medical statistics and live in the Kaplan-Meier/Cox model world, the ideas are uncommon. I was lucky enough to sit through one of Bill Meeker's short courses and retain some (minimal) memory of it.

1. You are correct that parametric models are essential. If the extrapolation is substantial (30% or more censored, say), then the choice of distribution can be critical. If failure is due to repeated insult, e.g., the multi-hit model, then the Weibull tends to be preferred; if it is from degradation, e.g., flexing of a diaphragm, then the log-normal. Beyond this you need more guidance than mine.

2. The survreg routine assumes that log(y) ~ covariates + error. For a log-normal distribution the error is Gaussian, and thus predict(fit, type='response') will be exp(predicted mean of log time), which is the predicted median time, not the predicted mean time. For the Weibull the error distribution is asymmetric, so things are muddier. Each is the MLE prediction for the subject, just not interpretable as a mean. To get the actual mean you need to look up the formulas for the Weibull and/or log-normal in a textbook and map from the survreg parameterization to whatever one the textbook uses. The two parameterizations are never the same.

3. Another option is predicted quantiles. ?predict.survreg shows how to get the entire survival curve. The mean can be obtained as the area under the survival curve.
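As a rough illustration of points 2 and 3, here is a minimal sketch, not a definitive recipe. The lung data set, the age covariate, and the "new subject" are assumptions chosen only so that the code runs; none of them come from the original question. The mapping used is the standard one: for the Weibull, shape = 1/scale and scale = exp(linear predictor); for the log-normal, meanlog = linear predictor and sdlog = scale.

```r
library(survival)

# Illustrative data only: lung ships with the survival package (status 2 = death).
wfit <- survreg(Surv(time, status == 2) ~ age, data = lung, dist = "weibull")
lfit <- survreg(Surv(time, status == 2) ~ age, data = lung, dist = "lognormal")

new <- data.frame(age = 60)  # hypothetical new subject

# Linear predictor on the log-time scale; type = "response" returns exp() of this.
lp_w <- as.numeric(predict(wfit, newdata = new, type = "lp"))
lp_l <- as.numeric(predict(lfit, newdata = new, type = "lp"))

# Point 2: closed-form means after mapping survreg's (lp, scale) to textbook form.
#   Weibull:    mean = exp(lp) * gamma(1 + scale)
#   log-normal: mean = exp(lp + scale^2 / 2)
mean_w <- exp(lp_w) * gamma(1 + wfit$scale)
mean_l <- exp(lp_l + lfit$scale^2 / 2)

# Point 3: the same means as the area under the predicted survival curve.
surv_w <- function(t) pweibull(t, shape = 1 / wfit$scale, scale = exp(lp_w),
                               lower.tail = FALSE)
surv_l <- function(t) plnorm(t, meanlog = lp_l, sdlog = lfit$scale,
                             lower.tail = FALSE)
area_w <- integrate(surv_w, 0, Inf)$value
area_l <- integrate(surv_l, 0, Inf)$value

print(c(weibull_formula = mean_w, weibull_area = area_w,
        lognormal_formula = mean_l, lognormal_area = area_l))
```

The two routes should agree up to numerical integration error. Both means are extrapolations past the observed follow-up, so they inherit the sensitivity to the choice of distribution noted in point 1.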
Relevant to your question, the expected time remaining for a subject still alive at time = 10, say, is integral(S(t), from 10 to infinity) / S(10), where S is the survival curve. You can also read off quantiles of the remaining life.

Terry Therneau
(author of the survival package)

On 08/18/2015 05:00 AM, r-help-request at r-project.org wrote:
> Dear All,
>
> I would like to build a model, based on survival analysis on some data, that
> is able to predict the expected time until death for a new data
> instance.
>
> Data
> For each individual in the population I have, for each unit of time, the
> status information and several continuous covariates for that particular
> time. The data is right censored since at the end of the time interval
> analyzed, instances could be still alive and die later.
>
> Model
> I created the model using R and the survreg function:
>
> lfit <- survreg(Surv(time, status) ~ X)
>
> where:
> - time is the time vector
> - status is the status vector (0 alive, 1 death)
> - X is a bind of multiple vectors of covariates
>
> Predict time to death
> Given a new individual with some covariates values, I would like to predict
> the estimated time to death. In other words, the number of time units for
> which the individual will be still alive till his death.
>
> I think I can use this:
>
> ptime <- predict(lfit, newdata=data.frame(X=NEWDATA), type='response')
>
> Is that correct? Am I going to get the expected-time-to-death that I would
> like to have?
>
> In theory, I could provide also the time information (the time when the
> individual has those covariates values), should I simply add that in the
> newdata:
>
> ptime <- predict(lfit, newdata=data.frame(time=TIME, X=NEWDATA),
> type='response')
>
> Is that correct? Is this going to improve the prediction? (for my data, the
> time already passed should be an important variable).
>
> Any other suggestions or comments?
>
> Thank you!
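The remaining-life formula in the reply above, integral(S(t), from t0 to infinity) / S(t0), can be sketched as follows. This is a hedged illustration under assumptions that are not from the thread: a Weibull fit to the lung data set, a hypothetical subject aged 60, and t0 = 365 days (the reply uses 10 as its example; 365 is used here only because lung times are in days).

```r
library(survival)

# Illustrative fit only: lung ships with the survival package (status 2 = death).
fit <- survreg(Surv(time, status == 2) ~ age, data = lung, dist = "weibull")
new <- data.frame(age = 60)  # hypothetical subject
lp  <- as.numeric(predict(fit, newdata = new, type = "lp"))

# Survival curve S(t) for this subject under the fitted Weibull model.
S <- function(t) pweibull(t, shape = 1 / fit$scale, scale = exp(lp),
                          lower.tail = FALSE)

t0 <- 365  # subject is known to be alive at this time

# Expected remaining life: area under S from t0 onward, divided by S(t0).
remaining_mean <- integrate(S, t0, Inf)$value / S(t0)

# Median remaining life: find t with S(t) = 0.5 * S(t0), then subtract t0.
t_med <- qweibull(0.5 * S(t0), shape = 1 / fit$scale, scale = exp(lp),
                  lower.tail = FALSE)
remaining_median <- t_med - t0

print(c(mean = remaining_mean, median = remaining_median))
```

Note that the elapsed time enters through the conditioning on S(t0), not as an extra column in newdata for predict().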