Therneau, Terry M., Ph.D.
2015-Aug-31 13:56 UTC
[R] using survreg() in survival package with "long" data
On 08/30/2015 05:00 AM, r-help-request at r-project.org wrote:
> I'm unable to fit a parametric survival regression using survreg() in the survival package with data in "counting-process" ("long") form.
>
> To illustrate using a scaled-down problem with 10 subjects (with data placed on the web):
>

As usual I'm a day late since I read digests, and Goran has already clarified things. A discussion of this is badly needed in my as yet unwritten book on using the survival package.

From a higher level view:

If an observation is interval censored (a,b) then one knows that the event happened between time "a" and time "b", but not when. The survreg routine can handle interval censored data since it is parametric (you need to integrate over the interval). The interval (-infinity, b) is called 'left censored' and the interval (a, infinity) is 'right censored'. Left censored data is rare in medical work. An example might be a chronic disease like rheumatoid arthritis, where we know that the true disease onset was some time before the date it was first detected, and one is trying to deduce the duration of disease.

Left truncation at time "a" means that any events before time "a" are not in the data set. In a referral center like mine this includes any subjects who die before they come to us. The coxph model handles left truncation naturally via its counting process formulation. That same formulation also allows it to deal with time dependent covariates.

Accelerated failure time models like survreg can handle left truncation in principle, but they require that the values of any covariates are known from time 0 -- even for a truncated subject. I have never added left truncation to the survreg code, mostly because I have never needed it myself, but also because users would immediately think that they could accomplish time dependent covariates by simply using a long format data set. Rather, each subject needs to be linked to a full covariate history, which is a bit more work.
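To make the distinction concrete, here is a minimal sketch of the two data layouts. The toy data frames and their column names (lower, upper, entry, exit, event, x) are made up for illustration and are not the data from the original question. The first fit uses the type = "interval2" form of Surv(), where NA in the lower bound means left censored, NA in the upper bound means right censored, and equal bounds mean an exact event time. The second uses the (start, stop, event) counting process form, where the start time acts as the left truncation (entry) time.

```r
library(survival)

# Hypothetical interval-censored data: the event lies in (lower, upper).
#   lower = NA    -> left censored  (-infinity, upper)
#   upper = NA    -> right censored (lower, infinity)
#   lower = upper -> exactly observed event time
d_int <- data.frame(
  lower = c(NA, 2, 5, 3, 4, 6, 1, 7, 2, 8),
  upper = c(3, 4, NA, 3, 9, NA, 2, 7, NA, 12),
  x     = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
)

# Parametric (AFT) fit: survreg accepts interval censored responses.
y_int   <- Surv(d_int$lower, d_int$upper, type = "interval2")
fit_aft <- survreg(y_int ~ x, data = d_int, dist = "weibull")

# Hypothetical left-truncated data: each subject is only under
# observation from `entry` onward, and leaves at `exit`.
d_trunc <- data.frame(
  entry = c(0, 1, 2, 0, 3, 1, 0, 2, 1, 0),
  exit  = c(5, 4, 9, 3, 8, 6, 2, 7, 10, 4),
  event = c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1),
  x     = c(0, 1, 0, 1, 1, 0, 0, 1, 1, 0)
)

# Cox fit: the counting process form handles the delayed entry.
fit_cox <- coxph(Surv(entry, exit, event) ~ x, data = d_trunc)

# Passing the same counting process response to survreg is expected
# to stop with an error rather than fit a left-truncated model.
res <- try(survreg(Surv(entry, exit, event) ~ x, data = d_trunc),
           silent = TRUE)
```

With only ten rows per data set the estimates themselves mean nothing; the point is only which response types each routine accepts.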

So:
  coxph does left truncation but not left (or interval) censoring;
  survreg does interval censoring but not left truncation (or time dependent covariates).

Terry T