Fabian Hefner
2008-Apr-28 07:57 UTC
[R] Survival Regression with multiple events per subject
Dear R users!

I want to fit, by maximum likelihood, a parametric survival regression model with multiple events per subject. The Stata commands for this survival regression are:

    use survreg
    stset time, failure(exercise) id(optionid)
    local regressors itm posret negret
    streg `regressors', distribution(weibull)

Explanation:

stset declares the data to be survival-time data; exercise is the indicator variable, which shows whether the subject is dead or alive; optionid is the multiple-record ID variable, which means every subject has a unique ID and one subject can have multiple events (see the example below).

streg computes a maximum likelihood estimate for parametric survival regression models with multiple-record data.

Now I am looking for an equivalent command in R. I found the survival package, but I have no solution for using it with multiple records per subject.

    library(survival)
    survRegData <- survreg(formula = Surv(time, exercise) ~ itm + posret + negret,
                           data = Data, dist = "weibull")
    summary(survRegData)

My question is: how can I modify the above command to handle multiple events per subject when optionid identifies the subject?

The dataset looks like this:

    data | exercisedate | itm    | posret  | negret  | optionid | exercise | time
    1996 | 1996         | 1.4518 | 0.05487 | -0.4485 | 1        | 0        | 1
    1997 | 1997         | 2.4535 | 0.00385 | -0.2525 | 1        | 1        | 5
    1998 | 1998         | 1.2523 | 0.04486 | -0.1482 | 2        | 1        | 8
    1999 | 1999         | 3.4257 | 0.15287 | -0.8615 | 3        | 0        | 4
    2000 | 2000         | 1.1457 | 0.07487 | -0.5485 | 3        | 1        | 5
    2001 | 2001         | 2.4418 | 0.09553 | -0.3772 | 3        | 0        | 2

Thank you,
Fabian Hefner
Terry Therneau
2008-Apr-28 13:22 UTC
[R] Survival Regression with multiple events per subject
> I want to process a maximum likelihood estimation for a parametric
> regression survival time model with multiple events per subject.

Data sets with multiple records per subject are used for several things; you need to tell me what it is that you want to accomplish. Multiple records is a method, not a goal.

1. Robust variance: If each observation is a separate measurement on the subject, with its own covariates, time 0, and endpoint, and you want a "GEE"-type variance that accounts for the fact that multiple observations come from the same subject:

    survreg(Surv(time, exercise) ~ itm + posret + negret + cluster(id), ...)

where id is a variable that is unique for each subject (optionid in your data).

2. Time-dependent covariates: Each subject has one endpoint, but covariates change over time. The bookkeeping for time-dependent covariates is reasonably straightforward for proportional hazards models, but a major pain for an accelerated failure time (AFT) model. I've thought about it but never implemented the feature in survreg, though this may change one day due to the increased interest in accelerated aging as a biological model among the researchers I work with (but don't hold your breath). For example, if you smoked in your youth but later quit, in an AFT model this 'adds years' to your biological age which you never lose; the computer code has to keep track of covariate histories. In a proportional hazards model today's risk = function(today's covariates), which is easier. A Weibull model can be written in either AFT or PH form; survreg uses the AFT form, and I don't know which Stata uses.

3. Multiple events per subject, with a single time scale per subject. This is seen in reliability analysis, where hazard = function of age. survreg does not handle this case either.

Terry Therneau
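
A minimal sketch of option 1 (robust variance), assuming each record is a separate measurement on the subject. The data are simulated stand-ins, not the poster's: the variable names follow the original post, but the sample size, the effect sizes, and the subject-level random effect are made up for illustration. The last lines show the standard Weibull conversion from survreg's AFT coefficients to the PH scale, beta_PH = -beta_AFT / scale.

```r
library(survival)

# Simulated stand-in data (an assumption, not the poster's data):
# 50 subjects, 3 records each, with a shared per-subject effect so that
# records from the same subject are correlated.
set.seed(42)
n_subj <- 50
n_rec  <- 3
Data <- data.frame(
  optionid = rep(seq_len(n_subj), each = n_rec),
  itm      = rnorm(n_subj * n_rec, mean = 2),
  posret   = runif(n_subj * n_rec, 0, 0.2),
  negret   = -runif(n_subj * n_rec, 0, 1)
)
subj_effect   <- rnorm(n_subj, sd = 0.5)
Data$time     <- rweibull(nrow(Data), shape = 1.5,
                          scale = exp(1 + 0.3 * Data$itm +
                                      subj_effect[Data$optionid]))
Data$exercise <- rbinom(nrow(Data), 1, 0.7)  # 1 = event, 0 = censored

# Ordinary fit: treats all records as independent.
fit_naive <- survreg(Surv(time, exercise) ~ itm + posret + negret,
                     data = Data, dist = "weibull")

# Same model with cluster(): point estimates are unchanged, but the
# variance is replaced by a robust (sandwich) estimate grouped by subject.
fit_robust <- survreg(Surv(time, exercise) ~ itm + posret + negret +
                        cluster(optionid),
                      data = Data, dist = "weibull")

# Compare naive and robust standard errors side by side.
se_table <- cbind(naive  = sqrt(diag(fit_robust$naive.var)),
                  robust = sqrt(diag(fit_robust$var)))
print(se_table)

# Weibull only: convert AFT coefficients to proportional hazards form.
ph_coef <- -coef(fit_robust)[-1] / fit_robust$scale
print(ph_coef)
```

The cluster() term changes only the standard errors, so it addresses correlation between records of the same subject but not cases 2 and 3 above.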