thr3ads.net - R help - [R] Speeding up prediction of survival estimates when using `survifit' [Aug 2010]

If this information is useful, please help other people find it:
Share via:

Ravi Varadhan

2010-Aug-31 03:40 UTC

[R] Speeding up prediction of survival estimates when using `survifit'

Hi,

I fit a Cox PH model to estimate the cause-specific hazards (in a competing
risks setting).  Then , I compute the survival estimates for all the individuals
in my data set using the `survfit' function.  I am currently playing with a
data set that has about 6000 observations and 12 covariates.  I am finding that
the survfit function is very slow.

Here is a simple simulation example (modified from Frank Harrell's example
for `cph') that illustrates the problem:

#n <- 500
set.seed(4321) 

age <- 50 + 12*rnorm(n) 

sex <- factor(sample(c('Male','Female'), n, rep=TRUE,
prob=c(.6, .4)))

cens <- 5 * runif(n) 

h <- 0.02 * exp(0.04 * (age-50) + 0.8 * (sex=='Female')) 

dt <- -log(runif(n))/h 

e <- ifelse(dt <= cens, 1, 0) 

dt <- pmin(dt, cens) 

Srv <- Surv(dt, e)

 f <- coxph(Srv ~ age + sex, x=TRUE, y=TRUE) 

system.time(ans <- survfit(f, type="aalen", se.fit=FALSE,
newdata=f$x))


When I run the above code with sample sizes, n, taking on values of 500, 1000,
2000, and 4000, the time it takes for survfit to run are as follows:

# n <- 500> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE,
newdata=f$x))   user  system elapsed 
   0.16    0.00    0.15 


# n <- 1000> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE,
newdata=f$x))   user  system elapsed 
   1.45    0.00    1.48 


# n <- 2000> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE,
newdata=f$x))   user  system elapsed 
  10.19    0.00   10.25 


# n <- 4000> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE,
newdata=f$x))   user  system elapsed 
  72.87    0.05   74.87 


I eventually want to use `survfit' on a data set with roughly 50K
observations, which I am afraid is going to be painfully slow.  I would much
appreciate hints/suggestions on how to make `survfit' faster or any other
faster alternatives.

Thanks.

Best,
Ravi.
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu

Frank Harrell

2010-Aug-31 12:51 UTC

head link

[R] Speeding up prediction of survival estimates when using `survifit'

Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University

On Mon, 30 Aug 2010, Ravi Varadhan wrote:
> Hi,
>
> I fit a Cox PH model to estimate the cause-specific hazards (in a competing
risks setting).  Then , I compute the survival estimates for all the individuals
in my data set using the `survfit' function.  I am currently playing with a
data set that has about 6000 observations and 12 covariates.  I am finding that
the survfit function is very slow.
>
> Here is a simple simulation example (modified from Frank Harrell's
example for `cph') that illustrates the problem:
>
> #n <- 500
> set.seed(4321)
>
> age <- 50 + 12*rnorm(n)
>
> sex <- factor(sample(c('Male','Female'), n, rep=TRUE,
prob=c(.6, .4)))
>
> cens <- 5 * runif(n)
>
> h <- 0.02 * exp(0.04 * (age-50) + 0.8 * (sex=='Female'))
>
> dt <- -log(runif(n))/h
>
> e <- ifelse(dt <= cens, 1, 0)
>
> dt <- pmin(dt, cens)
>
> Srv <- Surv(dt, e)
>
> f <- coxph(Srv ~ age + sex, x=TRUE, y=TRUE)
>
> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE,
newdata=f$x))
>
>
> When I run the above code with sample sizes, n, taking on values of 500,
1000, 2000, and 4000, the time it takes for survfit to run are as follows:
>
> # n <- 500
>> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE,
newdata=f$x))
>   user  system elapsed
>   0.16    0.00    0.15
>
>
> # n <- 1000
>> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE,
newdata=f$x))
>   user  system elapsed
>   1.45    0.00    1.48
>
>
> # n <- 2000
>> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE,
newdata=f$x))
>   user  system elapsed
>  10.19    0.00   10.25
>
>
> # n <- 4000
>> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE,
newdata=f$x))
>   user  system elapsed
>  72.87    0.05   74.87
>
>
> I eventually want to use `survfit' on a data set with roughly 50K
observations, which I am afraid is going to be painfully slow.  I would much
appreciate hints/suggestions on how to make `survfit' faster or any other
faster alternatives.
Ravi,

If you don't need standard errors/confidence limits, the rms package's 
survest and related functions can speed things up greatly if you fit 
the model using cph(...., surv=TRUE).  [cph calls coxph, and calls 
survfit once to estimate the underlying survival curve].

Frank
>
> Thanks.
>
> Best,
> Ravi.
> ____________________________________________________________________
>
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology
> School of Medicine
> Johns Hopkins University
>
> Ph. (410) 502-2619
> email: rvaradhan at jhmi.edu

Maybe Matching Threads

Search for more possibly parallel threads

R help - Aug 2010 - Speeding up prediction of survival estimates when using `survifit'

[R] Speeding up prediction of survival estimates when using `survifit'

[R] Speeding up prediction of survival estimates when using `survifit'

Maybe Matching Threads