thr3ads.net - R help - [R] Expected number of events, Andersen-Gill model fit via coxph in package survival [Oct 2012]

If this information is useful, please help other people find it:
Share via:

Omar De la Cruz C.

2012-Oct-06 03:48 UTC

[R] Expected number of events, Andersen-Gill model fit via coxph in package survival

Hello,

I am interested in producing the expected number of events, in a
recurring events setting. I am using the Andersen-Gill model, as fit
by the function "coxph" in the package "survival."

I need to produce expected numbers of events for a cohort,
cumulatively, at several fixed times. My ultimate goal is: To fit an
AG model to a reference sample, then use that fitted model to generate
expected numbers of events for a new cohort; then, comparing the
expected vs. the observed numbers of events would give us some idea of
whether the new cohort differs from the reference one.
>From my reading of the documentation and the text by Therneau andGrambsch, it seems that the function "survexp" is what I need. But
using it I am not able to obtain expected numbers of events that match
reasonably well the observed numbers *even for the same reference
population.* So, I think I am misunderstanding something quite badly.

Below is an example that illustrates the situation. At the end I
include the sessionInfo().

Thank you!

Omar.


############################

# Example of unexpected behavior in computing estimated number of events
# in using package "survival" for fitting the Andersen-Gill model

require(survival)

head(bladder2)  # this is the data, in interval format

# Fit Andersen-Gill model
cphfit = coxph(Surv(start,stop,event)~rx+number+size+cluster(id),data=bladder2)

# Choose some arbitrary time horizons
t.horiz = seq(min(bladder2$start),max(bladder2$stop),length=6)

# Compute the cohort expected survival
s = survexp(~1,data=bladder2,ratetable=cphfit,times=t.horiz)

# This are the expected survival values:
s$surv

# We are interested in the rate of events
e.r = as.vector( 1 - s$surv )

# How does this compare to the actual number of events, cumulative at
# each time horizon?

observed = numeric(length(t.horiz))

for (i in 1:length(t.horiz)){

        observed[i] = sum(bladder2$event[bladder2$stop <= t.horiz[i]])

}

print(observed)

# We would like to compute expected numbers of events that approximately
# match these observed values.

# We should multiply the expected survival rates by the number of individuals.

# Now, one would think that this is the number of at-risk individuals:
s$n.risk

# But that is actually the total number of rows in the data. In any case,
# these numbers do not match:

rbind(expected = s$n.risk*e.r,observed=observed)

# What if we multiply by the number of individuals?

rbind(expected = length(unique(bladder2$id))*e.r,observed=observed)

# This does not work either! The required factor seems to be about 133, but
# I don't see an explanation for that.

# In this example, multiplying by 133.182 gives a good match between observed
# and expected values, but in other examples even the shape of the curves
# are different.

# Multiplying by a number of individuals at risk at each time point
# (number of individuals
# for which there is a time interval containing the time horizon) does
# not work either.

#####################
> sessionInfo()R version 2.15.1 (2012-06-22)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets
methods   base

other attached packages:
[1] survival_2.36-14

loaded via a namespace (and not attached):
[1] tools_2.15.1

David Winsemius

2012-Oct-06 15:56 UTC

head link

[R] Expected number of events, Andersen-Gill model fit via coxph in package survival

On Oct 5, 2012, at 8:48 PM, Omar De la Cruz C. wrote:
> Hello,
> 
> I am interested in producing the expected number of events, in a
> recurring events setting. I am using the Andersen-Gill model, as fit
> by the function "coxph" in the package "survival."
> 
> I need to produce expected numbers of events for a cohort,
> cumulatively, at several fixed times. My ultimate goal is: To fit an
> AG model to a reference sample, then use that fitted model to generate
> expected numbers of events for a new cohort; then, comparing the
> expected vs. the observed numbers of events would give us some idea of
> whether the new cohort differs from the reference one.
> 
>> From my reading of the documentation and the text by Therneau and
> Grambsch, it seems that the function "survexp" is what I need.
But
> using it I am not able to obtain expected numbers of events that match
> reasonably well the observed numbers *even for the same reference
> population.* So, I think I am misunderstanding something quite badly.
> 
> Below is an example that illustrates the situation. At the end I
> include the sessionInfo().
> 
> Thank you!
> 
> Omar.
> 
> 
> ############################
> 
> # Example of unexpected behavior in computing estimated number of events
> # in using package "survival" for fitting the Andersen-Gill model
> 
> require(survival)
> 
> head(bladder2)  # this is the data, in interval format
> 
> # Fit Andersen-Gill model
> cphfit =
coxph(Surv(start,stop,event)~rx+number+size+cluster(id),data=bladder2)
> 
> # Choose some arbitrary time horizons
> t.horiz = seq(min(bladder2$start),max(bladder2$stop),length=6)
> 
> # Compute the cohort expected survival
> s = survexp(~1,data=bladder2,ratetable=cphfit,times=t.horiz)
> # This are the expected survival values:
> s$surv
> 
> # We are interested in the rate of events
> e.r = as.vector( 1 - s$surv )
> 
Rates are events/n-exposed/time, so those are not rates as I understand the
term.  And I do not see any accounting for the length of intervals at risk in
the rest of your code. That vector does not even calculate interval event
expectations as I read it.

-- 
David

> # How does this compare to the actual number of events, cumulative at
> # each time horizon?
> 
> observed = numeric(length(t.horiz))
> 
> for (i in 1:length(t.horiz)){
> 
>        observed[i] = sum(bladder2$event[bladder2$stop <= t.horiz[i]])
> 
> }
> 
> print(observed)
> 
> # We would like to compute expected numbers of events that approximately
> # match these observed values.
> 
> # We should multiply the expected survival rates by the number of
individuals.
> 
> # Now, one would think that this is the number of at-risk individuals:
> s$n.risk
> 
> # But that is actually the total number of rows in the data. In any case,
> # these numbers do not match:
> 
> rbind(expected = s$n.risk*e.r,observed=observed)
> 
> # What if we multiply by the number of individuals?
> 
> rbind(expected = length(unique(bladder2$id))*e.r,observed=observed)
> 
> # This does not work either! The required factor seems to be about 133, but
> # I don't see an explanation for that.
> 
> # In this example, multiplying by 133.182 gives a good match between
observed
> # and expected values, but in other examples even the shape of the curves
> # are different.
> 
> # Multiplying by a number of individuals at risk at each time point
> # (number of individuals
> # for which there is a time interval containing the time horizon) does
> # not work either.
> 
> #####################
> 
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets
> methods   base
> 
> other attached packages:
> [1] survival_2.36-14
> 
> loaded via a namespace (and not attached):
> [1] tools_2.15.1
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Alameda, CA, USA

Terry Therneau

2012-Oct-08 13:58 UTC

head link

[R] Expected number of events, Andersen-Gill model fit via coxph in package survival

> I am interested in producing the expected number of events, in a
> recurring events setting. I am using the Andersen-Gill model, as fit
> by the function "coxph" in the package "survival."
>
> I need to produce expected numbers of events for a cohort,
> cumulatively, at several fixed times. My ultimate goal is: To fit an
> AG model to a reference sample, then use that fitted model to generate
> expected numbers of events for a new cohort; then, comparing the
> expected vs. the observed numbers of events would give us some idea of
> whether the new cohort differs from the reference one.
>
>> From my reading of the documentation and the text by Therneau and
> Grambsch, it seems that the function "survexp" is what I need.
But
> using it I am not able to obtain expected numbers of events that match
> reasonably well the observed numbers *even for the same reference
> population.* So, I think I am misunderstanding something quite badly.
>
  You've hit a common confusion.  Observed versus expected events
computations are done on
a cumulative hazard scale H, not the surivival scale S; S = exp(-H).  Relating
this back
to simple Poisson models H(t) would be the expected number of events by time t
and S(t)
the probability of "no events before time t".  G. Berry (Biometrics
1983) has a classic
ane readable article on this (especially if you ignore the proofs).

   Using your example:

 > cphfit <-
coxph(Surv(start,stop,event)~rx+number+size+cluster(id),data=bladder2)
 > zz <- predict(cphfit, type='expected')
 > c(sum(zz), sum(bladder2$event))
[1] 112 112

 > tdata <- bladder2[1:10]   #new data set (lazy way)
 > predict(cphfit, type='expected', newdata=tdata)
  [1] 0.0324089 0.3226540 0.4213402 1.0560768 0.6702130 0.2163531 0.6490665
  [8] 0.8864808 0.2932915 0.5190647


  You can also do this using survexp and the cohort=FALSE argument, which would
return
S(t) for each subject and we would then use -log(result) to get H.  This is how
it was
done when I wrote the book, but the newer predict function is easier.

Terry Therneau

Seemingly Similar Threads

Search for more possibly parallel threads

R help - Oct 2012 - Expected number of events, Andersen-Gill model fit via coxph in package survival

[R] Expected number of events, Andersen-Gill model fit via coxph in package survival

[R] Expected number of events, Andersen-Gill model fit via coxph in package survival

[R] Expected number of events, Andersen-Gill model fit via coxph in package survival

Seemingly Similar Threads