Hi,
I figure out the reason for the difference. There are ties in failure times
in the data set. Consequently, it matters which method is used to handle
ties in "coxph". The default is "efron". If I use
method="breslow", there
is no difference between the 2 different ways of computing the Nelson-Aalen
estimte.
require(foreign)
gb <- read.dta("GB.dta") # Green & Byar data; N = 483
# Method 1 (note the use of Breslow estimator)
fit1 <- coxph( Surv(time, status=="Cancer" |
status=="CVD" |
status=="Other") ~ 1, method="breslow", data=gb)
h1 <- basehaz(fit1)
# Method 2
fit2 <- survfit(Surv(time, status=="Cancer" |
status=="CVD" |
status=="Other") ~ 1, data=gb)
jump <- fit2$n.event > 0
h2 <- cumsum(fit2$n.event[jump]/fit2$n.risk[jump])
> min(abs(h1$hazard - h2))
[1] 1.387779e-17
Best,
Ravi.
----------------------------------------------------------------------------
-------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
----------------------------------------------------------------------------
--------
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On
Behalf Of Ravi Varadhan
Sent: Monday, May 04, 2009 12:33 AM
To: r-help at r-project.org
Subject: [R] Nelson-Aalen estimator of cumulative hazard
Hi,
I am computing the Nelson-Aalen (NA) estimate of baseline cumulative hazard
in two different ways using the "survival" package. I am expecting
that
they should be identical. However, they are not. Their difference is a
monotonically increasing with time. This difference is probably not large
to make any impact in the application, but is annoyingly non-trivial for me
to just ignore it.
This is a competing risks problem, with the Green & Byar (1980) data set
(the STATA data set is attached).
Can anyone explain to me the reason for the discrepancy?
require(foreign)
gb <- read.dta("GB.dta") # Green & Byar data; N = 483
# Method 1
fit1 <- coxph( Surv(time, status=="Cancer" |
status=="CVD" |
status=="Other") ~ 1, data=gb)
h1 <- basehaz(fit1)
# Method 2
fit2 <- survfit(Surv(time, status=="Cancer" |
status=="CVD" |
status=="Other") ~ 1, data=gb)
jump <- fit2$n.event > 0
h2 <- cumsum(fit2$n.event[jump]/fit2$n.risk[jump])
plot(h1$time, h1$hazard - h2)
Thank you,
Ravi.
____________________________________________________________________
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University
Ph. (410) 502-2619
email: rvaradhan at jhmi.edu