Dear Terry,
Thank you very much for taking the time to address this problem!
I did check the data in F&H. I couldn't detect any differences between the
R data set and the one in the Appendix. The preface in F&H acknowledges
that the data set was obtained from Roland Dickinson. Was the data set in R
created by Tom Fleming based on the original Mayo data?
Where do the papers that reference this data set get their data from? Do
they get it from the URL that you gave me? It is impossible to tell from
the papers because they just cite the F&H appendix as the source of the
data, but obviously they must have gotten it as an electronic version from
somewhere. If so, is the electronic version the same as the R data set?
This is relevant for me because I am trying to compare the results of my
estimation algorithm to those in another paper (which, of course, simply
cites F&H for the data).
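One way to answer the "are the two electronic versions the same?" question is a cell-by-cell comparison. The following is only a sketch: `diff_cells` is a hypothetical helper, the toy values are made up (they are not PBC records), and it assumes both copies have the same columns and the same row order. Numeric columns are compared exactly, so values that differ only by rounding will be reported as different.

```r
# Hypothetical helper: report cells where two versions of a data set differ.
# Assumes identical dimensions, column names, and row order (e.g. sorted by id).
diff_cells <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)), identical(names(a), names(b)))
  out <- list()
  for (nm in names(a)) {
    x <- a[[nm]]
    y <- b[[nm]]
    # A cell differs if exactly one value is NA, or both are present and unequal.
    bad <- xor(is.na(x), is.na(y)) | (!is.na(x) & !is.na(y) & x != y)
    if (any(bad)) {
      out[[nm]] <- data.frame(row = which(bad), column = nm,
                              a = as.character(x[bad]),
                              b = as.character(y[bad]))
    }
  }
  do.call(rbind, out)  # NULL when no differences are found
}

# Made-up values for illustration only.
v1 <- data.frame(id = 1:3, bili = c(1.4, 0.6, 2.1), chol = c(261, NA, 176))
v2 <- data.frame(id = 1:3, bili = c(1.4, 0.8, 2.1), chol = c(261, 302, 176))
diff_cells(v1, v2)
```

With the real files, `a` and `b` would be the two copies read into data frames with matching column names.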
Best regards,
Ravi.
----------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
----------------------------------------------------------------------------
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Terry Therneau
Sent: Monday, November 24, 2008 8:40 AM
To: rvaradhan at jhmi.edu
Cc: r-help at r-project.org
Subject: Re: [R] Discrepancy in the PBC data set
The data set in R is wrong. I've found mistakes on 2 lines in a quick look.
I don't know if the data is incorrect in the Appendix of Fleming and
Harrington as well (someone seems to have borrowed my copy), which is where
the data set appears to have been taken from, given all the "-9" codes in it.
(Note, Tom Fleming originally got the data from me, so I'm fairly
confident in calling my Mayo version the authoritative one). I'll make sure
this gets fixed.
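The "-9" entries mentioned above look like a numeric missing-value code; that reading is an assumption and should be checked against the legend in the F&H appendix. If it holds, a minimal sketch of recoding them to NA, so that such a copy can be compared with a file that marks missing values differently, might look like this. `recode_missing` and the toy rows are hypothetical.

```r
# Hypothetical helper: turn a numeric missing-value code into NA.
# Assumes `code` (here -9) never occurs as a genuine measurement.
recode_missing <- function(df, code = -9) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) col[!is.na(col) & col == code] <- NA
    col
  })
  df
}

# Made-up rows for illustration only, not actual PBC records.
toy <- data.frame(id = 1:3, chol = c(261, -9, 176), trig = c(-9, 88, 55))
recode_missing(toy)
```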
You can grab a correct data set from our department web page. Code is below.
Terry Therneau
library(survival)  # provides Surv() and coxph()
# Mayo copy of the data; missing values are coded as '.'
pbcurl <- "http://mayoresearch.mayo.edu/mayo/research/biostat/upload/therneau_upload/pbc.dat"
pbc <- read.table(pbcurl, header=FALSE,
                  col.names=c('id', 'time', 'status',
                              'trt', 'age', 'sex',
                              'ascites', 'hepato',
                              'spiders', 'edema',
                              'bili', 'chol',
                              'albumin', 'copper',
                              'alk.phos', 'ast',
                              'trig', 'platelet',
                              'protime', 'stage'),
                  na.strings='.')
pbc$age <- pbc$age/365.25  # rescale age by 365.25 (days to years)
# Cox model treating status 2 as the event
newfit <- coxph(Surv(time, status==2) ~ age + edema + log(bili) +
                log(protime) + log(albumin), data=pbc)
newfit
                coef exp(coef) se(coef)     z       p
age           0.0396    1.0404  0.00767  5.16 2.4e-07
edema         0.8963    2.4505  0.27141  3.30 9.6e-04
log(bili)     0.8636    2.3716  0.08294 10.41 0.0e+00
log(protime)  2.3868   10.8791  0.76851  3.11 1.9e-03
log(albumin) -2.5069    0.0815  0.65292 -3.84 1.2e-04

Likelihood ratio test=231 on 5 df, p=0 n=416 (2 observations deleted due to missingness)