Ravi Varadhan
2008-Nov-21  17:16 UTC
[R] Discrepancy in the regression coefficients for Cox regression - PBC data set
Hi,

When I run the following Cox proportional hazards model on the Mayo Clinic's PBC data set (given in the "survival" package), the regression coefficients do not agree with the results presented in Table 4.6.3 (p. 195) of Fleming & Harrington's book.

    library(survival)
    data(pbc)
    ans.cox <- coxph(Surv(time, status) ~ log(bili) + log(alb) + age + log(protime) + edema)
    ans.cox

> ans.cox <- coxph(Surv(time, status) ~ log(bili) + log(alb) + age + log(protime) + edema)
> ans.cox
Call:
coxph(formula = Surv(time, status) ~ log(bili) + log(alb) + age + log(protime) + edema)

                coef exp(coef) se(coef)     z       p
log(bili)     0.8975     2.453  0.08271 10.85 0.0e+00
log(alb)     -2.4524     0.086  0.65707 -3.73 1.9e-04
age           0.0382     1.039  0.00768  4.97 6.5e-07
log(protime)  2.3458    10.442  0.77425  3.03 2.4e-03
edema         0.6613     1.937  0.20595  3.21 1.3e-03

Likelihood ratio test=234 on 5 df, p=0  n= 418

These coefficients, however, are significantly different (i.e. the differences cannot be attributed to round-off alone) from those reported in Table 4.6.3 (in the "Final model" column) of Fleming and Harrington (p. 195). The coefficients reported there are: 0.8707, -2.533, 0.0394, 2.380, 0.8592. Note the big difference for the "edema" variable.

It seems that the data set considered in the book and the one available in the "survival" package are the same (with n=418).

I also re-ran the Cox PH model with the 2 "data-errors" discussed on p. 188 of F&H, but I still could not match the results in Table 4.6.3.

Is it possible that the results could be explained by a difference in convergence during maximization of the partial likelihood?

Can anyone help me figure out why this discrepancy exists?

Thanks very much,
Ravi.

-----------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan@jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
-----------------------------------------------------------------------------------
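One way to probe the convergence question raised above is to refit the model with a much stricter stopping rule and see whether the coefficients move. The following is only a minimal sketch under stated assumptions: it assumes a current release of the "survival" package, in which the albumin column is named `albumin` (the post uses `alb`) and `status` is coded 0/1/2 with 2 meaning death, so the event is written as `status == 2`. The object names `fit_default` and `fit_tight` are made up for illustration, and the numbers it produces need not match the 2008 output.

```r
library(survival)

# Assumption: current 'survival' release, where pbc has 'albumin' (not 'alb')
# and status is 0 = censored, 1 = transplant, 2 = dead.
fit_default <- coxph(Surv(time, status == 2) ~ log(bili) + log(albumin) + age +
                       log(protime) + edema, data = pbc)

# Same model with a far tighter convergence tolerance and more iterations allowed.
fit_tight <- update(fit_default,
                    control = coxph.control(eps = 1e-12, iter.max = 50))

# Largest absolute change in any coefficient between the two fits.
max(abs(coef(fit_default) - coef(fit_tight)))

# Number of Newton-Raphson iterations each fit used.
c(default = fit_default$iter, tight = fit_tight$iter)
```

If the largest change is negligible next to the gaps reported against Table 4.6.3, convergence of the partial-likelihood maximization is an unlikely explanation.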
David Winsemius
2008-Nov-21  17:34 UTC
[R] Discrepancy in the regression coefficients for Cox regression - PBC data set
There is a discussion in Appendix D.3 of "Modeling Survival Data" by Therneau and Grambsch regarding the differences between the datasets, including the fact that "there was significantly more follow-up for many patients at the time this dataset was assembled". I do not see a material difference in the estimates.

-- 
David Winsemius, MD
Heritage Labs
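A rough way to gauge whether the two sets of estimates differ materially is to express each difference in units of the standard error from the R fit. The sketch below uses only the numbers quoted in this thread; the vector names are made up for illustration. It is not a formal test, since the two fits share most of their data and the standard errors from the book are not given here.

```r
# Coefficients and standard errors from the R output quoted in the thread.
r_coef <- c(bili = 0.8975, alb = -2.4524, age = 0.0382,
            protime = 2.3458, edema = 0.6613)
r_se   <- c(bili = 0.08271, alb = 0.65707, age = 0.00768,
            protime = 0.77425, edema = 0.20595)

# "Final model" coefficients from Table 4.6.3 of Fleming & Harrington,
# as quoted in the thread, in the same order.
fh_coef <- c(bili = 0.8707, alb = -2.533, age = 0.0394,
             protime = 2.380, edema = 0.8592)

# Difference between the two sets, scaled by the R standard errors.
z <- (fh_coef - r_coef) / r_se
round(z, 2)
```

On these figures every difference, including the one for edema, comes out at less than one standard error, which is consistent with the view that the estimates do not differ materially.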