koshihaku
2011-Oct-01 01:31 UTC
[R] Is the output of survfit.coxph survival or baseline survival?
Dear all,
I am confused by the output of survfit.coxph. Some say that the survival given by summary(survfit.coxph) is the baseline survival S_0; others say it is the survival S = S_0^exp(beta*x). Which one is correct?
By the way, if I use "newdata=" in survfit, does that mean the survival is estimated at the covariate values in the new data frame?
Thank you very much!
Koshihaku
--
View this message in context: http://r.789695.n4.nabble.com/Is-the-output-of-survfit-coxph-survival-or-baseline-survival-tp3861919p3861919.html
Sent from the R help mailing list archive at Nabble.com.
David Winsemius
2011-Oct-01 15:25 UTC
On Sep 30, 2011, at 9:31 PM, koshihaku wrote:
> Dear all,
> I am confused with the output of survfit.coxph.
> Someone said that the survival given by summary(survfit.coxph) is the
> baseline survival S_0, but some said that is the survival
> S=S_0^exp{beta*x}.
>
> Which one is correct?

It may depend on who _some_ and _someone_ are and what they mean by S_0. I have in the past posted erroneous answers, but the name to search for in the archives is 'Terry Therneau'. My current understanding is that the "baseline" survival S_0 reported by survfit.coxph is the estimated survival for a hypothetical subject whose continuous and discrete covariates are all at their means. (But I have been wrong before.) Here is some of what Therneau has said about it:
http://finzi.psych.upenn.edu/Rhelp10/2010-October/257941.html
http://finzi.psych.upenn.edu/Rhelp10/2009-March/190341.html
http://finzi.psych.upenn.edu/Rhelp10/2009-February/189768.html

> By the way, if I use "newdata=" in the survfit, does that mean the
> survival is estimated by the value of covariates in the new data frame?

In one sense yes, but in another sense, no. If you have a Cox fit and you supply newdata, the beta estimates and the baseline survival still come from the original data; only the covariate values come from newdata. If you just give survfit a formula, then there is no newdata argument, only a data argument. Try this:

library(survival)
fit <- coxph( Surv(futime, fustat)~rx, data=ovarian)
plot( survfit(fit, newdata=data.frame(rx=1) ) )
plot( survfit( Surv(futime, fustat)~rx, data=ovarian) )

Flipping back and forth between those curves might clarify, at least to the extent that I understand this question. And here's a pathological extrapolation:

plot(survfit(fit, newdata=data.frame(rx=1:3)))
# There is no rx=3 in the original data, but rx wasn't defined as a factor when given to coxph.
# I just checked whether you could extrapolate past the end of a range of factor levels, and very sensibly you cannot.
> fit <- coxph( Surv(futime, fustat)~factor(rx), data=ovarian)
> plot(survfit(fit, newdata=data.frame(rx=1:3)))
Error in model.frame.default(data = data.frame(rx = 1:3), formula = ~factor(rx),  :
  factor 'factor(rx)' has new level(s) 3

-- 
David.
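A minimal sketch of the same two experiments without plots, assuming a recent version of the survival package (which ships the ovarian data set). It prints the Cox-model curve for rx = 1 next to the Kaplan-Meier curve of the rx = 1 stratum at a few time points (200, 400 and 600 days are arbitrary illustrative choices, not from the thread), then contrasts the numeric and factor codings of rx when newdata asks for the unobserved value rx = 3. The exact wording of the error message may differ between R versions.

```r
library(survival)  # provides coxph, survfit and the ovarian data

# rx entered as a plain numeric covariate, as in the first fit above
fit_num <- coxph(Surv(futime, fustat) ~ rx, data = ovarian)

# Cox-model curve for a subject with rx = 1 versus the Kaplan-Meier curve of
# the rx = 1 stratum (km[1]); the check times are arbitrary illustrative values
check_times <- c(200, 400, 600)
cox_rx1 <- survfit(fit_num, newdata = data.frame(rx = 1))
km      <- survfit(Surv(futime, fustat) ~ rx, data = ovarian)
cox_s <- as.vector(summary(cox_rx1, times = check_times, extend = TRUE)$surv)
km_s  <- as.vector(summary(km[1], times = check_times, extend = TRUE)$surv)
print(cbind(time = check_times, cox = cox_s, km = km_s))

# Numeric rx: rx = 3 was never observed, but survfit silently extrapolates
# and returns one curve (one column of $surv) per row of newdata
extrap <- survfit(fit_num, newdata = data.frame(rx = 1:3))
print(dim(extrap$surv))

# Factor rx: the unseen level 3 is rejected when newdata is turned into a
# model frame, so capture the error instead of stopping
fit_fac <- coxph(Surv(futime, fustat) ~ factor(rx), data = ovarian)
factor_err <- tryCatch(survfit(fit_fac, newdata = data.frame(rx = 1:3)),
                       error = function(e) e)
print(conditionMessage(factor_err))
```

The Cox curve assumes proportional hazards while the Kaplan-Meier curve does not, so the two columns need not agree exactly; large gaps would hint that the model fits the rx = 1 group poorly.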
Thomas Lumley
2011-Oct-02 19:12 UTC
On Sat, Oct 1, 2011 at 2:31 PM, koshihaku <koshihaku at gmail.com> wrote:
> Dear all,
> I am confused with the output of survfit.coxph.
> Someone said that the survival given by summary(survfit.coxph) is the
> baseline survival S_0, but some said that is the survival S=S_0^exp{beta*x}.
>
> Which one is correct?

The baseline hazard as estimated in survfit.coxph is the hazard when all covariates are equal to the sample mean (or the stratum mean for a stratified model). The means that it is using are available in the $means component of the coxph object. It is not the hazard extrapolated to all covariates equal zero.

The centering at the sample mean is done for three reasons:
1/ it's computationally convenient
2/ it's numerically more stable
3/ it makes the baseline hazard more interpretable, since at least it is the hazard for a set of covariate values somewhere in the interior of your data.

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland
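The centering can be checked numerically. The sketch below assumes a recent survival package in which survfit.coxph accepts stype = 2 (survival computed as exp(-cumulative hazard)); under that choice the relation is exact. It confirms that the default curve equals the curve for a subject placed at fit$means, and that the curve for any other covariate value x follows from it as S(t | x) = S_mean(t)^exp(beta * (x - mean)). It prints the stored centering value next to the plain sample mean rather than assuming they coincide; the variable names are illustrative.

```r
library(survival)

fit <- coxph(Surv(futime, fustat) ~ rx, data = ovarian)

# Centering value stored in the fit, next to the plain sample mean
print(c(stored = fit$means[["rx"]], sample_mean = mean(ovarian$rx)))

# stype = 2: survival computed as exp(-cumulative hazard)
s_default <- survfit(fit, stype = 2)                     # no newdata
s_at_mean <- survfit(fit, newdata = data.frame(rx = fit$means[["rx"]]), stype = 2)
s_rx1     <- survfit(fit, newdata = data.frame(rx = 1), stype = 2)

# Rebuild the rx = 1 curve from the centered one:
# S(t | x) = S_mean(t)^exp(beta * (x - mean))
beta    <- coef(fit)[["rx"]]
rebuilt <- s_default$surv ^ exp(beta * (1 - fit$means[["rx"]]))

print(all.equal(s_default$surv, s_at_mean$surv))
print(all.equal(s_rx1$surv, rebuilt))
```

Both comparisons should report TRUE: the estimated cumulative hazard is computed once at the centering point and is only rescaled by the relative risk exp(beta * (x - mean)) for each new subject.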
Terry Therneau
2011-Oct-03 02:06 UTC
> Dear all,
> I am confused with the output of survfit.coxph.
> Someone said that the survival given by summary(survfit.coxph) is the
> baseline survival S_0, but some said that is the survival
> S=S_0^exp{beta*x}.
>
> Which one is correct?

The "baseline survival", which is the survival for a hypothetical subject with all covariates = 0, may be useful mathematical shorthand when writing a book, but I cannot think of a single case where the resulting curve would be of any practical interest in medical data. For this reason my survival routines in R NEVER return it. (Ask yourself "what is the survival for someone with blood pressure = 0, cholesterol = 0, weight = 0, ...?" The answer is that they are either non-existent or dead.)

The intention with survfit is that you will give it a second data set containing one or more lines, each of which describes a subject whose predicted survival is of interest. If no such data is given, the survival for someone with all covariates equal to their means is given. This is better than covariates = 0, but sometimes not by much. (What if sex were coded as a 0/1 numeric: do we get the survival of a hermaphrodite?)

Your best approach is to forget the phrase "baseline survival" and focus on covariate sets of interest to you.

Terry Therneau
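Following this advice, a sketch that asks survfit for curves at specific covariate sets. The two patients (rx = 1, aged 50 and 65) and the one- and two-year time points are hypothetical choices for illustration, and stype = 2 is again assumed to be available in the installed survival package. The last lines rebuild the textbook all-zero "baseline" (rx = 0, age = 0) from the default mean-centered curve, only to show that it is just one more, heavily extrapolated, covariate set.

```r
library(survival)

# Two covariates: treatment arm (rx, coded 1/2) and age in years
fit2 <- coxph(Surv(futime, fustat) ~ rx + age, data = ovarian)

# Hypothetical covariate sets of interest: two rx = 1 patients aged 50 and 65
patients <- data.frame(rx = c(1, 1), age = c(50, 65))
sf <- survfit(fit2, newdata = patients, stype = 2)
print(summary(sf, times = c(365, 730), extend = TRUE))  # roughly 1 and 2 years

# The textbook all-zero "baseline" (rx = 0, age = 0) lies far outside the data;
# rebuild it from the default (mean) curve and compare with asking for it
# directly through newdata
s_mean <- survfit(fit2, stype = 2)
shift  <- sum(coef(fit2) * (0 - fit2$means))            # beta' (0 - means)
s_zero <- s_mean$surv ^ exp(shift)
s_zero_direct <- survfit(fit2, newdata = data.frame(rx = 0, age = 0), stype = 2)$surv
print(all.equal(s_zero, s_zero_direct))
```

The all-zero curve describes a newborn on a treatment arm that does not exist, which is exactly the kind of subject the message above warns against.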
koshihaku
2011-Oct-05 05:54 UTC
Dear all,
Your advice was a great help to my study. Thank you very much!