Hiyoshi, Ayako
2014-May-30 11:37 UTC
[R] Difference in coefficients in Cox proportional hazard estimates between R and Stata, why?
Dear R users, Hi, thank you so much for your help in advance. I have been using Stata but new to R. For my paper revision using Aalen's survival analysis, I need to use R, as the command including Aalen's survival seems to be available in R (32-bit, version 3.1.0 (2014-04-10)) but less ready to be used in Stata (version 13/SE). To make sure that I can do basics, I have fitted logistic regression and Cox proportional hazard regression using R and Stata. The data I used were from UCLA R's textbook example page: <http://www.ats.ucla.edu/stat/r/examples/asa/asa_ch1_r.htm.> http://www.ats.ucla.edu/stat/r/examples/asa/asa_ch1_r.htm. I used this in Stata too. When I fitted logistic regression as below, the estimates were exactly same between R and Stata. <Example using logistic regression> R: logistic1 <- glm(censor ~ age + drug, data=xxxx, family = "binomial") summary(logistic1) exp(cbind(OR=coef(logistic1), confint(logistic1))) OR 2.5 % 97.5 % (Intercept) 1.0373731 0.06358296 16.797896 age 1.0436805 0.96801933 1.131233 drug 0.7192149 0.26042635 1.937502 Stata: logistic censor age i.drug OR CI_lower CI_upper age | 1.043681 .9662388 1.127329 drug | .719215 .2665194 1.940835 _cons | 1.037373 .065847 16.3431 However, when I fitted Cox proportional hazard regression, there were some discrepancies in coefficient (and exponentiated hazard ratios). <Example using Cox proportioanl hazard regression> R: cox1 <- coxph(Surv(time, censor) ~ drug, age, data=xxxx) summary(cox1) Call: coxph(formula = Surv(time, censor) ~ drug + age, data = xxxx) n= 100, number of events= 80 coef exp(coef) se(coef) z Pr(>|z|) drug 1.01670 2.76405 0.25622 3.968 7.24e-05 *** age 0.09714 1.10202 0.01864 5.211 1.87e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 exp(coef) exp(-coef) lower .95 upper .95 drug 2.764 0.3618 1.673 4.567 age 1.102 0.9074 1.062 1.143 Concordance= 0.711 (se = 0.042 ) Rsquare= 0.324 (max possible= 0.997 ) Likelihood ratio test= 39.13 on 2 df, p=3.182e-09 Wald test = 36.13 on 2 df, p=1.431e-08 Score (logrank) test = 38.39 on 2 df, p=4.602e-09 Stata: stset time, f(censor) stcox drug age ------------------------------------------------------------------------------ _t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- drug | 2.563531 .6550089 3.68 0.000 1.55363 4.229893 age | 1.095852 .02026 4.95 0.000 1.056854 1.136289 ------------------------------------------------------------------------------ The HR estimates for drug was 2.76 from R, but 2.56 from Stata. I searched in internet for explanation, but could not find any. In parametric survival regression with exponential distribution, R and Stata's coefficients were completely opposite while the values were exactly same (i.e. say 0.08 for Stata and -0.08 for R). I suspected something like this (http://www.theanalysisfactor.com/ordinal-logistic-regression-mystery/) going on, but for Cox proportional hazard regression, i coudl not find any resource helping me. I highly appreciate if anyone could explain this for me, or suggest me resource that I can read. Thank you so much for your help. Best, Ayako [[alternative HTML version deleted]]
Göran Broström
2014-May-30 14:58 UTC
[R] Difference in coefficients in Cox proportional hazard estimates between R and Stata, why?
In the Cox regression case, the probable explanation is that you have ties in your data; Stata and coxph may have different defaults for handling ties. Read the manuals! The difference in sign in the other cases is simply due to different definitions of the models. I am sure it is well documented in relevant manuals. G?ran On 2014-05-30 13:37, Hiyoshi, Ayako wrote:> Dear R users, > > > > Hi, thank you so much for your help in advance. > > I have been using Stata but new to R. For my paper revision using > Aalen's survival analysis, I need to use R, as the command including > Aalen's survival seems to be available in R (32-bit, version 3.1.0 > (2014-04-10)) but less ready to be used in Stata (version 13/SE). > > > > To make sure that I can do basics, I have fitted logistic regression > and Cox proportional hazard regression using R and Stata. > > > > The data I used were from UCLA R's textbook example page: > <http://www.ats.ucla.edu/stat/r/examples/asa/asa_ch1_r.htm.> > http://www.ats.ucla.edu/stat/r/examples/asa/asa_ch1_r.htm. I used > this in Stata too. > > > > When I fitted logistic regression as below, the estimates were > exactly same between R and Stata. > > > > <Example using logistic regression> > > R: > > > > logistic1 <- glm(censor ~ age + drug, data=xxxx, family > "binomial") > > summary(logistic1) > > exp(cbind(OR=coef(logistic1), confint(logistic1))) > > OR 2.5 % 97.5 % (Intercept) 1.0373731 0.06358296 16.797896 > age 1.0436805 0.96801933 1.131233 drug 0.7192149 > 0.26042635 1.937502 > > > > Stata: > > > > logistic censor age i.drug OR CI_lower CI_upper age | > 1.043681 .9662388 1.127329 drug | .719215 .2665194 > 1.940835 _cons | 1.037373 .065847 16.3431 > > > > However, when I fitted Cox proportional hazard regression, there were > some discrepancies in coefficient (and exponentiated hazard ratios). > > > > <Example using Cox proportioanl hazard regression> > > R: > > > > cox1 <- coxph(Surv(time, censor) ~ drug, age, data=xxxx) > summary(cox1) > > Call: coxph(formula = Surv(time, censor) ~ drug + age, data = xxxx) > n= 100, number of events= 80 coef exp(coef) se(coef) z Pr(>|z|) > drug 1.01670 2.76405 0.25622 3.968 7.24e-05 *** age 0.09714 > 1.10202 0.01864 5.211 1.87e-07 *** --- Signif. codes: 0 '***' 0.001 > '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 exp(coef) exp(-coef) lower .95 upper > .95 drug 2.764 0.3618 1.673 4.567 age 1.102 > 0.9074 1.062 1.143 Concordance= 0.711 (se = 0.042 ) Rsquare> 0.324 (max possible= 0.997 ) Likelihood ratio test= 39.13 on 2 df, > p=3.182e-09 Wald test = 36.13 on 2 df, p=1.431e-08 > Score (logrank) test = 38.39 on 2 df, p=4.602e-09 > > Stata: > > stset time, f(censor) stcox drug age > ------------------------------------------------------------------------------ > >_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]> -------------+---------------------------------------------------------------- > >drug | 2.563531 .6550089 3.68 0.000 1.55363 4.229893> age | 1.095852 .02026 4.95 0.000 1.056854 > 1.136289 > ------------------------------------------------------------------------------ > > > > > > The HR estimates for drug was 2.76 from R, but 2.56 from Stata. > > I searched in internet for explanation, but could not find any. > > > > In parametric survival regression with exponential distribution, R > and Stata's coefficients were completely opposite while the values > were exactly same (i.e. say 0.08 for Stata and -0.08 for R). I > suspected something like this > (http://www.theanalysisfactor.com/ordinal-logistic-regression-mystery/) > going on, but for Cox proportional hazard regression, i coudl not > find any resource helping me. > > > > I highly appreciate if anyone could explain this for me, or suggest > me resource that I can read. > > > > Thank you so much for your help. > > > > Best, > > Ayako > > > [[alternative HTML version deleted]] > > ______________________________________________ R-help at r-project.org > mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do > read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >