Paul Miller
2012-Apr-13 12:13 UTC
[R] Kaplan Meier analysis: 95% CI wider in R than in SAS
Hello All, ? Am replicating in R an analysis I did earlier using SAS. See this as a test of whether I'm ready to start using R in my day-to-day work. ? Just finished replicating a Kaplan Meier analysis. Everything seems to work out fine except for one thing. The 95% CI around my estimate for the median is substantially larger in R than in SAS. For example, in SAS I have a median of 3.29 with a 95% CI of [1.15, 5.29]. In R, I get a median of 3.29 with a 95% CI of [1.35,?13.35]. ? Can anyone tell me why I get this difference? ? My R code looks like: ? survfrm <- Surv(progression_months_landmark_14,progression==1) ~ pr_rg_landmark_14 survobj <- survfit(survfrm, data=Survival) survlrk <- survdiff(survfrm, data=Survival) summary(survobj) print(survobj) print(survlrk) ? My SAS code looks like: ? proc lifetest data=survival; strata pr_rg_landmark_14; time progression_months_landmark_14 * progression(0); run; Thought maybe the difference could have something to do with the strata statement in the SAS code not being translated properly into R. Tried changing my R code to make pr_rg_landmark_14 a strata but this didn't seem to change anything. Except that I no longer got a log rank test. Thanks, Paul ? ? ? ? ? ? ?
Paul Miller
2012-Apr-13 15:36 UTC
[R] Kaplan Meier analysis: 95% CI wider in R than in SAS
Hi Enrico, Not sure how SAS builds the CI but I can look into it. The SAS documentation does have a section on computational formulas at: http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_lifetest_a0000000259.htm Although I can't provide my dataset, I can provide the data and code below. This is the R-equivalent of an analysis from "Common Statistical Methods for Clinical Research with SAS Examples." R produces the follwoing output:> print(surv.by.vac)Call: survfit(formula = Surv(WKS, CENS == 0) ~ VAC, data = hsv) records n.max n.start events median 0.95LCL 0.95UCL VAC=GD2 25 25 25 14 35 15 NA VAC=PBO 23 23 23 17 15 12 35 SAS has the same 95% CI for VAC=GD2 but has a 95% CI of [10, 27] for VAC=PBO. This is just like in the analysis I'm doing currently. Thanks, Paul ? ####################################### #### Chapter 21: The Log-Rank Test #### ####################################### ? ##################################################### #### Example 21.1: HSV2 Vaccine with gD2 Vaccine #### ##################################################### ? connection <- textConnection(" GD2? 1?? 8 12? GD2? 3 -12 10? GD2? 6 -52? 7 GD2? 7? 28 10? GD2? 8? 44? 6? GD2 10? 14? 8 GD2 12?? 3? 8? GD2 14 -52? 9? GD2 15? 35 11 GD2 18?? 6 13? GD2 20? 12? 7? GD2 23? -7 13 GD2 24 -52? 9? GD2 26 -52 12? GD2 28? 36 13 GD2 31 -52? 8? GD2 33?? 9 10? GD2 34 -11 16 GD2 36 -52? 6? GD2 39? 15 14? GD2 40? 13 13 GD2 42? 21 13? GD2 44 -24 16? GD2 46 -52 13 GD2 48? 28? 9? PBO? 2? 15? 9? PBO? 4 -44 10 PBO? 5? -2 12? PBO? 9?? 8? 7? PBO 11? 12? 7 PBO 13 -52? 7? PBO 16? 21? 7? PBO 17? 19 11 PBO 19?? 6 16? PBO 21? 10 16? PBO 22 -15? 6 PBO 25?? 4 15? PBO 27? -9? 9? PBO 29? 27 10 PBO 30?? 1 17? PBO 32? 12? 8? PBO 35? 20? 8 PBO 37 -32? 8? PBO 38? 15? 8? PBO 41?? 5 14 PBO 43? 35 13? PBO 45? 28? 9? PBO 47?? 6 15 ") hsv <- data.frame(scan(connection, list(VAC="", PAT=0, WKS=0, X=0))) hsv <- transform(hsv, ??CENS = ifelse(WKS < 1, 1, 0), ??WKS? = abs(WKS), ??TRT? = ifelse(VAC=="GD2", 1, 0)) library("survival") surv.by.vac <- survfit(Surv(WKS,CENS==0)~VAC, data=hsv) plot(surv.by.vac, ?main = "The Log-Rank Test \n Example 21.1: HSV-Episodes with gD2 Vaccine", ?ylab = "Survival Distribution Function", ?xlab = "Survival Time in Weeks", ?lty = c(1,2)) legend(0.75,0.19, ?legend = c("gD2","PBO"), ?lty = c(1,2), title = "Treatment") summary(surv.by.vac) print(surv.by.vac) ?
Paul Miller
2012-Apr-14 13:36 UTC
[R] Kaplan Meier analysis: 95% CI wider in R than in SAS
Hello Drs. Colosimo and Harrell, Thank you for your replies to my question. From Dr. Colosimo, I was able to determine that the SAS results can be replicated by adding the option conf.type="log-log" to my code as in : survobj <- survfit(survfrm, conf.type="log-log", data=Survival) Originally, it looked like the SAS results could be replicated using conf.type="plain". Applying this option to my actual data revealed that this was not the case, however.>From Dr. Harrell, I learned that using conf.type="log-log" may not be such a good idea. Interestingly though, I've seen at least one instance where experts in the R community use this option in their book. The book is about 10 years old. So maybe opinion about the use of this option has shifted since then.Thanks, Paul [[alternative HTML version deleted]]
Terry Therneau
2012-Apr-16 13:30 UTC
[R] Kaplan Meier analysis: 95% CI wider in R than in SAS
On 04/14/2012 05:00 AM, r-help-request at r-project.org wrote:> Am replicating in R an analysis I did earlier using SAS. See this as a test of whether I'm ready to start using R in my day-to-day work. > ? > Just finished replicating a Kaplan Meier analysis. Everything seems to work out fine except for one thing. The 95% CI around my estimate for the median is substantially larger in R than in SAS. For example, in SAS I have a median of 3.29 with a 95% CI of [1.15, 5.29]. In R, I get a median of 3.29 with a 95% CI of [1.35,?13.35]. > ? > Can anyone tell me why I get this difference? >The confidence interval for the median is based on the confidence intervals for the curves. There are several methods for computing confidence intervals for the curves: plain, log, log-log, or logit scale. There are opinions on which is best, and it is a close race: except for the first of these. The type "plain" intervals are awful, it's like putting me in one lane of a championship 100 meter dash. Until about version 9 the only option in SAS was "plain", then for a time it was still the default. By 9.2 they finally went to loglog. Terry Therneau