Bryan Hanson
2009-Sep-16 01:58 UTC
[R] Teasing out logrank differences *between* groups using survdiff or something else?
R Folk: Please forgive what I'm sure is a fairly na?ve question; I hope it's clear. A colleague and I have been doing a really simple one-off survival analysis, but this is an area with which we are not very familiar, we just happen to have gathered some data that needs this type of analysis. We've done quite a bit of reading, but answers escape us, even though the question below seems simple. Considering the following example from ?survdiff:> survdiff(Surv(time, status) ~ pat.karno, data=lung)Call: survdiff(formula = Surv(time, status) ~ pat.karno, data = lung) n=225, 3 observations deleted due to missingness. N Observed Expected (O-E)^2/E (O-E)^2/V pat.karno=30 2 1 0.658 0.1774 0.179 pat.karno=40 2 1 1.337 0.0847 0.086 pat.karno=50 4 4 1.079 7.9088 8.013 pat.karno=60 30 27 15.237 9.0808 10.148 pat.karno=70 41 31 26.264 0.8540 1.027 pat.karno=80 51 39 40.881 0.0865 0.117 pat.karno=90 60 38 49.411 2.6354 3.853 pat.karno=100 35 21 27.133 1.3863 1.684 Chisq= 22.6 on 7 degrees of freedom, p= 0.00202 The p value here is for the entire group (right?). How do we go about determining the p value for the comparison of any four arbitrary groups in all combinations, say pat.karno = 40, 60, 80, and 100? We know (we think) that we can't just run the coxph analysis for the only the groups of interest, as the hazard ratio for any one group in an analysis with several groups is computed by holding the other groups at their average value, so the hazard ratio varies by the context. Seems like we need some sort of t-test or chi-squared test, but being mere chemists and molecular biologists, we don't quite see it and wouldn't trust ourselves anyway, given the special nature of survival analysis. Manual instructions or a function suggestion would be great. Thanks in Advance, Bryan ************* Bryan Hanson Professor of Chemistry & Biochemistry DePauw University, Greencastle IN USA
David Winsemius
2009-Sep-16 02:41 UTC
[R] Teasing out logrank differences *between* groups using survdiff or something else?
On Sep 15, 2009, at 9:58 PM, Bryan Hanson wrote:> R Folk: > > Please forgive what I'm sure is a fairly na?ve question; I hope it's > clear. > A colleague and I have been doing a really simple one-off survival > analysis, > but this is an area with which we are not very familiar, we just > happen to > have gathered some data that needs this type of analysis. We've > done quite > a bit of reading, but answers escape us, even though the question > below > seems simple. > > Considering the following example from ?survdiff: > >> survdiff(Surv(time, status) ~ pat.karno, data=lung) > Call: > survdiff(formula = Surv(time, status) ~ pat.karno, data = lung) > > n=225, 3 observations deleted due to missingness. > > N Observed Expected (O-E)^2/E (O-E)^2/V > pat.karno=30 2 1 0.658 0.1774 0.179 > pat.karno=40 2 1 1.337 0.0847 0.086 > pat.karno=50 4 4 1.079 7.9088 8.013 > pat.karno=60 30 27 15.237 9.0808 10.148 > pat.karno=70 41 31 26.264 0.8540 1.027 > pat.karno=80 51 39 40.881 0.0865 0.117 > pat.karno=90 60 38 49.411 2.6354 3.853 > pat.karno=100 35 21 27.133 1.3863 1.684 > > Chisq= 22.6 on 7 degrees of freedom, p= 0.00202 > > The p value here is for the entire group (right?). How do we go about > determining the p value for the comparison of any four arbitrary > groups in > all combinations, say pat.karno = 40, 60, 80, and 100? > > We know (we think) that we can't just run the coxph analysis for the > only > the groups of interest, as the hazard ratio for any one group in an > analysis > with several groups is computed by holding the other groups at their > average > value, so the hazard ratio varies by the context. > > Seems like we need some sort of t-test or chi-squared test, but > being mere > chemists and molecular biologists, we don't quite see it and > wouldn't trust > ourselves anyway, given the special nature of survival analysis. > Manual > instructions or a function suggestion would be great.The column labeled (O-E)^2/V should be distributed (at least if these were large samples) as a chi-square statistic with 1 d.f. each. You've got 3 in there that exceed the nominal 95th%ile of a X^2, but you really ought to be looking at the actual to expected looked along the ordinal scale. "pat.karno" is pretty clearly, (at least to this physician) the Karnofsky Performance score, and since that is ordinal, you probably should be doing a trend test. Perhaps you could set up pat.karno as an ordered variable and see if survdiff gives you a meaningful output. If you are dealing with an analysis that might affect people's lives, wouldn't it be ethically safer to involve a real statistician?>-- David Winsemius, MD Heritage Laboratories West Hartford, CT
Thomas Lumley
2009-Sep-16 02:43 UTC
[R] Teasing out logrank differences *between* groups using survdiff or something else?
I think you do in fact want to just run the analysis for the four groups you are interested in. The logrank chisquared test would then be of the hypothesis that these four groups have the same survival and censoring distributions, with the greatest power for detecting proportional-hazards differences between the groups. You are correct in noting that the results you get for comparing these four groups would change depending on what other groups are in the analysis. This is a seriously underappreciated property of rank-based analyses. However, because of this dependence I think you can make a good case that restricting the analysis to the groups of interest is the best way to run the test. -thomas On Tue, 15 Sep 2009, Bryan Hanson wrote:> R Folk: > > Please forgive what I'm sure is a fairly na?ve question; I hope it's clear. > A colleague and I have been doing a really simple one-off survival analysis, > but this is an area with which we are not very familiar, we just happen to > have gathered some data that needs this type of analysis. We've done quite > a bit of reading, but answers escape us, even though the question below > seems simple. > > Considering the following example from ?survdiff: > >> survdiff(Surv(time, status) ~ pat.karno, data=lung) > Call: > survdiff(formula = Surv(time, status) ~ pat.karno, data = lung) > > n=225, 3 observations deleted due to missingness. > > N Observed Expected (O-E)^2/E (O-E)^2/V > pat.karno=30 2 1 0.658 0.1774 0.179 > pat.karno=40 2 1 1.337 0.0847 0.086 > pat.karno=50 4 4 1.079 7.9088 8.013 > pat.karno=60 30 27 15.237 9.0808 10.148 > pat.karno=70 41 31 26.264 0.8540 1.027 > pat.karno=80 51 39 40.881 0.0865 0.117 > pat.karno=90 60 38 49.411 2.6354 3.853 > pat.karno=100 35 21 27.133 1.3863 1.684 > > Chisq= 22.6 on 7 degrees of freedom, p= 0.00202 > > The p value here is for the entire group (right?). How do we go about > determining the p value for the comparison of any four arbitrary groups in > all combinations, say pat.karno = 40, 60, 80, and 100? > > We know (we think) that we can't just run the coxph analysis for the only > the groups of interest, as the hazard ratio for any one group in an analysis > with several groups is computed by holding the other groups at their average > value, so the hazard ratio varies by the context. > > Seems like we need some sort of t-test or chi-squared test, but being mere > chemists and molecular biologists, we don't quite see it and wouldn't trust > ourselves anyway, given the special nature of survival analysis. Manual > instructions or a function suggestion would be great. > > Thanks in Advance, Bryan > ************* > Bryan Hanson > Professor of Chemistry & Biochemistry > DePauw University, Greencastle IN USA > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle