Dear R users,

I am not asking a question specifically about R, but I know there are many statistical experts here in the R community, so here goes my question:

Freedman (1982) proposed an approximation for sample size/power calculation based on the log-rank test, using the formula below (this is what nQuery does):

    N = (Z(1-α/side) + Z(power))^2 * (hazard.ratio + 1)^2 / ((2 - p1 - p2) * (hazard.ratio - 1)^2)

where Z(.) is the standard normal quantile function (the inverse of the cumulative distribution), side is 1 or 2 for a one- or two-sided test, and p1 and p2 are the survival probabilities of the 2 groups at a given time, say t.

As you can see, the sample size depends on the survival probabilities, p1 and p2. This is where my question lies. Let's say we have 2 survival curves. I can choose p1 and p2 at time 1 year, and calculate a sample size. I can also choose p1 and p2 at time 5 years (still the same hazard ratio, since these are the same 2 survival curves), and calculate a different sample size. How to interpret the 2 estimates of sample size?

This problem doesn't occur when we calculate the number of events required using this formula:

    D = 4 * (Z(1-α/side) + Z(power))^2 / (log(hazard.ratio))^2

because the number of events required depends only on the hazard ratio.

Thanks for any suggestions.

John
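For readers who want to reproduce the effect described in the post, here is a minimal sketch in Python (standard library only; the same arithmetic maps onto qnorm() in R). The function names, the hazard ratio of 2, and the survival probabilities are illustrative assumptions, not values from the thread; the two pairs (p1, p2) are chosen so that p2 = p1^hazard.ratio, as proportional hazards would imply for two time points on the same pair of curves.

```python
import math
from statistics import NormalDist


def z(p):
    """Standard normal quantile (inverse CDF) at probability p."""
    return NormalDist().inv_cdf(p)


def freedman_n(p1, p2, hazard_ratio, alpha=0.05, power=0.80, sides=2):
    """Sample size from the first formula in the post, as written there."""
    z_sum = z(1 - alpha / sides) + z(power)
    numerator = z_sum ** 2 * (hazard_ratio + 1) ** 2
    denominator = (2 - p1 - p2) * (hazard_ratio - 1) ** 2
    return numerator / denominator


def events_required(hazard_ratio, alpha=0.05, power=0.80, sides=2):
    """Number of events from the second formula in the post."""
    z_sum = z(1 - alpha / sides) + z(power)
    return 4 * z_sum ** 2 / math.log(hazard_ratio) ** 2


HR = 2.0  # hypothetical hazard ratio

# Hypothetical survival probabilities at an early and a late time point,
# linked by p2 = p1 ** HR (proportional hazards).
p1_early, p1_late = 0.9, 0.6
n_early = freedman_n(p1_early, p1_early ** HR, HR)
n_late = freedman_n(p1_late, p1_late ** HR, HR)
d = events_required(HR)

print(f"N using early (p1, p2): {n_early:.1f}")
print(f"N using late  (p1, p2): {n_late:.1f}")
print(f"Events required (same for both): {d:.1f}")
```

The sample size shrinks as the chosen time point moves later (more events are expected by then), while the required number of events does not change.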
array chip wrote:
> [...]
> As you can see, the sample size depends on the survival probabilities, p1 and p2.
> [...]
> How to interpret the 2 estimates of sample size?

As I recall, the survival probability used in Freedman is not at some arbitrary time of your choosing, but rather at the average length of follow-up time anticipated in the study.

Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
On 05/06/2010 07:20 PM, Kevin E. Thorpe wrote:
> array chip wrote:
>> [...]
>> This problem doesn't occur when we calculate the number of events
>> required using this formula:
>>
>>     4 * (Z(1-α/side) + Z(power))^2 / (log(hazard.ratio))^2

Note that this formula makes an unnecessary approximation that the number of events is the same in both groups. See the cpower, spower, and ciapower functions in the Hmisc package for more info.

Frank

--
Frank E Harrell Jr
Professor and Chairman
School of Medicine
Department of Biostatistics
Vanderbilt University
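One way to see where the "4" in the events formula comes from, and what Frank's remark implies, is the sketch below (Python, standard library only). It assumes the commonly used approximation var(log hazard ratio) ≈ 1/d1 + 1/d2, where d1 and d2 are the numbers of events in the two groups. That approximation, the function names, and the example numbers are assumptions made for illustration; they are not taken from the thread or from the Hmisc source.

```python
def var_log_hr(d1, d2):
    """Approximate variance of the log hazard ratio (assumed: 1/d1 + 1/d2)."""
    return 1.0 / d1 + 1.0 / d2


def events_inflation(frac_group1):
    """Ratio of the variance with an unequal event split to the variance
    with an equal split, for the same total number of events."""
    f = frac_group1
    return 1.0 / (4.0 * f * (1.0 - f))


total_events = 100

# Equal split: 1/50 + 1/50 = 4/100, which is the "4 / d" behind the formula.
equal = var_log_hr(total_events / 2, total_events / 2)

# Hypothetical unequal split: 36% of events in group 1, 64% in group 2.
unequal = var_log_hr(0.36 * total_events, 0.64 * total_events)

print(f"variance, equal split:   {equal:.4f}")
print(f"variance, unequal split: {unequal:.4f}")
print(f"inflation factor:        {events_inflation(0.36):.3f}")
```

Under this approximation, an unequal split of events between the groups gives a larger variance than 4/d for the same total, so a calculation that assumes an equal split would somewhat understate the events needed.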
Thank you Joris. Your explanation makes sense. What nQuery does is confusing, though. The software simply asks for p1 and p2 at any given time t, and then calculates the sample size using the formula. For example, the interpretation can be something like "100 patients per group are needed to detect the difference between p1=0.8 and p2=0.6 at time t at the 5% significance level with 80% power". It seems that, to calculate the sample size, the user just needs to provide p1 and p2 at ANY given time during the follow-up. This is where my confusion arose, because the sample size will differ depending on the time point at which p1 and p2 are chosen.

My guess is that the time t at which p1 and p2 are selected is not just any time point. It seems to be the end of follow-up, i.e. time t is the length of follow-up. Say t=1 year: the above example should then read "100 patients per group have to be followed up for 1 year to detect the difference between p1=0.8 and p2=0.6 at 1 year at the 5% significance level with 80% power". If t=5 years, the interpretation is "100 patients per group have to be followed up for 5 years to detect the difference between p1=0.8 and p2=0.6 at 5 years at the 5% significance level with 80% power".

Any comments are appreciated.

John

--- On Thu, 5/6/10, Joris Meys <jorismeys@gmail.com> wrote:

From: Joris Meys <jorismeys@gmail.com>
Subject: Re: [R] sample size for survival curves
To: "array chip" <arrayprofile@yahoo.com>
Date: Thursday, May 6, 2010, 8:12 PM

It sounds logical to get different sample sizes depending on how long you run the experiment. Say you expect fixed death rates of 5% and 10% per year in the two groups. Take 20 patients in each group: after one year you have 19 and 18 survivors, respectively. After 5 years, you have 15 and 10 survivors, which is a bigger difference and can hence be more easily detected.

Cheers
Joris

--
Joris Meys
Statistical Consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
Coupure Links 653
B-9000 Gent
tel : +32 9 264 59 87
Joris.Meys@Ugent.be
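To put numbers on this exchange, here is a sketch in Python (standard library only). It assumes exponential survival with constant annual hazards of 0.05 and 0.10, which is one reading of the 5% and 10% death rates in Joris's example and gives a hazard ratio of 2. The function names, the two-sided alpha of 0.05, and the 80% power are illustrative choices. It evaluates the Freedman formula from the original post with p1 and p2 read off the two curves at t = 1 and t = 5 years.

```python
import math
from statistics import NormalDist


def z(p):
    """Standard normal quantile (inverse CDF) at probability p."""
    return NormalDist().inv_cdf(p)


def freedman_n(p1, p2, hazard_ratio, alpha=0.05, power=0.80, sides=2):
    """Sample size from the Freedman formula as written in the original post."""
    z_sum = z(1 - alpha / sides) + z(power)
    return (z_sum ** 2 * (hazard_ratio + 1) ** 2) / (
        (2 - p1 - p2) * (hazard_ratio - 1) ** 2
    )


def exp_survival(hazard, t):
    """Survival probability at time t under a constant hazard (assumption)."""
    return math.exp(-hazard * t)


HAZARD_1, HAZARD_2 = 0.05, 0.10  # assumed annual hazards in the two groups
HR = HAZARD_2 / HAZARD_1

results = {}
for t in (1, 5):
    p1 = exp_survival(HAZARD_1, t)
    p2 = exp_survival(HAZARD_2, t)
    n = freedman_n(p1, p2, HR)
    # n * (2 - p1 - p2) is the part of the formula that does not involve t.
    results[t] = (p1, p2, n, n * (2 - p1 - p2))
    print(f"t={t}: p1={p1:.3f}, p2={p2:.3f}, N={n:.1f}, "
          f"N*(2-p1-p2)={n * (2 - p1 - p2):.1f}")
```

Under these assumptions N falls as t grows, while N * (2 - p1 - p2) is identical at both horizons. That is consistent with reading the formula as converting a fixed events target into a number of patients via the probability of an event by time t, so t is tied to the planned follow-up rather than being an arbitrary time point.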
Thanks Kevin. I thought the time t is at the end of follow-up (length of follow-up)?

John

--- On Thu, 5/6/10, Kevin E. Thorpe <kevin.thorpe at utoronto.ca> wrote:
> As I recall, the survival probability used in Freedman is not at some arbitrary time of your choosing, but rather at the average length of follow-up time anticipated in the study.
>
> Kevin