thr3ads.net - R help - [R] sample size for survival curves [May 2010]

array chip

2010-May-06 23:45 UTC

[R] sample size for survival curves

Dear R users, my question is not specifically about R, but I know there are
many statistical experts here in the R community, so here goes:

Freedman (1982) proposed an approximate sample size/power calculation based
on the log-rank test, using the formula below (this is what nQuery does):
             (Z(1-α/side)+Z(power))^2*(hazard.ratio+1)^2
      N  =  ---------------------------------------------
                     (2-p1-p2)*(hazard.ratio-1)^2

where Z(.) is the standard normal quantile function (the inverse of the
cumulative distribution), and p1 and p2 are the survival probabilities of the
2 groups at a given time, say t.

As you can see, the sample size depends on the survival probabilities p1 and
p2. This is where my question lies. Let's say we have 2 survival curves. I can
choose p1 and p2 at 1 year and calculate a sample size. I can also choose
p1 and p2 at 5 years (still the same hazard ratio, since they are the same 2
survival curves) and calculate a different sample size. How should I interpret
the 2 sample size estimates?

This problem doesn't occur when we calculate the number of events required using
this formula:
               4*(Z(1-α/side)+Z(power))^2
              --------------------------
                 (log(hazard.ratio))^2

because the number of events required depends only on the hazard ratio.

Thanks for any suggestions.

John
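
As a rough illustration of the question, the two formulas can be evaluated at 1 year and at 5 years for the same pair of curves. This is a minimal sketch, not what nQuery actually runs: it assumes exponential survival in both groups, a two-sided 5% test, 80% power, and hypothetical yearly hazards of 0.10 and 0.20. The function names are made up for the example, and N is read as patients per group, which is an assumption.

```python
from math import exp, log
from statistics import NormalDist


def z(p):
    """Standard normal quantile, i.e. the inverse of the cumulative distribution."""
    return NormalDist().inv_cdf(p)


def freedman_n(p1, p2, hazard_ratio, alpha=0.05, sides=2, power=0.80):
    """Sample size from the Freedman-type formula as written in the post."""
    numerator = (z(1 - alpha / sides) + z(power)) ** 2 * (hazard_ratio + 1) ** 2
    return numerator / ((2 - p1 - p2) * (hazard_ratio - 1) ** 2)


def events_required(hazard_ratio, alpha=0.05, sides=2, power=0.80):
    """Number of events from the second formula; no survival probabilities needed."""
    return 4 * (z(1 - alpha / sides) + z(power)) ** 2 / log(hazard_ratio) ** 2


# Hypothetical exponential curves (assumption): yearly hazards 0.10 and 0.20.
hazard_1, hazard_2 = 0.10, 0.20
hazard_ratio = hazard_1 / hazard_2  # the same at every time point

for years in (1, 5):
    p1 = exp(-hazard_1 * years)  # survival probability in group 1 at t
    p2 = exp(-hazard_2 * years)  # survival probability in group 2 at t
    n = freedman_n(p1, p2, hazard_ratio)
    print(f"t = {years} y: p1 = {p1:.3f}, p2 = {p2:.3f}, N = {n:.1f}")

print(f"events required: {events_required(hazard_ratio):.1f}")
```

With these inputs the hazard ratio and the required number of events stay fixed, while N shrinks as t grows, because more patients are expected to have had an event by a later time.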

Kevin E. Thorpe

2010-May-07 00:20 UTC

As I recall, the survival probability used in Freedman is not at some 
arbitrary time of your choosing, but rather at the average length of 
follow-up time anticipated in the study.

Kevin

-- 
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
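
One way to read Kevin's point: the (2-p1-p2) term is the sum of the two groups' probabilities of having an event by time t, so N simply converts a required number of events into patients. The sketch below assumes N is per group and that nobody is censored before t; the function names and the figure of 60 events are hypothetical, and this is not a statement of what Freedman's paper specifies.

```python
def expected_events(n_per_group, p1, p2):
    """Expected events by time t with n_per_group patients in each group,
    survival probabilities p1 and p2 at t, and no censoring before t."""
    return n_per_group * (1 - p1) + n_per_group * (1 - p2)


def patients_per_group(events, p1, p2):
    """Patients per group needed to expect `events` events in total by time t."""
    return events / (2 - p1 - p2)


events = 60  # hypothetical number of required events

# Survival probabilities read off the same two curves at an early and a late time.
for label, p1, p2 in (("short follow-up", 0.90, 0.82), ("long follow-up", 0.61, 0.37)):
    n = patients_per_group(events, p1, p2)
    print(f"{label}: N per group = {n:.1f}, "
          f"expected events = {expected_events(n, p1, p2):.1f}")
```

The required events are the same in both rows; only the number of patients needed to observe them changes with the length of follow-up.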

Frank E Harrell Jr

2010-May-07 02:25 UTC

On 05/06/2010 07:20 PM, Kevin E. Thorpe wrote:
> array chip wrote:
>> This problem doesn't occur when we calculate the number of events
>> required using this formula:
>> 4*(Z(1-α/side)+Z(power))^2
>> --------------------------
>> (log(hazard.ratio))^2
Note that this formula makes an unnecessary approximation that the 
number of events is the same in both groups.

See the cpower, spower, and ciapower functions in the Hmisc package for more
info.

Frank

-- 
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University
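
The approximation Frank mentions can be made concrete. A common large-sample approximation puts the variance of the estimated log hazard ratio at about 1/d1 + 1/d2, where d1 and d2 are the events in each group; setting d1 = d2 = d/2 gives 4/d, which is where the 4 in the events formula comes from. The sketch below uses that approximation to compare an equal and an unequal split of the same total. It is an illustration under that assumption, with hypothetical event counts, and does not reproduce what the Hmisc functions compute.

```python
from math import log, sqrt
from statistics import NormalDist


def power_from_events(d1, d2, hazard_ratio, alpha=0.05, sides=2):
    """Approximate power, assuming var(log HR) is about 1/d1 + 1/d2."""
    normal = NormalDist()
    standard_error = sqrt(1 / d1 + 1 / d2)
    critical_value = normal.inv_cdf(1 - alpha / sides)
    return normal.cdf(abs(log(hazard_ratio)) / standard_error - critical_value)


# Hypothetical splits of 66 total events for a hazard ratio of 0.5.
equal_split = power_from_events(33, 33, 0.5)
unequal_split = power_from_events(26, 40, 0.5)

print(f"33 + 33 events: power = {equal_split:.3f}")
print(f"26 + 40 events: power = {unequal_split:.3f}")
```

For a fixed total, the equal split is the most favourable case, so a formula that assumes it will tend to understate the events needed when the groups accrue events at different rates.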

array chip

2010-May-07 05:32 UTC

Thank you Joris. Your explanation makes sense.

What nQuery does is confusing, though. The software simply asks for p1 and p2
at any given time t, and then calculates the sample size using the formula. For
example, the interpretation can be something like "100 patients per group
are needed to detect the difference between p1=0.8 and p2=0.6 at time t at the
5% significance level with 80% power". It seems that, to calculate the sample
size, the user just needs to provide p1 and p2 at ANY given time during
follow-up. This is where my confusion arose, because the sample size will
differ depending on the time point at which p1 and p2 are chosen.

My guess is that the time t at which p1 and p2 are chosen is not just any time
point. It seems to be the end of follow-up, i.e. t is the length of follow-up.
Say t=1 year: the example above should then read "100 patients per
group have to be followed for 1 year to detect the difference between p1=0.8
and p2=0.6 at 1 year at the 5% significance level with 80% power". If t=5
years, the interpretation is "100 patients per group have to be followed
for 5 years to detect the difference between p1=0.8 and p2=0.6 at 5 years at
the 5% significance level with 80% power".

Any comments are appreciated.
 
John
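
The numbers in this example can be plugged into the formula from the original post. This is a sketch under stated assumptions: proportional hazards, so that the hazard ratio is log(p1)/log(p2), a two-sided 5% test, 80% power, and N read as patients per group. The figure of 100 in the post is only an example, so the value printed here need not match it.

```python
from math import log
from statistics import NormalDist

p1, p2 = 0.8, 0.6  # survival probabilities at the end of follow-up
alpha, sides, power = 0.05, 2, 0.80

# Under proportional hazards S1(t) = S2(t) ** HR, so HR = log(p1) / log(p2).
hazard_ratio = log(p1) / log(p2)

z = NormalDist().inv_cdf  # standard normal quantile function
numerator = (z(1 - alpha / sides) + z(power)) ** 2 * (hazard_ratio + 1) ** 2
n = numerator / ((2 - p1 - p2) * (hazard_ratio - 1) ** 2)

print(f"hazard ratio = {hazard_ratio:.3f}")
print(f"N from the formula = {n:.1f}")
```

The follow-up time t never enters the calculation. It only fixes what p1 and p2 mean, so the same N applies whether those probabilities refer to 1 year or 5 years, which is consistent with reading t as the length of follow-up.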

--- On Thu, 5/6/10, Joris Meys <jorismeys@gmail.com> wrote:


From: Joris Meys <jorismeys@gmail.com>
Subject: Re: [R] sample size for survival curves
To: "array chip" <arrayprofile@yahoo.com>
Date: Thursday, May 6, 2010, 8:12 PM


It sounds logical to get different sample sizes depending on how long you run
the experiment. Say you expect fixed yearly death rates of 5% and 10% in the
two groups. Take 20 patients in each group: after one year you have 19 and 18
survivors, respectively. After 5 years you have 15 and 10 survivors, which is a
bigger difference and can hence be detected more easily.

Cheers
Joris
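
The arithmetic in this example can be checked in a few lines. The counts quoted (19 and 18, then 15 and 10) correspond to losing a fixed fraction of the original 20 patients each year; the sketch also shows, as an assumption about what a constant yearly death rate would mean, the case where the rate applies only to those still alive. The function name is hypothetical.

```python
def survivors(n, yearly_death_rate, years, compound=False):
    """Expected survivors out of n after `years` years.

    compound=False: a fixed fraction of the original n dies each year.
    compound=True: the rate applies each year to those still alive.
    """
    if compound:
        return n * (1 - yearly_death_rate) ** years
    return n * (1 - yearly_death_rate * years)


for years in (1, 5):
    for compound in (False, True):
        group_a = survivors(20, 0.05, years, compound)
        group_b = survivors(20, 0.10, years, compound)
        label = "compounded" if compound else "simple"
        print(f"{years} y, {label}: {group_a:.1f} vs {group_b:.1f} "
              f"(difference {group_a - group_b:.1f})")
```

Either way the gap between the groups is wider at 5 years than at 1 year, which is the point of the example.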


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering 
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
Joris.Meys@Ugent.be 
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php




      

array chip

2010-May-07 05:35 UTC

Thanks Kevin. I thought the time t is at the end of follow-up (length of
follow-up)?

John

