Wassell, James T., Ph.D.
2006-Mar-08 15:01 UTC
[R] power and sample size for a GLM with Poisson response variable
Craig, Thanks for your follow-up note on using the asypow package. My problem was not only constructing the "constraints" vector but, for my particular situation (Poisson regression, two groups, sample sizes of (1081,3180), I get very different results using asypow package compared to my other (home grown) approaches. library(asypow) pois.mean<-c(0.0065,0.0003) info.pois <- info.poisson.kgroup(pois.mean, group.size=c(1081,3180)) constraints <- matrix(c(2,1,2), ncol = 3, byrow=T) poisson.object <- asypow.noncent(pois.mean, info.pois,constraints) power.pois <- asypow.power(poisson.object, c(1081,3180), 0.05) print(power.pois) [1] 0.2438309 asy.pwr() # the function is shown below. $power [1] 0.96 sim.pwr() # the function is shown below. 4261000 Poisson random variates simulated $power [1] 0.567 I tend to think the true power is greater than 0.567 but less than 0.96 (not as small as 0.2438309). Maybe I am still not using the asypow functions correctly? Suggestions/comments welcome. -----Original Message----- From: Craig A. Faulhaber [mailto:caf at gis.usu.edu] Sent: Saturday, March 04, 2006 12:04 PM To: Wassell, James T., Ph.D. Subject: Re: [R] power and sample size for a GLM with Poisson response variable Hi James, Thanks again for your help. With the assistance of a statistician colleauge of mine, I figured out the "constraints" vector. It is a 3-column vector describing the null hypothesis. For example, let's say you have 3 populations and your null hypothesis is that there is no difference between the 3. The first row of the matrix would be "3 1 2" indicating that you have 3 populations and that populations 1 and 2 are equal. The second row would be "3 2 3" indicating that you have 3 populations and that populations 2 and 3 are equal. If you had only 2 populations, there would be only one row ("2 1 2", indicating 2 populations with 1 and 2 equal). I hope this helps. Craig Wassell, James T., Ph.D. wrote:> Craig, I found the package asypow difficult to use and it did not > yield results in the ballpark of other approaches. (could not figure> out the "constraints" vector). > > I wrote some simple functions, one asy.pwr uses the non-central > chi-square distn. > > asy.pwr<-function(counts=c(7,1),py=c(1081,3180)) > { # a two group Poisson regression power computation # py is > person-years or person-time however measured > group<-gl(2,1) > ncp<-summary(glm(counts~group+offset(log(py)),family=poisson))$null.de > viance > > q.tile<-qchisq(.95,1) # actually just the X2 critical value of > 3.841459 list(power=round(1-pchisq(q.tile,df=1,ncp),2))} > > The second function, sim.pwr, estimates power using simulated Poisson> random variates. The most time consuming step is the call to rpois.> (Maybe someone knows a more efficient way to accomplish this?). The > "for" loop is rather quick in comparison. I hope you may find this > helpful, or if you have solved your problem some other way, please > pass along your approach. Note, that for this problem, very small > values of lambda, the two approaches give much different power > estimates (96% vs. 55% or so). My problem may be better addressed > as binomial logistic regression, maybe then the simulation and the > asymptotic estimates my agree better. > > sim.pwr<-function(means=c(0.0065,0.0003),ptime=c(1081,3180),nsim=1000) > { # a two group poisson regression power computation # based > simulating lots of Poisson r.v.'s # input rates followed by a vector > of the corresponding person times # the most time consuming part is > the r.v. generation. > # power is determined by counting the how often p-values are <= 0.05 > group<-as.factor(rep(c(1,2),ptime)) > rej<-vector(length=nsim) > y<-rpois(ptime[1]*nsim,means[1]) > y<-c(y,rpois(ptime[2]*nsim,means[2])) > y<-matrix(y,nrow=nsim) > cat(sum(ptime)*nsim,"Poisson random variates simulated","\n") for(i in> 1:nsim){res<-glm(y[i,]~group,family=poisson()) > rej[i]<-summary(res)$coeff[2,4]<=0.05} > list(power=sum(rej)/nsim) } > > > >
Ken Beath
2006-Mar-09 12:18 UTC
[R] power and sample size for a GLM with Poisson response variable
The problem with the simulation is with the second group where there is a high probability of obtaining all zeroes for the sample and this is causing problems with the Wald statistics you are using to check for a difference. Here is an example. Browse[1]> summary(res) Call: glm(formula = y[i, ] ~ group, family = poisson()) Deviance Residuals: Min 1Q Median 3Q Max -1.290e-01 -1.290e-01 -1.231e-05 -1.231e-05 2.756e+00 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.7884 0.3333 -14.365 <2e-16 *** group2 -18.5142 1235.1829 -0.015 0.988 --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 110.881 on 4260 degrees of freedom Residual deviance: 86.192 on 4259 degrees of freedom AIC: 108.19 Number of Fisher Scoring iterations: 21 On Wed, 8 Mar 2006 Wassell, James T., Ph.D. wrote:> Subject: Re: [R] power and sample size for a GLM with Poisson response > variable > To: <r-help at stat.math.ethz.ch> > Cc: "Craig A. Faulhaber" <caf at gis.usu.edu> > Message-ID: > <AF2DCD619279544BA454141F4A45B9E306A3E8 at m-niosh-3.niosh.cdc.gov> > Content-Type: text/plain; charset="US-ASCII" > > Craig, Thanks for your follow-up note on using the asypow package. My > problem was not only constructing the "constraints" vector but, for my > particular situation (Poisson regression, two groups, sample sizes of > (1081,3180), I get very different results using asypow package > compared > to my other (home grown) approaches. > > library(asypow) > pois.mean<-c(0.0065,0.0003) > info.pois <- info.poisson.kgroup(pois.mean, group.size=c(1081,3180)) > constraints <- matrix(c(2,1,2), ncol = 3, byrow=T) > poisson.object <- asypow.noncent(pois.mean, info.pois,constraints) > power.pois <- asypow.power(poisson.object, c(1081,3180), 0.05) > print(power.pois) > > [1] 0.2438309 > > asy.pwr() # the function is shown below. > > $power > [1] 0.96 > > sim.pwr() # the function is shown below. > 4261000 Poisson random variates simulated > $power > [1] 0.567 > > I tend to think the true power is greater than 0.567 but less than > 0.96 > (not as small as 0.2438309). > > Maybe I am still not using the asypow functions correctly? > Suggestions/comments welcome. > > > -----Original Message----- > From: Craig A. Faulhaber [mailto:caf at gis.usu.edu] > Sent: Saturday, March 04, 2006 12:04 PM > To: Wassell, James T., Ph.D. > Subject: Re: [R] power and sample size for a GLM with Poisson response > variable > > Hi James, > > Thanks again for your help. With the assistance of a statistician > colleauge of mine, I figured out the "constraints" vector. It is a > 3-column vector describing the null hypothesis. For example, let's > say > you have 3 populations and your null hypothesis is that there is no > difference between the 3. The first row of the matrix would be "3 > 1 2" > indicating that you have 3 populations and that populations 1 and 2 > are > equal. The second row would be "3 2 3" indicating that you have 3 > populations and that populations 2 and 3 are equal. If you had only 2 > populations, there would be only one row ("2 1 2", indicating 2 > populations with 1 and 2 equal). I hope this helps. > > Craig > > Wassell, James T., Ph.D. wrote: > >> Craig, I found the package asypow difficult to use and it did not >> yield results in the ballpark of other approaches. (could not >> figure > >> out the "constraints" vector). >> >> I wrote some simple functions, one asy.pwr uses the non-central >> chi-square distn. >> >> asy.pwr<-function(counts=c(7,1),py=c(1081,3180)) >> { # a two group Poisson regression power computation # py is >> person-years or person-time however measured >> group<-gl(2,1) >> ncp<-summary(glm(counts~group+offset(log(py)),family=poisson)) >> $null.de >> viance >> >> q.tile<-qchisq(.95,1) # actually just the X2 critical value of >> 3.841459 list(power=round(1-pchisq(q.tile,df=1,ncp),2))} >> >> The second function, sim.pwr, estimates power using simulated >> Poisson > >> random variates. The most time consuming step is the call to rpois. > >> (Maybe someone knows a more efficient way to accomplish this?). The >> "for" loop is rather quick in comparison. I hope you may find this >> helpful, or if you have solved your problem some other way, please >> pass along your approach. Note, that for this problem, very small >> values of lambda, the two approaches give much different power >> estimates (96% vs. 55% or so). My problem may be better addressed >> as binomial logistic regression, maybe then the simulation and the >> asymptotic estimates my agree better. >> >> sim.pwr<-function(means=c(0.0065,0.0003),ptime=c >> (1081,3180),nsim=1000) >> { # a two group poisson regression power computation # based >> simulating lots of Poisson r.v.'s # input rates followed by a vector >> of the corresponding person times # the most time consuming part is >> the r.v. generation. >> # power is determined by counting the how often p-values are <= 0.05 >> group<-as.factor(rep(c(1,2),ptime)) >> rej<-vector(length=nsim) >> y<-rpois(ptime[1]*nsim,means[1]) >> y<-c(y,rpois(ptime[2]*nsim,means[2])) >> y<-matrix(y,nrow=nsim) >> cat(sum(ptime)*nsim,"Poisson random variates simulated","\n") for >> (i in > >> 1:nsim){res<-glm(y[i,]~group,family=poisson()) >> rej[i]<-summary(res)$coeff[2,4]<=0.05} >> list(power=sum(rej)/nsim) } >> >> >> >> >