thr3ads.net - R help - [R] Understanding power analysis in glm and binomial proportion test [May 2014]

If this information is useful, please help other people find it:
Share via:

Wall, Wade A ERDC-RDE-CERL-IL

2014-May-01 14:00 UTC

[R] Understanding power analysis in glm and binomial proportion test

Hi all,

I am trying to run a power analysis using simulated data to compare the power of
a glm versus a binomial proportion test to detect differences in proportions.
For example, suppose you have some proportion that decreases by some amount over
X number of time steps.
.4,.39,.38,.37 . . . . .01 and simulate data over those time steps based on the
decrease. Does a glm approach (differences in slope) "outperform" an
approach whereby you simply look at proportional differences. I ran simulations
and basically summed up the number of times p<.05 divided by the number of
trials.  I was expecting that a glm approach would have more power because it
would be utilizing all the data across multiple time steps, whereas a binomial
proportion test is only comparing two populations ( the beginning and the end
points).

However the results indicated that a binomial proportion test had more power
relative to a glm at fewer time steps and that, depending on the simulated
decrease in proportions, the relationship between glm power and binomial
proportion test power changed. Interestingly, a greater decreases, a binomial
proportion test seems have greater power compared to a glm, which to me is
counterintuitive since the slope should be greater.

I am attaching the code below my questions in case anyone is interested.

My questions are:


(1)    Am I interpreting the results and p-values correctly?

(2)    If I am interested in trends, does glm results really have lower power
and, if so, is there a way to combine the two tests?

I know that I am ignoring a lot of issues such as autocorrelation, but I am
really just trying to understand the output.

Any suggestions or insights would be appreciated.

##### Attempt at using glm model ######
ltProb <- 0.4                       ## longterm average of nest survival
probability
change <- c(.01,.03,.05)
yrs <- seq(1,20, by=1)                                                    ##
Years of inquiry, probability of detecting a yearly change nest survival
samplesize <- 50                                               ## Reasonable
sample size range
reps <- 1000                                                                 
## of simulations per
SurvProb <- ltProb                                                          
## initiating survival probablity for later use, nuisance
power <- matrix(nrow=length(change)*length(yrs)*length(samplesize), ncol=5)
## creating a matrix to hold data

scenario <- 0                                                                
## initializing scenarios which will be a placeholder later on
for (a in 1:length(change)){                                                    
## loop through yearly pop change scenarios
  for (c in 1:length(samplesize)){                                              
## loop through sample size scenarios
    for (b in 1:length(yrs)){                                                   
## loop through years sceanarios
      scenario=scenario+1
      power[scenario,1] <- change[a]                                         
## filling matrix with pop change used
      power[scenario,2]<-ltProb*((1-change[a])**yrs[b])
      power[scenario,3] <- yrs[b]                                            
## filling matrix with number of years used
      power[scenario,4] <- samplesize[c]                                     
## filling matrix with sample size used
    }
  }
}
colnames(power) <- c("PopChange",
"Proportion2","yrs", "Sample_Size",
"Power")
power=as.data.frame(power)
power$Sample_Size=as.numeric(power$Sample_Size)

scenario=levels(as.factor(power$PopChange)) ## To subset power
ps = levels(as.factor(power$Sample_Size))
results <- matrix(nrow=0, ncol=5) ## final matrix
results=as.data.frame(results)
reps=1000 ## set number of reps

for (k in 1:length(scenario)){
  for (m in 1:length(ps)){
    sub=power[(power$PopChange==scenario[k]) & power$Sample_Size==ps[m],] ##
Subset by scenario and sample size
    probs = matrix(nrow=1000,ncol=16) ## this is matrix for probabilities.
    dat=matrix(nrow=0,ncol=3,NA)
    dat=as.data.frame(dat)
    for (l in 1:reps){ ##This should be for the replicates
      dat=matrix(nrow=0,ncol=3,NA)
      dat=as.data.frame(dat)
      print(l)
      for (j in 1:nrow(sub)){
        tmp=rbinom(n=sub$Sample_Size[j],size=1,prob=sub[j,2])
       
tmp.dat=cbind(tmp,rep(sub$yrs[j],sub$Sample_Size[j]),rep(sub$Proportion2[j],sub$Sample_Size[j]))
        dat=rbind(dat,tmp.dat)
        if (sub$yrs[j]>4){
          x=(glm(dat[,1]~dat[,2],family=binomial))
          probs[l,j-4]=ifelse(summary(x)$coefficients[2,4]<.05,1,0)
        }
      }
    }
    sub$Power[5:20]=apply(probs,2,sum)/nrow(probs)
    results=rbind(results,sub)
  }
}

tmp=results[!is.na(results$Power),] ### remove NAs
tmp$SampleSize_Decline=paste(tmp$Sample_Size,tmp$PopChange,sep=":")
tmp$SampleSize_Decline=as.factor(tmp$SampleSize_Decline)

### Now do the same using prop.test #####
tmp$PropPower = NA
for (i in 1:nrow(tmp)){
  print(i)
  x=0
  for (j in 1:1000){
    a = rbinom(tmp$Sample_Size[i],1,prob = (tmp$Proportion2[i]))
   
b=binom.test(sum(a),length(a),p=.4,conf.level=.95,alternative="less")
    if (b$p.value < .05) {x=x+1}
  }
  tmp$PropPower[i] = x/1000
}

##### graph results
d=unique(tmp$PopChange)
for (i in 1:length(d)){
  par(mfcol = c(1,2))
  sub = tmp[tmp$PopChange == d[i],]
  plot(sub$yrs,sub$Power,col="red", main = d[i], xlab = "Number
of years",ylab="Power")
  points(sub$yrs,sub$PropPower,col="black")
  plot(sub$PropPower,sub$Power,xlab="Binomial Proportion Power",
ylab="GLM Power",main=d[i])
  abline(0,1)
  readline("Press <return to continue")
}


##### End Code #####



Wade A. Wall
US Army ERDC-CERL
P.O. Box 9005
Champaign, IL  61826-9005
1-217-373-4420
Wade.A.Wall@usace.army.mil<mailto:Wade.A.Wall@usace.army.mil>




	[[alternative HTML version deleted]]

Bert Gunter

2014-May-01 18:42 UTC

head link

[R] Understanding power analysis in glm and binomial proportion test

While R is certainly used for statistical simulations as you showed,
this list is really for questions about R programming, not statistics.
While they certainly overlap and someone may respond here, I suggest
you post this to stats.stackexchange.com or other statistics site that
is specifically for such issues.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Thu, May 1, 2014 at 7:00 AM, Wall, Wade A ERDC-RDE-CERL-IL
<Wade.A.Wall at usace.army.mil> wrote:> Hi all,
>
> I am trying to run a power analysis using simulated data to compare the
power of a glm versus a binomial proportion test to detect differences in
proportions. For example, suppose you have some proportion that decreases by
some amount over X number of time steps.
> .4,.39,.38,.37 . . . . .01 and simulate data over those time steps based on
the decrease. Does a glm approach (differences in slope) "outperform"
an approach whereby you simply look at proportional differences. I ran
simulations and basically summed up the number of times p<.05 divided by the
number of trials.  I was expecting that a glm approach would have more power
because it would be utilizing all the data across multiple time steps, whereas a
binomial proportion test is only comparing two populations ( the beginning and
the end points).
>
> However the results indicated that a binomial proportion test had more
power relative to a glm at fewer time steps and that, depending on the simulated
decrease in proportions, the relationship between glm power and binomial
proportion test power changed. Interestingly, a greater decreases, a binomial
proportion test seems have greater power compared to a glm, which to me is
counterintuitive since the slope should be greater.
>
> I am attaching the code below my questions in case anyone is interested.
>
> My questions are:
>
>
> (1)    Am I interpreting the results and p-values correctly?
>
> (2)    If I am interested in trends, does glm results really have lower
power and, if so, is there a way to combine the two tests?
>
> I know that I am ignoring a lot of issues such as autocorrelation, but I am
really just trying to understand the output.
>
> Any suggestions or insights would be appreciated.
>
> ##### Attempt at using glm model ######
> ltProb <- 0.4                       ## longterm average of nest survival
probability
> change <- c(.01,.03,.05)
> yrs <- seq(1,20, by=1)                                                  
## Years of inquiry, probability of detecting a yearly change nest survival
> samplesize <- 50                                               ##
Reasonable sample size range
> reps <- 1000                                                            
## of simulations per
> SurvProb <- ltProb                                                      
## initiating survival probablity for later use, nuisance
> power <- matrix(nrow=length(change)*length(yrs)*length(samplesize),
ncol=5) ## creating a matrix to hold data
>
> scenario <- 0                                                           
## initializing scenarios which will be a placeholder later on
> for (a in 1:length(change)){                                               
## loop through yearly pop change scenarios
>   for (c in 1:length(samplesize)){                                         
## loop through sample size scenarios
>     for (b in 1:length(yrs)){                                              
## loop through years sceanarios
>       scenario=scenario+1
>       power[scenario,1] <- change[a]                                    
## filling matrix with pop change used
>       power[scenario,2]<-ltProb*((1-change[a])**yrs[b])
>       power[scenario,3] <- yrs[b]                                       
## filling matrix with number of years used
>       power[scenario,4] <- samplesize[c]                                
## filling matrix with sample size used
>     }
>   }
> }
> colnames(power) <- c("PopChange",
"Proportion2","yrs", "Sample_Size",
"Power")
> power=as.data.frame(power)
> power$Sample_Size=as.numeric(power$Sample_Size)
>
> scenario=levels(as.factor(power$PopChange)) ## To subset power
> ps = levels(as.factor(power$Sample_Size))
> results <- matrix(nrow=0, ncol=5) ## final matrix
> results=as.data.frame(results)
> reps=1000 ## set number of reps
>
> for (k in 1:length(scenario)){
>   for (m in 1:length(ps)){
>     sub=power[(power$PopChange==scenario[k]) &
power$Sample_Size==ps[m],] ## Subset by scenario and sample size
>     probs = matrix(nrow=1000,ncol=16) ## this is matrix for probabilities.
>     dat=matrix(nrow=0,ncol=3,NA)
>     dat=as.data.frame(dat)
>     for (l in 1:reps){ ##This should be for the replicates
>       dat=matrix(nrow=0,ncol=3,NA)
>       dat=as.data.frame(dat)
>       print(l)
>       for (j in 1:nrow(sub)){
>         tmp=rbinom(n=sub$Sample_Size[j],size=1,prob=sub[j,2])
>        
tmp.dat=cbind(tmp,rep(sub$yrs[j],sub$Sample_Size[j]),rep(sub$Proportion2[j],sub$Sample_Size[j]))
>         dat=rbind(dat,tmp.dat)
>         if (sub$yrs[j]>4){
>           x=(glm(dat[,1]~dat[,2],family=binomial))
>           probs[l,j-4]=ifelse(summary(x)$coefficients[2,4]<.05,1,0)
>         }
>       }
>     }
>     sub$Power[5:20]=apply(probs,2,sum)/nrow(probs)
>     results=rbind(results,sub)
>   }
> }
>
> tmp=results[!is.na(results$Power),] ### remove NAs
>
tmp$SampleSize_Decline=paste(tmp$Sample_Size,tmp$PopChange,sep=":")
> tmp$SampleSize_Decline=as.factor(tmp$SampleSize_Decline)
>
> ### Now do the same using prop.test #####
> tmp$PropPower = NA
> for (i in 1:nrow(tmp)){
>   print(i)
>   x=0
>   for (j in 1:1000){
>     a = rbinom(tmp$Sample_Size[i],1,prob = (tmp$Proportion2[i]))
>    
b=binom.test(sum(a),length(a),p=.4,conf.level=.95,alternative="less")
>     if (b$p.value < .05) {x=x+1}
>   }
>   tmp$PropPower[i] = x/1000
> }
>
> ##### graph results
> d=unique(tmp$PopChange)
> for (i in 1:length(d)){
>   par(mfcol = c(1,2))
>   sub = tmp[tmp$PopChange == d[i],]
>   plot(sub$yrs,sub$Power,col="red", main = d[i], xlab =
"Number of years",ylab="Power")
>   points(sub$yrs,sub$PropPower,col="black")
>   plot(sub$PropPower,sub$Power,xlab="Binomial Proportion Power",
ylab="GLM Power",main=d[i])
>   abline(0,1)
>   readline("Press <return to continue")
> }
>
>
> ##### End Code #####
>
>
>
> Wade A. Wall
> US Army ERDC-CERL
> P.O. Box 9005
> Champaign, IL  61826-9005
> 1-217-373-4420
> Wade.A.Wall at usace.army.mil<mailto:Wade.A.Wall at usace.army.mil>
>
>
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

R help - May 2014 - Understanding power analysis in glm and binomial proportion test

[R] Understanding power analysis in glm and binomial proportion test

[R] Understanding power analysis in glm and binomial proportion test