Wall, Wade A ERDC-RDE-CERL-IL
2014-May-01 14:00 UTC
[R] Understanding power analysis in glm and binomial proportion test
Hi all,
I am trying to run a power analysis using simulated data to compare the power of
a glm versus a binomial proportion test to detect differences in proportions.
For example, suppose you have some proportion that decreases by some amount over
X number of time steps.
.4,.39,.38,.37 . . . . .01 and simulate data over those time steps based on the
decrease. Does a glm approach (differences in slope) "outperform" an
approach whereby you simply look at proportional differences. I ran simulations
and basically summed up the number of times p<.05 divided by the number of
trials. I was expecting that a glm approach would have more power because it
would be utilizing all the data across multiple time steps, whereas a binomial
proportion test is only comparing two populations ( the beginning and the end
points).
However the results indicated that a binomial proportion test had more power
relative to a glm at fewer time steps and that, depending on the simulated
decrease in proportions, the relationship between glm power and binomial
proportion test power changed. Interestingly, a greater decreases, a binomial
proportion test seems have greater power compared to a glm, which to me is
counterintuitive since the slope should be greater.
I am attaching the code below my questions in case anyone is interested.
My questions are:
(1) Am I interpreting the results and p-values correctly?
(2) If I am interested in trends, does glm results really have lower power
and, if so, is there a way to combine the two tests?
I know that I am ignoring a lot of issues such as autocorrelation, but I am
really just trying to understand the output.
Any suggestions or insights would be appreciated.
##### Attempt at using glm model ######
ltProb <- 0.4 ## longterm average of nest survival
probability
change <- c(.01,.03,.05)
yrs <- seq(1,20, by=1) ##
Years of inquiry, probability of detecting a yearly change nest survival
samplesize <- 50 ## Reasonable
sample size range
reps <- 1000
## of simulations per
SurvProb <- ltProb
## initiating survival probablity for later use, nuisance
power <- matrix(nrow=length(change)*length(yrs)*length(samplesize), ncol=5)
## creating a matrix to hold data
scenario <- 0
## initializing scenarios which will be a placeholder later on
for (a in 1:length(change)){
## loop through yearly pop change scenarios
for (c in 1:length(samplesize)){
## loop through sample size scenarios
for (b in 1:length(yrs)){
## loop through years sceanarios
scenario=scenario+1
power[scenario,1] <- change[a]
## filling matrix with pop change used
power[scenario,2]<-ltProb*((1-change[a])**yrs[b])
power[scenario,3] <- yrs[b]
## filling matrix with number of years used
power[scenario,4] <- samplesize[c]
## filling matrix with sample size used
}
}
}
colnames(power) <- c("PopChange",
"Proportion2","yrs", "Sample_Size",
"Power")
power=as.data.frame(power)
power$Sample_Size=as.numeric(power$Sample_Size)
scenario=levels(as.factor(power$PopChange)) ## To subset power
ps = levels(as.factor(power$Sample_Size))
results <- matrix(nrow=0, ncol=5) ## final matrix
results=as.data.frame(results)
reps=1000 ## set number of reps
for (k in 1:length(scenario)){
for (m in 1:length(ps)){
sub=power[(power$PopChange==scenario[k]) & power$Sample_Size==ps[m],] ##
Subset by scenario and sample size
probs = matrix(nrow=1000,ncol=16) ## this is matrix for probabilities.
dat=matrix(nrow=0,ncol=3,NA)
dat=as.data.frame(dat)
for (l in 1:reps){ ##This should be for the replicates
dat=matrix(nrow=0,ncol=3,NA)
dat=as.data.frame(dat)
print(l)
for (j in 1:nrow(sub)){
tmp=rbinom(n=sub$Sample_Size[j],size=1,prob=sub[j,2])
tmp.dat=cbind(tmp,rep(sub$yrs[j],sub$Sample_Size[j]),rep(sub$Proportion2[j],sub$Sample_Size[j]))
dat=rbind(dat,tmp.dat)
if (sub$yrs[j]>4){
x=(glm(dat[,1]~dat[,2],family=binomial))
probs[l,j-4]=ifelse(summary(x)$coefficients[2,4]<.05,1,0)
}
}
}
sub$Power[5:20]=apply(probs,2,sum)/nrow(probs)
results=rbind(results,sub)
}
}
tmp=results[!is.na(results$Power),] ### remove NAs
tmp$SampleSize_Decline=paste(tmp$Sample_Size,tmp$PopChange,sep=":")
tmp$SampleSize_Decline=as.factor(tmp$SampleSize_Decline)
### Now do the same using prop.test #####
tmp$PropPower = NA
for (i in 1:nrow(tmp)){
print(i)
x=0
for (j in 1:1000){
a = rbinom(tmp$Sample_Size[i],1,prob = (tmp$Proportion2[i]))
b=binom.test(sum(a),length(a),p=.4,conf.level=.95,alternative="less")
if (b$p.value < .05) {x=x+1}
}
tmp$PropPower[i] = x/1000
}
##### graph results
d=unique(tmp$PopChange)
for (i in 1:length(d)){
par(mfcol = c(1,2))
sub = tmp[tmp$PopChange == d[i],]
plot(sub$yrs,sub$Power,col="red", main = d[i], xlab = "Number
of years",ylab="Power")
points(sub$yrs,sub$PropPower,col="black")
plot(sub$PropPower,sub$Power,xlab="Binomial Proportion Power",
ylab="GLM Power",main=d[i])
abline(0,1)
readline("Press <return to continue")
}
##### End Code #####
Wade A. Wall
US Army ERDC-CERL
P.O. Box 9005
Champaign, IL 61826-9005
1-217-373-4420
Wade.A.Wall@usace.army.mil<mailto:Wade.A.Wall@usace.army.mil>
[[alternative HTML version deleted]]
Bert Gunter
2014-May-01 18:42 UTC
[R] Understanding power analysis in glm and binomial proportion test
While R is certainly used for statistical simulations as you showed, this list is really for questions about R programming, not statistics. While they certainly overlap and someone may respond here, I suggest you post this to stats.stackexchange.com or other statistics site that is specifically for such issues. Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." H. Gilbert Welch On Thu, May 1, 2014 at 7:00 AM, Wall, Wade A ERDC-RDE-CERL-IL <Wade.A.Wall at usace.army.mil> wrote:> Hi all, > > I am trying to run a power analysis using simulated data to compare the power of a glm versus a binomial proportion test to detect differences in proportions. For example, suppose you have some proportion that decreases by some amount over X number of time steps. > .4,.39,.38,.37 . . . . .01 and simulate data over those time steps based on the decrease. Does a glm approach (differences in slope) "outperform" an approach whereby you simply look at proportional differences. I ran simulations and basically summed up the number of times p<.05 divided by the number of trials. I was expecting that a glm approach would have more power because it would be utilizing all the data across multiple time steps, whereas a binomial proportion test is only comparing two populations ( the beginning and the end points). > > However the results indicated that a binomial proportion test had more power relative to a glm at fewer time steps and that, depending on the simulated decrease in proportions, the relationship between glm power and binomial proportion test power changed. Interestingly, a greater decreases, a binomial proportion test seems have greater power compared to a glm, which to me is counterintuitive since the slope should be greater. > > I am attaching the code below my questions in case anyone is interested. > > My questions are: > > > (1) Am I interpreting the results and p-values correctly? > > (2) If I am interested in trends, does glm results really have lower power and, if so, is there a way to combine the two tests? > > I know that I am ignoring a lot of issues such as autocorrelation, but I am really just trying to understand the output. > > Any suggestions or insights would be appreciated. > > ##### Attempt at using glm model ###### > ltProb <- 0.4 ## longterm average of nest survival probability > change <- c(.01,.03,.05) > yrs <- seq(1,20, by=1) ## Years of inquiry, probability of detecting a yearly change nest survival > samplesize <- 50 ## Reasonable sample size range > reps <- 1000 ## of simulations per > SurvProb <- ltProb ## initiating survival probablity for later use, nuisance > power <- matrix(nrow=length(change)*length(yrs)*length(samplesize), ncol=5) ## creating a matrix to hold data > > scenario <- 0 ## initializing scenarios which will be a placeholder later on > for (a in 1:length(change)){ ## loop through yearly pop change scenarios > for (c in 1:length(samplesize)){ ## loop through sample size scenarios > for (b in 1:length(yrs)){ ## loop through years sceanarios > scenario=scenario+1 > power[scenario,1] <- change[a] ## filling matrix with pop change used > power[scenario,2]<-ltProb*((1-change[a])**yrs[b]) > power[scenario,3] <- yrs[b] ## filling matrix with number of years used > power[scenario,4] <- samplesize[c] ## filling matrix with sample size used > } > } > } > colnames(power) <- c("PopChange", "Proportion2","yrs", "Sample_Size", "Power") > power=as.data.frame(power) > power$Sample_Size=as.numeric(power$Sample_Size) > > scenario=levels(as.factor(power$PopChange)) ## To subset power > ps = levels(as.factor(power$Sample_Size)) > results <- matrix(nrow=0, ncol=5) ## final matrix > results=as.data.frame(results) > reps=1000 ## set number of reps > > for (k in 1:length(scenario)){ > for (m in 1:length(ps)){ > sub=power[(power$PopChange==scenario[k]) & power$Sample_Size==ps[m],] ## Subset by scenario and sample size > probs = matrix(nrow=1000,ncol=16) ## this is matrix for probabilities. > dat=matrix(nrow=0,ncol=3,NA) > dat=as.data.frame(dat) > for (l in 1:reps){ ##This should be for the replicates > dat=matrix(nrow=0,ncol=3,NA) > dat=as.data.frame(dat) > print(l) > for (j in 1:nrow(sub)){ > tmp=rbinom(n=sub$Sample_Size[j],size=1,prob=sub[j,2]) > tmp.dat=cbind(tmp,rep(sub$yrs[j],sub$Sample_Size[j]),rep(sub$Proportion2[j],sub$Sample_Size[j])) > dat=rbind(dat,tmp.dat) > if (sub$yrs[j]>4){ > x=(glm(dat[,1]~dat[,2],family=binomial)) > probs[l,j-4]=ifelse(summary(x)$coefficients[2,4]<.05,1,0) > } > } > } > sub$Power[5:20]=apply(probs,2,sum)/nrow(probs) > results=rbind(results,sub) > } > } > > tmp=results[!is.na(results$Power),] ### remove NAs > tmp$SampleSize_Decline=paste(tmp$Sample_Size,tmp$PopChange,sep=":") > tmp$SampleSize_Decline=as.factor(tmp$SampleSize_Decline) > > ### Now do the same using prop.test ##### > tmp$PropPower = NA > for (i in 1:nrow(tmp)){ > print(i) > x=0 > for (j in 1:1000){ > a = rbinom(tmp$Sample_Size[i],1,prob = (tmp$Proportion2[i])) > b=binom.test(sum(a),length(a),p=.4,conf.level=.95,alternative="less") > if (b$p.value < .05) {x=x+1} > } > tmp$PropPower[i] = x/1000 > } > > ##### graph results > d=unique(tmp$PopChange) > for (i in 1:length(d)){ > par(mfcol = c(1,2)) > sub = tmp[tmp$PopChange == d[i],] > plot(sub$yrs,sub$Power,col="red", main = d[i], xlab = "Number of years",ylab="Power") > points(sub$yrs,sub$PropPower,col="black") > plot(sub$PropPower,sub$Power,xlab="Binomial Proportion Power", ylab="GLM Power",main=d[i]) > abline(0,1) > readline("Press <return to continue") > } > > > ##### End Code ##### > > > > Wade A. Wall > US Army ERDC-CERL > P.O. Box 9005 > Champaign, IL 61826-9005 > 1-217-373-4420 > Wade.A.Wall at usace.army.mil<mailto:Wade.A.Wall at usace.army.mil> > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.