thr3ads.net - R help - [R] Simulate p-value in lme4 [Aug 2006]

If this information is useful, please help other people find it:
Share via:

Manuel Morales

2006-Aug-17 20:49 UTC

[R] Simulate p-value in lme4

Dear list,

This is more of a stats question than an R question per se. First, I
realize there has been a lot of discussion about the problems with
estimating P-values from F-ratios for mixed-effects models in lme4.
Using mcmcsamp() seems like a great alternative for evaluating the
significance of individual coefficients, but not for groups of
coefficients as might occur in an experimental design with 3 treatment
levels. I'm wondering if the simulation approach I use below to estimate
the P-value for a 3-level factor is appropriate, or if there are any
suggestions on how else to approach this problem. The model and data in
the example are from section 10.4 of MASS.

Thanks!
Manuel

# Load req. package (see functions to generate data at end of script)
library(lme4)
library(MASS)

# Full and reduced models - pred is a factor with 3 levels
result.full <- lmer(y~pred+(1|subject), data=epil3,
family="poisson")
result.base <- lmer(y~1+(1|subject), data=epil3, family="poisson")

# Naive P-value from LR for significance of "pred" factor
anova(result.base,result.full)$"Pr(>Chisq)"[[2]] # P-value
(test.stat <- anova(result.base,result.full)$Chisq[[2]]) # Chisq-stat

# P-value from simulation. Note that in the simulation, I use the
# estimated random effects for each subject rather than generating a new
# distribution of means. I'm not sure if this is appropriate or not ...
intercept <- fixef(result.base)
rand.effs <- ranef(result.base)[[1]]
mu <- exp(rep(intercept+rand.effs[[1]],2))

p.value <- function(iter, stat) {
  chi.stat <- vector()
  for(i in 1:iter) {
    resp <- rpois(length(mu), mu) # simulate values
    sim.data <- data.frame(y=resp,subject=epil3$subject,pred=epil3$pred)
    result.f <- lmer(y~pred+(1|subject), data=sim.data,
                     family="poisson")
    result.b <- lmer(y~1+(1|subject), data=sim.data,
family="poisson")
    chi.stat[i] <- anova(result.b,result.f)$Chisq[[2]]
  }
  val <- sum(unlist(lapply(chi.stat, function(x) if(x>stat) 1 else
             0)))/iter
  hist(chi.stat)
  return(val)
}

p.value(10,test.stat) # Increase to >=1000 to get a reasonable P-value!

# Script to generate data, from section 10.4 of MASS
epil2 <- epil[epil$period == 1, ]
epil2["period"] <- rep(0, 59); epil2["y"] <-
epil2["base"]
epil["time"] <- 1; epil2["time"] <- 4
epil2 <- rbind(epil, epil2)
epil2$pred <- unclass(epil2$trt) * (epil2$period > 0)
epil2$subject <- factor(epil2$subject)
epil3 <- aggregate(epil2, list(epil2$subject, epil2$period > 0),
                   function(x) if(is.numeric(x)) sum(x) else x[1])
epil3$pred <- factor(epil3$pred, labels = c("base",
"placebo", "drug"))

Spencer Graves

2006-Aug-20 18:22 UTC

head link

[R] Simulate p-value in lme4

You've raised a very interesting question about testing a 
fixed-effect factor with more than 2 levels using Monte Carlo.  Like 
you, I don't know how to use 'mcmcsamp' to refine the naive 
approximation. If we are lucky, someone else might comment on this for us. 

      Beyond this, you are to be commended for providing such a simple, 
self-contained example for such a sophisticated question.  I think you 
simulation misses one important point:  It assumes the between-subject 
variance is zero.  To overcome this, I think I might try either the 
bootstrap or permutation distribution scrambling the assignment of 
subjects to treatment groups but otherwise keeping the pairs of 
observations together. 

      To this end, consider the following: 

# Build a table to translate subject into 'pred'
o <- with(epil3, order(subject, y))
epil3. <- epil3[o,]
norep <- with(epil3., subject[-1]!=subject[-dim(epil3)[1]])
subj1 <- which(c(T, norep))
subj.pred <- epil3.[subj1, c("subject", "pred")]
subj. <- as.character(subj.pred$subject)
pred. <- subj.pred$pred
names(pred.) <- subj.

iter <- 10
chisq.sim <- rep(NA, iter)

set.seed(1)
for(i in 1:iter){
  s.i <- sample(subj.)
# Randomize subject assignments to 'pred' groups
  epil3.$pred <- pred.[s.i][epil3.$subject]
  fit1 <- lmer(y ~ pred+(1 | subject),
                family = poisson, data = epil3.)
  fit0 <- lmer(y ~ 1+(1 | subject),
                family = poisson, data = epil3.)
  chisq.sim[i] <- anova(fit0, fit1)[2, "Chisq"]
}

      Hope this helps. 
      Spencer Graves

Manuel Morales wrote:> Dear list,
>
> This is more of a stats question than an R question per se. First, I
> realize there has been a lot of discussion about the problems with
> estimating P-values from F-ratios for mixed-effects models in lme4.
> Using mcmcsamp() seems like a great alternative for evaluating the
> significance of individual coefficients, but not for groups of
> coefficients as might occur in an experimental design with 3 treatment
> levels. I'm wondering if the simulation approach I use below to
estimate
> the P-value for a 3-level factor is appropriate, or if there are any
> suggestions on how else to approach this problem. The model and data in
> the example are from section 10.4 of MASS.
>
> Thanks!
> Manuel
>
> # Load req. package (see functions to generate data at end of script)
> library(lme4)
> library(MASS)
>
> # Full and reduced models - pred is a factor with 3 levels
> result.full <- lmer(y~pred+(1|subject), data=epil3,
family="poisson")
> result.base <- lmer(y~1+(1|subject), data=epil3,
family="poisson")
>
> # Naive P-value from LR for significance of "pred" factor
> anova(result.base,result.full)$"Pr(>Chisq)"[[2]] # P-value
> (test.stat <- anova(result.base,result.full)$Chisq[[2]]) # Chisq-stat
>
> # P-value from simulation. Note that in the simulation, I use the
> # estimated random effects for each subject rather than generating a new
> # distribution of means. I'm not sure if this is appropriate or not ...
> intercept <- fixef(result.base)
> rand.effs <- ranef(result.base)[[1]]
> mu <- exp(rep(intercept+rand.effs[[1]],2))
>
> p.value <- function(iter, stat) {
>   chi.stat <- vector()
>   for(i in 1:iter) {
>     resp <- rpois(length(mu), mu) # simulate values
>     sim.data <- data.frame(y=resp,subject=epil3$subject,pred=epil3$pred)
>     result.f <- lmer(y~pred+(1|subject), data=sim.data,
>                      family="poisson")
>     result.b <- lmer(y~1+(1|subject), data=sim.data,
family="poisson")
>     chi.stat[i] <- anova(result.b,result.f)$Chisq[[2]]
>   }
>   val <- sum(unlist(lapply(chi.stat, function(x) if(x>stat) 1 else
>              0)))/iter
>   hist(chi.stat)
>   return(val)
> }
>
> p.value(10,test.stat) # Increase to >=1000 to get a reasonable P-value!
>
> # Script to generate data, from section 10.4 of MASS
> epil2 <- epil[epil$period == 1, ]
> epil2["period"] <- rep(0, 59); epil2["y"] <-
epil2["base"]
> epil["time"] <- 1; epil2["time"] <- 4
> epil2 <- rbind(epil, epil2)
> epil2$pred <- unclass(epil2$trt) * (epil2$period > 0)
> epil2$subject <- factor(epil2$subject)
> epil3 <- aggregate(epil2, list(epil2$subject, epil2$period > 0),
>                    function(x) if(is.numeric(x)) sum(x) else x[1])
> epil3$pred <- factor(epil3$pred, labels = c("base",
"placebo", "drug"))
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Seemingly Similar Threads

Search for more reasonably related threads

R help - Aug 2006 - Simulate p-value in lme4

[R] Simulate p-value in lme4

[R] Simulate p-value in lme4

Seemingly Similar Threads