Douglas Bates
2006-Apr-16 17:48 UTC
[R] [S] Problems with lme and 2 levels of nesting: Summary
I have taken the liberty of including the R-help mailing list on this reply, as that is the appropriate place to discuss lmer results.

On 4/5/06, Andreas Svensson <andreas.svensson at bio.ntnu.no> wrote:
> Hello again
>
> I have now received some helpful hints in this matter and will summarize them, but first let me reiterate the problem:
>
> I had two treatments: 2 types of food fed to females. Each female was present in only one treatment and laid one clutch (female ID = clutch ID). From each clutch, 50 larvae were taken and divided among 5 cups, 10 larvae in each cup (the cups were not meant to differ in any way; it's just impossible to count 50 larvae in motion, but OK with 10). The day of death was recorded for each larva. Since there's only one data point per larva, I see no need to include larva ID in the model. So:
>
> Response: DeathDay
> Fixed factor: Treatment
> Random factors: Clutch, Cup
> Design: Cup nested in Clutch
>
> I'm primarily interested in the effect of Treatment on DeathDay, but would also like to know about the variation within "Clutch" and "Cup". The unit of replication is "Clutch". I want to use a model where the variation within Clutch and within Cup is taken into account, while avoiding pseudoreplication of 5 cups (and 50 larvae) coming from the same clutch. The solution is therefore a mixed model with Cup nested in Clutch as random factors.
>
> A year ago, I was recommended this script from S-help at Insightful:
>
> model1 <- lme(DeathDay ~ Treatment, random = ~ Treatment | Clutch/Cup)
>
> I think this is wrong, since Treatment shouldn't be among the random factors. From the various tips I've received, the correct syntax using lme (in library nlme) should be:
>
> model2 <- lme(DeathDay ~ Treatment, random = ~ 1 | Clutch/Cup)
>
> and using lmer (in library lme4):
>
> model3 <- lmer(DeathDay ~ Treatment + (1|Clutch:Cup) + (1|Clutch), method = "REML")
>
> I get the result that there is NO effect of Treatment.
So far so good. If there is no significant effect for Treatment, then I would go on to fit a model without the Treatment effect and use that model to examine the significance of the random effects.

> But the above models will not give me any information about the importance of the random factors. If I do the same in SPSS:
>
> glm
>   DeathDay by Cup Clutch Treatment
>   /method=sstype(1)
>   /random Cup Clutch
>   /design Cup(Clutch).
>
> ...I get an ANOVA table that I (and referees) can understand. It tells me where the variation originates from; in one experiment only Clutch is significant, in another there is also a (surprising but explainable) effect of Cup.

An analysis of variance table may be appropriate for balanced data with nested factors such as you have, but it does not generalize well to other situations in which these models are applied. The preferred approach to model-building in R/S-PLUS is to build the model sequentially. I would use

fm1 <- lmer(DeathDay ~ Treatment + (1|Clutch:Cup) + (1|Clutch))
fm2 <- lmer(DeathDay ~ 1 + (1|Clutch:Cup) + (1|Clutch)) # equivalent to your model7
fm3 <- lmer(DeathDay ~ 1 + (1|Clutch))

and then compare fm2 and fm3, say with anova(fm2, fm3). As described below, the p-value from the likelihood ratio test in this case is not exact, because the hypothesis you are testing is on the boundary of the parameter space. However, the value returned will be a conservative p-value, so you suffer a loss of power rather than compromising the level of the test.

If you choose, you could look at the distribution of the parameters from a Bayesian perspective by using the mcmcsamp function to generate a Markov Chain Monte Carlo sample from the parameters in fm2 and check whether you have a precise or an imprecise estimate of the variance of the random effects for the Clutch:Cup grouping factor.
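A minimal sketch of that comparison and the MCMC check, assuming a data frame `larvae` holding the DeathDay, Clutch and Cup columns (the name `larvae` is my invention; `mcmcsamp` is the lme4 function mentioned above):

```r
library(lme4)

## The two nested models from above, with the hypothetical data frame named
fm2 <- lmer(DeathDay ~ 1 + (1|Clutch:Cup) + (1|Clutch), data = larvae)
fm3 <- lmer(DeathDay ~ 1 + (1|Clutch), data = larvae)

## Likelihood ratio comparison; because the null hypothesis (zero variance
## for Clutch:Cup) lies on the boundary, the reported p-value is conservative
anova(fm2, fm3)

## MCMC sample of the parameters in fm2; examine the spread of the
## Clutch:Cup variance component to see how precisely it is estimated
samp <- mcmcsamp(fm2, n = 10000)
summary(samp)
```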
If that parameter could reasonably be zero, then you could repeat the sampling for fm3 and look at the distribution of the variance of the random effects associated with the Clutch grouping factor.

> In R/S-PLUS I can achieve something similar with model simplification. REML will not allow me to remove the fixed factor, so I have to use "regular" ML for this:

I think I would prefer to characterize the situation by saying that the log-likelihood from the REML criterion cannot be used to compare models when you change the fixed-effects specification. However, this still doesn't explain to me why you want to retain the Treatment factor when you have judged it to be inert.

> model4 <- lme(DeathDay ~ Treatment, random = ~ 1 | Clutch/Cup, method = "ML") # full model
> model5 <- lme(DeathDay ~ Treatment, random = ~ 1 | Clutch, method = "ML")     # model minus Cup
> model6 <- lme(DeathDay ~ Treatment, random = ~ 1 | Cup, method = "ML")        # model minus Clutch
> model7 <- lme(DeathDay ~ 1, random = ~ 1 | Clutch/Cup, method = "ML")         # model minus Treatment
>
> ...and run anova(model4, modelxxx) etc. to test for effects of Cup, Clutch and Treatment. But is this really correct? In model5 I remove Cup; doesn't this wrongly boost the df for Clutch (5 x pseudoreplication)? In model6 I remove Clutch but keep Cup, even though Cup should be nested in Clutch - it doesn't seem right. Or have I misunderstood this?

Whether a model with random effects for Cup only would make sense depends on how the levels of Cup are assigned. You have 5 Cups for each Clutch. If the Clutches are, say, 'A', 'B', 'C', ... and the Cups are uniquely labelled within Clutch as, say, "A1", "A2", "A3", "A4", "A5", "B1", ... then you can reasonably use a random effect grouped by Cup. However, if you have assigned the labels of the Cups as "1", "2", "3", "4" and "5" for every Clutch, then you can't assign a random effect for Cup without specifying Clutch, because you would have the cups "A1", "B1", "C1", ...
sharing the same random effect even though they are completely unrelated. This is why the grouping factor Clutch:Cup is used in lmer. Because lmer can fit models with nested, crossed or partially crossed grouping factors, you must have a unique label for every cup. Otherwise lmer will fit the model as if the grouping factors Clutch and Cup are crossed, not nested, and as described above this is nonsense, because cups "A1", "B1", "C1", ... are not related and should not share a common random effect.

> This leads me to a new question which I think has been the subject of some debate: Is there in R/S-PLUS a way to get P values for the random factors in a mixed model?

Well, no. To get a P value one must propose a statistic that measures the effect of interest and provide a reference distribution for the statistic. It seems that a reasonable test statistic would be the difference in the log-likelihoods or the log-restricted-likelihoods, but you don't have an easily calculated reference distribution for that. A Chi-squared distribution for negative twice the difference in the log-likelihoods is not appropriate, because you are testing on the boundary of the parameter space. Some approximations have been proposed (see David Ruppert's web site for some references), but they are approximations.

Many computing packages (including SAS PROC MIXED) quote p-values for such a test that are, in my opinion, absolute nonsense. They evaluate the estimates of the variance components by maximizing the log-likelihood or the log-restricted-likelihood, then use the Hessian of the criterion to obtain an approximate standard error of the estimate of the variance, and create the ratio estimate/(approx. standard error), which is then compared to the standard normal distribution. This would be appropriate if we were confident that the distribution of the estimate was close to being normal. However, we know that it is not.
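A minimal sketch of constructing such unique labels, assuming a data frame `larvae` (a hypothetical name) in which Cup is coded "1" to "5" within every Clutch:

```r
## Hypothetical data frame `larvae` with Clutch coded "A", "B", ... and
## Cup coded "1".."5" within each Clutch.  interaction() builds the
## unique within-clutch label that the Clutch:Cup grouping factor
## denotes in the lmer call above.
larvae$CupID <- with(larvae, interaction(Clutch, Cup, drop = TRUE))

## With unique labels, a random effect grouped by CupID alone is the
## same grouping as (1|Clutch:Cup); with the non-unique Cup labels it
## would wrongly pool cup "1" across all clutches.
```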
Even in the simplest case of estimating the variance from an i.i.d. sample from a normal distribution, we know that S^2 has a scaled Chi-squared distribution. Why should we assume that in a much more complicated model the distribution of the estimates of variance components suddenly becomes much simpler? We can look at the distribution of the parameter estimates through Markov Chain Monte Carlo sampling, and I would prefer to do that.

I hope these comments are helpful to you. I know that it is convenient to have the results of an analysis expressed simply and concisely, and it may indeed be possible to do that for your model and data. In general, linear mixed models can be very complicated to analyze and the simple results don't apply. That is why I would prefer to provide more powerful tools for the general case first and later see if there are simplifications that can be applied to the easier cases.

> Thanks for input on this by Robert Campbell, Thomas Jagger and Nicola Koper. I also had some help from Brian Ripley's posting "[S] random factors in anova".
>
> Andreas Svensson