Douglas Bates
2006-Apr-16 17:48 UTC
[R] [S] Problems with lme and 2 levels of nesting: Summary
I have taken the liberty of including the R-help mailing list on this reply, as that is the appropriate place to discuss lmer results.

On 4/5/06, Andreas Svensson <andreas.svensson at bio.ntnu.no> wrote:
> Hello again
>
> I have now received some helpful hints in this matter and will summarize them, but first let me reiterate the problem:
>
> I had two treatments: 2 types of food fed to females. Each female was present in only one treatment and laid one clutch (female ID = clutch ID). From each clutch, 50 larvae were taken and divided among 5 cups, 10 larvae in each cup (the cups were not meant to differ in any way; it's just impossible to count 50 larvae in motion, but OK with 10). The day of death was recorded for each larva. Since there's only one data point per larva, I see no need to include larva ID in the model. So:
>
> Response: DeathDay
> Fixed factor: Treatment
> Random factors: Clutch, Cup
> Design: Cup nested in Clutch
>
> I'm primarily interested in the effect of Treatment on DeathDay, but would also like to know about the variation within "Clutch" and "Cup". The unit of replication is "Clutch". I want to use a model where the variation within Clutch and within Cup is taken into account, while avoiding pseudoreplication of 5 cups (and 50 larvae) coming from the same clutch. The solution is therefore a mixed model with Cup nested in Clutch as random factors.
>
> A year ago, I was recommended this script from S-help at Insightful:
>
> model1 <- lme(DeathDay ~ Treatment, random = ~ Treatment | Clutch/Cup)
>
> I think this is wrong, since Treatment shouldn't be among the random factors. From the various tips I've received, the correct syntax using lme (in library nlme) should be:
>
> model2 <- lme(DeathDay ~ Treatment, random = ~ 1 | Clutch/Cup)
>
> and using lmer (in library lme4):
>
> model3 <- lmer(DeathDay ~ Treatment + (1|Clutch:Cup) + (1|Clutch), method = "REML")
>
> I get the result that there is NO effect of Treatment.
So far so good. If there is no significant effect for Treatment, then I would go on to fit a model without the Treatment effect and use that model to examine the significance of the random effects.

> But the above models will not give me any information about the importance of the random factors. If I do the same in SPSS:
>
> glm
>   DeathDay by Cup Clutch Treatment
>   /method=sstype(1)
>   /random Cup Clutch
>   /design Cup(Clutch).
>
> ...I get an ANOVA table that I (and referees) can understand. It tells me where the variation originates from; in one experiment only Clutch is significant, in another there is also a (surprising but explainable) effect of Cup.

An analysis of variance table may be appropriate for balanced data with nested factors such as you have, but it does not generalize well to other situations in which these models are applied. The preferred approach to model-building in R/S-PLUS is to build the model sequentially. I would use

fm1 <- lmer(DeathDay ~ Treatment + (1|Clutch:Cup) + (1|Clutch))
fm2 <- lmer(DeathDay ~ 1 + (1|Clutch:Cup) + (1|Clutch)) # equivalent to your model7
fm3 <- lmer(DeathDay ~ 1 + (1|Clutch))

and then compare fm2 and fm3, say with anova(fm2, fm3). As described below, the p-value from the likelihood ratio test in this case is not exact, because the hypothesis you are testing is on the boundary of the parameter space. However, the value returned will be a conservative p-value, so you suffer a loss of power rather than compromising the level of the test.

If you choose, you could look at the distribution of the parameters from a Bayesian perspective by using the mcmcsamp function to generate a Markov Chain Monte Carlo sample from the parameters in fm2 and check whether you have a precise or an imprecise estimate of the variance of the random effects for the Clutch:Cup grouping factor.
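A minimal sketch of that comparison and the MCMC check, assuming a data frame `larvae` holding the DeathDay, Clutch and Cup columns (the name `larvae` is my invention; `mcmcsamp` is the lme4 function mentioned above):

```r
library(lme4)

## The two nested models from above, with the hypothetical data frame named
fm2 <- lmer(DeathDay ~ 1 + (1|Clutch:Cup) + (1|Clutch), data = larvae)
fm3 <- lmer(DeathDay ~ 1 + (1|Clutch), data = larvae)

## Likelihood ratio comparison; because the null hypothesis (zero variance
## for Clutch:Cup) lies on the boundary, the reported p-value is conservative
anova(fm2, fm3)

## MCMC sample of the parameters in fm2; examine the spread of the
## Clutch:Cup variance component to see how precisely it is estimated
samp <- mcmcsamp(fm2, n = 10000)
summary(samp)
```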
If that parameter could reasonably be zero, then you could repeat the sampling for fm3 and look at the distribution of the variance of the random effects associated with the Clutch grouping factor.

> In R/S-PLUS I can achieve something similar with model simplification. REML will not allow me to remove the fixed factor, so I have to use "regular" ML for this:

I think I would prefer to characterize the situation by saying that the log-likelihood from the REML criterion cannot be used to compare models when you change the fixed-effects specification. However, this still doesn't explain to me why you want to retain the Treatment factor when you have judged it to be inert.

> model4 <- lme(DeathDay ~ Treatment, random = ~ 1 | Clutch/Cup, method = "ML") # full model
> model5 <- lme(DeathDay ~ Treatment, random = ~ 1 | Clutch, method = "ML")     # model minus Cup
> model6 <- lme(DeathDay ~ Treatment, random = ~ 1 | Cup, method = "ML")        # model minus Clutch
> model7 <- lme(DeathDay ~ 1, random = ~ 1 | Clutch/Cup, method = "ML")         # model minus Treatment
>
> ...and run anova(model4, modelxxx) etc. to test for effects of Cup, Clutch and Treatment. But is this really correct? In model5 I remove Cup; doesn't this wrongly boost the df for Clutch (5 x pseudoreplication)? In model6 I remove Clutch but keep Cup, even though Cup should be nested in Clutch - it doesn't seem right. Or have I misunderstood this?

Whether a model with random effects for Cup only would make sense depends on how the levels of Cup are assigned. You have 5 Cups for each Clutch. If the Clutches are, say, 'A', 'B', 'C', ... and the Cups are uniquely labelled within Clutch as, say, "A1", "A2", "A3", "A4", "A5", "B1", ... then you can reasonably use a random effect grouped by Cup. However, if you have assigned the labels of the Cups as "1", "2", "3", "4" and "5" for every Clutch, then you can't assign a random effect for Cup without specifying Clutch, because you would have the cups "A1", "B1", "C1", ...
sharing the same random effect even though they are completely unrelated. This is why the grouping factor Clutch:Cup is used in lmer. Because lmer can fit models with nested, crossed or partially crossed grouping factors, you must have a unique label for every cup. Otherwise lmer will fit the model as if the grouping factors Clutch and Cup are crossed, not nested, and as described above this is nonsense, because cups "A1", "B1", "C1", ... are not related and should not share a common random effect.

> This leads me to a new question which I think has been the subject of some debate: Is there in R/S-PLUS a way to get P values for the random factors in a mixed model?

Well, no. To get a P value one must propose a statistic that measures the effect of interest and provide a reference distribution for the statistic. It seems that a reasonable test statistic would be the difference in the log-likelihoods or the log-restricted-likelihoods, but you don't have an easily calculated reference distribution for that. A Chi-squared distribution for negative twice the difference in the log-likelihoods is not appropriate, because you are testing on the boundary of the parameter space. Some approximations have been proposed (see David Ruppert's web site for some references), but they are approximations.

Many computing packages (including SAS PROC MIXED) quote p-values for such a test that are, in my opinion, absolute nonsense. They evaluate the estimates of the variance components by maximizing the log-likelihood or the log-restricted-likelihood, then use the Hessian of the criterion to obtain an approximate standard error of the estimate of the variance, and create the ratio estimate/(approx. standard error), which is then compared to the standard normal distribution. This would be appropriate if we were confident that the distribution of the estimate was close to being normal. However, we know that it is not.
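A minimal sketch of constructing such unique labels, assuming a data frame `larvae` (a hypothetical name) in which Cup is coded "1" to "5" within every Clutch:

```r
## Hypothetical data frame `larvae` with Clutch coded "A", "B", ... and
## Cup coded "1".."5" within each Clutch.  interaction() builds the
## unique within-clutch label that the Clutch:Cup grouping factor
## denotes in the lmer call above.
larvae$CupID <- with(larvae, interaction(Clutch, Cup, drop = TRUE))

## With unique labels, a random effect grouped by CupID alone is the
## same grouping as (1|Clutch:Cup); with the non-unique Cup labels it
## would wrongly pool cup "1" across all clutches.
```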
Even in the simplest case of estimating the variance from an i.i.d. sample from a normal distribution, we know that S^2 has a scaled Chi-squared distribution. Why should we assume that in a much more complicated model the distribution of the estimates of variance components suddenly becomes much simpler? We can look at the distribution of the parameter estimates through Markov Chain Monte Carlo sampling, and I would prefer to do that.

I hope these comments are helpful to you. I know that it is convenient to have the results of an analysis expressed simply and concisely, and it may indeed be possible to do that for your model and data. In general, linear mixed models can be very complicated to analyze and the simple results don't apply. That is why I would prefer to provide more powerful tools for the general case first and later see if there are simplifications that can be applied to the easier cases.

> Thanks for input on this by Robert Campbell, Thomas Jagger and Nicola Koper. I also had some help from Brian Ripley's posting "[S] random factors in anova".
>
> Andreas Svensson