Martin Maechler
2003-Jun-17 08:13 UTC
[R] lme() vs aov(y ~ A*B + Error(aa %in% A + bb %in% B)) [repost]
I've posted the following to R-help on May 15. It has reproducible R code for real data -- and a real (academic, i.e unpaid) consultion background. I'd be glad for some insight here, mainly not for myself. In the mean time, we've learned that it is to be expected for anova(*, "marginal") to be contrast dependent, but still are glad for advice if you have experience. Thank you in advance, Martin
Martin Maechler
2003-Jun-17 15:32 UTC
[R] lme() vs aov(y ~ A*B + Error(aa %in% A + bb %in% B)) [repost]
>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch> >>>>> on Tue, 17 Jun 2003 10:13:44 +0200 writes:MM> I've posted the following to R-help on May 15. MM> It has reproducible R code for real data -- and a real MM> (academic, i.e unpaid) consultion background. MM> I'd be glad for some insight here, mainly not for myself. MM> In the mean time, we've learned that it is to be expected for MM> anova(*, "marginal") to be contrast dependent, but still are MM> glad for advice if you have experience. MM> Thank you in advance, MM> Martin and indeed, the forwarded message, also a kind of attachment (message/rfc822) was dropped by mailman's content filtering too... Here, it's "as regular" text: Here is a reproducible (when you're on-net) script for a relatively simple real data (consulting) situation here. The data analyst tried to use lme() rather than aov( + Error()) and asked several questions that I couldn't easily answer and thought to be interesting for a more general audience in any case. I attach the script as text/plain file that you should run "everywhere". The three questions are summarized at the beginning of the file. ----------------------------snip-snip------------------------------------ ### LME/aov example {real data!} ### Two crossed fixed effects each with a random factor inside ### Problems : ### 1) need(?) ugly pseudo grouping ### 2) anova() depends on setting of contrasts ### 3) numerical problem "Singular precision matrix" .. (should not happen?) download.file("ftp://stat.ethz.ch/U/maechler/R/ex4.rda", "lme-ex4.rda") load("lme-ex4.rda")##-> object `dw' summary(dw) str(dw) ## For some EDA {and S+ compatibility; for R, I'd use with(dw, ...)} : attach(dw) ## TREE: 1:3 at "1", 4:6 at "2" : TREE completely nested in LOCATION table(TREE, LOCATION) ## and the same with THALLUS in PROV: th 1:5 > pr1; 6:7 > 2; 8:12 > 3;... table(THALLUS, PROV) colSums(table(THALLUS, PROV) > 0) ## pr1 pr2 pr3 pr4 pr5 pr6 ## 5 2 5 5 1 5 interaction.plot(PROV, LOCATION, y) ## Hmm.. PROV=5 is a bit peculiar ## well, for one because it has only 3+3 observations : table(PROV, LOCATION) plot(y ~ PROV, data = dw)# not so useful ## or (better?): stripchart(y ~ PROV, vertical=TRUE, method="stack") if(FALSE)# quite a bit to see table(PROV, THALLUS, TREE) all(table(PROV, THALLUS, TREE) %in% c(0,1)) ## TRUE detach("dw")##--------------------- end of "EDA" ------------- ## Setup: ## - 2 locations (LOCATION) with 3 trees each ## - 6 kinds proveniences (PROV) with 2-5 kinds "thallus" each ## - L * P are fixed (crossed), each has a random factor nested inside ## ==> Simple Approach is possible ## ## aov Models with Error() ## ----------------------- ##--------- 1 ------------------------ default contrasts --------- options(contrasts=c("contr.treatment", "contr.poly")) aov1 <- aov(y ~ PROV*LOCATION + Error(TREE%in%LOCATION + THALLUS%in%PROV), data = dw) summary(aov1) ##--------- 2 ------------------------ sum contrasts --------- options(contrasts=c("contr.sum", "contr.poly")) aov2 <- aov(y ~ PROV*LOCATION + Error(TREE%in%LOCATION + THALLUS%in%PROV), data = dw) ## really the imporant things do NOT depend on the contrasts: stopifnot(all.equal(summary(aov1), summary(aov2))) ## For residual analysis : aov1. <- aov(y ~ PROV*LOCATION + TREE%in%LOCATION + THALLUS%in%PROV, data = dw) ## non-sense model ## -- used only for residuals() , fitted() which "fail" for aov1 << !! ## ((this is not nice behavior !)) par(mfrow = c(1,2)) plot(fitted(aov1.), residuals(aov1.)); abline(h = 0, lty = 2) qqnorm(residuals(aov1.)); qqline(residuals(aov1.)) ## model.tables(aov1.,type = "means") ### Now try to be modern, use REML and hence ### ### lme models: ### === ------- library(nlme) ## Construct a pseudo grouping factor with just one level dw$pseudogrp <- factor(c("g1")) dwgrp <- groupedData(y ~ 1 | pseudogrp, data=dw) ###--------- 1 ------------------------ default contrasts --------- options(contrasts=c("contr.treatment", "contr.poly")) lme1 <- lme(y ~ LOCATION * PROV , data = dwgrp, random = pdBlocked(list(pdIdent(~TREE -1), pdIdent(~THALLUS -1)))) anova(lme1, type = "marginal") ## quite close to the aov() results above; ## F value for PROV:LOCATION is the same, but has wrong (?) deg.freedom ## ((because of the "wrong" pseudogrp grouping)) if(FALSE) summary(lme1) intervals(lme1,which = "var-cov") ranef(lme1) trellis.device() lset(col.whitebg()) ## Residual analysis: par(mfrow = c(1,2)) plot(fitted(lme1), residuals(lme1)); abline(h = 0, lty = 2) qqnorm(residuals(lme1)); qqline(residuals(lme1)) ###--------- 2 ------------------------ sum contrasts --------- options(contrasts=c("contr.sum", "contr.poly")) lme2 <- lme(y ~ LOCATION * PROV , data = dwgrp, random = pdBlocked(list(pdIdent(~TREE -1), pdIdent(~THALLUS -1)))) ## Same model as lme1 ? ## In some sense, yes: stopifnot(all.equal(residuals(lme1), residuals(lme2)), all.equal(fitted(lme1), fitted(lme2))) ## But this is quite different -- <<< why (REML) ? >>>> ## -- the SS(residuals) are the same, ## and the factor's SS should not depend on the contrasts? anova(lme2, type = "marginal") intervals(lme2,which = "var-cov") ranef(lme2) ## ty <- asin(sqrt(y)) lme4 <- lme(ty ~ LOCATION * PROV , data = dwgrp, random = pdBlocked(list(pdIdent(~TREE -1), pdIdent(~THALLUS -1)))) ## Warning messages: ## 1: Singular precision matrix in level -1, block 1 ## 2: Singular precision matrix in level -1, block 1 ## ????? ^^^^ numerical problem ?? --- investigating : if(FALSE){ options(warn = 2)## make warnings into errors ! lme4 <- lme(ty ~ LOCATION * PROV , data = dwgrp, random = pdBlocked(list(pdIdent(~TREE -1), pdIdent(~THALLUS -1)))) traceback() ## gives a long stack trace, with the lowest level function ## logLik.reStruct(.) options(warn = 0) # back to default } anova(lme4) # is quite a bit different from the anova()s above! ry <- round(ty, 2) lme5 <- lme(ry ~ LOCATION * PROV , data = dwgrp, random = pdBlocked(list(pdIdent(~TREE -1), pdIdent(~THALLUS -1))))