Andrew Robinson
2006-Mar-31 09:17 UTC
[R] Odd anova(lm()) order phenomenon, looking for an explanation
Hi everyone,

I'm witnessing an odd modelling phenomenon that I can't explain. If anyone who has seen this before, or who can explain what's going on, would let me know, I'd be very grateful! Especially if I'm just being dim.

I'm fitting a pair of continuous variates to some residuals from another model. The sequential anova table changes with the term order; that's fine. But each term explains a much larger Sum Sq when it is listed second than when it is listed first.

> anova(lm(residuals(delta.point.lm.0) ~ canopy.h + canopy.d,
+          data=snow))
Analysis of Variance Table

Response: residuals(delta.point.lm.0)
           Df Sum Sq Mean Sq F value    Pr(>F)
canopy.h    1  156.2   156.2  11.118 0.0009613 ***
canopy.d    1  198.0   198.0  14.098 0.0002080 ***
Residuals 303 4256.6    14.0
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> anova(lm(residuals(delta.point.lm.0) ~ canopy.d + canopy.h,
+          data=snow))
Analysis of Variance Table

Response: residuals(delta.point.lm.0)
           Df Sum Sq Mean Sq F value    Pr(>F)
canopy.d    1    0.4     0.4  0.0284    0.8664
canopy.h    1  353.8   353.8 25.1871 8.887e-07 ***
Residuals 303 4256.6    14.0
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I would have expected any term to explain less Sum Sq if listed second than if listed first. Is my intuition awry? Does anyone have any modelling insight to help me interpret what I'm seeing?

Cheers

Andrew
--
Andrew Robinson
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
Email: a.robinson at ms.unimelb.edu.au         http://www.ms.unimelb.edu.au
Berwin A Turlach
2006-Mar-31 09:31 UTC
[R] Odd anova(lm()) order phenomenon, looking for an explanation
G'day Andrew,

>>>>> "AR" == Andrew Robinson <A.Robinson at ms.unimelb.edu.au> writes:

    AR> I would have expected any term to explain less Sum Sq if
    AR> listed second than if listed first. Is my intuition awry?

Yes. :-)

I would not expect that *any* term explains less Sum Sq if listed second; if that were so, life and (linear) modelling would be simple. The problem with multiple regression is that a covariate might look unimportant if used first (i.e. it has a small Sum Sq associated with it in the anova table), but if we first correct for the other regressor, then this covariate becomes important all of a sudden (i.e. it has a large Sum Sq associated with it in the anova table).

What surprised me was that you observed this phenomenon with respect to both regressors. If only one had displayed this behaviour, I would have readily explained it as above, but that both display it, I found surprising too.

    AR> Does anyone have any modelling insight to help me interpret
    AR> what I'm seeing?

I don't know whether the following example, which shows the same behaviour, leads to any insight.

> n <- 100
> x1 <- runif(n, -1,1)
> x2 <- runif(n, -1,1)
> y <- x1*x1*x2 + rnorm(n, sd=0.05)
> y <- y - mean(y)
> anova(lm(y~x1+x2))
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value Pr(>F)
x1         1 0.0055  0.0055   0.1485 0.7008
x2         1 5.0071  5.0071 134.3499 <2e-16 ***
Residuals 97 3.6151  0.0373
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> anova(lm(y~x2+x1))
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value Pr(>F)
x2         1 4.9930  4.9930 133.9723 <2e-16 ***
x1         1 0.0196  0.0196   0.5261   0.47
Residuals 97 3.6151  0.0373
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Cheers,

Berwin

========================== Full address ===========================
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Mathematics and Statistics        +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009                e-mail: berwin at maths.uwa.edu.au
Australia                        http://www.maths.uwa.edu.au/~berwin
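
A minimal sketch of one setting in which both terms explain far more Sum Sq when listed second than when listed first: two strongly correlated regressors whose coefficients have opposite signs. This is a simulated illustration only, not a claim about what is happening in the snow data. The names (x1, x2, y, a12, a21, ss), the seed, the sample size, the correlation of about 0.9 and the coefficients are all assumptions chosen for the example.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
n  <- 1000
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)  # cor(x1, x2) is roughly 0.9
y  <- x1 - x2 + rnorm(n)          # coefficients of opposite sign

a12 <- anova(lm(y ~ x1 + x2))     # x1 listed first, x2 second
a21 <- anova(lm(y ~ x2 + x1))     # x2 listed first, x1 second

# Sequential Sum Sq of each term when it is listed first vs. second
ss <- rbind(first  = c(x1 = a12["x1", "Sum Sq"], x2 = a21["x2", "Sum Sq"]),
            second = c(x1 = a21["x1", "Sum Sq"], x2 = a12["x2", "Sum Sq"]))
print(ss)
```

Here each regressor on its own is only weakly related to y, because the other regressor's opposing effect masks it; once the other one has been adjusted for, its contribution shows up. The Sum Sq column adds up to the same total in either order, so only the split between the two terms changes.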