Amasco Miralisus
2006-Aug-26 22:07 UTC
[R] Type II and III sum of square in Anova (R, car package)
Hello everybody,

I have some questions on ANOVA in general and on ANOVA in R in particular. I am not a statistician, so I would appreciate it if you could answer in a simple way.

1. First, a more general question. The standard anova() function for lm() or aov() models in R implements Type I (sequential) sums of squares, which are not well suited to unbalanced ANOVA. Is it therefore better to use the Anova() function from the car package, written by John Fox, which offers Type II and Type III sums of squares? Did I get the point?

2. Now a more specific question. Type II sums of squares are also said to be unsuitable for unbalanced ANOVA designs (as stated in the STATISTICA help), so is the general rule of thumb to use Anova() with Type II SS only for balanced ANOVA and Anova() with Type III SS for unbalanced ANOVA? Is this a correct interpretation?

3. I found a post by John Fox in which he wrote that Type III SS can be misleading when certain contrasts are used. What is this about? Could you please advise when it is appropriate to use Type II and when Type III SS? I do not use contrasts for comparisons, just a general ANOVA with subsequent Tukey post-hoc comparisons.

Thank you in advance,
Amasco
Mark Lyman
2006-Aug-27 05:50 UTC
[R] Type II and III sum of square in Anova (R, car package)
There are many threads on this list that discuss this issue. Not being a great statistician myself, I would suggest you read through some of these as a start.

As I understand it, the best philosophy with regard to types of sums of squares is to use the type that tests the hypothesis you want. They were developed as a convenience to test many of the hypotheses a person might want "automatically" and to put them into a nice, neat little table. However, with an interactive system like R it is usually even easier to test a full model against a reduced model. That is, if I want to test the significance of an interaction, I would use anova(lm.fit2, lm.fit1), where lm.fit2 contains the interaction and lm.fit1 does not. The anova() function will return the appropriate F-test.

The danger with worrying about what type of sums of squares to use is that often we do not think about what hypotheses we are testing and whether those hypotheses make sense in our situation.

Mark Lyman
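A minimal sketch, with invented data and variable names, of the full-versus-reduced-model comparison Mark Lyman describes:

set.seed(1)
dat <- data.frame(A = gl(2, 10), B = gl(2, 5, 20))
dat$y <- rnorm(20) + as.numeric(dat$A)   # made-up response

fit1 <- lm(y ~ A + B, data = dat)        # reduced model: main effects only
fit2 <- lm(y ~ A * B, data = dat)        # full model: adds the A:B interaction
anova(fit1, fit2)                        # F-test for the interaction term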
Prof Brian Ripley
2006-Aug-27 07:37 UTC
[R] Type II and III sum of square in Anova (R, car package)
I think this starts from the position of a batch-oriented package. In R you can refit models with update(), add1() and drop1(), and experienced S/R users almost never use ANOVA tables for unbalanced designs. Rather than fit a pre-specified set of sub-models, why not fit those sub-models that appear to make some sense for your problem and data?

Since your post lacks a signature and your credentials, we have no idea of your background, which makes it very difficult even to know what reading to suggest to you. But Bill Venables' 'exegeses' paper (http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf) may be a good start. It does explain your comment '3.'.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA); Fax: +44 1865 272595
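A brief sketch, again with invented data, of the interactive workflow with update() and drop1() that Prof. Ripley mentions:

set.seed(1)
dat <- data.frame(A = gl(2, 10), B = gl(2, 5, 20), y = rnorm(20))

full <- lm(y ~ A * B, data = dat)
drop1(full, test = "F")               # the only droppable term here is A:B
main <- update(full, . ~ . - A:B)     # refit without the interaction
drop1(main, test = "F")               # now F-tests for the main effects A and B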
John Fox
2006-Aug-27 13:51 UTC
[R] Type II and III sum of square in Anova (R, car package)
Dear Amasco,

A complete explanation of the issues that you raise is awkward in an email, so I'll address your questions briefly. Section 8.2 of my text, Applied Regression Analysis, Linear Models, and Related Methods (Sage, 1997), has a detailed discussion.

(1) In balanced designs, the so-called "Type I," "II," and "III" sums of squares are identical. If the STATISTICA help says that Type II tests are only appropriate in balanced designs, that doesn't make a whole lot of sense (unless one believes that Type II tests are nonsense, which is not the case).

(2) One should concentrate not directly on different "types" of sums of squares, but on the hypotheses to be tested. Sums of squares and F-tests should follow from the hypotheses. Type II and Type III tests (if the latter are properly formulated) test hypotheses that are reasonably construed as tests of main effects and interactions in unbalanced designs. In unbalanced designs, Type I sums of squares usually test hypotheses of interest only by accident.

(3) Type II sums of squares are constructed to obey the principle of marginality, so the kinds of contrasts employed to represent factors are irrelevant to the sums of squares produced: you get the same answer for any full set of contrasts for each factor. In general, the hypotheses tested assume that terms to which a particular term is marginal are zero. So, for example, in a three-way ANOVA with factors A, B, and C, the Type II test for the AB interaction assumes that the ABC interaction is absent, and the test for the A main effect assumes that the ABC, AB, and AC interactions are absent (but not necessarily the BC interaction, since the A main effect is not marginal to this term). A general justification is that we are usually not interested, e.g., in a main effect that is marginal to a nonzero interaction.

(4) Type III tests do not assume that terms higher-order to the term in question are zero. For example, in a two-way design with factors A and B, the Type III test for the A main effect tests whether the population marginal means at the levels of A (i.e., averaged across the levels of B) are the same. One can test this hypothesis whether or not A and B interact, since the marginal means can be formed whether or not the profiles of means for A within levels of B are parallel. Whether the hypothesis is of interest in the presence of interaction is another matter, however. To compute Type III tests using incremental F-tests, one needs contrasts that are orthogonal in the row basis of the model matrix. In R, this means using, e.g., contr.sum, contr.helmert, or contr.poly (all of which will give you the same SS), but not contr.treatment. Failing to be careful here will result in testing hypotheses that are not reasonably construed, e.g., as hypotheses concerning main effects.

(5) The same considerations apply to linear models that include quantitative predictors, e.g., ANCOVA. Most software will not automatically produce sensible Type III tests, however.
I hope this helps,
John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
905-525-9140 x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
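A minimal sketch (with invented data and variable names) of how Anova() from the car package can be called so that the Type III tests use suitable contrasts, as described in point (4) above; the Type II call is unaffected by the contrast coding:

library(car)                          # provides Anova()
set.seed(1)
dat <- data.frame(A = gl(3, 8), B = gl(2, 4, 24), y = rnorm(24))

mod <- lm(y ~ A * B, data = dat)
Anova(mod, type = "II")               # Type II: any full set of contrasts gives the same SS

## For Type III, refit using sum-to-zero contrasts (contr.helmert or
## contr.poly would also do, but not the default contr.treatment):
mod3 <- lm(y ~ A * B, data = dat,
           contrasts = list(A = "contr.sum", B = "contr.sum"))
Anova(mod3, type = "III")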
Bill.Venables at csiro.au
2006-Aug-28 23:36 UTC
[R] Type II and III sum of square in Anova (R, car package)
I cannot resist a very brief entry into this old and seemingly immortal issue, but I will be very brief, I promise! Amasco Miralisus suggests:

> As I understood from the R FAQ, there is disagreement among statisticians
> about which SS to use
> (http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-the-output-from-anova_0028_0029-depend-on-the-order-of-factors-in-the-model_003f).

To let this go is to concede way too much. The 'disagreement' is really over whether this is a sensible question to ask in the first place. One side of the debate suggests that the real question is what hypotheses it makes sense to test, and within what outer hypotheses. Settle that question and no issue about "types" of sums of squares arises.

This is often a hard question to get your head around, and the attraction of offering a variety of 'types of sums of squares' holds out the false hope that perhaps you don't need to do so. The bad news is that, for good science and good decision making, you do.

Bill Venables.
Frede Aakmann Tøgersen
2006-Sep-21 07:02 UTC
[R] Different result from nls in R-2.2.1 and R-2.3.1
Short story: in January 2006 I did some analysis in R-2.2.1 using nls(). Repeating the exercise in R-2.3.1 yesterday produced somewhat different results. After some debugging I found that either nls() is the problem or my understanding of environments and scoping rules is lacking something. Here is a short reproducible example.

x <- seq(0, 5, len = 20)
n <- 1
y <- 2*x^2 + n + rnorm(x)
xy <- data.frame(x = x, y = y)

myf <- function(x, a, b, n){
    res <- a*x^b + n
    ## a print for debugging purposes
    print(n)
    res
}

## This works as I expect it to in R-2.2.1 but does not work in R-2.3.1:
## n is somehow set to nrow(xy) inside nls().
## Note that x and y are defined in the data frame xy, whereas n is found
## in the global environment.
fit <- nls(y ~ myf(x, a, b, n), data = xy, start = c(a = 1, b = 1), trace = TRUE)

## This works in both versions:
## x, y, n found in the .GlobalEnv.
fit <- nls(y ~ myf(x, a, b, n), start = c(a = 1, b = 1), trace = TRUE)

## This works in both versions:
## x, y, n found in the data frame xyn.
xyn <- data.frame(xy, n = n)
fit <- nls(y ~ myf(x, a, b, n), data = xyn, start = c(a = 1, b = 1), trace = TRUE)

## This works in both versions:
## now using the variable .n instead of n; .n is found in the .GlobalEnv.
.n <- 1
fit <- nls(y ~ myf(x, a, b, .n), data = xy, start = c(a = 1, b = 1), trace = TRUE)

In my real case, as in the example above, I have three or more parameters, of which only a few are actually being fitted. Is this a problem? Or should I ask: why is this a problem in R-2.3.1 but not in R-2.2.1? Is my problem related to this difference between lines of code from nls,

R-2.2.1:
    mf <- as.list(eval(mf, parent.frame()))

R-2.3.1:
    mf <- eval.parent(mf)
    n <- nrow(mf)
    mf <- as.list(mf)

where n is being defined in the scope of nls in the latest version?

Best regards

Frede Aakmann Tøgersen
Danish Institute of Agricultural Sciences
Research Centre Foulum
Dept. of Genetics and Biotechnology
Blichers Allé 20, P.O. Box 50
DK-8830 Tjele
Phone: +45 8999 1900   Direct: +45 8999 1878
E-mail: FredeA.Togersen at agrsci.dk
Web: http://www.agrsci.org