Hello,

as far as I can see, R reports Type I sums of squares. I'd like to get R to print out Type III sums of squares.

e.g. I have the following model:

    vardep ~ factor1*factor2

To get the Type III sum of squares for factor1 I've tried

    anova(lm(vardep ~ factor2 + factor1:factor2), lm(vardep ~ factor1*factor2))

but that didn't yield the desired result.

Could anyone give me a hint on how to proceed?

Thanks in advance,
Josef
The short answer: use drop1().

The long(er) answer: think harder about what question(s) you want answered (i.e., what hypotheses you really want to test, and test only those). The model hierarchy says that a model should not have an interaction term involving a factor whose main effect is not present in the model. Seen in this light, the hypothesis you're trying to test involves a non-sensical model.

Andy

> -----Original Message-----
> From: Josef Frank [mailto:josef.frank at gmx.ch]
> Sent: Thursday, March 06, 2003 6:14 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] type III Sum Sq in ANOVA table - Howto?
>
> [original question quoted in full]

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
------------------------------------------------------------------------------
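[Editor's note: Andy's drop1() suggestion, sketched on made-up data — the data frame and its column names are illustrative, not from the thread.]

```r
## A hypothetical balanced two-factor data set.
set.seed(1)
dat <- data.frame(
  factor1 = gl(2, 6, labels = c("a", "b")),
  factor2 = gl(3, 1, 12, labels = c("x", "y", "z")),
  vardep  = rnorm(12)
)
fit <- lm(vardep ~ factor1 * factor2, data = dat)

## drop1() respects marginality: with the interaction present,
## only factor1:factor2 is eligible to be dropped.
drop1(fit, test = "F")
```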
Andy Liaw wrote:

> The model hierarchy says that a model should not have an interaction
> term involving a factor whose main effect is not present in the model.
> Seen in this light, the hypothesis you're trying to test involves a
> non-sensical model.

Not really. The hypothesis being tested by Type III sums of squares may be suspected of not being of ``central interest'', but it is NOT (as is commonly believed) ``non-sensical''.

Let us think about the two-way ANOVA case, where one can actually understand what is going on. Let the population ***cell means*** be mu_ij (i = 1, ..., m, j = 1, ..., n), and forget about the confusing and misleading over-parameterized model.

Testing for the significance of the ``row factor'' by Type III sums of squares (with the interaction in the model, of course) tests

    H_0: mu_{1.}-bar = mu_{2.}-bar = ... = mu_{m.}-bar

i.e. that the means of the population cell means, averaged over columns, are all equal; i.e. that ``when rows are averaged over columns'' there is no row effect.

This could, at least conceivably, be of interest. Note that the average is an unweighted average, saying that all columns are equally important. If all columns are NOT equally important (e.g. if an item randomly drawn from the population is more likely to ``come from'' column 1 than from column 2, etc.) then this hypothesis is less likely to be of interest.

But it isn't nonsensical.

It is true, however, that most of the time when people test things using Type III sums of squares they don't understand what they are really testing. But then (said he cynically) people don't understand what the hell they are really testing in most situations, not just in the context of Type III sums of squares.

cheers,

Rolf Turner
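[Editor's note: Rolf's distinction between unweighted and weighted averages of cell means can be illustrated numerically — the cell means and cell counts below are made up.]

```r
## Hypothetical population cell means mu_ij for a 2 x 3 layout.
mu <- matrix(c(10, 20, 30,
               12, 18, 33), nrow = 2, byrow = TRUE)

## The Type III row hypothesis compares the *unweighted* averages of
## the cell means over columns:
rowMeans(mu)   # 20 and 21 -- nearly equal rows

## Weighting by (unequal) cell frequencies n_ij gives a different
## notion of "row mean", and hence a different hypothesis:
n <- matrix(c(5, 1, 1,
              1, 1, 5), nrow = 2, byrow = TRUE)
rowSums(mu * n) / rowSums(n)   # about 14.3 and 27.9
```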
> From: Rolf Turner [mailto:rolf at math.unb.ca]
>
> Not really.  The hypothesis being tested by Type III sums of squares
> may be suspected of not being of ``central interest'', but it is NOT
> (as is commonly believed) ``non-sensical''.
> [rest of message quoted in full]

I'm sorry, but I still don't see the sense of this argument. By including the interaction term in the model, isn't it implied that the cells have different means, and that the structure isn't a simple row + column one? Assuming that is the case, what's the sense of "averaging" over columns (or rows)? I can perhaps understand the utility of such a "test" in an exploratory setting, but I fail to see how it can be a valid test in a more rigorous sense. Maybe I'm stuck too deep in the rut...

Cheers,
Andy

------------------------------------------------------------------------------
On Fri, 7 Mar 2003, Josef Frank wrote:

> as far as I see, R reports type I sums of squares. I'd like to get R to
> print out type III sums of squares.
> [original question quoted in full]

Unfortunately the arguments about whether Type III sums of squares are part of the axis of evil have drowned out a real issue.

I would have expected the command to work, and in fact wrote a FAQ answer saying this was the way to do it. However, if factor1 is indeed a factor, its main effect is helpfully stuck back in the model by terms.formula.

I think this is a bug, since it doesn't happen if factor1 isn't a factor, and leaving aside any question about Type III SS it seems to make it impossible to fit the model

    lm(vardep ~ factor2 + factor1:factor2)

While this model isn't terribly often useful, it is sometimes.

-thomas
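[Editor's note: Thomas's observation can be checked directly on made-up data. The mechanics may differ across R versions — in current R the main effect comes back through the coding of the interaction columns rather than as a separate term label — but the upshot is the same: the "reduced" model spans the full factorial space.]

```r
set.seed(1)
d <- data.frame(f1 = gl(2, 6), f2 = gl(3, 1, 12), y = rnorm(12))

fit.red  <- lm(y ~ f2 + f1:f2, data = d)   # the "reduced" model as written
fit.full <- lm(y ~ f1 * f2, data = d)      # the full factorial model

## The two design matrices span the same column space, so the reduced
## model cannot actually be fit: the fitted values coincide.
all.equal(fitted(fit.red), fitted(fit.full))
```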
Dear Thomas et al.,

At 05:33 PM 3/6/2003 -0800, Thomas Lumley wrote:

> I think this is a bug, since it doesn't happen if factor1 isn't a factor,
> and leaving aside any question about Type III SS it seems to make it
> impossible to fit the model
>     lm(vardep ~ factor2 + factor1:factor2)
> While this model isn't terribly often useful, it is sometimes.

The description of model formulas in Ch. 2 of Statistical Models in S explains why ~factor2+factor1:factor2 is treated as it is.

Assuming that one really wants to test a "Type-III" hypothesis, the Anova function in the car package will do it (and "Type-II" tests as well).

Regards,
John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
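[Editor's note: John's car suggestion, sketched on made-up data; the car package is assumed to be installed. For the Type III tests to be meaningful, the factors should use sum-to-zero contrasts.]

```r
library(car)   # provides Anova(); assumed installed

## Sum-to-zero contrasts, as sensible Type III tests require.
options(contrasts = c("contr.sum", "contr.poly"))

set.seed(1)
dat <- data.frame(
  factor1 = gl(2, 6),
  factor2 = gl(3, 1, 12),
  vardep  = rnorm(12)
)
fit <- lm(vardep ~ factor1 * factor2, data = dat)

Anova(fit, type = "III")   # Type II via type = "II" (the default)
```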
Bill.Venables@CMIS.CSIRO.AU
2003-Mar-07 07:46 UTC
[R] type III Sum Sq in ANOVA table - Howto?
Having sounded off on this issue so frequently (and furiously) in the past, it is perhaps de rigueur for me to say something here... I'm older and calmer now, though.

Suppose we have:

    fm <- aov(y ~ x + A*B, data)

Then

    dropterm(fm, test = "F")

will get you the appropriate information when excluding the *marginal* terms, one at a time, from the model, i.e. for x and A:B. It's not a bug that nothing else happens automatically.

If you want sums of squares for the non-marginal terms as well, in this case for the main effects A and B, you have strayed into tricky territory. The first sign that not all is as it seems is that the test now depends on what contrasts you have specified: contr.treatment or one of the others. If you do NOT use contr.treatment but one where the column sums of the contrast matrix are all zero, then you can get the "SAS Type III" sums of squares by an inexplicable (to me) trick:

    dropterm(fm, . ~ ., test = "F")

but you can check that changing the contrasts back to "contr.treatment" gives you different (and even more dud) results.

Rolf is right: there are conceivably cases where this is testing an hypothesis of interest, just as occasionally it is interesting to test whether a regression line goes through the origin or whether a quadratic regression has zero slope at some point. But these are not the usual cases, and in 35 years of consulting I have never really encountered such an occasion. The often-quoted reason to use 'Type III' tests is "to test the main effects when interactions ARE present", which, if not further amplified or explained, really is a nonsense.

My quarrel with SAS is that what they routinely provide *encourages* misunderstandings like this and hence bad inference. Making users go to some length to get such results is, in my view, no bad thing (although the sequential AOV table that R and S-PLUS routinely provide is in some respects not much better from this point of view).
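[Editor's note: Bill's two dropterm() variants, sketched on made-up data; MASS is assumed to be available, and the data frame is illustrative.]

```r
library(MASS)   # for dropterm()

set.seed(1)
dat <- data.frame(
  x = rnorm(24),
  A = gl(2, 12),
  B = gl(3, 1, 24),
  y = rnorm(24)
)
fm <- aov(y ~ x + A * B, data = dat)

## Default scope: marginal terms only (x and A:B).  These F-tests do
## not depend on the choice of contrasts.
dropterm(fm, test = "F")

## With sum-to-zero contrasts, widening the scope to every term also
## drops A and B, reproducing the "SAS Type III" sums of squares:
options(contrasts = c("contr.sum", "contr.poly"))
fm2 <- update(fm)
dropterm(fm2, . ~ ., test = "F")
```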
Moral: decide what null hypothesis you would like to test, and within what outer hypothesis. Fit both models and explicitly test one within the other. There is then no need at all for any of this Type x palaver.

Attempts to short-circuit the process with anova tables have to be viewed with some caution, even scepticism, as the capacity for nonsense is very much operative. Note that if you go no further than what drop1 or dropterm provides in the default case, i.e. marginal terms only, then we have no quarrel. These are precisely the terms invariant with respect to the contrast matrix. However, beware of hidden non-marginal terms, such as the linear term in a quadratic regression.

Bill Venables.

> -----Original Message-----
> From: John Fox [mailto:jfox at mcmaster.ca]
> Sent: Friday, March 07, 2003 12:39 PM
> To: Thomas Lumley; Josef Frank
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] type III Sum Sq in ANOVA table - Howto?
>
> [John's message quoted in full]
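[Editor's note: Bill's moral — fit both models and test one within the other explicitly — can be sketched as follows, on made-up data with illustrative names.]

```r
set.seed(1)
dat <- data.frame(
  factor1 = gl(2, 6),
  factor2 = gl(3, 1, 12),
  vardep  = rnorm(12)
)

## Null hypothesis: no factor1 effect, within the outer additive model.
fit.null  <- lm(vardep ~ factor2, data = dat)
fit.outer <- lm(vardep ~ factor1 + factor2, data = dat)

## One explicit nested F-test, for exactly the hypothesis we chose --
## no "Type x" machinery needed.
anova(fit.null, fit.outer)
```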