thr3ads.net - R help - [R] Anova - adjusted or sequential sums of squares? [Apr 2005]

michael watson (IAH-C)

2005-Apr-20 13:40 UTC

[R] Anova - adjusted or sequential sums of squares?

Hi

I am performing an analysis of variance with two factors, each with two
levels.  I have differing numbers of observations in each of the four
combinations, but all four combinations *are* present (two of the
combinations have 3 observations, one has 4 and one has 5).

I have used both anova(aov(...)) and anova(lm(...)) in R and they gave
the same result, as expected.  I then plugged the data into Minitab and
ran what Minitab calls a General Linear Model (I have to use this in
Minitab because my data set is unbalanced), and got a different result.
After a little digging, I found this is because Minitab uses type III
adjusted SS by default.  Sure enough, when I switched Minitab to type I
sequential SS, I got exactly the same results as aov() and lm() in R.
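A minimal R sketch of the comparison described above. The data are simulated: the names A, B, y and the effect sizes are hypothetical, and only the cell sizes (3, 3, 4 and 5) follow the post. Both calls should give the same sequential (type I) table, because aov() fits the same linear model as lm() and anova() adds the terms in formula order.

```r
set.seed(1)
## Hypothetical unbalanced 2 x 2 layout with cell sizes 3, 3, 4 and 5
A <- factor(rep(c("a1", "a1", "a2", "a2"), times = c(3, 3, 4, 5)))
B <- factor(rep(c("b1", "b2", "b1", "b2"), times = c(3, 3, 4, 5)))
## Simulated response; the effect sizes are made up for illustration
y <- 10 + (A == "a2") + 0.5 * (B == "b2") + rnorm(length(A))
dat <- data.frame(y, A, B)
table(dat$A, dat$B)  # cell counts: unbalanced, but no empty cell

## Both tables add the terms in formula order (sequential, type I SS)
tab.aov <- anova(aov(y ~ A * B, data = dat))
tab.lm  <- anova(lm(y ~ A * B, data = dat))
tab.lm

## aov() fits the same linear model as lm(), so the SS columns agree
all.equal(tab.aov[["Sum Sq"]], tab.lm[["Sum Sq"]])
```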

So which should I use?  Type I sequential SS or type III adjusted SS?
Minitab help tells me that I would "usually" want to use type III
adjusted SS, as type I sequential "sums of squares can differ when your
design is unbalanced", which mine is.  The R functions I am using
clearly report type I sequential SS.

Any help would be very much appreciated!

Thanks
Mick

Liaw, Andy

2005-Apr-20 14:04 UTC

Here we go again...  The `type I vs. type III SS' controversy has
long been debated here and elsewhere.  I'll give my personal bias,
and leave you to dig deeper if you care to.

The `types' of sum of squares are a creation of SAS.  Each type
corresponds to a different hypothesis being tested.  The short
answer to your question would be: `What are your null and
alternative hypotheses?'

One of the problems with categorizing like that is that it tends
to keep people from thinking about the question above, and thus
leads to confusion about which to use.

The school of thought I was brought up in says you need not (and
should not) think that way.  Rather, frame your question in terms
of model comparisons.  This approach avoids the notorious problem
of comparing the full model to ones that contain an interaction
but lack one of the main effects involved in that interaction.

More practically:  Do you have an interaction in your model?  If
so, the result for the interaction term should be the same in
either `type' of test.  If that interaction term is significant,
you should find other ways to understand the effects, and
_not_ test for significance of the main effects in the presence
of the interaction.  If there is no interaction term, you can
assess effects by model comparisons such as:

m.full <- lm(y ~ A + B)
m.A <- lm(y ~ A)
m.B <- lm(y ~ B)
anova(m.B, m.full)  ## test for A effect
anova(m.A, m.full)  ## test for B effect
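A sketch of both points on simulated data (the factors A and B, the response y and the effect sizes are hypothetical; the unbalanced cell sizes 3, 3, 4, 5 follow the original post). It checks that the interaction line of the sequential table is the same whichever factor is entered first, and that each of the model comparisons above reproduces the sequential SS of the factor entered last.

```r
set.seed(1)
## Hypothetical unbalanced 2 x 2 data (cell sizes 3, 3, 4, 5)
A <- factor(rep(c("a1", "a1", "a2", "a2"), times = c(3, 3, 4, 5)))
B <- factor(rep(c("b1", "b2", "b1", "b2"), times = c(3, 3, 4, 5)))
y <- 10 + (A == "a2") + 0.5 * (B == "b2") + rnorm(length(A))

## The interaction is entered last in both orders, so its line is identical
ab <- anova(lm(y ~ A * B))
ba <- anova(lm(y ~ B * A))
all.equal(ab["A:B", "Sum Sq"], ba["B:A", "Sum Sq"])

## Additive model: each comparison drops one main effect from m.full
m.full <- lm(y ~ A + B)
m.A <- lm(y ~ A)
m.B <- lm(y ~ B)
test.A <- anova(m.B, m.full)  ## A, adjusted for B
test.B <- anova(m.A, m.full)  ## B, adjusted for A
test.A

## Same SS and F as the sequential line of the factor entered last
all.equal(test.A[2, "Sum of Sq"], anova(lm(y ~ B + A))["A", "Sum Sq"])
all.equal(test.A[2, "F"], anova(lm(y ~ B + A))["A", "F value"])
```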

HTH,
Andy


Douglas Bates

2005-Apr-20 14:06 UTC

Install the fortunes package and try
 > fortune("Venables")

I'm really curious to know why the "two types" of sum of squares are
called "Type I" and "Type III"! This is a very common misconception,
particularly among SAS users who have been fed this nonsense quite often
for all their professional lives. Fortunately the reality is much
simpler. There is, by any sensible reckoning, only ONE type of sum of
squares, and it always represents an improvement sum of squares of the
outer (or alternative) model over the inner (or null hypothesis) model.
What the SAS highly dubious classification of sums of squares does is to
encourage users to concentrate on the null hypothesis model and to
forget about the alternative. This is always a very bad idea and not
surprisingly it can lead to nonsensical tests, as in the test it
provides for main effects "even in the presence of interactions",
something which beggars definition, let alone belief.
    -- Bill Venables
       R-help (November 2000)

In the words of the master, "there is ... only one type of sum of 
squares", which is the one that R reports.  The others are awkward 
fictions created for times when one could only afford to fit one or two 
linear models per week and therefore wanted the output to give results 
for all possible tests one could conceive, even if the models being 
tested didn't make sense.
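To make the "improvement sum of squares" concrete, here is a small R sketch on simulated, hypothetical data. Each line of R's sequential table is the drop in residual SS from an inner model to the next outer one in a nested sequence; here that drop is computed by hand with deviance(), which returns the residual SS of an lm fit.

```r
set.seed(1)
## Hypothetical unbalanced 2 x 2 data (cell sizes 3, 3, 4, 5)
A <- factor(rep(c("a1", "a1", "a2", "a2"), times = c(3, 3, 4, 5)))
B <- factor(rep(c("b1", "b2", "b1", "b2"), times = c(3, 3, 4, 5)))
y <- 10 + (A == "a2") + 0.5 * (B == "b2") + rnorm(length(A))

## A nested sequence of models, from inner (null) to outer (alternative)
m0  <- lm(y ~ 1)
mA  <- lm(y ~ A)
mAB <- lm(y ~ A + B)
mI  <- lm(y ~ A * B)

## Improvement SS = RSS(inner model) - RSS(outer model)
improve <- c(A     = deviance(m0)  - deviance(mA),
             B     = deviance(mA)  - deviance(mAB),
             "A:B" = deviance(mAB) - deviance(mI))

## These are exactly the lines of R's sequential anova table
seq.ss <- anova(mI)[c("A", "B", "A:B"), "Sum Sq"]
all.equal(unname(improve), seq.ss)
```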

michael watson (IAH-C)

2005-Apr-20 14:18 UTC

Thanks for the response.  Answers to your questions in turn:

My null hypothesis is that there is no difference between the treatment
means.  I guess that makes my alternative that there is a difference.

I understand all about interactions, and yes, there's an interaction
term in my model.  Moreover, it is a pretty easy interaction to
understand and interpret.  In this example, yes, the interaction term
is significant, and so I know I can and should interpret only this term
and not any of the lower-order terms.

However, I will be repeating this analysis for other response variables,
some of which inevitably will not have a significant interaction term.
What then?  I guess one answer would be to say that as it's not
significant, I could remove it from the model and perform some model
comparisons as you suggest?

Doug agrees with the guy who taught me stats that I should only be
looking at the type I sequential sums of squares.  I also like that, as
it is what comes out of R.  It's just that Minitab freaked me out.

I guess what I want to know is if I use the type I sequential SS, as
reported by R, on my factorial anova which is unbalanced, am I doing
something horribly wrong?  I think the answer is no.  

I guess I could use drop1() to get from the type I to the type III in
R...
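A sketch of what drop1() does with its default scope, on simulated, hypothetical data. The default scope respects marginality, so for y ~ A * B only the interaction is offered for deletion, and its test matches the last line of the sequential table.

```r
set.seed(1)
## Hypothetical unbalanced 2 x 2 data (cell sizes 3, 3, 4, 5)
A <- factor(rep(c("a1", "a1", "a2", "a2"), times = c(3, 3, 4, 5)))
B <- factor(rep(c("b1", "b2", "b1", "b2"), times = c(3, 3, 4, 5)))
y <- 10 + (A == "a2") + 0.5 * (B == "b2") + rnorm(length(A))
fit <- lm(y ~ A * B)

## Default scope: only terms whose removal respects marginality are tried,
## so the main effects A and B are not dropped while A:B is in the model
d1 <- drop1(fit, test = "F")
d1

## The single F test is the same as the interaction line of anova(fit)
all.equal(d1["A:B", "Sum of Sq"], anova(fit)["A:B", "Sum Sq"])
all.equal(d1["A:B", "F value"], anova(fit)["A:B", "F value"])
```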


michael watson (IAH-C)

2005-Apr-20 14:37 UTC

I guess the real problem is this:

As I have a different number of observations in each of the groups, the
results *change* depending on the order in which I specify the factors
in the model.  This unnerves me.  With a completely balanced design this
doesn't happen: the results are the same no matter which order I
specify the factors in.

This is the reason I have been given for using the so-called type III
adjusted sums of squares...
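The order dependence can be reproduced on simulated data (the names and effect sizes are hypothetical; the 3, 3, 4, 5 cell sizes follow the first post). With unequal cell counts the SS for A depends on whether A is entered first (A ignoring B) or second (A eliminating B), while the residual SS does not; with four observations per cell the two orders agree.

```r
set.seed(1)
## Hypothetical unbalanced 2 x 2 data (cell sizes 3, 3, 4, 5)
A <- factor(rep(c("a1", "a1", "a2", "a2"), times = c(3, 3, 4, 5)))
B <- factor(rep(c("b1", "b2", "b1", "b2"), times = c(3, 3, 4, 5)))
y <- 10 + (A == "a2") + 0.5 * (B == "b2") + rnorm(length(A))

ab <- anova(lm(y ~ A + B))   # A first: A ignoring B
ba <- anova(lm(y ~ B + A))   # A last:  A eliminating B
c(A.first = ab["A", "Sum Sq"], A.last = ba["A", "Sum Sq"])

## The residual SS and the total model SS do not depend on the order
all.equal(ab["Residuals", "Sum Sq"], ba["Residuals", "Sum Sq"])
all.equal(sum(ab[["Sum Sq"]]), sum(ba[["Sum Sq"]]))

## Balanced version (4 observations per cell): the order no longer matters
Ab <- factor(rep(c("a1", "a2"), each = 8))
Bb <- factor(rep(c("b1", "b2"), times = 8))
yb <- rnorm(16)
all.equal(anova(lm(yb ~ Ab + Bb))["Ab", "Sum Sq"],
          anova(lm(yb ~ Bb + Ab))["Ab", "Sum Sq"])
```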

Mick


michael watson (IAH-C)

2005-Apr-21 08:51 UTC

OK, I had no idea I was opening such a Pandora's box, but thank you for
all of your answers; it's been fascinating reading.

This is how far I have got:

I will fit the most complex model, that is the one that includes the
interaction term.  If the interaction term is significant, I will only
interpret this term.

If the interaction term is not significant, then it makes sense to test
the effects of the factors on their own.  This is where I get a little
shaky.  Using the example on page 14 of the WNV paper: if I want to
test for the effect of Litter, given that I have already decided that
there is no interaction term, I can fit:

Wt ~ Mother + Litter
Wt ~ Litter + Mother
Wt ~ Litter

The latter tests for the effect of Litter ignoring the effect of Mother.
The first two test for the effect of Litter eliminating the effect of
Mother.  Have I read that correctly?  However, it still remains that the
top two give different results due to the non-orthogonal design.
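A sketch of the three fits on simulated stand-in data; the factor layout and the weights are made up and are not the data from the paper. The same calls should also run on the genotype data frame shipped with the MASS package, which has columns Litter, Mother and Wt. The checks show which term each Litter line is adjusted for.

```r
set.seed(2)
## Hypothetical non-orthogonal layout: 20 observations, every
## Litter x Mother combination present, but with unequal counts
Litter <- factor(rep(c("A", "B", "I", "J"), times = c(6, 5, 5, 4)))
Mother <- factor(rep(c("A", "B", "I", "J"), length.out = 20))
Wt <- 50 + 2 * as.integer(Mother) + rnorm(20, sd = 3)  # made-up weights

t.ML <- anova(lm(Wt ~ Mother + Litter))  # Litter line: Litter eliminating Mother
t.LM <- anova(lm(Wt ~ Litter + Mother))  # Litter line: Litter ignoring Mother
t.L  <- anova(lm(Wt ~ Litter))           # Litter line: Litter ignoring Mother

## A sequential table adjusts each term only for the terms listed before
## it, so Litter entered first gives the same SS as Litter on its own
all.equal(t.LM["Litter", "Sum Sq"], t.L["Litter", "Sum Sq"])

## Litter entered after Mother: RSS(Mother) - RSS(Mother + Litter)
all.equal(t.ML["Litter", "Sum Sq"],
          deviance(lm(Wt ~ Mother)) - deviance(lm(Wt ~ Mother + Litter)))
```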

The way I see it I can do a variety of things when the interaction term
is NOT significant and I have a non-orthogonal design:

1) Run both models "Wt ~ Mother + Litter" and "Wt ~ Litter + Mother"
and take the consensus opinion.  If that's the case, which p-values do I
use in my paper? (That's not as flippant a remark as it sounds...)
2) Run both models "Wt ~ Litter" and "Wt ~ Mother", and use those.  Is
that valid?
3) Believe Minitab that I should use type III SS: change my contrast
matrices to sum to zero and use drop1(model, .~., test="F").
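A sketch of option 3 on simulated, hypothetical data. Rather than changing the global options(contrasts = ...), it passes sum-to-zero contrasts to lm() through its contrasts argument. With the scope . ~ ., drop1() deletes every term in turn, main effects included, which ignores marginality.

```r
set.seed(1)
## Hypothetical unbalanced 2 x 2 data (cell sizes 3, 3, 4, 5)
A <- factor(rep(c("a1", "a1", "a2", "a2"), times = c(3, 3, 4, 5)))
B <- factor(rep(c("b1", "b2", "b1", "b2"), times = c(3, 3, 4, 5)))
y <- 10 + (A == "a2") + 0.5 * (B == "b2") + rnorm(length(A))

## Sum-to-zero contrasts set per fit, not globally
ctr <- list(A = "contr.sum", B = "contr.sum")
fit.AB <- lm(y ~ A * B, contrasts = ctr)
fit.BA <- lm(y ~ B * A, contrasts = ctr)

## Scope . ~ . makes drop1() remove each term (A, B and A:B) in turn
d.AB <- drop1(fit.AB, . ~ ., test = "F")
d.BA <- drop1(fit.BA, . ~ ., test = "F")
d.AB

## These lines do not depend on the order of the factors in the formula ...
all.equal(d.AB["A", "Sum of Sq"], d.BA["A", "Sum of Sq"])
## ... and the interaction line equals the sequential one
all.equal(d.AB["A:B", "Sum of Sq"], anova(fit.AB)["A:B", "Sum Sq"])
```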

Many thanks

Mick

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] 
Sent: 20 April 2005 16:35
To: michael watson (IAH-C)
Cc: Liaw, Andy; r-help at stat.math.ethz.ch
Subject: RE: [R] Anova - adjusted or sequential sums of squares?

On Wed, 20 Apr 2005, michael watson (IAH-C) wrote:
> I guess what I want to know is if I use the type I sequential SS, as 
> reported by R, on my factorial anova which is unbalanced, am I doing 
> something horribly wrong?  I think the answer is no.
Sort of.  You really should test a hypothesis at a time.  See Bill's 
examples in MASS.
> I guess I could use drop1() to get from the type I to the type III in 
> R...
Only if you respect marginality.  The quote Doug gave is based on a
longer paper available at

http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

Do read it all.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Lucke, Joseph F

2005-Apr-21 14:28 UTC

Assume Type I SS and no interaction.  Call Model 1 Wt ~ Mother + Litter,
Model 2 Wt ~ Litter + Mother, and Model 3 Wt ~ Litter.

Under Model 1, your total sum of squares (SS) is partitioned into SS(M),
SS(L|M), SS(E1|L,M).  In Model 2 it is SS(L), SS(M|L), SS(E2|L,M).  The
total SS in Models 1 & 2 is the same, and SS(E1|L,M) = SS(E2|L,M).  [If
the design had been orthogonal, then also SS(M) = SS(M|L) and
SS(L) = SS(L|M).]  In Model 3 it is SS(L), SS(E3|L).  Now
SS(E3|L) = SS(M|L) + SS(E2|M,L).
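These identities can be checked numerically. A sketch on simulated stand-in data (the Litter x Mother layout and the weights are hypothetical), with Models 1, 2 and 3 taken as Wt ~ Mother + Litter, Wt ~ Litter + Mother and Wt ~ Litter:

```r
set.seed(2)
## Hypothetical non-orthogonal Litter x Mother layout (all cells present)
Litter <- factor(rep(c("A", "B", "I", "J"), times = c(6, 5, 5, 4)))
Mother <- factor(rep(c("A", "B", "I", "J"), length.out = 20))
Wt <- 50 + 2 * as.integer(Mother) + rnorm(20, sd = 3)  # made-up weights

t1 <- anova(lm(Wt ~ Mother + Litter))  # Model 1: SS(M), SS(L|M), SS(E1)
t2 <- anova(lm(Wt ~ Litter + Mother))  # Model 2: SS(L), SS(M|L), SS(E2)
t3 <- anova(lm(Wt ~ Litter))           # Model 3: SS(L), SS(E3)

## Same total SS and same residual SS in Models 1 and 2
all.equal(sum(t1[["Sum Sq"]]), sum(t2[["Sum Sq"]]))
all.equal(t1["Residuals", "Sum Sq"], t2["Residuals", "Sum Sq"])

## SS(E3|L) = SS(M|L) + SS(E2|M,L)
all.equal(t3["Residuals", "Sum Sq"],
          t2["Mother", "Sum Sq"] + t2["Residuals", "Sum Sq"])
```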

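If you want to test the _unconditional_ effect of Litter (ignoring
Mother), use SS(L), the Litter line of Model 2 (Model 3 gives the same
SS, but with a different residual error term).  Comparing Model 1 to
Model 3 (using drop1(), for example) instead tests the conditional
effect of Mother, SS(M|L).  If you want to test the _conditional_ effect
of Litter (Litter effect adjusted for Mother effect), you run Model 1 and
test the main effect of Litter (=Litter|Mother).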

These are the same concepts as found in regression.

Joe

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
