thr3ads.net - R help - [R] drop1, 2-way Unbalanced ANOVA [Jul 2012]

Nathan Miller

2012-Jul-23 16:58 UTC

[R] drop1, 2-way Unbalanced ANOVA

Hi all,

I've spent quite a lot of time searching through the help lists and reading
about how best to perform a 2-way ANOVA with unbalanced data. I realize
this has been covered a great deal, so I was trying to avoid adding yet
another entry to the long list considering the use of different SS, etc.
Unfortunately, I have come to the point where I feel I have to wade in and
see if someone can help me out. Hopefully I'll phrase this properly and it
will end up requiring only a simple response.

I have an experiment where I have measured a response variable (such as
water content) following exposure to two treatments ("oxygen content" and
"medium"). Oxygen content has three levels (5, 20, 35) and medium has two
levels (Air, Water). I am interested in whether water content differs under
the two treatments and whether the effect of oxygen content depends upon
the medium in which the experiment was conducted (Air or Water).

Unfortunately, the design is unbalanced as some experimental subjects had
to be removed from the experiment.

I realize that if I just use aov() to perform a two-way ANOVA the order in
which the terms ("oxygen content" and "medium") are entered will give
different results because of the sequential SS.

What I have done in the past is utilize drop1() in conjunction with aov()

drop1(aov(WaterContent~Oxygen*Medium, data), test="F")

to see if the interaction term was significant (F, p-value) and if its
inclusion improved model fit (AIC). If from this I determine that the
interaction term can be removed and the model can be rerun without it, I am
able to test for main-effects and get F and p-values that I can report in a
manuscript.

However, if the interaction term is significant and its inclusion is
warranted, drop1() only provides me with SS, F, and p-value for the
interaction term. Now this is fine, because I do not wish to interpret the
main-effects with a significant interaction, but in a manuscript reviewers
will request an "ANOVA table" where I will be asked to report SS, F and
p-values for the other terms. I don't have those because I used drop1()
which only provides these for the highest order term in the model.
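
A minimal sketch of the behaviour described above, on simulated data. The
data frame `dat`, its cell counts, and the noise level are assumptions made
up for illustration; only the variable names come from the post. drop1()
respects marginality, so for the full model it reports only the interaction,
while for the additive model it reports both main effects.

```r
# Hypothetical unbalanced 3 x 2 design (cell counts are made up)
set.seed(42)
n <- c(6, 4, 5, 3, 6, 4)
dat <- data.frame(
  Oxygen = factor(rep(c(5, 20, 35, 5, 20, 35), times = n)),
  Medium = factor(rep(rep(c("Air", "Water"), each = 3), times = n))
)
dat$WaterContent <- rnorm(nrow(dat), mean = 70, sd = 2)

# Full model with interaction, and the additive model without it
full     <- aov(WaterContent ~ Oxygen * Medium, data = dat)
additive <- update(full, . ~ . - Oxygen:Medium)

# Only the interaction can be dropped from the full model;
# both main effects can be dropped from the additive model
d_full <- drop1(full, test = "F")
d_add  <- drop1(additive, test = "F")
print(d_full)
print(d_add)
```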

How best should I calculate the values that I know I will be asked to
provide in a manuscript?

I don't wish to come across as a scientist who is simply a slave to the F
and p-values with little regard for the data, the hypotheses, and the
actual statistical interpretation. I am interested in doing this "right",
but I also know that practically in the current status of our field, while
I focus on doing statistics that address my hypotheses of interest and can
choose to not discuss the main effects in isolation when an interaction
exists, I will be asked to provide the "ANOVA table" with all the degrees
of freedom, SS, F-values, p-values, etc., for the entire model, not just the
highest order term.

Can anyone provide advice here? Should I just use the car package and Type
III SS with an appropriate contrast and not use the drop1() function, even
though I'm really not interested in using the Type III SS and I kinda like
the drop1()? I am not opposed to Type II SS, but clearly if the interaction
is important then using Type II SS, which do not consider interactions, is
not appropriate.

Hopefully this is somewhat clear and doesn't simply sound like a rehashing
of the same old "ANOVA and SS" story. Maybe I should be doing something
completely different.

I greatly appreciate constructive comments.

Thanks,

Nate

John Fox

2012-Jul-23 17:52 UTC

[R] drop1, 2-way Unbalanced ANOVA

Dear Nate,

I don't want to repeat everything that I previously said on this subject,
both on this email list and more generally, but briefly: The reason to do
so-called type-II tests is that they're maximally powerful for the main
effects if the interactions are nil, which is precisely the circumstance in
which main effects are generally of interest. The argument for doing so-called
type-III tests, properly formulated of course, is that they are valid (if not
maximally powerful) when the interactions are nil and also test hypotheses that
can be constructed to be about "main effects" (i.e., the effect of
each factor averaged over the levels of the other) when interactions are not
nil. I personally prefer the first approach. If you use drop1(), removing the
interaction after testing it, you'll get something similar to a type-II
test, but pooling the interaction into the estimate of error variance.
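
The last point can be checked numerically. The following is a sketch on
simulated data (the data frame `dat`, the helper `rss()`, and all numbers are
assumptions for illustration, not part of the thread). It computes the type-II
sum of squares for Oxygen as a difference in residual SS, confirms that
drop1() on the additive model returns the same SS, and shows that the two F
statistics differ only in the error term: the additive model's residual
absorbs the interaction SS and its degrees of freedom.

```r
# Hypothetical unbalanced 3 x 2 design (cell counts are made up)
set.seed(42)
n <- c(6, 4, 5, 3, 6, 4)
dat <- data.frame(
  Oxygen = factor(rep(c(5, 20, 35, 5, 20, 35), times = n)),
  Medium = factor(rep(rep(c("Air", "Water"), each = 3), times = n))
)
dat$WaterContent <- rnorm(nrow(dat), mean = 70, sd = 2)

full     <- lm(WaterContent ~ Oxygen * Medium, data = dat)
additive <- lm(WaterContent ~ Oxygen + Medium, data = dat)
rss <- function(m) sum(residuals(m)^2)  # residual sum of squares

# Type-II SS for Oxygen: Oxygen adjusted for Medium, ignoring the interaction
ss_oxygen <- rss(lm(WaterContent ~ Medium, data = dat)) - rss(additive)

# drop1() on the additive model gives the same SS ...
d_add <- drop1(additive, test = "F")
same_ss <- all.equal(d_add["Oxygen", "Sum of Sq"], ss_oxygen)

# ... but its F uses the pooled error (residual + interaction),
# whereas a type-II F uses the residual of the full model
df_oxygen <- d_add["Oxygen", "Df"]
f_pooled  <- d_add["Oxygen", "F value"]
f_type2   <- (ss_oxygen / df_oxygen) / (rss(full) / df.residual(full))

# The pooled error is the full-model residual plus the interaction SS
ss_interaction <- drop1(full, test = "F")["Oxygen:Medium", "Sum of Sq"]
print(c(f_pooled = f_pooled, f_type2 = f_type2))
```

If the car package is available, Anova(full, type = 2) should report the
type-II version directly; that call is mentioned here as a pointer and is not
run in the sketch.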

I hope this helps,
 John


peter dalgaard

2012-Jul-23 18:10 UTC

[R] drop1, 2-way Unbalanced ANOVA

(1) Do yourself a favor and restrict aov() to balanced analyses. It probably
won't do anything badly wrong if there is only one error term, but it offers
little above plain lm() in those cases.

(2) Give the reviewers what they want, unless clearly ridiculous. In this case
it probably isn't. (For one thing, some may want to disregard a weakly
significant interaction if one or both tests of main effects are not
significant. Also, stating the df etc. gives some indication that the author
knows what he is doing and guards against silliness like using category codes as
numerical.)

I'd avoid Type-III SS since they have fooled me so often. If you can get
away with it, see if they'll accept the _two_ Type-I tables corresponding to

anova(lm(WaterContent ~ Oxygen * Medium, data), test="F")
anova(lm(WaterContent ~ Medium * Oxygen, data), test="F")

If that doesn't work, you can consider selecting the Type-I table that is
most relevant, e.g., if "everybody knows" that there is a Medium effect, then
ensure that it enters the model first. Alternatively, you can give Type II SS
for the main effects, "Medium | Oxygen" and "Oxygen | Medium". In both cases,
it would be appropriate to insert a note that the table does not tell the
whole story because of unbalancedness.
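
As a sketch of how these tables relate (simulated data; `dat` and every number
in it are assumptions for illustration): in each Type-I table the main effect
entered second is already adjusted for the other one, so the "Medium | Oxygen"
and "Oxygen | Medium" rows can be picked out of the two sequential tables. The
interaction and residual rows are the same in both orders.

```r
# Hypothetical unbalanced 3 x 2 design (cell counts are made up)
set.seed(42)
n <- c(6, 4, 5, 3, 6, 4)
dat <- data.frame(
  Oxygen = factor(rep(c(5, 20, 35, 5, 20, 35), times = n)),
  Medium = factor(rep(rep(c("Air", "Water"), each = 3), times = n))
)
dat$WaterContent <- rnorm(nrow(dat), mean = 70, sd = 2)

# The two sequential (Type-I) tables, as plain data frames
t1 <- as.data.frame(anova(lm(WaterContent ~ Oxygen * Medium, data = dat)))
t2 <- as.data.frame(anova(lm(WaterContent ~ Medium * Oxygen, data = dat)))

# Rows for the main effects adjusted for each other, plus the
# order-independent interaction and residual rows
type2 <- rbind(
  t2["Oxygen", ],         # Oxygen | Medium
  t1["Medium", ],         # Medium | Oxygen
  t1["Oxygen:Medium", ],  # interaction
  t1["Residuals", ]
)
print(t1)
print(t2)
print(type2)
```

With unbalanced data the SS in `type2` need not add up to the total SS, which
is one reason a note alongside the table is useful.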

However, first and foremost: If there is an interaction, you need to spend some
time explaining what its nature is: Simple effect modification, only effect of
say Oxygen in some Media, complete effect reversion or....
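
One simple way to start describing an interaction is to look at the cell
means and at the effect of one factor within each level of the other. The
sketch below uses simulated data in which, by construction, Oxygen matters
only in Water; the data frame `dat`, the effect sizes, and the cell counts are
all assumptions for illustration.

```r
# Hypothetical unbalanced 3 x 2 design with a built-in interaction:
# Oxygen raises the response in Water only (effect sizes are made up)
set.seed(42)
n <- c(6, 4, 5, 3, 6, 4)
dat <- data.frame(
  Oxygen = factor(rep(c(5, 20, 35, 5, 20, 35), times = n)),
  Medium = factor(rep(rep(c("Air", "Water"), each = 3), times = n))
)
shift <- rep(c(0, 0, 0, 0, 3, 6), times = n)  # per-cell true shift
dat$WaterContent <- 70 + shift + rnorm(nrow(dat), sd = 1)

# Cell counts and cell means (rows: Oxygen, columns: Medium)
cell_n     <- with(dat, table(Oxygen, Medium))
cell_means <- with(dat, tapply(WaterContent, list(Oxygen, Medium), mean))

# Effect of Medium (Water minus Air) within each Oxygen level
medium_effect <- cell_means[, "Water"] - cell_means[, "Air"]

print(cell_n)
print(round(cell_means, 2))
print(round(medium_effect, 2))
```

Base R's interaction.plot(dat$Oxygen, dat$Medium, dat$WaterContent) draws the
same cell means as lines, which often makes the pattern easier to describe.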

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
