Hi,

I have an unbalanced dataset on which I would like to perform a one-way ANOVA in R (aov). According to Wonnacott and Wonnacott (1990), p. 333, one-way ANOVA with unbalanced data is possible with a few modifications to the ANOVA calculations: they should take into account the different sample sizes and use a modified definition of the average. Is the aov function in R suitable for one-way ANOVA on unbalanced data?

Thanks,
Ake Nauta
On Thu, Feb 28, 2008 at 7:52 AM, Nauta, A.L. <A.L.Nauta at students.uu.nl> wrote:
> I was wondering if the aov-function in R is suitable for one-way anova on
> unbalanced data.

Yes.

The analysis of variance is performed in R by fitting a linear model created from indicator variables for the levels of the factor. The validity of this approach does not depend on balance in the data.

The formulas given in an introductory textbook are almost never the way that results are computed in practice. I think we would all be better off if they didn't even give these misleading formulas.
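To illustrate the reply above, here is a minimal sketch with made-up data. The data frame `d`, its columns `g` and `y`, and the group sizes 3, 5 and 8 are assumptions for illustration only, not taken from the original question. It fits the same unbalanced one-way layout with `aov()` and with `lm()`, and checks the result against the textbook between-group sum of squares that weights each group by its sample size.

```r
# Hypothetical unbalanced data: three groups with 3, 5 and 8 observations.
set.seed(1)
sizes <- c(3, 5, 8)
d <- data.frame(
  g = factor(rep(c("a", "b", "c"), times = sizes)),
  y = rnorm(sum(sizes), mean = rep(c(0, 0.5, 1), times = sizes))
)

# aov() and lm() fit the same linear model.
fit_aov <- aov(y ~ g, data = d)
fit_lm  <- lm(y ~ g, data = d)

# The design matrix: an intercept plus indicator columns for levels b and c.
X <- model.matrix(fit_lm)
print(head(X))

# ANOVA tables from both routes.
tab_aov <- summary(fit_aov)[[1]]
tab_lm  <- anova(fit_lm)
print(tab_lm)

# Textbook formula for unequal group sizes: each squared deviation of a
# group mean from the grand mean is weighted by that group's sample size.
n_i   <- tapply(d$y, d$g, length)
m_i   <- tapply(d$y, d$g, mean)
grand <- mean(d$y)  # equals the size-weighted mean of the group means
ss_between <- sum(n_i * (m_i - grand)^2)
print(ss_between)
```

In this sketch the sum of squares for `g` reported by both tables should agree with `ss_between`, which is one way to see that no special handling is needed for unequal group sizes in the one-way case.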
On Fri, Feb 29, 2008 at 4:47 AM, Nauta, A.L. <A.L.Nauta at students.uu.nl> wrote:
> Thank you for your reply,
> is your answer (that the approach does not depend on balance in the data)
> only valid for one-way anova, or also for two-way or more-way anova?

Any kind.

You should be aware that for unbalanced data sets the sum of squares attributed to a term depends on the order in which the terms occur in the model. That is, the sum of squares, the F-ratio and the p-value for, say, factor A will be different if you fit the model

y ~ A + B

versus the model

y ~ B + A

to a data set where factors A and B are unbalanced.

This is because the sums of squares displayed by R's anova methods are the sequential sums of squares. Although other statistical software may calculate other, more exotic, types of sums of squares, many of us would argue that the sequential ones are the only ones that make sense.

If in doubt about which sum of squares to use, the general rule is that you should only pay attention to the F ratio and p-value for the last term in the model.
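The order dependence described above can be checked with a small sketch. The data here are made up: the factor levels, the cell counts in `counts`, and the simulated response `y` are all assumptions for illustration. The sketch fits `y ~ A + B` and `y ~ B + A` to the same unbalanced data and compares the sequential sums of squares. It also uses `drop1()`, which tests each term as if it had been entered last.

```r
# Hypothetical unbalanced two-factor layout with unequal cell counts.
set.seed(2)
cells  <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2", "b3"))
counts <- c(2, 6, 5, 3, 7, 1)  # observations per (A, B) cell
d <- cells[rep(seq_len(nrow(cells)), times = counts), ]
d$A <- factor(d$A)
d$B <- factor(d$B)
d$y <- rnorm(nrow(d)) + as.integer(d$A) + 0.5 * as.integer(d$B)

# Same model, terms entered in a different order.
ab <- anova(lm(y ~ A + B, data = d))
ba <- anova(lm(y ~ B + A, data = d))

# Sequential sum of squares for A: entered first vs. adjusted for B.
print(ab["A", "Sum Sq"])
print(ba["A", "Sum Sq"])

# The residual sum of squares does not depend on the order.
print(c(ab["Residuals", "Sum Sq"], ba["Residuals", "Sum Sq"]))

# drop1() reports, for each term, the sum of squares it would have
# if it were the last term in the model.
d1 <- drop1(lm(y ~ A + B, data = d), test = "F")
print(d1)
```

In this sketch the row for `B` in `ab` and the row for `A` in `ba` should match the corresponding rows of `d1`, which corresponds to the "last term in the model" rule given above.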
On Fri, Feb 29, 2008 at 10:32 AM, Nauta, A.L. <A.L.Nauta at students.uu.nl> wrote:
> I tried a 6-way anova, and indeed found out that changing the order of
> factors influences the SS, F-ratios and p-values. So what should I do if I
> want to know which factor most strongly rejects H0? (H0 is the hypothesis of
> "no difference" in the population means.) Would it be better to do 6 one-way
> anovas (one on each factor) and then compare the p-values?

No.

If you are going to try to perform a 6-way anova on an unbalanced data set, you should either read more about the analysis of variance, so that you can understand the model and the hypotheses involved, or ask a statistical consultant. This is not a topic that can be explained in a couple of email messages.

You may find Bill Venables' paper "Exegeses on Linear Models" (do an internet search on the title to find a copy) a good starting point.