Dear members,

I have a question on ANOVA as implemented in R.

If a multifactor ANOVA has an unbalanced design, will aov or lm work properly? I was reading a book on Excel in which the author points out that, in an unbalanced design, the factors, as coded vectors, are correlated. He says that variance is allocated properly only when the coded vectors are uncorrelated, but he also argues that Excel's TREND() function handles this automatically using semipartial correlations.

What about aov or lm in R, which are used to implement ANOVA? Do we need to do something extra for them to work properly in an unbalanced design? Or does the coding system that R uses internally to represent factors and their levels handle the correlation?

Thanking you,
Yours sincerely,
AKSHAY M KULKARNI
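The claim about correlated coded vectors can be checked directly in R. The following is a minimal sketch, assuming a 2x2 layout with made-up cell counts; the factor names `A` and `B` and the helper `make_design()` are hypothetical and only serve the illustration. It builds the dummy columns that R's default treatment contrasts produce and compares their correlation with and without balance.

```r
# Hypothetical helper: builds a 2x2 layout from four cell counts.
# Factor names A and B are made up for illustration.
make_design <- function(n11, n21, n12, n22) {
  counts <- c(n11, n21, n12, n22)
  data.frame(
    A = factor(rep(c("a1", "a2", "a1", "a2"), times = counts)),
    B = factor(rep(c("b1", "b1", "b2", "b2"), times = counts))
  )
}

balanced   <- make_design(3, 3, 3, 3)  # equal cell counts
unbalanced <- make_design(5, 1, 2, 4)  # unequal cell counts (assumed values)

# Dummy columns as built by R's default treatment contrasts
X_bal <- model.matrix(~ A + B, data = balanced)
X_unb <- model.matrix(~ A + B, data = unbalanced)

# Correlation between the coded vector for A and the coded vector for B
cor_bal <- cor(X_bal[, "Aa2"], X_bal[, "Bb2"])  # zero up to rounding
cor_unb <- cor(X_unb[, "Aa2"], X_unb[, "Bb2"])  # clearly nonzero

print(c(balanced = cor_bal, unbalanced = cor_unb))
```

The coding itself does not remove the correlation; it is a property of the cell counts, which is why the allocation of variance between the factors becomes ambiguous without balance.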
In brief: aov() requires balance (or, failing that, you _really_ need to know what you are doing), while lm() does not. With lm(), however, you need to be careful: as in any multiple regression, the results depend on the order in which the terms are tested. For models with random effects, things get tricky and you will likely need the "lme4" package.

- Peter D.

> On 18 Jan 2022, at 08:14 , akshay kulkarni <akshay_e4 at hotmail.com> wrote:
>
> If there is an unbalanced design in multifactor anova, will aov or lm work properly?
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
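The dependence on test order can be seen in a small sketch. It assumes simulated data in an unbalanced 2x2 layout; the names `A`, `B`, `y`, the cell counts, and the effect sizes are all made up for illustration. `anova()` on an `lm` fit reports sequential sums of squares, so swapping the terms changes the table, whereas `drop1()` adjusts each term for the other and does not depend on the order.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical unbalanced 2x2 data: cell counts 5, 2, 1, 4 (assumed values)
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(7, 5))),
  B = factor(c(rep("b1", 5), rep("b2", 2), rep("b1", 1), rep("b2", 4)))
)
# Simulated response with made-up effects of A and B
d$y <- rnorm(nrow(d)) + (d$A == "a2") + 0.5 * (d$B == "b2")

fit_ab <- lm(y ~ A + B, data = d)
fit_ba <- lm(y ~ B + A, data = d)

tab_ab <- anova(fit_ab)  # A first, then B adjusted for A
tab_ba <- anova(fit_ba)  # B first, then A adjusted for B
print(tab_ab)
print(tab_ba)

# Order-independent view: each term adjusted for the other
adj <- drop1(fit_ab, test = "F")
print(adj)
```

Both fits have the same coefficients and residual sum of squares; only the split of the explained variation between `A` and `B` in the sequential tables differs. The last line of each `anova()` table agrees with the corresponding `drop1()` row.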
Please read ?aov and ?lm (and ?anova.lm). This should ordinarily be your first port of call before posting here. The former explicitly says:

"aov is designed for balanced designs, and the results can be hard to interpret without balance: beware that missing values in the response(s) will likely lose the balance. If there are two or more error strata, the methods used are statistically inefficient without balance, and it may be better to use lme in package nlme.

Balance can be checked with the replications function."

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Jan 17, 2022 at 11:14 PM akshay kulkarni <akshay_e4 at hotmail.com> wrote:
>
> If there is an unbalanced design in multifactor anova, will aov or lm work properly?
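The balance check mentioned in the quoted help text can be sketched as follows. The data frames here are hypothetical (factor names `A` and `B`, three replicates per cell, and the dropped rows are all assumptions for illustration). `replications()` returns a plain vector when every term is balanced and a list otherwise, so `!is.list(...)` serves as a quick test.

```r
# Hypothetical balanced 2x2 layout with 3 replicates per cell
balanced <- expand.grid(
  A   = factor(c("a1", "a2")),
  B   = factor(c("b1", "b2")),
  rep = 1:3
)

# Drop a few rows (arbitrary choice) to break the balance
unbalanced <- balanced[-c(1, 2, 5), ]

rep_bal <- replications(~ A * B, data = balanced)
rep_unb <- replications(~ A * B, data = unbalanced)

print(rep_bal)  # a vector: same number of replications at every level
print(rep_unb)  # a list: replications differ between levels

is_balanced <- function(formula, data) !is.list(replications(formula, data))
is_balanced(~ A * B, balanced)
is_balanced(~ A * B, unbalanced)
```

Running such a check before calling `aov()` makes it clear whether the caveat in the help page applies to the data at hand.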
On Tue, 18 Jan 2022 07:14:17 +0000
akshay kulkarni <akshay_e4 at hotmail.com> wrote:

<SNIP>
> I was reading a book on excel ....
<SNIP>

Don't!!!

Clearly you are way out of your depth. Seek local advice from a qualified mathematical statistician.

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276