John Christie

2003-Jul-12 00:20 UTC

### [R] SS's are incorrect from aov with multiple factors

Hi, I have been trying to work with the error terms given back from aov to make confidence intervals. However, the numbers seem to be incorrect. If there is more than one term in the ANOVA, then the error terms can be inflated by the number of factors in the extra terms. The F's are correct, so the problem traces right back to the SS. I was wondering if this is standard practice for stats programs or unique to R?

On 11 Jul 03, at 21:20, John Christie wrote:

> I have been trying to work with error terms given back from aov to
> make confidence intervals. However, the numbers seem to be incorrect. [...]

In what sense are the SSs "incorrect", exactly? And what do you think the "correct" values should be?

---JRG

John R. Gleason
Associate Professor

Syracuse University
430 Huntington Hall          Voice: 315-443-3107
Syracuse, NY 13244-2340 USA  FAX: 315-443-4085

PGP public key at keyservers

John Christie

2003-Jul-12 00:57 UTC

### [R] SS's are incorrect from aov with multiple factors

On Friday, July 11, 2003, at 09:38 PM, JRG wrote:

> In what sense are the SSs "incorrect", exactly? And what do you think
> the "correct" values should be?

Well, if I take the residuals for one of the main effects, I should be able to calculate a confidence interval from them that bears some relationship to the actual values for that effect, so that an accurate plot can be made. The transformation isn't too hard; I just need to divide them by the number of factors in the other term. But my recollection is that this isn't true for other stats packages, and it may lead people who are new and trying to calculate them for the first time to incorrect conclusions. The reported values should be corrected. Although, my recollection could be wrong.

Spencer Graves

2003-Jul-12 01:00 UTC

### [R] SS's are incorrect from aov with multiple factors

Dear John Christie:

People tend to get the quickest and most helpful responses when they provide a toy problem that produces what they think are anomalous results. This increases the chances that someone will be able to provide a sensible answer in the few seconds they have available for your question. Often, the process of preparing a toy problem leads them to an answer to their question.

Sorry I couldn't be more helpful.

spencer graves

John Christie

2003-Jul-12 02:33 UTC

### [R] SS's are incorrect from aov with multiple factors (EXAMPLE!)

OK, I do see that there is a problem in my first email. I have noticed this with repeated measures designs. Otherwise, of course, there is only one error term for all factors. But with repeated measures designs this is not the case.

On Friday, July 11, 2003, at 10:00 PM, Spencer Graves wrote:

> People tend to get the quickest and most helpful responses when
> they provide a toy problem that produces what they think are
> anomalous results

Here is an admittedly poor example with factors a and b and subjects s:

    a <- factor(rep(c(0, 1), 12))
    b <- factor(rep(c(0, 0, 1, 1), 6))
    s <- factor(rep(1:6, each = 4))
    x <- c(49.5, 62.8, 46.8, 57, 59.8, 58.5, 55.5, 56, 62.8, 55.8, 69.5,
           55, 62, 48.8, 45.5, 44.2, 52, 51.5, 49.8, 48.8, 57.2, 59, 53.2, 56)

Now

    summary(aov(x ~ a*b + Error(s/(a*b))))

gives a table of results. But if one wanted to generate a confidence interval for factor b, one needs to reanalyze the results thus:

    ss <- aggregate(x, list(s = s, b = b), mean)
    summary(aov(x ~ b + Error(s/b), data = ss))

This yields an error term half the size of that reported for b in the combined ANOVA. I would suggest that the way the SS and MSE are reported is erroneous, since one should be able to use them to directly calculate confidence intervals or make mean comparisons without having to collapse and reanalyze for every effect.

Furthermore, I am guessing that this problem makes it impossible to get a correct average MSE that includes the interaction term. OK, far from impossible, but very difficult to verify that the term is correct.

NOTE: F for b is the same in the first ANOVA and the second.

Peter Dalgaard BSA

2003-Jul-12 10:37 UTC

### [R] SS's are incorrect from aov with multiple factors (EXAMPLE!)

John Christie <jc at or.psychology.dal.ca> writes:

> OK, I do see that there is a problem in my first email. I have
> noticed this with repeated measures designs. [...]
>
> This yields an error term half the size of that reported for b in the
> combined ANOVA. I would suggest that the way the SS and MSE are
> reported is erroneous [...]
>
> NOTE: F for b is the same in the first ANOVA and the second.

As far as I can tell, yes, you get different results if you analyse the original data than if you collapse by taking means over the a factor, and no, you should not expect otherwise.
The various SS in the full analysis are distance measures in 24-dimensional space, whereas in the aggregated analysis you get a distance in 12-space. The relation is that every value entering into the b and s:b terms is duplicated in the former, hence the SS is twice as big.

This is standard procedure, and R does the same as e.g. Genstat in this respect. It is also necessary to ensure that the residual MS are comparable, e.g. so that you can test for a significant s:b random effect by comparing its residual MS with that of the s:a:b stratum.

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
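The duplication argument can be checked numerically with the toy data posted earlier in the thread; the following sketch just reruns both analyses side by side, with the factor-of-two relation noted in comments:

```r
# Toy data from the example: 2 x 2 within-subject design, 6 subjects.
a <- factor(rep(c(0, 1), 12))
b <- factor(rep(c(0, 0, 1, 1), 6))
s <- factor(rep(1:6, each = 4))
x <- c(49.5, 62.8, 46.8, 57, 59.8, 58.5, 55.5, 56, 62.8, 55.8, 69.5,
       55, 62, 48.8, 45.5, 44.2, 52, 51.5, 49.8, 48.8, 57.2, 59, 53.2, 56)

# Full analysis: SS are squared distances among all 24 observations.
summary(aov(x ~ a * b + Error(s/(a * b))))

# Aggregated analysis: averaging over the two levels of a leaves 12
# values; every value behind the b and s:b terms appears twice in the
# full analysis, so the b-stratum SS above are twice the SS below.
ss <- aggregate(x, list(s = s, b = b), mean)
summary(aov(x ~ b + Error(s/b), data = ss))

# The F for b, being a ratio of two SS scaled by the same factor,
# is identical in the two tables.
```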

John Christie

2003-Jul-12 22:47 UTC

### [R] SS's are incorrect from aov with multiple factors (EXAMPLE!)

On Saturday, July 12, 2003, at 07:40 AM, Peter Dalgaard BSA wrote:

> ... and no, you should not expect otherwise. The various SS in the
> full analysis are distance measures in 24-dim space, whereas in the
> aggregated analysis you get a distance in 12-space. [...]

OK, perhaps I need a little help then. Suppose I do an interaction plot of a*b and I want to see what it looks like with 95% CI error bars. Following Loftus & Masson (1995), there would be one of two ways. I could generate an error bar for the main effect I was interested in and stress in the description that the error bars only apply across that main effect; I take it from what you have said that I would collapse the data in order to generate a proper error bar for only one effect. Or, I could generate one from a weighted average of the MSE from a, b, and a:b. The question I have is: would I get each of the main effects in that from separate analyses?

BTW, Statview seems to generate the same MSE for me whether I collapse the data or not.
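One way to compute the pooled error bar being described is sketched below, in the spirit of Loftus & Masson (1995) but not a canonical implementation: pooling all three within-subject error strata, the stratum names, and the use of the subject count as the divisor are all choices made for illustration, using the toy data from earlier in the thread.

```r
# Sketch only: a pooled within-subject 95% CI half-width, assuming the
# toy data from the example.  Treat the pooling choice as an assumption.
a <- factor(rep(c(0, 1), 12))
b <- factor(rep(c(0, 0, 1, 1), 6))
s <- factor(rep(1:6, each = 4))
x <- c(49.5, 62.8, 46.8, 57, 59.8, 58.5, 55.5, 56, 62.8, 55.8, 69.5,
       55, 62, 48.8, 45.5, 44.2, 52, 51.5, 49.8, 48.8, 57.2, 59, 53.2, 56)

sm <- summary(aov(x ~ a * b + Error(s/(a * b))))

# Residual SS and df from each within-subject error stratum; the
# residual line is the last row of each stratum's ANOVA table.
strata <- c("Error: s:a", "Error: s:b", "Error: s:a:b")
pooled <- sapply(strata, function(nm) {
  tab <- sm[[nm]][[1]]
  unlist(tab[nrow(tab), c("Df", "Sum Sq")])
})
ms_pooled <- sum(pooled["Sum Sq", ]) / sum(pooled["Df", ])

# Half-width of a 95% CI for a cell mean, with one mean per subject
# behind each cell (6 subjects).
half <- qt(0.975, sum(pooled["Df", ])) * sqrt(ms_pooled / nlevels(s))
half
```

Whether the weighted (df-pooled) average across a, b, and a:b strata is appropriate for a given plot is exactly the judgment call raised in the message above; the code only shows the mechanics of extracting and pooling the residual lines.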