Larry A Sonna
2005-Sep-09 14:10 UTC
[R] Discrepancy between R and SPSS in 2-way, repeated measures ANOVA
Dear R community,

I am trying to resolve a discrepancy between the way SPSS and R handle 2-way, repeated measures ANOVA.

An experiment was performed in which samples were drawn before and after treatment of four groups of subjects (control and disease states 1, 2 and 3). Each group contained five subjects. An experimental measurement was performed on each sample to yield a "signal". The before and after treatment signals for each subject were treated as repeated measures. We desire to obtain P values for disease state ("CONDITION"), and the interaction between signal over time and disease state ("CONDITION*TIME").

Using SPSS, the following output was obtained:

            DF   SumSq (Type 3)   Mean Sq   F value   P
COND         3   42861            14287     3.645     0.0355
TIME         1     473              473     0.175     0.681
COND*TIME    3     975              325     0.120     0.947
Error       16   43219             2701

By contrast, using the following R command:

summary(aov(SIGNAL~(COND+TIME+COND*TIME)+Error(EXPNO/COND), Type="III"))

the output was as follows:

            Df Sum Sq Mean Sq F value  Pr(>F)
COND         3  26516    8839  3.2517 0.03651 *
TIME         1    473     473  0.1739 0.67986
COND:TIME    3    975     325  0.1195 0.94785
Residuals   28  76107    2718

I don't understand why the two results are discrepant. In particular, I'm not sure why R is yielding 28 DF for the residuals whereas SPSS only yields 16. Can anyone help?

E-mail replies would be much appreciated. I can be reached at larry_sonna at yahoo.com and at larry_sonna at hotmail.com

Thanks in advance,
Larry Sonna
John Maindonald
2005-Sep-10 12:17 UTC
[R] Discrepancy between R and SPSS in 2-way, repeated measures ANOVA
There are 20 distinct individuals, right? expno breaks the 20 individuals into five groups of 4, right? Is this a blocking factor?

If expno is treated as a blocking factor, the following is what you get:

> xy <- expand.grid(expno=letters[1:5],cond=letters[1:4],
+                   time=factor(paste(1:2)))
> xy$subj <- factor(paste(xy$expno, xy$cond, sep=":"))
> xy$cond <- factor(xy$cond)
> xy$expno <- factor(xy$expno)
> xy$y <- rnorm(40)
> summary(aov(y~cond*time+Error(expno/cond), data=xy))

Error: expno
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4   3.59    0.90

Error: expno:cond
          Df Sum Sq Mean Sq F value Pr(>F)
cond       3   1.06    0.35    0.36   0.78
Residuals 12  11.86    0.99

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
time       1   2.27    2.27    1.38   0.26
cond:time  3   3.27    1.09    0.67   0.59
Residuals 16  26.19    1.64

If on the other hand this is analyzed as a completely randomized design, the following is the output:

> summary(aov(y~cond*time+Error(subj), data=xy))

Error: subj
          Df Sum Sq Mean Sq F value Pr(>F)
cond       3   1.06    0.35    0.37   0.78
Residuals 16  15.46    0.97

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
time       1   2.27    2.27    1.38   0.26
cond:time  3   3.27    1.09    0.67   0.59
Residuals 16  26.19    1.64

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
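
The degrees of freedom in the two analyses in this reply can also be checked programmatically. The following is a minimal sketch on the same kind of simulated data, not a rerun of the original experiment. The seed and the helper resid_df() are assumptions added for illustration. It pulls the residual df out of each error stratum of the fitted aov objects.

```r
set.seed(42)  # arbitrary seed (an assumption), only to make the sketch reproducible

# Same mock-up layout as in the reply: 5 blocks x 4 conditions x 2 times
xy <- expand.grid(expno = letters[1:5], cond = letters[1:4],
                  time = factor(1:2))
xy$subj <- factor(paste(xy$expno, xy$cond, sep = ":"))  # 20 distinct subjects
xy$y <- rnorm(40)

blocked <- aov(y ~ cond * time + Error(expno / cond), data = xy)
crd     <- aov(y ~ cond * time + Error(subj), data = xy)

# Hypothetical helper: residual df of every error stratum,
# skipping the leading "(Intercept)" stratum
resid_df <- function(fit) {
  sapply(unclass(fit)[-1], function(stratum) stratum$df.residual)
}

resid_df(blocked)  # strata: expno, expno:cond, Within
resid_df(crd)      # strata: subj, Within
```

With expno as a blocking factor, the 16 between-subject residual df of the completely randomized analysis are split into 4 df for blocks and 12 df for the expno:cond stratum. The Within stratum keeps 16 residual df in both fits.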
John Maindonald
2005-Sep-13 00:48 UTC
[R] Discrepancy between R and SPSS in 2-way, repeated measures ANOVA
For the record, it turns out that EXPNO ran from 1 to 20, i.e., it identified subject. Thus EXPNO/COND parsed into the two error terms (additional to residual) EXPNO and EXPNO:COND. This second error term accounts for all variation between levels of COND; so there is no COND sum of squares. (In SPSS the fixed effect COND may have taken precedence; I do not know for sure.)

In R, if this was a completely randomized design, the term Error(EXPNO), or in the mock-up example I gave Error(subj), would be enough on its own.

The R implementation can handle error terms akin to Error(REPNO/subj) but, because there are redundant model matrix columns generated by the REPNO:subj term, it complains that the Error() model is singular. In general, terms of the form a/b should be used only if b is nested within a, i.e., REPNO/IndividualWithinBlock (where IndividualWithinBlock runs from 1 to 4), not REPNO/subj. (Either of these causes REPNO to be treated as a blocking factor.)

> xy <- expand.grid(REPNO=letters[1:5], COND=letters[1:4],
+                   TIME=factor(paste(1:2)))
> xy$subj <- factor(paste(xy$REPNO, xy$COND, sep=":"))
> ## Below subj becomes EXPNO
> xy$COND <- factor(xy$COND)
> xy$REPNO <- factor(xy$REPNO)
> xy$y <- rnorm(40)

Plea to those who post such questions to the list:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please include either a toy data set or, if the actual data set is small, lists of factor values. If you are happy to make the information public, give the result vector also (this is less important!). Or you can put the data and, where relevant, your code, on a web site.

Be careful about the use of the word "groups" in an experimental design context; speak of "treatment groups" if that is the meaning, or "blocks" if that is what is intended. I suspect that confusion between these two senses of the word "groups" lay behind the use of the EXPNO/COND form of model formula.

John Maindonald.
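
As an illustration of the Error(EXPNO) form recommended above, here is a minimal sketch on simulated data, not the original data set. The data frame dat, the column SUBJ_IN_COND, the level labels and the seed are all assumptions made for the example. It also assumes the subject ID is stored as the numbers 1 to 20, so it is converted to a factor before being used as an error term.

```r
set.seed(1)  # arbitrary seed (an assumption), only for reproducibility

# Hypothetical layout: 4 conditions x 5 subjects each x 2 time points
dat <- expand.grid(SUBJ_IN_COND = 1:5,
                   COND = c("control", "d1", "d2", "d3"),
                   TIME = c("before", "after"))

# Subject identifier running from 1 to 20, as described for EXPNO
dat$EXPNO <- as.integer(interaction(dat$SUBJ_IN_COND, dat$COND))
dat$SIGNAL <- rnorm(40)  # simulated response, not real measurements

# Assumption: the ID arrives as numbers, so make it a factor explicitly
dat$EXPNO <- factor(dat$EXPNO)

fit <- aov(SIGNAL ~ COND * TIME + Error(EXPNO), data = dat)
summary(fit)

# Residual df in the between-subject and within-subject strata
fit$EXPNO$df.residual
fit$Within$df.residual
```

In this layout COND is tested against the between-subject residual (20 subjects minus 4 condition means, so 16 df), while TIME and COND:TIME are tested against the Within residual, which also has 16 df.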