Larry A Sonna
2005-Sep-09  14:10 UTC
[R] Discrepancy between R and SPSS in 2-way, repeated measures ANOVA
Dear R community,
I am trying to resolve a discrepancy between the way SPSS and R handle 
2-way, repeated measures ANOVA.
An experiment was performed in which samples were drawn before and after 
treatment of four groups of subjects (control and disease states 1, 2 and 
3).  Each group contained five subjects.  An experimental measurement was 
performed on each sample to yield a "signal".  The before and after 
treatment signals for each subject were treated as repeated measures.  We 
desire to obtain P values for disease state ("CONDITION"), and the 
interaction between signal over time and disease state
("CONDITION*TIME").
Using SPSS, the following output was obtained:
             DF   SumSq (Type 3)   Mean Sq   F value   P
COND          3        42861         14287     3.645   0.0355
TIME          1          473           473     0.175   0.681
COND*TIME     3          975           325     0.120   0.947
Error        16        43219          2701
By contrast, using the following R command:
summary(aov(SIGNAL~(COND+TIME+COND*TIME)+Error(EXPNO/COND), Type="III"))
the output was as follows:
            Df   Sum Sq   Mean Sq   F value   Pr(>F)
COND         3    26516      8839    3.2517   0.03651 *
TIME         1      473       473    0.1739   0.67986
COND:TIME    3      975       325    0.1195   0.94785
Residuals   28    76107      2718
I don't understand why the two results are discrepant.  In particular,
I'm not sure why R is yielding 28 DF for the residuals whereas SPSS only
yields 16.  Can anyone help?
E-mail replies would be much appreciated.  I can be reached at 
larry_sonna at yahoo.com and at larry_sonna at hotmail.com
Thanks in advance,
Larry Sonna
John Maindonald
2005-Sep-10  12:17 UTC
[R] Discrepancy between R and SPSS in 2-way, repeated measures ANOVA
There are 20 distinct individuals, right? expno breaks the 20
individuals into five groups of 4, right? Is this a blocking factor?
If expno is treated as a blocking factor, the following is what you get:
 > xy <- expand.grid(expno=letters[1:5],cond=letters[1:4],
+                                    time=factor(paste(1:2)))
 > xy$subj <- factor(paste(xy$expno, xy$cond, sep=":"))
 > xy$cond <- factor(xy$cond)
 > xy$expno <- factor(xy$expno)
 > xy$y <- rnorm(40)
 > summary(aov(y~cond*time+Error(expno/cond), data=xy))
Error: expno
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4   3.59    0.90
Error: expno:cond
           Df Sum Sq Mean Sq F value Pr(>F)
cond       3   1.06    0.35    0.36   0.78
Residuals 12  11.86    0.99
Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
time       1   2.27    2.27    1.38   0.26
cond:time  3   3.27    1.09    0.67   0.59
Residuals 16  26.19    1.64
If, on the other hand, this is analyzed as a completely
randomized design, the following is the output:
 > summary(aov(y~cond*time+Error(subj), data=xy))
Error: subj
           Df Sum Sq Mean Sq F value Pr(>F)
cond       3   1.06    0.35    0.37   0.78
Residuals 16  15.46    0.97
Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
time       1   2.27    2.27    1.38   0.26
cond:time  3   3.27    1.09    0.67   0.59
Residuals 16  26.19    1.64
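
As a rough check on the degrees of freedom in the two analyses above, the residual df of each error stratum can be read directly from the fitted objects. This is only a sketch on simulated data laid out like the mock-up: the seed and the object names (fit_subj, fit_block, df_subj, df_block) are assumptions made for illustration, and the sums of squares will differ from the tables above because the response is random.

```r
set.seed(1)  # arbitrary seed, only so the simulated response is reproducible

# Same layout as the mock-up: 5 x 4 = 20 subjects, each measured at 2 times.
xy <- expand.grid(expno = letters[1:5], cond = letters[1:4],
                  time = factor(1:2))
xy$subj <- factor(paste(xy$expno, xy$cond, sep = ":"))
xy$y <- rnorm(nrow(xy))

# Completely randomized design: subject is the only extra error stratum.
fit_subj <- aov(y ~ cond * time + Error(subj), data = xy)

# expno treated as a blocking factor, with cond applied within blocks.
fit_block <- aov(y ~ cond * time + Error(expno/cond), data = xy)

# Residual degrees of freedom in each error stratum.
df_subj <- c(subj   = fit_subj[["subj"]]$df.residual,
             Within = fit_subj[["Within"]]$df.residual)
df_block <- c(expno        = fit_block[["expno"]]$df.residual,
              "expno:cond" = fit_block[["expno:cond"]]$df.residual,
              Within       = fit_block[["Within"]]$df.residual)

print(df_subj)
print(df_block)
```

In a balanced layout like this one, the between-subject stratum has 19 df (3 for cond and 16 residual) and the within-subject stratum has 20 df (1 for time, 3 for cond:time and 16 residual). Treating expno as a block moves 4 of the 16 between-subject residual df into a separate block stratum, leaving 12.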
John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
John Maindonald
2005-Sep-13  00:48 UTC
[R] Discrepancy between R and SPSS in 2-way, repeated measures ANOVA
For the record, it turns out that EXPNO ran from 1 to 20, i.e., it
identified subject. Thus EXPNO/COND parsed into the two error terms
(additional to residual) EXPNO and EXPNO:COND. This second error term
accounts for all variation between levels of COND; so there is no COND
sum of squares. (In SPSS the fixed effect COND may have taken
precedence; I do not know for sure.)

In R, if this was a completely randomized design, the term
Error(EXPNO), or in the mock-up example I gave Error(subj), would be
enough on its own.

The R implementation can handle error terms akin to Error(REPNO/subj),
but because there are redundant model matrix columns generated by the
REPNO:subj term, it complains that the Error() model is singular. In
general, terms of the form a/b should be used only if b is nested
within a, i.e., REPNO/IndividualWithinBlock (where
IndividualWithinBlock runs from 1 to 4), not REPNO/subj. (Either of
these causes REPNO to be treated as a blocking factor.)

 > xy <- expand.grid(REPNO=letters[1:5], COND=letters[1:4],
 +                   TIME=factor(paste(1:2)))
 > xy$subj <- factor(paste(xy$REPNO, xy$COND, sep=":"))
 > ## Below subj becomes EXPNO
 > xy$COND <- factor(xy$COND)
 > xy$REPNO <- factor(xy$REPNO)
 > xy$y <- rnorm(40)

Plea to those who post such questions to the list:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please include either a toy data set or, if the actual data set is
small, lists of factor values. If you are happy to make the
information public, give the result vector also (this is less
important!). Or you can put the data and, where relevant, your code on
a web site.

Be careful about the use of the word "groups" in an experimental
design context; speak of "treatment groups" if that is the meaning, or
"blocks" if that is what is intended. I suspect that confusion between
these two contexts in which the word "groups" is wont to be used lay
behind the use of the EXPNO/COND form of model formula.

John Maindonald.
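
The expansion of EXPNO/COND described above can be inspected with terms(), and the Error(EXPNO) form can be tried on a mock-up in which EXPNO numbers the 20 subjects. This is a sketch only: the data are simulated, the coding of EXPNO as a factor with levels 1 to 20 is an assumption about the original data set, and the names err_terms and fit are chosen for illustration.

```r
# How R expands the nesting operator: a/b is shorthand for a + a:b.
err_terms <- attr(terms(~ EXPNO/COND), "term.labels")
print(err_terms)

set.seed(2)  # arbitrary seed, only so the simulated response is reproducible

# Mock-up in which EXPNO identifies subject (assumed coding: 1 to 20).
xy <- expand.grid(REPNO = letters[1:5], COND = letters[1:4],
                  TIME = factor(1:2))
xy$EXPNO <- factor(as.integer(interaction(xy$REPNO, xy$COND)))
xy$y <- rnorm(nrow(xy))

# With a subject identifier, Error(EXPNO) alone defines the
# between-subject stratum for a completely randomized design.
fit <- aov(y ~ COND * TIME + Error(EXPNO), data = xy)
print(summary(fit))
```

Note that EXPNO is stored as a factor here; a numeric subject code would be treated as a covariate inside Error() and would give different strata.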