Daniel Bolnick
2007-Apr-26 01:45 UTC
[R] ANOVA results in R conflicting with results in other software packages
Hi,

I'm wrestling with an analysis of a dataset, which I previously analyzed in SYSTAT, but am now converting to R and was doing a re-analysis. I noticed, however, that the same model yields different results (different sums of squares) from the two programs. I first thought this might be because the two programs use different calculations to get the sums of squares, but the problem persisted even after I specified type III sums of squares. Can anyone help me by clarifying why there is this discrepancy?

The data table is:

host size2 maladapt increase
A    yes    35     21
A    yes    30     13
A    no     73     -6
A    yes    22      3
C    yes    19     -1
A    no     53      1
C    no     48    -27
A    yes    32     26
A    yes    14      1
A    no     83     42
A    yes    19     -3
A    no     66     -7
C    no     69    -14
A    yes    30     30
C    no     69    -22
A    yes    10      6
C    no     65    -15
A    yes    11      4
A    yes    15     15
A    no     77     30
C    yes    11     11
A    no     48     -4
C    yes    29     -4
A    yes     0      0
C    no     69     -2
A    yes    10    -40
C    yes     8     -6
C    no     91     -2
C    no     65     13
A    yes    12      0
C    yes    16    -26
C    yes    38    -12
A    no     43     20
C    no     81     -7
A    yes     9      9
C    no    100     25
A    yes    18     12
C    yes    27     -6
A    yes    11     -3

The dialogue in R is as follows:

> library(car)
> read.table(file="/Users/lukeharmon/Desktop/glmnosil.txt", header=T)->nn
> attach(nn)
> ls(2)
[1] "host"     "increase" "maladapt" "size2"    "size4"
> lm(maladapt~host*increase*size2)

Call:
lm(formula = maladapt ~ host * increase * size2)

Coefficients:
            (Intercept)                    hostC                 increase                 size2yes
               59.54144                 17.13828                  0.34487                -44.41381
         hostC:increase           hostC:size2yes        increase:size2yes  hostC:increase:size2yes
                0.30449                -12.50558                  0.03766                 -0.90697

> lm(maladapt~host*increase*size2)->fm
> Anova(fm, type="III")
Anova Table (Type III tests)

Response: maladapt
                     Sum Sq Df  F value    Pr(>F)
(Intercept)         18348.5  1 152.9683 1.595e-13 ***
host                  920.9  1   7.6774  0.009366 **
increase              278.4  1   2.3210  0.137773
size2                7447.0  1  62.0841 6.806e-09 ***
host:increase         105.1  1   0.8758  0.356584
host:size2            266.9  1   2.2252  0.145880
increase:size2          2.0  1   0.0171  0.896902
host:increase:size2   332.3  1   2.7703  0.106108
Residuals            3718.4 31
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Contrast this with the results from SYSTAT:

Source                  Sum-of-Squares  df  Mean-Square  F-ratio      P
HOST$                          808.949   1      808.949    6.744  0.014
SIZE2$                       17525.418   1    17525.418  146.106  0.000
INCREASE                       540.579   1      540.579    4.507  0.042
SIZE2$*HOST$                   266.915   1      266.915    2.225  0.146
SIZE2$*INCREASE                279.389   1      279.389    2.329  0.137
HOST$*INCREASE                  35.869   1       35.869    0.299  0.588
SIZE2$*HOST$*INCREASE          332.293   1      332.293    2.770  0.106
Error                         3718.441  31      119.950

I've been trying to find anything in the documentation for anova() that would give a default that is different from what is in SYSTAT, but part of the problem is that SYSTAT is somewhat opaque as to its calculations, so it is hard to contrast the two. I would really welcome feedback as to what may cause this discrepancy.

Thanks very much for your help,

Dan Bolnick
Section of Integrative Biology
University of Texas at Austin
Richard M. Heiberger
2007-Apr-26 02:16 UTC
[R] ANOVA results in R conflicting with results in other software packages
It looks from your tables that you have the same residual in both programs, suggesting that the arithmetic is correct. The terms are in a different order. Since anova() gives sequential sums of squares (Type I), the numerical values depend on the order. Force both programs to use the same order for the terms. Here is how in R.

> tmp <- data.frame(y=rnorm(12),
+                   a=factor(rep(letters[1:2],6)),
+                   b=factor(rep(letters[3:5], each=4)),
+                   d=factor(rep(LETTERS[6:9], each=3)))
> anova(aov(terms(y ~ a*b + d, keep.order=TRUE), data=tmp))
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value  Pr(>F)
a          1 4.3786  4.3786 11.5479 0.04252 *
b          2 0.0688  0.0344  0.0907 0.91567
a:b        2 2.1225  1.0612  2.7988 0.20612
d          3 3.2254  1.0751  2.8354 0.20741
Residuals  3 1.1375  0.3792
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(aov(y ~ a*b + d, data=tmp))
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value  Pr(>F)
a          1 4.3786  4.3786 11.5479 0.04252 *
b          2 0.0688  0.0344  0.0907 0.91567
d          3 3.0020  1.0007  2.6391 0.22327
a:b        2 2.3458  1.1729  3.0933 0.18661
Residuals  3 1.1375  0.3792
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>

If you need to resend this, please put some spacing in the SYSTAT output to make it legible.

Rich
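The remark that both programs report the same residual can be checked with a little arithmetic on the numbers already posted. The sketch below is only an illustration: the variable names are made up for this note, and the values are copied by hand from the two tables in the original message. Every term has 1 df, so each F ratio is just the term's sum of squares divided by the error mean square.

```r
# Values copied from the two tables in the original post.
mse_r      <- 3718.4 / 31   # R: residual SS / residual df
mse_systat <- 119.950       # SYSTAT: reported error mean square

# Main-effect sums of squares as reported by each program.
ss_r      <- c(host = 920.9,   size2 = 7447.0,    increase = 278.4)
ss_systat <- c(host = 808.949, size2 = 17525.418, increase = 540.579)

# Each term has 1 df, so F = SS / MSE.
f_r      <- ss_r / mse_r
f_systat <- ss_systat / mse_systat

print(c(R = mse_r, SYSTAT = mse_systat))
print(round(rbind(R = f_r, SYSTAT = f_systat), 3))
```

Since the error mean squares agree, the disagreement between the two tables sits entirely in the sums of squares assigned to the individual terms, not in the fit itself.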
Simon Blomberg
2007-Apr-26 02:28 UTC
[R] ANOVA results in R conflicting with results in other software packages
R uses "treatment" contrasts for factors (i.e. 0/1 coding) by default. Systat is using "sum" (i.e. sum-to-zero) contrasts. Try this:

options(contrasts=c("contr.sum", "contr.poly"))
lm(maladapt~host*increase*size2)->fm
Anova(fm, type="III")

I won't discuss the dangers of "types" of sums of squares and different contrast codings. That would be tempting the wrath of the gods. See section 7.18 in the R FAQ. John Fox's "Companion" book also has a brief discussion (p. 140).

Cheers,

Simon.

--
Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
Faculty of Biological and Chemical Sciences
The University of Queensland
St. Lucia Queensland 4072 Australia
Room 320, Goddard Building (8)
T: +61 7 3365 2506
email: S.Blomberg1_at_uq.edu.au

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. - John Tukey.
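For anyone who wants to try the contrast suggestion without changing global options, here is a minimal base-R sketch using the data as posted. Several things in it are assumptions made for illustration and are not from the thread: drop1() with scope . ~ . stands in for car's Anova(type="III") (it refits the model with each term's columns removed from the design matrix), the helper name type3_ss is invented, and the contrasts are set per model through lm()'s contrasts argument instead of options().

```r
# Data as posted in the original message (39 rows).
nn <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
host size2 maladapt increase
A yes 35 21
A yes 30 13
A no 73 -6
A yes 22 3
C yes 19 -1
A no 53 1
C no 48 -27
A yes 32 26
A yes 14 1
A no 83 42
A yes 19 -3
A no 66 -7
C no 69 -14
A yes 30 30
C no 69 -22
A yes 10 6
C no 65 -15
A yes 11 4
A yes 15 15
A no 77 30
C yes 11 11
A no 48 -4
C yes 29 -4
A yes 0 0
C no 69 -2
A yes 10 -40
C yes 8 -6
C no 91 -2
C no 65 13
A yes 12 0
C yes 16 -26
C yes 38 -12
A no 43 20
C no 81 -7
A yes 9 9
C no 100 25
A yes 18 12
C yes 27 -6
A yes 11 -3
")

# Hypothetical helper: marginal sums of squares obtained by dropping each
# term's design-matrix columns in turn while keeping all other columns.
type3_ss <- function(fit) {
  tab <- drop1(fit, scope = . ~ ., test = "F")
  ss <- tab[["Sum of Sq"]]
  names(ss) <- rownames(tab)
  ss[-1]  # drop the "<none>" row
}

# Same model, two contrast codings for the factors.
fit_trt <- lm(maladapt ~ host * increase * size2, data = nn)
fit_sum <- lm(maladapt ~ host * increase * size2, data = nn,
              contrasts = list(host = "contr.sum", size2 = "contr.sum"))

ss_trt <- type3_ss(fit_trt)
ss_sum <- type3_ss(fit_sum)
print(round(cbind(treatment = ss_trt, sum = ss_sum), 3))
```

If the contrast coding is the explanation, the "sum" column should line up with the SYSTAT table and the "treatment" column with the Anova() output in the original post. The three-way interaction and the residual do not depend on the coding, which would be consistent with those rows already agreeing between the two programs.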