Stéphane Adamowicz
2014-Sep-23 07:14 UTC
[R] ANOVA and permutation tests : beware of traps
Recently, I came across a strange and potentially troublesome behaviour of the lm and aov functions that ask questions about calculation accuracy. Let us consider the 2 following datasets dat1 & dat2 :> (dat1 <- data.frame(Y=c(1:3, 10+1:3), F=c(rep("A",3), rep("B",3))))Y F 1 1 A 2 2 A 3 3 A 4 11 B 5 12 B 6 13 B> (dat2 <- data.frame(Y=c(10+1:3, 1:3), F=c(rep("A",3), rep("B",3))))Y F 1 11 A 2 12 A 3 13 A 4 1 B 5 2 B 6 3 B They only differ in the order of values that were exchanged between samples A and B. Thus the sd is 1 for each sample in either data sets, and the absolute mean difference |A-B| is 10 in both datasets. Now, let us perform an anova to compare samples A and B in both datasets (of course, in such simple case, a bilateral T test would do the job, but an anova is nevertheless allowed and should give the same probability than Student's test):> (anova1 <- anova(lm(Y~F, dat1)))Analysis of Variance Table Response: Y Df Sum Sq Mean Sq F value Pr(>F) F 1 150 150 150 0.0002552 *** Residuals 4 4 1> (anova2 <- anova(lm(Y~F, dat2)))Analysis of Variance Table Response: Y Df Sum Sq Mean Sq F value Pr(>F) F 1 150 150 150 0.0002552 *** Residuals 4 4 1 As expected, both datasets give a same anova table, but this is only apparent. Indeed :> anova1$F[1] == anova2$F[1][1] FALSE> anova1$F[1] - anova2$F[1][1] 5.684342e-14 In fact the F values differ slightly, and this holds also for the aov function. I checked also (not shown) that both the residual and factorial sums of squares differ between dat1 and dat2. Thus, for some undocumented reason (at least for the end user), the F values depend on the order of data! While such tiny differences (e-14 in this example) are devoid of consequences on the risk evaluation by Fisher's distribution, they may have huge consequences on the risk evaluation by the permutation method. Indeed, the shift from continuous to discrete distributions is far from being insignificant. For instance, the following code in R is at the basis of many permutation algorithms found in the internet and in teaching because it seems quite straightforward (see for example http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Permutation%20Anova/PermTestsAnova.html http://www.usna.edu/Users/math/jager/courses/sm439/labs/Lab7.pdf http://biostatistics1014.blogspot.fr/2013/04/one-way-anova-permutation-test-and.html http://adn.biol.umontreal.ca/~numericalecology/Rcode/):> nperm <- 1000 # number of permutations > Fperm <- replicate(n=nperm, anova(lm(sample(Y) ~ F, dat1))$F[1]) # calculates nperm F values > (prob <- (sum(anova1$F[1] <= Fperm) + 1)/(nperm +1)) # risk calculation[1] 0.04695305 Of course, because of the sample function, repeating this code gives different prob values, but they remain always close to 5% instead of the exact probability of 10%. Indeed, there are only choose(6,3) = 20 possible permutations, but because they are symmetric, they give only 10 absolute mean differences. Thus, the only exact probabilities are 10%, 20% ... 100%. In our example where samples do not overlap, 10% is obviously the right answer. Thus, the use of lm and aov functions in permutation methods does not seem a good idea as it results in biases that underestimate the exact risk. In the simple case of one-way anova, it is quite simple to remedy this problem. As the total Sums of squares and the degrees of freedom do not change with permutations, it is easier and much faster to compare the residual sums of squares. For instance, the exact probabilities can be calculated that way :> combi <- combn(6, 3, FUN=function(i) {append(dat1$Y[i], dat1$Y[-i])}) # all permutations > SCEResid <- apply(combi, 2, FUN=function(x) sum(tapply(x, dat1$F, function(x) sum((x - mean(x))^2)))) # all resi SS > (prob <- mean(SCEResid <= SCEResid[1])) # risk calculation[1] 0.1 10% is indeed the exact risk. Finally, my problem is : How can we know if R libraries that use randomization procedures are not biased ? In the basic case of one way anova, it seems easy to submit the above example (by the way, the defunct lmPerm library does not succeed ...), but how can we check more complex anova models ? [[alternative HTML version deleted]]
Hi Stephane, This is the well known result of limitted floating point precision (e.g., http://www.validlab.com/goldberg/addendum.html). Using a test of approximate rather than exact equality shows R yields the correct answer: nperm <- 10000 Fperm <- replicate(n=nperm, anova(lm(sample(Y) ~ F, dat1))$F[1]) # calculates nperm F values (prob <- (sum(anova1$F[1] <= (Fperm + .Machine$double.eps ^ 0.5)) + 1)/(nperm +1)) # risk calculation yields 0.10009 I picked .Machine$double.eps ^ 0.5 somewhat arbitrarily as the default for equality testing from ?all.equal Cheers, Josh On Tue, Sep 23, 2014 at 5:14 PM, St?phane Adamowicz < stephane.adamowicz at avignon.inra.fr> wrote:> Recently, I came across a strange and potentially troublesome behaviour of > the lm and aov functions that ask questions about calculation accuracy. Let > us consider the 2 following datasets dat1 & dat2 : > > > (dat1 <- data.frame(Y=c(1:3, 10+1:3), F=c(rep("A",3), rep("B",3)))) > Y F > 1 1 A > 2 2 A > 3 3 A > 4 11 B > 5 12 B > 6 13 B > > (dat2 <- data.frame(Y=c(10+1:3, 1:3), F=c(rep("A",3), rep("B",3)))) > Y F > 1 11 A > 2 12 A > 3 13 A > 4 1 B > 5 2 B > 6 3 B > > They only differ in the order of values that were exchanged between > samples A and B. Thus the sd is 1 for each sample in either data sets, and > the absolute mean difference |A-B| is 10 in both datasets. > Now, let us perform an anova to compare samples A and B in both datasets > (of course, in such simple case, a bilateral T test would do the job, but > an anova is nevertheless allowed and should give the same probability than > Student's test): > > > (anova1 <- anova(lm(Y~F, dat1))) > Analysis of Variance Table > > Response: Y > Df Sum Sq Mean Sq F value Pr(>F) > F 1 150 150 150 0.0002552 *** > Residuals 4 4 1 > > > (anova2 <- anova(lm(Y~F, dat2))) > Analysis of Variance Table > > Response: Y > Df Sum Sq Mean Sq F value Pr(>F) > F 1 150 150 150 0.0002552 *** > Residuals 4 4 1 > > As expected, both datasets give a same anova table, but this is only > apparent. Indeed : > > > anova1$F[1] == anova2$F[1] > [1] FALSE > > anova1$F[1] - anova2$F[1] > [1] 5.684342e-14 > > In fact the F values differ slightly, and this holds also for the aov > function. I checked also (not shown) that both the residual and factorial > sums of squares differ between dat1 and dat2. Thus, for some undocumented > reason (at least for the end user), the F values depend on the order of > data! > While such tiny differences (e-14 in this example) are devoid of > consequences on the risk evaluation by Fisher's distribution, they may have > huge consequences on the risk evaluation by the permutation method. Indeed, > the shift from continuous to discrete distributions is far from being > insignificant. > > For instance, the following code in R is at the basis of many permutation > algorithms found in the internet and in teaching because it seems quite > straightforward (see for example > http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Permutation%20Anova/PermTestsAnova.html > http://www.usna.edu/Users/math/jager/courses/sm439/labs/Lab7.pdf > > http://biostatistics1014.blogspot.fr/2013/04/one-way-anova-permutation-test-and.html > http://adn.biol.umontreal.ca/~numericalecology/Rcode/): > > > nperm <- 1000 # number of permutations > > Fperm <- replicate(n=nperm, anova(lm(sample(Y) ~ F, dat1))$F[1]) # > calculates nperm F values > > (prob <- (sum(anova1$F[1] <= Fperm) + 1)/(nperm +1)) # risk calculation > [1] 0.04695305 > > Of course, because of the sample function, repeating this code gives > different prob values, but they remain always close to 5% instead of the > exact probability of 10%. Indeed, there are only choose(6,3) = 20 possible > permutations, but because they are symmetric, they give only 10 absolute > mean differences. Thus, the only exact probabilities are 10%, 20% ... 100%. > In our example where samples do not overlap, 10% is obviously the right > answer. > > Thus, the use of lm and aov functions in permutation methods does not seem > a good idea as it results in biases that underestimate the exact risk. In > the simple case of one-way anova, it is quite simple to remedy this > problem. As the total Sums of squares and the degrees of freedom do not > change with permutations, it is easier and much faster to compare the > residual sums of squares. For instance, the exact probabilities can be > calculated that way : > > > combi <- combn(6, 3, FUN=function(i) {append(dat1$Y[i], dat1$Y[-i])}) # > all permutations > > SCEResid <- apply(combi, 2, FUN=function(x) sum(tapply(x, dat1$F, > function(x) sum((x - mean(x))^2)))) # all resi SS > > (prob <- mean(SCEResid <= SCEResid[1])) # risk calculation > [1] 0.1 > > 10% is indeed the exact risk. > > Finally, my problem is : How can we know if R libraries that use > randomization procedures are not biased ? In the basic case of one way > anova, it seems easy to submit the above example (by the way, the defunct > lmPerm library does not succeed ...), but how can we check more complex > anova models ? > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Joshua F. Wiley Ph.D. Student, UCLA Department of Psychology http://joshuawiley.com/ Senior Analyst, Elkhart Group Ltd. http://elkhartgroup.com Office: 260.673.5518 [[alternative HTML version deleted]]
Beware of the trap of listening to people with no knowledge of basic numerical methods! It really is basic that the results of floating-point computer calculations depends on the order in which they are done (and the compiler can change the order). Using == on such calculations is warned about on its help page. The same warning applies to using <= if equality is important. It bears repeating that the problems are exacerbated on platforms which use extended-precision registers for some but not all of their calculations (and most R platforms do). The classic reference is pp. 248ff of http://www.validlab.com/goldberg/paper.pdf (part of a Sun manual). On 23/09/2014 08:14, St?phane Adamowicz wrote:> Recently, I came across a strange and potentially troublesome behaviour of the lm and aov functions that ask questions about calculation accuracy. Let us consider the 2 following datasets dat1 & dat2 : > >> (dat1 <- data.frame(Y=c(1:3, 10+1:3), F=c(rep("A",3), rep("B",3)))) > Y F > 1 1 A > 2 2 A > 3 3 A > 4 11 B > 5 12 B > 6 13 B >> (dat2 <- data.frame(Y=c(10+1:3, 1:3), F=c(rep("A",3), rep("B",3)))) > Y F > 1 11 A > 2 12 A > 3 13 A > 4 1 B > 5 2 B > 6 3 B > > They only differ in the order of values that were exchanged between samples A and B. Thus the sd is 1 for each sample in either data sets, and the absolute mean difference |A-B| is 10 in both datasets. > Now, let us perform an anova to compare samples A and B in both datasets (of course, in such simple case, a bilateral T test would do the job, but an anova is nevertheless allowed and should give the same probability than Student's test): > >> (anova1 <- anova(lm(Y~F, dat1))) > Analysis of Variance Table > > Response: Y > Df Sum Sq Mean Sq F value Pr(>F) > F 1 150 150 150 0.0002552 *** > Residuals 4 4 1 > >> (anova2 <- anova(lm(Y~F, dat2))) > Analysis of Variance Table > > Response: Y > Df Sum Sq Mean Sq F value Pr(>F) > F 1 150 150 150 0.0002552 *** > Residuals 4 4 1 > > As expected, both datasets give a same anova table, but this is only apparent. Indeed : > >> anova1$F[1] == anova2$F[1] > [1] FALSE >> anova1$F[1] - anova2$F[1] > [1] 5.684342e-14 > > In fact the F values differ slightly, and this holds also for the aov function. I checked also (not shown) that both the residual and factorial sums of squares differ between dat1 and dat2. Thus, for some undocumented reason (at least for the end user), the F values depend on the order of data! > While such tiny differences (e-14 in this example) are devoid of consequences on the risk evaluation by Fisher's distribution, they may have huge consequences on the risk evaluation by the permutation method. Indeed, the shift from continuous to discrete distributions is far from being insignificant. > > For instance, the following code in R is at the basis of many permutation algorithms found in the internet and in teaching because it seems quite straightforward (see for example http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Permutation%20Anova/PermTestsAnova.html > http://www.usna.edu/Users/math/jager/courses/sm439/labs/Lab7.pdf > http://biostatistics1014.blogspot.fr/2013/04/one-way-anova-permutation-test-and.html > http://adn.biol.umontreal.ca/~numericalecology/Rcode/): > >> nperm <- 1000 # number of permutations >> Fperm <- replicate(n=nperm, anova(lm(sample(Y) ~ F, dat1))$F[1]) # calculates nperm F values >> (prob <- (sum(anova1$F[1] <= Fperm) + 1)/(nperm +1)) # risk calculation > [1] 0.04695305 > > Of course, because of the sample function, repeating this code gives different prob values, but they remain always close to 5% instead of the exact probability of 10%. Indeed, there are only choose(6,3) = 20 possible permutations, but because they are symmetric, they give only 10 absolute mean differences. Thus, the only exact probabilities are 10%, 20% ... 100%. In our example where samples do not overlap, 10% is obviously the right answer. > > Thus, the use of lm and aov functions in permutation methods does not seem a good idea as it results in biases that underestimate the exact risk. In the simple case of one-way anova, it is quite simple to remedy this problem. As the total Sums of squares and the degrees of freedom do not change with permutations, it is easier and much faster to compare the residual sums of squares. For instance, the exact probabilities can be calculated that way : > >> combi <- combn(6, 3, FUN=function(i) {append(dat1$Y[i], dat1$Y[-i])}) # all permutations >> SCEResid <- apply(combi, 2, FUN=function(x) sum(tapply(x, dat1$F, function(x) sum((x - mean(x))^2)))) # all resi SS >> (prob <- mean(SCEResid <= SCEResid[1])) # risk calculation > [1] 0.1 > > 10% is indeed the exact risk. > > Finally, my problem is : How can we know if R libraries that use randomization procedures are not biased ? In the basic case of one way anova, it seems easy to submit the above example (by the way, the defunct lmPerm library does not succeed ...), but how can we check more complex anova models ? > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Emeritus Professor of Applied Statistics, University of Oxford 1 South Parks Road, Oxford OX1 3TG, UK