Michael Friendly
2011-Oct-14 15:38 UTC
[R] non-parametric permutation and signed paired-difference distributions
Hi all Consider the classic data below from Darwin on the heights of 15 pairs of zea mays (corn) plants either cross-fertilized or self-fertilized, where the goal is to see if it makes a difference. > head(ZeaMays) pair pot cross self diff 1 1 1 23.500 17.375 6.125 2 2 1 12.000 20.375 -8.375 3 3 1 21.000 20.000 1.000 4 4 2 22.000 20.000 2.000 5 5 2 19.125 18.375 0.750 6 6 2 21.500 18.625 2.875 ... I'd like to illustrate two types of non-parametric tests of whether the mean(diff) = 0. (a) Permutation test, where the values of, say self are permuted and diff=cross - self is calculated for each permutation. There are 15! permutations, but a reasonably large number of random permutations would suffice. (b) Test based on assigning each abs(diff) a + or - sign, and calculating the mean(diff). There are 2^15 such possible values, but again, a reasonably large number of random samples would do. This is obviously a case for apply and friends, but I can't quite see how to set it up. The complete data: > dput(ZeaMays) structure(list(pair = 1:15, pot = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), cross = c(23.5, 12, 21, 22, 19.125, 21.5, 22.125, 20.375, 18.25, 21.625, 23.25, 21, 22.125, 23, 12), self = c(17.375, 20.375, 20, 20, 18.375, 18.625, 18.625, 15.25, 16.5, 18, 16.25, 18, 12.75, 15.5, 18), diff = c(6.125, -8.375, 1, 2, 0.75, 2.875, 3.5, 5.125, 1.75, 3.625, 7, 3, 9.375, 7.5, -6)), row.names = c(NA, -15L), .Names = c("pair", "pot", "cross", "self", "diff"), class = "data.frame") -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA
Weidong Gu
2011-Oct-14 17:20 UTC
[R] non-parametric permutation and signed paired-difference distributions
On Fri, Oct 14, 2011 at 11:38 AM, Michael Friendly <friendly at yorku.ca> wrote:> Hi all > Consider the classic data below from Darwin on the heights of 15 pairs of > zea mays (corn) plants > either cross-fertilized or self-fertilized, where the goal is to see if it > makes a difference. > >> head(ZeaMays) > ?pair pot ?cross ? self ? diff > 1 ? ?1 ? 1 23.500 17.375 ?6.125 > 2 ? ?2 ? 1 12.000 20.375 -8.375 > 3 ? ?3 ? 1 21.000 20.000 ?1.000 > 4 ? ?4 ? 2 22.000 20.000 ?2.000 > 5 ? ?5 ? 2 19.125 18.375 ?0.750 > 6 ? ?6 ? 2 21.500 18.625 ?2.875 > ... > > I'd like to illustrate two types of non-parametric tests of whether the > mean(diff) = 0. > > (a) Permutation test, where the values of, say self are permuted and > diff=cross - self > is calculated for each permutation. ?There are 15! permutations, but a > reasonably > large number of random permutations would suffice. > > (b) Test based on assigning each abs(diff) a + or - sign, and calculating > the mean(diff). > There are 2^15 such possible values, but again, a reasonably large number of > random > samples would do. >What do you mean by 'assigning each abs(diff) a + or - sign, and calculating the mean(diff)'? abs(diff) should be all positive, right?> This is obviously a case for apply and friends, but I can't quite see how to > set it up. > > The complete data: > >> dput(ZeaMays) > structure(list(pair = 1:15, pot = structure(c(1L, 1L, 1L, 2L, > 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1", > "2", "3", "4"), class = "factor"), cross = c(23.5, 12, 21, 22, > 19.125, 21.5, 22.125, 20.375, 18.25, 21.625, 23.25, 21, 22.125, > 23, 12), self = c(17.375, 20.375, 20, 20, 18.375, 18.625, 18.625, > 15.25, 16.5, 18, 16.25, 18, 12.75, 15.5, 18), diff = c(6.125, > -8.375, 1, 2, 0.75, 2.875, 3.5, 5.125, 1.75, 3.625, 7, 3, 9.375, > 7.5, -6)), row.names = c(NA, -15L), .Names = c("pair", "pot", > "cross", "self", "diff"), class = "data.frame") > > > > -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology > Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 > Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Jean V Adams
2011-Oct-14 18:45 UTC
[R] non-parametric permutation and signed paired-difference distributions
Michael Friendly wrote on 10/14/2011 10:38:44 AM:> > Hi all > Consider the classic data below from Darwin on the heights of 15 pairs > of zea mays (corn) plants > either cross-fertilized or self-fertilized, where the goal is to see if > it makes a difference. > > > head(ZeaMays) > pair pot cross self diff > 1 1 1 23.500 17.375 6.125 > 2 2 1 12.000 20.375 -8.375 > 3 3 1 21.000 20.000 1.000 > 4 4 2 22.000 20.000 2.000 > 5 5 2 19.125 18.375 0.750 > 6 6 2 21.500 18.625 2.875 > ... > > I'd like to illustrate two types of non-parametric tests of whether the > mean(diff) = 0. > > (a) Permutation test, where the values of, say self are permuted and > diff=cross - self > is calculated for each permutation. There are 15! permutations, but a > reasonably > large number of random permutations would suffice.You have paired data. To conduct a permutation test you would randomly assign one member of each pair to being either cross or self-fertilized. The other member of the pair would be assigned to the opposite "treatment". That would lead to 2^15 = 32,768 permutations. See the perm.test() function in the R package exactRankTests.> (b) Test based on assigning each abs(diff) a + or - sign, and > calculating the mean(diff). > There are 2^15 such possible values, but again, a reasonably large > number of random > samples would do.Sounds like you are attempting to combine a sign test and a permutation test. You could do this by using the logical cross>self rather than the difference cross-self. Jean> This is obviously a case for apply and friends, but I can't quite see > how to set it up. > > The complete data: > > > dput(ZeaMays) > structure(list(pair = 1:15, pot = structure(c(1L, 1L, 1L, 2L, > 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1", > "2", "3", "4"), class = "factor"), cross = c(23.5, 12, 21, 22, > 19.125, 21.5, 22.125, 20.375, 18.25, 21.625, 23.25, 21, 22.125, > 23, 12), self = c(17.375, 20.375, 20, 20, 18.375, 18.625, 18.625, > 15.25, 16.5, 18, 16.25, 18, 12.75, 15.5, 18), diff = c(6.125, > -8.375, 1, 2, 0.75, 2.875, 3.5, 5.125, 1.75, 3.625, 7, 3, 9.375, > 7.5, -6)), row.names = c(NA, -15L), .Names = c("pair", "pot", > "cross", "self", "diff"), class = "data.frame")[[alternative HTML version deleted]]