Ghandalf
2012-Mar-09 05:49 UTC
[R] (Fisher) Randomization Test for Matched Pairs: Permutation Data Setup Based on Signs
Hi,

I am currently attempting to write a small program for a randomization test (based on rank/combination) for matched pairs. Please allow me to give some background on the test before my question, or skip down to the bold portion for my issue.

There are two samples; the data, as I am sure you guessed, are matched into pairs, and each pair's difference is denoted by Di.

The test statistic is *T* = Sum(Di), taken only over those Di > 0.

The issue I am having is with how to set up the data in the proper structure in R. I am to consider the absolute values of the Di, without regard to their sign. There are 2^n ways of assigning + or - signs to the set of absolute differences obtained, where n = the number of Di. That is, we can assign + signs to all n of the |Di|, or we might assign + to |D1| but - signs to |D2| through |Dn|, and so forth.

So, for example, suppose I have *D1=-16, D2=-4, D3=-7, D4=-3, D5=-5, D6=+1, and D7=-10, with n=7.*
I need to consider the 2^7 ways of assigning signs that result in the lowest sum of the "positive" absolute differences. To exemplify further, we have

-16, -4, -7, -3, -5, -1, -10    T = 0
-16, -4, -7, -3, -5, +1, -10    T = 1
-16, -4, -7, +3, -5, -1, -10    T = 3
-16, -4, -7, +3, -5, +1, -10    T = 4
... and so on.

So, if you are willing to help me, I am having trouble setting up my data as illustrated above. /How do I create (code for) the 2^n lines of data required, with all the possible combinations of + and -, in order to calculate the sum of the positive values in each line (the test statistic, T)?/ I have tried to use combn(d=data set, n=7) with a data set, d, consisting of both the positive and negative versions of each value, to no avail.

I apologize if this is lengthy; I was not sure how to ask the question without misrepresenting my thoughts. If any clarification is required, I will be more than willing to provide further explanation. I have searched for possible solutions but, alas, came up empty-handed.

Thank you.

--
View this message in context: http://r.789695.n4.nabble.com/Fisher-Randomization-Test-for-Matched-Pairs-Permutation-Data-Setup-Based-on-Signs-tp4458606p4458606.html
Sent from the R help mailing list archive at Nabble.com.
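For readers following along, the observed statistic and the size of the reference set in the example above can be checked with a minimal sketch. The variable names (D, T_obs, n_assign) are illustrative and not from the original post.

```r
# Observed differences from the example in the post
D <- c(-16, -4, -7, -3, -5, 1, -10)

# Test statistic: sum of the positive differences only
T_obs <- sum(D[D > 0])

# Number of possible +/- sign assignments for n differences
n_assign <- 2^length(D)

T_obs      # 1
n_assign   # 128
```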
R. Michael Weylandt
2012-Mar-11 02:17 UTC
[R] (Fisher) Randomization Test for Matched Pairs: Permutation Data Setup Based on Signs
In general, I *think* this is a hard problem (it sounds knapsack-ish), but since your data sets are small, that's probably not so important. If I understand you right, this little function will help you:

    plusminus <- function(n){
      # n x 2^n matrix: each column is one assignment of -1/+1 signs
      t(as.matrix(do.call(expand.grid, rep(list(c(-1, 1)), n))))
    }

    plusminus(3)
    plusminus(5)

If you multiply the output of this function by your data set, you will have columns corresponding to all possible sign choices, e.g.,

    plusminus(3) * c(1, 2, 3)

Then you can colSums() using only the positive elements:

    x <- plusminus(3) * c(1, 2, 3)
    x[x < 0] <- 0
    colSums(x)

To wrap this all in one function, I'd do something like this:

    test.statistic <- function(v){
      # all sign patterns, one per column
      m <- t(as.matrix(do.call(expand.grid, rep(list(c(-1, 1)), length(v)))))
      x <- m * v
      x[x < 0] <- 0   # keep only the positive elements for the sum
      out <- rbind(m * v, colSums(x))
      rownames(out)[length(rownames(out))] <- "Sum of Positive Elements"
      out
    }

    X <- test.statistic(c(-16, -4, -7, -3, -5, +1, -10))
    X[, 1:10]

Hopefully that helps (I'm a little fuzzy on your overall goal, so that second bit might be a red herring).

Michael
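As a follow-on to the enumeration above, here is a minimal sketch of how the 2^n values of T might be turned into an exact randomization p-value. It assumes the lower tail is the one of interest (the original post speaks of the "lowest sum") and works on abs(D); the names absD, signs, T_all, and p_lower are illustrative only.

```r
D    <- c(-16, -4, -7, -3, -5, 1, -10)
absD <- abs(D)
n    <- length(absD)

# Each row of 'signs' is one of the 2^n assignments of -1/+1
signs <- as.matrix(do.call(expand.grid, rep(list(c(-1, 1)), n)))

# T for each assignment: sum of the |Di| that received a + sign
# ((signs + 1) / 2 turns -1/+1 into 0/1 indicators)
T_all <- as.vector(((signs + 1) / 2) %*% absD)

# Observed statistic and the proportion of assignments with T <= T_obs
T_obs   <- sum(D[D > 0])
p_lower <- mean(T_all <= T_obs)

p_lower   # 2 / 128 = 0.015625
```

Only two assignments give T <= 1 here (all signs negative, and only the |Di| = 1 positive), hence 2 of 128. Full enumeration is practical only for small n, since the number of rows doubles with each additional pair.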
Petr Savicky
2012-Mar-11 08:30 UTC
[R] (Fisher) Randomization Test for Matched Pairs: Permutation Data Setup Based on Signs
On Thu, Mar 08, 2012 at 09:49:20PM -0800, Ghandalf wrote:
> [...]
> So, for example, if I have *D1=-16, D2=-4, D3=-7, D4=-3, D5=-5, D6=+1, and
> D7=-10 and n=7. *
> I need to consider the 2^7 ways of assigning signs that result in the lowest
> sum of the "positive" absolute difference. To exemplify further, we have
>
> -16, -4, -7, -3, -5, -1, -10    T = 0
> -16, -4, -7, -3, -5, +1, -10    T = 1
> -16, -4, -7, +3, -5, -1, -10    T = 3
> -16, -4, -7, +3, -5, +1, -10    T = 4
> ... and so on.

Hi.

The minimum sum of the "positive" absolute differences is always zero. It is achieved by every sign combination that assigns -1 to all nonzero abs(Di) and any sign to the zero abs(Di). In particular, the combination rep(-1, times=7) is a solution.

I am not sure whether this is what you are asking for. Can you give more detail?

Petr Savicky.
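The point about zero differences can be illustrated with a small sketch. The data below (3, 0, 5) are hypothetical and chosen only so that one abs(Di) is zero; the names absD, signs, T_all, and n_min are likewise illustrative.

```r
# Hypothetical absolute differences, one of which is zero
absD <- c(3, 0, 5)

# All 2^3 sign assignments, one per row
signs <- as.matrix(do.call(expand.grid, rep(list(c(-1, 1)), length(absD))))

# Sum of the |Di| that received a + sign, for each assignment
T_all <- as.vector(((signs + 1) / 2) %*% absD)

# The minimum is zero, and the zero difference may take either sign,
# so two assignments attain it
min(T_all)                         # 0
n_min <- sum(T_all == min(T_all))
n_min                              # 2
```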