Dale Steele
2010-Feb-26 16:40 UTC
[R] possible arrangements of across sample ties for runs test
I'm trying to implement the two-sample Wald-Wolfowitz runs test. Daniel (1990) suggests a method for dealing with ties across samples: prepare two ordered arrangements, one resulting in the fewest runs and one resulting in the most runs, then take the mean of the two. The code below counts 9 runs for my example data, where 60 is tied across samples.

X <- c(58, 62, 55, 60, 60, 67)
n1 <- length(X)
Y <- c(60, 59, 72, 73, 56, 53, 50, 50)
n2 <- length(Y)
data <- c(X, Y)
names(data) <- c(rep("X", n1), rep("Y", n2))
data <- sort(data)
runs <- rle(names(data))
r <- length(runs$lengths)
r

 Y  Y  Y  X  Y  X  Y  X  X  Y  X  X  Y  Y
50 50 53 55 56 58 59 60 60 60 62 67 72 73   --> r = 9 runs

The other possible orderings are:

 Y  Y  Y  X  Y  X  Y  X  Y  X  X  X  Y  Y   --> 9 runs
50 50 53 55 56 58 59 60 60 60 62 67 72 73

 Y  Y  Y  X  Y  X  Y  Y  X  X  X  X  Y  Y   --> 7 runs
50 50 53 55 56 58 59 60 60 60 62 67 72 73

How do I generate the other possible orderings? Thus far, I've found a way to identify cross-sample duplicates:

# find the ties across samples
dd <- data[duplicated(data)]    # find all duplicates
idd <- dd %in% X & dd %in% Y    # keep those found in both X and Y
duplicates <- dd[idd]

Thanks!

--Dale
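
One possible way to enumerate the orderings, offered as a minimal base-R sketch rather than as Daniel's procedure itself: for each value tied across samples, list every distinct placement of the X labels within the tied block with combn(), combine the blocks with expand.grid(), and count the runs for each arrangement with rle(). The helper names block_orders and all_run_counts are hypothetical, and exact equality of the tied values is assumed.

```r
X <- c(58, 62, 55, 60, 60, 67)
Y <- c(60, 59, 72, 73, 56, 53, 50, 50)

# Pool the samples and sort by value, carrying the sample label along.
values <- c(X, Y)
group  <- rep(c("X", "Y"), c(length(X), length(Y)))
ord    <- order(values)
values <- values[ord]
group  <- group[ord]

# Values that occur in both samples (the cross-sample ties).
tied <- intersect(X, Y)

# Hypothetical helper: every distinct X/Y label order for one tied block.
block_orders <- function(labs) {
  n   <- length(labs)
  k   <- sum(labs == "X")
  pos <- combn(n, k, simplify = FALSE)      # positions taken by "X"
  lapply(pos, function(p) {
    out <- rep("Y", n)
    out[p] <- "X"
    out
  })
}

# Hypothetical helper: number of runs for every combination of block orders.
all_run_counts <- function(values, group, tied) {
  if (length(tied) == 0) return(length(rle(group)$lengths))
  blocks  <- lapply(tied, function(v) which(values == v))
  choices <- lapply(blocks, function(i) block_orders(group[i]))
  grid    <- expand.grid(lapply(choices, seq_along))
  apply(grid, 1, function(pick) {
    labs <- group
    for (b in seq_along(blocks)) {
      labs[blocks[[b]]] <- choices[[b]][[pick[b]]]
    }
    length(rle(labs)$lengths)
  })
}

r_all  <- all_run_counts(values, group, tied)
r_mean <- mean(range(r_all))   # mean of the fewest and the most runs
r_all
r_mean
```

For the example data this should reproduce the three arrangements listed above (9, 9 and 7 runs), so the mean of the smallest and largest counts is 8. The number of arrangements grows as a product of binomial coefficients over the tied values, so full enumeration is only practical when there are few cross-sample ties.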