I am not an expert in this area, but here are some thoughts that may get you
started towards an answer.
First, there are 2 ways (possibly more) that can lead to the data for a runs
test that lead to different theoretical distributions:
1. We have a true or hypothesized value of the median that we subtracted from
the data, therefore each value has 50% probability of being positive/negative
and all are independent of each other (assuming being exactly equal to the
median is impossible or discarded).
2. We have subtracted the sample median from each sample value (and discarded
any 0's) leaving us with exactly half positive and half negative and not
having independence.
In case 1, the 1st observation will always start a run. The second observation
has a 50% chance of being the same sign (F) or different sign (S), with the
probability being 0.5 for each new observation and them all being independent
(under assumption of random selections from population with known/hypothesized
median) and the number of runs equaling the number of S's, this looks like a
binomial to me (with some '-1's inserted in appropriate places.
In case 2, this looks like a hypergeometric distribution, there would be
n!/((n/2)!(n/2)!) possible permutations, just need to compute how many of those
permutations result in x runs to get the probability. There is probably a way
to think about this in terms of balls and urns, but I have not worked that out
yet.
Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Dale Steele
> Sent: Wednesday, February 10, 2010 6:16 PM
> To: R-help at r-project.org
> Subject: [R] Code find exact distribution for runs test?
>
> I've been attempting to understand the one-sample run test for
> randomness. I've found run.test{tseries} and run.test{lawstat}. Both
> use a large sample approximation for distribution of the total number
> of runs in a sample of n1 observations of one type and n2 observations
> of another type.
>
> I've been unable to find R code to generate the exact distribution and
> would like to see how this could be done (not homework).
>
> For example, given the data:
>
> dtemp <- c(12, 13, 12, 11, 5, 2, -1, 2, -1, 3, 2, -6, -7, -7, -12, -9,
> 6, 7, 10, 6, 1, 1, 3, 7, -2, -6, -6, -5, -2, -1)
>
> The Monte Carlo permutation approach seems to get me part way.
>
>
> # calculate the number of runs in the data vector
> nruns <- function(x) {
> signs <- sign(x)
> runs <- rle(signs)
> r <- length(runs$lengths)
> return(r)
> }
>
> MC.runs <- function(x, nperm) {
> RUNS <- numeric(nperm)
> for (i in 1:nperm) {
> RUNS[i] <- nruns(sample(x))
> }
> cdf <- cumsum(table(RUNS))/nperm
> return(list(RUNS=RUNS, cdf=cdf, nperm=nperm))
> }
>
> MC.runs(dtemp, 100000)
>
> Thanks. --Dale
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.