> Date: Fri, 31 Mar 2006 14:44:05 -0800 (PST)
> From: "Fred J." <phddas at yahoo.com>
>
> given a numeric array (a sequence of reals), I am interested in
> finding the subsets of sequences (each with start and end index) which
match a given sd range.
>
> I read the docs on match and which and the "see also" but could
not come up with a way. I could loop with a stepping window over the sequence
but that would be limited to a fixed size window, I guess I as well loop through
all window sizes from 1:length(data) but that is not elegant. any hints towards
a solution?
>
Presumably your "array" is a vector, in which case you can use
vectorisation to eliminate one of the loops. The following function
produces a matrix in upper triangular form for which res[i, j] is the
variance of the sequence dat[i:j] (where dat is your "array"). It is
(approximately) two orders of magnitude faster than the double loop,
but doesn't produce exactly the same numbers because of numerical
rounding because it uses the formula: variance = (sum(xi^2) -
n*xbar^2)/(n - 1)
Now all you have to do is test this matrix against the range of
variances you want to isolate, and use the matching indices to provide
you with the endpoints of your subsets.
Hope this helps,
Ray Brownrigg
----
allvar <- function(dat) {
len <- length(dat)
res <- numeric((len - 1)*len/2)
mym <- dat[1]
myv <- 0
for (j in 2:len) {
oldm <- mym
mym <- (((j - 1):1)*mym + dat[j])/(j:2)
myv <- (((j - 2):0)*myv + ((j - 1):1)*oldm^2 + dat[j]^2 - (j:2)*mym^2)/
((j - 1):1)
res[((j - 2)*(j - 1)/2 + 1):((j - 1)*j/2)] <- myv
mym <-c(mym, dat[j])
myv <- c(myv, 0)
}
return(res)
}
# Now test it> n <- 500; dat <- runif(n)
> unix.time({res2 <- matrix(0, n, n); res2[upper.tri(res2)] <-
allvar(dat)})
[1] 0.64 0.00 0.64 0.00 0.00> unix.time({res1 <- matrix(0, n, n); for (j in 2:n) for (i in 1:(j - 1))
res1[i, j] <- var(dat[i:j])})
[1] 72.64 13.74 86.38 0.00 0.00> max(abs(res1 - res2))
[1] 1.249001e-15>
----