Hi,

I have a vector like:

r <- runif(100)

Now I would like to split r into 10 pieces (each with 10 elements),
but the 'pieces' should be roughly similar with regard to mean and sd.

What is an efficient way to do this in R?

Thanks!

Hi Martin,
Interesting question. This is not efficient, but I thought I would
post a brute force method that might be good enough. Surely someone
will have a better approach... Well we'll see. Here is a dumb,
inefficient (but workable) way:
# create the vector to be split
r <- runif(100)
# write a function to split it, with various knobs and toggles
splitSimilar <- function(x, n, mean.tol = .1, sd.tol = .1, itr = 500,
                         verbose = FALSE) {
  # groups are labelled with letters (so n <= 26) and must be equal-sized
  stopifnot(n <= 26, length(x) %% n == 0)
  M <- mean.tol + 1
  SD <- sd.tol + 1
  I <- 0
  # as long as the sd of the group means or of the group standard
  # deviations is greater than tolerance, and we have tries left...
  while ((M > mean.tol || SD > sd.tol) && I < itr) {
    I <- I + 1
    ## pick another split
    x1 <- data.frame(g = rep(letters[1:n], length(x) / n),
                     value = sample(x, length(x)))
    M <- sd(tapply(x1$value, x1$g, FUN = mean))
    SD <- sd(tapply(x1$value, x1$g, FUN = sd))
    if (verbose) {
      cat("M =", M, ", mean.tol =", mean.tol, ": SD =", SD,
          ", sd.tol =", sd.tol, "\n")
    }
  }
  # don't try forever: fail only if the last split still misses the criteria
  if (M > mean.tol || SD > sd.tol) {
    stop("failed to find split matching criteria: try increasing tolerance")
  } else {
    return(x1)
  }
}
# now use our function to find a set of splits within our mean and sd
# tolerance.
tst <- splitSimilar(r, 10, mean.tol = 0.05, sd.tol = 0.1)
# adjust some of the dials and switches to suit...
tst <- splitSimilar(r, 10, mean.tol = 0.03, sd.tol = 0.05, itr=5000)
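
To see how balanced a given split actually is, the per-group statistics can be tabulated. The following is only a sketch: it builds a stand-in data frame with the same two columns (g and value) that splitSimilar() returns, using a single random assignment, so it runs on its own. The seed and the names grp_mean, grp_sd, summary_tab and spread are illustrative assumptions, not part of the function above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
r <- runif(100)

# stand-in for a result of splitSimilar(): one random assignment of the
# 100 values to 10 equal-sized groups, with columns g and value
tst <- data.frame(g = rep(letters[1:10], 10), value = sample(r))

# mean and standard deviation within each group
grp_mean <- tapply(tst$value, tst$g, mean)
grp_sd   <- tapply(tst$value, tst$g, sd)

# one row per group
summary_tab <- data.frame(g    = names(grp_mean),
                          mean = as.vector(grp_mean),
                          sd   = as.vector(grp_sd))
print(summary_tab)

# the two quantities that splitSimilar() compares against its tolerances
spread <- c(sd_of_means = sd(grp_mean), sd_of_sds = sd(grp_sd))
print(spread)
```

Replacing the stand-in with the real tst from the calls above gives the same table for an accepted split.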
Best,
Ista
On Wed, Dec 19, 2012 at 3:23 PM, Martin Batholdy
<batholdy at googlemail.com> wrote:
> Hi,
>
> I have a vector like:
>
> r <- runif(100)
>
> Now I would like to split r into 10 pieces (each with 10 elements),
> but the 'pieces' should be roughly similar with regard to mean and sd.
>
> what is an efficient way to do this in R?
>
> thanks!

On Dec 19, 2012, at 12:23 PM, Martin Batholdy wrote:

> Hi,
>
> I have a vector like:
>
> r <- runif(100)
>
> Now I would like to split r into 10 pieces (each with 10 elements),
> but the 'pieces' should be roughly similar with regard to mean and sd.
>
> what is an efficient way to do this in R?

> m <- sort(runif(100))
> do.call(rbind, split(m, (1:100)%%10 ))
         [,1]       [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]     [,10]
0 0.073246870 0.17794968 0.2923314 0.4314560 0.4774632 0.6035957 0.7122246 0.7671372 0.8759190 0.9994554
1 0.004766445 0.08639538 0.1922977 0.2976945 0.4327731 0.4966852 0.6094609 0.7124650 0.7771450 0.9009393
2 0.016612211 0.12028226 0.2052309 0.3336055 0.4349006 0.5161239 0.6204279 0.7149662 0.7830977 0.9022377
3 0.027497879 0.12147150 0.2061456 0.3427435 0.4381574 0.5179506 0.6252453 0.7244906 0.8065418 0.9055773
4 0.028392933 0.12856468 0.2086340 0.3482647 0.4420098 0.5308244 0.6348948 0.7271810 0.8202800 0.9072492
5 0.042657119 0.14656184 0.2251334 0.3487408 0.4484275 0.5423360 0.6480134 0.7298033 0.8298771 0.9297432
6 0.045639209 0.15821977 0.2372649 0.3816321 0.4561417 0.5481704 0.6758081 0.7309329 0.8355179 0.9427048
7 0.050771165 0.16489115 0.2625372 0.4225952 0.4701286 0.5512640 0.6765688 0.7508822 0.8510762 0.9444102
8 0.051595323 0.16541512 0.2713721 0.4235584 0.4724879 0.5652690 0.7066615 0.7512220 0.8625107 0.9610963
9 0.057932068 0.17766175 0.2834772 0.4284754 0.4725581 0.5782843 0.7084244 0.7533327 0.8668086 0.9961111

> res <- do.call(rbind, split(m, (1:100)%%10 ))

Rows could be unsorted via t(apply(res, 1, sample, 10)) (apply() returns
each shuffled row as a column, hence the t()).

> apply(res, 1, mean)
        0         1         2         3         4         5         6         7         8         9
0.5410779 0.4510622 0.4647485 0.4715821 0.4776296 0.4891294 0.5012032 0.5145125 0.5231188 0.5323066
> apply(res, 1, sd)
        0         1         2         3         4         5         6         7         8         9
0.3046305 0.3031683 0.2957381 0.2978136 0.2992292 0.2988865 0.2987615 0.2967925 0.3019649 0.3047879

-- 
David Winsemius
Alameda, CA, USA
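
The sort-and-deal idea in the reply above can be wrapped in a small function that returns a group label for each element, so the original vector does not have to be reordered. This is only a sketch under stated assumptions: the name dealSorted and the seed are made up for illustration, and groups are labelled 1..n rather than with the 0..9 remainders used above. The partition itself is the same: each group receives one value from every block of n consecutive ranks.

```r
# hypothetical helper (not from the thread): assign each element of x to one
# of n groups by dealing the sorted values out in round-robin order
dealSorted <- function(x, n) {
  stopifnot(length(x) %% n == 0)   # equal-sized groups only
  ord <- order(x)                  # positions of x from smallest to largest
  grp <- integer(length(x))
  # the k-th smallest value goes to group ((k - 1) %% n) + 1
  grp[ord] <- rep(seq_len(n), length.out = length(x))
  grp
}

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
r <- runif(100)
g <- dealSorted(r, 10)

pieces <- split(r, g)        # list of 10 vectors with 10 elements each
means  <- sapply(pieces, mean)
sds    <- sapply(pieces, sd)
print(round(rbind(mean = means, sd = sds), 3))
```

Because every group takes a fixed rank within each block of n sorted values, the group means increase steadily with the group label, as in the output above (where group 0 holds the largest value of each block). The largest difference between two group means cannot exceed the range of the data divided by n. No random search is needed, which makes this approach much cheaper than repeated resampling.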