Hi,

I have a vector like:

r <- runif(100)

Now I would like to split r into 10 pieces (each with 10 elements),
but the 'pieces' should be roughly similar with regard to mean and sd.

What is an efficient way to do this in R?

Thanks!
Hi Martin,

Interesting question. This is not efficient, but I thought I would post a brute force method that might be good enough. Surely someone will have a better approach... Well we'll see. Here is a dumb, inefficient (but workable) way:

# create the vector to be split
r <- runif(100)

# write a function to split it, with various knobs and toggles
splitSimilar <- function(x, n, mean.tol = .1, sd.tol = .1, itr = 500, verbose = FALSE) {
  M <- mean.tol + 1
  SD <- sd.tol + 1
  I <- 0
  # as long as the sd of the group means or the sd of the group standard
  # deviations exceeds its tolerance, and we have iterations left...
  while ((M > mean.tol || SD > sd.tol) && I < itr) {
    I <- I + 1
    ## pick another split (letters[1:n] limits n to 26 groups)
    x1 <- data.frame(g = rep(letters[1:n], length(x)/n),
                     value = sample(x, length(x)))
    M <- sd(tapply(x1$value, x1$g, FUN = mean))
    SD <- sd(tapply(x1$value, x1$g, FUN = sd))
    if (verbose) {
      cat("M = ", M, ", mean.tol =", mean.tol, ": SD = ", SD, ", sd.tol=", sd.tol, "\n")
    }
  }
  # don't try forever: fail only if the last split still misses the criteria
  if (M > mean.tol || SD > sd.tol) {
    stop("failed to find split matching criteria: try increasing tolerance")
  } else {
    return(x1)
  }
}

# now use our function to find a set of splits within our mean and sd tolerance.
tst <- splitSimilar(r, 10, mean.tol = 0.05, sd.tol = 0.1)

# adjust some of the dials and switches to suit...
tst <- splitSimilar(r, 10, mean.tol = 0.03, sd.tol = 0.05, itr = 5000)

Best,
Ista

On Wed, Dec 19, 2012 at 3:23 PM, Martin Batholdy <batholdy at googlemail.com> wrote:
> Hi,
>
> I have a vector like:
>
> r <- runif(100)
>
> Now I would like to split r into 10 pieces (each with 10 elements),
> but the 'pieces' should be roughly similar with regard to mean and sd.
>
> what is an efficient way to do this in R?
>
> thanks!
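A possible variation on the brute force idea, sketched here under assumptions that are not part of the post above: instead of stopping at a tolerance, try a fixed number of random splits and keep the one with the smallest spread. The helper name bestRandomSplit and the score (sd of the group means plus sd of the group sds) are hypothetical choices made only for illustration.

```r
# Hypothetical helper (not from the thread): try `itr` random equal-sized
# splits of x into n groups and keep the one whose group means and group
# standard deviations vary least.
bestRandomSplit <- function(x, n, itr = 500) {
  stopifnot(length(x) %% n == 0)
  best <- NULL
  best.score <- Inf
  for (i in seq_len(itr)) {
    # random assignment of group labels 1..n, each used length(x)/n times
    g <- sample(rep(seq_len(n), length.out = length(x)))
    # assumed score: spread of the group means plus spread of the group sds
    score <- sd(tapply(x, g, mean)) + sd(tapply(x, g, sd))
    if (score < best.score) {
      best.score <- score
      best <- g
    }
  }
  list(group = best, score = best.score)
}

set.seed(1)
r <- runif(100)
out <- bestRandomSplit(r, 10, itr = 500)
table(out$group)              # group sizes
tapply(r, out$group, mean)    # per-group means
tapply(r, out$group, sd)      # per-group standard deviations
```

This always returns a split, so there is no tolerance to tune, but it gives no guarantee on how similar the groups end up; raising itr only makes a good split more likely.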
On Dec 19, 2012, at 12:23 PM, Martin Batholdy wrote:

> Hi,
>
> I have a vector like:
>
> r <- runif(100)
>
> Now I would like to split r into 10 pieces (each with 10 elements),
> but the 'pieces' should be roughly similar with regard to mean and sd.
>
> what is an efficient way to do this in R?
>

> m <- sort(runif(100))
> do.call(rbind, split(m, (1:100)%%10 ))
         [,1]       [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]     [,10]
0 0.073246870 0.17794968 0.2923314 0.4314560 0.4774632 0.6035957 0.7122246 0.7671372 0.8759190 0.9994554
1 0.004766445 0.08639538 0.1922977 0.2976945 0.4327731 0.4966852 0.6094609 0.7124650 0.7771450 0.9009393
2 0.016612211 0.12028226 0.2052309 0.3336055 0.4349006 0.5161239 0.6204279 0.7149662 0.7830977 0.9022377
3 0.027497879 0.12147150 0.2061456 0.3427435 0.4381574 0.5179506 0.6252453 0.7244906 0.8065418 0.9055773
4 0.028392933 0.12856468 0.2086340 0.3482647 0.4420098 0.5308244 0.6348948 0.7271810 0.8202800 0.9072492
5 0.042657119 0.14656184 0.2251334 0.3487408 0.4484275 0.5423360 0.6480134 0.7298033 0.8298771 0.9297432
6 0.045639209 0.15821977 0.2372649 0.3816321 0.4561417 0.5481704 0.6758081 0.7309329 0.8355179 0.9427048
7 0.050771165 0.16489115 0.2625372 0.4225952 0.4701286 0.5512640 0.6765688 0.7508822 0.8510762 0.9444102
8 0.051595323 0.16541512 0.2713721 0.4235584 0.4724879 0.5652690 0.7066615 0.7512220 0.8625107 0.9610963
9 0.057932068 0.17766175 0.2834772 0.4284754 0.4725581 0.5782843 0.7084244 0.7533327 0.8668086 0.9961111

> res <- do.call(rbind, split(m, (1:100)%%10 ))

Rows could be unsorted via t(apply(res, 1, sample, 10)). The t() is needed because apply() returns each shuffled row as a column.

> apply(res, 1, mean)
        0         1         2         3         4         5         6         7         8         9
0.5410779 0.4510622 0.4647485 0.4715821 0.4776296 0.4891294 0.5012032 0.5145125 0.5231188 0.5323066

> apply(res, 1, sd)
        0         1         2         3         4         5         6         7         8         9
0.3046305 0.3031683 0.2957381 0.2978136 0.2992292 0.2988865 0.2987615 0.2967925 0.3019649 0.3047879

-- 
David Winsemius
Alameda, CA, USA
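In the output above the group means rise steadily from group 1 to group 9, because each group always receives the same position within every block of ten sorted values. One possible refinement, which is an assumption of this sketch and not something proposed in the thread, is to reverse the dealing direction in every other block. The helper name serpentineSplit is hypothetical.

```r
# Hypothetical helper (not from the thread): sort x, then deal it into n
# groups block by block, reversing the dealing direction in every other block.
serpentineSplit <- function(x, n) {
  stopifnot(length(x) %% n == 0)
  blocks <- length(x) / n
  # labels run 1..n, n..1, 1..n, ... across consecutive blocks of sorted values
  labels <- unlist(lapply(seq_len(blocks), function(b) {
    if (b %% 2 == 1) seq_len(n) else rev(seq_len(n))
  }))
  split(sort(x), labels)
}

set.seed(42)
m <- runif(100)
plain <- split(sort(m), (1:100) %% 10)   # same dealing as in the session above
serp  <- serpentineSplit(m, 10)

# compare the spread of the group means and of the group sds
sd(sapply(plain, mean)); sd(sapply(serp, mean))
sd(sapply(plain, sd));   sd(sapply(serp, sd))
```

With an even number of blocks the position advantage of a group in one block is offset in the next, so the group means should sit closer together than with plain dealing. Like the sorted approach it builds on, this is deterministic and needs a single sort rather than repeated resampling.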