thr3ads.net - R help - [R] create stratified splits [Dec 2012]

Martin Batholdy

2012-Dec-19 20:23 UTC

[R] create stratified splits

Hi,


I have a vector like:

r <- runif(100)

Now I would like to split r into 10 pieces (each with 10 elements),
but the 'pieces' should be roughly similar with regard to mean and sd.

What is an efficient way to do this in R?


thanks!

Ista Zahn

2012-Dec-19 22:45 UTC

Hi Martin,

Interesting question. This is not efficient, but I thought I would
post a brute force method that might be good enough. Surely someone
will have a better approach... Well we'll see. Here is a dumb,
inefficient (but workable) way:

# create the vector to be split
r <- runif(100)

# write a function to split it, with various knobs and toggles
splitSimilar <- function(x, n, mean.tol = .1, sd.tol = .1, itr = 500,
                         verbose = FALSE) {
  # the group labels below assume n divides length(x) and n <= 26
  stopifnot(length(x) %% n == 0, n <= 26)
  M <- mean.tol + 1
  SD <- sd.tol + 1
  I <- 0
  # as long as the sd of the group means or of the group standard
  # deviations is greater than tolerance...
  while ((M > mean.tol || SD > sd.tol) && I < itr) {
    I <- I + 1
    ## pick another split
    x1 <- data.frame(g = rep(letters[1:n], length(x) / n),
                     value = sample(x, length(x)))
    M <- sd(tapply(x1$value, x1$g, FUN = mean))
    SD <- sd(tapply(x1$value, x1$g, FUN = sd))
    if (verbose) {
      cat("M =", M, ", mean.tol =", mean.tol, ": SD =", SD,
          ", sd.tol =", sd.tol, "\n")
    }
  }
  # don't try forever... (test the criteria themselves, so that a split
  # found on the very last iteration is not reported as a failure)
  if (M > mean.tol || SD > sd.tol) {
    stop("failed to find split matching criteria: try increasing tolerance")
  } else {
    return(x1)
  }
}

# now use our function to find a set of splits within our mean and sd
# tolerance.
tst <- splitSimilar(r, 10, mean.tol = 0.05, sd.tol = 0.1)

# adjust some of the dials and switches to suit...
tst <- splitSimilar(r, 10, mean.tol = 0.03, sd.tol = 0.05, itr = 5000)
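To see what the two tolerances are compared against, here is a minimal
sketch that evaluates the same criterion for a single random split. The
seed and the variable names (g, group.means, group.sds) are assumptions
made for illustration. Shuffling the group labels, as done here, should be
equivalent to shuffling the values against fixed labels as the function
above does.

```r
set.seed(42)   # arbitrary seed, assumed here only for reproducibility
r <- runif(100)
n <- 10

# one random assignment of the 100 values to n equally sized groups
g <- sample(rep(letters[1:n], each = length(r) / n))

# per-group summaries
group.means <- tapply(r, g, mean)
group.sds   <- tapply(r, g, sd)

# the two quantities that are compared against mean.tol and sd.tol:
# how much the group means vary, and how much the group sds vary
M  <- sd(group.means)
SD <- sd(group.sds)

print(round(cbind(mean = group.means, sd = group.sds), 3))
cat("M =", M, " SD =", SD, "\n")
```

A split is accepted only when both M and SD fall at or below their
tolerances, so tightening either tolerance usually means more random draws
before a split is accepted.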

Best,
Ista

On Wed, Dec 19, 2012 at 3:23 PM, Martin Batholdy
<batholdy at googlemail.com> wrote:
> Hi,
>
>
> I have a vector like:
>
> r <- runif(100)
>
> Now I would like to split r into 10 pieces (each with 10 elements),
> but the 'pieces' should be roughly similar with regard to mean and sd.
>
> what is an efficient way to do this in R?
>
>
> thanks!

David Winsemius

2012-Dec-19 23:38 UTC

On Dec 19, 2012, at 12:23 PM, Martin Batholdy wrote:
> Hi,
> 
> 
> I have a vector like:
> 
> r <- runif(100)
> 
> Now I would like to split r into 10 pieces (each with 10 elements),
> but the 'pieces' should be roughly similar with regard to mean and sd.
> 
> what is an efficient way to do this in R?
> 
> m <- sort(runif(100))
> do.call(rbind, split(m, (1:100)%%10 ))
         [,1]       [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
0 0.073246870 0.17794968 0.2923314 0.4314560 0.4774632 0.6035957 0.7122246
1 0.004766445 0.08639538 0.1922977 0.2976945 0.4327731 0.4966852 0.6094609
2 0.016612211 0.12028226 0.2052309 0.3336055 0.4349006 0.5161239 0.6204279
3 0.027497879 0.12147150 0.2061456 0.3427435 0.4381574 0.5179506 0.6252453
4 0.028392933 0.12856468 0.2086340 0.3482647 0.4420098 0.5308244 0.6348948
5 0.042657119 0.14656184 0.2251334 0.3487408 0.4484275 0.5423360 0.6480134
6 0.045639209 0.15821977 0.2372649 0.3816321 0.4561417 0.5481704 0.6758081
7 0.050771165 0.16489115 0.2625372 0.4225952 0.4701286 0.5512640 0.6765688
8 0.051595323 0.16541512 0.2713721 0.4235584 0.4724879 0.5652690 0.7066615
9 0.057932068 0.17766175 0.2834772 0.4284754 0.4725581 0.5782843 0.7084244
       [,8]      [,9]     [,10]
0 0.7671372 0.8759190 0.9994554
1 0.7124650 0.7771450 0.9009393
2 0.7149662 0.7830977 0.9022377
3 0.7244906 0.8065418 0.9055773
4 0.7271810 0.8202800 0.9072492
5 0.7298033 0.8298771 0.9297432
6 0.7309329 0.8355179 0.9427048
7 0.7508822 0.8510762 0.9444102
8 0.7512220 0.8625107 0.9610963
9 0.7533327 0.8668086 0.9961111
> res <- do.call(rbind, split(m, (1:100)%%10 )) 
Rows could be unsorted via t(apply(res, 1, sample, 10)); the t() is needed
because apply() returns each shuffled row as a column.
> apply(res, 1, mean)
        0         1         2         3         4         5         6         7
0.5410779 0.4510622 0.4647485 0.4715821 0.4776296 0.4891294 0.5012032 0.5145125
        8         9
0.5231188 0.5323066
> apply(res, 1, sd)
        0         1         2         3         4         5         6         7
0.3046305 0.3031683 0.2957381 0.2978136 0.2992292 0.2988865 0.2987615 0.2967925
        8         9
0.3019649 0.3047879
--
David Winsemius
Alameda, CA, USA
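
The sort-and-deal idea above can be sketched as a small function that
returns a group label for each element in its original position, so the
input need not be sorted first. dealSorted and dealSnake are hypothetical
names, not from the thread. dealSorted numbers the groups 1..n rather than
0..9 but should otherwise give the same grouping as the split() call above.
dealSnake is a different technique, added here only as a possible
refinement: it reverses the dealing direction in every other block of n
sorted values. That may reduce the steady drift in group means visible in
the output above (0.451 for group 1 rising to 0.541 for group 0).

```r
# Hypothetical helpers, not from the thread.

# Straight dealing: the i-th smallest value goes to group ((i - 1) %% n) + 1.
dealSorted <- function(x, n) {
  g <- integer(length(x))
  g[order(x)] <- rep_len(seq_len(n), length(x))
  g
}

# Serpentine ("snake") dealing: reverse the dealing direction in every
# other block of n sorted values.
dealSnake <- function(x, n) {
  pos    <- seq_along(x) - 1    # 0-based position in sorted order
  block  <- pos %/% n           # which block of n sorted values
  within <- pos %% n + 1        # position inside the block, 1..n
  lab    <- ifelse(block %% 2 == 0, within, n + 1 - within)
  g <- integer(length(x))
  g[order(x)] <- lab
  g
}

set.seed(123)   # arbitrary seed, assumed only for reproducibility
r <- runif(100)
g1 <- dealSorted(r, 10)
g2 <- dealSnake(r, 10)

# group means and sds under both schemes, one column per group
print(round(rbind(straight.mean = tapply(r, g1, mean),
                  snake.mean    = tapply(r, g2, mean),
                  straight.sd   = tapply(r, g1, sd),
                  snake.sd      = tapply(r, g2, sd)), 3))
```

Both helpers are deterministic given the data, so unlike the brute-force
search they need no tolerance or iteration limit. The pieces can then be
extracted with split(r, g1) or split(r, g2).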
