Arne Schulz
2010-Jul-13 13:09 UTC
[R] Generate groups with random size but given total sample size
Dear list,
I am currently running some simulation studies in which I want to compare different scenarios. In particular, two scenarios should be compared: 10,000 cases in 100 groups with 100 cases per group, and 10,000 cases in 100 groups with random group sizes (ranging from 5 to 500).

The first part is no problem:
> id <- seq(1,10000)
> group <- sort(rep(seq(1,100),100))

But I am stuck on the second scenario. Using sample() does give me 100 groups with random sizes, but it generates more than 10,000 cases:
> set.seed(13)
> sum(sample(5:500, 100))
[1] 24583

Another way could be to generate one group size at a time and sum the cases. But this would end up in trial and error to hit the 10,000 cases. Maybe it would break rules of probability, too.

I'm convinced that there should be another (and even better) way to handle this problem in R... :-)

Best regards,
Arne Schulz
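The overshoot in the question follows from simple arithmetic: sizes drawn uniformly from 5:500 average 252.5, so 100 of them sum to roughly 25,250 rather than 10,000. The sketch below checks that figure and also rebuilds the equal-size scenario with `rep(..., each = )`, which avoids the `sort()` call. The name `expected_total` is introduced here for illustration only and is not from the thread.

```r
# Expected total when 100 group sizes are drawn uniformly from 5:500
expected_total <- 100 * mean(5:500)
print(expected_total)

# Equal-size scenario: 100 groups of 100 cases each, already in sorted order
id    <- seq_len(10000)
group <- rep(seq_len(100), each = 100)

# Sanity checks on the equal-size layout
stopifnot(length(group) == length(id), all(table(group) == 100))
```

Any fix therefore has to pull the average group size down to 100 while still respecting the bounds of 5 and 500.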
Greg Snow
2010-Jul-13 16:17 UTC
[R] Generate groups with random size but given total sample size
For one definition of random:

# random proportions that sum to 1
ss <- rexp(100)
ss <- ss/sum(ss)
# every group gets at least 5 cases; the remaining 9500 are split by proportion
ss <- 5 + round( ss*9500 )

# rounding (and the clamping below) can leave the total off by d;
# push the difference onto one randomly chosen group and re-clamp
cnt <- 0
while( ( d <- sum(ss) - 10000 ) != 0 ) {
  tmpid <- sample.int(100,1)
  ss[tmpid] <- ss[tmpid] - d
  ss[ ss > 500 ] <- 500
  ss[ ss < 5 ] <- 5
  cnt <- cnt + 1
  if (cnt > 100) {
    cat('problems finding a solution, stopping after 100 iterations\n')
    break
  }
}

# one group label per case
group <- rep( 1:100, ss )

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
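As a hedged alternative under yet another definition of random (not the method posted above), the 9,500 spare cases can be distributed with a multinomial draw, so the total is exactly 10,000 by construction and only the upper bound needs checking. `draw_sizes()` and its arguments are hypothetical names introduced for this sketch, and the seed is arbitrary. The resulting sizes are not uniform on 5 to 500.

```r
set.seed(13)  # arbitrary seed, used only to make the sketch reproducible

# draw_sizes() is a hypothetical helper, not part of the original thread
draw_sizes <- function(n_groups = 100, total = 10000, lo = 5, hi = 500,
                       max_tries = 1000) {
  spare <- total - n_groups * lo        # cases left after each group gets `lo`
  for (i in seq_len(max_tries)) {
    w <- rexp(n_groups)                 # random weights; rmultinom normalizes them
    sizes <- lo + as.vector(rmultinom(1, size = spare, prob = w))
    if (all(sizes <= hi)) return(sizes) # accept only draws within the upper bound
  }
  stop("no valid draw found within max_tries")
}

ss    <- draw_sizes()
group <- rep(seq_along(ss), times = ss)  # one group label per case

stopifnot(sum(ss) == 10000, all(ss >= 5), all(ss <= 500))
```

Rejecting whole draws, rather than clamping individual groups, keeps every accepted draw a sample from the same distribution, at the cost of occasionally redrawing.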