thr3ads.net - R help - [R] Generate groups with random size but given total sample size [Jul 2010]

Arne Schulz

2010-Jul-13 13:09 UTC

[R] Generate groups with random size but given total sample size

Dear list,
I am currently doing some simulation studies where I want to compare different
scenarios.
In particular, two scenarios should be compared: 10.000 cases in 100 groups with
100 cases per group and 10.000 cases in 100 groups with random group size
(ranging from 5 to 500).

The first part is no problem:
> id <- seq(1,10000)
> group <- sort(rep(seq(1,100),100))

But I don't get along with the second scenario. Using sample does give me
100 groups with random cases, but generates more than 10.000 cases:
> set.seed(13)
> sum(sample(5:500, 100))
[1] 24583
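
A quick check of the arithmetic suggests why the total overshoots: sizes drawn uniformly from 5:500 average 252.5, so 100 of them are expected to sum to roughly 25,250, not 10,000. The variable names below are only illustrative and are not part of the original post.

```r
# Illustrative names, not from the original post
sizes <- 5:500
mean_size <- mean(sizes)            # average size of a single draw: 252.5
expected_total <- 100 * mean_size   # expected sum of 100 draws: 25250
expected_total
```

The observed total of 24583 is in line with that expectation, so the overshoot is a property of the size range rather than of the particular seed.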

Another way could be generating one sample at a time and summing the cases. But
this would end up in trial & error to fit the 10.000 cases. Maybe it would break
rules of probability, too.

I'm convinced that there should be another (and even better) way to handle
this problem in R... :-)


Best regards,
Arne Schulz

Greg Snow

2010-Jul-13 16:17 UTC


For one definition of random:

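# Draw random proportions that sum to 1
ss <- rexp(100)
ss <- ss/sum(ss)

# Give each group the minimum of 5, then spread the remaining 9500 cases
ss <- 5 + round( ss*9500 )

# Enforce the 5..500 bounds before the total is checked for the first time
ss[ ss > 500 ] <- 500
ss[ ss < 5 ] <- 5

# Rounding and clamping can leave the total off by d; repair it by
# shifting the difference onto randomly chosen groups
cnt <- 0
while( ( d <- sum(ss) - 10000 ) != 0 ) {

	tmpid <- sample.int(100,1)
	ss[tmpid] <- ss[tmpid] - d

	# keep every group size within 5..500
	ss[ ss > 500 ] <- 500
	ss[ ss < 5 ] <- 5

	cnt <- cnt + 1
	if (cnt > 100) {
		cat('problems finding a solution, stopping after 100 iterations\n')
		break
	}
}

# One group label per case
group <- rep( 1:100, ss )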
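
For a different definition of random, one possible sketch is multinomial allocation with rejection: give every group the minimum of 5 cases, distribute the remaining cases with rmultinom() using random weights, and redraw whenever a group exceeds 500. This is a swapped-in technique, not the approach posted above; the function name draw_group_sizes and all of its arguments are hypothetical.

```r
# Hypothetical helper, not from the thread: multinomial allocation that
# rejects any draw violating the upper bound on group size.
draw_group_sizes <- function(n_groups = 100, total = 10000,
                             min_size = 5, max_size = 500,
                             max_tries = 1000) {
  # The constraints must be satisfiable at all
  stopifnot(n_groups * min_size <= total, n_groups * max_size >= total)
  spare <- total - n_groups * min_size  # cases left after the minimum
  for (i in seq_len(max_tries)) {
    # Unequal random weights; rmultinom() normalises prob internally
    w <- rexp(n_groups)
    sizes <- min_size + as.vector(rmultinom(1, size = spare, prob = w))
    # The total is exact by construction, so only the upper bound is checked
    if (all(sizes <= max_size)) return(sizes)
  }
  stop("no valid draw within max_tries attempts")
}

set.seed(13)
sizes <- draw_group_sizes()
group <- rep(seq_along(sizes), sizes)
```

Because the multinomial counts always add up to the spare cases, the total is exactly 10000 without any repair loop; only the upper bound can fail, and a failed draw is simply discarded and redrawn.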


Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

