Majonu
2010-Jul-26 20:28 UTC
[R] Sample size calculation for non-normal population with unknown mean and SD
Basically, we have a population of 4,392 documents and we want to find out the number of patents per document. We don't want to go through all 4,392 documents, but want a reliable sample size from which to draw inferences. I feel like this count data will not follow a normal distribution, but more like a Poisson (skewed right). The problem is we don't have much similar data to this data set, so mean and standard deviation are unknown. Is there any way to derive a sample size based on the confidence interval, margin of error, and population size for what I assume to be a non-normal population? Any help would be greatly appreciated. -- View this message in context: http://r.789695.n4.nabble.com/Sample-size-calculation-for-non-normal-population-with-unknown-mean-and-SD-tp2302833p2302833.html Sent from the R help mailing list archive at Nabble.com.
Bert Gunter
2010-Jul-26 20:39 UTC
[R] Sample size calculation for non-normal population with unknown mean and SD
The obvious: Take a small sample, say 25-50. Get an estimate of your distribution from that. Then use this to determine how many additional samples (if any) you need for the desired precision. This latter step can probably be done easily via simulation/bootstrap if you don't want to specify a parametric form. My guess is that your distribution is right-skew but not Poisson -- probably more like a truncated Poisson. But of course I have no idea what sorts of documents you've got, so how would I know?

Bert Gunter
Genentech Nonclinical Biostatistics
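
The pilot-then-bootstrap idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: it is written in Python rather than R for the sake of a self-contained example, the function names are made up, and the pilot counts are placeholder values, not real data. The first function uses the usual normal-approximation formula with a finite population correction for N = 4,392; the second resamples the pilot to see how wide a percentile interval for the mean would be at a candidate sample size. Applying the finite population correction to the bootstrap width is itself an approximation.

```python
import math
import random
import statistics

N_POPULATION = 4392  # population size stated in the original post


def required_n_normal_approx(pilot, margin, z=1.96, population=N_POPULATION):
    """Sample size to estimate the mean within +/- margin (normal approximation),
    shrunk by the finite population correction."""
    s = statistics.stdev(pilot)              # SD estimated from the pilot sample
    n0 = (z * s / margin) ** 2               # size needed for an infinite population
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def bootstrap_half_width(pilot, n, reps=1000, level=0.95,
                         population=N_POPULATION, rng=None):
    """Approximate half-width of a percentile interval for the mean of n draws,
    obtained by resampling the pilot with replacement."""
    rng = rng or random.Random(0)            # fixed seed so runs are repeatable
    means = sorted(statistics.fmean(rng.choices(pilot, k=n)) for _ in range(reps))
    lo = means[int((1 - level) / 2 * reps)]
    hi = means[int((1 + level) / 2 * reps) - 1]
    # Assumption: scale by the finite population correction, since resampling
    # with replacement mimics an infinite population.
    fpc = math.sqrt((population - n) / (population - 1))
    return (hi - lo) / 2 * fpc


def required_n_bootstrap(pilot, margin, step=25, population=N_POPULATION, **kwargs):
    """Smallest candidate n (in increments of step) whose bootstrap half-width
    is no larger than margin."""
    n = len(pilot)
    while n < population and bootstrap_half_width(
            pilot, n, population=population, **kwargs) > margin:
        n += step
    return min(n, population)


# Placeholder pilot of 30 documents (hypothetical patent counts, right-skewed).
pilot_counts = [0, 0, 1, 0, 2, 1, 0, 3, 0, 1, 0, 0, 5, 1, 0,
                2, 0, 1, 1, 0, 0, 4, 0, 1, 2, 0, 0, 1, 7, 0]

print("normal approximation:", required_n_normal_approx(pilot_counts, margin=0.25))
print("bootstrap:", required_n_bootstrap(pilot_counts, margin=0.25))
```

Both numbers depend heavily on the pilot, so with a pilot of only 25-50 documents they are best read as rough planning figures. In R, the same steps would map onto sample() with replace = TRUE and quantile().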