thr3ads.net - R help - [R] OT: What distribution is this? [Sep 2010]

Rainer M Krug

2010-Sep-25 14:24 UTC

[R] OT: What distribution is this?

Hi

This is OT, but I need it for my simulation in R.

I have a special case for sampling with replacement: instead of sampling
once and replacing it immediately, I sample n times, and then replace all n
items.


So:

N entities
x samples with replacement
each sample consists of n sub-samples WITHOUT replacement, which are all
replaced before the next sample is drawn

My question is: which distribution can I use to describe how often each
entity of the N has been sampled?
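
A minimal sketch in R of the scheme described above, in case it helps to pin the question down. The function name sample_counts and the values of N, n and x are illustrative assumptions, not part of the original post.

```r
# Hypothetical helper: draw x samples, each consisting of n distinct
# entities out of N (no replacement within a sample, full replacement
# between samples), and count how often each entity was drawn.
sample_counts <- function(N, n, x) {
  draws <- replicate(x, sample.int(N, n))   # one sample per replication
  tabulate(as.vector(draws), nbins = N)     # one count per entity
}

set.seed(1)
counts <- sample_counts(N = 10, n = 3, x = 1000)
counts        # how often each of the 10 entities was sampled
sum(counts)   # always n * x, here 3000
```

No entity can be counted more than x times, since it appears at most once per sample.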

Thanks for your help,

Rainer

-- 
NEW GERMAN FAX NUMBER!!!

Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology,
UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Cell:           +27 - (0)83 9479 042
Fax:            +27 - (0)86 516 2782
Fax:            +49 - (0)321 2125 2244
email:          Rainer@krugs.de

Skype:          RMkrug
Google:         R.M.Krug@gmail.com

Berwin A Turlach

2010-Sep-25 15:09 UTC

G'day Rainer,

On Sat, 25 Sep 2010 16:24:17 +0200
Rainer M Krug <r.m.krug at gmail.com> wrote:
> This is OT, but I need it for my simulation in R.
> 
> I have a special case for sampling with replacement: instead of
> sampling once and replacing it immediately, I sample n times, and
> then replace all n items.
> 
> 
> So:
> 
> N entities
> x samples with replacement
> each sample consists of n sub-samples WITHOUT replacement, which are
> all replaced before the next sample is drawn
> 
> My question is: which distribution can I use to describe how often
> each entity of the N has been sampled?

Surely, unless I am missing something, any given entity would have
(marginally) a binomial distribution:

A sub-sample of size n either contains the entity or it does not.  The
probability that a sub-sample contains the entity is a function of N
and n alone.

x sub-samples are drawn (with replacement), so the number of times that
an entity has been sampled is the number of sub-samples in which it
appears.  This is given by the binomial distribution with parameters x
and p, where p is the probability determined in the previous paragraph.
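
The probability left implicit above works out to p = n/N. A rough sketch in R that checks this and compares a simulation of the scheme with the binomial pmf; the values of N, n, x and the number of replications are illustrative assumptions.

```r
N <- 10; n <- 3; x <- 50   # illustrative values, not from the thread

# P(a fixed entity is among n items drawn without replacement from N)
p <- 1 - choose(N - 1, n) / choose(N, n)
all.equal(p, n / N)   # TRUE

# Simulate how often entity 1 is sampled in x draws of size n,
# repeated 'reps' times, and compare with Binomial(x, p).
set.seed(42)
reps <- 5000
hits <- replicate(reps, sum(replicate(x, 1L %in% sample.int(N, n))))
emp  <- tabulate(hits + 1L, nbins = x + 1L) / reps   # empirical pmf on 0..x
max(abs(emp - dbinom(0:x, size = x, prob = p)))      # small: sampling noise only
```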

I guess the fun starts if you try to determine the joint distribution
of two (or more) entities.........

HTH.

Cheers,

	Berwin 

========================== Full address ===========================
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway                   
Crawley WA 6009                e-mail: berwin at maths.uwa.edu.au
Australia                        http://www.maths.uwa.edu.au/~berwin

Peter Dalgaard

2010-Sep-25 15:19 UTC

On 09/25/2010 04:24 PM, Rainer M Krug wrote:
> Hi
> 
> This is OT, but I need it for my simulation in R.
> 
> I have a special case for sampling with replacement: instead of sampling
> once and replacing it immediately, I sample n times, and then replace all n
> items.
> 
> 
> So:
> 
> N entities
> x samples with replacement
> each sample consists of n sub-samples WITHOUT replacement, which are all
> replaced before the next sample is drawn
> 
> My question is: which distribution can I use to describe how often each
> entity of the N has been sampled?
> 
> Thanks for your help,
> 
> Rainer
> 
How did you know I was in the middle of preparing lectures on the
variance of the hypergeometric distribution and such? ;-)

If you look at a single item, the answer is of course that you have a
binomial with size=x and prob=n/N. The problem is that these binomials
are correlated between items.

If you can make do with a 2nd order approximation, then the covariance
between the selection indicators of two items is easily found from
symmetry and the fact that if you sum all N indicators you get the
constant n. I.e., for a single sample with p = n/N, the variance of each
indicator is p(1-p) and the covariance is -p(1-p)/(N-1). For sums over
repeated samples, just multiply everything by the number x of samples.
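
A sketch in R of the second-order moments just described, with a simulation as a sanity check. The values of N, n, x and the number of replications are illustrative assumptions.

```r
N <- 10; n <- 3; x <- 20   # illustrative values, not from the thread

p <- n / N
v_single <- p * (1 - p)              # variance of one selection indicator
c_single <- -p * (1 - p) / (N - 1)   # covariance of two indicators

# Direct check of the covariance: P(both items selected) - p^2
p_both <- n * (n - 1) / (N * (N - 1))
all.equal(p_both - p^2, c_single)    # TRUE

# Covariance matrix of the N counts after x independent samples:
# x * v_single on the diagonal, x * c_single everywhere else.
Sigma <- x * (diag(v_single - c_single, N) + c_single)

# Simulate the count vector many times and compare
set.seed(7)
one_run <- function() {
  tabulate(as.vector(replicate(x, sample.int(N, n))), nbins = N)
}
counts <- t(replicate(4000, one_run()))   # 4000 rows, one column per entity
max(abs(cov(counts) - Sigma))             # small: sampling noise only
```

Each row of Sigma sums to zero, which is the constraint used above: the counts always add up to n * x, so their sum has no variance.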

If you intend to just count the frequency of a particular feature in
each of your n-samples, i.e., you have x replications of a
hypergeometric experiment, then you can do somewhat better by computing
the explicit convolution of x hypergeometrics (convolve(x, rev(y),
type="o") and Reduce() are your friends). I'm not sure this is actually
worth the trouble, but it should be doable for decent-sized N and x.
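
One way the convolution could be sketched in R. The number K of entities carrying the feature, the helper name conv and all numeric values are illustrative assumptions.

```r
N <- 20; K <- 5; n <- 4; x <- 6   # illustrative; K = entities with the feature

# pmf of the number of feature-carrying entities in one sample, support 0..n
h <- dhyper(0:n, m = K, n = N - K, k = n)

# ordinary convolution of two pmfs, as suggested above
conv <- function(a, b) convolve(a, rev(b), type = "open")

# pmf of the total over x independent samples, support 0..(n * x)
total <- Reduce(conv, rep(list(h), x))
total <- pmax(total, 0)   # FFT round-off can leave tiny negative values

length(total)              # n * x + 1 = 25
sum(total)                 # 1 up to round-off
sum((0:(n * x)) * total)   # mean, equal to x * n * K / N = 6
```

convolve() works via the FFT, so results are exact only up to floating-point round-off; the pmax() call is a precaution against negative probabilities, not part of the method itself.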

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
