thr3ads.net - R help - [R] A question about the hypergeometric distribution and phyper() [Sep 2008]

michael watson (IAH-C)

2008-Sep-10 13:19 UTC

[R] A question about the hypergeometric distribution and phyper()

Dear All

I have a question about the hypergeometric distribution.

Example 1: I have a universe of 6187 objects, and 164 have a particular
attribute, therefore 6187-164 do not have that attribute.  I sample 249
of those objects, and find that 19 have that attribute.  I get a p-value
here (looking at just over-representation):

phyper(19, 164, 6187-164, 249, lower.tail=FALSE)
[1] 7.816235e-06

Example 2: I have a universe of 6187 objects, and 12 have a particular
attribute, therefore 6187-12 do not have that attribute.  I sample 249
of those objects, and find that 4 have that attribute.  I get a p-value
here (looking at just over-representation):

phyper(4, 12, 6187-12, 249, lower.tail=FALSE)
[1] 6.368919e-05

It seems to me that the probability of seeing 19 out of 164 in a sample
of 249 being less than the probability of seeing 4 out of 12 in a sample
of the same size is counter-intuitive.

First off, am I using phyper() properly?
Secondly, can someone point me to some documentation explaining why
these seemingly counter-intuitive p-values occur?

Thanks
Mick

Stefan Evert

2008-Sep-10 14:11 UTC

On 10 Sep 2008, at 15:19, michael watson (IAH-C) wrote:
> Example 1: I have a universe of 6187 objects, and 164 have a particular
> attribute, therefore 6187-164 do not have that attribute.  I sample 249
> of those objects, and find that 19 have that attribute.  I get a p-value
> here (looking at just over-representation):
>
> phyper(19, 164, 6187-164, 249, lower.tail=FALSE)
> [1] 7.816235e-06
Actually, if you look at ?phyper, you'll see that this should be

phyper(18, 164, 6187-164, 249, lower.tail=FALSE)
[1] 2.775819e-05

if you want to calculate Pr(X >= 19) = Pr(X > 18). Similarly:
> phyper(4, 12, 6187-12, 249, lower.tail=FALSE)
> [1] 6.368919e-05
phyper(3, 12, 6187-12, 249, lower.tail=FALSE)
[1] 0.0009816739

Which you'll still find counterintuitive, of course.
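
The off-by-one point above can be checked numerically. The following is a minimal sketch using the Example 1 numbers from the thread; the variable names (N, K, n, x, p_tail, p_sum, p_strict) are chosen here for illustration only. It compares the upper tail from phyper() with an explicit sum of point probabilities from dhyper().

```r
# Example 1 from the thread (variable names are illustrative assumptions)
N <- 6187   # size of the universe
K <- 164    # objects in the universe with the attribute
n <- 249    # sample size
x <- 19     # objects with the attribute found in the sample

# With lower.tail = FALSE, phyper(q, ...) returns P[X > q],
# so P[X >= x] requires q = x - 1.
p_tail <- phyper(x - 1, K, N - K, n, lower.tail = FALSE)

# The same tail obtained by summing point probabilities from x
# up to the largest possible count, min(K, n).
p_sum <- sum(dhyper(x:min(K, n), K, N - K, n))

# Passing x itself gives P[X > x], which leaves out the mass at x.
p_strict <- phyper(x, K, N - K, n, lower.tail = FALSE)

print(c(p_tail = p_tail, p_sum = p_sum, p_strict = p_strict))
```

If the reading of ?phyper above is right, p_tail and p_sum should agree, and p_strict should be smaller by exactly dhyper(x, K, N - K, n).
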
> It seems to me that the probability of seeing 19 out of 164 in a sample
> of 249 being less than the probability of seeing 4 out of 12 in a sample
> of the same size is counter-intuitive.
>
> Secondly, can someone point me to some documentation explaining why
> these seemingly counter-intuitive p-values occur?
I think it's just because the hypergeometric distribution becomes very  
skewed and non-normal for expected values < 1 (expectations should be  
roughly 6.6 in the first case and 0.5 in the second case). Perhaps it  
helps to visualize the two distributions?

M <- rbind(dhyper(0:20, 164, 6187-164, 249),
           dhyper(0:20, 12, 6187-12, 249))
rownames(M) <- c("164 out of 6187", "12 out of 6187")
colnames(M) <- 0:20
barplot(M, beside=TRUE, legend = TRUE)
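
The expectations quoted above (roughly 6.6 and 0.5) follow from the hypergeometric mean, n * K / N. As a rough sketch, with variable names and the "fold" column being illustrative additions rather than anything from the thread, the two examples can be put side by side along with their standard deviations and upper-tail p-values:

```r
# Both examples from the thread; names are illustrative assumptions
N <- 6187                                   # size of the universe
n <- 249                                    # sample size
K <- c("164 out of 6187" = 164,
       "12 out of 6187"  = 12)              # objects with the attribute
observed <- c(19, 4)                        # counts found in the sample

# Mean and standard deviation of the hypergeometric distribution
p        <- K / N
expected <- n * p
sd_hyper <- sqrt(n * p * (1 - p) * (N - n) / (N - 1))

# P[X >= observed], using the observed - 1 convention for lower.tail = FALSE
p_value <- phyper(observed - 1, K, N - K, n, lower.tail = FALSE)

print(data.frame(expected, sd = sd_hyper, observed,
                 fold = observed / expected, p_value))
```

One possible way to read the table: 4 is a larger multiple of its expectation than 19 is, but it rests on far fewer objects, which may be part of why the smaller case yields the larger p-value.
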


Best regards,
Stefan Evert

[ stefan.evert at uos.de | http://purl.org/stefan.evert ]
