Hi all,
I am having a problem with the function "p.adjust" in stats. I have
looked at the manuals and searched the R site, but didn't get anything that
seems directly relevant. Can anybody throw any light on it or confirm my
suspicion that this might be a bug?
I am trying to use the p.adjust() function to do Benjamini/Hochberg FDR control
on a vector of p-values that are the smallest K p-values selected from an
N-length vector. So I am using p.adjust(p, n=N, method="BH")
This seems to give smaller adjusted p-values than if I use p.adjust(p,
method="BH"). For example (K=10, N=20)
> p=runif(10)
> p
[1] 0.4690849 0.5108430 0.8703507 0.1950010 0.9938629 0.8582079 0.7987490
0.9312799 0.9195361 0.7814532> p.adjust(p, n=20, method="BH")
[1] 0.7818081 0.7859123 0.9802947 0.3545472 0.9938629 0.9802947 0.9802947
0.9802947 0.9802947 0.9802947> p.adjust(p, method="BH")
[1] 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629 0.9938629
0.9938629 0.9938629 0.9938629
This seemed unusual, since if p is being selected from a larger vector, one
would expect the FDR to increase. Then I figured out that what is happening is
that the p.adjust function is working as if p is being padded with zeros to make
it N-length. For example> p.adjust(c(p, rep(0,10)), method="BH")
[1] 0.7818081 0.7859123 0.9802947 0.3545472 0.9938629 0.9802947 0.9802947
0.9802947 0.9802947 0.9802947 0.0000000
[12] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
0.0000000 0.0000000
Which gives the same adjusted p-values as when I had used n=20 in the function.
This looks contrary to what the manual for the function actually says:
"Note that you can set 'n' larger than 'length(p)' which
means the unobserved p-values are assumed to be greater than all the observed p
for '"bonferroni"' and '"holm"' methods and
equal to 1 for the other methods."
Instead of assuming un-observed p-values to be equal to 1, the function seems to
assume un-observed p-values to be 0.
I am using R version 2.7.2 on Windows XP Professional. sessionInfo() data pasted
at end of email.
Regards
Sudhir Varma
R version 2.7.2 (2008-08-25)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] tools stats graphics grDevices utils datasets methods base
other attached packages:
[1] GEOquery_2.4.1 RCurl_0.9-3 Biobase_2.0.1 mvtnorm_0.9-2