Dear all,
I recently came across the following issue and am not sure whether it is
intentional:
when p.adjust is used to adjust p-values for multiple hypothesis testing with the
method of Benjamini and Hochberg, it removes all NA values from the input vector
and does not account for them in the adjustment. For example, given a vector of
23 p-values of which 20 are NA, it adjusts the 3 non-NA p-values as if only 3
tests had been performed (see example). I was not aware of that behaviour, and
other implementations, such as the one in Bioconductor's multtest package, handle
NAs differently.
If this behaviour is intentional, I would appreciate it if a note to that effect
could be added to the help page.
Example:
x <- c( 0.001, 0.01, 0.02, rep( NA, 20 ) )
p.adjust( x, method="BH" )
[1] 0.003 0.015 0.020 NA NA NA NA NA NA NA NA NA
[13] NA NA NA NA NA NA NA NA NA NA NA
p.adjust( x, method="BH", n=length( x ) )
[1] 0.0230000 0.1150000 0.1533333 NA NA NA NA
[8] NA NA NA NA NA NA NA
[15] NA NA NA NA NA NA NA
[22] NA NA
With the default settings (n not specified, so that n=length(p) is used), the
value of n is determined only after all NAs have been removed from the p-value
vector p.
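To make the difference concrete, here is a minimal sketch of a workaround and a
hand-written BH adjustment for comparison. The names p_adjust_count_na and
bh_manual are hypothetical helpers introduced only for illustration; they are not
part of the stats package. The sketch assumes that every NA stands for a test
that was actually performed and should therefore count towards n.

```r
# Hypothetical helper (not part of stats): count NA entries as performed tests
# by passing the full vector length as n.
p_adjust_count_na <- function(p, method = "BH") {
  p.adjust(p, method = method, n = length(p))
}

# Hand-written BH adjustment for a vector without NAs and an explicit n,
# used only to cross-check the results returned by p.adjust.
bh_manual <- function(p, n = length(p)) {
  i  <- length(p):1                    # ranks, largest p-value first
  o  <- order(p, decreasing = TRUE)    # indices sorting p from largest to smallest
  ro <- order(o)                       # permutation restoring the input order
  pmin(1, cummin(n / i * p[o]))[ro]
}

x <- c(0.001, 0.01, 0.02, rep(NA, 20))

default_adj <- p.adjust(x, method = "BH")   # n taken from the 3 non-NA values
full_adj    <- p_adjust_count_na(x)         # n = 23

# Both variants agree with the manual computation on the non-NA values.
all.equal(default_adj[1:3], bh_manual(x[1:3], n = 3))
all.equal(full_adj[1:3],    bh_manual(x[1:3], n = length(x)))
```

Whether NAs should count towards n depends on what they represent (tests that
could not be evaluated versus tests that were never run), so the wrapper is only
appropriate under the assumption stated above.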
cheers, jo
my R:> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin14.0.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base