thr3ads.net - R devel - [Rd] p.adjust(<NA>s), was "Re: [BioC] limma and p-values" [Jan 2005]

If this information is useful, please help other people find it:
Share via:

Martin Maechler

2005-Jan-08 17:19 UTC

[Rd] p.adjust(<NA>s), was "Re: [BioC] limma and p-values"

>>>>> "GS" == Gordon K Smyth <smyth@wehi.edu.au>
>>>>>     on Sat, 8 Jan 2005 01:11:30 +1100 (EST) writes:    
    <.............>

    GS> p.adjust() unfortunately gives incorrect results when
    GS> 'p' includes NAs.  The results from topTable are
    GS> correct.  topTable() takes care to remove NAs before
    GS> passing the values to p.adjust().

There's at least one bug in p.adjust():  The "hommel" method
currently does not work at all with NAs (and I have an
uncommitted fix ready for this bug).
OTOH, the current version of p.adjust() ``works'' with NA's,
apart from Hommel's method, but by using "n = length(p)" in the
correction formulae, i.e. *including* the NAs for determining
sample size `n'  {my fix to "hommel" would do this as well}.

My question is what p.adjust() should do when there are NA's
more generally, or more specifically which `n' to use in the
correction formula. Your proposal amounts to
  ``drop NA's and forget about them till the very end''
  (where they are wanted in the result),
i.e., your sample size `n' would be sum(!is.na(p)) instead of
length(p).

To me it doesn't seem obvious that this setting 
"n = #{non-NA observations}" is desirable for all 
P-value adjustment methods. One argument for keeping
``n = #{all observations}'' at least for some correction
methods is the following  "continuity" one:

If only a few ``irrelevant'' (let's say > 0.5) P-values are
replaced by NA, the adjusted relevant small P-values shouldn't
change much, ideally not at all.  I'm really no scholar on this
topic, but e.g. for "holm" I think I would want to keep ``full
n'' because of the above continuity argument.
BTW, for "fdr", I don't see a straightforward way to achieve the
desired continuity.
5D
Of course, p.adjust() could adopt the possibility of chosing how
NA's should be treated e.g. by another argument ``use.na =
TRUE/FALSE''
and hence allow both versions.  

Feedback very welcome, particularly from ``P-value experts'' ;-)

Martin Maechler, ETH Zurich

Martin Maechler

2005-Jan-08 21:29 UTC

head link

[Rd] p.adjust(<NA>s), was "Re: [BioC] limma and p-values"

I've thought more and made experiements with R code versions
and just now committed a new version of  p.adjust()  to R-devel
--> https://svn.r-project.org/R/trunk/src/library/stats/R/p.adjust.R
which does sensible NA handling by default and 
*additionally* has an "na.rm" argument (set to FALSE by
default).  The extended 'Examples' secion on the help page
    https://svn.r-project.org/R/trunk/src/library/stats/man/p.adjust.Rd
shows how the new NA handling is typically much more sensible
than using "na.rm = TRUE".

Martin 

>>>>> "MM" == Martin Maechler
<maechler@stat.math.ethz.ch>
>>>>>     on Sat, 8 Jan 2005 17:19:23 +0100 writes:
>>>>> "GS" == Gordon K Smyth <smyth@wehi.edu.au>
>>>>>     on Sat, 8 Jan 2005 01:11:30 +1100 (EST) writes:    
    MM>     <.............>

    GS> p.adjust() unfortunately gives incorrect results when
    GS> 'p' includes NAs.  The results from topTable are
    GS> correct.  topTable() takes care to remove NAs before
    GS> passing the values to p.adjust().

    MM> There's at least one bug in p.adjust(): The "hommel"
    MM> method currently does not work at all with NAs (and I
    MM> have an uncommitted fix ready for this bug).  OTOH, the
    MM> current version of p.adjust() ``works'' with NA's, apart
    MM> from Hommel's method, but by using "n = length(p)" in
    MM> the correction formulae, i.e. *including* the NAs for
    MM> determining sample size `n' {my fix to "hommel" would
do
    MM> this as well}.

    MM> My question is what p.adjust() should do when there are
    MM> NA's more generally, or more specifically which `n' to
    MM> use in the correction formula. Your proposal amounts to
    MM> ``drop NA's and forget about them till the very end''
    MM> (where they are wanted in the result), i.e., your sample
    MM> size `n' would be sum(!is.na(p)) instead of length(p).

    MM> To me it doesn't seem obvious that this setting "n    
MM> #{non-NA observations}" is desirable for all P-value
    MM> adjustment methods. One argument for keeping ``n = #{all
    MM> observations}'' at least for some correction methods is
    MM> the following "continuity" one:

    MM> If only a few ``irrelevant'' (let's say > 0.5)
P-values
    MM> are replaced by NA, the adjusted relevant small P-values
    MM> shouldn't change much, ideally not at all.  I'm really
    MM> no scholar on this topic, but e.g. for "holm" I think I
    MM> would want to keep ``full n'' because of the above
    MM> continuity argument.  BTW, for "fdr", I don't see a
    MM> straightforward way to achieve the desired continuity.
    MM> 5D Of course, p.adjust() could adopt the possibility of
    MM> chosing how NA's should be treated e.g. by another
    MM> argument ``use.na = TRUE/FALSE'' and hence allow both
    MM> versions.

    MM> Feedback very welcome, particularly from ``P-value
    MM> experts'' ;-)

    MM> Martin Maechler, ETH Zurich

Apparently Analagous Threads

Search for more seemingly similar threads

R devel - Jan 2005 - p.adjust(<NA>s), was "Re: [BioC] limma and p-values"

[Rd] p.adjust(<NA>s), was "Re: [BioC] limma and p-values"

[Rd] p.adjust(<NA>s), was "Re: [BioC] limma and p-values"

Apparently Analagous Threads