Gordon Smyth
2005-Jan-16 09:55 UTC
[Rd] p.adjust(<NA>s), was "Re: [BioC] limma and p-values"
I append below a suggested update for p.adjust().

1. A new method "yh" for control of the FDR is included which is valid for any dependency structure. The reference is Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165-1188.

2. I've re-named the "fdr" method to "bh" but kept "fdr" as a synonym for backward compatibility.

3. Upper-case values for method, "BH" or "YH", are also accepted.

4. p.adjust() now preserves attributes such as names for named vectors (as do cumsum and friends, for example).

5. p.adjust() now works column-wise on numeric data.frames (as do cumsum and friends).

6. method="hommel" now works correctly even for n=2.

7. The 'n' argument is removed. Setting this argument for any method other than "none" or "bonferroni" makes the p-values indeterminate, and the argument seems to be seldom used. (It isn't used in the R default distribution.) I think trying to combine this argument with NAs would get you into lots of hot water. For example, what does p.adjust(c(NA,NA,0.05), n=2) mean? Which two values should be adjusted?

8. NAs are treated in na.exclude style. This is the correct approach for most applications. The only other consistent thing you could do would be to treat the NAs as if they all had value 1. But then you would have to explain clearly that the values being returned are not actually the correct adjusted p-values, which are unknown, but are the most conservative possible values assuming the worst case for the missing values. This would become arbitrarily unreasonable as the number of NAs increases.

Gordon

p.adjust.methods <- c("holm", "hochberg", "hommel", "bonferroni",
                      "yh", "bh", "fdr", "none")

p.adjust <- function(p, method = p.adjust.methods)
{
    method <- match.arg(tolower(method), p.adjust.methods)
    if (method == "fdr") method <- "bh"
    if (is.data.frame(p)) {
        if (length(p))
            for (i in 1:length(p)) p[[i]] <- Recall(p[[i]], method = method)
        return(p)
    }
    porig <- p
    notna <- !is.na(p)
    p <- as.vector(p[notna])
    n <- length(p)
    if (n == 1) return(porig)
    if (n == 2 && method == "hommel") method <- "hochberg"
    porig[notna] <- switch(method,
        holm = {
            i <- 1:n
            o <- order(p)
            ro <- order(o)
            pmin(1, cummax((n - i + 1) * p[o]))[ro]
        },
        hochberg = {
            i <- n:1
            o <- order(p, decreasing = TRUE)
            ro <- order(o)
            pmin(1, cummin((n - i + 1) * p[o]))[ro]
        },
        hommel = {
            i <- 1:n
            s <- sort(p, index = TRUE)
            p <- s$x
            ro <- order(s$ix)
            q <- pa <- rep.int(min(n * p / (1:n)), n)
            for (j in (n - 1):2) {
                q1 <- min(j * p[(n - j + 2):n] / (2:j))
                q[1:(n - j + 1)] <- pmin(j * p[1:(n - j + 1)], q1)
                q[(n - j + 2):n] <- q[n - j + 1]
                pa <- pmax(pa, q)
            }
            pmax(pa, p)[ro]
        },
        bh = {
            i <- n:1
            o <- order(p, decreasing = TRUE)
            ro <- order(o)
            pmin(1, cummin(n / i * p[o]))[ro]
        },
        yh = {
            i <- n:1
            o <- order(p, decreasing = TRUE)
            ro <- order(o)
            q <- sum(1 / (1:n))
            pmin(1, cummin(q * n / i * p[o]))[ro]
        },
        bonferroni = pmin(n * p, 1),
        none = p)
    porig
}
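A quick illustration of the intended behaviour (the p-values are made up, purely for illustration):

    p <- c(a = 0.01, b = NA, c = 0.04, d = 0.03)
    p.adjust(p, method = "bh")
    ## names are preserved, 'b' stays NA, and the three observed
    ## values are adjusted with n = 3 (na.exclude style)
    p.adjust(p, method = "yh")
    ## as "bh", but each value is further inflated by the factor
    ## sum(1/(1:n)), which is what makes "yh" valid under any dependency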
Martin Maechler
2005-Jan-17 22:02 UTC
[Rd] p.adjust(<NA>s), was "Re: [BioC] limma and p-values"
>>>>> "GS" == Gordon Smyth <smyth@wehi.edu.au> >>>>> on Sun, 16 Jan 2005 19:55:35 +1100 writes:GS> I append below a suggested update for p.adjust(). thank you. GS> 1. A new method "yh" for control of FDR is included which is GS> valid for any dependency structure. Reference is GS> Benjamini, Y., and Yekutieli, D. (2001). The control of GS> the false discovery rate in multiple testing under GS> dependency. Annals of Statistics 29, 1165-1188. good, thanks! GS> 2. I've re-named the "fdr" method to "bh" but kept "fdr" GS> as a synonym for backward compatability. ok GS> 3. Upper case values for method "BH" or "YH" are also GS> accepted. I don't see why we'd want this. The S language is case-sensitive and we don't want to lead people to believe that case wouldn't matter. GS> 4. p.adust() now preserves attributes like names for GS> named vectors (as does cumsum and friends for example). good point; definitely desirable!! GS> 5. p.adjust() now works columnwise on numeric GS> data.frames (as does cumsum and friends). well, "cusum and friends" are either generic or groupgeneric (for the "Math" group) -- there's a Math.data.frame group method. This is quite different for p.adjust which is not generic and I'm not (yet?) convinced it should become so. People can easily use sapply(d.frame, p.adjust, method) if needed; In any case it's not in the spirit of R's OO programming to special case "data.frame" inside a function such as p.adjust GS> 6. method="hommel" now works correctly even for n=2 ok, thank you (but as said, in R 2.0.1 the behavior was much more problematic) GS> 7. The 'n' argument is removed. Setting this argument GS> for any methods other than "none" or "bonferroni" make GS> the p-values indeterminate, and the argument seems to be GS> seldom used. (It isn't used in the R default GS> distribution.) I think trying to combine this argument GS> with NAs would get you into lots of hot water. For GS> example, what does p.adjust(c(NA,NA,0.05),n=2) mean? GS> Which 2 values should be adjusted? I agree that I don't see a good reason to allow specifying 'n' as argument unless e.g. for "bonferroni". What do other think ? GS> 8. NAs are treated in na.exclude style. This is the GS> correct approach for most applications. The only other GS> consistent thing you could do would be to treat the NAs GS> as if they all had value=1. But then you would have to GS> explain clearly that the values being returned are not GS> actually the correct adjusted p-values, which are GS> unknown, but are the most conservative possible values GS> assuming the worst-case for the missing values. This GS> would become arbitrarily unreasonable as the number of GS> NAs increases. I now agree that your proposed default behavior is more sensible than my proposition. I'm not sure yet if it wasn't worth to allow for other NA treatment, like the "treat as if 1" {which my code proposition was basically doing} or rather mre sophisticated procedure like "integrating" over all P ~ U[0,1] marginals for each missing value, approximating the integral possibly by "Monte-Carlo" even quasi random numbers.