dkoschuetzki@gmx.de
2005-Aug-18 11:43 UTC
[Rd] kendall tau correlation test for ties: Potential error (PR#8076)
Full_Name: Dirk Koschuetzki
Version: 2.1.1
OS: source code
Submission from: (NULL) (194.94.136.34)

Hello,

From the source code (R-2.1.1, file: .../R-2.1.1/src/library/stats/R/)

******************************
cor.test.default <-
function(x, y, alternative = c("two.sided", "less", "greater"),
         method = c("pearson", "kendall", "spearman"), exact = NULL,
         conf.level = 0.95, ...)
{
    alternative <- match.arg(alternative)
    method <- match.arg(method)
    DNAME <- paste(deparse(substitute(x)), "and", deparse(substitute(y)))

    if(length(x) != length(y))
        stop("'x' and 'y' must have the same length")
    OK <- complete.cases(x, y)
    x <- x[OK]
    y <- y[OK]
    n <- length(x)

    PVAL <- NULL
    NVAL <- 0
    conf.int <- FALSE

    if(method == "pearson") {
        // Omitted
    } else {
        if(n < 2)
            stop("not enough finite observations")
        PARAMETER <- NULL
        TIES <- (min(length(unique(x)), length(unique(y))) < n)
        if(method == "kendall") {
            method <- "Kendall's rank correlation tau"
            names(NVAL) <- "tau"
            r <- cor(x,y, method = "kendall")
            ESTIMATE <- c(tau = r)

            if(!is.finite(ESTIMATE)) {  # all x or all y the same
                ESTIMATE[] <- NA
                STATISTIC <- c(T = NA)
                PVAL <- NA
            } else {
                if(is.null(exact))
                    exact <- (n < 50)
                if(exact && !TIES) {
                    q <- round((r + 1) * n * (n - 1) / 4)
                    pkendall <- function(q, n) {
                        .C("pkendall",
                           length(q),
                           p = as.double(q),
                           as.integer(n),
                           PACKAGE = "stats")$p
                    }
                    PVAL <-
                        switch(alternative,
                               "two.sided" = {
                                   if(q > n * (n - 1) / 4)
                                       p <- 1 - pkendall(q - 1, n)
                                   else
                                       p <- pkendall(q, n)
                                   min(2 * p, 1)
                               },
                               "greater" = 1 - pkendall(q - 1, n),
                               "less" = pkendall(q, n))
                    STATISTIC <- c(T = q)
                } else {
                    STATISTIC <- c(z = r / sqrt((4 * n + 10) / (9 * n*(n-1))))
                    p <- pnorm(STATISTIC)
                    if(exact && TIES)
                        warning("Cannot compute exact p-value with ties")
                }
            }
        } else {
            // OMITTED
        }
    }

    if(is.null(PVAL)) # for "pearson" only, currently
        PVAL <- switch(alternative,
                       "less" = p,
                       "greater" = 1 - p,
                       "two.sided" = 2 * min(p, 1 - p))

    RVAL <- list(statistic = STATISTIC,
                 parameter = PARAMETER,
                 p.value = as.numeric(PVAL),
                 estimate = ESTIMATE,
                 null.value = NVAL,
                 alternative = alternative,
                 method = method,
                 data.name = DNAME)
    if(conf.int)
        RVAL <- c(RVAL, list(conf.int = cint))
    class(RVAL) <- "htest"
    RVAL
}
*************

Please look at the computation of the p-value for Kendall's tau. There is an assignment to "p" right above the warning. At the bottom of the function there is a comment saying that only in the Pearson case do we have to apply the modification and set PVAL.

The problem is:
* Either the comment is wrong because the modification should be done with Kendall too, or
* The variable PVAL has to be assigned in the Kendall block.

I hope this is clear so far. Please send me some comments, because I'm not sure whether my observation is correct. I am currently trying to work out the significance in the biserial case, which of course makes heavy use of the tied case.

Cheers,
Dirk
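The control flow in question can be reduced to a few lines. The sketch below is only an illustration: `normal_branch_pvalue` is a hypothetical stand-in, not part of R, and it keeps just the pieces under discussion. `PVAL` starts as `NULL`, the normal-approximation branch assigns `p` but not `PVAL`, and the final `switch` then converts `p` according to `alternative`.

```r
# Hypothetical reduction of the flow in cor.test.default (R 2.1.1).
# This is NOT the real function; it only mirrors the PVAL / p handling.
normal_branch_pvalue <- function(z,
                                 alternative = c("two.sided", "less", "greater")) {
    alternative <- match.arg(alternative)
    PVAL <- NULL        # only the exact (untied) branch would set this
    p <- pnorm(z)       # lower-tail probability, as in the Kendall z branch
    if (is.null(PVAL))  # still NULL here, so this block runs for Kendall too
        PVAL <- switch(alternative,
                       "less" = p,
                       "greater" = 1 - p,
                       "two.sided" = 2 * min(p, 1 - p))
    PVAL
}

normal_branch_pvalue(1.5, "greater")    # upper-tail p-value for z = 1.5
normal_branch_pvalue(1.5, "two.sided")  # twice the smaller tail
```

Under these assumptions the final block is reached whenever the normal approximation is used, regardless of what the comment next to it says.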
Peter Dalgaard
2005-Aug-18 13:07 UTC
[Rd] kendall tau correlation test for ties: Potential error (PR#8076)
dkoschuetzki at gmx.de writes:

>                 } else {
>                     STATISTIC <- c(z = r / sqrt((4 * n + 10) / (9 * n*(n-1))))
>                     p <- pnorm(STATISTIC)
>                     if(exact && TIES)
>                         warning("Cannot compute exact p-value with ties")
>                 }
>             }
>         } else {
>             // OMITTED
>         }
>     }
>
>     if(is.null(PVAL)) # for "pearson" only, currently
>         PVAL <- switch(alternative,
>                        "less" = p,
>                        "greater" = 1 - p,
>                        "two.sided" = 2 * min(p, 1 - p))
...
>
> Please look at the computation of the p-value for Kendall's tau. There is an
> assignment to "p" right above the warning. In the bottom of the function there
> is a comment that for the pearson case we have to use the modification and set
> PVAL.
>
> The problem is:
> * Either the comment is wrong because the modification should be done with
>   kendall too, or
> * The variable PVAL has to be assigned in the kendall block.
>
> I hope this is clear so far.

I think it is the comment that is wrong. The calculation of the opposite-side one-sided and the two-sided alternatives makes OK sense when the normal approximation of the test statistic is being used. It's when you use a discrete distribution that you need to be careful.

(As brought up recently, the normal approximation itself is not too hot in the tied case, but that's another matter.)

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
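The difference between the continuous and the discrete case can be illustrated with a small sketch. `kendall_null` is a hypothetical helper written for this note (it is not R's internal `pkendall`); it builds the exact null distribution of T, the number of concordant pairs among n untied observations, using the standard recursion for inversion counts. The values n = 4 and q = 5 are arbitrary choices for illustration.

```r
# Hypothetical helper, not part of the stats package.
# Returns P(T = 0), ..., P(T = n(n-1)/2) under the null, assuming no ties.
kendall_null <- function(n) {
    stopifnot(n >= 2)
    counts <- 1                          # n = 1: only T = 0 is possible
    for (k in 2:n) {
        new <- numeric(length(counts) + k - 1)
        for (shift in 0:(k - 1)) {       # k-th item adds 0..k-1 concordant pairs
            idx <- seq_along(counts) + shift
            new[idx] <- new[idx] + counts
        }
        counts <- new
    }
    counts / sum(counts)
}

n <- 4
prob <- kendall_null(n)                  # (1, 3, 5, 6, 5, 3, 1) / 24
cdf <- cumsum(prob)                      # cdf[t + 1] = P(T <= t)
q <- 5

upper_exact   <- sum(prob[(q + 1):length(prob)])  # P(T >= q)          = 4/24
upper_correct <- 1 - cdf[q]                       # 1 - P(T <= q - 1)  = 4/24
upper_naive   <- 1 - cdf[q + 1]                   # 1 - P(T <= q)      = 1/24

# Continuous case: no probability mass sits at a single point,
# so "1 - p" is the upper tail without any shift.
z <- 1.2
all.equal(1 - pnorm(z), pnorm(z, lower.tail = FALSE))
```

In the discrete case the naive `1 - p` drops P(T = q), which is why the exact branch calls `pkendall(q - 1, n)`; with the normal approximation no such shift is needed.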
dkoschuetzki@gmx.de
2005-Aug-24 16:55 UTC
[Rd] kendall tau correlation test for ties: Potential error (PR#8076)
Hello,

On Thu, 18 Aug 2005 15:07:07 +0200, Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:

> dkoschuetzki at gmx.de writes:
>> The problem is:
>> * Either the comment is wrong because the modification should be done
>>   with kendall too, or
>> * The variable PVAL has to be assigned in the kendall block.
>
> I think it is the comment that is wrong. [...]

Thanks for the clarification. I think I understand the untied case now, and I think I have a grasp of the problems with the tied one as well.

Thanks for your comments and many thanks for the great R system. I use it as my daily working environment and I'm very happy with it.

Thanks!
Dirk