charlie@muskrat.stat.umn.edu
2001-Oct-26 23:09 UTC
[Rd] wilcox.test point estimates perverse (PR#1150)
The point estimates produced by wilcox.test are perverse (not wrong, just
brain damaged).  The Hodges-Lehmann estimator that goes with the signed
rank test is the median of the Walsh averages.  The Hodges-Lehmann estimator
that goes with the rank sum test is the median of the pairwise differences.

wilcox.test agrees except that it uses the following very peculiar definition
of "sample median": if the number of items is even, the average of the two
middle items (agrees with the usual definition), and if the number of items
is odd, the average of the two on either side of the middle item in sorted
order (huh??? why???).  I know this is asymptotically equivalent to the
usual definition, but

  * Why get answers that disagree with every nonparametrics textbook?

  * If wilcox.test is right then median is wrong and should be fixed
    (just kidding, don't mess with median!)

Thus the complicated code in lines 87--89 of wilcox.test.default
should be replaced by the simple

    ESTIMATE <- median(diffs)

and the complicated code in lines 214-216 of wilcox.test.default should
again be replaced by the simple

    ESTIMATE <- median(diffs)

Moreover, there is NO "correction for ties" in the Hodges-Lehmann estimator.
Thus the code in lines 147-148 and 272-273 is silly.  The code for the point
estimate should be done by exactly the same code when there are ties or zeroes
(or both) and when there are not.  Reference: Sections 3.2 and 4.2 of
Hollander and Wolfe.

I understand that one doesn't want to produce the vector diffs (which is
order n^2 when n is large), but one doesn't have to do so to calculate the
median if this is taken to C.

Moreover that uniroot stuff sometimes crashes (sorry, I didn't save the
examples, but take my word for it, it's not bulletproof).

Note that the confidence intervals are also bizarre in the case of ties, but
that's another bug report.
--
Charles Geyer
Professor, School of Statistics
University of Minnesota
charlie@stat.umn.edu

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
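A minimal R sketch of the estimators discussed in the report above, for readers who want to compare the definitions side by side. The function names (walsh_averages, hl_one_sample, hl_two_sample, peculiar_median) are hypothetical and are not part of wilcox.test. In particular, peculiar_median only mimics the odd/even rule as it is described in the report; it is not the code from wilcox.test.default.

```r
# Walsh averages: (x_i + x_j) / 2 for all pairs with i <= j.
walsh_averages <- function(x) {
  s <- outer(x, x, "+") / 2
  s[upper.tri(s, diag = TRUE)]
}

# Hodges-Lehmann estimator for the signed rank test (textbook definition).
hl_one_sample <- function(x) median(walsh_averages(x))

# Hodges-Lehmann estimator for the rank sum test (textbook definition).
# Note: this materialises all length(x) * length(y) differences.
hl_two_sample <- function(x, y) median(as.vector(outer(x, y, "-")))

# The "peculiar" sample median described in the report (illustration only):
# even count -> mean of the two middle items (the usual median),
# odd count  -> mean of the two items on either side of the middle item.
# Assumes length(z) >= 2; an odd count of 1 has no neighbours to average.
peculiar_median <- function(z) {
  z <- sort(z)
  n <- length(z)
  if (n %% 2 == 0) {
    mean(z[n / 2 + 0:1])
  } else {
    mean(z[(n + 1) / 2 + c(-1, 1)])
  }
}

z <- c(1, 4, 9)
median(z)            # 4
peculiar_median(z)   # 5
```

With an even number of items the two definitions coincide; they differ only for odd counts, as in the toy vector above.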
> The point estimates produced by wilcox.test are perverse (not wrong, just
> brain damaged).  The Hodges-Lehmann estimator that goes with the signed
> rank test is the median of the Walsh averages.  The Hodges-Lehmann estimator
> that goes with the rank sum test is the median of the pairwise differences.
>
> wilcox.test agrees except that it uses the following very peculiar definition
> of "sample median": if the number of items is even, the average of the two
> middle items (agrees with the usual definition), and if the number of items
> is odd, the average of the two on either side of the middle item in sorted
> order (huh??? why???).  I know this is asymptotically equivalent to the
> usual definition, but
>
>   * Why get answers that disagree with every nonparametrics textbook?
>
>   * If wilcox.test is right then median is wrong and should be fixed
>     (just kidding, don't mess with median!)
>
> Thus the complicated code in lines 87--89 of wilcox.test.default
> should be replaced by the simple
>
>     ESTIMATE <- median(diffs)
>
> and the complicated code in lines 214-216 of wilcox.test.default should
> again be replaced by the simple
>
>     ESTIMATE <- median(diffs)
>

For tied samples these estimators may not lie inside the confidence sets,
which is confusing.
Example (from ?wilcox.exact, package exactRankTests):

  > treat <- c(94, 108, 110, 90)
  > contr <- c(80, 94, 85, 90, 90, 90, 108, 94, 78, 105, 88)
  >
  > # StatXact 4 for Windows: p.value = 0.0989, point prob = 0.019
  >
  > wilcox.exact(contr, treat, conf.int=T)

          Exact Wilcoxon rank sum test

  data:  contr and treat
  W = 9, point prob = 0.019, p-value = 0.0989
  alternative hypothesis: true mu is not equal to 0
  95 percent confidence interval:
   -22   4
  sample estimates:
  difference in location
                      13    <- when computed with median(diffs)

For compatibility between the two versions of the Wilcoxon test, we use the
basic definition:

    d_1 = sup {d | W(d) > E(W)}
    d_2 = inf {d | W(d) < E(W)}
    Hodges-Lehmann = (d_1 + d_2) / 2

(using max and min instead of sup and inf, which causes the difference).
However, this may be questionable ...

> Moreover, there is NO "correction for ties" in the Hodges-Lehmann estimator.
> Thus the code in lines 147-148 and 272-273 is silly.  The code for the point
> estimate should be done by exactly the same code when there are ties or zeroes
> (or both) and when there are not.  Reference: Sections 3.2 and 4.2 of
> Hollander and Wolfe.

wilcox.test uses the normal approximation when a) the sample sizes are large
or b) ties occur. In case a), computing all differences is not feasible
(taking it to C does not improve `outer(x,y,"-")' significantly). Therefore,
we use uniroot to search for d with W(X - d, Y) = E(W). In case b),
`wilcox.test' uses the normal approximation for p-values and confidence
intervals, so it seems natural to me to compute the point estimator the way
the confidence intervals are computed. Additionally, median(diffs) may lie
outside the confidence set in this situation (see above).

> I understand that one doesn't want to produce the vector diffs (which is
> order n^2 when n is large), but one doesn't have to do so to calculate the
> median if this is taken to C.
>
> Moreover that uniroot stuff sometimes crashes (sorry, I didn't save the
> examples, but take my word for it, it's not bulletproof).
>
> Note that the confidence intervals are also bizarre in the case of ties, but
> that's another bug report.

Because the normal approximation is bizarre for small, tied samples? That is
what `wilcox.exact' is for.

Torsten
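The max/min definition and the uniroot search mentioned in the reply above can be sketched in R as follows. This is an illustration under stated assumptions, not the code in wilcox.test or wilcox.exact: the names w_stat, hl_maxmin and hl_uniroot are hypothetical, W(d) is assumed to be the Mann-Whitney form of the rank sum statistic computed with mid-ranks, E(W) is taken as m * n / 2, and the uniroot variant omits any continuity correction.

```r
# W(d): Mann-Whitney form of the rank sum statistic for (x - d) versus y.
# rank() assigns average ranks, so a tie between x_i - d and y_j counts 1/2.
# Ties are detected by exact equality, hence the integer toy data below.
w_stat <- function(d, x, y) {
  m <- length(x)
  r <- rank(c(x - d, y))
  sum(r[seq_len(m)]) - m * (m + 1) / 2
}

# Point estimate from d_1 = max{d : W(d) > E(W)}, d_2 = min{d : W(d) < E(W)},
# with d restricted to the observed pairwise differences (max/min, not sup/inf).
# Degenerate input (all differences equal) is not handled in this sketch.
hl_maxmin <- function(x, y) {
  ew   <- length(x) * length(y) / 2
  cand <- sort(unique(as.vector(outer(x, y, "-"))))
  w    <- vapply(cand, w_stat, numeric(1), x = x, y = y)
  d1   <- max(cand[w > ew])
  d2   <- min(cand[w < ew])
  (d1 + d2) / 2
}

# Root search for W(x - d, y) = E(W) without forming all differences.
# W(d) is a step function, so uniroot can only locate a sign change to
# within tol; if m * n is even and the differences are distinct, W(d) equals
# E(W) on a whole interval and the root is not unique.
hl_uniroot <- function(x, y, tol = 1e-6) {
  ew <- length(x) * length(y) / 2
  f  <- function(d) w_stat(d, x, y) - ew
  uniroot(f, lower = min(x) - max(y), upper = max(x) - min(y), tol = tol)$root
}

x <- c(1, 4, 9)
y <- 0
median(as.vector(outer(x, y, "-")))  # 4
hl_maxmin(x, y)                      # 5
hl_uniroot(x, y)                     # close to 4 (within tol)
```

On this toy input the number of differences is odd, and the max/min rule returns the average of the two neighbours of the middle difference, which appears to be the odd-count behaviour described in the original report.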
Hi,
I need to append one vector to another in C. I have this little test
program that crashes. Is this not what append is for?

    #include <R.h>
    #include <Rinternals.h>

    SEXP apt(SEXP a, SEXP b){
      SEXP ans;
      PROTECT(ans = append(a,b));
      UNPROTECT(1);
      return(ans);
    }

Thanks
Nicholas

          CH3
           |
           N             Nicholas Lewin-Koh
          / \            Dept of Statistics
    N----C   C==O        Program in Ecology and Evolutionary Biology
   ||   ||   |           Iowa State University
   ||   ||   |           Ames, IA 50011
    CH   C   N--CH3      http://www.public.iastate.edu/~nlewin
      \ / \ /            nlewin@iastate.edu
       N   C
       |   ||            Currently
      CH3  O             Graphics Lab
                         School of Computing
                         National University of Singapore
  The Real Part of Coffee  kohnicho@comp.nus.edu.sg
Apparently Analogous Threads
- Base R wilcox.test gives incorrect answers, has been fixed in DescTools, solution can likely be ported to Base R
- Hodges-Lehmann EXACT confidence interval for small dataset with ties
- Hodges-lehmann test and CI/significance
- wilcox.test returned estimates
- one (small) sample wilcox.test confidence intervals