charlie@muskrat.stat.umn.edu
2001-Oct-26 23:09 UTC
[Rd] wilcox.test point estimates perverse (PR#1150)
The point estimates produced by wilcox.test are perverse (not wrong, just
brain damaged). The Hodges-Lehmann estimator that goes with the signed
rank test is the median of the Walsh averages. The Hodges-Lehmann estimator
that goes with the rank sum test is the median of the pairwise differences.
wilcox.test agrees except that it uses the following very peculiar definition
of "sample median": if the number of items is even, the average of the two
middle items (agrees with the usual definition), and if the number of items
is odd, the average of the two on either side of the middle item in sorted
order (huh??? why???). I know this is asymptotically equivalent to the
usual definition, but
* Why get answers that disagree with every nonparametrics textbook?
* If wilcox.test is right then median is wrong and should be fixed
(just kidding, don't mess with median!)
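To make the difference concrete, here is a minimal C sketch of the two rules as described above. The names median_usual and median_peculiar are made up for this illustration and are not taken from the wilcox.test source; what the second rule should do with a single observation is likewise an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for doubles */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Usual sample median; sorts v in place. */
double median_usual(double *v, size_t n)
{
    assert(n > 0);
    qsort(v, n, sizeof v[0], cmp_double);
    return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

/* The rule described above: for odd n, average the two neighbours of the
 * middle item and ignore the middle item itself; sorts v in place. */
double median_peculiar(double *v, size_t n)
{
    assert(n > 0);
    qsort(v, n, sizeof v[0], cmp_double);
    if (n % 2 == 0)
        return 0.5 * (v[n / 2 - 1] + v[n / 2]);
    if (n == 1)
        return v[0];            /* no neighbours: assumption of this sketch */
    return 0.5 * (v[n / 2 - 1] + v[n / 2 + 1]);
}
```

For the three values 1, 2, 10 the usual median is 2, while the second rule averages 1 and 10 and gives 5.5. For an even number of items the two rules coincide.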
Thus the complicated code in lines 87--89 of wilcox.test.default
should be replaced by the simple
ESTIMATE <- median(diffs)
and the complicated code in lines 214-216 of wilcox.test.default should
again be replaced by the simple
ESTIMATE <- median(diffs)
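For reference, a minimal C sketch of the two textbook estimators: form all Walsh averages (one sample) or all pairwise differences (two samples) explicitly and take the usual median. The function names hl_one_sample and hl_two_sample and the return-code convention are assumptions of this sketch, not part of R.

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Usual sample median; sorts v in place. */
static double median(double *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_double);
    return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

/* One sample: median of the Walsh averages (x[i] + x[j]) / 2, i <= j.
 * Returns 0 on success, -1 if memory could not be allocated. */
int hl_one_sample(const double *x, size_t n, double *est)
{
    assert(n > 0);
    size_t count = n * (n + 1) / 2, k = 0;
    double *w = malloc(count * sizeof *w);
    if (!w)
        return -1;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i; j < n; j++)
            w[k++] = 0.5 * (x[i] + x[j]);
    *est = median(w, count);
    free(w);
    return 0;
}

/* Two samples: median of all m * n differences x[i] - y[j].
 * Returns 0 on success, -1 if memory could not be allocated. */
int hl_two_sample(const double *x, size_t m, const double *y, size_t n,
                  double *est)
{
    assert(m > 0 && n > 0);
    size_t k = 0;
    double *d = malloc(m * n * sizeof *d);
    if (!d)
        return -1;
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            d[k++] = x[i] - y[j];
    *est = median(d, m * n);
    free(d);
    return 0;
}
```

Nothing in either function looks at whether the data contain ties or zeroes; the same code path is used in every case.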
Moreover, there is NO "correction for ties" in the Hodges-Lehmann
estimator.
Thus the code in lines 147-148 and 272-273 is silly. The code for the point
estimate should be done by exactly the same code when there are ties or zeroes
(or both) and when there are not. Reference: Sections 3.2 and 4.2 of
Hollander and Wolfe.
I understand that one doesn't want to produce the vector diffs (which is
order n^2 when n is large), but one doesn't have to in order to calculate
the median if this is taken to C.
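One possible way to do this, sketched in C: sort both samples, count how many differences fall at or below a trial value with a single sweep over the sorted arrays, and bisect on the trial value. This is only an illustration of the idea and not necessarily the algorithm meant above. All function names are assumptions, the inputs are assumed finite, and m * n is assumed to fit in size_t.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Count pairs with x[i] - y[j] <= d, for x and y sorted ascending.
 * The test is done as y[j] >= x[i] - d, so it is exact only up to
 * floating-point rounding. *largest receives the largest such difference
 * and is left untouched when there is none. */
static size_t count_le(const double *x, size_t m, const double *y, size_t n,
                       double d, double *largest)
{
    size_t count = 0, p = 0;
    int found = 0;
    for (size_t i = 0; i < m; i++) {
        while (p < n && y[p] < x[i] - d)
            p++;                    /* p only moves forward: x is sorted */
        if (p == n)
            break;                  /* larger x[i] contribute nothing */
        if (!found || x[i] - y[p] > *largest) {
            *largest = x[i] - y[p];
            found = 1;
        }
        count += n - p;
    }
    return count;
}

/* k-th smallest difference (k counted from 1) by bisection on its value. */
static double kth_smallest_diff(const double *x, size_t m,
                                const double *y, size_t n, size_t k)
{
    double lo = x[0] - y[n - 1] - 1.0;  /* no difference is <= lo */
    double hi = x[m - 1] - y[0];        /* every difference is <= hi */
    double scratch = 0.0, best = hi;
    for (;;) {
        double mid = lo + 0.5 * (hi - lo);
        if (mid <= lo || mid >= hi)
            break;                      /* interval cannot shrink further */
        if (count_le(x, m, y, n, mid, &scratch) >= k)
            hi = mid;
        else
            lo = mid;
    }
    count_le(x, m, y, n, hi, &best);    /* snap to an actual difference */
    return best;
}

/* Median of all x[i] - y[j] using O(m + n) extra memory.
 * Returns 0 on success, -1 if memory could not be allocated. */
int median_of_differences(const double *x_in, size_t m,
                          const double *y_in, size_t n, double *est)
{
    assert(m > 0 && n > 0);
    double *x = malloc(m * sizeof *x), *y = malloc(n * sizeof *y);
    if (!x || !y) {
        free(x);
        free(y);
        return -1;
    }
    memcpy(x, x_in, m * sizeof *x);
    memcpy(y, y_in, n * sizeof *y);
    qsort(x, m, sizeof *x, cmp_double);
    qsort(y, n, sizeof *y, cmp_double);
    size_t total = m * n;
    double upper = kth_smallest_diff(x, m, y, n, total / 2 + 1);
    *est = (total % 2) ? upper
         : 0.5 * (kth_smallest_diff(x, m, y, n, total / 2) + upper);
    free(x);
    free(y);
    return 0;
}
```

Each counting pass costs O(m + n) after the initial sorts, and the number of passes is bounded by the precision of a double rather than by the sample sizes, so the m * n differences are never stored.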
Moreover that uniroot stuff sometimes crashes (sorry, I didn't save the
examples, but take my word for it, it's not bulletproof).
Note that the confidence intervals are also bizarre in the case of ties, but
that's another bug report.
--
Charles Geyer
Professor, School of Statistics
University of Minnesota
charlie@stat.umn.edu
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To:
r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> The point estimates produced by wilcox.test are perverse (not wrong, just
> brain damaged). The Hodges-Lehmann estimator that goes with the signed
> rank test is the median of the Walsh averages. The Hodges-Lehmann estimator
> that goes with the rank sum test is the median of the pairwise differences.
>
> wilcox.test agrees except that it uses the following very peculiar definition
> of "sample median": if the number of items is even, the average of the two
> middle items (agrees with the usual definition), and if the number of items
> is odd, the average of the two on either side of the middle item in sorted
> order (huh??? why???). I know this is asymptotically equivalent to the
> usual definition, but
>
> * Why get answers that disagree with every nonparametrics textbook?
>
> * If wilcox.test is right then median is wrong and should be fixed
> (just kidding, don't mess with median!)
>
> Thus the complicated code in lines 87--89 of wilcox.test.default
> should be replaced by the simple
>
> ESTIMATE <- median(diffs)
>
> and the complicated code in lines 214-216 of wilcox.test.default should
> again be replaced by the simple
>
> ESTIMATE <- median(diffs)
>

For tied samples these estimators may not lie inside the confidence sets,
which is confusing.

Example (from ?wilcox.exact, package exactRankTests):

    > treat <- c(94, 108, 110, 90)
    > contr <- c(80, 94, 85, 90, 90, 90, 108, 94, 78, 105, 88)
    >
    > # StatXact 4 for Windows: p.value = 0.0989, point prob = 0.019
    >
    > wilcox.exact(contr, treat, conf.int=T)

            Exact Wilcoxon rank sum test

    data:  contr and treat
    W = 9, point prob = 0.019, p-value = 0.0989
    alternative hypothesis: true mu is not equal to 0
    95 percent confidence interval:
     -22   4
    sample estimates:
    difference in location
                        13      <- when computed with median(diffs)

For compatibility between the two versions of the Wilcoxon test, we use the
basic definition:

    d_1 = sup {d | W(d) > E(W)}
    d_2 = inf {d | W(d) < E(W)}
    Hodges-Lehmann = mean(d_1, d_2)

(using max and min instead of sup and inf, which causes the difference).
However, this may be questionable ...

> Moreover, there is NO "correction for ties" in the Hodges-Lehmann estimator.
> Thus the code in lines 147-148 and 272-273 is silly. The code for the point
> estimate should be done by exactly the same code when there are ties or zeroes
> (or both) and when there are not. Reference: Sections 3.2 and 4.2 of
> Hollander and Wolfe.

wilcox.test uses the normal approximation when a) the sample sizes are large
or b) ties occur.

In case a), computing all differences is not feasible (taking it to C does
not improve `outer(x,y,"-")' significantly). Therefore, we use uniroot to
search for d with W(X - d, Y) = E(W).

In case b), `wilcox.test' uses the normal approximation for p-values and
confidence intervals, so it seems natural to me to compute the point
estimator the same way the confidence intervals are computed. Additionally,
median(diffs) may lie outside the confidence set in this situation (see
above).

> I understand that one doesn't want to produce the vector diffs (which is
> order n^2 when n is large), but one doesn't have to in order to calculate
> the median if this is taken to C.
>
> Moreover that uniroot stuff sometimes crashes (sorry, I didn't save the
> examples, but take my word for it, it's not bulletproof).
>
> Note that the confidence intervals are also bizarre in the case of ties, but
> that's another bug report.

Because the normal approximation is bizarre for small, tied samples? That is
what `wilcox.exact' is for.

Torsten
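One reading of the d_1 / d_2 definition in the reply above can be sketched in C as follows. Two things are assumptions of this sketch: W(d) is taken in its Mann-Whitney form for the shifted sample x - d against y (pairs with x[i] - d > y[j] count 1, ties count 1/2), so that E(W) = m * n / 2; and "max and min" is read as restricting d to the observed pairwise differences. The function names are made up for the illustration, and the brute-force loop is meant to show the definition, not to be efficient.

```c
#include <assert.h>
#include <stddef.h>

/* W(d): pairs with x[i] - y[j] > d count 1, pairs equal to d count 1/2. */
double w_stat(const double *x, size_t m, const double *y, size_t n, double d)
{
    double w = 0.0;
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            double diff = x[i] - y[j];
            if (diff > d)
                w += 1.0;
            else if (diff == d)
                w += 0.5;
        }
    return w;
}

/* Mean of max{d : W(d) > E(W)} and min{d : W(d) < E(W)}, with d ranging
 * over the observed differences only. Returns -1 if either set is empty. */
int hl_max_min(const double *x, size_t m, const double *y, size_t n,
               double *est)
{
    assert(m > 0 && n > 0);
    double ew = 0.5 * (double)m * (double)n;
    double d1 = 0.0, d2 = 0.0;
    int have1 = 0, have2 = 0;
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            double d = x[i] - y[j];
            double w = w_stat(x, m, y, n, d);
            if (w > ew && (!have1 || d > d1)) { d1 = d; have1 = 1; }
            if (w < ew && (!have2 || d < d2)) { d2 = d; have2 = 1; }
        }
    if (!have1 || !have2)
        return -1;
    *est = 0.5 * (d1 + d2);
    return 0;
}
```

With x = (1, 2, 10) and y = (0) the differences are 1, 2, 10 and E(W) = 1.5. W equals 2.5 at d = 1, exactly 1.5 at d = 2, and 0.5 at d = 10, so the middle difference belongs to neither set and the result is (1 + 10) / 2 = 5.5 rather than the median 2. Under this reading, taking sup and inf over all real d would give 2 for both bounds, which is how the max/min choice produces the odd-length behaviour reported in the original message.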
Hi,
I need to append one vector to another in C. I have this little test
program that crashes. Is this not what append is for?
#include <R.h>
#include <Rinternals.h>

SEXP apt(SEXP a, SEXP b)
{
    SEXP ans;
    PROTECT(ans = append(a, b));
    UNPROTECT(1);
    return ans;
}
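Independently of what `append` does, the copying step itself can be sketched in plain C. The block below deliberately avoids R's headers so that it compiles on its own; concat_doubles is a hypothetical helper, not an R API function. In an R extension the result would presumably be allocated with allocVector() and accessed through REAL() instead of malloc(), which is not shown here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated array holding a[0..na-1] followed by
 * b[0..nb-1], or NULL if allocation fails. The caller frees the result. */
double *concat_doubles(const double *a, size_t na,
                       const double *b, size_t nb)
{
    assert(a != NULL || na == 0);
    assert(b != NULL || nb == 0);
    size_t n = na + nb;
    double *out = malloc((n ? n : 1) * sizeof *out);
    if (!out)
        return NULL;
    if (na)
        memcpy(out, a, na * sizeof *a);         /* first vector */
    if (nb)
        memcpy(out + na, b, nb * sizeof *b);    /* second vector after it */
    return out;
}
```

The same two-copy pattern applies to any element type, provided both inputs have the same type and the destination is sized for the combined length.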
Thanks
Nicholas
CH3
|
N Nicholas Lewin-Koh
/ \ Dept of Statistics
N----C C==O Program in Ecology and Evolutionary Biology
|| || | Iowa State University
|| || | Ames, IA 50011
CH C N--CH3 http://www.public.iastate.edu/~nlewin
\ / \ / nlewin@iastate.edu
N C
| || Currently
CH3 O Graphics Lab
School of Computing
National University of Singapore
The Real Part of Coffee kohnicho@comp.nus.edu.sg