thr3ads.net - R devel - [Rd] wilcox.test point estimates perverse (PR#1150) [Oct 2001]

charlie@muskrat.stat.umn.edu

2001-Oct-26 23:09 UTC

[Rd] wilcox.test point estimates perverse (PR#1150)

The point estimates produced by wilcox.test are perverse (not wrong, just
brain damaged).  The Hodges-Lehmann estimator that goes with the signed
rank test is the median of the Walsh averages.  The Hodges-Lehmann estimator
that goes with the rank sum test is the median of the pairwise differences.
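
As a minimal sketch of the two textbook definitions above, both estimators can be written with `outer` and `median`. The helper names `hl_one_sample` and `hl_two_sample` are made up for illustration and are not part of R, and the sketch assumes the samples are small enough that the order n^2 intermediate results fit in memory:

```r
# Hypothetical helpers, not part of R: textbook Hodges-Lehmann estimators.

# One-sample / paired case: median of the Walsh averages (x_i + x_j) / 2, i <= j.
hl_one_sample <- function(x) {
  walsh <- outer(x, x, "+") / 2
  median(walsh[lower.tri(walsh, diag = TRUE)])
}

# Two-sample case: median of all pairwise differences x_i - y_j.
hl_two_sample <- function(x, y) {
  median(outer(x, y, "-"))
}

hl_one_sample(c(1, 2, 4))        # Walsh averages: 1, 1.5, 2, 2.5, 3, 4
hl_two_sample(c(5, 7), c(1, 2))  # differences: 3, 4, 5, 6
```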

wilcox.test agrees except that it uses the following very peculiar definition
of "sample median": if the number of items is even, the average of the two
middle items (agrees with the usual definition), and if the number of items
is odd, the average of the two on either side of the middle item in sorted
order (huh???  why???).  I know this is asymptotically equivalent to the
usual definition, but

  * Why get answers that disagree with every nonparametrics textbook?

  * If wilcox.test is right then median is wrong and should be fixed
    (just kidding, don't mess with median!)
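
To make the rule described above concrete, here is a small sketch of it next to `median`. The function `peculiar_median` is a hypothetical reconstruction from the description in this message, not the actual code in wilcox.test.default:

```r
# Hypothetical reconstruction of the rule described above; not R's actual code.
# Assumes at least three items when the count is odd.
peculiar_median <- function(v) {
  v <- sort(v)
  n <- length(v)
  if (n %% 2 == 0) {
    # Even count: average the two middle items (the usual definition).
    mean(v[c(n / 2, n / 2 + 1)])
  } else {
    # Odd count: average the two neighbours of the middle item.
    m <- (n + 1) / 2
    mean(v[c(m - 1, m + 1)])
  }
}

v <- c(1, 2, 10)
median(v)           # usual definition picks the middle item, 2
peculiar_median(v)  # averages 1 and 10 instead
```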

Thus the complicated code in lines 87--89 of wilcox.test.default
should be replaced by the simple

  ESTIMATE <- median(diffs)

and the complicated code in lines 214-216 of wilcox.test.default should
again be replaced by the simple

  ESTIMATE <- median(diffs)

Moreover, there is NO "correction for ties" in the Hodges-Lehmann estimator.
Thus the code in lines 147-148 and 272-273 is silly.  The code for the point
estimate should be done by exactly the same code when there are ties or zeroes
(or both) and when there are not.  Reference: Sections 3.2 and 4.2 of
Hollander and Wolfe.

I understand that one doesn't want to produce the vector diffs (which is
order n^2 when n is large), but one doesn't have to in order to calculate
the median if this is taken to C.

Moreover that uniroot stuff sometimes crashes (sorry, I didn't save the
examples, but take my word for it, it's not bulletproof).

Note that the confidence intervals are also bizarre in the case of ties, but
that's another bug report.
-- 
Charles Geyer
Professor, School of Statistics
University of Minnesota
charlie@stat.umn.edu

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To:
r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Torsten Hothorn

2001-Oct-27 10:34 UTC

[Rd] wilcox.test point estimates perverse (PR#1150)

> The point estimates produced by wilcox.test are perverse (not wrong, just
> brain damaged).  The Hodges-Lehmann estimator that goes with the signed
> rank test is the median of the Walsh averages.  The Hodges-Lehmann estimator
> that goes with the rank sum test is the median of the pairwise differences.
> 
> wilcox.test agrees except that it uses the following very peculiar definition
> of "sample median": if the number of items is even, the average of the two
> middle items (agrees with the usual definition), and if the number of items
> is odd, the average of the two on either side of the middle item in sorted
> order (huh???  why???).  I know this is asymptotically equivalent to the
> usual definition, but
> 
>   * Why get answers that disagree with every nonparametrics textbook?
> 
>   * If wilcox.test is right then median is wrong and should be fixed
>     (just kidding, don't mess with median!)
> 
> Thus the complicated code in lines 87--89 of wilcox.test.default
> should be replaced by the simple
> 
>   ESTIMATE <- median(diffs)
> 
> and the complicated code in lines 214-216 of wilcox.test.default should
> again be replaced by the simple
> 
>   ESTIMATE <- median(diffs)
> 
For tied samples this estimator may not lie inside the confidence set, which
is confusing. Example (from ?wilcox.exact, package exactRankTests):
> treat <- c(94, 108, 110, 90)
> contr <- c(80, 94, 85, 90, 90, 90, 108, 94, 78, 105, 88)
> 
> # StatXact 4 for Windows: p.value = 0.0989, point prob = 0.019
> 
> wilcox.exact(contr, treat, conf.int=T)
        Exact Wilcoxon rank sum test

data:  contr and treat 
W = 9, point prob = 0.019, p-value = 0.0989 
alternative hypothesis: true mu is not equal to 0 
95 percent confidence interval:
 -22   4 
sample estimates:
difference in location 
                    13 		<- when computed with median(diffs)

For compatibility between the two versions of the Wilcoxon test,
we use the basic definition:

  d_1 = sup {d | W(d) > E(W) }
  d_2 = inf {d | W(d) < E(W) }

  Hodges-Lehmann = (d_1 + d_2) / 2

(using max and min instead of sup and inf, which causes the difference).
However, this may be questionable ...
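
The effect of max and min versus sup and inf in the definition above can be sketched as follows. This is an illustration under assumptions, not the code in wilcox.test: the helper names `w_stat` and `hl_max_min` are hypothetical, W(d) is taken to be the Mann-Whitney form of the statistic for x - d versus y with ties counted as 1/2, and the max and min are taken over the observed pairwise differences only.

```r
# Hypothetical sketch; W(d) is the Mann-Whitney count for (x - d) versus y,
# with ties counted as 1/2, so that E(W) = length(x) * length(y) / 2.
w_stat <- function(d, x, y) {
  diffs <- outer(x, y, "-")
  sum(diffs > d) + 0.5 * sum(diffs == d)
}

hl_max_min <- function(x, y) {
  # Candidate values of d: the distinct observed pairwise differences.
  cand <- sort(unique(as.vector(outer(x, y, "-"))))
  w <- vapply(cand, w_stat, numeric(1), x = x, y = y)
  ew <- length(x) * length(y) / 2
  d1 <- max(cand[w > ew])  # max in place of sup
  d2 <- min(cand[w < ew])  # min in place of inf
  (d1 + d2) / 2
}

x <- c(10, 13, 19)
y <- 0
median(outer(x, y, "-"))  # middle difference, 13
hl_max_min(x, y)          # average of its two neighbours, 10 and 19
```

With an odd number of distinct differences, W(d) under this convention equals E(W) exactly at the middle difference, so that point belongs to neither set and the max and min land on its neighbours. With an even number of differences the two middle ones are picked and the result agrees with median(diffs).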
> Moreover, there is NO "correction for ties" in the Hodges-Lehmann estimator.
> Thus the code in lines 147-148 and 272-273 is silly.  The code for the point
> estimate should be done by exactly the same code when there are ties or zeroes
> (or both) and when there are not.  Reference: Sections 3.2 and 4.2 of
> Hollander and Wolfe.

wilcox.test uses the normal approximation when 

a) the sample sizes are large or
b) ties occur.

Computing all differences in case a) is not feasible (taking it to C does not
improve on `outer(x,y,"-")' significantly). Therefore, we
use uniroot to search for d with W(X - d, Y) = E(W).
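
A minimal sketch of this root search, assuming the Mann-Whitney form of the statistic with midranks and the plain difference W - E(W) rather than whatever standardised form wilcox.test actually uses. The function name `hl_uniroot` and the choice of search interval are illustrative assumptions:

```r
# Hypothetical sketch, not the code in wilcox.test: find d with
# W(x - d, y) = E(W) using uniroot, without forming all pairwise differences.
hl_uniroot <- function(x, y, tol = 1e-8) {
  m <- length(x)
  n <- length(y)
  f <- function(d) {
    # Mann-Whitney form of the rank sum statistic, using midranks for ties.
    r <- rank(c(x - d, y))
    w <- sum(r[seq_len(m)]) - m * (m + 1) / 2
    w - m * n / 2
  }
  # Every pairwise difference lies between these two bounds.
  lower <- min(x) - max(y)
  upper <- max(x) - min(y)
  uniroot(f, lower = lower, upper = upper, tol = tol)$root
}

x <- c(10, 13, 19, 28, 40)
y <- c(0, 1, 5)
hl_uniroot(x, y)          # close to the middle pairwise difference
median(outer(x, y, "-"))  # 18
```

Each evaluation costs one call to `rank` on m + n values, so no vector of length m * n is ever built. Note that f is a step function of d, so uniroot can only locate the jump to within its tolerance, and when f is zero on a whole interval it may return any point of that interval.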

In case b), `wilcox.test' uses the normal approximation for p-values and
confidence intervals, so it seems natural to me to compute the point
estimator the same way the confidence intervals are computed. Additionally,
median(diffs) may lie outside the confidence set in this situation (see above).
> 
> I understand that one doesn't want to produce the vector diffs (which is
> order n^2 when n is large), but one doesn't have to in order to calculate
> the median if this is taken to C.
> 
> Moreover that uniroot stuff sometimes crashes (sorry, I didn't save the
> examples, but take my word for it, it's not bulletproof).
> 
> Note that the confidence intervals are also bizarre in the case of ties, but
> that's another bug report.
Because the normal approximation is bizarre for small, tied samples?
That is what `wilcox.exact' is for.

Torsten
> -- 
> Charles Geyer
> Professor, School of Statistics
> University of Minnesota
> charlie@stat.umn.edu

Nicholas Lewin-Koh

2001-Oct-27 15:19 UTC

[Rd] Concatenating vectors in C (R API)

Hi,
I need to append one vector to another in C. I have this little test
program that crashes. Is this not what append is for?

#include<R.h>
#include<Rinternals.h>

SEXP apt(SEXP a, SEXP b){

  SEXP ans;
  PROTECT(ans = append(a,b));
  UNPROTECT(1);
  return(ans);

}

Thanks

Nicholas

                 CH3
                  |
                  N             Nicholas Lewin-Koh
                 / \            Dept of Statistics
           N----C   C==O        Program in Ecology and Evolutionary Biology
          ||   ||   |           Iowa State University
          ||   ||   |           Ames, IA 50011
          CH    C   N--CH3      http://www.public.iastate.edu/~nlewin
            \  / \ /            nlewin@iastate.edu
             N    C
             |   ||             Currently
            CH3   O             Graphics Lab
                                School of Computing
                                National University of Singapore
     The Real Part of Coffee    kohnicho@comp.nus.edu.sg
