bates@stat.wisc.edu
2003-Jan-14 18:11 UTC
[Rd] ctest package: wilcox.test() produces integer overflow (PR#2453)
This was filed as a bug report on the Debian r-base package.  It is
more properly a bug report on the ctest package in R.

The default method for wilcox.test manipulates x and y without
checking the class or data.class of these objects.  Possible solutions
are
 - create wilcox.test.factor (if appropriate)
 - check the class and/or data.class of x and y in wilcox.test.default
   and produce error messages or warnings for inappropriate objects
 - coerce to numeric unconditionally (probably not a good idea)

Martin Michlmayr <tbm@cyrius.com> writes:

> Package: r-base
> Version: 1.5.0-2 / 1.6.1.cvs.20030103-1
> Severity: normal
>
> I have some ordinal data and I wanted to perform a U-test.  However,
> a problem occurred:
>
> > x <- read.table("spss-3.txt", header=TRUE)
> > a = factor(x$a)
> > b = factor(x$b)
> > summary(a)
>     1     2     3     4     5     6
> 23900 20362 15238 10007  3399   472
> > summary(b)
>     1     2     3     4     5     6
> 23809 20649 15069  9952  3415   484
> > wilcox.test(a, b)
>
>         Wilcoxon rank sum test with continuity correction
>
> data:  a and b
> W = 5384330884, p-value = NA
> alternative hypothesis: true mu is not equal to 0
>
> Warning messages:
> 1: "-" not meaningful for factors in: Ops.factor(x, mu)
> 2: NAs produced by integer overflow in: n.x * n.y
> 3: NAs produced by integer overflow in: n.x * n.y
> >
>
> Now there appear to be two issues: First of all, the NAs produced by
> integer overflow.  Since they go away when I use less data, this looks
> like an R bug with big data sets.  When I use less data, the warning
> goes away:
>
> 57:tbm@arborlon: ~] wc -l s
>   40000 s
>
> > summary(a)
>     1     2     3     4     5     6
> 13034 11086  8341  5412  1869   257
> > summary(b)
>     1     2     3     4     5     6
> 13034 11086  8341  5412  1869   257
> > wilcox.test(a, b)
>
>         Wilcoxon rank sum test with continuity correction
>
> data:  a and b
> W = 1599920001, p-value = < 2.2e-16
> alternative hypothesis: true mu is not equal to 0
>
> Warning message:
> "-" not meaningful for factors in: Ops.factor(x, mu)
> >
>
> However, I still don't know what the other warning is.  I don't have a
> "-" in my data.  I reduced the data to 2 lines and the problem still
> occurs:
>
> > summary(a)
> 2 3
> 1 1
> > summary(b)
> 2 3
> 1 1
> > wilcox.test(a, b)
>
>         Wilcoxon rank sum test
>
> data:  a and b
> W = 4, p-value = 0.3333
> alternative hypothesis: true mu is not equal to 0
>
> Warning message:
> "-" not meaningful for factors in: Ops.factor(x, mu)
> >
>
> The file is:
>
> 67:tbm@arborlon: ~] cat s
> a b
> 2 4
> 3 1
> 68:tbm@arborlon: ~]
>
> I'm not an R expert, so this might be a pilot error; but I don't see
> where.
>
> -- System Information:
> Debian Release: 3.0
> Architecture: i386
> Kernel: Linux regression 2.4.19-686 #1 Thu Aug 8 21:30:09 EST 2002 i686
> Locale: LANG=en_US, LC_CTYPE=en_US
>
> Versions of packages r-base depends on:
> ii  r-base-core   1.5.0-2   GNU R core of statistical computin
> ii  r-base-html   1.5.0-2   GNU R html docs for statistical co
> ii  r-base-latex  1.5.0-2   GNU R LaTeX docs for statistical c
>
> -- no debconf information
>
> --
> Martin Michlmayr
> tbm@cyrius.com

--
Douglas Bates                            bates@stat.wisc.edu
Statistics Department                    608/262-2598
University of Wisconsin - Madison        http://www.stat.wisc.edu/~bates/
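The "integer overflow in: n.x * n.y" warning can be reproduced without calling wilcox.test at all. The following is a minimal sketch that assumes only the group counts printed by summary() in the report; the names prod_int, prod_dbl, n.small and prod_small are illustrative and do not come from the ctest source. It shows why the warning appears for the full data set but not for the smaller one: the product of the two sample sizes exceeds the largest 32-bit integer only in the first case.

```r
# Sample sizes reconstructed from the summary() counts in the report.
# sum() of an integer vector stays integer, as length() would be.
n.x <- sum(c(23900L, 20362L, 15238L, 10007L, 3399L, 472L))
n.y <- sum(c(23809L, 20649L, 15069L, 9952L, 3415L, 484L))

# Integer arithmetic: the product is larger than .Machine$integer.max
# (2147483647), so R returns NA and issues the overflow warning.
# suppressWarnings() is used here only to keep the sketch quiet.
prod_int <- suppressWarnings(n.x * n.y)

# Converting to double before multiplying avoids the overflow.
prod_dbl <- as.numeric(n.x) * as.numeric(n.y)

# The reduced data set from the report: the product still fits in an
# integer, which is why the overflow warning disappeared there.
n.small <- sum(c(13034L, 11086L, 8341L, 5412L, 1869L, 257L))
prod_small <- n.small * n.small

print(c(n.x = n.x, n.y = n.y, n.small = n.small))
print(prod_int)
print(prod_dbl)
print(prod_small)
```

Both sample sizes in the full data set come to 73378, and the double-precision product, 5384330884, is the same number the report prints as W. For the reduced data set the product is 1599920001, again the W value shown. So the overflow depends on the sample sizes alone and is separate from the factor warning.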
Kurt Hornik
2003-Jan-14 18:48 UTC
[Rd] ctest package: wilcox.test() produces integer overflow (PR#2453)
>>>>> bates writes:

> This was filed as a bug report on the Debian r-base package.  It is
> more properly a bug report on the ctest package in R.

> The default method for wilcox.test manipulates x and y without
> checking the class or data.class of these objects.  Possible solutions
> are
>  - create wilcox.test.factor (if appropriate)
>  - check the class and/or data.class of x and y in wilcox.test.default
>    and produce error messages or warnings for inappropriate objects
>  - coerce to numeric unconditionally (probably not a good idea)

Hmm, but the documentation clearly says

  \item{x}{numeric vector of data values.}
  \item{y}{an optional numeric vector of data values.}

-k
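The second option proposed in the thread (check the class of x and y and raise an error) and the caveat about unconditional coercion can be illustrated as follows. This is only a sketch under stated assumptions: checked_wilcox_test is a hypothetical wrapper, not the code in ctest or the fix that was eventually applied; it requires y, although the real function treats y as optional; and it calls stats::wilcox.test, which is where the function lives in current R rather than in the ctest package discussed here.

```r
# Arithmetic on a factor is undefined: Ops.factor warns and returns NA.
# This is the source of the '"-" not meaningful for factors' warning.
f <- factor(c(2, 3))
diffs <- suppressWarnings(f - 0)

# Why unconditional coercion is risky: as.numeric() on a factor gives
# the internal level codes, not the original values.
g <- factor(c(10, 20, 10))
codes  <- as.numeric(g)                # level codes
values <- as.numeric(as.character(g))  # original values

# Hypothetical wrapper (an assumption, not the ctest implementation):
# reject non-numeric input instead of silently computing on it.
checked_wilcox_test <- function(x, y, ...) {
  if (!is.numeric(x)) stop("'x' must be numeric, not ", class(x)[1])
  if (!is.numeric(y)) stop("'y' must be numeric, not ", class(y)[1])
  stats::wilcox.test(x, y, ...)
}

print(diffs)
print(codes)
print(values)

# A factor argument now fails early with a clear message.
msg <- tryCatch(checked_wilcox_test(f, f),
                error = function(e) conditionMessage(e))
print(msg)

# Numeric input is passed through unchanged.
res <- checked_wilcox_test(c(1.1, 2.2, 3.3), c(4.4, 5.5, 6.6))
print(class(res))
```

With a check like this, the caller decides explicitly how ordinal data should be converted, for example with as.numeric(as.character(a)) when the factor levels are themselves numbers, as they are in the report.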