thr3ads.net - R devel - [Rd] bug in sum() on integer vector [Dec 2011]

Hervé Pagès

2011-Dec-09 18:40 UTC

[Rd] bug in sum() on integer vector

Hi,

   x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))

This is correct:

   > sum(as.double(x))
   [1] 0

This is not:

   > sum(x)
   [1] 4996000

Returning NA (with a warning) would also be acceptable for the latter.
That would make it consistent with cumsum(x):

   > cumsum(x)[length(x)]
   [1] NA
   Warning message:
   Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
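
The behaviour requested above (NA plus a warning, matching cumsum()) can be sketched in plain R. This is only an illustration under stated assumptions: checked_int_sum is a hypothetical helper name, not a base R function, and the example uses a shorter vector xx built the same way as x so that it runs quickly.

```r
# Hypothetical helper, not part of base R: sum an integer vector, but
# return NA with a warning if any running total leaves the integer range,
# mirroring what cumsum() reports for integer input.
checked_int_sum <- function(x) {
  stopifnot(is.integer(x))
  if (length(x) == 0L) return(0L)
  running <- cumsum(as.double(x))  # running totals held as doubles
  if (any(abs(running) > .Machine$integer.max, na.rm = TRUE)) {
    warning("integer overflow in running total; returning NA")
    return(NA_integer_)
  }
  as.integer(running[length(running)])
}

# Shorter vector with the same structure as x; its true sum is 0.
xx <- c(rep(1800000003L, 1000), -rep(1200000002L, 1500))
exact_total   <- sum(as.double(xx))                      # 0
checked_total <- suppressWarnings(checked_int_sum(xx))   # NA
small_total   <- checked_int_sum(c(1L, 2L, 3L))          # 6L
```

The helper trades memory for simplicity, since it materialises every running total; for xx all of those totals stay far below 2^53, so the double arithmetic here is exact.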

Thanks!
H.

 > sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

Duncan Murdoch

2011-Dec-09 19:39 UTC

On 09/12/2011 1:40 PM, Hervé Pagès wrote:
> Hi,
>
>     x<- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
>
> This is correct:
>
>     >  sum(as.double(x))
>     [1] 0
>
> This is not:
>
>     >  sum(x)
>     [1] 4996000
>
> Returning NA (with a warning) would also be acceptable for the latter.
> That would make it consistent with cumsum(x):
>
>     >  cumsum(x)[length(x)]
>     [1] NA
>     Warning message:
>     Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'

This is a 64-bit problem; in 32 bits things work out properly. I'd
guess that in 64-bit arithmetic we or the run-time are doing something to
simulate 32-bit arithmetic (since integers are 32 bits), but it looks as
though we're not quite getting it right.

Duncan Murdoch
> Thanks!
> H.
>
>   >  sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>    [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>    [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>    [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>    [7] LC_PAPER=C                 LC_NAME=C
>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
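
One way to probe the guess above is to ask what a plain double-precision accumulator would do with these numbers. This is a hypothesis that nobody in the thread confirms, and the sketch makes no claim about what R's C code actually does. A double carries 53 bits of mantissa (2^53 is about 9.007e15), while the running total here climbs to about 1.8e16, so odd addends can no longer be represented exactly once the total passes 2^53.

```r
# Emulate summing x with a plain double accumulator. This is an assumption
# about the mechanism, not a statement about R's internals.
acc <- 0
for (i in seq_len(10000000)) {
  acc <- acc + 1800000003   # odd addend: rounded once acc exceeds 2^53
}
for (i in seq_len(15000000)) {
  acc <- acc - 1200000002   # even addend: representable, no rounding
}
double_total <- acc

# Exact reference: both halves have the same magnitude, so the true sum is 0.
exact_total <- 1800000003 * 10000000 - 1200000002 * 15000000
```

On a platform with IEEE 754 doubles and no extended-precision intermediates, double_total should come out as 4996000, the same value sum(x) returned in the report. That would also fit the 32-bit observation if the 32-bit build kept the accumulator in a wider floating-point register, but this remains a guess.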

Murray Stokely

2011-Dec-14 00:24 UTC

FYI, the new int64 package on CRAN gets this right, but is of course
somewhat slower since it is not doing hardware 64-bit arithmetic.

 x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
 library(int64)
 sum(as.int64(x))
# [1] 0
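
The same idea can be tried on a small input with bit64, a different CRAN package that also provides a 64-bit integer class. Treat this as a sketch: bit64 is not the int64 package used above and is not mentioned anywhere in this thread, and the code assumes it may or may not be installed, so the call is guarded with a double-precision fallback.

```r
# Assumption: bit64 (not the int64 package from this thread) may be absent,
# so the 64-bit path is guarded.
xx <- c(rep(1800000003L, 1000), -rep(1200000002L, 1500))

if (requireNamespace("bit64", quietly = TRUE)) {
  total64 <- sum(bit64::as.integer64(xx))   # accumulates in 64-bit integers
  total_as_double <- as.double(total64)
} else {
  total_as_double <- sum(as.double(xx))     # fallback: double accumulation
}
```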

             - Murray

2011/12/9 Hervé Pagès <hpages at fhcrc.org>:
> Hi,
>
>   x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
>
> This is correct:
>
>   > sum(as.double(x))
>   [1] 0
>
> This is not:
>
>   > sum(x)
>   [1] 4996000
>
> Returning NA (with a warning) would also be acceptable for the latter.
> That would make it consistent with cumsum(x):
>
>   > cumsum(x)[length(x)]
>   [1] NA
>   Warning message:
>   Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
>
> Thanks!
> H.
>
> > sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

John C Nash

2011-Dec-14 15:19 UTC

Following this thread, I wondered why nobody tried cumsum() to see where the
integer overflow occurs. On the shorter xx vector in the little script below
I get this message:

Warning message:
Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'

But sum() does not give such a warning, which I believe is the point of
contention. Since cumsum() does manage to give such a warning, and to show
where the overflow occurs, should sum() not be able to do so? For the record,
I don't class the non-zero answer as an error in itself. I regard the failure
to warn as the issue.
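
As a small sketch of the cumsum() idea above, the running totals can be computed in double precision and compared against the integer range to see where the first overflow would occur. The variable names are illustrative only; xx is the shorter vector from the script at the end of this message.

```r
xx <- c(rep(1800000003L, 1000), -rep(1200000002L, 1500))

# Running totals as doubles; for xx these stay well below 2^53, so they
# are exact.
running <- cumsum(as.double(xx))

# Positions whose running total does not fit in a 32-bit integer.
out_of_range <- abs(running) > .Machine$integer.max
first_overflow <- which(out_of_range)[1]   # 2: the second element already overflows
```

Because 1800000003 is less than .Machine$integer.max (2147483647) but twice that value is not, the integer running total overflows as soon as the second element is added.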

For info, on my Ubuntu Lucid 10.04 system that has 4 GB of RAM but no swap,
the last line of the script, which does the int64 sum, chugs for about
2 minutes, then gives "Killed" and returns to the terminal prompt. It also
seems to render some other applications unstable (I had Thunderbird running
to read R-devel, and this started to behave strangely after the crash, and I
had to reboot). I'm copying Romain as package maintainer, and I'll be happy
to try to work off-list to figure out how to avoid the "Killed" result. (On a
16 GB machine, I got the 0 answer.)

Best,

John Nash

Here's the system info and small script.
> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=C                LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] int64_1.1.2

## sumerr.R  20111214
library(int64)
x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
xx <- c(rep(1800000003L, 1000), -rep(1200000002L, 1500))
sum(x)
sum(as.double(x))
sum(xx)
sum(as.double(xx))
cumsum(xx)
cumsum(as.int64(xx))

tmp<-readline("Now try the VERY SLOW int64")
sum(as.int64(x))
