Hi,

  x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))

This is correct:

  > sum(as.double(x))
  [1] 0

This is not:

  > sum(x)
  [1] 4996000

Returning NA (with a warning) would also be acceptable for the latter.
That would make it consistent with cumsum(x):

  > cumsum(x)[length(x)]
  [1] NA
  Warning message:
  Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'

Thanks!
H.

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
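A minimal R sketch of the behaviour proposed above: an integer sum that is exact and returns NA with a warning when the total does not fit in an R integer, as cumsum() does. The helper name exact_isum is hypothetical (not part of base R). It splits each value into a high and a low 16-bit part so that neither half-sum can lose low-order bits, and it assumes fewer than 2^37 elements so both half-sums stay below 2^53 and are therefore exact in a double.

```r
## Hypothetical helper, not part of base R.
## Exact sum of an integer vector; NA with a warning on overflow.
## Assumption: length(x) < 2^37, so both half-sums below stay under 2^53.
exact_isum <- function(x) {
  stopifnot(is.integer(x))
  if (any(is.na(x))) return(NA_integer_)
  hi <- x %/% 65536L          # floor division, so x == hi * 65536 + lo
  lo <- x %% 65536L           # always in [0, 65535]
  s_hi <- sum(as.double(hi))  # |partial sums| <= 2^15 * n: exact in a double
  s_lo <- sum(as.double(lo))  # partial sums <= 2^16 * n: exact in a double
  ## Move the carry out of the low half so that r is back in [0, 65535].
  h <- s_hi + s_lo %/% 65536
  r <- s_lo %% 65536
  ## h * 65536 + r must lie in [-(2^31 - 1), 2^31 - 1];
  ## -2^31 itself is reserved for NA_integer_.
  if (h < -32768 || h > 32767 || (h == -32768 && r == 0)) {
    warning("integer overflow in 'exact_isum'; use 'sum(as.numeric(.))'")
    return(NA_integer_)
  }
  as.integer(h * 65536 + r)   # exact: the value is within integer range
}
```

With the vector from the report, exact_isum(x) returns 0L because every step above is exact, while exact_isum(c(.Machine$integer.max, 1L)) returns NA with a warning. The split costs two extra passes over the data but needs no 64-bit integer type.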
On 09/12/2011 1:40 PM, Hervé Pagès wrote:
> Hi,
>
>  x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
>
> This is correct:
>
>  > sum(as.double(x))
>  [1] 0
>
> This is not:
>
>  > sum(x)
>  [1] 4996000
>
> Returning NA (with a warning) would also be acceptable for the latter.
> That would make it consistent with cumsum(x):
>
>  > cumsum(x)[length(x)]
>  [1] NA
>  Warning message:
>  Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'

This is a 64-bit problem; in 32-bit R things work out properly. I'd
guess that in 64-bit arithmetic we or the run-time are doing something
to simulate 32-bit arithmetic (since integers are 32 bits), but it
looks as though we're not quite getting it right.

Duncan Murdoch
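One mechanism worth checking, sketched here under an assumption not confirmed in this thread: that the integer sum is accumulated in a C double. A double represents every integer exactly only up to 2^53, and the running total for x passes that bound after roughly the first 5 million terms, so low-order bits can be lost while the final, wrong total still fits in an integer and triggers no warning. The lines below only illustrate that precision limit; they do not reproduce R's internal loop.

```r
## Illustration only: doubles are exact for integers up to 2^53.
2^53 + 1 == 2^53              # TRUE: the added 1 is rounded away

## The positive block of x alone drives a running total past 2^53.
10000000 * 1800000003 > 2^53  # TRUE

## Past that point, small contributions can vanish from a double accumulator.
s <- 2^53
for (i in 1:4) s <- s + 1     # four additions of 1
s - 2^53                      # 0, although the exact answer is 4
```

If this is the cause, the 32-bit/64-bit difference might come from 32-bit x86 builds keeping intermediates in wider 80-bit x87 registers; that is an untested guess.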
FYI, the new int64 package on CRAN gets this right, but is of course
somewhat slower since it is not doing hardware 64-bit arithmetic.

  x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
  library(int64)
  sum(as.int64(x))
  # [1] 0

- Murray

2011/12/9 Hervé Pagès <hpages at fhcrc.org>:
> Hi,
>
>  x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
> [...]
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
Following this thread, I wondered why nobody had tried cumsum() to see
where the integer overflow occurs. On the shorter xx vector in the
little script below I get:

  Warning message:
  Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'

But sum() gives no such warning, which I believe is the point of
contention. Since cumsum() does manage to warn, and to show where the
overflow occurs, shouldn't sum() be able to do so as well?

For the record, I don't regard the non-zero answer as an error in
itself; I regard the failure to warn as the issue.

For info, on my Ubuntu Lucid 10.04 system, which has 4 GB of RAM but no
swap, the last line of the script (the int64 sum) chugs for about 2
minutes, then prints "Killed" and returns to the terminal prompt. It
also seems to leave some other applications unstable: I had Thunderbird
running to read R-devel, it started to behave strangely after the
crash, and I had to reboot. I'm copying Romain as package maintainer,
and I'll be happy to work off-list to figure out how to avoid the
"Killed" result. (On a 16 GB machine, I got the 0 answer.)

Best,
John Nash

Here's the system info and the small script.

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=C                LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] int64_1.1.2

## sumerr.R 20111214
library(int64)
## 25 million elements; the exact sum is 0
x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
## same pattern with 2,500 elements; the exact sum is also 0
xx <- c(rep(1800000003L, 1000), -rep(1200000002L, 1500))
sum(x)
sum(as.double(x))
sum(xx)
sum(as.double(xx))
cumsum(xx)              # overflows at the 2nd element: NA plus a warning
cumsum(as.int64(xx))
tmp <- readline("Now try the VERY SLOW int64")
sum(as.int64(x))
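The point that cumsum() can show where the overflow occurs can be sketched in R without triggering the overflow itself: take the running total in double precision and look for the first partial sum outside the integer range. The function name first_overflow is hypothetical (not part of base R or int64). Every partial sum up to and including the first crossing is smaller than 2^32 in magnitude, so those values are exact in a double and the reported index is reliable even for very long vectors.

```r
## Hypothetical helper (not in base R or int64): index of the first element
## at which a running integer total leaves R's integer range, or NA if never.
first_overflow <- function(x) {
  stopifnot(is.integer(x))
  cs <- cumsum(as.double(x))                      # running total, no integer overflow
  idx <- which(abs(cs) > .Machine$integer.max)    # -2^31 counts too: it is NA_integer_
  if (length(idx) == 0L) NA_integer_ else idx[1L]
}

xx <- c(rep(1800000003L, 1000), -rep(1200000002L, 1500))
first_overflow(xx)
# [1] 2
```

A check like this could back a warning from sum() while leaving the (possibly in-range) final total untouched.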