paradis@isem.univ-montp2.fr
2001-Dec-26 12:07 UTC
[Rd] bug with var(rep(1e30, 3)) (PR#1228)
There seems to be a bug with var() when the argument is a vector with exactly three values of 1e30 (or close to this value). This does not happen with twice, four (or more) times this value, or another value.> var(rep(1e30, 3))[1] 2.971056e+28> var(rep(1.2e30, 3))[1] 2.971056e+28> var(rep(0.9e30, 3))[1] 2.971056e+28> var(rep(0.8e30, 3))[1] 0> var(rep(1e29, 3))[1] 0> var(rep(1e30, 2))[1] 0> var(rep(1e30, 4))[1] 0> var(rep(1e31, 3))[1] 0 The bug is repeatable, and I got the same results with R 1.3.1 (binary from CRAN) and R 1.4.0 (binary from BD Ripley's site) both on Windows NT, and R 1.4.0 on Solaris (compiled from sources). Emmanuel Paradis Laboratoire de Paléontologie Institut des Sciences de l'Évolution Université Montpellier II F-34095 Montpellier cédex 05 France phone: +33 4 67 14 39 64 fax: +33 4 67 14 36 10 mailto:paradis@isem.univ-montp2.fr -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
paradis@isem.univ-montp2.fr writes:> There seems to be a bug with var() when the argument is a vector with > exactly three values of 1e30 (or close to this value). This does not happen > with twice, four (or more) times this value, or another value. > > > > var(rep(1e30, 3)) > [1] 2.971056e+28 > > var(rep(1.2e30, 3)) > [1] 2.971056e+28 > > var(rep(0.9e30, 3)) > [1] 2.971056e+28 > > var(rep(0.8e30, 3)) > [1] 0 > > var(rep(1e29, 3)) > [1] 0 > > var(rep(1e30, 2)) > [1] 0 > > var(rep(1e30, 4)) > [1] 0 > > var(rep(1e31, 3)) > [1] 0 > > > The bug is repeatable, and I got the same results with R 1.3.1 (binary from > CRAN) and R 1.4.0 (binary from BD Ripley's site) both on Windows NT, and R > 1.4.0 on Solaris (compiled from sources).Not a bug, roundoff:> (1e30+1e30+1e30)/3-1e30[1] 1.407375e+14> 3*((1e30+1e30+1e30)/3-1e30)^2/2[1] 2.971056e+28 Relative errors of the order 1e-16 are completely expectable in floating point arithmetic. For illustration, look at it in octal, after removing a bunch of trailing zeroes:> x<-1e30/(8^16) > structure(x, class="octmode")[1] "144762623464043165"> structure(x+x+x, class="octmode")[1] "456730272634151540" If you look carefully, you will see that a bit is lost in the process since 5+5+5 is 17 octal. -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
But there are better algorithms that are not subject to this sort of rounding error. I reckon that computing a variance this way is not the quality of algorithm one should expect from R. On 26 Dec 2001, Peter Dalgaard BSA wrote:> paradis@isem.univ-montp2.fr writes: > > > There seems to be a bug with var() when the argument is a vector with > > exactly three values of 1e30 (or close to this value). This does not happen > > with twice, four (or more) times this value, or another value. > > > > > > > var(rep(1e30, 3)) > > [1] 2.971056e+28 > > > var(rep(1.2e30, 3)) > > [1] 2.971056e+28 > > > var(rep(0.9e30, 3)) > > [1] 2.971056e+28 > > > var(rep(0.8e30, 3)) > > [1] 0 > > > var(rep(1e29, 3)) > > [1] 0 > > > var(rep(1e30, 2)) > > [1] 0 > > > var(rep(1e30, 4)) > > [1] 0 > > > var(rep(1e31, 3)) > > [1] 0 > > > > > > The bug is repeatable, and I got the same results with R 1.3.1 (binary from > > CRAN) and R 1.4.0 (binary from BD Ripley's site) both on Windows NT, and R > > 1.4.0 on Solaris (compiled from sources). > > Not a bug, roundoff: > > > (1e30+1e30+1e30)/3-1e30 > [1] 1.407375e+14 > > 3*((1e30+1e30+1e30)/3-1e30)^2/2 > [1] 2.971056e+28 > > Relative errors of the order 1e-16 are completely expectable in > floating point arithmetic. > > For illustration, look at it in octal, after removing a bunch of > trailing zeroes: > > > x<-1e30/(8^16) > > structure(x, class="octmode") > [1] "144762623464043165" > > structure(x+x+x, class="octmode") > [1] "456730272634151540" > > If you look carefully, you will see that a bit is lost in the process > since 5+5+5 is 17 octal. > > -- > O__ ---- Peter Dalgaard Blegdamsvej 3 > c/ /'_ --- Dept. of Biostatistics 2200 Cph. N > (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 > ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907 > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- > r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html > Send "info", "help", or "[un]subscribe" > (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._ >-- Brian D. Ripley, ripley@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Prof Brian Ripley <ripley@stats.ox.ac.uk> writes:> But there are better algorithms that are not subject to this sort > of rounding error. > > I reckon that computing a variance this way is not the quality of > algorithm one should expect from R.Hmm. Which algorithms are there? Surely almost any kind of provisional means subtraction could avoid the the extreme case where the mean of a bunch of identical numbers is different from all of them, but will it also improve precision in the general case? Essentially our current algorithm in cov.c is xm<- sum(x)/n v <- 1/(n-1)*sum((x-xm)^2)> > > x<-1e30/(8^16) > > > structure(x, class="octmode") > > [1] "144762623464043165" > > > structure(x+x+x, class="octmode") > > [1] "456730272634151540" > > > > If you look carefully, you will see that a bit is lost in the process > > since 5+5+5 is 17 octal.-- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._