I have encountered a strange behaviour of as.integer() which does not seem correct to me. Sorry if this is just an indication of me not understanding floating point arithmetic.

> .57 * 100
[1] 57
> .29 * 100
[1] 29

So far, so good. But:

> as.integer(.57 * 100)
[1] 56
> as.integer(.29 * 100)
[1] 28

Then again:

> all.equal(.57 * 100, as.integer(57))
[1] TRUE
> all.equal(.29 * 100, as.integer(29))
[1] TRUE

This behaviour is the same in R 2.10.1 (Ubuntu and Windows) and 2.9.2 (Windows), all 32 bit versions. Is this really intended?

On 07/01/2010 7:31 AM, Ulrich Keller wrote:
> I have encountered a strange behaviour of as.integer() which does not
> seem correct to me. Sorry if this is just an indication of me not
> understanding floating point arithmetic.
>
>> .57 * 100
> [1] 57
>> .29 * 100
> [1] 29
>
> So far, so good. But:
>
>> as.integer(.57 * 100)
> [1] 56
>> as.integer(.29 * 100)
> [1] 28
>
> Then again:
>
>> all.equal(.57 * 100, as.integer(57))
> [1] TRUE
>> all.equal(.29 * 100, as.integer(29))
> [1] TRUE
>
> This behaviour is the same in R 2.10.1 (Ubuntu and Windows) and 2.9.2
> (Windows), all 32 bit versions. Is this really intended?

Yes, as the man page states, non-integer values are truncated towards zero. Normal printing rounds them. So .57*100, which is slightly less than 57, is rounded to 57 for printing, but is truncated to 56 by as.integer.

> .57*100 < 57
[1] TRUE

Duncan Murdoch

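A minimal sketch of the distinction between printing and truncation described above, assuming default print settings (7 significant digits). The variable name x is only for illustration.

```r
x <- .57 * 100            # stored as the nearest double, slightly below 57

print(x)                  # default printing rounds to 7 significant digits
print(x, digits = 17)     # more digits reveal that the value is not exactly 57
stopifnot(x < 57)         # the product really is below 57

# as.integer() truncates toward zero, like trunc();
# round() goes to the nearest whole number instead
stopifnot(as.integer(x) == 56L,
          trunc(x) == 56,
          round(x) == 57)
```

The stopifnot() calls pass silently, which is the point: the same stored value prints as 57, truncates to 56, and rounds to 57.
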
On Jan 7, 2010, at 7:31 AM, Ulrich Keller wrote:
> I have encountered a strange behaviour of as.integer() which does not
> seem correct to me. Sorry if this is just an indication of me not
> understanding floating point arithmetic.
>
>> .57 * 100
> [1] 57
>> .29 * 100
> [1] 29
>
> So far, so good. But:
>
>> as.integer(.57 * 100)
> [1] 56
>> as.integer(.29 * 100)
> [1] 28

From the help page for as.integer:

"Non-integral numeric values are truncated towards zero (i.e., as.integer(x) equals trunc(x) there), "

> Then again:
>
>> all.equal(.57 * 100, as.integer(57))
> [1] TRUE
>> all.equal(.29 * 100, as.integer(29))
> [1] TRUE
>
> This behaviour is the same in R 2.10.1 (Ubuntu and Windows) and 2.9.2
> (Windows), all 32 bit versions. Is this really intended?

Yes, it works as documented.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

Hi

Maybe FAQ 7.31 strikes again.

> .57 * 100==as.integer(57)
[1] FALSE

Regards
Petr

r-help-bounces at r-project.org wrote on 07.01.2010 13:31:42:

> I have encountered a strange behaviour of as.integer() which does not
> seem correct to me. Sorry if this is just an indication of me not
> understanding floating point arithmetic.
>
> > .57 * 100
> [1] 57
> > .29 * 100
> [1] 29
>
> So far, so good. But:
>
> > as.integer(.57 * 100)
> [1] 56
> > as.integer(.29 * 100)
> [1] 28
>
> Then again:
>
> > all.equal(.57 * 100, as.integer(57))
> [1] TRUE
> > all.equal(.29 * 100, as.integer(29))
> [1] TRUE
>
> This behaviour is the same in R 2.10.1 (Ubuntu and Windows) and 2.9.2
> (Windows), all 32 bit versions. Is this really intended?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

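A small sketch of why `==` and all.equal() disagree here, assuming all.equal()'s default tolerance. The names x and tol are only for illustration.

```r
x <- .57 * 100

x == 57                            # exact comparison of the stored doubles
isTRUE(all.equal(x, 57))           # comparison within a tolerance

tol <- sqrt(.Machine$double.eps)   # default tolerance of all.equal(), about 1.5e-8
abs(x - 57) < tol                  # the actual difference is far below tol

stopifnot(!(x == 57), isTRUE(all.equal(x, 57)), abs(x - 57) < tol)
```

Wrapping all.equal() in isTRUE() is the usual idiom inside if(), since all.equal() returns a character description rather than FALSE when the values differ.
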
On 07-Jan-10 12:31:42, Ulrich Keller wrote:
> I have encountered a strange behaviour of as.integer() which
> does not seem correct to me. Sorry if this is just an indication
> of me not understanding floating point arithmetic.

I'm afraid it probably is -- but being aware of what the problem is, is 0.875 of solving it (sticking to binary-compatible fractions)! See below.

>> .57 * 100
> [1] 57
>> .29 * 100
> [1] 29

So it seems, but:

  57 - .57 * 100
  # [1] 7.105427e-15
  (.57 * 100 < 57)
  # [1] TRUE

So things are not what they seem. Now:

> So far, so good. But:
>
>> as.integer(.57 * 100)
> [1] 56
>> as.integer(.29 * 100)
> [1] 28

But if you look at ?as.integer you see:

  "Non-integral numeric values are truncated towards zero
  (i.e., 'as.integer(x)' equals 'trunc(x)' there)"

so since .57 * 100 is stored as the equivalent of 56.999<something>, its fractional part is discarded, resulting in 56.

> Then again:
>
>> all.equal(.57 * 100, as.integer(57))
> [1] TRUE
>> all.equal(.29 * 100, as.integer(29))
> [1] TRUE

And now you should also read ?all.equal:

  "'all.equal(x,y)' is a utility to compare R objects 'x' and 'y'
  testing 'near equality'. [...]
  Usage: [...]
    all.equal(target, current,
              tolerance = .Machine$double.eps ^ 0.5,
              scale = NULL, check.attributes = TRUE, ...)
  [...]
  tolerance: numeric >= 0. Differences smaller than 'tolerance'
  are not considered."

Now, on my R,

  .Machine$double.eps ^ 0.5
  # [1] 1.490116e-08

whereas (see above) (57 - .57 * 100) = 7.105427e-15, which is smaller than .Machine$double.eps ^ 0.5.

> This behaviour is the same in R 2.10.1 (Ubuntu and Windows) and 2.9.2
> (Windows), all 32 bit versions. Is this really intended?

Yes! And, as you suspect, it is all down to the binary representation of fractional numbers input as decimal. There is no finite-length binary fraction which is exactly equal to 0.57[decimal]. If there were, then for some power 2^k of 2, (2^k)*0.57 would be an exact integer. You can easily verify that this is not the case.

Just keep doubling 0.57: the series starts as

  0.57  1.14  2.28  4.56  9.12  18.24  ...

and finally, at the 23rd position, you get 2390753.28 and you are now back at the "****.28" fractional part (as at position 3 above). Hence the fractional parts will cycle through .28, .56, .12, ... forever, so there is no exact binary representation of 0.57.

To be absolutely sure of it, you should do it by hand on paper (lest you tickle rounding errors in R), but R will in fact give you the sequence:

  0.57*2^(0:23)
  #  [1]       0.57       1.14      *2.28*       4.56
  #  [5]       9.12      18.24      36.48       72.96
  #  [9]     145.92     291.84     583.68     1167.36
  # [13]    2334.72    4669.44    9338.88    18677.76
  # [17]   37355.52   74711.04  149422.08   298844.16
  # [21]  597688.32 1195376.64 *2390753.28* 4781506.56

Hoping this helps!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jan-10                                       Time: 13:32:31
------------------------------ XFMail ------------------------------

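The doubling argument can also be checked without any floating-point rounding. This is a sketch under one assumption, namely that we work in hundredths, so that 0.57 becomes the integer 57 and doubling is done modulo 100. The name frac is only for illustration.

```r
# Work in hundredths so that everything is an exact integer: 0.57 -> 57.
# Doubling 0.57 and keeping the fractional part corresponds to
# doubling 57 and reducing modulo 100.
frac <- integer(25)
frac[1] <- 57L
for (i in 2:25) frac[i] <- (2L * frac[i - 1L]) %% 100L

frac   # fractional parts (in hundredths) of 0.57 * 2^(0:24)

# .28 appears at position 3 and again at position 23, so the parts cycle
stopifnot(frac[3] == 28L, frac[23] == 28L)

# .00 never appears before or within the cycle, so no 2^k * 0.57 is an integer
stopifnot(all(frac != 0L))
```

Because the sequence is purely periodic from position 3 onwards and contains no zero, no further doubling can ever produce a whole number.
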
Use round(), floor(), or ceiling() to convert numbers with possible fractional parts to numbers without fractional parts. as.integer()'s main use is to convert from one internal representation (i.e., bit pattern) of a number to another so you can interface to C or Fortran code. Note that as.integer(x) also doesn't "work" when abs(x) >= 2^31, while round(), floor(), and ceiling() do work up to about 2^52.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Ulrich Keller
> Sent: Thursday, January 07, 2010 4:32 AM
> To: r-help at r-project.org
> Subject: [R] Strange behaviour of as.integer()
>
> I have encountered a strange behaviour of as.integer() which does not
> seem correct to me. Sorry if this is just an indication of me not
> understanding floating point arithmetic.
>
> > .57 * 100
> [1] 57
> > .29 * 100
> [1] 29
>
> So far, so good. But:
>
> > as.integer(.57 * 100)
> [1] 56
> > as.integer(.29 * 100)
> [1] 28
>
> Then again:
>
> > all.equal(.57 * 100, as.integer(57))
> [1] TRUE
> > all.equal(.29 * 100, as.integer(29))
> [1] TRUE
>
> This behaviour is the same in R 2.10.1 (Ubuntu and Windows) and 2.9.2
> (Windows), all 32 bit versions. Is this really intended?

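A short sketch of the advice above. The value 2^40 + 0.5 is an arbitrary illustrative choice of a number that lies outside the integer range but is still exactly representable as a double.

```r
x <- .57 * 100

round(x)      # nearest whole number
floor(x)      # largest whole number not above x
ceiling(x)    # smallest whole number not below x
stopifnot(round(x) == 57, floor(x) == 56, ceiling(x) == 57)

# These functions return doubles, so they keep working beyond the integer range...
big <- 2^40 + 0.5
stopifnot(floor(big) == 2^40)

# ...whereas as.integer() gives NA (with a warning) outside that range
stopifnot(is.na(suppressWarnings(as.integer(big))))
```

Note that round(), floor(), and ceiling() return type double, not integer; wrap the result in as.integer() only if an integer type is actually needed and the value is known to be in range.
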
I would like to add a salutary tail-piece to this correspondence. It is a simple recursive calculation which, after not many steps, R will in almost all cases get badly wrong as a result of the finite binary representation of fractions. (I have posted it before, some years ago, but it may be worth bringing it back since it shows very vividly what can go wrong if you do not pay attention to this aspect of numerical computation).

Working in the interval [0,1], X[n+1] is calculated from X[n] according to:

  if 0 <= X[n] <= 1/2 then X[n+1] = 2*X[n]
  if 1/2 <= X[n] <= 1 then X[n+1] = 2*(1 - X[n])

X = 2/3 is a fixed point of this, since 2/3 -> 2*(1 - 2/3) = 2/3;
X = 2/5 -> 4/5 -> 2*(1 - 4/5) = 2/5 has period 2;
X = 2/7 -> 4/7 -> 2*(1 - 4/7) = 6/7 -> 2*(1 - 6/7) = 2/7 (period 3)
and so on.

All fractions which are multiples of 1/(2^k) for some k>0 eventually end up at 0 and stay there. Irrational numbers, mathematically, never repeat.

However, none of the above periodic numbers 2/(2*m + 1) can be represented exactly in a finite binary representation, and that is where the trouble starts.

So, in R:

  nextX <- function(x){if(x <= 0.5) (2*x) else (2*(1 - x))}

Now try the fixed point x=2/3:

  i<-0; x<-2/3; while(x>0){i<-(i+1); x<-nextX(x) ; print(c(i,x))}

of which the last few lines are:

  [1] 46.0000000  0.6640625
  [1] 47.000000  0.671875
  [1] 48.00000  0.65625
  [1] 49.0000  0.6875
  [1] 50.000  0.625
  [1] 51.00  0.75
  [1] 52.0  0.5
  [1] 53  1
  [1] 54  0

Similarly try any of the other periodic values, e.g.

  i<-0; x<-2/11; while(x>0){i<-(i+1); x<-nextX(x) ; print(c(i,x))}

They will all halt at x=0 after 50-55 iterations. Similarly a non-periodic number such as 1/sqrt(2) or 1/pi will also fail.

Thus the results of such calculations will eventually be grossly wrong. So do not try this kind of calculation in R! -- at any rate not without adopting special measures. For instance, it would be possible, for rational x = m/n, to program a function which kept track of M and N in the rational expression M/N of the result after each iteration, by working out what M and N would be. Then, at any iteration, the numerical value of M/N could be computed (to within the precision used by R) and returned.

This would still give wrong answers for irrational starting numbers, since they would have to be stored as rational approximations and then would either end up at 0 or be periodic while the mathematical result would never repeat, so eventually they would be arbitrarily far apart (within [0,1]).

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Jan-10                                       Time: 18:13:41
------------------------------ XFMail ------------------------------

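One possible sketch of the rational bookkeeping suggested above, assuming the starting value is given as an integer numerator m and denominator n. The function names nextM and orbit are hypothetical, not from the original post. Since the map only ever doubles m or replaces it by 2*(n - m), the denominator never changes and only the numerator needs to be tracked.

```r
# Tent map on x = m/n with integer numerator m and fixed denominator n:
#   if m/n <= 1/2 then m -> 2*m, else m -> 2*(n - m); n never changes.
nextM <- function(m, n) if (2L * m <= n) 2L * m else 2L * (n - m)

# Return the numerators of the first `steps` iterates starting from m/n
orbit <- function(m, n, steps) {
  out <- integer(steps)
  for (i in seq_len(steps)) {
    m <- nextM(m, n)
    out[i] <- m
  }
  out   # numerators only; the numerical values are out / n
}

orbit(2L, 3L, 100L)[100]   # numerator after 100 steps from 2/3
orbit(2L, 7L, 6L)          # numerators for 2/7: 4 6 2 4 6 2

stopifnot(orbit(2L, 3L, 100L)[100] == 2L)          # 2/3 stays a fixed point
stopifnot(all(orbit(2L, 11L, 200L) > 0L))          # 2/11 never collapses to 0
```

In exact integer arithmetic the periodic orbits stay periodic indefinitely, in contrast to the floating-point version, which drifts to 0 after roughly 50 steps. The numerator always stays at or below n, so integer overflow is not a concern for moderate n.
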
I would like to thank all those who wrote helpful and interesting replies to my question. I could have saved myself the time for writing it up had I simply read ?as.integer. However, under the circumstances in which I first encountered as.integer's truncating behaviour this was not so obvious because as.integer was called implicitly. To summarize my initial problem:

> length(rep(0, .57 * 100))
[1] 56
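
A minimal sketch of one way to guard against this implicit truncation, assuming the intended count is the nearest whole number. The variable name n is only for illustration.

```r
n <- .57 * 100

length(rep(0, n))          # the count is truncated toward zero, not rounded
length(rep(0, round(n)))   # rounding first gives the intended count

stopifnot(length(rep(0, n)) == 56,
          length(rep(0, round(n))) == 57)
```

The same precaution applies wherever a computed double is used as a count or an index, since such arguments are typically truncated rather than rounded.
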