hi

?integer says:

  Note that on almost all implementations of R the range of
  representable integers is restricted to about +/-2*10^9: 'double's
  can hold much larger integers exactly.

I am getting very confused as to when to use integers and when not to. In my line of work I need exact comparisons of large integer-valued arrays, so I often use as.integer(), but the above seems to tell me that doubles might be better.

Consider the following R idiom of Euclid's algorithm for the highest common factor of two positive integers:

  gcd <- function(a, b){
    if (b == 0){ return(a)}
    return(Recall(b, a%%b))
  }

If I call this with gcd(10,12), for example, then a%%b is not an integer (it is stored as a double), so the first line of the function, testing b for being zero, isn't legitimate.

OK, so I have some options:

(1) stick in "a <- as.integer(a), b <- as.integer(b)" into the function: then a%%b *will* be an integer and the "==" test is appropriate
(2) use some test like abs(b) < TOL for some suitable TOL (0.5?)
(3) use identical(all.equal(b,0),TRUE) like it says in identical.Rd
(4) use identical(all.equal(b,as.integer(0)),TRUE)

How does the List deal with this kind of problem?

Also, gcd() as written returns a non-integer (a double). Would the List recommend rewriting the last line as

  return(as.integer(Recall(b,a%%b)))

or not?

--
Robin Hankin
Uncertainty Analyst
Southampton Oceanography Centre
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743
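To make the type and range points in the question concrete, here is a small sketch, assuming a standard R build (32-bit integer type, IEEE 754 doubles). It reuses the original gcd() and checks what typeof() reports for the results, where the integer type runs out, and where doubles stop holding integers exactly. The values 3e9 and 2e9 are arbitrary, chosen only because they exceed the integer range.

```r
# The original gcd(), unchanged apart from layout
gcd <- function(a, b) {
  if (b == 0) return(a)
  Recall(b, a %% b)
}

# The result type follows the argument types
typeof(gcd(10, 12))    # "double": 10 and 12 are double literals
typeof(gcd(10L, 12L))  # "integer": the L suffix makes integer literals

# Integer storage tops out at .Machine$integer.max (about 2*10^9) ...
.Machine$integer.max   # 2147483647
# ... so coercing a larger value gives NA (as.integer() also warns)
suppressWarnings(as.integer(3e9))  # NA

# Doubles hold integers exactly up to 2^53; beyond that, gaps appear
2^52 + 1 == 2^52       # FALSE: still exact
2^53 + 1 == 2^53       # TRUE: 2^53 + 1 is not representable

# Integer-valued doubles above the integer range still work with %%
3e9 %% 7               # 4
gcd(3e9, 2e9)          # returns 1e9
```

This is why option (1) fails for large inputs: as.integer() cannot represent them, while the double version keeps working.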
Robin Hankin <r.hankin at soc.soton.ac.uk> writes:

> Also, gcd() as written returns a non-integer. Would the List
> recommend rewriting the last line as
>
>   return(as.integer(Recall(b,a%%b)))
>
> or not?

Not if you want things to work in the large-integer domain...

You're in somewhat murky waters here because it all has to do with whether you can rely on the floating-point arithmetic being exact for integers up to 2^53. *If* that works, then there's really no reason to distrust "==" in this context, and gcd() works as originally written. You might consider wrapping it in a function that checks that a and b are both (1) in range and (2) integers in the sense that round(x)==x. (Failing 2, you likely get an infinite recursion.)
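The wrapper suggested above might look something like the following sketch. The name checked_gcd() and the limit argument are hypothetical, and the default limit of 2^53 simply assumes, as discussed above, that %% is exact for integer-valued doubles up to that size; a more conservative bound may be wiser very close to the top of that range.

```r
# Plain recursive gcd(), as in the original post
gcd <- function(a, b) {
  if (b == 0) return(a)
  Recall(b, a %% b)
}

# Hypothetical wrapper: validate the inputs once, then call gcd()
checked_gcd <- function(a, b, limit = 2^53) {
  for (x in list(a, b)) {
    # a single finite number
    if (!is.numeric(x) || length(x) != 1 || !is.finite(x))
      stop("arguments must be single finite numbers")
    # (1) in range: doubles are assumed exact for |x| <= limit
    if (abs(x) > limit)
      stop("argument outside the exactly representable range")
    # (2) integer-valued in the sense that round(x) == x
    if (round(x) != x)
      stop("arguments must be integer-valued")
  }
  gcd(a, b)
}

checked_gcd(10, 12)       # 2
checked_gcd(2^40, 6)      # 2
try(checked_gcd(1.5, 1))  # signals the "integer-valued" error
```

Doing the checks once in a wrapper keeps the recursive core as simple as the original, at the cost of only handling scalar arguments.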
--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Blegdamsvej 3, 2200 Cph. N, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
(p.dalgaard at biostat.ku.dk)
On Tue, 8 Mar 2005 09:03:43 +0000, Robin Hankin <r.hankin at soc.soton.ac.uk> wrote:

>If I call this with gcd(10,12), for example, then a%%b is not an
>integer, so the first line of the function, testing b for being zero,
>isn't legitimate.

When you say it isn't legitimate, do you mean that it violates the advice never to use exact comparison on floating-point values? I think that's just advice, not a hard and fast rule. If you happen to know that the values being compared have been calculated and stored exactly, then "==" is valid.

In your function, when a and b are integers within some range (I'm not sure exactly what it is, but it approaches +/- 2^53), the %% operator should return exact results. (Does it do so on all platforms? I'm not sure, but I'd call it a bug if it didn't, unless a and/or b were very close to the upper limit of exactly representable integers.)

Do you know of examples where a and b are integers stored in floating point, and a %% b returns a value that is different from as.integer(a) %% as.integer(b)?

>OK, so I have some options:
>
>(1) stick in "a <- as.integer(a), b <- as.integer(b)" into the
>    function: then a%%b *will* be an integer and the "==" test is appropriate
>(2) use some test like abs(b) < TOL for some suitable TOL (0.5?)
>(3) use identical(all.equal(b,0),TRUE) like it says in identical.Rd
>(4) use identical(all.equal(b,as.integer(0)),TRUE)

I'd suggest

(5) Use your gcd function almost as above, but modified to work on vectors:

  gcd <- function(a, b){
    result <- a
    # where b == 0 the answer is a; recurse only on the other entries
    nonzero <- b != 0
    if (any(nonzero))
      result[nonzero] <- Recall(b[nonzero], a[nonzero] %% b[nonzero])
    return(result)
  }

>How does the List deal with this kind of problem?
>
>Also, gcd() as written returns a non-integer. Would the List recommend
>rewriting the last line as
>
>return(as.integer(Recall(b,a%%b)))
>
>or not?

I'd say not. Your original function returns integer when both a and b are stored as integers, and double when at least one of them is not. That seems like reasonable behaviour to me.

Duncan Murdoch
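The vectorised version above can be exercised as below; the layout is reformatted but the logic is unchanged. Like the original, it assumes a and b have the same length (it does not recycle them). The last lines are a quick random spot check, not a proof, of the question about %% on doubles versus integers, restricted to values that fit in the integer type; the seed and sample size are arbitrary choices.

```r
# The vectorised gcd() from the reply above, one statement per line
gcd <- function(a, b) {
  result <- a
  nonzero <- b != 0          # entries with b == 0 are already finished
  if (any(nonzero))
    result[nonzero] <- Recall(b[nonzero], a[nonzero] %% b[nonzero])
  result
}

gcd(c(10, 12, 7), c(12, 18, 0))        # 2 6 7
typeof(gcd(c(10L, 12L), c(12L, 18L)))  # "integer": both stored as integers
typeof(gcd(c(10, 12), c(12L, 18L)))    # "double": a is stored as double

# Spot check: does %% on integer-valued doubles agree with %% on the
# same values stored as integers?
set.seed(1)
a <- sample.int(.Machine$integer.max, 1e5)
b <- sample.int(.Machine$integer.max, 1e5)
all(as.double(a) %% as.double(b) == a %% b)  # expected TRUE
```

A sample of this size can only fail to find a counterexample; values near 2^53 are outside what it tests.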