Hello:

I am trying to match the value 0.3 in the sequence seq(.2,.3). I get

> 0.3 %in% seq(from=.2,to=.3)
[1] FALSE

Yet

> 0.3 %in% c(.2,.3)
[1] TRUE

For arbitrary sequences, this "invisible .3" has been problematic. What is the best way to work around this?

Thank you.
Dan
On 3/16/2009 9:36 AM, Daniel Murphy wrote:
> Hello:
> I am trying to match the value 0.3 in the sequence seq(.2,.3). I get
>> 0.3 %in% seq(from=.2,to=.3)
> [1] FALSE
> Yet
>> 0.3 %in% c(.2,.3)
> [1] TRUE
> For arbitrary sequences, this "invisible .3" has been problematic. What is
> the best way to work around this?

Don't assume that computations on floating point values are exact. Generally computations on small integers *are* exact, so you could change that to

    3 %in% seq(from=2, to=3)

and get the expected result. You can divide by 10 just before you use the number, or if you're starting with one decimal place, multiply by 10 *and round to an integer* before doing the test.

Alternatively, use some approximate test rather than an exact one, e.g. all.equal() (but you'll need a bit of work to make use of all.equal() in an expression like 0.3 %in% c(.2,.3)).

Duncan Murdoch
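A minimal sketch of the workarounds described in this reply, for readers who want to try them. The sequence seq(0, 1, by = 0.1) is used here only as an illustration (it is not from the original question), and approx_in() is a hypothetical helper name, one possible way of doing the "bit of work" needed to use all.equal() for membership.

```r
# Exact comparison fails: 3 * 0.1 is not the double closest to 0.3.
s <- seq(from = 0, to = 1, by = 0.1)
0.3 %in% s

# Option 1: keep the sequence in small integers (tenths),
# and divide by 10 only when the value is actually used.
tenths <- seq(from = 0, to = 10, by = 1)
3 %in% tenths

# Option 2: data already has one decimal place, so multiply by 10
# and round to an integer on both sides before testing.
round(0.3 * 10) %in% round(s * 10)

# Option 3: approximate test built on all.equal().
# approx_in() is a hypothetical helper, not part of base R.
approx_in <- function(x, table, ...) {
  any(vapply(table,
             function(v) isTRUE(all.equal(x, v, ...)),
             logical(1)))
}
approx_in(0.3, s)
```

The first test returns FALSE and the other three return TRUE. Note that all.equal() compares with a relative tolerance by default, so the wrapper inherits that behaviour unless a different tolerance is passed through "...".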
Well, first of all, seq(from=.2,to=.3) gives c(0.2), so I assume you really mean something like seq(from=.2,to=.3,by=.1), which gives c(0.2, 0.3).

%in% tests for exact equality, which is almost never a good idea with floating-point numbers.

You need to define what exactly you mean by "in" for floating-point numbers. What sort of tolerance are you willing to allow?

Some possibilities would be for example:

    # absolute tolerance
    approxin <- function(x,list,tol) any(abs(list-x)<tol)

    # relative tolerance; only exact 0 will match 0
    rapproxin <- function(x,list,tol)
      (x==0 && 0 %in% list) || any(abs((list-x)/x)<=tol,na.rm=TRUE)

Hope this helps,

-s

On Mon, Mar 16, 2009 at 9:36 AM, Daniel Murphy <chiefmurphy at gmail.com> wrote:
> Hello:
> I am trying to match the value 0.3 in the sequence seq(.2,.3). I get
>> 0.3 %in% seq(from=.2,to=.3)
> [1] FALSE
> Yet
>> 0.3 %in% c(.2,.3)
> [1] TRUE
> For arbitrary sequences, this "invisible .3" has been problematic. What is
> the best way to work around this?
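The absolute-tolerance test suggested in this reply takes a single x. Below is a rough sketch of a vectorised variant, in case several values need checking at once. The name near_in() and the default tolerance sqrt(.Machine$double.eps) are assumptions made for illustration, not something proposed in the thread, and the right tolerance depends on the application.

```r
# Hypothetical vectorised membership test with an absolute tolerance.
# For each element of x, report whether some element of 'table'
# lies strictly within 'tol' of it.
near_in <- function(x, table, tol = sqrt(.Machine$double.eps)) {
  vapply(x,
         function(xi) any(abs(table - xi) < tol),
         logical(1))
}

s <- seq(from = 0, to = 1, by = 0.1)
res <- near_in(c(0.3, 0.35, 0.7), s)
print(res)
# [1]  TRUE FALSE  TRUE
```

Here 0.3 and 0.7 are found even though the corresponding elements of s differ from them in the last bit, while 0.35 is correctly rejected.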
On Mon, Mar 16, 2009 at 06:36:53AM -0700, Daniel Murphy wrote:
> Hello:
> I am trying to match the value 0.3 in the sequence seq(.2,.3). I get
> > 0.3 %in% seq(from=.2,to=.3)
> [1] FALSE

As others already pointed out, you should use seq(from=0.2,to=0.3,by=0.1) to get 0.3 in the sequence. In order to get correct %in%, it is also possible to use round(), for example

> 0.3 %in% round(seq(from=0.2,to=0.3,by=0.1),digits=1)
[1] TRUE

See FAQ 7.31
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
or
http://wiki.r-project.org/rwiki/doku.php?id=misc:r_accuracy:decimal_numbers
for more detail.

Petr.
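The round() idea above can be wrapped so that both sides of the comparison are rounded the same way. This is only a sketch: in_rounded() is a hypothetical name, the sequence seq(0, 1, by = 0.1) is chosen for illustration, and the approach assumes the number of meaningful decimal places is known in advance.

```r
# Hypothetical helper: round both the value and the table to the same
# number of decimal places, then use the exact %in% test.
in_rounded <- function(x, table, digits) {
  round(x, digits) %in% round(table, digits)
}

s <- seq(from = 0, to = 1, by = 0.1)

0.3 %in% s                       # exact test on the raw values
in_rounded(0.3, s, digits = 1)   # same test after rounding both sides
in_rounded(0.35, s, digits = 2)  # a value that is not on the grid
```

The raw test gives FALSE, the rounded test gives TRUE, and the off-grid value is still rejected. The digits argument has to match the known precision of the data; a value that is too coarse will merge numbers that ought to stay distinct.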
On Tue, Mar 17, 2009 at 10:04:39AM -0400, Stavros Macrakis wrote:
...
> 1) Factor allows repeated levels, e.g. factor(c(1),c(1,1,1)), with no
> warning or error.

Yes, this is confusing behavior, since repeated levels are never meaningful.

> 2) Even from distinct inputs, factor of a numeric vector may generate
> repeated levels, because it only uses 15 digits.

I think 15 digits is a reasonable choice. A mapping between double precision numbers and character strings with a given decimal precision is never bijective. With 15 digits, we can achieve that every character value has a unique double precision representation, but not vice versa. With 17 digits, we have a unique character string for each double precision number, but not vice versa. What is better? The specification of as.character() says that numbers are represented with 15 significant digits. So, I think, if as.factor() applied signif(,digits=15) to a numeric vector before determining the levels using sort(unique.default(x)), this could help to eliminate most of the problems without being in conflict with the existing specification.

> 3) The algorithm to determine the shortest format is inconsistent with
> the algorithm to actually print, giving pathological cases like 0.3
> vs. 0.300000000000000.

I do not exactly understand what you mean by inconsistent. If you do

    nums <- (.3 + 2e-16 * c(-2,-1,1,2))
    options(digits=15)
    for (x in nums) print(x)
    # [1] 0.300000000000000
    # [1] 0.3
    # [1] 0.3
    # [1] 0.300000000000000
    as.character(nums)
    # [1] "0.300000000000000" "0.3"               "0.3"
    # [4] "0.300000000000000"

then print and as.character are consistent. Printing the whole vector behaves differently, since it uses the same format for all numbers.

> The original problem was testing whether a floating-point number was a
> member of a vector. Rounding and then converting to a factor seem
> like a very poor way of doing that, even if the above problems were
> resolved. Comparing with a tolerance seems much more robust, clean,
> and efficient.

Definitely, using a comparison tolerance is a meaningful approach. Its disadvantage is that the relation abs(x - y) <= eps is not transitive. So, it may also produce confusing results in some situations. I think that one has to choose the right solution depending on the application.

Petr.
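To make the non-transitivity remark concrete, here is a small sketch. The helper name near() and the numbers are made up for illustration; they are chosen to be exactly representable in binary so that the result does not itself depend on rounding.

```r
# Hypothetical helper: TRUE when x and y differ by at most eps.
near <- function(x, y, eps) abs(x - y) <= eps

eps <- 0.25
a <- 1
b <- 1.25
d <- 1.5

ab <- near(a, b, eps)   # TRUE:  |1    - 1.25| = 0.25
bd <- near(b, d, eps)   # TRUE:  |1.25 - 1.5 | = 0.25
ad <- near(a, d, eps)   # FALSE: |1    - 1.5 | = 0.5
print(c(ab, bd, ad))
```

So a "equals" b and b "equals" d, yet a does not "equal" d. Anything that relies on equality being an equivalence relation, such as grouping values or removing duplicates, can therefore give results that depend on the order in which values are compared.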