thr3ads.net - R devel - [Rd] Match .3 in a sequence [Mar 2009]

Daniel Murphy

2009-Mar-16 13:36 UTC

[Rd] Match .3 in a sequence

Hello: I am trying to match the value 0.3 in the sequence seq(.2,.3). I get

> 0.3 %in% seq(from=.2,to=.3)
[1] FALSE

Yet

> 0.3 %in% c(.2,.3)
[1] TRUE

For arbitrary sequences, this "invisible .3" has been problematic. What is
the best way to work around this?
Thank you.
Dan

Duncan Murdoch

2009-Mar-16 14:12 UTC

[Rd] Match .3 in a sequence

On 3/16/2009 9:36 AM, Daniel Murphy wrote:
> Hello: I am trying to match the value 0.3 in the sequence seq(.2,.3). I get
>> 0.3 %in% seq(from=.2,to=.3)
> [1] FALSE
> Yet
>> 0.3 %in% c(.2,.3)
> [1] TRUE
> For arbitrary sequences, this "invisible .3" has been problematic. What is
> the best way to work around this?

Don't assume that computations on floating point values are exact. 
Generally computations on small integers *are* exact, so you could 
change that to

3 %in% seq(from=2, to=3)

and get the expected result.  You can divide by 10 just before you use 
the number, or if you're starting with one decimal place, multiply by 10 
*and round to an integer* before doing the test.  Alternatively, use 
some approximate test rather than an exact one, e.g. all.equal() (but 
you'll need a bit of work to make use of all.equal() in an expression 
like 0.3 %in% c(.2,.3)).
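A minimal sketch of the two workarounds Duncan describes, assuming the data carry one decimal place. It uses `seq(0, 1, by = 0.1)` rather than the original call because the element of that sequence which prints as 0.3 is computed (as 3 * 0.1) rather than typed as a literal. The helper `in_all_equal()` is a hypothetical name, not a base R function.

```r
# The element that prints as 0.3 is really 3 * 0.1, which differs from the
# literal 0.3 in its last bits, so exact matching fails.
s <- seq(0, 1, by = 0.1)
sprintf("%.17g", s[4])          # reveals the hidden digits
0.3 %in% s                      # exact test: FALSE

# Workaround 1: scale to integers *and round*, then compare exactly.
3 %in% round(s * 10)                  # TRUE
round(0.3 * 10) %in% round(s * 10)    # TRUE

# Workaround 2 (hypothetical helper): an approximate %in% built on all.equal().
# isTRUE() is needed because all.equal() returns a character string
# describing the difference, not FALSE, when values differ.
in_all_equal <- function(x, table) {
  vapply(x, function(xi) {
    any(vapply(table, function(ti) isTRUE(all.equal(xi, ti)), logical(1)))
  }, logical(1))
}

in_all_equal(c(0.3, 0.35), s)   # TRUE FALSE
```

Note that all.equal() uses a relative tolerance of about 1.5e-8 by default, which is far looser than the rounding error here but much tighter than the 0.1 spacing of the data.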

Duncan Murdoch

Stavros Macrakis

2009-Mar-16 15:24 UTC

[Rd] Match .3 in a sequence

Well, first of all, seq(from=.2,to=.3) gives c(0.2), so I assume you
really mean something like seq(from=.2,to=.3,by=.1), which gives
c(0.2, 0.3).
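A quick check of the point above: when neither `by` nor `length.out` is given, `seq(from, to)` steps by 1 (like `from:to`), so the upper end 0.3 is never reached.

```r
# Without 'by', seq() steps by 1, so only the starting value 0.2 is produced.
seq(from = 0.2, to = 0.3)

# Supplying 'by' gives the intended two-element sequence.
length(seq(from = 0.2, to = 0.3, by = 0.1))   # 2
```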

%in% tests for exact equality, which is almost never a good idea with
floating-point numbers.

You need to define what exactly you mean by "in" for floating-point
numbers.  What sort of tolerance are you willing to allow?

Some possibilities would be for example:

approxin <- function(x,list,tol) any(abs(list-x)<tol)   # absolute tolerance

rapproxin <- function(x,list,tol) (x==0 && 0 %in% list) ||
     any(abs((list-x)/x)<=tol,na.rm=TRUE)
     # relative tolerance; only exact 0 will match 0
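The functions above test a single value. A minimal vectorized sketch in the same spirit returns one logical per element of `x`, as `%in%` does. The name `approx_in`, the default `tol = 1e-8` and the `mode` argument are illustrative assumptions, not something proposed in the thread.

```r
# Hypothetical vectorized analogue of %in% with a tolerance.
# mode = "absolute": match if |table - x| <= tol
# mode = "relative": match if |table - x| <= tol * |x|  (so 0 only matches exact 0)
approx_in <- function(x, table, tol = 1e-8, mode = c("absolute", "relative")) {
  mode <- match.arg(mode)
  vapply(x, function(xi) {
    scale <- if (mode == "relative") abs(xi) else 1
    any(abs(table - xi) <= tol * scale, na.rm = TRUE)
  }, logical(1))
}

s <- seq(0, 1, by = 0.1)
approx_in(c(0.3, 0.35), s)                         # TRUE FALSE
approx_in(0.3, s, tol = 1e-12, mode = "relative")  # TRUE
approx_in(0, c(1e-20, 0.5), mode = "relative")     # FALSE: only exact 0 matches 0
```

The relative mode multiplies by |x| instead of dividing by it, which avoids the division by zero that `rapproxin` has to special-case.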

Hope this helps,

          -s

On Mon, Mar 16, 2009 at 9:36 AM, Daniel Murphy <chiefmurphy at gmail.com> wrote:
> Hello: I am trying to match the value 0.3 in the sequence seq(.2,.3). I get
>> 0.3 %in% seq(from=.2,to=.3)
> [1] FALSE
> Yet
>> 0.3 %in% c(.2,.3)
> [1] TRUE
> For arbitrary sequences, this "invisible .3" has been problematic. What is
> the best way to work around this?

Petr Savicky

2009-Mar-16 16:41 UTC

[Rd] Match .3 in a sequence

On Mon, Mar 16, 2009 at 06:36:53AM -0700, Daniel Murphy wrote:
> Hello: I am trying to match the value 0.3 in the sequence seq(.2,.3). I get
> > 0.3 %in% seq(from=.2,to=.3)
> [1] FALSE
As others already pointed out, you should use seq(from=0.2,to=0.3,by=0.1)
to get 0.3 in the sequence. To get a correct %in% result, it is also
possible to use round(), for example
  > 0.3 %in% round(seq(from=0.2,to=0.3,by=0.1),digits=1)
  [1] TRUE
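The round() approach can be wrapped in a small helper that rounds both the query and the table to the data's known decimal precision before the exact test. This is a sketch assuming the values are meant to carry `digits` decimal places; `in_rounded` is a hypothetical name. The last line shows a limitation: two values that differ by far less than the rounding step can still fall on opposite sides of a rounding boundary.

```r
# Hypothetical helper: round both sides to 'digits' decimals, then test exactly.
in_rounded <- function(x, table, digits) {
  round(x, digits) %in% round(table, digits)
}

s <- seq(0, 1, by = 0.1)
in_rounded(0.3, s, digits = 1)            # TRUE
in_rounded(c(0.3, 0.35), s, digits = 2)   # TRUE FALSE

# Caveat: nearly equal inputs straddling a rounding boundary disagree.
round(0.15 - 1e-9, 1) == round(0.15 + 1e-9, 1)   # FALSE
```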

See FAQ 7.31

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
or 
  http://wiki.r-project.org/rwiki/doku.php?id=misc:r_accuracy:decimal_numbers
for more detail.

Petr.

Petr Savicky

2009-Mar-17 15:21 UTC

[Rd] Match .3 in a sequence

On Tue, Mar 17, 2009 at 10:04:39AM -0400, Stavros Macrakis wrote:
...
> 1) Factor allows repeated levels, e.g. factor(c(1),c(1,1,1)), with no
> warning or error.
Yes, this behavior is confusing, since repeated levels are never meaningful.
> 2) Even from distinct inputs, factor of a numeric vector may generate
> repeated levels, because it only uses 15 digits.
I think 15 digits is a reasonable choice. A mapping between double-precision
numbers and character strings with a given decimal precision is never
bijective. With 15 digits, every character value has a unique double-precision
representation, but not vice versa. With 17 digits, we have a unique
character string for each double-precision number, but not vice versa.
What is better?
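The trade-off described above can be checked directly with base sprintf() and as.numeric(): with 15 significant digits two distinct doubles can format to the same string, while formatting with 17 significant digits and parsing back recovers each double exactly.

```r
x <- 0.1 + 0.2   # a double that differs from the literal 0.3
y <- 0.3

# 15 significant digits: both doubles format to the same string.
sprintf("%.15g", x) == sprintf("%.15g", y)   # TRUE

# 17 significant digits: the strings differ, and parsing them back
# recovers each original double exactly.
sprintf("%.17g", c(x, y))
as.numeric(sprintf("%.17g", x)) == x         # TRUE
as.numeric(sprintf("%.17g", y)) == y         # TRUE
```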

The specification of as.character() says that numbers are represented with
15 significant digits. So, I think, if as.factor() applied signif(x, digits=15)
to a numeric vector before determining the levels using sort(unique.default(x)),
this could help to eliminate most of the problems without conflicting with
the existing specification.
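A sketch of the proposal, assuming it amounts to applying signif(x, 15) before collecting the unique values (this illustrates the idea only; it is not what factor() or as.factor() does in any R release). Two doubles that both print as 0.3 collapse to a single candidate level.

```r
x <- c(0.3, 0.1 + 0.2, 0.5)   # first two print as 0.3 at 15 digits but differ

length(unique(x))               # 3: exact comparison keeps both 0.3-like values
length(unique(signif(x, 15)))   # 2: rounding to 15 significant digits merges them

# Candidate levels computed the way the proposal describes
sort(unique(signif(x, 15)))
```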
> 3) The algorithm to determine the shortest format is inconsistent with
> the algorithm to actually print, giving pathological cases like 0.3
> vs. 0.300000000000000.
I do not exactly understand what you mean by inconsistent. If you do
  nums <- (.3 + 2e-16 * c(-2,-1,1,2))
  options(digits=15)
  for (x in nums) print(x)
  # [1] 0.300000000000000
  # [1] 0.3
  # [1] 0.3
  # [1] 0.300000000000000
  as.character(nums)
  # [1] "0.300000000000000" "0.3"               "0.3"
  # [4] "0.300000000000000"
then print and as.character are consistent. Printing the whole vector
behaves differently, since it uses the same format for all numbers.
> The original problem was testing whether a floating-point number was a
> member of a vector.  rounding and then converting to a factor seem
> like a very poor way of doing that, even if the above problems were
> resolved.  Comparing with a tolerance seems much more robust, clean,
> and efficient.
Definitely, using a comparison tolerance is a meaningful approach. Its
disadvantage is that the relation abs(x - y) <= eps is not transitive, so it
may also produce confusing results in some situations. I think one has to
choose the right solution depending on the application.
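A minimal illustration of the non-transitivity, using values that are exact in binary so no rounding error is involved: with eps = 1, x is close to y and y is close to z, yet x is not close to z. The helper name `near_eq` is hypothetical.

```r
eps <- 1
near_eq <- function(a, b) abs(a - b) <= eps   # hypothetical tolerance test

x <- 0; y <- 0.75; z <- 1.5
near_eq(x, y)   # TRUE
near_eq(y, z)   # TRUE
near_eq(x, z)   # FALSE: "close to" does not chain
```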

Petr.
