Jason Thorpe
2016-Feb-27 22:52 UTC
[R] Why does match() treat NaN's as comparables; Bug or Feature?
For some reason `match()` treats `NaN`s as comparable by default:

> x <- c(1,2,3,NaN,4,5)
> match(x,x)
[1] 1 2 3 4 5 6

which I can override when using `match()` directly:

> match(x,x,incomparables=NaN)
[1] 1 2 3 NA 5 6

but not necessarily when calling a function that uses `match()` internally:

> stats::ecdf(x)(x)
[1] 0.2 0.4 0.6 0.8 0.8 1.0

Obviously there are workarounds for any given scenario, but the bigger problem is that this behavior causes difficult-to-discover bugs. For example, the behavior of stats::ecdf is definitely a bug introduced by its use of `match()` (unless you think NaN == 4 is correct).

Is there a good reason that NaNs are treated as comparable by match(), or is this a bug?

For reference, I'm using R version 3.2.3.

-Jason
Bert Gunter
2016-Feb-27 23:34 UTC
[R] Why does match() treat NaN's as comparables; Bug or Feature?
If I understand you correctly, the "bug" is that you do not understand match(). See inline comment below and note carefully the "Value" section of ?match.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Sat, Feb 27, 2016 at 2:52 PM, Jason Thorpe <jdthorpe at gmail.com> wrote:
> Obviously there are workarounds for any given scenario, but the bigger
> problem is that this behavior causes difficult to discover bugs. For
> example, the behavior of stats::ecdf is definitely a bug introduced by it's
> use of `match()` (unless you think NaN == 4 is correct).

No, you misunderstand. match() returns the POSITION of the match, and clearly NaN in the 4th position of table = x matches NaN in x, e.g.

> match(c(x,NaN),x)
[1] 1 2 3 4 5 6 4
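The point about positions can be illustrated with a minimal sketch. The vectors `tbl` and `query` below are made up for illustration and do not appear in the thread. For each element of its first argument, `match()` returns the index of the first matching element of its second argument, and the value of `nomatch` (by default `NA`) when nothing matches:

```r
# Illustrative vectors (assumptions, not taken from the thread)
tbl   <- c(5, 3, NaN, 3)
query <- c(3, 7, NaN)

# Position in tbl of the FIRST match for each element of query.
# 3 is found at index 2 (not 4), 7 is absent, NaN matches the NaN at index 3.
pos <- match(query, tbl)
print(pos)

# nomatch controls what is returned for elements with no match
pos0 <- match(query, tbl, nomatch = 0L)
print(pos0)
```

Note that the returned values are indices into `tbl`, not results of an equality test on the values themselves.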
Jeff Newmiller
2016-Feb-28 00:06 UTC
[R] Why does match() treat NaN's as comparables; Bug or Feature?
That is one valid point, but according to IEEE 754 "a comparison with NaN always returns an unordered result", which match() doesn't do unless its incomparables argument is specified. Ick.
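The contrast between comparison and matching can be seen side by side in a short sketch. It reuses the vector `x` from the original post; the variable names `eq`, `default_pos`, and `strict_pos` are chosen here for illustration only:

```r
x <- c(1, 2, 3, NaN, 4, 5)

# Elementwise comparison: any comparison involving NaN yields NA in R
eq <- x == NaN
print(eq)
print(NaN == NaN)

# match() applies its own matching rule: by default NaN matches NaN
default_pos <- match(NaN, x)
print(default_pos)

# With incomparables = NaN, NaN is excluded from matching
strict_pos <- match(NaN, x, incomparables = NaN)
print(strict_pos)
```

So `==` never reports NaN as equal to anything, while `match()` treats NaN as a matchable value unless told otherwise.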
Bert Gunter
2016-Feb-28 03:06 UTC
[R] Why does match() treat NaN's as comparables; Bug or Feature?
(on list, since others might not have gotten it either).

OK, I get it now. It was I who misunderstood.

But isn't the bug in the **misuse** of match() in ecdf() (by failing to specify the nomatch argument)? Jeff says comparisons with NaN should return an unordered result, which they do, afaics:

> NaN < 0
[1] NA
> NaN > 0
[1] NA

match() just does its thing:

> match(c(NA,NaN),c(1,2,NA,3,4,NaN,5))
[1] 3 6

It's up to the caller to use it correctly, which apparently ecdf() fails to do.

Am I missing something here?

Cheers,
Bert

On Sat, Feb 27, 2016 at 3:49 PM, Jason Thorpe <jdthorpe at gmail.com> wrote:
> The bug is that NaN is not part of any cumulative distribution...
>
> -Jason
> sent from my mobile device
Martin Maechler
2016-Feb-29 10:14 UTC
[R] Why does match() treat NaN's as comparables; Bug or Feature?
>>>>> Bert Gunter <bgunter.4567 at gmail.com>
>>>>>     on Sat, 27 Feb 2016 19:06:05 -0800 writes:

    > It's up to the caller to use it correctly, which
    > apparently ecdf() fails to do.

    > Am I missing something here?

not much, if any.

Let me still clarify:

1) This has *nothing* to do with match, and I am confused why nobody has mentioned this till now.

2) In

     x <- c(1,2,NA,3,4,NaN,5)
     Fn <- ecdf(x)

   there is no error: ecdf() does drop all NA/NaN from its input on purpose and returns the empirical CDF of the other elements, so Fn is identical (practically, not strictly formally) to

     Fn. <- ecdf(1:5)

3) The bug is really in the underlying C code of approx() / approxfun(), on which ecdf() and notably the function it creates (!) rely:

   > L <- approxfun(1:6, 1:6, method = "constant")
   > L( (2:10)/2)
   [1] 1 1 2 2 3 3 4 4 5
   > L( c(NaN, NA, 2:10)/2)
    [1]  5 NA  1  1  2  2  3  3  4  4  5

4) A fix for this bug has been committed to R-devel already, a minute ago. [svn rev 70239]

Martin Maechler, ETH Zurich
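For R versions that predate the fix, one possible workaround is to guard the evaluation explicitly. The sketch below is only an illustration: `safe_ecdf_eval()` is a hypothetical helper, not part of R or of this thread, and it assumes that missing query points should map to `NA`:

```r
x  <- c(1, 2, 3, NaN, 4, 5)
Fn <- ecdf(x)   # ecdf() drops NA/NaN from its input

# Hypothetical helper: evaluate an ECDF, returning NA for missing query points
safe_ecdf_eval <- function(Fn, q) {
  out <- rep(NA_real_, length(q))
  ok  <- !is.na(q)        # is.na() is TRUE for both NA and NaN
  out[ok] <- Fn(q[ok])    # only non-missing values reach the step function
  out
}

res <- safe_ecdf_eval(Fn, x)
print(res)
```

Because only non-missing values are passed to `Fn`, the result does not depend on how the underlying `approxfun()` code treats NaN.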