thr3ads.net - R devel - [Rd] complex NA's match(), etc: not back-compatible change proposal [May 2016]


Suharto Anggono Suharto Anggono

2016-May-28 09:34 UTC

[Rd] complex NA's match(), etc: not back-compatible change proposal

On 'factor', I meant the case where 'levels' is not specified,
where 'unique' is called.
> factor(c(complex(real=NaN), complex(imaginary=NaN)))
[1] NaN+0i <NA>
Levels: NaN+0i

Look at <NA> in the result above. Yes, it happens in earlier versions of
R, too.
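The `<NA>` can be traced to how factor() builds its levels when `levels` is not given. Roughly, it de-duplicates the values with unique(), converts the survivors to character, and then matches the character form of `x` against those labels. The sketch below is a simplified, hypothetical model of that mechanism only: it leaves out sorting and the `exclude` handling. `mini_factor_codes()` and its `value_key` argument are illustrative names, not R functions. `value_key` stands in for whatever equality unique() applies to the values.

```r
## Hypothetical, simplified model of factor()'s level construction when
## 'levels' is missing: de-duplicate on *values*, label the survivors by
## their character form, then match the character form of x.
## 'value_key' stands in for the equality used by unique(): elements
## with the same key are treated as the same value.
mini_factor_codes <- function(x_chr, value_key) {
  keep   <- !duplicated(value_key)   # first element of each value class
  levels <- unique(x_chr[keep])      # character labels of the kept values
  match(x_chr, levels)               # integer codes, NA if a label was lost
}

x_chr <- c("NaN+0i", "0+NaNi")       # printed forms of the two values above

## unique() treats both values as equal: only one label survives
mini_factor_codes(x_chr, value_key = c("NaN", "NaN"))       # 1 NA
## unique() keeps them apart: both get a level
mini_factor_codes(x_chr, value_key = c("NaN+0i", "0+NaNi")) # 1 2
```

In this model, whenever the value-level de-duplication merges two values whose character forms differ, one label is lost and the corresponding code becomes NA.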

On matching both NA and NaN, another consequence is that length(unique(.)) may
depend on order. Example using R devel r70604:
> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
> (z <- z[is.na(z)])
[1]       NA NaN+  0i       NA NaN+  1i       NA       NA       NA       NA
[9]   0+NaNi   1+NaNi       NA NaN+NaNi
> length(print(unique(z)))
[1]     NA NaN+0i
[1] 2
> length(print(unique(c(z[8], z[-8]))))
[1] NA
[1] 1
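Both effects follow from the rule this thread describes for cequal() in R-devel r70604. Under that rule, a value containing NA matches any other value containing NA, and a value containing NaN matches any other value containing NaN. A value containing both therefore matches everything. Such a relation is not transitive, so a first-occurrence unique() depends on the order of its input. The sketch below is a hypothetical R transcription of that rule, not the C code. Complex numbers are represented by their real and imaginary parts as plain doubles, and `old_eq()` and `scan_unique()` are illustrative names.

```r
## Hypothetical transcription of the matching rule described in this thread
## for R-devel r70604; a complex number is given by its (re, im) doubles.
is_NA <- function(v) is.na(v) & !is.nan(v)      # NA, but not NaN

old_eq <- function(xr, xi, yr, yi) {
  if (!anyNA(c(xr, xi, yr, yi)))                # no NA or NaN anywhere
    return(xr == yr && xi == yi)
  x_NA  <- is_NA(xr)  || is_NA(xi);   y_NA  <- is_NA(yr)  || is_NA(yi)
  x_NaN <- is.nan(xr) || is.nan(xi);  y_NaN <- is.nan(yr) || is.nan(yi)
  (x_NA && y_NA) || (x_NaN && y_NaN)            # share an NA or share a NaN
}

## First-occurrence unique(): keep an element unless it equals a kept one.
scan_unique <- function(re, im, eq) {
  kept <- integer(0)
  for (k in seq_along(re))
    if (!any(vapply(kept, function(j) eq(re[k], im[k], re[j], im[j]), NA)))
      kept <- c(kept, k)
  kept                                          # indices of retained elements
}

## The 12 values of 'z' above, in the same (column-major) order.
x0 <- c(0, 1, NA, NaN)
re <- rep(x0, times = 4); im <- rep(x0, each = 4)
sel <- is.na(re) | is.na(im); re <- re[sel]; im <- im[sel]

## Not transitive: z[1] ~ z[8] and z[8] ~ z[2], but z[1] !~ z[2]
c(old_eq(re[1], im[1], re[8], im[8]),
  old_eq(re[8], im[8], re[2], im[2]),
  old_eq(re[1], im[1], re[2], im[2]))          # TRUE TRUE FALSE

length(scan_unique(re, im, old_eq))             # 2
o <- c(8, setdiff(seq_along(re), 8))            # like c(z[8], z[-8])
length(scan_unique(re[o], im[o], old_eq))       # 1
```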
--------------------------------------------
On Mon, 23/5/16, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

Subject: Re: [Rd] complex NA's match(), etc: not back-compatible change proposal
Cc: R-devel at r-project.org
Date: Monday, 23 May, 2016, 11:06 PM

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Fri, 13 May 2016 16:33:05 +0000 writes:

    > That, for example, complex(real=NaN) and complex(imaginary=NaN)
    > are regarded as equal makes it possible that

    >   length(unique(as.character(x))) > length(unique(x))

    > (current code of function 'factor' doesn't expect it).

Thank you, that is an interesting remark - but is already true, in
[[elided Yahoo spam]]

..
and of course this is because we do *print*  0+NaNi  etc,
i.e., we differentiate the non-NA-but-NaN complex values in
formatting / printing but not in match(), unique() ...

and indeed, with the 'z' example below,

   fz <- factor(z,z)

gives warnings about duplicated levels, and gives such warnings
also in current (and previous) versions of R, at least for the
slightly larger z I've used in the tests/reg-tests-1c.R example.

For the moment I can live with that warning, as I don't think
factor()s are constructed from complex numbers "often"...
and the performance of factor() in the more regular cases is important.

    > Yes, an argument for the behavior is that NA and NaN are of one kind.
    > On my system, using 32-bit R for Windows from binary from CRAN,
    > the result of sapply(z, match, table = z) (not in current
    > R-devel) may be different from below:

    > 1 2 3 4 1 3 7 8 2 4 8 12  # R 2.10.1, different from below
    > 1 2 3 4 1 3 7 8 2 4 8 12  # R 3.2.5, different from below

interesting, thank you... and another reason why the change
(currently only in R-devel) may have been a good one: More uniformity.

    > I noticed that, by function 'cequal' in unique.c, a complex number that
    > has both NA and NaN matches NA and also matches NaN.

    >> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
    >> (z <- z[is.na(z)])
    > [1]       NA NaN+  0i       NA NaN+  1i       NA       NA       NA       NA
    > [9]   0+NaNi   1+NaNi       NA NaN+NaNi

    >> sapply(z, match, table = z[8])
    > [1] 1 1 1 1 1 1 1 1 1 1 1 1
    >> match(z, z[8])
    > [1] 1 1 1 1 1 1 1 1 1 1 1 1

Yes, I see the same. But isn't it what we expect:

All of our z[] entries have at least one NA or a NaN in their real
or imaginary part, and since z[8] has both, it does match with all
z[]'s, either because of the NA or because of the NaN in common.

Hence, currently, I don't think this needs to be changed...
but if there are other reasons / arguments ...

Thank you again,
Martin Maechler


    >> sessionInfo()
    > R Under development (unstable) (2016-05-12 r70604)
    > Platform: i386-w64-mingw32/i386 (32-bit)
    > Running under: Windows XP (build 2600) Service Pack 2

    > locale:
    > [1] LC_COLLATE=English_United States.1252
    > [2] LC_CTYPE=English_United States.1252
    > [3] LC_MONETARY=English_United States.1252
    > [4] LC_NUMERIC=C
    > [5] LC_TIME=English_United States.1252

    > attached base packages:
    > [1] stats     graphics  grDevices utils     datasets  methods   base

    > -----------------
>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Tue, 10 May 2016 16:08:39 +0200 writes:

    >> This is an RFC / announcement related to the 2nd part of PR#16885
    >> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16885
    >> about complex NA's.

    >> The (somewhat rare) incompatibility in R's 3.3.0 match() behavior for the
    >> case of complex numbers with NA & NaN's {which has been fixed for R 3.3.0
    >> patched in the mean time} triggered some more comprehensive "research".

    >> I found that we have had a long-standing inconsistency at least between the
    >> documented and the real behavior.  I am claiming that the documented
    >> behavior is desirable and hence R's current "real" behavior is buggy, and
    >> I am proposing to change it, in R-devel (to be 3.4.0) for now.

    > After the  "roaring unanimous" assent  (one private msg
    > encouraging me to go forward, no dissenting voice, hence an
    > "odds ratio" of  +Inf  in favor ;-)

    > I have now committed my proposal to R-devel (svn rev. 70597) and
    > some of us will be seeing the effect in package space within a
    > day or so, in the CRAN checks against R-devel (not for
    > bioconductor AFAIK; their checks use R-devel only when it is less
    > than ca 6 months from release).

    > It's still worthwhile to discuss the issue, if you come late
    > to it, notably as ---paraphrasing Dirk on the R-package-devel list---
    > the release of 3.4.0 is almost a year away, and so now is the
    > best time to tinker with the API, in other words, consider breaking
    > rarely used legacy APIs..

    > Martin


    >> In help(match) we have been saying

    >> |  Exactly what matches what is to some extent a matter of definition.
    >> |  For all types, \code{NA} matches \code{NA} and no other value.
    >> |  For real and complex values, \code{NaN} values are regarded
    >> |  as matching any other \code{NaN} value, but not matching \code{NA}.

    >> for at least 10 years.  But we don't do that at all in the
    >> complex case (and AFAIK never got a bug report about it).
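For double vectors, the documented rule can be checked with base R alone. The lines below illustrate the real-valued behaviour that the proposal wants the complex case to follow.

```r
## Documented behaviour for doubles: NA matches only NA, NaN matches only NaN.
match(c(NA_real_, NaN), c(NaN, NA_real_, 1))   # 2 1
unique(c(NA, NaN, NA, NaN))                    # NA NaN: two distinct values
match(NaN, c(NA_real_, 1))                     # NA: NaN does not match NA
```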

    >> Also, e.g., print(.) or format(.) do simply use  "NA" for all
    >> the different complex NA-containing numbers, where OTOH,
    >> non-NA NaN's { <=>  !is.nan(z) & is.na(z) }
    >> in format() or print() do show the NaN in real and/or imaginary
    >> parts; for an example, look at the "format" column of the matrix
    >> below, after 'print(cbind' ...

    >> The current match()---and duplicated(), unique() which are based on the same
    >> C code---*do* distinguish almost all complex NA / NaN's, which is
    >> NOT according to documentation. I have found that this is just because
    >> our hashing function for the complex case, chash() in R/src/main/unique.c,
    >> is bogus in the sense that it is not compatible with the above documentation
    >> and also not with the cequal() function (in the same file unique.c) for
    >> checking equality of complex numbers.

    >> As I have found, a *simplified* version of the chash() function
    >> to make it compatible with cequal() does solve all the problems I've
    >> indicated,  and the current plan is to commit that change --- after some
    >> discussion time, here on R-devel ---  to the code base.
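A toy hash table shows why chash() must agree with cequal(). A lookup only compares against entries in the same bucket, so any two values that cequal() calls equal must get the same hash. Otherwise the equality test never sees them together. The following is a hypothetical R model, not R's open-addressing table in unique.c:

* A named list serves as the table.
* `eq_old()` transcribes the equality rule described in this thread.
* `key_fine()` gives each NA/NaN pattern its own bucket.
* `key_coarse()` puts every NA/NaN-containing value in one bucket.

The coarse key is only one way a hash could be made compatible; the actual simplified chash() is not shown in this thread.

```r
## Toy model of a hash-based match(x, x) for complex values given as (re, im):
## 'key' plays the role of chash(), 'eq' the role of cequal().
hash_match <- function(re, im, key, eq) {
  buckets <- list()                       # bucket name -> inserted indices
  out <- integer(length(re))
  for (k in seq_along(re)) {
    b <- key(re[k], im[k])
    hit <- NA_integer_
    for (j in buckets[[b]])               # compare only within one bucket
      if (eq(re[k], im[k], re[j], im[j])) { hit <- j; break }
    if (is.na(hit)) { buckets[[b]] <- c(buckets[[b]], k); hit <- k }
    out[k] <- hit
  }
  out
}

## Equality rule as described in this thread: share an NA or share a NaN.
is_NA  <- function(v) is.na(v) & !is.nan(v)
eq_old <- function(xr, xi, yr, yi) {
  if (!anyNA(c(xr, xi, yr, yi))) return(xr == yr && xi == yi)
  ((is_NA(xr) || is_NA(xi)) && (is_NA(yr) || is_NA(yi))) ||
    ((is.nan(xr) || is.nan(xi)) && (is.nan(yr) || is.nan(yi)))
}

## Hypothetical hash keys: one bucket per NA/NaN pattern, or one for all.
lab <- function(v) if (is.nan(v)) "NaN" else if (is.na(v)) "NA" else format(v)
key_fine   <- function(r, i) paste(lab(r), lab(i))
key_coarse <- function(r, i) if (is.na(r) || is.na(i)) "NA/NaN" else key_fine(r, i)

x0 <- c(0, 1, NA, NaN)                    # the 12 values of 'z' above
re <- rep(x0, times = 4); im <- rep(x0, each = 4)
sel <- is.na(re) | is.na(im); re <- re[sel]; im <- im[sel]

hash_match(re, im, key_fine,   eq_old)    # 1 2 3 ... 12: no comparison ever made
hash_match(re, im, key_coarse, eq_old)    # 1 2 1 2 1 1 1 1 2 2 1 2
```

With the fine key, z[1] and z[3] (both containing NA) are never compared, so every element only matches itself. With the coarse key, the equality rule alone decides. The result is the 1 2 1 2 1 1 1 1 2 2 1 2 pattern listed for the proposed R-devel in the script below.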

    >> My change passes  'make check-all' fine, but I'm 100% sure that there will
    >> be effects in package-space. ... one reason for this posting.

    >> As mentioned above, note that the chash() function has been in
    >> use for all three functions
    >>     match()
    >>     duplicated()
    >>     unique()
    >> and the change will affect all three --- but just for the case of complex
    >> vectors with NA or NaN's.
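For double vectors, base R alone shows how the three functions are linked:

* unique() keeps exactly the elements that duplicated() does not flag.
* match(x, x) points each element to the first element equal to it.

Here is a small illustration on a vector containing both NA and NaN:

```r
x <- c(NaN, 1, NA, NaN, 1, NA)
unique(x)                                               # NaN 1 NA
identical(unique(x), x[!duplicated(x)])                 # TRUE
match(x, x)                                             # 1 2 3 1 2 3
identical(match(x, x) == seq_along(x), !duplicated(x))  # TRUE
```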

    >> To show more, a small R session -- using my version of R-devel
    >> == the proposition:
    >> The R script ('complex-NA-short.R') for (a bit more than) the
    >> session is attached {{you can attach  text/plain easily}}:

    >>> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
    >>> ##         --- = NA_real_  but that does not exist e.g., in R 2.3.1
    >>> ##                  similarly,  '1L', '2L', .. do not exist e.g., in R 2.3.1
    >>> (z <- z[is.na(z)])
    >> [1]       NA NaN+  0i       NA NaN+  1i       NA       NA       NA       NA
    >> [9]   0+NaNi   1+NaNi       NA NaN+NaNi
    >>> outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
    >> +     r <- matrix( , length(x), length(y))
    >> +     for(i in seq(along=x))
    >> +         for(j in seq(along=y))
    >> +             r[i,j] <- identical(x[i], y[j], ...)
    >> +     r
    >> + }
    >>> ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
    >>> ## a version that works in older versions of R, where identical() had fewer arguments!
    >>> outerID.picky <- function(x,y) {
    >> +     nF <- length(formals(identical)) - 2
    >> +     do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
    >> + }
    >>> oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is  a wild guess
    >>> symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
    >>
    >>  [1,] | . . . . . . . . . . .
    >>  [2,] . | . . . . . . . . . .
    >>  [3,] . . | . . . . . . . . .
    >>  [4,] . . . | . . . . . . . .
    >>  [5,] . . . . | . . . . . . .
    >>  [6,] . . . . . | . . . . . .
    >>  [7,] . . . . . . | . . . . .
    >>  [8,] . . . . . . . | . . . .
    >>  [9,] . . . . . . . . | . . .
    >> [10,] . . . . . . . . . | . .
    >> [11,] . . . . . . . . . . | .
    >> [12,] . . . . . . . . . . . |
    >>> try(# for older R versions
    >> + stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
    >> + )
    >>> (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
    >> [1] 1 2 1 2 1 1 1 1 2 2 1 2
    >>> zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
    >>> print(cbind(format = format(z), t(zRI), mz), quote=FALSE)
    >>       format   Re   Im   mz
    >>  [1,]       NA <NA> 0    1
    >>  [2,] NaN+  0i NaN  0    2
    >>  [3,]       NA <NA> 1    1
    >>  [4,] NaN+  1i NaN  1    2
    >>  [5,]       NA 0    <NA> 1
    >>  [6,]       NA 1    <NA> 1
    >>  [7,]       NA <NA> <NA> 1
    >>  [8,]       NA NaN  <NA> 1
    >>  [9,]   0+NaNi 0    NaN  2
    >> [10,]   1+NaNi 1    NaN  2
    >> [11,]       NA <NA> NaN  1
    >> [12,] NaN+NaNi NaN  NaN  2
    >>>

    >> -------------------------------
    >> Note that 'mz <- match(z, z)' and hence the last column of the matrix above
    >> are very different in current R,
    >> distinguishing most kinds of NA / NaN  against the documentation (and the
    >> real/numeric case).

    >> Martin Maechler
    >> R Core Team


    >> ### Basically a shortened version of  the PR#16885 -- complex part b)
    >> ### of  R/tests/reg-tests-1c.R

    >> ## b) complex 'x' with different kinds of NaN
    >> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
    >> ##         --- = NA_real_  but that does not exist e.g., in R 2.3.1
    >> ##                  similarly,  '1L', '2L', .. do not exist e.g., in R 2.3.1
    >> (z <- z[is.na(z)])
    >> outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
    >>     r <- matrix( , length(x), length(y))
    >>     for(i in seq(along=x))
    >>         for(j in seq(along=y))
    >>             r[i,j] <- identical(x[i], y[j], ...)
    >>     r
    >> }
    >> ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
    >> ## a version that works in older versions of R, where identical() had fewer arguments!
    >> outerID.picky <- function(x,y) {
    >>     nF <- length(formals(identical)) - 2
    >>     do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
    >> }
    >> oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is  a wild guess
    >> symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
    >> try(# for older R versions
    >>     stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
    >> )
    >> (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
    >> zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
    >> print(cbind(format = format(z), t(zRI), mz), quote=FALSE)

    >> ## compute  match(z[i], z) ,  for  i = 1,2,..,12  :
    >> (m1z <- sapply(z, match, table = z))
    >> ## 1 2 1 2 2 2 1 2 2 2 1 2   # R 1.2.3  (2001-04-26)
    >> ## 1 2 3 4 1 3 7 8 2 4 8 7   # R 1.4.1  (2002-01-30)
    >> ## 1 2 3 4 1 3 7 8 2 4 8 12  # R 1.5.1  (2002-06-17)
    >> ## 1 2 3 4 1 3 7 8 2 4 8 12  # R 1.8.1  (2003-11-21)
    >> ## 1 2 3 4 1 3 7 8 2 4 8 12  # R 2.0.1  (2004-11-15)
    >> ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 2.1.1  (2005-06-20)
    >> ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 2.3.1  (2006-06-01)
    >> ## 1 2 3 4 1 3 7 8 2 4 8 12  # R 2.5.1  (2007-06-27)
    >> ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 2.10.1 (2009-12-14)
    >> ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 3.1.1  (2014-07-10)
    >> ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 3.2.5 -- and 3.3.0 patched
    >> ## 1 2 1 2 1 1 1 1 2 2 1 2   # <<-- Martin's R-devel and proposed future R

    >> if(!exists("anyNA", mode="function")) anyNA <- function(x) any(is.na(x))
    >> stopifnot(apply(zRI, 2, anyNA)) # *all* are  NA *or* NaN (or both)
    >> is.NA <- function(.) is.na(.) & !is.nan(.)
    >> (iNaN <- apply(zRI, 2, function(.) any(is.nan(.))))
    >> (iNA <-  apply(zRI, 2, function(.) any(is.NA (.)))) # has non-NaN NA's
    >> ## In Martin's version of R-devel :
    >> stopifnot(identical(m1z == 1, iNA),
    >>           identical(m1z == 2, !iNA))
    >> ## m1z uses match(x, *) with length(x) == 1 and failed in R 3.3.0
    >> stopifnot(identical(m1z, mz))
    >> ______________________________________________
    >> R-devel at r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel


Martin Maechler

2016-May-30 10:48 UTC


[Rd] complex NA's match(), etc: not back-compatible change proposal

>>>>> Suharto Anggono 
>>>>>     on Sat, 28 May 2016 09:34:08 +0000 writes:
    > On 'factor', I meant the case where 'levels' is not
    > specified, where 'unique' is called.

I see, thank you.
    
    >> factor(c(complex(real=NaN), complex(imaginary=NaN)))
    > [1] NaN+0i <NA>
    > Levels: NaN+0i

    > Look at <NA> in the result above. Yes, it happens in
    > earlier versions of R, too.

Yes; let's call this "problem 1"

    > On matching both NA and NaN, another consequence is that
    > length(unique(.)) may depend on order. 
    > Example using R devel r70604:

    >> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
    >> (z <- z[is.na(z)])
    > [1]       NA NaN+  0i       NA NaN+  1i       NA       NA       NA       NA
    > [9]   0+NaNi   1+NaNi       NA NaN+NaNi
    >> length(print(unique(z)))
    > [1]     NA NaN+0i
    > [1] 2
    >> length(print(unique(c(z[8], z[-8]))))
    > [1] NA
    > [1] 1
    > --------------------------------------------

Thank you, Suharto. I agree these are even more convincing
reasons to consider changing.
Let's call this ("matching both NA and NaN")  "problem 2".

I think we agree that the R-devel -- compared to previous
versions -- *is* consistent in its (C level) functions cequal()
and  chash() and also is consistent with the documentation
of match()/unique()/duplicated().

Hence I think a change would have to affect all of the above,
including a change of documentation.

Also, the resolutions of "problem 1" and "problem 2" are related, but
--I think-- almost separate.
For the following, let's use a vector notation for complex
numbers, say
    (a, b) :== complex(real = a, imaginary = b)

With R  (showing relevant examples):
##------------------------------------------------------------------------------
options(width = max(85, getOption("width"))) # so 'z' prints in one line
p.z <- function(z) print(noquote(paste0("(",Re(z),",",Im(z),")")))
z <- c(1,NA,NaN); z <- outer(z,z, complex, length.out=1); (z <- z[is.na(z)])
##     NA NaN+  1i       NA       NA       NA   1+NaNi       NA NaN+NaNi
p.z(z)
##  (NA,1)  (NaN,1)  (1,NA)  (NA,NA)  (NaN,NA)  (1,NaN)  (NA,NaN)  (NaN,NaN)
length(p.z(unique(z[ 1:8 ])))
## [1] (NA,1)  (NaN,1)
## [1] 2
length(p.z(unique(z[ c(8,1:7) ])))
## [1] (NaN,NaN) (NA,1)
## [1] 2
length(p.z(unique(z[ c(7:8,1:6) ])))
## [1] (NA,NaN)
## [1] 1
##------------------------------------------------------------------------------

Problem 1:
  To me, at the moment, it would seem most "natural" to consider a
  change where the match()/unique()/duplicated()  behavior  matched
  the behavior of print()/format()/as.character()  for such
  complex vectors.
  I think this would automatically solve the issue that sometimes

	length(unique(as.character(x))) > length(unique(x))

  There are principally two solutions to this:

  A: change  match()/unique()/duplicated()
  B: change  print()/format()/as.character()
  
  For A -- which seems "less disruptive" and more desirable to
  me -- we would have to change cequal() {and chash()!} and say
  that complex numbers with NA|NaN  "match" if they have any NA, but
  otherwise, both the regular (r,i) and the NaN must be at the
  exact same places (and *different* NaNs should match, of course).
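Option A can be written as a pairwise rule. The sketch below is a hypothetical R transcription on (re, im) doubles. It is not R's C code, and `eq_A()` and `scan_unique()` are illustrative names. Under the rule, any value containing a non-NaN NA matches any other such value. Otherwise both parts must agree, and NaN matches NaN only in the same position. This relation is an equivalence, so a first-occurrence unique() no longer depends on the order. On the 8-value example above, it keeps 4 values for each of the three orders tried, compared with 2, 2 and 1 in the output shown. That is also the number of distinct printed forms.

```r
is_NA <- function(v) is.na(v) & !is.nan(v)      # NA, but not NaN

## Option 'A' (hypothetical transcription): all NA-containing values match
## each other; otherwise NaN must sit in the same places and the regular
## parts must be equal.
eq_A <- function(xr, xi, yr, yi) {
  x_NA <- is_NA(xr) || is_NA(xi);  y_NA <- is_NA(yr) || is_NA(yi)
  if (x_NA || y_NA) return(x_NA && y_NA)
  same <- function(a, b) if (is.nan(a)) is.nan(b) else !is.nan(b) && a == b
  same(xr, yr) && same(xi, yi)
}

## First-occurrence unique(): keep an element unless it equals a kept one.
scan_unique <- function(re, im, eq) {
  kept <- integer(0)
  for (k in seq_along(re))
    if (!any(vapply(kept, function(j) eq(re[k], im[k], re[j], im[j]), NA)))
      kept <- c(kept, k)
  kept
}

## The 8 values of the example above, in the same (column-major) order.
x0 <- c(1, NA, NaN)
re <- rep(x0, times = 3); im <- rep(x0, each = 3)
sel <- is.na(re) | is.na(im); re <- re[sel]; im <- im[sel]

for (o in list(1:8, c(8, 1:7), c(7:8, 1:6)))
  print(length(scan_unique(re[o], im[o], eq_A)))   # 4, 4, 4
```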


Problem 2:   unique(z[i])  depends on the permutation 'i'

  What should a change be here ...  notably after the "proposed"
  (rather only "considered") change   '1 A' above ?

  Can "the" new behavior easily be described in words (if '1 A'
  above is already assumed)?

At the moment, I would not tackle Problem 2.
It would become less problematic once  Problem 1 is solved
according to '1 A', because at least  length(unique(.)) would
not change:  It would contain *one* z[] with an NA, and all the
other z[]s.

Opinions ?  Thank you in advance for chiming in..

Martin Maechler,
ETH Zurich

