Josh O'Brien
2016-Jan-05 00:16 UTC
[Rd] For integer vectors, `as(x, "numeric")` has no effect.
On Dec 19, 2015, at 3:32 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

>>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>>     on Sat, 12 Dec 2015 10:32:51 +0100 writes:
>
>>>>>> John Chambers <jmc at r-project.org>
>>>>>>     on Fri, 11 Dec 2015 10:11:05 -0800 writes:
>
>>> Somehow, the most obvious fixes are always back-incompatible these days.
>>> The example intrigued me, so I looked into it a bit (should have been doing something else, but ....)
>
>>> You're right that this is the proverbial thin-edge-of-the-wedge.
>
>>> The problem is in setDataPart(), which will be called whenever a class extends one of the vector types.
>
>>> It does
>>>     as(value, dataClass)
>>> The key point is that the third argument to as(), strict=TRUE by default. So, yes, the change will cause all integer vectors to become double when the class extends "numeric". Generally, strict=TRUE makes sense here and of course changing THAT would open up yet more incompatibilities.
>
>>> For back compatibility, one would have to have some special code in setDataPart() for the case of integer/numeric.
>
>>> John
>
>>> (Historically, the original sin was probably not making a distinction between "numeric" as a virtual class and "double" as a type/class.)
>
>> Yes, indeed. In the mean time, I've seen more cases where
>>   "the change will cause all integer vectors to become double when the class extends "numeric".
>> seems detrimental.
>
>> OTOH, I still think we could go in the right direction ---
>> hopefully along the wishes of bioconductor S4 development, see
>> Martin Morgan's e-mail:
>
>> [This is all S4 - only; should not much affect base R / S3]
>> Currently, "integer" is a subclass of "numeric" and so the
>> "integer become double" part seems unwanted to me.
>> OTOH, it would really make sense to more formally
>> have the basic subclasses of "numeric" to be "integer" and "double",
>> and to let as(*, "double") to become different to as(*, "numeric")
>> [Again, this is just for the S4 classes and as() coercions, *not* e.g.
>> for as.numeric() / as.double() !]
>
>> In the DEPRECATED part of the NEWS for R 2.7.0 (April 2008) we
>> have had
>
>>   o The S4 pseudo-classes "single" and double have been removed.
>>     (The S4 class for a REALSXP is "numeric": for back-compatibility
>>     as(x, "double") coerces to "numeric".)
>
>> I think the removal of "single" was fine, but in hindsight,
>> maybe the removal of "double" -- which was partly broken then --
>> possibly could rather have been a fixup of "double" along the
>> following
>
>> Current "thought experiment proposal" :
>
>>   1) "numeric" := {"integer", "double"}   { class - subclasses }
>>   2) as(1L, "numeric") continues to return 1L .. since integer is
>>      one case of "numeric"
>>   3) as(1L, "double") newly returns 1.0 {and in fact would be
>>      "equivalent" to as.double(1L)}
>
>> After the above change, S4 as(*, "double") would correspond to S3 as.double
>> but as(*, "numeric") would continue to differ from
>> as.numeric(*), the former *not* changing integers to double.
>
>> Martin
>
> Also note that e.g.
>
>     class(pi) would return "double" instead of "numeric"
>
> and this will break all the bad programming style usages of
>
>     if(class(x) == "numeric")
>
> which I tend to see in gazillions of user and even package codes
> This is bad (aka error prone !) because "correct" usage would be
>
>     if(inherits(x, "numeric"))
>
> and that of course would *not* break after the change above.
>
> - - - -
>
> A week later, I'm still pretty convinced it would be worth going
> in the direction proposed above.
>
> But I was actually hoping for some encouragement or "mental support"...
> or then to hear why you think the proposition is not good or not
> viable ...
I really like Martin Maechler's "thought experiment proposal", but (based partly on the reception it's gotten) figure I mustn't be appreciating the complications it would introduce.

That said, if it's decided to just make a smaller fix of as(x, "numeric"), might it be better to make the change at the C level, to R_set_class in $RHOME/src/main/coerce.c?

Here's the 'offending' bit of R_set_class (the C code implementing the R function `class<-`), which treats an INTSXP in the same way as a REALSXP, breaking before it gets coerced to REALSXP:

    else if(!strcmp("numeric", valueString)) {
        setAttrib(obj, R_ClassSymbol, R_NilValue);
        if(IS_S4_OBJECT(obj)) /* NULL class is only valid for S3 objects */
            do_unsetS4(obj, value);
        switch(TYPEOF(obj)) {
        case INTSXP: case REALSXP: break;
        default: PROTECT(obj = coerceVector(obj, REALSXP));
            nProtect++;
        }
    }

Simply deleting "case INTSXP: " will let integer vectors pass through and get converted to REALSXP by coerceVector().

This approach has a couple of advantages that (unless I'm missing something) make it seem relatively clean:

- It requires no additional explicit S4 "coerce" method for signature c("integer", "numeric").

- It simultaneously fixes the behavior of `class<-` (also mentioned in my OP) so that it too will convert an "integer" vector to one of class "numeric" when a user asks it to. (Not that they "should" be doing that, given the admonition in ?class).

FWIW, I've recompiled R with the simple modification mentioned above. It passes all tests in "make check" until it reaches one very similar to that presented by Martin Maechler back on December 11th, failing it for the same reason (i.e. conversion of the integer vector to "numeric" by the call to as() in setDataPart()).
Here's the test and its outcome:

    setClass("NumX", contains="numeric", representation(id="character"))
    nn = new("NumX", 1:10, id="test")
    stopifnot(identical(1:10, S3Part(nn, strict = TRUE)))
    ## Error: identical(1:10, S3Part(nn, strict = TRUE)) is not TRUE
    ## Execution halted

>>> On Dec 11, 2015, at 1:25 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>
>>>>>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>>>>>     on Tue, 8 Dec 2015 15:25:21 +0100 writes:
>>>>
>>>>>>>>> John Chambers <jmc at r-project.org>
>>>>>>>>>     on Mon, 7 Dec 2015 16:05:59 -0800 writes:
>>>>
>>>>>> We do need an explicit method here, I think.
>>>>>> The issue is that as() uses methods for the generic function coerce() but cannot use inheritance in the usual way (if it did, you would be immediately back with no change, since "integer" inherits from "numeric").
>>>>
>>>>>> Copying in the general method for coercing to "numeric" as an explicit method for "integer" gives the expected result:
>>>>
>>>>>>> setMethod("coerce", c("integer", "numeric"), getMethod("coerce", c("ANY", "numeric")))
>>>>>> [1] "coerce"
>>>>>>> typeof(as(1L, "numeric"))
>>>>>> [1] "double"
>>>>
>>>>>> Seems like a reasonable addition to the code, unless someone sees a problem.
>>>>>> John
>>>>
>>>>> I guess that some package checks (in CRAN + Bioc + ... -
>>>>> land) will break,
>>>>> but I still think we should add such a coercion to R.
>>>>
>>>>> Martin
>>>>
>>>> Hmm... I've tried to add the above to R
>>>> and do notice that there are consequences that may be larger than
>>>> anticipated:
>>>>
>>>> Here is example code:
>>>>
>>>>   myN   <- setClass("myN", contains="numeric")
>>>>   myNid <- setClass("myNid", contains="numeric", representation(id="character"))
>>>>   NN    <- setClass("NN", representation(x="numeric"))
>>>>
>>>>   (m1 <- myN  (1:3))
>>>>   (m2 <- myNid(1:3, id = "i3"))
>>>>   tools::assertError(NN (1:3))# in all R versions
>>>>
>>>>   ##                        # current R | new R
>>>>   ##                        # -----------|----------
>>>>   class(getDataPart(m1))    # integer   | numeric
>>>>   class(getDataPart(m2))    # integer   | numeric
>>>>
>>>>
>>>> In other words, with the above setting, the traditional
>>>> gentleperson's agreement in S and R,
>>>>
>>>>   __ "numeric" sometimes conveniently means "integer" or "double" __
>>>>
>>>> will be slightly less often used ... which of course may be a
>>>> very good thing.
>>>>
>>>> However, it breaks strict back compatibility also in cases where
>>>> the previous behavior may have been preferable:
>>>> After all integer vectors need only half the space of doubles.
>>>>
>>>> Shall we still go ahead and do apply this change to R-devel
>>>> and then all package authors will be willing to update where necessary?
>>>>
>>>> As this may affect the many hundreds of bioconductor packages
>>>> using S4 classes, I am -- exceptionally -- cross posting to the
>>>> bioc-devel list.
>>>>
>>>> Martin Maechler
>>>>
>>>>
>>>>>> On Dec 7, 2015, at 3:37 PM, Benjamin Tyner <btyner at gmail.com> wrote:
>>>>
>>>>>>> Perhaps it is not that surprising, given that
>>>>>>>
>>>>>>     mode(1L)
>>>>>>> [1] "numeric"
>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>     is.numeric(1L)
>>>>>>> [1] TRUE
>>>>>>>
>>>>>>> On the other hand, this is curious, to say the least:
>>>>>>>
>>>>>>     is.double(as(1L, "double"))
>>>>>>> [1] FALSE
>>>>>>>
>>>>>> Here's the surprising behavior:
>>>>>>
>>>>>>     x <- 1L
>>>>>>     xx <- as(x, "numeric")
>>>>>>     class(xx)
>>>>>>     ## [1] "integer"
>>>>>>
>>>>>> It occurs because the call to `as(x, "numeric")` dispatches the coerce
>>>>>> S4 method for the signature `c("integer", "numeric")`, whose body is
>>>>>> copied in below.
>>>>>>
>>>>>>     function (from, to = "numeric", strict = TRUE)
>>>>>>     if (strict) {
>>>>>>         class(from) <- "numeric"
>>>>>>         from
>>>>>>     } else from
>>>>>>
>>>>>> This in turn does nothing, even when strict=TRUE, because that
>>>>>> assignment to class "numeric" has no effect:
>>>>>>
>>>>>>     x <- 10L
>>>>>>     class(x) <- "numeric"
>>>>>>     class(x)
>>>>>>     [1] "integer"
>>>>>>
>>>>>> Is this the desired behavior for `as(x, "numeric")`?
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-devel at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
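For readers who want to reproduce the observations quoted above, here is a minimal sketch of how the different "views" of an integer vector can be inspected side by side. The variable names (`x`, `base_view`) are illustrative only. The result of `as(x, "numeric")` is printed rather than assumed, since it is exactly the behavior under discussion and may differ between R versions.

```r
library(methods)

x <- 1L

# Base R views of the same integer vector.
base_view <- c(mode = mode(x), typeof = typeof(x), class = class(x))
print(base_view)

# is.numeric() is TRUE for integer and double vectors alike.
print(is.numeric(x))

# In S4 terms "integer" extends "numeric", so is() reports TRUE.
print(is(x, "numeric"))

# as.numeric() / as.double() always return storage type "double".
print(typeof(as.numeric(x)))

# The S4 coercion discussed in this thread; inspect the result
# instead of relying on one particular outcome.
print(typeof(as(x, "numeric")))
```

Comparing the last two lines shows directly whether as() and as.numeric() agree on the R build at hand.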
Martin Maechler
2016-Jan-05 09:31 UTC
[Rd] For integer vectors, `as(x, "numeric")` has no effect.
>>>>> Josh O'Brien <joshmobrien at gmail.com>
>>>>>     on Mon, 4 Jan 2016 16:16:51 -0800 writes:

> I really like Martin Maechler's "thought experiment
> proposal", but (based partly on the reception it's gotten)
> figure I mustn't be appreciating the complications it
> would introduce..

Actually, I've spent half a day implementing it and was very
pleased about it... as a matter of fact it passed *all* our checks
also in all recommended packages (*)

To do it cleanly... with very few code changes,
the *only* consequence would be that

    class(1.)

(and similar) then returned "double" instead of "numeric".
which *would* be logically consistent, because indeed,

    numeric = {integer, double}

in that new scheme, and class(1L) also returns "integer".

To my big chagrin there was very big opposition to such a change,
IIRC, mainly on the grounds that for 20 years or so S and then R
books and publications had said that double and numeric should
be basically the same.

(*) Below you have a C level proposal which as you note is
similar to John Chambers' R level change:

The consequence is that basically you can no longer have "integer"
entries in "numeric" slots; they are automagically made into "double".
I personally find that not really "acceptable" {waste of storage},
and I would guess that more code "out there in package-land and
user-code" would break than with my change.

> That said, if it's decided to just make a smaller fix of
> as(x, "numeric"), might it be better to make the change at
> the C level, to R_set_class in $RHOME/src/main/coerce.c?

I'm not seeing the advantage to make the change there, apart
from possibly some efficiency gain.

For the time being, I will not work on this ... mainly as I still
believe that my proposal would lead to a much much cleaner setup
(and yes, even be worth some small changes in new editions of
those R books which deal with such subtle issues)

Martin
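The storage concern raised above can be illustrated with a small sketch. The class name `SlotDemo` and the variables `obj`, `size_int` and `size_dbl` are hypothetical, chosen only for this illustration. The sketch stores an integer vector in a slot declared as "numeric", prints the storage type actually held (without assuming what it is on a given R version), and compares the memory footprint of integer and double vectors of equal length.

```r
library(methods)

# Hypothetical class with one slot declared as "numeric".
setClass("SlotDemo", representation(x = "numeric"))

# An integer vector is accepted because "integer" extends "numeric".
obj <- new("SlotDemo", x = 1:3)

# Which storage type the slot holds is the point at issue; print it.
print(typeof(obj@x))

# Memory footprint in bytes of equal-length integer and double vectors.
n <- 1e5
size_int <- as.numeric(object.size(integer(n)))
size_dbl <- as.numeric(object.size(numeric(n)))
print(c(integer = size_int, double = size_dbl, ratio = size_dbl / size_int))
```

If integer contents were silently promoted to double, every such slot would pay the larger of the two footprints printed on the last line.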
Josh O'Brien
2016-Jan-05 21:55 UTC
[Rd] For integer vectors, `as(x, "numeric")` has no effect.
On Tue, Jan 5, 2016 at 1:31 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

> > That said, if it's decided to just make a smaller fix of
> > as(x, "numeric"), might it be better to make the change at
> > the C level, to R_set_class in $RHOME/src/main/coerce.c?
>
> I'm not seeing the advantage to make the change there, apart
> from possibly some efficiency gain.

One advantage (relative to a solution based on setting a new S4 coerce() method for signature c("integer", "numeric")) is that it would also make the following conversion work as naively expected:

    x <- 10L
    class(x) <- "numeric"
    class(x)
    # [1] "integer"  ## would be "numeric"

I know that's not a recommended strategy for converting an object's class, but for users like me, trying to make sense of as() and the class system, it would be even more perplexing if `as(x, "numeric")` and `class(x) <- "numeric"` yielded different results.

> For the time being, I will not work on this ... mainly as I still
> believe that my proposal would lead to a much much cleaner setup
> (and yes, even be worth some small changes in new editions of
> those R books which deal with such subtle issues)

Thanks, anyway, for having looked into this.

If no changes are to be made, then it might (?) be worth modifying the "Basic Coercion Methods" section of ?as. It currently reads:

    Methods are pre-defined for coercing any object to one of the
    basic datatypes. For example, 'as(x, "numeric")' uses the
    existing 'as.numeric' function. These built-in methods can be
    listed by 'showMethods("coerce")'.

which is not accurate for integer vectors 'x'.

> Martin
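One way to check the quoted documentation claim on a given R build is sketched below. The helper `matches_as_numeric()` is hypothetical, not part of R. It tests whether `as(x, "numeric")` gives exactly the same object as `as.numeric(x)`. The integer case is printed rather than asserted because it is the case in dispute. The last lines show two explicit conversions that do not depend on as() or `class<-`.

```r
library(methods)

# Hypothetical helper: does as(x, "numeric") agree exactly with as.numeric(x)?
matches_as_numeric <- function(x) {
  identical(as(x, "numeric"), as.numeric(x))
}

# Double input: as() returns the object unchanged, so the two agree.
print(matches_as_numeric(c(1.5, 2)))

# Integer input: the outcome is the subject of this thread; just report it.
print(matches_as_numeric(1:3))

# Explicit conversions to double that sidestep as() and `class<-`.
x <- 10L
y <- as.double(x)            # returns a new double vector
z <- x
storage.mode(z) <- "double"  # changes the storage type in place
print(c(typeof(y), typeof(z)))
```

Code that needs a double vector regardless of R version can rely on `as.double()` or `storage.mode<-` in this way.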