Hi Martin, On 10/26/20 04:52, Martin Maechler wrote:>> >> Hi there, >> Is that change in R-devel intentional? >> >> library(Matrix) >> m <- as(matrix(c(0, 1)), "sparseMatrix") >> >> isS4(m) >> # [1] TRUE >> >> x <- I(m) >> # Warning message: >> # In `class<-`(x, unique.default(c("AsIs", oldClass(x)))) : >> # Setting class(x) to multiple strings ("AsIs", "dgCMatrix", ...); >> result will no longer be an S4 object >> >> isS4(x) >> # [1] FALSE >> >> This works fine in R 4.0.3 i.e. no warning and I() doesn't turn off the >> S4 bit of the object. >> >> This change breaks 17 Bioconductor packages. >> >> Seems that the culprit is this change in how I() is implemented: >> >> In R 4.0.3: >> >> > I >> function (x) >> { >> structure(x, class = unique(c("AsIs", oldClass(x)))) >> } >> >> In R devel: >> >> > I >> function (x) >> `class<-`(x, unique.default(c("AsIs", oldClass(x)))) > > Yes, (by me), as I() was sticking out in the slowness bug PR#17794 > https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.r-2Dproject.org_bugzilla_show-5Fbug.cgi-3Fid-3D17794&d=DwIDAw&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=ej3wnc10LiheZsqjRuonTr2WwHWU4ecaDSrTBFby8wU&s=isOqEV-_6Yk1PlRZBoBchZHZYnpQxPGEZdZPgTHMkKg&e> > and the direct dangerous `call<-` will be replaced happily in > I()'s definition. > > *But* as Luke Tierney had remarked to R-core, direct changing of > the class of an S4 object has given the above warning for a > quite while, (svn r47934 | jmc | 2009-02-17 ) > *and* it has rather been an inconsistency in R, that you could > still use "low-level" means to change the class of an S4 object > to something invalid. > > I really don't think people should be allowed to use I() to > change a valid S4 object into an invalid one, but this is what > happens (R 4.0.3 patched) : > >> require(Matrix); M <- Matrix(0, 2,3); IM <- I(M) >> validObject(IM) > Error in .classEnv(classDef) : > trying to get slot "package" from an object of a basic class ("NULL") with no slots >> IM > ----------- FAILURE REPORT -------------- > --- failure: length > 1 in coercion to logical --- > --- srcref --- > : > --- package (from environment) --- > methods > --- call from context --- > showDefault(object) > --- call from argument --- > !is.null(clDef) && isS4(object) && is.na(match(clDef at className, > .BasicClasses)) > --- R stacktrace --- > where 1: showDefault(object) > where 2: Error in showDefault(object) : > cannot get a slot ("slots") from an object of type "NULL"The way I interpret this is that 'IM' breaks validObject(). This is not exactly the same as saying that 'IM' is invalid.> >> Unfortunately there is a bunch of code around that calls I() on S4 >> objects, admittedly not necessarily for very good reasons, but it >> happens. Would it be possible that I() has a less destructive effect on >> S4 objects? > > I'm not sure if this is really desirable... but I may fail to > see the point of allowing invalid I(<S4>) objects as they > appear in R 4.0.x .. > > So what do you really propose that I(.) should be doing, e.g., > for 'M' above ?To provide some context, people typically use I() when they construct a data frame as a way to tell the data frame constructor to treat the supplied columns as-is. In Bioconductor we have DataFrame, a data-frame-like structure where the columns can be anything, including S4 objects, has long they have a vector-like semantic. Like data.frame(), the DataFrame() constructor will also treat columns that carry the AsIs tag as-is. A typical use case where this feature is used on an S4 column is to nest DataFrame objects. For example, with R 4.0.3: library(S4Vectors) df <- DataFrame(X=1:3, Y=letters[1:3]) DataFrame(a=df, Z=11:13) # DataFrame with 3 rows and 3 columns # a.X a.Y Z # <integer> <character> <integer> # 1 1 a 11 # 2 2 b 12 # 3 3 c 13 But if we wrap 'df' in I(): DataFrame(a=I(df), Z=11:13) # DataFrame with 3 rows and 2 columns # a Z # <DataFrame> <integer> # 1 1:a 11 # 2 2:b 12 # 3 3:c 13 'df' ends up in the 1st column of the returned DataFrame. AFAICT the fact that wrapping 'df' in I() was producing some kind of Frankenstein object that breaks validObject() has never been a problem in practice because the Dataframe() constructor immediately removes the AsIs tag internally. This restores the original object. Note that this is an important difference with data.frame() where the AsIs tag sticks around: class(DataFrame(a=I(letters))$a) # [1] "character" class(data.frame(a=I(letters))$a) # [1] "AsIs" Now the new behavior of I() makes a little bit more damage to an S4 object. In addition to breaking validObject(), like the old behavior did, it removes its S4 bit. This means that after removing the AsIs tag, the DataFrame() constructor now will also need to repair the object (with asS4(), which I just discovered today). This is not too bad and we could do that. However the problem would remain that now users get an ugly/obscure warning when they do things like: DataFrame(a=I(df), Z=11:13). I can think of 2 ways to move forward: 1. Keep I()'s current implementation but suppress the warning. We'll make the necessary adjustments to DataFrame() to repair columns supplied as I(<S4>) objects. Note that we would still be in the situation where I(<S4>) objects break validObject() but we've been in that situation for years and so far we've managed to work around it. However this doesn't mean that validObject() shouldn't be fixed. Note that print(I(<S4>)) would also need to be fixed (it says "<S4 Type Object>" which is misleading). Anyways, these 2 issues are separated from the main issue and can be dealt with later. 2. Completely revisit the behavior of I() on S4 objects. Maybe it shouldn't touch the class of the object, only add an attribute e.g. attr(M, "AsIs") <- TRUE. This would obviously be an important change, with the potential to break a lot of existing code. In particular, using inherits(x, "AsIs") would no longer be a reliable way to check for the presence of the AsIs tag so a dedicated function would probably be needed e.g. is.AsIs(). Furthermore, at some point in the future, maybe the attribute approach could be extended to everything, not just S4 objects. Right now I() breaks S4 method dispatch, not only on S4 objects: setGeneric("foo", function(x) standardGeneric("foo")) setMethod("foo", "dgCMatrix", function(x) 123) foo(M) # [1] 123 foo(IM) # Error in (function (classes, fdef, mtable) : # unable to find an inherited method for function ?foo? for signature ?"AsIs"? but also on S3 objects: setMethod("foo", "character", function(x) 456) foo(letters) # [1] 456 foo(I(letters)) # Error in (function (classes, fdef, mtable) : # unable to find an inherited method for function ?foo? for signature ?"AsIs"? So I would argue that the current class-based approach of I() is hopelessly broken and that an attribute-based approach would be cleaner. Thanks, H.> > Martin > > >> >> Thanks, >> H. >> >> > sessionInfo() >> R Under development (unstable) (2020-10-17 r79346) >> Platform: x86_64-pc-linux-gnu (64-bit) >> Running under: Ubuntu 20.04.1 LTS >> >> Matrix products: default >> BLAS: /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so >> LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] Matrix_1.2-18 >> >> loaded via a namespace (and not attached): >> [1] compiler_4.1.0 grid_4.1.0 lattice_0.20-41 >> >> -- >> Herv? Pag?s >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fredhutch.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Ddevel&d=DwIDAw&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=ej3wnc10LiheZsqjRuonTr2WwHWU4ecaDSrTBFby8wU&s=eSd-I3vlksMkH9SQjlruGL9bsHgTkUb1m7dG3OuVNAw&e-- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
On 10/29/20 23:08, Pages, Herve wrote: ...> > I can think of 2 ways to move forward: > > 1. Keep I()'s current implementation but suppress the warning. We'll > make the necessary adjustments to DataFrame() to repair columns supplied > as I(<S4>) objects. Note that we would still be in the situation where > I(<S4>) objects break validObject() but we've been in that situation for > years and so far we've managed to work around it. However this doesn't > mean that validObject() shouldn't be fixed. Note that print(I(<S4>)) > would also need to be fixed (it says "<S4 Type Object>" which is > misleading). Anyways, these 2 issues are separated from the main issue > and can be dealt with later.1b. A variant of the above could be to use the old implementation for S4 objects only: I <- function(x) { if (isS4(x)) { structure(x, class = unique.default(c("AsIs", oldClass(x)))) } else { `class<-`(x, unique.default(c("AsIs", oldClass(x)))) } } That is probably a good compromise for now. I would also suggest that the "package" attribute of the S4 class be kept around so the code that we use to restore the original object has a way to restore it exactly, including its full class specification. Right now, and also with the previous implementation, we cannot do that because attr(class(x), "package") is lost. So something like this: I <- function(x) { if (isS4(x)) { x_class <- class(x) new_classes <- c("AsIs", x_class) attr(new_classes, "package") <- attr(x_class, "package") structure(x, class=new_classes) } else { `class<-`(x, unique.default(c("AsIs", oldClass(x)))) } } Thanks, H. -- Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
iuke-tier@ey m@iii@g oii uiow@@edu
2020-Oct-30 17:48 UTC
[Rd] [External] Re: Change to I() in R 4.1
On Fri, 30 Oct 2020, Pages, Herve wrote:> > On 10/29/20 23:08, Pages, Herve wrote: > ... >> >> I can think of 2 ways to move forward: >> >> 1. Keep I()'s current implementation but suppress the warning. We'll >> make the necessary adjustments to DataFrame() to repair columns supplied >> as I(<S4>) objects. Note that we would still be in the situation where >> I(<S4>) objects break validObject() but we've been in that situation for >> years and so far we've managed to work around it. However this doesn't >> mean that validObject() shouldn't be fixed. Note that print(I(<S4>)) >> would also need to be fixed (it says "<S4 Type Object>" which is >> misleading). Anyways, these 2 issues are separated from the main issue >> and can be dealt with later. > > 1b. A variant of the above could be to use the old implementation for S4 > objects only: > > I <- function(x) > { > if (isS4(x)) { > structure(x, class = unique.default(c("AsIs", oldClass(x)))) > } else { > `class<-`(x, unique.default(c("AsIs", oldClass(x)))) > } > } > > That is probably a good compromise for now.Not really. The underlying problem is that class<- and attributes<- (which is what structure() uses) handle the 'class' attribute differently, and that needs to be fixed. I don't have a strong opinion on what either should do, but they should do the same thing. It's probably worth re-thinking the I() mechanism. ?Modifying the value, whether by changing the class or an attribute, is going to be brittle. A little less so for an attribute, but using an attribute rules out dispatch on the AsIs property. Best, luke> > I would also suggest that the "package" attribute of the S4 class be > kept around so the code that we use to restore the original object has a > way to restore it exactly, including its full class specification. Right > now, and also with the previous implementation, we cannot do that > because attr(class(x), "package") is lost. So something like this: > > I <- function(x) > { > if (isS4(x)) { > x_class <- class(x) > new_classes <- c("AsIs", x_class) > attr(new_classes, "package") <- attr(x_class, "package") > structure(x, class=new_classes) > } else { > `class<-`(x, unique.default(c("AsIs", oldClass(x)))) > } > } > > Thanks, > H. > > >-- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tierney at uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu