Hervé Pagès
2014-Feb-19 00:17 UTC
[Rd] dispatch on "c" method when passing named arguments
Hi,

Many S4 objects in Bioconductor support "combining" via a "c" method. A common use case is to combine a collection of S4 objects stored in a list with the following idiom:

  do.call("c", list_of_objects)

For many users, though, this doesn't return what they expect, because their 'list_of_objects' is a named list. This seems to "break" dispatch if the "c" method is defined for a parent class. Here is an example:

  setClass("A", representation(aa="integer"))

  setMethod("c", "A",
      function(x, ..., recursive=FALSE)
      {
          if (missing(x)) {
              objects <- list(...)
              x <- objects[[1L]]
          } else {
              objects <- list(x, ...)
          }
          new(class(x),
              aa=unlist(lapply(objects, slot, "aa"), use.names=FALSE))
      }
  )

  raw_input <- list(chr1=1:3, chr2=11:12)
  list_of_A_objects <- lapply(raw_input, function(aa) new("A", aa=aa))

Then:

  > do.call("c", list_of_A_objects)
  An object of class "A"
  Slot "aa":
  [1]  1  2  3 11 12

==> all is fine.

But:

  setClass("B", contains="A")
  list_of_B_objects <- lapply(raw_input, function(aa) new("B", aa=aa))

Then:

  > do.call("c", list_of_B_objects)
  $chr1
  An object of class "B"
  Slot "aa":
  [1] 1 2 3

  $chr2
  An object of class "B"
  Slot "aa":
  [1] 11 12

==> dispatch failed!

Note that selectMethod() doesn't help the user understand what's going on:

  > selectMethod("c", "B")
  Method Definition:

  function (x, ..., recursive = FALSE)
  {
      if (missing(x)) {
          objects <- list(...)
          x <- objects[[1L]]
      } else {
          objects <- list(x, ...)
      }
      new(class(x), aa = unlist(lapply(objects, slot, "aa"), use.names = FALSE))
  }

  Signatures:
          x
  target  "B"
  defined "A"

It is hard to realize that dispatch failed when selectMethod() has no problem finding the expected method.

Many BioC users have been bitten by this for years now, and many more will be. The workaround is simple:

  do.call("c", unname(list_of_objects))

but it is not obvious to most users, and it is easy to forget even for advanced users.
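The unname() workaround can be wrapped in a small helper so that callers don't have to remember it. This is only a sketch: the name combine_objects() is hypothetical (it is not part of any Bioconductor package), and it does not fix the dispatch problem itself; it just avoids passing named arguments to c(). The class and method definitions from the example are repeated so the snippet runs on its own.

```r
library(methods)

## Classes and "c" method from the example above.
setClass("A", representation(aa="integer"))

setMethod("c", "A",
    function(x, ..., recursive=FALSE)
    {
        if (missing(x)) {
            objects <- list(...)
            x <- objects[[1L]]
        } else {
            objects <- list(x, ...)
        }
        new(class(x),
            aa=unlist(lapply(objects, slot, "aa"), use.names=FALSE))
    }
)

setClass("B", contains="A")

## Hypothetical helper: drop the list names before calling c(), so that
## c() receives its arguments positionally (as in the unnamed case)
## rather than as named arguments chr1=..., chr2=...
combine_objects <- function(list_of_objects)
{
    do.call("c", unname(list_of_objects))
}

raw_input <- list(chr1=1:3, chr2=11:12)
list_of_B_objects <- lapply(raw_input, function(aa) new("B", aa=aa))
combined <- combine_objects(list_of_B_objects)
```

Here combined should be a single "B" object whose "aa" slot holds 1 2 3 11 12, the same result as the explicit do.call("c", unname(list_of_B_objects)) call.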
Interestingly, calling "c" with unnamed arguments seems to "repair" the dispatch mechanism for any subsequent call:

  > do.call("c", unname(list_of_B_objects))
  An object of class "B"
  Slot "aa":
  [1]  1  2  3 11 12

  > do.call("c", list_of_B_objects)
  An object of class "B"
  Slot "aa":
  [1]  1  2  3 11 12

==> now it works even with a named list!

(Apparently some caching happened, as reported by showMethods("c").)

It would be really, really nice if something could be done about this, and I'd be happy to contribute in any way I can.

A big thanks in advance from the Bioconductor community!

H.

PS: My understanding is that, c() being a primitive, a special dispatch mechanism is used, i.e. there is something like a "pre-dispatch" step where only the first argument is considered, and the implicit S4 generic is called only if that argument is an S4 object. But even that special dispatch doesn't quite explain the behavior I report above.

  > sessionInfo()
  R Under development (unstable) (2014-02-10 r64961)
  Platform: x86_64-unknown-linux-gnu (64-bit)

  locale:
   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319