Hervé Pagès
2014-Feb-19  00:17 UTC
[Rd] dispatch on "c" method when passing named arguments
Hi,
Many S4 objects in Bioconductor support "combining" via a "c" method.
A common use case is to combine a collection of S4 objects stored in
a list with the following idiom:
   do.call("c", list_of_objects)
For many users this doesn't return what they expect, though, because
their 'list_of_objects' is a named list and this seems to "break"
dispatch if the "c" method is defined for a parent class.
Here is an example:
   setClass("A", representation(aa="integer"))
   setMethod("c", "A",
     function(x, ..., recursive=FALSE)
     {
       if (missing(x)) {
         objects <- list(...)
         x <- objects[[1L]]
       } else {
         objects <- list(x, ...)
       }
       new(class(x),
           aa=unlist(lapply(objects, slot, "aa"), use.names=FALSE))
     }
   )
   raw_input <- list(chr1=1:3, chr2=11:12)
   list_of_A_objects <- lapply(raw_input,
                               function(aa) new("A", aa=aa))
Then:
   > do.call("c", list_of_A_objects)
   An object of class "A"
   Slot "aa":
   [1]  1  2  3 11 12
==> all is fine.
But:
   setClass("B", contains="A")
   list_of_B_objects <- lapply(raw_input,
                               function(aa) new("B", aa=aa))
Then:
   > do.call("c", list_of_B_objects)
   $chr1
   An object of class "B"
   Slot "aa":
   [1] 1 2 3
   $chr2
   An object of class "B"
   Slot "aa":
   [1] 11 12
==> dispatch failed!
Note that selectMethod() is not helping the user understand what's
going on:
   > selectMethod("c", "B")
   Method Definition:
   function (x, ..., recursive = FALSE)
   {
     if (missing(x)) {
         objects <- list(...)
         x <- objects[[1L]]
     }
     else {
         objects <- list(x, ...)
     }
     new(class(x), aa = unlist(lapply(objects, slot, "aa"),
         use.names = FALSE))
   }
   Signatures:
           x
   target  "B"
   defined "A"
It is far from obvious that dispatch failed when selectMethod() has no
problem finding the expected method.
Many BioC users have been bitten by this for many years now and many
more will be. The workaround is simple:
   do.call("c", unname(list_of_objects))
but not obvious for most users and easy to forget even for advanced
users.
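
One way to make the workaround harder to forget is to wrap it in a small
helper. The following is only a sketch: 'combine_unnamed' is a hypothetical
name, not an existing Bioconductor function, and the class and method
definitions simply repeat the example above so that the block runs on its
own.

```r
library(methods)

# Same toy class and "c" method as in the example above.
setClass("A", representation(aa = "integer"))
setMethod("c", "A",
  function(x, ..., recursive = FALSE) {
    objects <- if (missing(x)) list(...) else list(x, ...)
    new(class(objects[[1L]]),
        aa = unlist(lapply(objects, slot, "aa"), use.names = FALSE))
  }
)

# Hypothetical helper: drop the list names before handing the elements
# to c(), so that every element is passed as an unnamed argument.
combine_unnamed <- function(objects) {
  stopifnot(is.list(objects), length(objects) >= 1L)
  do.call("c", unname(objects))
}

named_list <- list(chr1 = new("A", aa = 1:3), chr2 = new("A", aa = 11:12))
combined <- combine_unnamed(named_list)
```

Note that the names of the input list are discarded; a caller who needs
them would have to keep them separately.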
Interestingly, calling "c" with unnamed arguments seems to "repair"
the dispatch mechanism for any subsequent call:
   > do.call("c", unname(list_of_B_objects))
   An object of class "B"
   Slot "aa":
   [1]  1  2  3 11 12
   > do.call("c", list_of_B_objects)
   An object of class "B"
   Slot "aa":
   [1]  1  2  3 11 12
==> now it works even with a named list!
(Apparently some caching happened, as reported by showMethods("c").)
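
For anyone who wants to look at the caching themselves, here is a rough
sketch that compares the output of showMethods("c") before and after an
unnamed call. The variable names ('b_list', 'before', 'after') are mine,
the definitions repeat the example above so the block is self-contained,
and what the difference looks like may depend on the R version, so no
particular output is claimed here.

```r
library(methods)

setClass("A", representation(aa = "integer"))
setMethod("c", "A",
  function(x, ..., recursive = FALSE) {
    objects <- if (missing(x)) list(...) else list(x, ...)
    new(class(objects[[1L]]),
        aa = unlist(lapply(objects, slot, "aa"), use.names = FALSE))
  }
)
setClass("B", contains = "A")

b_list <- list(chr1 = new("B", aa = 1:3), chr2 = new("B", aa = 11:12))

# Method table as reported before any call on "B" objects.
before <- capture.output(showMethods("c"))

# The unnamed call, i.e. the one that dispatches as expected.
unnamed_result <- do.call("c", unname(b_list))

# Method table afterwards; any new lines are what the call added.
after <- capture.output(showMethods("c"))
setdiff(after, before)
```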
It would be really really nice if something could be done about this
and I'd be happy to contribute in any way I can.
A big thanks in advance from the Bioconductor community!
H.
PS: My understanding is that, c() being a primitive, a special dispatch
mechanism is used, i.e. there is something like a "pre-dispatch step"
where only the 1st arg is considered, and only if this arg is an S4
object is the implicit S4 generic called. But even that special
dispatch doesn't quite explain the behavior I report above.
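
The two preconditions mentioned in this PS can at least be checked from
the R prompt. This is only a minimal sketch of those checks (the object
name 'a' is mine); it does not explain the behavior reported above.

```r
library(methods)

setClass("A", representation(aa = "integer"))
a <- new("A", aa = 1:3)

# c() is a primitive, so there is no R-level closure to dispatch from.
c_is_primitive <- is.primitive(c)

# An instance of a class created with setClass() is an S4 object ...
a_is_s4 <- isS4(a)

# ... but a plain list holding such instances is not an S4 object itself.
list_is_s4 <- isS4(list(a))
```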
 > sessionInfo()
R Under development (unstable) (2014-02-10 r64961)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
