What a delicious example! (I'm taking the liberty of sharing it with r-devel, since it raises some good issues.) You have two questions, presumably: 1 - how could the order of the setMethod calls make a difference in the results? 2 - what's causing the infinite loop & how could it be avoided, reliably? Second question first. The danger sign is the "vector" method: setMethod("foo", signature(A = "vector"), function(A, ...) { foo(matrix(A, nrow = 1), ...) }) This handles a vector by recalling the generic with a matrix. But "vector" is a superclass of "matrix" (see getClass("matrix")) and you DON'T have an explicit method defined for "matrix". So for sure if "vector" was the only method, you would be in immediate trouble. Coercing down the inheritance tree is potentially bad news. Generally, if you're going to take over dispatch by coercing an argument & then recalling the generic, the result is simplest if you have an exact match for the new call, not relying on inheritance. After all, if you're doing the work to coerce the argument, might as well pick one that works right away. Not a universal rule, but other things being equal ... There's an additional issue with methods for "matrix" and "array" because R never allows 2-way arrays to have class "array", which means using array() here with the same dimensions would not have helped. Also, matrix and array objects are weird in that they are not basic vectors but also have no class attribute, and is.object() is FALSE for them. More interesting though--how can the order of the setMethod() calls matter? To see that we need to look at the methods list object. (The discussion is simplified to the case that only one argument is involved, which doesn't affect the result.) The MethodsList object has a slot containing a named list of methods, with the names being those of the classes that appeared in the setMethod calls, in the order that the calls occurred(note!). All classes are essentially equal from the view of the generic function, so there's no ordering favoring the "more relevant". When method dispatch occurs, the code first looks for an exact match to the class of the actual argument--that's a quick search in the names of the list. If the direct search fails, the code now looks for an inherited method. The key point is that this second search is "greedy"--the first inherited method found is used. NOW it should be clear why the order of the setMethod() calls matters. You have two potential inherited methods here for "matrix"; namely, "array" and "vector". WE know that the "array" method is `closer', and the R dispatcher could decide this also, if it were willing to look through all possible inheritance paths and drop one possibility if a better one was found. It currently doesn't do any further search, and doing so would be a modest efficiency hit. I'm inclined to think the cost would be worth it to eliminate unpleasant suprises like this one, but opinions may differ. (Once found, the inherited method is stored directly in the list used for the first lookup, so the hit is only the first time a particular signature turns up.) To make the difference clearer, I added to your example another generic "bar" with the same methods as "foo", but with the order of the setMethod calls reversed. By looking at the "methods" slot in the two cases, we can see why the bad ("vector") method is selected for bar() but not for foo(): > names(getMethods("foo")@methods) [1] "array" "vector" > names(getMethods("bar")@methods) [1] "vector" "array" After running foo(1:10) and trying to run bar(1:10): > showMethods("foo") Function "foo": A = "array" A = "vector" A = "integer" (inherited from A = "vector") A = "matrix" (inherited from A = "array") > showMethods("bar") Function "bar": A = "vector" A = "array" A = "integer" (inherited from A = "vector") A = "matrix" (inherited from A = "vector") But including setMethod("bar", "matrix", ...) in the source code makes bar() work fine. Paul Roebuck wrote:>Sorry to bother but could you shed some light on this? >I don't understand why order of setMethod calls makes >any difference. Since it obviously does, it has shaken >the foundations of what I thought I understood about >S4 methods. Even Gabor was surprised... > > >---------- Forwarded message ---------- >Date: Wed, 12 Apr 2006 18:24:46 -0400 >From: Gabor Grothendieck <ggrothendieck@gmail.com> >To: Paul Roebuck <roebuck@mdanderson.org> >Cc: R Help Mailing List <r-help@stat.math.ethz.ch> >Subject: Re: [R] S4 method dispatch matrixOrArray > >On 4/12/06, Paul Roebuck <roebuck@mdanderson.org> wrote: > > > >>On Wed, 12 Apr 2006, Gabor Grothendieck wrote: >> >> >> >>>On 4/12/06, Paul Roebuck <roebuck@mdanderson.org> wrote: >>> >>> >>> >>>>I have some code where the primary dispatching is on >>>>other parameters so I'd like not to have to create a >>>>set of functions for "matrix" and another duplicate >>>>set for "array". But the class union technique isn't >>>>working as implemented below and I don't have my Green >>>>book with me. How do I fix my infinite recursion problem? >>>> >>>> >>>>##-------------------------------------------------------- >>>>library(methods) >>>> >>>>setGeneric("foo", >>>> function(A, ...) { >>>> cat("generic", match.call()[[1]], "\n") >>>> standardGeneric("foo") >>>> }) >>>> >>>>setMethod("foo", >>>> signature(A = "vector"), >>>> function(A, ...) { >>>> callGeneric(matrix(A, nrow = 1), ...) >>>> }) >>>> >>>>setClassUnion("matrixOrArray", c("matrix", "array")) >>>> >>>>setMethod("foo", >>>> signature(A = "matrixOrArray"), >>>> function(A, ...) { >>>> cat("A =", A, "\n") >>>> }) >>>> >>>>## Test >>>>foo(1:4) >>>>foo(matrix(1:4, 1, 4)) >>>>foo(array(1:4, c(1, 4, 1))) >>>> >>>> >>>I think its good enough to just define an array method, i.e. you >>>don't need the matrix method or the matrixOrArray class, and the >>>vector method can call foo(matrix(A,1), ...) so: >>> >>>setGeneric("foo", >>> function(A, ...) { >>> cat("generic", match.call()[[1]], "\n") >>> standardGeneric("foo") >>> }) >>> >>>setMethod("foo", >>> signature(A = "array"), >>> function(A, ...) { >>> cat("A =", A, "\n") >>> }) >>> >>>setMethod("foo", >>> signature(A = "vector"), >>> function(A, ...) { >>> foo(matrix(A, nrow = 1), ...) >>> }) >>> >>> >>Something didn't seem right here. That was pretty close >>to what I had started with, before trying to go the >>classUnion route. Matter of fact, the vector method can >>retain use of callGeneric. >> >>The solution has to do with the order in which calls to >>setMethod are made. Adding foo-vector after foo-array >>works fine; the other way around causes infinite recursion. >> >> > >This is surprising. I would have thought that the >parent/child relationships determine the order that >dispatched methods are invoked, not the order that >the setMethod commands are issued in. > > > >[[alternative HTML version deleted]]
I think a rule is needed that would require that a class not succeed any of its parents on the method list. That way "array" could never come after "vector" and, in general, a parent would never be found prior to any of its descendents. Of course that still does not address the problem that the order of the setMethod's rather than the relationships among the classes can influence the search order but it would eliminate the situation we saw here. On 4/13/06, John Chambers <jmc at r-project.org> wrote:> What a delicious example! (I'm taking the liberty of sharing it with > r-devel, since it raises some good issues.) > > You have two questions, presumably: > > 1 - how could the order of the setMethod calls make a difference in the > results? > 2 - what's causing the infinite loop & how could it be avoided, reliably? > > Second question first. The danger sign is the "vector" method: > > > setMethod("foo", > signature(A = "vector"), > function(A, ...) { > foo(matrix(A, nrow = 1), ...) > }) > > This handles a vector by recalling the generic with a matrix. But "vector" > is a superclass of "matrix" (see getClass("matrix")) and you DON'T have an > explicit method defined for "matrix". So for sure if "vector" was the only > method, you would be in immediate trouble. Coercing down the inheritance > tree is potentially bad news. > > Generally, if you're going to take over dispatch by coercing an argument & > then recalling the generic, the result is simplest if you have an exact > match for the new call, not relying on inheritance. After all, if you're > doing the work to coerce the argument, might as well pick one that works > right away. Not a universal rule, but other things being equal ... > > There's an additional issue with methods for "matrix" and "array" because R > never allows 2-way arrays to have class "array", which means using array() > here with the same dimensions would not have helped. Also, matrix and array > objects are weird in that they are not basic vectors but also have no class > attribute, and is.object() is FALSE for them. > > More interesting though--how can the order of the setMethod() calls matter? > To see that we need to look at the methods list object. (The discussion is > simplified to the case that only one argument is involved, which doesn't > affect the result.) > > The MethodsList object has a slot containing a named list of methods, with > the names being those of the classes that appeared in the setMethod calls, > in the order that the calls occurred(note!). All classes are essentially > equal from the view of the generic function, so there's no ordering favoring > the "more relevant". > > When method dispatch occurs, the code first looks for an exact match to the > class of the actual argument--that's a quick search in the names of the > list. > > If the direct search fails, the code now looks for an inherited method. The > key point is that this second search is "greedy"--the first inherited method > found is used. > > NOW it should be clear why the order of the setMethod() calls matters. You > have two potential inherited methods here for "matrix"; namely, "array" and > "vector". WE know that the "array" method is `closer', and the R dispatcher > could decide this also, if it were willing to look through all possible > inheritance paths and drop one possibility if a better one was found. > > It currently doesn't do any further search, and doing so would be a modest > efficiency hit. I'm inclined to think the cost would be worth it to > eliminate unpleasant suprises like this one, but opinions may differ. (Once > found, the inherited method is stored directly in the list used for the > first lookup, so the hit is only the first time a particular signature > turns up.) > > To make the difference clearer, I added to your example another generic > "bar" with the same methods as "foo", but with the order of the setMethod > calls reversed. > By looking at the "methods" slot in the two cases, we can see why the bad > ("vector") method is selected for bar() but not for foo(): > > > names(getMethods("foo")@methods) > [1] "array" "vector" > > names(getMethods("bar")@methods) > [1] "vector" "array" > > After running foo(1:10) and trying to run bar(1:10): > > > showMethods("foo") > > Function "foo": > A = "array" > A = "vector" > A = "integer" > (inherited from A = "vector") > A = "matrix" > (inherited from A = "array") > > showMethods("bar") > > Function "bar": > A = "vector" > A = "array" > A = "integer" > (inherited from A = "vector") > A = "matrix" > (inherited from A = "vector") > > But including setMethod("bar", "matrix", ...) in the source code makes bar() > work fine. > > > > Paul Roebuck wrote: > Sorry to bother but could you shed some light on this?I don't understand> why order of setMethod calls makesany difference. Since it obviously does,> it has shakenthe foundations of what I thought I understood about S4> methods. Even Gabor was surprised...---------- Forwarded message> ----------Date: Wed, 12 Apr 2006 18:24:46 -0400 From: Gabor Grothendieck> <ggrothendieck at gmail.com> > To: Paul Roebuck <roebuck at mdanderson.org> Cc: R Help Mailing List > <r-help at stat.math.ethz.ch> Subject: Re: [R] S4 method dispatch matrixOrArray > On 4/12/06, Paul Roebuck <roebuck at mdanderson.org> wrote: > > > On Wed, 12 Apr 2006, Gabor Grothendieck wrote:> On 4/12/06, Paul Roebuck <roebuck at mdanderson.org> wrote:> I have some code where the primary dispatching is onother parameters so I'd> like not to have to create aset of functions for "matrix" and another> duplicateset for "array". But the class union technique isn't working as> implemented below and I don't have my Greenbook with me. How do I fix my> infinite recursion > problem?##-------------------------------------------------------- library(methods) setGeneric("foo",> function(A, ...) {cat("generic", match.call()[[1]], "\n")> standardGeneric("foo")}) setMethod("foo", signature(A = "vector"),> function(A, ...) {callGeneric(matrix(A, nrow = 1), ...)> })setClassUnion("matrixOrArray", c("matrix", "array")) setMethod("foo",> signature(A = "matrixOrArray"),function(A, ...) { cat("A =", A, "\n")> })## Test foo(1:4) foo(matrix(1:4, 1, 4)) foo(array(1:4, c(1, 4, 1)))> I think its good enough to just define an array method, i.e. youdon't need> the matrix method or the matrixOrArray class, and thevector method can call> foo(matrix(A,1), ...) so:setGeneric("foo", function(A, ...) {> cat("generic", match.call()[[1]], "\n")standardGeneric("foo")> })setMethod("foo", signature(A = "array"), function(A, ...) { cat("A> =", A, "\n")}) setMethod("foo", signature(A = "vector"), function(A,> ...) {foo(matrix(A, nrow = 1), ...) })> Something didn't seem right here. That was pretty closeto what I had> started with, before trying to go theclassUnion route. Matter of fact, the> vector method canretain use of callGeneric. The solution has to do with> the order in which calls tosetMethod are made. Adding foo-vector after> foo-arrayworks fine; the other way around causes infinite recursion.> This is surprising. I would have thought that theparent/child relationships> determine the order thatdispatched methods are invoked, not the order> thatthe setMethod commands are issued in.>
Thanks for your in-depth explanation. I had noticed the difference in order in showMethods() output but was unsure whether that was indicative of the problem or if I was somehow taking advantage of an undocumented implementation-specific detail. If I could, I'd like to go back to the original question from the R-help post here. The original method was doing its real dispatching on the arguments of the example that were represented by the dots. The meat of the method was to manipulate array objects and as such I didn't want to repeat numerous array methods as matrix methods (code duplication with the only difference being the first argument). From another recent posting on a similar topic, I wanted to do something along the lines of setClassUnion("matrixOrArray", c("matrix", "array")) but that didn't work as originally presented (probably due to your comment on matrix/array weirdness). So my other question would be: 3 - Is there a way to use the class union such that I don't have to duplicate the code unnecessarily yet avoid the setMethod ordering issue and clarify the original intention? On Thu, 13 Apr 2006, John Chambers wrote:> You have two questions, presumably: > > 1 - how could the order of the setMethod calls make a difference in the > results? > 2 - what's causing the infinite loop & how could it be avoided, reliably? > > Second question first. The danger sign is the "vector" method: > > setMethod("foo", > signature(A = "vector"), > function(A, ...) { > foo(matrix(A, nrow = 1), ...) > }) > > This handles a vector by recalling the generic with a matrix. But > "vector" is a superclass of "matrix" (see getClass("matrix")) and you > DON'T have an explicit method defined for "matrix". So for sure if > "vector" was the only method, you would be in immediate trouble. > Coercing down the inheritance tree is potentially bad news. > > Generally, if you're going to take over dispatch by coercing an argument > & then recalling the generic, the result is simplest if you have an > exact match for the new call, not relying on inheritance. After all, if > you're doing the work to coerce the argument, might as well pick one > that works right away. Not a universal rule, but other things being > equal ... > > There's an additional issue with methods for "matrix" and "array" > because R never allows 2-way arrays to have class "array", which means > using array() here with the same dimensions would not have helped. > Also, matrix and array objects are weird in that they are not basic > vectors but also have no class attribute, and is.object() is FALSE for them. > > More interesting though--how can the order of the setMethod() calls > matter? To see that we need to look at the methods list object. (The > discussion is simplified to the case that only one argument is involved, > which doesn't affect the result.) > > The MethodsList object has a slot containing a named list of methods, > with the names being those of the classes that appeared in the setMethod > calls, in the order that the calls occurred(note!). All classes are > essentially equal from the view of the generic function, so there's no > ordering favoring the "more relevant". > > When method dispatch occurs, the code first looks for an exact match to > the class of the actual argument--that's a quick search in the names of > the list. > > If the direct search fails, the code now looks for an inherited method. > The key point is that this second search is "greedy"--the first > inherited method found is used. > > NOW it should be clear why the order of the setMethod() calls matters. > You have two potential inherited methods here for "matrix"; namely, > "array" and "vector". WE know that the "array" method is `closer', and > the R dispatcher could decide this also, if it were willing to look > through all possible inheritance paths and drop one possibility if a > better one was found. > > It currently doesn't do any further search, and doing so would be a > modest efficiency hit. I'm inclined to think the cost would be worth it > to eliminate unpleasant suprises like this one, but opinions may > differ. (Once found, the inherited method is stored directly in the > list used for the first lookup, so the hit is only the first time a > particular signature turns up.) > > To make the difference clearer, I added to your example another generic > "bar" with the same methods as "foo", but with the order of the > setMethod calls reversed. > By looking at the "methods" slot in the two cases, we can see why the > bad ("vector") method is selected for bar() but not for foo(): > > > names(getMethods("foo")@methods) > [1] "array" "vector" > > names(getMethods("bar")@methods) > [1] "vector" "array" > > After running foo(1:10) and trying to run bar(1:10): > > > showMethods("foo") > > Function "foo": > A = "array" > A = "vector" > A = "integer" > (inherited from A = "vector") > A = "matrix" > (inherited from A = "array") > > showMethods("bar") > > Function "bar": > A = "vector" > A = "array" > A = "integer" > (inherited from A = "vector") > A = "matrix" > (inherited from A = "vector") > > But including setMethod("bar", "matrix", ...) in the source code makes > bar() work fine. > > > > Paul Roebuck wrote: > > >Sorry to bother but could you shed some light on this? > >I don't understand why order of setMethod calls makes > >any difference. Since it obviously does, it has shaken > >the foundations of what I thought I understood about > >S4 methods. Even Gabor was surprised... > > > > > >---------- Forwarded message ---------- > >Date: Wed, 12 Apr 2006 18:24:46 -0400 > >From: Gabor Grothendieck <ggrothendieck at gmail.com> > >To: Paul Roebuck <roebuck at mdanderson.org> > >Cc: R Help Mailing List <r-help at stat.math.ethz.ch> > >Subject: Re: [R] S4 method dispatch matrixOrArray > > > >On 4/12/06, Paul Roebuck <roebuck at mdanderson.org> wrote: > > > > > > > >>On Wed, 12 Apr 2006, Gabor Grothendieck wrote: > >> > >> > >> > >>>On 4/12/06, Paul Roebuck <roebuck at mdanderson.org> wrote: > >>> > >>> > >>> > >>>>I have some code where the primary dispatching is on > >>>>other parameters so I'd like not to have to create a > >>>>set of functions for "matrix" and another duplicate > >>>>set for "array". But the class union technique isn't > >>>>working as implemented below and I don't have my Green > >>>>book with me. How do I fix my infinite recursion problem? > >>>> > >>>> > >>>>##-------------------------------------------------------- > >>>>library(methods) > >>>> > >>>>setGeneric("foo", > >>>> function(A, ...) { > >>>> cat("generic", match.call()[[1]], "\n") > >>>> standardGeneric("foo") > >>>> }) > >>>> > >>>>setMethod("foo", > >>>> signature(A = "vector"), > >>>> function(A, ...) { > >>>> callGeneric(matrix(A, nrow = 1), ...) > >>>> }) > >>>> > >>>>setClassUnion("matrixOrArray", c("matrix", "array")) > >>>> > >>>>setMethod("foo", > >>>> signature(A = "matrixOrArray"), > >>>> function(A, ...) { > >>>> cat("A =", A, "\n") > >>>> }) > >>>> > >>>>## Test > >>>>foo(1:4) > >>>>foo(matrix(1:4, 1, 4)) > >>>>foo(array(1:4, c(1, 4, 1))) > >>>> > >>>> > >>>I think its good enough to just define an array method, i.e. you > >>>don't need the matrix method or the matrixOrArray class, and the > >>>vector method can call foo(matrix(A,1), ...) so: > >>> > >>>setGeneric("foo", > >>> function(A, ...) { > >>> cat("generic", match.call()[[1]], "\n") > >>> standardGeneric("foo") > >>> }) > >>> > >>>setMethod("foo", > >>> signature(A = "array"), > >>> function(A, ...) { > >>> cat("A =", A, "\n") > >>> }) > >>> > >>>setMethod("foo", > >>> signature(A = "vector"), > >>> function(A, ...) { > >>> foo(matrix(A, nrow = 1), ...) > >>> }) > >>> > >>> > >>Something didn't seem right here. That was pretty close > >>to what I had started with, before trying to go the > >>classUnion route. Matter of fact, the vector method can > >>retain use of callGeneric. > >> > >>The solution has to do with the order in which calls to > >>setMethod are made. Adding foo-vector after foo-array > >>works fine; the other way around causes infinite recursion. > >> > >> > > > >This is surprising. I would have thought that the > >parent/child relationships determine the order that > >dispatched methods are invoked, not the order that > >the setMethod commands are issued in. > > > > > > > > >---------------------------------------------------------- SIGSIG -- signature too long (core dumped)