Martin Maechler
2007-Mar-20 22:19 UTC
[Rd] cbind() & rbind() for S4 objects -- 'Matrix' package changes
As some of you may have seen / heard in the past, it is not possible to make cbind() and rbind() into proper S4 generic functions, since their first formal argument is '...'. [ BTW: S3-methods for these of course only dispatch on the first argument which is also not really satisfactory in the context of many possible matrix classes.] For this reason, after quite some discussion on R-core (and maybe a bit on R-devel) about the options, since R-2.2.0 we have had S4 generic functions cbind2() and rbind2() (and default methods) in R's "methods" which are a version of cbind() and rbind() respectively for two arguments (x,y) {and fixed 'deparse.level = 0' : the argument names are 'x' and 'y' and hence don't make sense to be used to construct column-names or row-names for rbind(), respectively.} We have been defining methods for cbind2() and rbind2() for the 'Matrix' classes in late summer 2005 as well. So far so good. In addition, [see also help(cbind2) ], we have defined cbind() and rbind() functions which recursively call cbind2() and rbind2(), more or less following John Chambers proposal of dealing with such "(...)" argument functions. These new recursively defined cbind() / rbind() functions however have typically remained invisible in the methods package [you can see them via methods:::cbind or methods:::rbind ] and have been ``activated'' --- replacing base::cbind / rbind --- only via an explicit or implicit call to methods:::bind_activation(TRUE) One reason I didn't dare to make them the default was that I noticed they didn't behave identically to cbind() / rbind() in all cases, though IIRC the rare difference was only in the dimnames returned; further, being entirely written in R, and recursive, they were slower than the mostly C-based fast cbind() / rbind() functions. As some Bioconductor developers have recently found, these versions of cbind() and rbind() that have been automagically activated by loading the Matrix package can have a detrimental effect in some extreme cases, e.g. when using do.call(cbind, list_of_length_1000) because of the recursion and the many many calls to the S4 generic, each time searching for method dispatch ... For the bioconductor applications and potentially for others using cbind() / rbind() extensively, this can lead to unacceptable performance loss just because loading 'Matrix' currently calls methods:::bind_activation(TRUE) For this reason, we plan to refrain from doing this activation on loading of Matrix, but propose to 1) define and export cBind <- methods:::cbind rBind <- methods:::cbind also do this for R-2.5.0 so that other useRs / packages can start cBind() / rBind() in their code when they want to have something that can become properly object-oriented Possibly --- and this is the big RFC (request for comments) --- 2) __ for 'Matrix' only __ also define and export cbind <- methods:::cbind rbind <- methods:::cbind I currently see the possibilities of doing either '1)' or '1) and 2)' or less likely '2) alone' and like to get your feedback on this. "1)" alone would have the considerable drawback for current Matrix useRs that their code / scripts which has been using cbind() and rbind() for "Matrix" (and "matrix" and "numeric") objects no longer works, but needs to be changed to use rBind() and cBind() *instead* As soon as "2)" is done (in conjunction with "1)" or not), those who need a very fast but non-OO version of cbind() / rbind() need to call base::cbind() or base::rbind() explicitly. This however would not be necessary for packages with a NAMESPACE since these import 'base' automagically and hence would use base::cbind() automagically {unless they also import(Matrix)}. We are quite interested in your feedback! Martin Maechler and Doug Bates <Matrix-authors at R-project.org>
Luke Tierney
2007-Mar-21 01:27 UTC
[Rd] cbind() & rbind() for S4 objects -- 'Matrix' package changes
On Tue, 20 Mar 2007, Martin Maechler wrote:> As some of you may have seen / heard in the past, > it is not possible to make cbind() and rbind() into proper S4 > generic functions, since their first formal argument is '...'. > [ BTW: S3-methods for these of course only dispatch on the first > argument which is also not really satisfactory in the context > of many possible matrix classes.] > > For this reason, after quite some discussion on R-core (and > maybe a bit on R-devel) about the options, since R-2.2.0 we have > had S4 generic functions cbind2() and rbind2() (and default methods) > in R's "methods" which are a version of cbind() and > rbind() respectively for two arguments (x,y) > {and fixed 'deparse.level = 0' : the argument names are 'x' and 'y' and > hence don't make sense to be used to construct column-names or > row-names for rbind(), respectively.} > > We have been defining methods for cbind2() and rbind2() > for the 'Matrix' classes in late summer 2005 as well. So far so > good. > > In addition, [see also help(cbind2) ], > we have defined cbind() and rbind() functions which recursively call > cbind2() and rbind2(), more or less following John Chambers > proposal of dealing with such "(...)" argument functions. > These new recursively defined cbind() / rbind() functions > however have typically remained invisible in the methods package > [you can see them via methods:::cbind or methods:::rbind ] > and have been ``activated'' --- replacing base::cbind / rbind --- > only via an explicit or implicit call to > methods:::bind_activation(TRUE) > > One reason I didn't dare to make them the default was that I > noticed they didn't behave identically to cbind() / rbind() in > all cases, though IIRC the rare difference was only in the dimnames > returned; further, being entirely written in R, and recursive, > they were slower than the mostly C-based fast cbind() / rbind() > functions. > > As some Bioconductor developers have recently found, > these versions of cbind() and rbind() that have been > automagically activated by loading the Matrix package > can have a detrimental effect in some extreme cases, > e.g. when using > do.call(cbind, list_of_length_1000) > because of the recursion and the many many calls to the S4 > generic, each time searching for method dispatch ... > For the bioconductor applications and potentially for others using cbind() / > rbind() extensively, this can lead to unacceptable performance > loss just because loading 'Matrix' currently calls > methods:::bind_activation(TRUE)The recursion part is potentially problematic because stack space limitations will cause this to fail for "relatively" short list_of_length_1000, but that should be easily curable by rewriting methods:::cbind and methods:::rbind to use iteration rather than recursion. This might also help a little with efficiency by avoiding call overhead. It would be interesting to know how much of the performance hit is dispatch overhead and how much closure call overhead. If it's dispatch overhead then it may be worth figuring out some way of handling this with internal dispatch at the C level (at the cost of maintaining the C level stuff). My initial reaction to scanning the methods:::cbind code is that it is doing too much, at least too much R-level work, but I haven't thought it through carefully.> For this reason, we plan to refrain from doing this activation > on loading of Matrix, but propose to > > 1) define and export > cBind <- methods:::cbind > rBind <- methods:::cbind > > also do this for R-2.5.0 so that other useRs / packages > can start cBind() / rBind() in their code when they want to > have something that can become properly object-orientedIn mackage methods?> Possibly --- and this is the big RFC (request for comments) --- > > 2) __ for 'Matrix' only __ also > define and export > cbind <- methods:::cbind > rbind <- methods:::cbind > > I currently see the possibilities of doing > either '1)' > or '1) and 2)' > or less likely '2) alone' > > and like to get your feedback on this. > > "1)" alone would have the considerable drawback for current > Matrix useRs that their code / scripts which has been using > cbind() and rbind() for "Matrix" (and "matrix" and "numeric") > objects no longer works, but needs to be changed to use > rBind() and cBind() *instead* > > As soon as "2)" is done (in conjunction with "1)" or not), > those who need a very fast but non-OO version of cbind() / rbind() > need to call base::cbind() or base::rbind() explicitly. > This however would not be necessary for packages with a NAMESPACE > since these import 'base' automagically and hence would use > base::cbind() automagically {unless they also import(Matrix)}. > > We are quite interested in your feedback!Either one seems cleaner to me than having loading of one package result in mucking about in the internals of another. If we are thinking of these as long term solutions then I think having different names is cleaner, so 1) but not 2). If we are thinking of this as a transition towards making base::cbind and base::rbind support S4 dispatch via cbind2/rbind2 (assuming this can be done efficiently) then there may be some merit to 2) to minimize the need for code rewriting. It might be worth experimenting with having .Internal(cbind(...)) check its arguments and call methods:::cbind if (Methods is loaded and) any of the arguments are S4 -- as the S4 property is now cheap to determine that may be very low cost especially if done after the object bits have been checked with positive result. Best, luke> Martin Maechler and Doug Bates <Matrix-authors at R-project.org> > > ______________________________________________ > R-devel at r-project.org mailing list > stat.ethz.ch/mailman/listinfo/r-devel >-- Luke Tierney Chair, Statistics and Actuarial Science Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke at stat.uiowa.edu Iowa City, IA 52242 WWW: stat.uiowa.edu
Possibly Parallel Threads
- Proper way to define cbind, rbind for s4 classes in package
- Proper way to define cbind, rbind for s4 classes in package
- Proper way to define cbind, rbind for s4 classes in package
- Proper way to define cbind, rbind for s4 classes in package
- Proper way to define cbind, rbind for s4 classes in package