Ramon Diaz
2003-Apr-01 10:02 UTC
[R] R function calling: efficiency of different alternatives
Dear all,

I have a piece of code, call it "FA", that will be called thousands of times in a typical run of function "FB". I can:

a) define FA as a function outside of FB (in the global environment), and call it;
b) define FA as a function inside the body of FB and call it;
c) "expand inline" FA inside FB.

FA mainly does data frame subsetting, runs svd's, and calls compiled C++ code.

I think I recall reading something about differences in efficiency between those three approaches, but I can't find the information (I've searched the email archives, Venables & Ripley's "S Programming" and MASS, the R manuals, and Burns's "S Poetry").

Are there any real performance differences?

(I'd personally prefer b), since these functions will hopefully become a contributed package and FA is not supposed to be used directly by an end user. I am aware of the scoping differences between the three approaches, but that is not my main concern now.)

Thanks,

Ramón

--
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
http://bioinfo.cnio.es/~rdiaz
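For concreteness, the three alternatives look roughly like this in R. FA is reduced to a trivial stub here (the real FA subsets data frames, runs svd's, and calls C++ code), and the names FB.a, FB.b, FB.c and the argument x are made up for illustration:

    # a) FA in the global environment; FB just calls it
    FA <- function(d) svd(as.matrix(d))$d      # stand-in for the real FA
    FB.a <- function(x, n) {
      for (i in 1:n) out <- FA(x)
      out
    }

    # b) FA defined inside the body of FB
    FB.b <- function(x, n) {
      FA <- function(d) svd(as.matrix(d))$d    # re-created on every call to FB.b
      for (i in 1:n) out <- FA(x)
      out
    }

    # c) FA "expanded inline" inside FB
    FB.c <- function(x, n) {
      for (i in 1:n) out <- svd(as.matrix(x))$d
      out
    }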
Barry Rowlingson
2003-Apr-01 11:29 UTC
[R] R function calling: efficiency of different alternatives
Ramon Diaz wrote:

> I think I recall reading something about differences in efficiency
> between those three approaches, but I can't find the information (I've
> searched the email archives, Venables & Ripley's "S Programming" and
> MASS, the R manuals, and Burns's "S Poetry").
>
> Are there any real performance differences?

I've always worked on the understanding that you should get code working and then profile it before you think about optimising it. There's no point worrying about the efficiency of function-calling strategies if your program is going to spend 99% of its time doing those matrix operations and svd calculations.

Do you think the overhead of looking up a function in a few tables, passing some parameters, and returning is going to overwhelm the time taken by all the other calculations? That seems unlikely to me.

Get it working one way or another - preferably with attention to other aspects of software engineering (modularity, reusability, etc.) - and then, if efficiency is a problem, profile the code to see where it is spending its time. If it really is in the function-calling mechanism, and not the guts of the code, then worry.

Barry
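In R, that profile-first advice comes down to a few calls; a minimal sketch, reusing the FB.a/FB.b/FB.c stubs above with a made-up test matrix x and output file FB.out:

    x <- matrix(rnorm(100 * 10), 100, 10)   # made-up test data

    Rprof("FB.out")              # start R's sampling profiler
    invisible(FB.a(x, 10000))    # run the code under study
    Rprof(NULL)                  # stop profiling
    summaryRprof("FB.out")       # time per function, self and total

    # quick overall timings of the three calling styles
    system.time(FB.a(x, 10000))
    system.time(FB.b(x, 10000))
    system.time(FB.c(x, 10000))

If summaryRprof shows nearly all the time inside svd and the C++ routines, the choice among a), b), and c) is moot.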
Douglas Bates
2003-Apr-01 14:12 UTC
[R] R function calling: efficiency of different alternatives
Ramon Diaz <rdiaz at cnio.es> writes:

> Dear all,
>
> I have a piece of code, call it "FA", that will be called thousands of
> times in a typical run of function "FB". I can:
>
> a) define FA as a function outside of FB (in the global environment),
> and call it;
> b) define FA as a function inside the body of FB and call it;
> c) "expand inline" FA inside FB.
>
> FA mainly does data frame subsetting, runs svd's, and calls compiled
> C++ code.
>
> I think I recall reading something about differences in efficiency
> between those three approaches, but I can't find the information (I've
> searched the email archives, Venables & Ripley's "S Programming" and
> MASS, the R manuals, and Burns's "S Poetry").
>
> Are there any real performance differences?
>
> (I'd personally prefer b), since these functions will hopefully become
> a contributed package and FA is not supposed to be used directly by an
> end user. I am aware of the scoping differences between the three
> approaches, but that is not my main concern now.)

After R-1.7.0 is released you can use a namespace to make FA local to your package. As Luke Tierney said in his presentation at DSC-2003, people have in the past written functions in the style of b) primarily to keep the names of local utility functions hidden. Style a) combined with a namespace is a better way of accomplishing this.

I agree with what Barry Rowlingson wrote in his reply. The best approach is to write the code in the most convenient and understandable way, then profile the execution of the code to see where the bottlenecks really are.
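A minimal sketch of that arrangement, with placeholder file names and stub bodies; the point is only that both functions sit at top level (style a) while the NAMESPACE file exports just FB:

    # R/funcs.R in the package: both functions at top level
    FA <- function(d) svd(as.matrix(d))$d   # internal helper, not part of the API
    FB <- function(x) FA(x)                 # the function users actually call

    # NAMESPACE file of the package: export only FB; FA stays hidden
    export(FB)

With this layout, FB finds FA through the package's namespace, but users of the package see only FB.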