asmahani
2011-Jul-13 13:28 UTC
[Rd] Performance of .C and .Call functions vs. native R code
Hello,

I am in the process of writing an R extension for parallelized MCMC that makes heavy use of compiled code (C++). To get my feet wet, I implemented a simple matrix-vector multiplication function in C++ (which calls the BLAS level 2 function dgemv) and compared it to the '%*%' operator in R (which apparently calls the BLAS level 3 function dgemm).

Interestingly, I cannot replicate the performance of the native R operator using either '.C' or '.Call'. The relative times are 17 (R), 30 (.C), and 26 (.Call); in other words, the native R operator is at least 1.5x faster than my compiled code. Can you explain why this is? My testing leads me to strongly suspect that the BLAS function itself does not account for the bulk of the time, and that data transfer and other overhead associated with the calls (.C and .Call) are the main issues. Is there any way to reach the performance of native R code in this case?

Thank you,
Alireza Mahani
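No code was posted in the thread, so the following is only a minimal sketch of the kernel under discussion, not the poster's implementation. It computes y = A x for an m-by-n matrix stored in column-major order (the layout R uses for matrices), which is the quantity dgemv computes with trans = 'N', alpha = 1 and beta = 0. The name matvec_colmajor is hypothetical, and the plain loop is an untuned stand-in for a real BLAS, which would add blocking and vectorization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: y = A * x for an m-by-n matrix A stored in
// column-major order (as R stores matrices). Computes the same quantity
// as BLAS dgemv with trans = 'N', alpha = 1, beta = 0, but is untuned.
std::vector<double> matvec_colmajor(const std::vector<double>& a,
                                    const std::vector<double>& x,
                                    std::size_t m, std::size_t n) {
    assert(a.size() == m * n && x.size() == n);
    std::vector<double> y(m, 0.0);
    // Process A one column at a time so memory is read contiguously:
    // y += x[j] * (column j of A).
    for (std::size_t j = 0; j < n; ++j) {
        const double xj = x[j];
        const double* col = a.data() + j * m;
        for (std::size_t i = 0; i < m; ++i) {
            y[i] += col[i] * xj;
        }
    }
    return y;
}
```

For example, the 3-by-2 matrix with columns (1, 3, 5) and (2, 4, 6) is stored as {1, 3, 5, 2, 4, 6}; multiplying it by x = {1, 1} gives {3, 7, 11}. Because a matrix-vector product reads each of the m*n entries of A once for roughly 2mn floating-point operations, it is usually limited by memory bandwidth. An extra copy of A, such as the argument duplication .C performs by default (DUP = TRUE in R releases of this period), adds memory traffic of the same order as the product itself, which is consistent with, though it does not prove, the poster's suspicion about call overhead.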
Jeff Ryan
2011-Jul-14 12:12 UTC
[Rd] Performance of .C and .Call functions vs. native R code
The .Call overhead isn't the issue. If you'd like some insight into what you are doing wrong (and right), you need to provide code the list can use to reproduce your timings. This is outlined in the posting guide as well.

Best,
Jeff
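As one way to act on the request for reproducible timings, a small C++ harness can separate the cost of moving data from the cost of the kernel. This is a hedged sketch, not code from the thread: median_time_us and copy_cost_us are hypothetical helpers, and copying a buffer is used as an assumed stand-in for the argument duplication that .C performs by default. On the R side, the comparable step would be wrapping each call in system.time() over many repetitions.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs fn() `reps` times and returns the median wall-clock time in
// microseconds (the upper median when reps is even). The median is less
// sensitive than a single run to cache warm-up and scheduler noise.
template <typename Fn>
double median_time_us(Fn&& fn, std::size_t reps) {
    assert(reps > 0);
    std::vector<double> samples;
    samples.reserve(reps);
    for (std::size_t r = 0; r < reps; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    // nth_element moves the median into the middle slot in linear time.
    auto mid = samples.begin() +
               static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Hypothetical helper: median time to duplicate an n-by-n matrix of
// doubles, used here as a stand-in for copying an argument before a call.
double copy_cost_us(std::size_t n, std::size_t reps) {
    const std::vector<double> a(n * n, 1.0);
    std::vector<double> dup;
    return median_time_us([&] { dup = a; }, reps);
}
```

Timing the copy and the product separately, and reporting medians over many runs, makes it easier for others on the list to tell whether a gap comes from BLAS itself or from marshalling data across the R/C++ boundary.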