R Developers,

Could someone help explain what it means that R is single threaded? I am trying to understand what is actually going on inside R when users want to parallelize code. For example, using mclapply or foreach (with some backend) somehow allows users to benefit from multiple CPUs.

Similarly, there is the RcppParallel package for RMatrix/RVector objects. But none of these address the general XPtr objects in Rcpp. Some readers here may recognize my question on SO (http://stackoverflow.com/questions/37167479/rcpp-parallelize-functions-that-return-xptr), where I was curious about parallel calls to C++/Rcpp functions that return XPtr objects. I am being a little more persistent here as this limitation provides a very hard stop on the development of one of my packages that heavily uses XPtr objects. It's not meant to be a criticism or intended to be rude; I just want to fully understand.

I am willing to accept that it may be impossible currently, but I want to at least understand why it is impossible so I can explain to future users why parallel functionality is not available. Which just echoes my original question: what does it mean that R is single threaded?

Kind Regards,
Charles
Charles,

1. Perhaps this question is better directed at the R-help or R-package-devel mailing list.

2. It basically means that R itself can only evaluate one R expression at a time.

The parallel package circumvents this by starting multiple R sessions and dividing the workload among them.

Compiled code called by R (such as C++ code through Rcpp or C code through base R's interface) can execute multi-threaded code for internal purposes, using e.g. OpenMP. A limitation is that compiled code cannot (in many cases) call R's C API from multiple threads. For example, it is not thread-safe to create R variables from multiple threads running in C. (R's variable administration is such that the order of (un)making them from compiled code matters.)

I am not very savvy on Rcpp or XPtr objects, but it appears that Dirk provided answers about that in your SO question.

Best,
Mark

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
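One way to picture Mark's remark that the order of (un)making variables matters is a toy LIFO protection stack, loosely modeled on R's PROTECT/UNPROTECT. The sketch below is only an illustration: ProtectStack and both run_* functions are hypothetical names, it is not R's implementation, and the "threads" are replayed by hand in a fixed order so that the outcome is deterministic.

```cpp
// Toy model of a LIFO protection stack, loosely inspired by R's
// PROTECT/UNPROTECT. Hypothetical names; this is not R's implementation.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ProtectStack {
    std::vector<std::string> items;  // names of currently protected objects

    void protect(const std::string& name) { items.push_back(name); }

    // Drops the n most recently protected objects, whoever protected them.
    void unprotect(std::size_t n) {
        assert(n <= items.size());
        items.resize(items.size() - n);
    }
};

// One caller at a time: every unprotect removes the caller's own object.
std::vector<std::string> run_sequential() {
    ProtectStack s;
    s.protect("a1");   // caller A
    s.unprotect(1);    // A releases a1
    s.protect("b1");   // caller B
    s.protect("b2");
    return s.items;    // b1 and b2 are both still protected
}

// The same calls in an order two unsynchronized threads could produce.
std::vector<std::string> run_interleaved() {
    ProtectStack s;
    s.protect("a1");   // thread A
    s.protect("b1");   // thread B
    s.protect("b2");   // thread B
    s.unprotect(1);    // thread A means a1, but b2 is on top of the stack
    return s.items;    // a1 is still protected, b2 has lost its protection
}
```

In the interleaved replay, thread B ends up with an object it believes is protected but is not, which is the kind of silent corruption a single-threaded design never has to guard against.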
On 12/05/2016 8:45 AM, Charles Determan wrote:
> R Developers,
>
> Could someone help explain what it means that R is single threaded? I am
> trying to understand what is actually going on inside R when users want to
> parallelize code. For example, using mclapply or foreach (with some
> backend) somehow allows users to benefit from multiple CPUs.

I don't know what document you are quoting when you say "R is single threaded", but one possible meaning is that most base R calculations are done in a single thread. When you do vectorized calculations like x+y for long vectors x and y, they are done internally as loops over the entries. On Windows, there are two threads when running Rterm, with one to maintain the display, since otherwise the plot display couldn't update while R is waiting for input.

The mclapply function in the parallel package forks the process to do its calculations. Other packages can do other variations on parallel computations.

I can't help you with the rest of your question; I don't know what XPtr objects are.

Duncan Murdoch
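To make "forks the process" concrete, here is a minimal POSIX sketch of the general mechanism, assuming a Unix-like system. It is not how mclapply itself is implemented; ForkResult and run_in_child are hypothetical names. The child works on its own copy of the parent's memory, so nothing it changes is visible to the parent unless it is sent back explicitly, here as plain bytes through a pipe.

```cpp
// POSIX-only sketch (fork, pipe, waitpid); assumes a Unix-like system.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct ForkResult {
    int parent_value;  // the parent's variable after the child has exited
    int child_result;  // what the child sent back through the pipe
};

ForkResult run_in_child() {
    int value = 1;
    int fds[2];
    if (pipe(fds) != 0) return {-1, -1};

    pid_t pid = fork();
    if (pid < 0) return {-1, -1};

    if (pid == 0) {
        // Child: operates on its own copy of the parent's memory.
        close(fds[0]);
        value = 42;                 // this change is invisible to the parent
        int result = value * 2;
        ssize_t n = write(fds[1], &result, sizeof result);
        close(fds[1]);
        _exit(n == (ssize_t)sizeof result ? 0 : 1);
    }

    // Parent: a result only arrives if the child sends it explicitly.
    close(fds[1]);
    int result = -1;
    ssize_t n = read(fds[0], &result, sizeof result);
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    if (n != (ssize_t)sizeof result) result = -1;
    return {value, result};
}
```

An inference that goes beyond what is said in this thread: a raw memory address created in the child refers to the child's address space, so sending such an address back to the parent would not give the parent anything usable. That may be part of why pointer-like objects are awkward to return from forked workers.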
Thanks for the replies. Regarding the answer by Dirk, I still didn't feel that I understood why mclapply or foreach cannot handle XPtr objects. Rather than clutter the SO question with comments, I am asking here: I was getting the impression that this is a limitation inherited from R objects (an XPtr being a proxy for an R object, according to Dirk's comment). If that is not the case, I could repost this on Rcpp-devel, unless it could be migrated.

Regards,
Charles
The R language itself has features that limit how much multithreading/parallel processing can be done. There are functions with side effects, such as library(), plot(), runif(), <-, and <<-, and there are no mechanisms to isolate them.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
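The effect of a hidden side effect can be shown with a small C++ sketch. This is an analogy only, not R code: g_state stands in for something like the random number generator state that runif() updates, and all names are hypothetical. Two tasks that draw from shared hidden state give results that depend on which one runs first, so they cannot be reordered or run concurrently without changing the answer. The second pair of functions shows what isolation would look like if each task owned its state, which is the mechanism Bill says R lacks.

```cpp
#include <cstdint>

// Hidden global state, standing in for something like an RNG seed.
static std::uint32_t g_state = 1;

// One step of a linear congruential generator on the given state.
std::uint32_t lcg_step(std::uint32_t& state) {
    state = state * 1664525u + 1013904223u;  // wraps modulo 2^32
    return state;
}

// Side effect: every call reads and updates the shared g_state.
std::uint32_t draw_global() { return lcg_step(g_state) % 100; }

struct Draws {
    std::uint32_t a;  // value obtained by task A
    std::uint32_t b;  // value obtained by task B
};

// Both tasks use the hidden state, so each result depends on call order.
Draws shared_a_then_b() {
    g_state = 1;
    Draws d{};
    d.a = draw_global();
    d.b = draw_global();
    return d;
}

Draws shared_b_then_a() {
    g_state = 1;
    Draws d{};
    d.b = draw_global();
    d.a = draw_global();
    return d;
}

// Each task owns its state, so the order of the calls no longer matters.
Draws isolated_a_then_b() {
    std::uint32_t state_a = 1, state_b = 2;
    Draws d{};
    d.a = lcg_step(state_a) % 100;
    d.b = lcg_step(state_b) % 100;
    return d;
}

Draws isolated_b_then_a() {
    std::uint32_t state_a = 1, state_b = 2;
    Draws d{};
    d.b = lcg_step(state_b) % 100;
    d.a = lcg_step(state_a) % 100;
    return d;
}
```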
On 12 May 2016 at 13:11, Mark van der Loo wrote:
| Compiled code called by R (such as C++ code through Rcpp or C code through
| base R's interface) can execute multi-threaded code for internal purposes,
| using e.g. OpenMP. A limitation is that compiled code cannot call R's C API
| from multiple threads (in many cases). For example, it is not thread-safe
| to create R variables from multiple threads running in C. (R's variable
| administration is such that the order of (un)making them from compiled code
| matters).

Well put.

| I am not very savvy on Rcpp or XPtr objects, but it appears that Dirk
| provided answers about that in your SO question.

Charles seems to have hung himself up completely on a small detail, failing to see the forest for the trees.

There are (many) working examples of parallel (compiled) code with R. All of them stress (and I simplify here) that you cannot touch R objects, or call back into R, for fear of any assignment or allocation triggering an R event. R being single-threaded, it cannot do this.

My answer to this problem is to use only non-R data structures. That is what RcppParallel does in the actual parallel code portions in all examples: the types RVector and RMatrix do NOT connect back to R. There are several working examples. That is also what the OpenMP examples at the Rcpp Gallery do.

Charles seems to be replying 'but I use XPtr' or 'I use XPtr on arma::mat or Eigen::MatrixXd' and seems to forget that these are proxy objects to SEXPs. XPtr just wraps the SEXP for external pointers; Arma's and Eigen's matrices are performant via RcppArmadillo and RcppEigen because we use R memory via proxies. All of that is 'too close to R' for comfort.

So the short answer is: enter compiled code from R, set a mutex (either conceptually or explicitly), _copy_ your data into plain C++ data structures and go to town in parallel via OpenMP and other multithreaded approaches. Then collect the result, release the mutex and move back up.

I hope this helps.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
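The copy-in, compute, copy-out recipe in Dirk's short answer can be sketched in standard C++ alone. Assumptions, stated loudly: HostVector is a hypothetical stand-in for data owned by R, std::thread is swapped in for the OpenMP that Dirk mentions, and the "conceptual mutex" is simply that steps 1 and 3 run on the calling thread only. No R or Rcpp headers are used, so this shows the shape of the approach, not an Rcpp implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Stand-in for data owned by the host runtime (in the real case, an R
// vector). Hypothetical type: worker threads must never touch it.
struct HostVector {
    std::vector<double> values;
};

// Worker: sees only plain C++ data and writes to its own output slice.
void square_range(const std::vector<double>& in, std::vector<double>& out,
                  std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) out[i] = in[i] * in[i];
}

HostVector parallel_square(const HostVector& host, unsigned n_threads) {
    if (n_threads == 0) n_threads = 1;

    // 1. Copy in, on the calling thread only.
    std::vector<double> in(host.values);
    std::vector<double> out(in.size());

    // 2. Parallel section: disjoint slices, no host objects involved.
    std::vector<std::thread> workers;
    const std::size_t chunk = (in.size() + n_threads - 1) / n_threads;
    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = std::min(in.size(), t * chunk);
        const std::size_t end = std::min(in.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back(square_range, std::cref(in), std::ref(out),
                             begin, end);
    }
    for (auto& w : workers) w.join();

    // 3. Copy out, again on the calling thread only.
    HostVector result;
    result.values = out;
    return result;
}
```

The copies cost memory and time, but they are what keeps the parallel section free of anything the host runtime manages.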