Dear R experts,

please excuse me for writing to the mailing list without subscribing. I have a somewhat urgent problem that relates to R.

I have to process large amounts of data with R - I'm in an international collaboration and the data processing protocol is fixed, that is, a specific set of R commands has to be used.

I wrote a Perl program that manages the creation of data subsets from my database and feeds these subsets to an R process via pipes.

This worked all right; however, I wanted to speed things up by exploiting the fact that I have a dual-core machine. So I rewrote my Perl driver program to use two threads, each starting its own R instance, getting data off a queue and feeding it to its R process.

This also worked, except that I noticed something very peculiar: the processing time was almost exactly the same in both cases. I ran some tests, and it seems that R needs twice the time to do the exact same thing when two instances of it are running.

I don't understand how this is possible. Maybe there is a thread-safety issue with the R backend, meaning that the two R *interpreter* instances are talking to the same backend, which is capable of processing only one thing at a time?

Technical details: the OS was Ubuntu 9.04 running on a Core2Duo E7300, and the R version used was the default one from the Ubuntu repository.

Please see http://www.perlmonks.org/?node_id=792460 for an extended discussion of the problem, and especially http://www.perlmonks.org/?node_id=793506 for excerpts of output and actual code.

Thanks in advance for your answers,
Péter Juhász
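A quick way to check whether two concurrent processes really take twice as long is to time them directly from the shell, outside the Perl driver. This is only a sketch: `burn` is a CPU-bound stand-in for one R job; in the real test you would replace its body with the actual command, e.g. something like `R --vanilla --slave < subset.R`.

```shell
#!/bin/sh
# CPU-bound stand-in for a single R job (replace with the real command).
burn() {
    i=0
    while [ "$i" -lt 200000 ]; do i=$((i+1)); done
}

# One process alone.
t0=$(date +%s); burn; t1=$(date +%s)
echo "single: $((t1 - t0))s"

# Two processes in parallel; with two free cores the wall-clock time
# should stay roughly the same as the single run, not double.
t0=$(date +%s); burn & burn & wait; t1=$(date +%s)
echo "parallel: $((t1 - t0))s"
```

If the "parallel" wall-clock time is about double the "single" one, the two processes are serializing on something shared (a lock, memory bandwidth, or a single busy core) rather than running independently.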
If I look at your output, the single thread is using almost 100% of the two CPUs (3:49 real, 5:49 user, or something close to that). For the two-thread case it is close to the same, with the user time now something like 6:15. I would like to see what the contribution of each of the processes is. Put some proc.time() calls in the R script to see what it is using.

Sent from my iPhone

On Sep 5, 2009, at 7:21, Peter Juhasz <peter.juhasz83 at gmail.com> wrote:

> [quoted message snipped]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
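The per-process timing suggested above can be added with a pair of proc.time() calls. A minimal sketch, with the protocol's own commands elided:

```r
## Record CPU and wall-clock time consumed by this R process.
t0 <- proc.time()

## ... the fixed set of protocol commands goes here ...

dt <- proc.time() - t0
cat("user:",    dt["user.self"],
    "system:",  dt["sys.self"],
    "elapsed:", dt["elapsed"], "\n")
```

Comparing "user" (CPU time this process actually consumed) against "elapsed" (wall-clock time) in each of the two instances shows whether the slowdown is genuine CPU contention or time spent waiting.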
Why don't you instead explore the packages 'multicore' or 'snow' + 'snowfall' (using sockets)?

Ciao!
mario

Peter Juhasz wrote:

> [quoted message snipped]

--
Ing. Mario Valle
Data Analysis and Visualization Group            | http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS)      | Tel: +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82
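For reference, a sketch of the 'multicore' approach suggested above: a single R session forks one worker per core, instead of two separate R processes driven from Perl. The data-splitting below is purely illustrative; `my_data`, `group`, and `process_subset` are hypothetical names, not part of the original protocol.

```r
## Sketch only: parallel apply over data subsets with 'multicore'.
## (In current R, the same mclapply() API lives in the base
## 'parallel' package.)
library(multicore)

## Hypothetical placeholder for the fixed protocol's R commands.
process_subset <- function(d) summary(d)

## Split a hypothetical data frame into per-group subsets and
## process them on two cores at once.
subsets <- split(my_data, my_data$group)
results <- mclapply(subsets, process_subset, mc.cores = 2)
```

This sidesteps the Perl threading layer entirely: the subsets are distributed and the results collected inside R itself.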