Hi,

I have been trying to use the new .parallel argument with the most recent version of plyr [1] to speed up some tasks. I can run the example in the NEWS file [1], and it seems to be working correctly. However, R will only use a single core when I try to apply this same approach with ddply().

1. http://cran.r-project.org/web/packages/plyr/NEWS

Watching my CPUs, I see that in both cases only a single core is used, and they take about the same amount of time. Is there a limitation with how ddply() dispatches parallel jobs, or is this task not suitable for parallel computing?

Cheers,
Dylan


Here is an example:

library(plyr)
library(doMC)
registerDoMC(cores=2)

# example data
d <- data.frame(y=rnorm(1000), id=rep(letters[1:4], each=500))

# function that wastes some time
f <- function(x) {
  m <- vector(length=10000)
  for(i in 1:10000) {
    m[i] <- mean(sample(x$y, 100))
  }
  mean(m)
}

system.time(ddply(d, .(id), .fun=f, .parallel=FALSE))
#  user  system elapsed
# 2.740   0.016   2.766

system.time(ddply(d, .(id), .fun=f, .parallel=TRUE))
#  user  system elapsed
# 2.720   0.000   2.726

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
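One way to tell whether the lack of speedup comes from ddply()'s dispatch or from the task itself is to do the split-apply-combine step by hand. This is a minimal sketch, assuming R >= 2.14: its bundled parallel package (which did not exist when this thread was written) provides mclapply(), a fork-based lapply() similar in spirit to what doMC uses underneath. The names n_cores, pieces, res_seq, res_par and out are illustrative only. If the parallel elapsed time drops relative to the sequential one, the four per-group calls to f() parallelize fine and the problem lies in how ddply() hands them to the backend.

```r
# Hand-rolled version of ddply(d, .(id), f): split, apply in parallel, combine.
# Assumes R >= 2.14 for the bundled 'parallel' package (newer than this thread).
library(parallel)

d <- data.frame(y = rnorm(1000), id = rep(letters[1:4], each = 500))

# same time-wasting function as in the original post
f <- function(x) {
  m <- numeric(10000)
  for (i in 1:10000) {
    m[i] <- mean(sample(x$y, 100))
  }
  mean(m)
}

# forking is not available on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

pieces <- split(d, d$id)   # one data frame per level of id

# system.time() evaluates its argument in the calling environment,
# so res_seq and res_par are available afterwards
t_seq <- system.time(res_seq <- lapply(pieces, f))
t_par <- system.time(res_par <- mclapply(pieces, f, mc.cores = n_cores))

# combine into a data frame shaped like ddply's output (columns id, V1)
out <- data.frame(id = names(pieces), V1 = unlist(res_par),
                  row.names = NULL, stringsAsFactors = FALSE)
print(out)
print(c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]]))
```

Because the workers are forks of the current session, they inherit d and f without any explicit export step; that same reliance on forking is why this approach gives no speedup on Windows.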
Yes, this was a little bug that will be fixed in the next release.

Hadley

On Thu, Sep 16, 2010 at 1:11 PM, Dylan Beaudette <debeaudette at ucdavis.edu> wrote:
> Watching my CPUs I see that in both cases only a single core is used, and they
> take about the same amount of time. Is there a limitation with how ddply()
> dispatches parallel jobs, or is this task not suitable for parallel
> computing?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
On Thursday 16 September 2010, David Winsemius wrote:
> On Sep 16, 2010, at 1:11 PM, Dylan Beaudette wrote:
> > Is there a limitation with how ddply() dispatches parallel jobs, or is
> > this task not suitable for parallel computing?
>
> Was this done in a GUI? The registerDoMC help page says:
> "... registerDoMC, should not be used in a GUI environment, because
> multiple processes then share the same GUI."
>
> I, by the way, before reading the above, ran it on a Mac with the GUI
> with cores=4 and did experience a slightly decreased time. The non-GUI
> restriction may also explain why I couldn't get the multicore package
> to do anything useful when I tried it in the past.

Interesting. I did not run it from within a GUI, but rather from a Linux terminal. It is a little sad that doMC will not work when called from the GUI, as most of the users that I am currently developing a package for will be constrained to Windows.

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
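For the Windows and GUI case, a commonly used alternative to forking is a socket cluster, in which the workers are separate R processes that receive the function and the data over a connection instead of sharing the parent session. This is a sketch under stated assumptions: it swaps in the base parallel package (R >= 2.14, again newer than this thread) rather than doMC. makeCluster(), parLapply() and stopCluster() are real functions in that package, while f_compact is a hypothetical name for a more compact but equivalent version of the f() from the original post.

```r
# Socket (PSOCK) cluster sketch: usable on Windows and from a GUI because the
# workers are fresh R processes, not forks of the current session.
# Assumes R >= 2.14 for the bundled 'parallel' package.
library(parallel)

d <- data.frame(y = rnorm(1000), id = rep(letters[1:4], each = 500))

# hypothetical compact equivalent of f() from the original post
f_compact <- function(x) mean(replicate(10000, mean(sample(x$y, 100))))

pieces <- split(d, d$id)   # one data frame per level of id
cl <- makeCluster(2)       # start two worker R processes

# f_compact and each piece are serialized and sent to the workers;
# the 'finally' clause shuts the workers down even if parLapply() fails
res <- tryCatch(parLapply(cl, pieces, f_compact),
                finally = stopCluster(cl))

print(unlist(res))
```

To route ddply(.parallel = TRUE) through such a cluster instead of doMC, the cluster has to be registered as a foreach backend, for example with registerDoParallel(cl) from the doParallel package or registerDoSNOW(cl) from doSNOW, depending on what is installed. Because socket workers start from an empty R session, anything the function needs beyond its arguments has to be shipped or loaded on the workers first, for example with clusterExport() or clusterEvalQ().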