dear R experts---

I am experimenting with multicore processing, so far with pretty disappointing results. Here is my simple example:

A <- 100000
randvalues <- abs(rnorm(A))
minfn <- function(x, i) { log(abs(x)) + x^3 + i/A + randvalues[i] }  ## an arbitrary function

ARGV <- commandArgs(trailingOnly = TRUE)

if (ARGV[1] == "do-onecore") {
  library(foreach)
  discard <- foreach(i = 1:A) %do% uniroot(minfn, c(1e-20, 9e20), i)
} else if (ARGV[1] == "do-multicore") {
  library(doMC)
  registerDoMC()
  cat("You have", getDoParWorkers(), "cores\n")
  discard <- foreach(i = 1:A) %dopar% uniroot(minfn, c(1e-20, 9e20), i)
} else if (ARGV[1] == "plain") {
  for (i in 1:A) discard <- uniroot(minfn, c(1e-20, 9e20), i)
} else {
  cat("sorry, but argument", ARGV[1], "is not plain|do-onecore|do-multicore\n")
}

on my Mac Pro 3,1 (two quad-cores), R 2.12.0, which reports 8 cores:

  "plain" takes about 68 seconds (real and user, using the unix time utility).
  "do-onecore" takes about 300 seconds.
  "do-multicore" takes about 210 seconds real (300 seconds user).

this seems pretty disappointing, and the cores sit mostly idle, too. feedback appreciated.

/iaw

----
Ivo Welch (ivo.welch at gmail.com)
On 02.07.2011 19:32, ivo welch wrote:
> [original example snipped]

Feedback is that a single computation within your foreach loop is so quick that the overhead of communicating data and results between processes costs more time than the actual evaluation; hence you are faster with a single process.

What you should do is write code that does, e.g., 10000 iterations within 10 outer iterations, and put the foreach loop only around the outer 10. Then you will probably be much faster (untested). But this is essentially the example I use in teaching to show when *not* to do parallel processing.
Best,
Uwe Ligges

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
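[Editor's note: Uwe's chunking advice can be sketched as below. This is an illustration, not code from the thread; `A` is shrunk from 100000 to 10000 to keep the run short, and the chunk count of 10 is the arbitrary choice from his example.]

```r
## Chunked version of the example: one %dopar% task per block of indices,
## with the cheap uniroot() calls running serially inside each worker, so
## the per-task communication overhead is paid only nchunks times.
library(foreach)
library(doMC)    # Unix/Mac only; registers a multicore backend for foreach
registerDoMC()

A <- 10000                     # smaller than the original 100000, for speed
randvalues <- abs(rnorm(A))
minfn <- function(x, i) log(abs(x)) + x^3 + i/A + randvalues[i]

nchunks <- 10
chunks  <- split(1:A, rep(1:nchunks, each = A / nchunks))

## each worker returns a list of uniroot() results for its whole chunk;
## .combine = c concatenates the chunk lists into one list of length A
roots <- foreach(idx = chunks, .combine = c) %dopar%
  lapply(idx, function(i) uniroot(minfn, c(1e-20, 9e20), i))
```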
thank you, uwe. this is a little disappointing. parallel processing for embarrassingly parallel operations---those needing no communication---should be feasible if the worker processes are held open rather than created and torn down for every task. is there light-weight parallel processing that could facilitate this?

regards,

/iaw

2011/7/2 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
> [first reply snipped]
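[Editor's note: one light-weight option of the kind Ivo asks for---a suggestion not made in the thread---is `mclapply` from the multicore package, the same package doMC builds on (and part of base R's parallel package since R 2.14). It forks the running session, so workers inherit `minfn`, `A` and `randvalues` without any explicit communication, and by default it pre-schedules the jobs, splitting them evenly among the cores up front rather than dispatching them one at a time.]

```r
## Fork-based parallel apply: workers are forked copies of this session and
## share its memory copy-on-write, so nothing needs to be exported to them.
library(parallel)              # on R < 2.14 use library(multicore) instead

A <- 10000
randvalues <- abs(rnorm(A))
minfn <- function(x, i) log(abs(x)) + x^3 + i/A + randvalues[i]

## mc.cores = 2 is an arbitrary choice; forking is unavailable on Windows,
## where mc.cores must be 1 (serial evaluation)
roots <- mclapply(1:A, function(i) uniroot(minfn, c(1e-20, 9e20), i),
                  mc.cores = 2)
```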
On 02.07.2011 20:04, ivo welch wrote:
> [question about light-weight parallel processing snipped]

Hmmm, now that you ask, I checked it myself using snow:

On a some-years-old 2-core AMD64 machine with R 2.13.0 and snow (using SOCK clusters, i.e. slow communication) I get:

> system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20,9e20), i)))
   user  system elapsed
   3.10    0.19   51.43

while on a single core without any parallelization framework:

> system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20,9e20), i)))
   user  system elapsed
  93.74    0.09   94.24

Hence (although my prior assumption was that the overhead would be big for frameworks other than foreach as well) it scales perfectly well with snow; perhaps you have to use foreach in a different way?

Best,
Uwe Ligges
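[Editor's note: Uwe's snow timing uses a cluster object `cl` created before the snippet shown; the sketch below is a reconstruction of that setup, not his actual code---the worker count of 2 matches his machine, but the rest is assumed. `clusterExport` ships the required objects to the workers, since SOCK workers start as empty R sessions.]

```r
library(snow)          # the same API now lives in base R's parallel package
                       # (there: makeCluster(2, type = "PSOCK"))

A <- 10000
randvalues <- abs(rnorm(A))
minfn <- function(x, i) log(abs(x)) + x^3 + i/A + randvalues[i]

cl <- makeCluster(2, type = "SOCK")        # two socket workers on this host
clusterExport(cl, c("minfn", "A", "randvalues"))
res <- parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20, 9e20), i))
stopCluster(cl)
```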
hi uwe---I did not know about snow. from my one-minute reading, it seems like a much more involved setup that is much more flexible once the setup cost has been incurred (specifically, allowing the use of many machines). the attractiveness of the doMC/foreach framework is its simplicity of installation and use.

but if I understand you correctly, you are using a different parallelization framework, and my example completes a lot faster under it. correct? if so, the problem is my use of the doMC framework, not the inherent cost of dealing with multiple processes. is this interpretation correct?

regards,

/iaw

----
Ivo Welch (ivo.welch at gmail.com)
http://www.ivo-welch.info/

2011/7/2 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
> [snow timings snipped]
On 02.07.2011 20:42, ivo welch wrote:
> [interpretation question snipped]

Indeed.

Uwe