I started to look at ways to improve the times of certain highly parallel tasks and thought that foreach should be a valid candidate for the job. So, I opened the foreach tutorial by Steve Weston and started timing examples from it. The first example from the tutorial is

> system.time(for(i in 1:100000) sqrt(i))
   user  system elapsed
   0.06    0.00    0.06
> system.time(foreach(i=1:100000) %do% sqrt(i))
   user  system elapsed
 102.37    0.21  103.38

Hmm, 1700 times slower?

The second example is

> system.time(x <- exp(1:1000000))
   user  system elapsed
   0.34    0.03    0.42
> system.time(x <- foreach(i=1:1000000, .combine='c') %do% exp(i))

I stopped it at 958 seconds, didn't have enough patience -- it basically seems that foreach slows this naive example down by more than 2000 times. I must be doing something very wrong. Am I supposed to set some environment variables before it works properly? I am running 64-bit R on a Win7 laptop with dual-core 2.27 GHz CPUs and 4 GB of memory.
You're probably being killed by the overhead of parallelization, which is, in this case, far more than the actual computation time. I've not dug through foreach() in a while, but I think this winds up spawning many, many subprocesses, which isn't cheap on Windows.

MW

On Mon, Jan 21, 2013 at 3:59 PM, Andre Zege <azege at yahoo.com> wrote:
> I started to look at ways to improve the times of certain highly parallel tasks and thought that foreach should be a valid candidate for the job.
> [...]
> Hmm, 1700 times slower?
> [...]
> I stopped it at 958 seconds, didn't have enough patience -- it basically seems that foreach slows this naive example down by more than 2000 times. I must be doing something very wrong.
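If per-task overhead is the problem, the standard remedy is to hand foreach a few large chunks rather than one task per element. A minimal sketch, assuming the itertools package is available (its isplitVector() helper splits a vector into roughly equal pieces):

    library(foreach)
    library(itertools)   # for isplitVector()

    ## Four tasks, each a vectorized exp() over a quarter of the input,
    ## instead of one million tasks that each compute a single exp().
    system.time(
      x <- foreach(chunk = isplitVector(1:1000000, chunks = 4),
                   .combine = "c") %do% exp(chunk)
    )

Chunking keeps the fixed cost of setting up each foreach task negligible next to the useful work the task does.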
Hi,

On Mon, Jan 21, 2013 at 10:59 AM, Andre Zege <azege at yahoo.com> wrote:
> I started to look at ways to improve the times of certain highly parallel tasks and thought that foreach should be a valid candidate for the job.
> [...]
> I must be doing something very wrong. Am I supposed to set some environment variables before it works properly?

You should keep reading that vignette you are working from :-)

From Section 5, "Parallel Execution":

"""
... But for the kinds of quick running operations that we've been
doing, there wouldn't be much point to executing them in parallel.
Running many tiny tasks in parallel will usually take more time to
execute than running them sequentially, and if it already runs fast,
there's no motivation to make it run faster anyway. But if the
operation that we're executing in parallel takes a minute or longer,
there starts to be some motivation.
"""

The task you are parallelizing is too trivial. The time spent coordinating the data splitting, forking, etc. is more than the time spent running sqrt. When the task you run within each iteration is more substantial, the benefit of parallelization becomes clear.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
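To make that concrete, here is a minimal sketch of an iteration heavy enough for %dopar% to pay off; it assumes the doParallel backend and uses an arbitrary matrix computation as placeholder per-task work:

    library(doParallel)   # also loads foreach and parallel

    cl <- makeCluster(2)          # two workers for a dual-core laptop
    registerDoParallel(cl)

    ## Eight tasks, each doing enough linear algebra that the fixed
    ## cost of shipping the task to a worker is amortized.
    system.time(
      r <- foreach(i = 1:8, .combine = "rbind") %dopar% {
        m <- matrix(rnorm(1e6), nrow = 1000)
        colSums(m %*% t(m))       # heavy placeholder computation
      }
    )

    stopCluster(cl)

On a dual-core machine a loop like this should run close to twice as fast as the same loop with %do%, whereas the sqrt() example above only gets slower under foreach.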