julien cuisinier
2010-Feb-16 11:25 UTC
[R] for loop Vs apply function Vs foreach (REvolution enhancement)
Dear all,

I know this topic has already been covered in other posts (at least the for loop vs apply family of functions), but I am looking for fresh, up-to-date opinions and feedback on these 3 methods of running unavoidable loops in R. I realise that this may be too general a question for many, so any feedback is appreciated.

1. apply vs for loop

apply is (was?) supposed to be faster than a for loop, but some posts mention that it is now more of a cosmetic function (a wrapper around a for loop) that essentially makes the code neater. Any thoughts, opinions or experience on this are more than welcome.

Running the very simple function attached, I end up with the for loop being quicker than the apply function... but maybe I am not using the apply function properly?

2. foreach (REvolution enhancement)

The rationale of this function seems to be to facilitate the use of multithreading to speed up for loops. Given a moderate time sensitivity (the process must run fast, but a speed gain of 10-20% would probably not justify the additional learning plus the dependence on yet another package), is it really worth going down that route?

Does anyone have extensive experience with this (using foreach to reduce for loop running time)? Any feedback is welcome.

Have a nice day, all!

Rgds,
Julien
Liviu Andronic
2010-Feb-16 15:24 UTC
[R] for loop Vs apply function Vs foreach (REvolution enhancement)
Hello

On 2/16/10, julien cuisinier <j_cuisinier at hotmail.com> wrote:
> 1. apply Vs for loop
>
> >> Seems apply is (was?) supposed to be faster than using for loop, some posts mention that it is now more of a cosmetic function (wrapper for "for loop") making the code essentially neater. Any thoughts/opinion/experience on this more than welcome.
>
> >> Running the very simple function attached, I end up with for loop quicker than apply function....but may be do I not use the apply function properly?

Check this [1] for a discussion of vectorisation vs looping. It seems that vectorisation is not always better than looping.

Liviu

[1] http://www.r-project.org/doc/Rnews/Rnews_2008-1.pdf
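To make the comparison in this exchange concrete, here is a minimal sketch of the three styles side by side. The task (squaring a numeric vector) and the function names square_for, square_sapply and square_vec are hypothetical, chosen only for illustration; they are not the function attached to the original post. No timings are quoted because they depend on the machine and the R version.

```r
# Hypothetical toy task: square each element of a numeric vector.
x <- runif(1e5)

# 1. for loop writing into a preallocated result vector
square_for <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[i] <- x[i]^2
  }
  out
}

# 2. sapply: still calls an R function once per element
square_sapply <- function(x) sapply(x, function(v) v^2)

# 3. vectorised: a single call, the element loop runs in compiled code
square_vec <- function(x) x^2

# All three styles should agree on the result
stopifnot(isTRUE(all.equal(square_for(x), square_sapply(x))),
          isTRUE(all.equal(square_for(x), square_vec(x))))

# Compare elapsed times on your own machine
print(system.time(square_for(x)))
print(system.time(square_sapply(x)))
print(system.time(square_vec(x)))
```

Both the for loop and sapply evaluate R code once per element, which is one reason their timings can end up close to each other. The vectorised form is only available when the operation itself is vectorised, which is not the case for every loop body.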
Steve Lianoglou
2010-Feb-16 16:24 UTC
[R] for loop Vs apply function Vs foreach (REvolution enhancement)
Hi,

> 2. foreach (REvolution enhancement)
>
> seems the rationale of this function is to facilitate the use of multithreading to enhance the for loop speed. Given a moderate time sensitivity (process must run fast but a gain of 10-20% speed seen as probably not justifying the additional learning + dependence from yet another package), is it really worth going down that route?
>
> Has anyone extensive experience with this matter (using foreach to boost for loop running time)? any feedback welcome.

I'm not sure what you mean by "moderate time sensitivity", but you should definitely use foreach if you have a block of code that you are iterating over and:

(i) it takes a moderately long time to execute;
(ii) it is independent of the code that runs before/after it in the loop; and
(iii) (tangential, and not really pertinent to the question) you are running a Linux/OS X machine, so that you can use the multicore package.

There isn't much learning involved, since parallelizing over the CPUs of a single machine is pretty much painless as long as you satisfy (iii) above. That condition is only there because, the last I heard, the "multicore" package (which foreach/doMC depends on) doesn't really work on Windows.

For instance, instead of something like:

    results <- lapply(1:100, function(x) doSomethingWith(x))

or:

    results <- list()
    for (x in 1:100) {
      results[[x]] <- doSomethingWith(x)
    }

you do:

    results <- foreach(x=1:100) %dopar% {
      doSomethingWith(x)
    }

That having been said, I wouldn't use foreach all the time as a "default" replacement for the normal/sequential "for" loop, because there is some rigging involved in using it, and it might not be worth it if the code you are iterating over isn't too heavy.

Another nice thing is that foreach "degrades" gracefully. For instance, if you are running on a machine that doesn't have any foreach backend packages installed/enabled, then it will just run the code in the %dopar% block sequentially. (The backend package determines the "parallelization strategy", e.g. "doMC" is a foreach backend that parallelizes over the CPUs/cores of one machine; others parallelize over different machines in a cluster.)

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
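The foreach pattern from the reply above can be sketched as a complete script, including the backend registration step that the reply mentions but does not show. This is a sketch under assumptions: it assumes the foreach package is installed, the body of doSomethingWith is a hypothetical stand-in for real work, and the choice of 2 cores is arbitrary. If no backend is registered at all, foreach runs the %dopar% block sequentially and issues a warning the first time; registering the sequential backend explicitly with registerDoSEQ() avoids that warning.

```r
library(foreach)  # assumption: the foreach package is installed

# Hypothetical stand-in for an expensive computation that does not
# depend on any other iteration
doSomethingWith <- function(x) {
  sqrt(x)
}

# Register doMC (the backend named in the post) when it is usable;
# otherwise fall back to the explicit sequential backend.
if (.Platform$OS.type != "windows" &&
    requireNamespace("doMC", quietly = TRUE)) {
  doMC::registerDoMC(cores = 2)  # 2 is an arbitrary choice for this sketch
} else {
  registerDoSEQ()
}

# Same loop as in the post; foreach returns a list by default
results <- foreach(x = 1:100) %dopar% {
  doSomethingWith(x)
}

# .combine = c collapses the per-iteration results into a vector
results_vec <- foreach(x = 1:100, .combine = c) %dopar% {
  doSomethingWith(x)
}

# Number of workers the registered backend will use
print(getDoParWorkers())
```

The loop body is identical in the parallel and sequential cases, so the same script can be moved between machines with and without a parallel backend.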