rhelp.20.trevva at spamgourmet.com
2007-Mar-06 15:33 UTC
[R] How to utilise dual cores and multi-processors on WinXP
Hello,

I have a question that I was wondering if anyone had a fairly straightforward answer to: what is the quickest and easiest way to take advantage of the extra cores / processors that are now commonplace on modern machines? And how do I do that in Windows?

I realise that this is a complex question that is not answered easily, so let me refine it some more. The type of scripts that I'm dealing with are well suited to parallelisation - often they involve mapping out parameter space by changing a single parameter, re-running the simulation 10 (or n) times, and then bringing all the results back together at the end for analysis. If I can distribute the runs over all the processors available in my machine, I'm going to roughly halve the run time. The question is, how to do this?

I've looked at many of the packages in this area: Rmpi, snow, snowFT, rpvm, and taskPR - these all seem to have the functionality that I want, but don't exist for Windows. The best solution is to switch to Linux, but unfortunately that's not an option.

Another option is to divide the task in half from the beginning, spawn two "slave" instances of R (e.g. via Rcmd), let them run, and then collate the results at the end. But how exactly would I do this, and how would I know when they're done? (One rough way is sketched just below.)

Can anyone recommend a nice solution? I'm sure that I'm not the only one who'd love to double their computational speed...

Cheers,

Mark
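One rough sketch of the divide-and-spawn route Mark asks about. Everything here is assumed rather than taken from the thread: R's bin directory is on the PATH, and worker1.R / worker2.R are hypothetical scripts that each run half of the parameter values and, as their final step, save() their results as res1.RData / res2.RData. Completion is detected crudely by polling for those output files.

    ## Launch two non-interactive "slave" R sessions in the background;
    ## wait = FALSE returns control to this session immediately.
    system("Rcmd BATCH worker1.R", wait = FALSE)
    system("Rcmd BATCH worker2.R", wait = FALSE)

    ## Crude completion check: poll until both result files exist
    ## (each hypothetical worker writes its .RData file as its last step).
    while (!all(file.exists(c("res1.RData", "res2.RData"))))
        Sys.sleep(5)

    ## Collate: load both workers' results into one environment.
    e <- new.env()
    load("res1.RData", envir = e)
    load("res2.RData", envir = e)
    ls(e)  # the combined result objects, ready for analysis

Writing the result file last makes the file's existence a workable (if imperfect) completion signal; a more robust variant has each worker write a small sentinel file only after its save() has finished.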
Greg Snow
2007-Mar-06 17:19 UTC
[R] How to utilise dual cores and multi-processors on WinXP
The nws package does run on Windows and can split calculations between multiple R processes. I have not tried it on a single multiprocessor PC (I don't have one), but I have used it with multiple PCs. It looks like a multiprocessor PC would work pretty much with the defaults.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
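A minimal sketch of the nws route Greg describes, assuming the nws package and its NetWorkSpaces server are installed and the local defaults work on the machine; the toy simulation function and parameter grid are hypothetical stand-ins, not from the thread.

    library(nws)

    s <- sleigh(workerCount = 2)      # two local R worker processes

    params <- seq(0.1, 1, by = 0.1)   # parameter values to map out

    ## eachElem() applies the function to each element of 'params',
    ## spreading the calls across the sleigh's workers.
    results <- eachElem(s, function(p) mean(rnorm(1e5, mean = p)),
                        list(params))

    stopSleigh(s)                     # shut the workers down
    unlist(results)                   # collate for analysis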
Martin Morgan
2007-Mar-06 18:07 UTC
[R] How to utilise dual cores and multi-processors on WinXP
rhelp.20.trevva at spamgourmet.com writes:

> I've looked at many of the packages in this area: Rmpi, snow, snowFT,
> rpvm, and taskPR - these all seem to have the functionality that I
> want, but don't exist for Windows. The best solution is to switch to
> Linux, but unfortunately that's not an option.

Rmpi runs on Windows (see http://www.stats.uwo.ca/faculty/yu/Rmpi/). You'll end up modifying your code, probably using one of the many parLapply-like functions (from Rmpi; comparable functions are in snow and the papply package) to do an 'lapply' spread over the different compute processors; one such call is sketched after this message. This is likely to require some thought: for instance, data transmission costs can overwhelm any speedup, and the FUN argument to the lapply-like functions should probably reference only local variables. The classic first attempt performs the equivalent of 1000 bootstraps on each node, rather than dividing the 1000 replicates amongst the nodes (which is actually quite hard to do).

In principle I think you might also be able to use a parallelized LAPACK, following the general instructions in the R Installation and Administration guide. I have not done this; it would likely be a challenge, and would (perhaps) benefit only code that uses the LAPACK linear algebra routines.

> Another option is to divide the task in half from the beginning, spawn
> two "slave" instances of R (e.g. via Rcmd), let them run, and then
> collate the results at the end.

The Bioconductor package Biobase has a function Aggregate that might be fun to explore; I don't think it receives much use.

--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org
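A minimal sketch of the parLapply-like route Martin describes, using Rmpi. It assumes a working MPI installation on the Windows machine (see the URL above for instructions); the simulate() function and parameter grid are hypothetical stand-ins, not from the thread.

    library(Rmpi)

    mpi.spawn.Rslaves(nslaves = 2)    # one slave R process per core

    params   <- seq(0.1, 1, by = 0.1)                   # values to map out
    simulate <- function(p) mean(rnorm(1e5, mean = p))  # toy stand-in

    ## mpi.parLapply() divides 'params' among the slaves and applies
    ## simulate() to each value. simulate() uses only its argument, so
    ## little data has to be shipped from the master to the slaves.
    res <- mpi.parLapply(params, simulate)

    mpi.close.Rslaves()
    unlist(res)                       # collate for analysis

Martin's caveat applies here: if simulate() needed a large dataset from the master, the cost of shipping it to each slave could cancel out the two-core speedup.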