Chris Gaiteri
2008-Jul-10 16:50 UTC
[R] embarrassingly parallel problem - simple loop solution
I have an "embarrassingly parallel" routine that I need to run 24000^2/2 times (based on some microarray data). All I really need to do is parallelize a nested for-loop. But I haven't found a clear list of what packages/commands I'd need to do this. I've got a dual quad core xeon system running RHEL5, so if I could use hyperthreading to increase the number of (virtual) nodes that would be great too.

Appreciate the help.

Chris
Martin Morgan
2008-Jul-11 01:51 UTC
[R] embarrassingly parallel problem - simple loop solution
Hi Chris --

"Chris Gaiteri" <gaiteri at gmail.com> writes:

> I have an "embarrassingly parallel" routine that I need to run 24000^2/2
> times (based on some microarray data). All I really need to do is
> parallelize a nested for-loop. But I haven't found a clear list of what
> packages/commands I'd need to do this. I've got a dual quad core xeon

Any of snow / Rmpi / nws / rpvm (the former has system requirements, the latter three additional software requirements) provide the basic embarrassingly parallel functionality via variants of lapply, e.g., mpi.parLapply.

Vectorized ATLAS (search for ATLAS in the R Installation and Administration Guide) and the experimental package pnmath (see a thread (oops, pun) starting in June with subject Parallel R, for instance) provide parallelism at a finer grain, i.e., the level of linear algebra (ATLAS) or R's math library (pnmath).

> system running RHEL5, so if I could use hyperthreading to increase the
> number of (virtual) nodes that would be great too.

The snow-like solutions allow you to launch as many instances of R as you like (e.g., one per CPU); each operates quasi-independently. Each instance of R uses its own memory, and for big memory problems this might limit the number of instances per machine.

ATLAS / pnmath make much better use of resources and work without code modification. But these solutions only provide benefit when the calculations are appropriately numerical; many calculations are not formulated in a way that would take advantage of this.

A recent post from Prof. Ripley also mentions the benefits that come from building R with compiler flags tuned to your chip, but I'm not able to locate the thread at the moment.

If you're coming at this from scratch, on a Linux-based system, then snow is probably the easiest to get going, using 'socket'-based clusters.

I use Rmpi and, to a lesser extent, pnmath, both at least in part because I'm interested in the C-level implementations (MPI and OpenMP, respectively).
Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
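For readers following the socket-cluster suggestion in the reply above, here is a minimal sketch of how a nested for-loop over row pairs might be handed to parLapply. It is an illustration under stated assumptions, not the method used by either poster. It uses the parallel package bundled with current R, which offers the snow-style makeCluster / parLapply / stopCluster interface and postdates this thread. The matrix expr, the per-pair routine pair_stat, the helper row_task, and the worker count of 2 are all hypothetical stand-ins.

```r
# Sketch only. Assumes the 'parallel' package (ships with R >= 2.14.0),
# which provides the snow-style socket-cluster interface.
library(parallel)

# Hypothetical stand-in for the microarray data: 20 probes (rows) x 10 samples.
set.seed(1)
expr <- matrix(rnorm(20 * 10), nrow = 20)

# Hypothetical per-pair routine; replace with the real calculation.
pair_stat <- function(i, j, mat) cor(mat[i, ], mat[j, ])

# One outer-loop iteration: for row i, run the inner loop over all j > i.
row_task <- function(i, mat, stat) {
  n <- nrow(mat)
  if (i >= n) return(numeric(0))
  vapply((i + 1):n, function(j) stat(i, j, mat), numeric(1))
}

# Socket cluster with 2 worker R processes (assumed count; each worker
# holds its own copy of 'mat', so memory limits the practical number).
cl <- makeCluster(2)
res <- tryCatch(
  parLapply(cl, seq_len(nrow(expr)), row_task, mat = expr, stat = pair_stat),
  finally = stopCluster(cl)  # always shut the workers down
)

# Serial reference: the same outer loop via plain lapply.
serial <- lapply(seq_len(nrow(expr)), row_task, mat = expr, stat = pair_stat)
stopifnot(isTRUE(all.equal(res, serial)))
```

Only the outer loop is distributed; each worker runs the inner loop for its share of rows. Rows near the top have more partners than rows near the bottom, so with real data the work per task is uneven, and load-balanced variants such as parLapplyLB may be worth trying. detectCores() in the same package reports the number of logical CPUs, which on a hyperthreaded machine can exceed the number of physical cores.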