All,

We are researching approaches to parallel R, with the end goal of running R in a distributed manner on a Linux cluster. We expect, of course, to do some work decomposing our problems to be task-parallel or data-parallel, but wouldn't mind getting an initial boost by working with "embarrassingly parallel" code sections and one of the approaches below. Incidentally, our environment includes R 2.6.1, RHEL 5.1, Solaris 10, SGE (Sun Grid Engine) and OpenMPI 1.2.4 (SunHPC 7.1).

In researching previous work, the most promising approaches seem to be:

A. Snow (with Rmpi or Rpvm), as described in http://www.r-project.org/useR-2006/Slides/Harrington+Salibian-Barrera.pdf from the 2006 R User Conference. It is my understanding that this approach is viable and works with OpenMPI 1.2.4. Is anyone using this method with good results?

B. taskpR, RScaLAPACK, pMatrix. I read a paper (http://sdm.lbl.gov/sdmcenter/projects/SDM.center.parallel.r.2-pager.4.doc) coming out of ORNL, describing what they call "parallel R", which included taskpR, RScaLAPACK and pMatrix. I notice that taskpR is no longer available in "contrib", nor is pMatrix. An old link indicates the packages are available at http://www.ASPECT-SDM.org/Parallel-R, but that site displays a notice that the server is migrating. Has this work been discontinued? Is anyone using it? I see RScaLAPACK is still available; from reading the above, it seems it was bundled with taskpR. Does it function without the other components? (Guess I'll try it and find out :)

C. Sleigh & "NetworkSpaces". I see that SCAI (Scientific Computing Associates) offers a parallel R package based on something they call NetworkSpaces and "Sleigh" (inspired by Snow). The package is open source; they sell services around it, and they also sell and support an enhanced version: http://www.lindaspaces.com/hp/BenchmarksWithCharts.pdf. Has anyone investigated this approach or its open source components?
TIA for any information, direction and suggestions; if I've missed any other approaches, please advise.

Dan Lewis
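For option A, the basic snow pattern for an embarrassingly parallel loop can be sketched as follows. This is a minimal illustration, not code from the thread. It assumes base R's parallel package (R >= 2.14.0, which later absorbed snow's interface; under R 2.6.1 you would load snow instead), it uses two local socket workers in place of MPI nodes, and the bootstrap task and its data are made up for illustration.

```r
## Sketch: an embarrassingly parallel loop with the snow-style API.
## Assumption: base R's 'parallel' package (R >= 2.14.0). With snow on an
## MPI cluster the equivalent start-up would be makeCluster(n, type = "MPI"),
## which needs Rmpi; here two local socket workers stand in for nodes.
library(parallel)

cl <- makeCluster(2)

## Independent, reproducible random-number streams on each worker
## (clusterSetRNGStream() is specific to 'parallel').
clusterSetRNGStream(cl, iseed = 2008)

## Hypothetical task: each call is one bootstrap replicate of a sample mean.
## The data travel as an argument, so no clusterExport() is needed.
x <- c(2.1, 3.4, 1.9, 5.6, 4.4, 3.3, 2.8, 4.0)
boot_mean <- function(i, data) mean(sample(data, replace = TRUE))
reps <- parSapply(cl, 1:200, boot_mean, data = x)

stopCluster(cl)

print(length(reps))
print(quantile(reps, c(0.025, 0.975)))
```

Because each replicate depends only on its own arguments, the tasks need no communication with one another, which is what makes this kind of section a cheap first target before decomposing the harder problems.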
We've also had substantial success with the Condor project [http://www.cs.wisc.edu/condor/], not just with R but as a generic computation grid.

John

-----Original Message-----
From: Lewis, Daniel (IS Consultant)
Sent: Monday, February 11, 2008 1:09 PM
To: r-help at r-project.org
Subject: [R] Viable Approach to Parallel R?
Hi Dan,

I've had pretty good luck using Snow with Rpvm. It's definitely not what you'd call "plug and play," but it does work. I'm using it on a single computer just to take advantage of multiple processors, and it does a pretty good job of keeping them busy.

The main gotchas I've found with Snow are in data dissemination: you may have to load packages on the workers with clusterEvalQ(cl, require(foo)), or ship objects to them with clusterExport(cl, "bar"), for more things than you would have expected.

-Eric

--
Eric W. Anderson
University of Colorado, Dept. of Computer Science
Systems Research Lab - ECCR 1B54
eric.anderson at colorado.edu
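The data-dissemination gotcha Eric describes comes from each worker being a fresh R session: objects and packages on the master are not visible there until you send them. The sketch below reproduces it under stated assumptions: it uses base R's parallel package (snow's API, R >= 2.14.0), and the names scale_factor and f, plus the use of the splines package, are hypothetical stand-ins for "bar" and "foo".

```r
## Sketch: why clusterExport()/clusterEvalQ() are needed.
## Assumption: base R's 'parallel' package; 'scale_factor', 'f' and the
## 'splines' package are hypothetical stand-ins for "bar" and "foo".
library(parallel)

cl <- makeCluster(2)

scale_factor <- 10              # exists only in the master's global env
f <- function(v) v * scale_factor

## Workers look up 'scale_factor' in their own (empty) global env and fail.
before <- tryCatch(parLapply(cl, 1:4, f),
                   error = function(e) "not found on workers")
print(before)

## clusterExport() copies the named global objects to every worker.
clusterExport(cl, "scale_factor")
after <- unlist(parLapply(cl, 1:4, f))
print(after)                    # 10 20 30 40

## Packages must be attached on each worker too; clusterEvalQ() evaluates
## the unevaluated expression on every node and returns one result per node.
invisible(clusterEvalQ(cl, library(splines)))
loaded <- unlist(clusterEvalQ(cl, "splines" %in% .packages()))
print(loaded)

stopCluster(cl)
```

Passing needed objects as function arguments (as parLapply's ... allows) is an alternative to exporting globals, at the cost of re-sending them with each chunk of tasks.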