thr3ads.net - R help - [R] Suggestions for poor man's parallel processing [May 2002]

David Kane

2002-May-08 12:45 UTC

[R] Suggestions for poor man's parallel processing

Almost all of the heavy crunching I do in R is like:
> for(i in long.list){
+ do.something(i)
+ }
> collect.results()
Since all the invocations of do.something are independent of one another, there
is no reason that I can't run them in parallel. Since my machine has four
processors, a natural way to do this is to divide up long.list into 4 pieces
and then start 4 jobs, each of which would process 1/4 of the items. I could
then wait for the four jobs to finish (waiting for tag files and the like),
collect the results, and go on my happy way. I might do this all within R
(using system calls to fork off other R processes?) or by using Perl as a
wrapper.
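
The "divide into 4 pieces" idea can be sketched in a few lines of present-day R. This is only an illustration under stated assumptions: `long.list` and `do.something` are toy stand-ins for the poster's objects, and the forking is done by the `parallel` package, which ships with R from version 2.14.0 onward and so would not be available in the R 1.5.0 reported in this message.

```r
library(parallel)

# Toy stand-ins for the poster's objects (assumptions, not from the post)
long.list <- as.list(1:20)
do.something <- function(i) i^2

n.jobs <- 4  # one job per processor, as in the post

# Assign each item to one of n.jobs contiguous, roughly equal chunks
chunk.id <- ceiling(seq_along(long.list) * n.jobs / length(long.list))
chunks <- split(long.list, chunk.id)

# Forking is not available on Windows, so fall back to serial there
cores <- if (.Platform$OS.type == "windows") 1L else n.jobs

# Each forked child process works through one chunk
chunk.results <- mclapply(
  chunks,
  function(chunk) lapply(chunk, do.something),
  mc.cores = cores
)

# Flatten the per-chunk lists back into one list, in the original order
results <- unlist(chunk.results, recursive = FALSE, use.names = FALSE)
```

Because `mclapply` returns its results in input order, no tag files are needed to know when the jobs have finished; the call simply blocks until every chunk is done.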

But surely there are others that have faced and solved this problem already! I
do not *think* that I want to go into the details of RPVM since my needs are so
limited. Does anyone have any advice for me? Various postings to R-help have
hinted at ideas, but I couldn't find anything definitive. I will summarize for
the list.

To the extent that it matters:
> R.version
         _                   
platform sparc-sun-solaris2.6
arch     sparc               
os       solaris2.6          
system   sparc, solaris2.6   
status                       
major    1                   
minor    5.0                 
year     2002                
month    04                  
day      29                  
language R                   


Regards,

Dave Kane
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Timothy H. Keitt

2002-May-08 15:06 UTC

By far the easiest approach is, as you say, just to hand code parameter
ranges into R scripts and run them on different machines.

That said, I've recently found using a relational database to store
parameter combinations and associated results really convenient. The
basic idea is

<one time>

insert all parameter combinations into database
mark each row 'not started'

<on each client>

connect to database

while (1) {
	
	break if invalid connection
	begin transaction
	lock table
	query for parameter combination marked 'not started'
	break if no row returned
	mark returned row as 'in progress'
	insert time stamp and client host name <optional>
	end transaction <unlocks table for other clients>
	compute the result
	insert result into database
	insert ending time stamp <optional>
	mark row 'completed'

}

close connection
exit


I then fire up the client script on each machine (one per processor) and
let it run until it's done. You get automatic load balancing because
faster machines process more parameter combinations. You can also query
the database to get intermediate results and see how long before the
entire parameter space is processed. I recently used this approach to do
~320 days of computing in about 30 days. I added 40 client jobs on a
nearby cluster halfway through the run when it became clear my 6 local
CPUs were going to take a while. It was really convenient that I could
add clients without disrupting anything. (As written, however, you
cannot kill client jobs without leaving unfinished rows marked 'in
progress', but that can easily be fixed.)
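
The client loop described above might look roughly like the following in R. This is a sketch under several assumptions that are not in the post: it uses the DBI and RSQLite packages with a temporary SQLite file standing in for the shared database server, the table and column names (`jobs`, `status`, `host`, `result`) are made up, and `compute` is a toy placeholder for the real work. SQLite's `BEGIN IMMEDIATE` plays the role of the "lock table" step.

```r
library(DBI)

# Assumption: a temporary SQLite file stands in for the shared database
db.file <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), db.file)

# <one time> insert all parameter combinations, each marked 'not started'
grid <- expand.grid(a = 1:3, b = 1:2)
jobs <- data.frame(id = seq_len(nrow(grid)), a = grid$a, b = grid$b,
                   status = "not started", host = NA_character_,
                   result = NA_real_)
dbWriteTable(con, "jobs", jobs)

# Toy placeholder for the real computation
compute <- function(a, b) a * b

# <on each client>
repeat {
  # Take the write lock so that no two clients can claim the same row
  dbExecute(con, "BEGIN IMMEDIATE")
  job <- dbGetQuery(con, "SELECT id, a, b FROM jobs
                          WHERE status = 'not started'
                          ORDER BY id LIMIT 1")
  if (nrow(job) == 0) {
    dbExecute(con, "COMMIT")  # nothing left to do
    break
  }
  dbExecute(con, "UPDATE jobs SET status = 'in progress', host = ?
                  WHERE id = ?",
            params = list(Sys.info()[["nodename"]], job$id))
  dbExecute(con, "COMMIT")    # releases the lock for other clients

  # Compute outside the transaction, then store the result
  value <- compute(job$a, job$b)
  dbExecute(con, "UPDATE jobs SET result = ?, status = 'completed'
                  WHERE id = ?",
            params = list(value, job$id))
}

done <- dbGetQuery(con, "SELECT * FROM jobs ORDER BY id")
dbDisconnect(con)
```

A client killed mid-computation still leaves its row marked 'in progress' in this sketch, as the post notes. One possible remedy (an assumption, not taken from the post) is to store a start time when claiming a row and reset rows that stay 'in progress' for too long.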

T.


A.J. Rossini

2002-May-09 02:05 UTC

Yes, RPVM is annoying to start up; however, higher-level functions are
coming (parallel apply) that might help justify the effort. Better yet,
they are wrappers that will hopefully sit on top of RPVM or RMPI (via
the LAM MPI implementation) and incorporate a parallel pRNG for
assurance of minimum quality (RSPRNG, a wrapper to the SPRNG library).

Coming soon (no set release date planned yet) to a CRAN archive near
you (the package is called SNOW, by Luke Tierney).


-- 
A.J. Rossini				Rsrch. Asst. Prof. of Biostatistics
U. of Washington Biostatistics		rossini at u.washington.edu	
FHCRC/SCHARP/HIV Vaccine Trials Net	rossini at scharp.org
-------------- http://software.biostat.washington.edu/ ----------------
FHCRC: M-W: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW:   Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
(my friday location is usually completely unpredictable.)



Luke Tierney

2002-May-09 15:20 UTC

I've been working on a simple interface for this sort of thing modeled
loosely on the Python CoW (Cluster of Workstations) package.  A rough
draft writeup with a link to the preliminary package is at
http://www.stat.umn.edu/~luke/R/cluster/cluster.html. The idea is to
provide a very simple front end for handling things like farming out
simulations to a bunch of machines (or a bunch of processors on one
machine) and collecting the results.  The communications back ends
that are supported are sockets or pvm via Michael Li and Tony
Rossini's rpvm; mpi via Hao Yu's Rmpi should eventually be possible as
well.  Michael and Tony's rsprng is also supported.  It's very rough,
but I won't get to cleaning it up for a week or two at least, so if
anyone wants to play with it in the meantime, go ahead.
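
For readers who want to see what this kind of front end looks like in use, here is a minimal sketch with the `parallel` package that ships with current R and builds on snow. It is not the preliminary package linked above, whose interface is not shown in this thread. `long.list` and `do.something` are toy stand-ins borrowed from the original question, and `clusterSetRNGStream` (L'Ecuyer streams) is used here in place of rsprng to give each worker its own random number stream.

```r
library(parallel)

# Toy stand-ins for the objects in the original question (assumptions)
long.list <- as.list(1:20)
do.something <- function(i) i + rnorm(1)  # a small "simulation"

# Start two socket workers on the local machine
cl <- makeCluster(2)

# Give each worker an independent, reproducible random number stream
clusterSetRNGStream(cl, iseed = 42)

# Farm the items out to the workers and collect the results in order
results <- parLapply(cl, long.list, do.something)

stopCluster(cl)
```

`makeCluster` also accepts a character vector of host names, which would cover the "bunch of machines" case; that requires the hosts to be reachable and set up for remote R sessions, which is assumed rather than shown here.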

luke

-- 
Luke Tierney
University of Minnesota                      Phone:           612-625-7843
School of Statistics                         Fax:             612-624-8868
313 Ford Hall, 224 Church St. S.E.           email:      luke at stat.umn.edu
Minneapolis, MN 55455 USA                    WWW:  http://www.stat.umn.edu