thr3ads.net - R devel - [Rd] naive question regarding running parallel C code from R [Apr 2008]


tyler

2008-Apr-18 16:03 UTC

[Rd] naive question regarding running parallel C code from R

Hi,

I have only the vaguest notions of what parallel programming is, but I think
I have a situation where it might be of use to me, or at least provide
me with the opportunity to learn more about it. Before I invest in
figuring out the nuts and bolts, can anyone confirm that this is a sane
approach, or provide alternatives that I could pursue?

I'm running stochastic simulations, with the actual simulation in C
code, with an R interface to set up the parameters, format the output,
and save the resulting objects periodically through the run. The basic
layout is:

R function sets up the run
R for(c = 0; c < CYCLES; c++)
  call C function
  C for(i = 0; i < TIME; i++)
    immigration loop adds individuals to the recruit vector
    birth loop adds individuals to the recruit vector
    recruit vector is added to the community vector
    death loop removes excess individuals
  Return results to R, which processes and saves the objects
  Repeat

A typical run has 20 cycles, each with 500 time steps, and takes about
an hour. The immigration and birth loops are independent of each other,
and so could run simultaneously. They both add to the recruit vector,
but the order of the addition doesn't matter so long as both finish
before the recruit vector is added to the community vector. The
immigration, birth, and death loops iterate over arrays in such a way that
the outcome at different locations is independent, i.e., the impact of the
birth vector on recruit vector position 0 has no influence on what the
birth vector does to recruit vector position 1.

What I'm thinking of doing is running the birth and immigration loops as
separate threads, and possibly running each of those threads as a group
of threads - so a thread for a birth loop that iterates over the first N
positions, another thread for the second N positions and so on.
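
[Editorial sketch, not part of the original post.] One way to read the plan
above: because births and immigration are both independent per position, the
two loops can be fused into a single loop over positions and that loop split
across threads, so no two threads ever write the same recruit element. The
snippet below is a minimal illustration under that assumption, using an
OpenMP pragma. The function names and the per-position rules (births_at,
immigrants_at) are hypothetical placeholders, since the real model's rules
are not given in the thread. Without OpenMP enabled at compile time the
pragma is ignored and the loop simply runs sequentially.

```c
#include <assert.h>

/* Hypothetical per-position rules; stand-ins for the real model. */
static int births_at(int adults) { return adults / 2; }
static int immigrants_at(int pos) { return (pos % 3 == 0) ? 1 : 0; }

/* Fill recruits[0..n-1] for one time step.
   Each iteration reads community[i] and writes only recruits[i],
   so iterations can run in any order or on any thread. */
void recruit_step(const int *community, int *recruits, int n)
{
    int i;
    assert(n >= 0);
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        recruits[i] = births_at(community[i]) + immigrants_at(i);
    }
}
```

If the real rules draw random numbers, each thread would need its own
generator state; that detail is left out of this sketch.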

I'm keen to learn about parallel programming, but I don't understand
enough yet to make sense of the information in the R extensions manual
and the various discussions on this list about R being thread-safe. Does
it matter if R is thread-safe if the actual simulation is being computed
in separate, shared C code?

I'm running my current, sequential code on a cluster that supports both
OpenMP and MPI, should I figure out how to use them.

Thanks for your patience,

Tyler

-- 
Don't learn the tricks of the trade; learn the trade.

Simon Urbanek

2008-Apr-18 16:46 UTC

On Apr 18, 2008, at 12:03 PM, tyler wrote:
> Hi,
>
> I have only the vaguest notions of what parallel programing, but I  
> think
> I have a situation where it might be of use to me, or at least provide
> me with the opportunity to learn more about it. Before I invest in
> figuring out the nuts and bolts, can anyone confirm that this is a  
> sane
> approach, or provide alternatives that I could pursue?
>
> I'm running stochastic simulations, with the actual simulation in C
> code, with an R interface to set up the parameters, format the output,
> and save the resulting objects periodically through the run. The basic
> layout is:
>
> R function sets up the run
> R for(c = 0; c < CYCLES; c++)
>  call C function
>  C for(i = 0; i < TIME; i++)
>    immigration loop adds individuals to the recruit vector
>    birth loop adds individuals to the recruit vector
>    recruit vector is added to the community vector
>    death loop removes excess individuals
>  Return results to R, which processes and saves the objects
>  Repeat
>
> A typical run has 20 cycles, each with 500 time steps, and takes about
> an hour. The immigration and birth loops are independent of each  
> other,
> and so could run simultaneously. They both add to the recruit vector,
> but the order of the addition doesn't matter so long as both finish
> before the recruit vector is added to the community vector. The
> immigration, birth, and death loop iterate over arrays in a way that  
> the
> outcome at different locations is independent. i.e., the impact of the
> birth vector on recruit vector position 0 has no influence on what the
> birth vector does to recruit vector position 1.
>
> What I'm thinking of doing is running the birth and immigration  
> loops as
> separate threads, and possibly running each of those threads as a  
> group
> of threads - so a thread for a birth loop that iterates over the  
> first N
> positions, another thread for the second N positions and so on.
>
> I'm keen to learn about parallel programming, but I don't understand
> enough yet to make sense of the information in the R extensions manual
> and the various discussions on this list about R being thread-safe.  
> Does
> it matter if R is thread-safe if the actual simulation is being  
> computed
> in separate, shared C code?
>
As long as the code doesn't touch R while its threads are running, no
(for all practical purposes). I.e., if you set up native structures to
work on, you're free to spawn threads, wait for them to finish, and then
return to R. There are some issues that you could encounter: on some
platforms you need special, reentrant versions of system libraries,
which you don't get with R; there could also be issues with error
handling and interrupts, where you will be on your own.
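
[Editorial sketch, not part of the original reply.] The pattern described
above might look roughly like the following: an entry point in the style
of R's .C() interface (which passes every argument as a pointer) that
works only on plain C arrays, runs its parallel loops, and returns to R
once they have finished. The function name, the argument list, and the
birth, immigration, and death rules are all hypothetical placeholders.
The code assumes OpenMP; compiled without it, the pragmas are ignored and
the loops run sequentially.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical entry point for .C(): all arguments arrive as pointers.
   community is updated in place over *steps time steps. */
void run_steps(int *community, int *n, int *steps, int *capacity)
{
    int len = *n, t, i;
    int *recruits;

    assert(len >= 0);
    /* Native scratch buffer, allocated outside any parallel region. */
    recruits = malloc((size_t)len * sizeof *recruits);
    if (recruits == NULL)
        return; /* real code would report failure through a status argument */

    for (t = 0; t < *steps; t++) {
        /* No R API calls inside the parallel loops, only C arrays. */
        #pragma omp parallel for
        for (i = 0; i < len; i++)
            recruits[i] = community[i] / 2 + 1; /* placeholder birth + immigration */

        /* The loop above ends with an implicit barrier, so every recruit
           count is complete before it is merged into the community. */
        #pragma omp parallel for
        for (i = 0; i < len; i++) {
            int total = community[i] + recruits[i];
            community[i] = (total > *capacity) ? *capacity : total; /* placeholder death rule */
        }
    }
    free(recruits);
}
```

Error reporting and interrupt checks are deliberately left out of the
loops, which matches the caveat above: inside the threaded part you are
on your own.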

> I'm running my current, sequential code, on a cluster that supports  
> both OpenMP and MPI, should I figure out how to use it.
>
You can save yourself all the hassle and just use snow + Rmpi - that
allows you to parallelize even R code, so you don't need any of the
above.

Cheers,
Simon

Dirk Eddelbuettel

2008-Apr-18 16:52 UTC

On 18 April 2008 at 13:03, tyler wrote:
| Hi,
| 
| I have only the vaguest notions of what parallel programing, but I think
| I have a situation where it might be of use to me, or at least provide
| me with the opportunity to learn more about it. Before I invest in
| figuring out the nuts and bolts, can anyone confirm that this is a sane
| approach, or provide alternatives that I could pursue?
| 
| I'm running stochastic simulations, with the actual simulation in C
| code, with an R interface to set up the parameters, format the output,
| and save the resulting objects periodically through the run. The basic
| layout is:
| 
| R function sets up the run
| R for(c = 0; c < CYCLES; c++)
|   call C function
|   C for(i = 0; i < TIME; i++)
|     immigration loop adds individuals to the recruit vector
|     birth loop adds individuals to the recruit vector
|     recruit vector is added to the community vector
|     death loop removes excess individuals
|   Return results to R, which processes and saves the objects
|   Repeat
| 
| A typical run has 20 cycles, each with 500 time steps, and takes about
| an hour. The immigration and birth loops are independent of each other,
| and so could run simultaneously. They both add to the recruit vector,
| but the order of the addition doesn't matter so long as both finish
| before the recruit vector is added to the community vector. The
| immigration, birth, and death loop iterate over arrays in a way that the
| outcome at different locations is independent. i.e., the impact of the
| birth vector on recruit vector position 0 has no influence on what the
| birth vector does to recruit vector position 1.
| 
| What I'm thinking of doing is running the birth and immigration loops as
| separate threads, and possibly running each of those threads as a group
| of threads - so a thread for a birth loop that iterates over the first N
| positions, another thread for the second N positions and so on.
| 
| I'm keen to learn about parallel programming, but I don't understand
| enough yet to make sense of the information in the R extensions manual
| and the various discussions on this list about R being thread-safe. Does
| it matter if R is thread-safe if the actual simulation is being computed
| in separate, shared C code?
| 
| I'm running my current, sequential code, on a cluster that supports both
| OpenMP and MPI, should I figure out how to use it.

As I recall, you use Debian so do

   $ sudo apt-get install r-cran-snow r-cran-rmpi

to get R support working out of the box. Then study the examples for snow on
Luke's website. [ I also have some slides on my website from presentations I
gave a few years ago. ]

That allows you to _easily_ do the so-called embarrassingly parallel case: same
problem, different parameters.  Or you could loop over CYCLES across the
cluster.  Start with something simple to study it and then go from there.

Hope this helps, Dirk

--
Three out of two people have difficulties with fractions.
