Dear R users,

we are trying to do some parallel computing using library(snow).
In particular we have a cluster with 3 nodes:

> cl <- makeCluster(3, type = "MPI")
3 slaves are spawned successfully. 0 failed.

and we want to compute the function op_mat (see below), first with the
master and then with the cluster, using system.time to check the
computational performance.

op_mat = function(mat) {
+   inv = solve(mat)
+   det_inv = det(inversa)
+   tr_inv = sum(diag(inversa))
+   return(list(c(det = det_inv, tr = tr_inv)))
+ }

> nn = 3000
> XX = matrix(rnorm(nn*nn), nn, nn)

# with the master
> system.time(op_matrici(XX))
[1] 42.283 1.883 44.168 0.000 0.000

# with the cluster
> system.time(clusterCall(cl, op_matrici, XX))
[1] 11.523 12.612 71.562 0.000 0.000

You can see that with the master it takes 44.168 seconds to compute the
function on matrix XX, while it takes 71.562 seconds (more time!) with
the cluster. Can you give us some advice to help us understand why the
cluster is slower than the master?

Thank you very much in advance,
bye
Michela and Marco

PS: we have gigabit ethernet between the master and the nodes.
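For reference when reading the timings: in the version of R used here,
system.time() returns a vector of five numbers (user, system, elapsed, and
the user and system times of child processes), and the figures compared
above are the third, elapsed (wall-clock), entries: 44.168 s versus
71.562 s. A minimal sketch of pulling out that component, using a small
500 x 500 matrix purely as a stand-in for XX:

    ## the third element of system.time() is elapsed (wall-clock) seconds
    tt <- system.time(solve(matrix(rnorm(500 * 500), 500, 500)))
    tt[3]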
On Fri, 13 Oct 2006, Michela Cameletti wrote:

> Dear R users,
> we are trying to do some parallel computing using library(snow).
> In particular we have a cluster with 3 nodes:
>
>> cl <- makeCluster(3, type = "MPI")
> 3 slaves are spawned successfully. 0 failed.
>
> [...]
>
> # with the master
>> system.time(op_matrici(XX))
> [1] 42.283 1.883 44.168 0.000 0.000
> # with the cluster
>> system.time(clusterCall(cl, op_matrici, XX))
> [1] 11.523 12.612 71.562 0.000 0.000
>
> You can see that with the master it takes 44.168 seconds to compute the
> function on matrix XX, while it takes 71.562 seconds (more time!) with
> the cluster. Can you give us some advice to help us understand why the
> cluster is slower than the master?

clusterCall() evaluates the same call on each computer in the cluster, so
it will always be slower than just evaluating on the master. It is useful
for setup that has to be performed on each machine, or for parallel
evaluation of random functions (e.g. bootstrapping, simulation).

To split up a single computation you have to do it explicitly, e.g. with
parLapply, parSapply, and parApply, or parMM for parallel matrix
multiplication.

It's unlikely that you could speed up inverting a dense matrix even with
gigabit ethernet for communication -- the success of ATLAS and Dr Goto's
tuned BLAS libraries shows that the time taken for dense linear algebra
can be dominated by communication overhead even between a CPU and its own
memory.

    -thomas
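To make the distinction concrete, here is a minimal sketch along the lines
Thomas describes (the package MASS and the toy simulation function are
just placeholders, not anything from the original post): clusterCall()
evaluates one identical call on every node, while parSapply() divides the
elements of its second argument among the nodes.

    library(snow)
    cl <- makeCluster(3, type = "MPI")

    ## clusterCall: the same setup call is run once on each of the 3 nodes
    clusterCall(cl, function() { library(MASS); NULL })

    ## parSapply: 300 independent replicates are split across the nodes,
    ## so each node evaluates roughly 100 of them
    sim_once <- function(i, n = 1000) mean(rnorm(n))   # hypothetical toy task
    res <- parSapply(cl, 1:300, sim_once)

    stopCluster(cl)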
clusterCall invokes the same function on all three nodes. You have
basically discovered the communication costs of performing the
calculation in parallel.

You'll get the easiest gains from snow (and other parallel packages in R)
with 'embarrassingly parallel' problems, where the same algorithm is
applied to different data sets / slices of data. For performance gains
from a single call to op_mat, you'd have to do some serious parallel
algorithm development to distribute the data and computations effectively.

Hope that helps,

Martin

Michela Cameletti <michela.cameletti at unibg.it> writes:

> Dear R users,
> we are trying to do some parallel computing using library(snow).
> In particular we have a cluster with 3 nodes:
>
>> cl <- makeCluster(3, type = "MPI")
> 3 slaves are spawned successfully. 0 failed.
>
> [...]
>
> You can see that with the master it takes 44.168 seconds to compute the
> function on matrix XX, while it takes 71.562 seconds (more time!) with
> the cluster. Can you give us some advice to help us understand why the
> cluster is slower than the master?

--
Martin T. Morgan
Bioconductor / Computational Biology
http://bioconductor.org
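As an illustration of the 'embarrassingly parallel' case Martin mentions,
a sketch assuming a cluster object cl created as above; op_mat is
rewritten here so that it uses its own result (the posted version refers
to an undefined object, inversa), and each node works on its own matrix:

    ## corrected variant of op_mat: determinant and trace of the inverse
    op_mat <- function(mat) {
      inv <- solve(mat)
      list(det = det(inv), tr = sum(diag(inv)))
    }

    ## six independent matrices: same algorithm, different slices of data
    mats <- replicate(6, matrix(rnorm(500 * 500), 500, 500), simplify = FALSE)
    res  <- parLapply(cl, mats, op_mat)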
On Fri, 13 Oct 2006, Michela Cameletti wrote:

> Dear R users,
> we are trying to do some parallel computing using library(snow).
> In particular we have a cluster with 3 nodes:
>
>> cl <- makeCluster(3, type = "MPI")
> 3 slaves are spawned successfully. 0 failed.
>
> and we want to compute the function op_mat (see below), first with the
> master and then with the cluster, using system.time to check the
> computational performance.
>
> op_mat = function(mat) {
> +   inv = solve(mat)
> +   det_inv = det(inversa)
> +   tr_inv = sum(diag(inversa))
> +   return(list(c(det = det_inv, tr = tr_inv)))
> + }

What is inversa?

>> nn = 3000
>> XX = matrix(rnorm(nn*nn), nn, nn)
> # with the master
>> system.time(op_matrici(XX))
> [1] 42.283 1.883 44.168 0.000 0.000
> # with the cluster
>> system.time(clusterCall(cl, op_matrici, XX))
> [1] 11.523 12.612 71.562 0.000 0.000
>
> You can see that with the master it takes 44.168 seconds to compute the
> function on matrix XX, while it takes 71.562 seconds (more time!)

Of course it takes more time to do the same computation plus
communication! The amount of additional time seems high if your nodes are
comparable in speed to your master and you really are getting gigabit
performance. I would look for a visualization tool to get an idea of what
is happening -- perhaps xmpi if your MPI is LAM.

Best,

luke

> with the cluster. Can you give us some advice to help us understand why
> the cluster is slower than the master?
> Thank you very much in advance,
> bye
> Michela and Marco
> PS: we have gigabit ethernet between the master and the nodes.

--
Luke Tierney, Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                      Phone:  319-335-3386
Department of Statistics and            Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                      email:  luke at stat.uiowa.edu
Iowa City, IA 52242                     WWW:    http://www.stat.uiowa.edu
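A rough way to see how much of the 71.562 s is communication rather than
computation, sketched under the same setup (a cluster object cl and a
3000 x 3000 matrix as in the original post): time a clusterCall() that
ships XX to the nodes but does essentially nothing with it, and compare
with one that also does the linear algebra.

    XX <- matrix(rnorm(3000 * 3000), 3000, 3000)

    ## mostly serialisation and network transfer of XX to each node
    system.time(clusterCall(cl, function(mat) dim(mat), XX))

    ## transfer plus the actual computation on each node
    system.time(clusterCall(cl, function(mat) solve(mat), XX))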