Dear R users,

we are trying to do some parallel computing using library(snow).
In particular we have a cluster with 3 nodes:

> cl <- makeCluster(3, type = "MPI")
3 slaves are spawned successfully. 0 failed.

and we want to compute the function op_mat (see below), first with the
master and then with the cluster, using system.time to check the
computational performance.

op_mat = function(mat) {
+   inv = solve(mat)
+   det_inv = det(inversa)
+   tr_inv = sum(diag(inversa))
+   return(list(c(det = det_inv, tr = tr_inv)))
+ }

> nn = 3000
> XX = matrix(rnorm(nn*nn), nn, nn)

# with the master
> system.time(op_matrici(XX))
[1] 42.283 1.883 44.168 0.000 0.000

# with the cluster
> system.time(clusterCall(cl, op_matrici, XX))
[1] 11.523 12.612 71.562 0.000 0.000

You can see that with the master it takes 44.168 seconds to compute the
function on matrix XX, while it takes 71.562 seconds (more time!) with
the cluster. Can you give us some advice to help us understand why the
cluster is slower than the master?

Thank you very much in advance,
bye
Michela and Marco

PS: we have gigabit ethernet between the master and the nodes.
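For reference when reading the timings: in the version of R used here,
system.time() returns a vector of five numbers (user, system, elapsed, and
the user and system times of child processes), and the figures compared
above are the third, elapsed (wall-clock), entries: 44.168 s versus
71.562 s. A minimal sketch of pulling out that component, using a small
500 x 500 matrix purely as a stand-in for XX:

    ## the third element of system.time() is elapsed (wall-clock) seconds
    tt <- system.time(solve(matrix(rnorm(500 * 500), 500, 500)))
    tt[3]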
On Fri, 13 Oct 2006, Michela Cameletti wrote:

> Dear R users,
> we are trying to do some parallel computing using library(snow).
> In particular we have a cluster with 3 nodes:
>
>> cl <- makeCluster(3, type = "MPI")
> 3 slaves are spawned successfully. 0 failed.
>
> [...]
>
> # with the master
>> system.time(op_matrici(XX))
> [1] 42.283 1.883 44.168 0.000 0.000
> # with the cluster
>> system.time(clusterCall(cl, op_matrici, XX))
> [1] 11.523 12.612 71.562 0.000 0.000
>
> You can see that with the master it takes 44.168 seconds to compute the
> function on matrix XX, while it takes 71.562 seconds (more time!) with
> the cluster. Can you give us some advice to help us understand why the
> cluster is slower than the master?

clusterCall() evaluates the same call on each computer in the cluster, so
it will always be slower than just evaluating on the master. It is useful
for setup that has to be performed on each machine, or for parallel
evaluation of random functions (e.g. bootstrapping, simulation).

To split up a single computation you have to do it explicitly, e.g. with
parLapply, parSapply, and parApply, or parMM for parallel matrix
multiplication.

It's unlikely that you could speed up inverting a dense matrix even with
gigabit ethernet for communication -- the success of ATLAS and Dr Goto's
tuned BLAS libraries shows that the time taken for dense linear algebra
can be dominated by communication overhead even between a CPU and its own
memory.

    -thomas
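To make the distinction concrete, here is a minimal sketch along the lines
Thomas describes (the package MASS and the toy simulation function are
just placeholders, not anything from the original post): clusterCall()
evaluates one identical call on every node, while parSapply() divides the
elements of its second argument among the nodes.

    library(snow)
    cl <- makeCluster(3, type = "MPI")

    ## clusterCall: the same setup call is run once on each of the 3 nodes
    clusterCall(cl, function() { library(MASS); NULL })

    ## parSapply: 300 independent replicates are split across the nodes,
    ## so each node evaluates roughly 100 of them
    sim_once <- function(i, n = 1000) mean(rnorm(n))   # hypothetical toy task
    res <- parSapply(cl, 1:300, sim_once)

    stopCluster(cl)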
clusterCall invokes the same function on all three nodes. You have
basically discovered the communication costs of performing the
calculation in parallel.

You'll get the easiest gains from snow (and other parallel packages in R)
with 'embarrassingly parallel' problems, where the same algorithm is
applied to different data sets / slices of data. For performance gains
from a single call to op_mat, you'd have to do some serious parallel
algorithm development to distribute the data and computations effectively.

Hope that helps,

Martin

Michela Cameletti <michela.cameletti at unibg.it> writes:

> Dear R users,
> we are trying to do some parallel computing using library(snow).
> In particular we have a cluster with 3 nodes:
>
>> cl <- makeCluster(3, type = "MPI")
> 3 slaves are spawned successfully. 0 failed.
>
> [...]
>
> You can see that with the master it takes 44.168 seconds to compute the
> function on matrix XX, while it takes 71.562 seconds (more time!) with
> the cluster. Can you give us some advice to help us understand why the
> cluster is slower than the master?

--
Martin T. Morgan
Bioconductor / Computational Biology
http://bioconductor.org
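As an illustration of the 'embarrassingly parallel' case Martin mentions,
a sketch assuming a cluster object cl created as above; op_mat is
rewritten here so that it uses its own result (the posted version refers
to an undefined object, inversa), and each node works on its own matrix:

    ## corrected variant of op_mat: determinant and trace of the inverse
    op_mat <- function(mat) {
      inv <- solve(mat)
      list(det = det(inv), tr = sum(diag(inv)))
    }

    ## six independent matrices: same algorithm, different slices of data
    mats <- replicate(6, matrix(rnorm(500 * 500), 500, 500), simplify = FALSE)
    res  <- parLapply(cl, mats, op_mat)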
On Fri, 13 Oct 2006, Michela Cameletti wrote:

> Dear R users,
> we are trying to do some parallel computing using library(snow).
> In particular we have a cluster with 3 nodes:
>
>> cl <- makeCluster(3, type = "MPI")
> 3 slaves are spawned successfully. 0 failed.
>
> and we want to compute the function op_mat (see below), first with the
> master and then with the cluster, using system.time to check the
> computational performance.
>
> op_mat = function(mat) {
> +   inv = solve(mat)
> +   det_inv = det(inversa)
> +   tr_inv = sum(diag(inversa))
> +   return(list(c(det = det_inv, tr = tr_inv)))
> + }

What is inversa?

>> nn = 3000
>> XX = matrix(rnorm(nn*nn), nn, nn)
> # with the master
>> system.time(op_matrici(XX))
> [1] 42.283 1.883 44.168 0.000 0.000
> # with the cluster
>> system.time(clusterCall(cl, op_matrici, XX))
> [1] 11.523 12.612 71.562 0.000 0.000
>
> You can see that with the master it takes 44.168 seconds to compute the
> function on matrix XX, while it takes 71.562 seconds (more time!)

Of course it takes more time to do the same computation plus
communication! The amount of additional time seems high if your nodes are
comparable in speed to your master and you really are getting gigabit
performance. I would look for a visualization tool to get an idea of what
is happening -- perhaps xmpi if your MPI is LAM.

Best,

luke

> with the cluster. Can you give us some advice to help us understand why
> the cluster is slower than the master?
> Thank you very much in advance,
> bye
> Michela and Marco
> PS: we have gigabit ethernet between the master and the nodes.

--
Luke Tierney, Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                      Phone:  319-335-3386
Department of Statistics and            Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                      email:  luke at stat.uiowa.edu
Iowa City, IA 52242                     WWW:    http://www.stat.uiowa.edu
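A rough way to see how much of the 71.562 s is communication rather than
computation, sketched under the same setup (a cluster object cl and a
3000 x 3000 matrix as in the original post): time a clusterCall() that
ships XX to the nodes but does essentially nothing with it, and compare
with one that also does the linear algebra.

    XX <- matrix(rnorm(3000 * 3000), 3000, 3000)

    ## mostly serialisation and network transfer of XX to each node
    system.time(clusterCall(cl, function(mat) dim(mat), XX))

    ## transfer plus the actual computation on each node
    system.time(clusterCall(cl, function(mat) solve(mat), XX))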