"Juan Pablo Romero Méndez" <jpablo.romero at gmail.com> writes:
> Hello,
>
> The problem I'm working on now requires operating on big matrices.
>
> I've noticed that there are some packages that allow running some
> commands in parallel. I've tried snow and NetWorkSpaces, without much
> success (they are far slower than the normal functions).
Do you mean like this?
> library(Rmpi)
> mpi.spawn.Rslaves(nsl=2) # dual core on my laptop
> m <- matrix(0, 10000, 1000)
> system.time(x1 <- apply(m, 2, sum), gcFirst=TRUE)
   user  system elapsed
  0.644   0.148   1.017
> system.time(x2 <- mpi.parApply(m, 2, sum), gcFirst=TRUE)
   user  system elapsed
  5.188   2.844  10.693

(This is with Rmpi, a third alternative you did not mention;
'elapsed' (wall-clock) time seems to be the relevant measure here.)
The basic problem is that the overhead of dividing the matrix up and
communicating between processes outweighs the already-efficient
computation being performed.
One solution is to organize your code into 'coarse-grained' tasks, so
that each call to FUN in apply does considerably more work.
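A minimal sketch of the coarse-grained idea, applied to the column sums above. It assumes the 'parallel' package that ships with R 2.14.0 and later (it postdates this thread) and a Unix-alike, since mclapply() cannot fork more than one worker on Windows. The helper name coarseColSums is hypothetical. The columns are split into one contiguous block per core, so each worker receives a single large task instead of one small task per column.

```r
## Sketch, assuming the 'parallel' package (R >= 2.14.0) on a Unix-alike.
library(parallel)

## Hypothetical helper: one coarse-grained task per core.
## splitIndices() divides the column numbers into 'cores' contiguous blocks
## of nearly equal size; each forked worker sums one whole block with the
## already-efficient colSums(). Forked workers share 'm' with the parent
## process (copy-on-write), so only the short result vectors are sent back.
coarseColSums <- function(m, cores = 2L) {
    blocks <- splitIndices(ncol(m), cores)
    parts <- mclapply(blocks,
                      function(j) colSums(m[, j, drop = FALSE]),
                      mc.cores = cores)
    unlist(parts, use.names = FALSE)   # blocks come back in order
}

m <- matrix(runif(1000 * 400), 1000, 400)
x5 <- coarseColSums(m, cores = 2L)
all.equal(x5, colSums(m))   # TRUE
```

For something as cheap as colSums, the cost of starting the workers can still outweigh the gain; the pattern pays off when the work done on each block is substantial.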
A second approach is to develop a better algorithm / use an
appropriate R paradigm, e.g.,
> system.time(x3 <- colSums(m), gcFirst=TRUE)
   user  system elapsed
  0.060   0.000   0.088

(or, since m is all zeros, even faster: x4 <- rep(0, ncol(m)) ;)
A third approach, if your calculations make heavy use of linear
algebra, is to build R with a vectorized BLAS library; see the R
Installation and Administration guide.
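To make the BLAS point concrete, a small sketch (the matrix size is arbitrary): matrix products such as %*% and crossprod() are normally handed to whichever BLAS R is linked against, so the same R code runs faster under an optimized BLAS with no changes. No timings are shown, because they depend entirely on the machine and on the BLAS in use.

```r
## Sketch: this R code is identical under the reference BLAS and under an
## optimized one; only the timings reported by system.time() change.
set.seed(1)
a <- matrix(rnorm(1000 * 300), 1000, 300)

system.time(g1 <- t(a) %*% a, gcFirst = TRUE)    # explicit transpose, then multiply
system.time(g2 <- crossprod(a), gcFirst = TRUE)  # same product, no explicit transpose
all.equal(g1, g2)                                 # TRUE (up to rounding)
```

crossprod() is also the better R paradigm here in its own right, since it avoids forming t(a).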
A fourth possibility is to use Tierney's 'pnmath' package, mentioned
in this thread:
https://stat.ethz.ch/pipermail/r-help/2007-December/148756.html
Consult its README file for the not-exactly-trivial (on my system)
task of installing the package. Specific math functions, such as exp()
below, are parallelized, provided the calculation is long enough to
make it seem worthwhile.
> system.time(exp(m), gcFirst=TRUE)
   user  system elapsed
  0.108   0.000   0.106
> library(pnmath)
> system.time(exp(m), gcFirst=TRUE)
   user  system elapsed
  0.096   0.004   0.052
(about 2x faster in elapsed time). Both BLAS and pnmath make much better
use of resources, since they do not require multiple R instances.
None of these approaches would make colSums faster; the work is
just too small to justify the overhead.
Martin
> My problem is very simple: it doesn't require any communication
> between parallel tasks, only that the task be divided evenly
> among the available cores. Also, I don't want to run the code on a
> cluster, just on my multicore machine (4 cores).
>
> What solution would you propose, given your experience?
>
> Regards,
>
>   Juan Pablo
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793