Hi,
as far as I know, foreach itself imposes no limit on data size. You should,
however, reserve enough memory for your application on the cluster (for
example via ulimit -s unlimited and ulimit -v unlimited in your job script).
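As a sketch, the top of the LSF job script could raise and then log the limits like this (the #BSUB directives and their values are only illustrative):

```shell
#!/bin/sh
# Illustrative LSF header; queue name and core count are made up.
#BSUB -q normal
#BSUB -n 16

# Raise stack and virtual-memory limits before starting R; some sites
# forbid "unlimited", hence the fallback to whatever is already set.
ulimit -s unlimited 2>/dev/null || true
ulimit -v unlimited 2>/dev/null || true

# Print the effective limits so they appear in the job log.
echo "stack limit: $(ulimit -s)"
echo "vmem  limit: $(ulimit -v)"
```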
Furthermore, I would check the following:
Check whether there are two versions of R on the cluster or in your home
directory on the frontend (LSF loads this frontend environment and uses the R
version installed there). If there are two R executables (R and R64), make
sure you use the 64-bit one.
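To see which R the frontend environment actually resolves, and whether the machine is 64-bit, something along these lines works (an R64 executable may simply not exist on your system):

```shell
# List every R-like executable found on the PATH (R64 may not exist).
command -v R R64 || true

# Prints 64 on a 64-bit userland, 32 otherwise.
getconf LONG_BIT
```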
Run R and call memory.limit() to see what the memory limits on your system
are. If this is lower than what you need, increase it by starting R in the LSF
script with the option --max-mem-size=YourSize, and if you get errors of the
kind "cannot allocate vector of size ...", also set --max-vsize=YourVSize.
(Note that memory.limit() and --max-mem-size only have an effect on Windows;
on a Linux cluster the ulimit settings above are what count.)
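A minimal sketch of that check from within R (on non-Windows platforms memory.limit() just reports Inf, so the startup options below are the Windows-side remedy):

```r
# Report the current limit; on Windows this call can also raise it.
memory.limit()

# On Windows, request a larger limit at startup instead, e.g.:
#   R --max-mem-size=16000M
# If "cannot allocate vector of size ..." errors persist:
#   R --max-vsize=32000M
```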
Then, check whether there is a memory leak in your application: if your R was
compiled with --enable-memory-profiling you can use Rprof(memory.profiling =
TRUE) or Rprofmem() for this; otherwise you must rely on the profiling tools
provided by the cluster environment (I think you also work with environment
modules there, so type 'module avail' in the shell to list the available
modules).
If you detect a memory leak, or if you see that at certain points in your
algorithm some objects are no longer used, call rm(ObjectName) and then gc()
to trigger garbage collection.
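For example (big.tmp and the computation are hypothetical placeholders):

```r
X <- matrix(rnorm(1e4), nrow = 100)   # hypothetical input
big.tmp <- crossprod(X)               # large intermediate object
result <- diag(big.tmp)               # keep only what you need

rm(big.tmp)   # drop the last reference to the intermediate...
gc()          # ...and ask R to return the memory to the OS
```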
On your nested loop using foreach: nesting is a highly delicate issue in
parallel computing, and for the foreach syntax I refer you to the must-read
http://cran.r-project.org/web/packages/foreach/vignettes/nested.pdf.
How you organize the nesting deserves careful thought. In C++ you can
determine how many cores work on which loop; with foreach and doMC this does
not seem to be possible.
And please keep the discussion on the r-help mailing list, so that others can
learn from it and researchers with more experience can comment as well.
Best
Simon
On Sep 19, 2013, at 9:24 PM, pkount at bgc-jena.mpg.de wrote:
> Hi again,
>
> if you have some time, I would like to bother you again with two more
> questions. After your response the parallel code works perfectly, but when I
> apply it to the real case (big matrices) I get an error about a non-numeric
> dimension, and I guess it again returns NULL or something. Do you know
> whether a foreach loop can only handle objects up to a certain size? The
> equation I am using involves 3 objects of 2 GB each.
>
> The second question has to do with the cores that foreach uses. Although I
> ask our cluster (LSF) for a certain number of CPUs, and I also specify that
> with
> library(doMC)
> registerDoMC(n)
>
> it seems from the top command that I am using all the cores. I am using 2
> foreach loops as a nest:
> foreach(i = 1:16) {
>   foreach(j = 1:10) ... }
> Maybe I should do something about this kind of nesting? I am not aware of
> that.
>
> I am sorry for the long text , and thank you for your nice solution
>