Hi;

I have an R script that includes a call to genoud(). Each genoud() run lasts about 4 seconds, which would be OK if I didn't have to call it about 2000 times. This yields about 2 hours of processing. I would like to use this script operationally, so it should be run twice a day.

It seems to me that the parallel processing option included in genoud() divides the task inside the function among the computers in the cluster. My consecutive calls to genoud(), on the other hand, are independent of each other, but all depend on objects stored in the R workspace. I think that paying the communication time among computers for a 4-second task, repeated 2000 times, should be slower than dividing the calls to genoud() among the available computers. So perhaps a viable option to speed up the process could be something like:

1) Somehow make a copy of the workspace on the fly (I mean put some command, before the loop that calls genoud(), to export the workspace in its current state to the other computers).

2) Divide the task among the available computers in the network; e.g., if I've got my "localhost" and 3 more computers:

    n.comp <- 4
    nsteps <- 1987
    steps.c <- trunc(nsteps/n.comp)
    steps.c <- (1:n.comp)*steps.c
    steps.c <- c(steps.c[1:(n.comp-1)],nsteps)
    steps.i <- c(1,steps.c[-n.comp]+1)
    for(ic in 1:n.comp){
      # Somehow start R remotely, read the copied workspace and execute in computer ic:
      for(i in steps.i[ic]:steps.c[ic]){something[i];genoud(f(i));something.else[i]}
      # and somehow get back the results from ic
    }

3) Concatenate the results in my "localhost" workspace.

You can see I'm rather lost with this. Could you help with this?

Regards,
Javier
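As a small check of the index bookkeeping in step 2, the sketch below runs the boundary computation from the post and compares it with splitIndices() from the parallel package, which ships with R. Using splitIndices() is an assumption on my part, not something proposed in the thread; it is shown only as one ready-made way to get balanced chunks.

```r
library(parallel)  # part of the standard R distribution

n.comp <- 4
nsteps <- 1987

# Manual chunk boundaries, as written in the post
steps.c <- (1:n.comp) * trunc(nsteps / n.comp)
steps.c <- c(steps.c[1:(n.comp - 1)], nsteps)   # last chunk absorbs the remainder
steps.i <- c(1, steps.c[-n.comp] + 1)           # first index of each chunk

# Alternative: let splitIndices() build one index vector per computer
chunks <- splitIndices(nsteps, n.comp)

print(rbind(first = steps.i, last = steps.c))
print(lengths(chunks))
```

With the manual scheme the last computer receives the leftover iterations, while splitIndices() spreads them so that chunk sizes differ by at most a few.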
Hi Javier,

The Rmpi or snow packages might help, e.g., mpi.parLapply; you need to pay attention to what gets (explicitly or implicitly) shared with other nodes.

Martin

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
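A minimal sketch of the approach Martin points to, under several assumptions: it uses the parallel package (bundled with R, with an interface that descends from snow) instead of Rmpi or snow themselves; optim() stands in for genoud() so the example runs without extra packages; and the names target, objective and run_step are hypothetical placeholders for the workspace objects and the loop body. The clusterExport() call is where the "what gets shared with other nodes" question is handled explicitly.

```r
library(parallel)

# Hypothetical workspace objects that every iteration depends on
target <- c(1.5, -0.5)
objective <- function(par, i) sum((par - target * i)^2)

# Hypothetical stand-in for one loop iteration; in the real script
# the optim() call would be replaced by the genoud() call
run_step <- function(i) {
  fit <- optim(c(0, 0), objective, i = i, method = "BFGS")
  fit$par
}

nsteps <- 20          # 1987 in the original post
cl <- makeCluster(2)  # two local workers; a vector of host names could be used instead

# Workers start with an empty workspace, so copy the needed objects explicitly
clusterExport(cl, c("target", "objective"))

# The independent iterations are split among the workers
results <- parLapply(cl, seq_len(nsteps), run_step)
stopCluster(cl)

# Concatenate the per-iteration results on the master
res <- do.call(rbind, results)
print(head(res))
```

Because each worker receives a block of iterations rather than a piece of a single 4-second optimization, communication happens only when the objects are exported and when the results come back.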
Also look at the nws package for another way to do this.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of jgarcia at ija.csic.es
> Sent: Thursday, November 22, 2007 3:29 AM
> To: r-help at r-project.org
> Subject: [R] manual parallel processing