Dear All,

The "clients.txt" file of the latest Rserve package, by Simon Urbanek, says, regarding its R client,

"(...) a simple R client, i.e. it allows you to connect to Rserve from R itself. It is very simple and limited, because Rserve was not primarily meant for R-to-R communication (there are better ways to do that), but it is useful for quick interactive connection to an Rserve farm."

What are those better ways to do it? I am thinking about using Rserve to have an R process send jobs to a bunch of Rserves on different machines. It is like what we could do with Rmpi (or pvm), but without the MPI layer. Therefore, presumably it would be easier to deal with network problems and machine failures, to use checkpoints, etc. (i.e., to try to get better fault tolerance).

It seems that Rserve would provide the basic infrastructure for doing that and save me from reinventing the wheel of using sockets, etc., directly from R.

However, Simon's comment about better ways of R-to-R communication made me wonder if this idea really makes sense. What is the catch? Have other people tried similar approaches?

Thanks,

R.

--
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Hi Ramon,

I've been interested in responses to your question. I think I have a similar issue: I have a very large simulation script and would like to modularize it by having a main script that calls lots of subscripts. I haven't done that yet because the only way I could think of was to call a subscript, have it run, save the objects it creates, and then load those objects back into the main script, which seems like a very slow and onerous way to do it. Would Rserve do what I'm looking for?

On 4/7/07, Ramon Diaz-Uriarte <rdiaz02 at gmail.com> wrote:
> I am thinking about using Rserve
> to have an R process send jobs to a bunch of Rserves in different
> machines.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Matthew C Keller
Postdoctoral Fellow
Virginia Institute for Psychiatric and Behavioral Genetics
On Apr 7, 2007, at 10:56 AM, Ramon Diaz-Uriarte wrote:

> Which are those better ways to do it? I am thinking about using
> Rserve to have an R process send jobs to a bunch of Rserves in
> different machines. It is like what we could do with Rmpi (or pvm),
> but without the MPI layer. Therefore, presumably it'd be easier to
> deal with network problems, machine's failures, using checkpoints,
> etc. (i.e., to try to get better fault tolerance).
>
> However, Simon's comment about better ways of R-to-R communication
> made me wonder if this idea really makes sense. What is the catch?
> Have other people tried similar approaches?

I was commenting on direct R-to-R communication using sockets + 'serialize' in R, or the 'snow' package for parallel processing. The latter could be useful for what you have in mind, because it includes a socket-based implementation which allows you to spawn multiple children (across multiple machines) and collect their results. It uses regular rsh or ssh to start the jobs, so if you can use that, it should work for you. 'snow' also has PVM and MPI implementations; the PVM one is really easy to set up (on unix), and that is what I was using for parallel computing in R on a cluster.

Rserve is sort of comparable, but in addition it provides the spawning infrastructure due to its client/server concept.
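To make the "sockets + 'serialize'" route a bit more concrete, here is a minimal sketch of the round trip it relies on. It is not taken from the thread: the `job` list and its contents are hypothetical, and a raw vector stands in for the bytes that would actually travel over a socket between two R processes.

```r
# Sketch only: serialize an R object to bytes and rebuild it, as a
# sender and a receiver would do on either end of a socket.

# Hypothetical job description: a function plus its arguments.
job <- list(fun = function(x) sum(x^2), args = list(1:10))

# connection = NULL returns a raw vector instead of writing to a
# connection; with a real link one would pass e.g. a socketConnection().
payload <- serialize(job, connection = NULL)

# The receiving R process would reconstruct the object from the bytes.
received <- unserialize(payload)

# Run the job on the "receiving" side.
result <- do.call(received$fun, received$args)
print(result)
```

Everything beyond this (opening connections, handling dropped links, retrying failed jobs) is left to the user, which is roughly the wheel that snow and Rserve save you from reinventing.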
What Rserve doesn't have is the convenience functions that snow provides, like clusterApply etc. Thinking of it, it would actually be possible to add them, although I admit that the original goal of Rserve was not parallel computing :). The idea was to have one Rserve server and multiple clients, whereas in 'snow' you sort of have one client and multiple servers.

You could spawn multiple Rserves on multiple machines, but Rserve itself doesn't provide any load-balancing out of the box, so you'd have to do that yourself.

I don't know if that helps... :)

Cheers,
Simon
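As an illustration of the "one client, multiple servers" model and the clusterApply-style convenience functions mentioned above, a minimal sketch follows. It rests on assumptions not in the thread: it uses the 'parallel' package bundled with current R, which incorporates much of snow's interface, rather than 'snow' itself; the workers are two local processes rather than remote machines; and `simulate_once` is a hypothetical stand-in for a real job.

```r
# Sketch only: socket-based workers driven from one master R process.
library(parallel)  # assumption: stands in for 'snow' here

# Hypothetical job: one small simulation run, reproducible via its seed.
simulate_once <- function(seed) {
  set.seed(seed)
  mean(rnorm(1000))
}

# Two local worker processes connected over sockets.
# With snow itself the rough equivalent is makeCluster(2, type = "SOCK").
cl <- makePSOCKcluster(2)

# Static scheduling: jobs are handed to the workers in turn.
res_static <- clusterApply(cl, 1:4, simulate_once)

# Load-balanced scheduling: each worker receives a new job as soon as it
# finishes its previous one.
res_lb <- clusterApplyLB(cl, 1:4, simulate_once)

stopCluster(cl)

# Results come back as lists, in the order of the inputs.
print(length(res_static))
```

For the modular-simulation case raised earlier in the thread, the point is that results come back to the main script as ordinary R objects, with no need to save them to disk and load them again.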