Waichler, Scott R
2006-Mar-13 23:50 UTC
[R] Parallel computing with the snow package: external file I/O possible?
Hello,

I am trying to do model autocalibration using the snow and rgenoud packages. The function I want to run in task-parallel fashion across multiple machines is one that pre- and post-processes data and runs an external model code. My problem is that external file I/O is happening only in the master node and not in the slaves. I have followed Jasjeet Sekhon's suggestion to test the cluster setup, and that is fine:

> library(snow)
>
> #pick two machines
> cl <- makeCluster(c("moab","escalante"))
>
> clusterCall(cl, sin, 2)

> The output should be:
> > clusterCall(cl, sin, 2)
> [[1]]
> [1] 0.9092974
>
> [[2]]
> [1] 0.9092974
>

I do indeed get the above result, so I presume the network setup is ok. Next I tested a function that creates a file. Here is the code that I sourced from the master ("moab"):

# begin script
library(snow)

setDefaultClusterOptions(outfile="/tmp/cluster1")
setDefaultClusterOptions(master="moab")
cl <- makeCluster(c("moab", "escalante"), type="SOCK")

# Define base pathname for output from my.test()
base.dir <- "./test"

# Define a function that includes some file I/O
my.test <- function(base.dir) {
  this.host <- as.character(system("hostname")) # to tag the node that makes the file
  this.rnd <- sample(1:1e6, 1) # to be 'sure' the files have different names
  test.file <- paste(sep="", base.dir, "_", this.host, "_", this.rnd)
  file.create(test.file)
} # end my.test()

g <- clusterCall(cl, my.test, base.dir)
print(g)
stopCluster(cl)
# end script

The output (g) was as follows:

[[1]]
[1] TRUE

[[2]]
[1] TRUE

But there was only one file created, which I suspect is by the master node. A second file was not created by the process on the slave. Also, system("hostname") returns the number 0 for moab instead of the name. Any ideas as to what might be wrong?

Thanks,
Scott Waichler
scott.waichler _at_ pnl.gov
Martin Morgan
2006-Mar-14 00:20 UTC
[R] Parallel computing with the snow package: external file I/O possible?
Hi Scott --

It took me a bit to figure it out, but the help page for system made it seem like system should return the exit status, rather than the command result, if system is invoked without specifying intern = TRUE. So why does system("hostname") actually print the host name? It's a side effect, representing the stdout of the system command rather than the result of a function evaluation in R! Compare

> res <- system("hostname")
> res
[1] 0

You'll get the side effect printed to the screen, but the result returned to R (invisibly, I guess) is the exit status, 0. Snow captures the return value, rather than the side effect. So the solution is to use either

system("hostname", intern = TRUE)

or

Sys.info()[["nodename"]]

Hope that helps!

Martin
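
For reference, here is a minimal sketch of the test function with the host-name fix suggested in the reply. It is not code from either poster. Two things are assumptions added for illustration: the function returns a list (host, full file path, creation status) instead of just the result of file.create(), and the local check writes under tempdir() rather than "./test". The sketch runs locally and does not need a cluster.

```r
# Sketch only: same idea as my.test() in the question, but the host name
# is taken from Sys.info()[["nodename"]], which returns a character string.
# (system("hostname") without intern = TRUE returns the exit status instead.)
my.test <- function(base.dir) {
  this.host <- Sys.info()[["nodename"]]   # host name as a string
  this.rnd  <- sample(1:1e6, 1)           # makes a name clash unlikely
  test.file <- paste(sep = "", base.dir, "_", this.host, "_", this.rnd)
  ok <- file.create(test.file)
  # Assumption for illustration: also report where the file was written,
  # so the caller can see which path each node actually used.
  list(host    = this.host,
       file    = normalizePath(test.file, mustWork = FALSE),
       created = ok)
}

# Local check without a cluster; the tempdir() base path is hypothetical.
res <- my.test(file.path(tempdir(), "test"))
print(res)
```

On a cluster the call would stay as in the original script, clusterCall(cl, my.test, base.dir). A relative base.dir such as "./test" is resolved against each R process's own working directory, so returning the full path is one way to check whether the nodes are writing to the locations you expect.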