Paul.Rustomji at csiro.au
2011-Sep-17 13:10 UTC
[R] Problem using SNOW with data frame as a function argument
Hello
I would like to use SNOW to parallelise some computations on the columns
of a data frame, using different parameter values for each SNOW "worker".
I gather(?) clusterMap() is the appropriate SNOW function for something like
this. I suspect the problem is that I am supplying only one data frame for
the flow.dat function argument, whereas the a, b, and x arguments have ten
values each. I also tried RECYCLE=TRUE, but it still didn't work.
I have generated some example data below that illustrates my problem.
#example input data frames
mydat <- data.frame(a.in=1:10,b.in=1:10,x.in=1:10)
flow.dat <- data.frame(ww=100:105,zz=600:605)
#define the function
myfun <- function(a,b,x,flow.dat){
  ee <- a+b+x
  ff <- mean(flow.dat[,1])
  return(ff)
}
#apply the function as per normal
print(myfun(a=mydat$a.in,
            b=mydat$b.in,
            x=mydat$x.in,
            flow.dat=flow.dat))
[1] 102.5
#works OK, average of column one of data frame looks good
#a,b and x parameters read in OK , ee gets calculated but not returned
#now try to apply the function in parallel via SNOW
library(snow)
cl <- makeCluster(3,type="SOCK") #make a cluster
ll <- clusterMap(cl,fun=myfun,
                 a=mydat$a.in,
                 b=mydat$b.in,
                 x=mydat$x.in,
                 flow.dat=flow.dat)
Error in checkForRemoteErrors(val) :
  10 nodes produced errors; first error: incorrect number of dimensions
stopCluster(cl)
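A possible explanation, sketched here rather than confirmed on the system above: clusterMap() maps over every argument passed through "...", and a data frame is a list of columns, so each call would receive a single column of flow.dat (a plain vector) instead of the whole data frame. Indexing a vector with flow.dat[,1] then gives "incorrect number of dimensions". The sketch below uses base R's mapply(), which follows the same mapping rules, so that it runs without a cluster. The names bad and good are introduced here purely for illustration.

```r
# Same example data and function as in the post.
mydat <- data.frame(a.in = 1:10, b.in = 1:10, x.in = 1:10)
flow.dat <- data.frame(ww = 100:105, zz = 600:605)

myfun <- function(a, b, x, flow.dat) {
  ee <- a + b + x              # computed but not returned, as in the post
  ff <- mean(flow.dat[, 1])    # needs flow.dat to be a whole data frame
  return(ff)
}

# Mapping over flow.dat hands each call one column (a plain vector),
# so flow.dat[, 1] fails. The error message is captured, not raised.
bad <- tryCatch(
  mapply(myfun, a = mydat$a.in, b = mydat$b.in, x = mydat$x.in,
         flow.dat = flow.dat),
  error = function(e) conditionMessage(e)
)
print(bad)

# MoreArgs passes flow.dat unchanged to every call instead of mapping over it.
good <- mapply(myfun, a = mydat$a.in, b = mydat$b.in, x = mydat$x.in,
               MoreArgs = list(flow.dat = flow.dat))
print(good)
```

If this is indeed the cause, the corresponding SNOW call would presumably be clusterMap(cl, fun=myfun, a=mydat$a.in, b=mydat$b.in, x=mydat$x.in, MoreArgs=list(flow.dat=flow.dat)), which returns a list rather than a vector. That call has not been run against the cluster described here.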
_______________________________________________________
Here is system info
> Sys.info()
                     sysname                      release
                   "Windows"            "Server 2008 x64"
                     version                     nodename
"build 7601, Service Pack 1"             "POWERAPP4-WRON"
                     machine                        login
                    "x86-64"                     "xxxxxx"
                        user
                    "xxxxxx"
$version.string
[1] "R version 2.12.1 (2010-12-16)"
Paul Rustomji
Research Scientist
CSIRO Land and Water
Australia
