Paul.Rustomji at csiro.au
2011-Sep-17 13:10 UTC
[R] Problem using SNOW with data frame as a function argument
Hello,

I would like to use SNOW to parallelise some computations to be made on columns of a data frame, using different parameter values for each SNOW "worker". I gather(?) clusterMap() is the appropriate SNOW function to do something like this.

I suspect the problem lies in the fact that I am only supplying one data frame for the flow.dat function argument, yet the a, b, and x arguments have ten values each. I tried with RECYCLE=TRUE but it still didn't work. I have generated some example data below that illustrates my problem.

> #example input data frames
> mydat <- data.frame(a.in=1:10,b.in=1:10,x.in=1:10)
> flow.dat <- data.frame(ww=100:105,zz=600:605)
>
> #define the function
> myfun <- function(a,b,x,flow.dat){
+ ee <- a+b+x
+ ff <- mean(flow.dat[,1])
+ return(ff)
+ }
>
> #apply the function as per normal
> print(myfun(a=mydat$a.in,
+ b=mydat$b.in,
+ x=mydat$x.in,
+ flow.dat=flow.dat))
[1] 102.5
> #works OK, average of column one of data frame looks good
> #a, b and x parameters read in OK, ee gets calculated but not returned
>
> #now try to apply the function in parallel via SNOW
> cl <- makeCluster(3,type="SOCK") #make a cluster
> ll <- clusterMap(cl,fun=myfun,
+ a=mydat$a.in,
+ b=mydat$b.in,
+ x=mydat$x.in,
+ flow.dat=flow.dat)
Error in checkForRemoteErrors(val) :
  10 nodes produced errors; first error: incorrect number of dimensions
> stopCluster(cl)

_______________________________________________________
Here is system info

> Sys.info()
       sysname           release                      version         nodename
     "Windows" "Server 2008 x64" "build 7601, Service Pack 1" "POWERAPP4-WRON"
       machine     login      user
      "x86-64"  "xxxxxx"  "xxxxxx"

$version.string
[1] "R version 2.12.1 (2010-12-16)"

Paul Rustomji
Research Scientist
CSIRO Land and Water
Australia
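
A possible explanation and workaround, offered as a sketch rather than a confirmed fix: clusterMap() iterates over every argument passed through "...", and a data frame is a list of columns, so each worker most likely receives a single column (a plain vector) instead of the whole data frame. Indexing a vector with [,1] then fails with "incorrect number of dimensions". Arguments that should be sent unchanged to every call can be supplied through MoreArgs instead. The example below assumes the "parallel" package bundled with later R versions (2.14.0 onwards) so that it runs without extra installs; snow's clusterMap() also takes MoreArgs, so the same call should apply there with library(snow) and type="SOCK".

```r
library(parallel)  # assumption: stands in for snow, which has the same clusterMap() interface

# Example input data frames from the post
mydat <- data.frame(a.in = 1:10, b.in = 1:10, x.in = 1:10)
flow.dat <- data.frame(ww = 100:105, zz = 600:605)

# The function from the post, unchanged in behaviour
myfun <- function(a, b, x, flow.dat) {
  ee <- a + b + x               # calculated but not returned
  ff <- mean(flow.dat[, 1])     # needs a two-dimensional object
  return(ff)
}

# What a mapped call sees when flow.dat is passed via "...":
# the first element of the data frame, i.e. column ww as a plain vector
first.piece <- flow.dat[[1]]
is.null(dim(first.piece))       # TRUE, so first.piece[, 1] would fail

cl <- makeCluster(3)            # default socket cluster with three workers

# a, b and x are mapped element by element;
# flow.dat is passed whole to every call via MoreArgs
ll <- clusterMap(cl, fun = myfun,
                 a = mydat$a.in,
                 b = mydat$b.in,
                 x = mydat$x.in,
                 MoreArgs = list(flow.dat = flow.dat))

stopCluster(cl)

print(unlist(ll))               # one value per mapped call
```

With this arrangement each of the ten calls gets one value of a, b and x plus the complete flow.dat, so the result should be a list of ten values matching the serial call.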