On Fri, Dec 21, 2012 at 10:42 AM, Chris Hergarten <chegaga at yahoo.com> wrote:
> Dear R-users
>
> I ran into problems with my R code while trying to run cLHS sampling (clhs
> package) in parallel mode (i.e., on several data sets simultaneously).
>
> Here is the code (which I developed with some help:)):
> ******************************************
> library("clhs")
> library("snow")
> a <- as.data.frame(replicate(1000, rnorm(20)))
> b <- as.data.frame(replicate(1000, rnorm(20)))
> c <- as.data.frame(replicate(1000, rnorm(20)))
> d <- as.data.frame(replicate(1000, rnorm(20)))
> abcd <- list(a, b, c, d)
> cl <- makeCluster(4)
> results <- parLapply(cl,
> X = abcd,
> FUN = function(i) {
> clhs(x = i, size = round(nrow(i) / 5), iter = 2000, simple = FALSE)
> },
> )
> stopCluster(cl)
> ******************************************
>
> Before running the last line, R throws an error: "Error in
> length(x) : 'x' is missing". Any ideas what I am doing wrong and
> how to fix it?
>
Loading clhs on the primary does not automatically load it on the workers.
After creating the cluster and before calling parLapply, try:

clusterEvalQ(cl, library(clhs))
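
For reference, here is a minimal sketch of how the whole script might look with that call in place. Some of it is assumption rather than part of the original post: the data are smaller, to keep the run short, and type = "SOCK" is spelled out. Two further changes are guesses at what may be behind the 'x' is missing message. The arguments to parLapply are passed by position, because snow documents the function as parLapply(cl, x, fun, ...) with lower-case names, so X = and FUN = may not match; check this against your installed version. The trailing comma after the function is also dropped.

```r
library(clhs)
library(snow)

# Illustrative data: four data frames of 100 rows x 10 columns
# (smaller than in the original post, to keep the sketch quick).
set.seed(1)
abcd <- replicate(4, as.data.frame(replicate(10, rnorm(100))), simplify = FALSE)

cl <- makeCluster(4, type = "SOCK")

# Attach clhs in every worker process; the library() call in the
# primary session does not carry over to them.
clusterEvalQ(cl, library(clhs))

# Arguments are passed by position, and there is no trailing comma
# after the function.
results <- parLapply(cl, abcd, function(d) {
  clhs(x = d, size = round(nrow(d) / 5), iter = 2000, simple = FALSE)
})

stopCluster(cl)

length(results)  # one clhs result per data frame
```

If the workers still cannot find clhs, it may be installed in a library path that they do not see.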
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com