Chris Hergarten
2013-May-10 13:01 UTC
[R] Allocating outputs from code using foreach and doPar
Dear R-users,

I am looking for a way to parallelize my PLSR predictions in order to save processing time. I tried the "foreach" construct with "doParallel" (cf. 2nd part of the code below), but I was unable to assign both the predicted values and the model performance parameter (RMSEP) to the output variable (both in the 2nd part).

My code:

    set.seed(10000)  # generate some data...
    mat <- replicate(100, rnorm(100))
    y <- mat[, 1, drop = FALSE]
    x <- mat[, 2:100]
    eD <- dist(x, method = "euclidean")  # distance matrix to find close samples
    eDm <- as.matrix(eD)
    kns <- matrix(NA, nrow(x), 10)  # empty matrix to hold the 10 closest samples
    for (i in 1:nrow(eDm)) {  # identify closest samples in a loop and store them in kns
      kns[i, ] <- head(order(eDm[, i]), 11)[-1]
    }

So far I consider the code "safe", but the next part is challenging me, since I have never used the "foreach" construct before:

    library(pls)
    library(foreach)
    library(doParallel)
    cl <- makeCluster(2)
    registerDoParallel(cl)
    out <- foreach(j = 1:nrow(mat), .combine = "rbind", .packages = "pls") %dopar% {
      pls <- plsr(y ~ x, ncomp = 5, validation = "CV", subset = kns[j, ])
      predict(pls, ncomp = 5, newdata = x[j, , drop = FALSE])
      RMSEP(pls, estimate = "CV")$val[1, 1, 5]
    }
    stopCluster(cl)

As I understand it, the 3rd-to-last code line, starting with "RMSEP(pls,...", is simply overwriting the previously written data from the "predict" code line. Somehow I was assuming the .combine option would take care of this?

Many thanks for your help!

Best,
Chega
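
One possible way to keep both values is sketched below; it is not a definitive fix. The body of a foreach loop behaves like a function body: only the value of its last expression is returned for each iteration, and .combine merely stacks those per-iteration return values. The result of the "predict" line is therefore discarded rather than overwritten. The sketch returns both numbers as one named vector, so that rbind builds a two-column matrix. The names "fit", "pred" and "rmsep" are illustrative and not from the original post. The call to RMSEP with ncomp = 5 and intercept = FALSE is an assumption about what was intended: with the default settings the third dimension of $val should start at the intercept-only model, so index 5 may not refer to 5 components.

```r
library(pls)
library(foreach)
library(doParallel)

set.seed(10000)
mat <- replicate(100, rnorm(100))
y <- mat[, 1, drop = FALSE]
x <- mat[, 2:100]

# 10 nearest neighbours of each sample (the sample itself is dropped)
eDm <- as.matrix(dist(x, method = "euclidean"))
kns <- t(sapply(seq_len(nrow(eDm)), function(i) head(order(eDm[, i]), 11)[-1]))

cl <- makeCluster(2)
registerDoParallel(cl)

out <- foreach(j = seq_len(nrow(x)), .combine = "rbind", .packages = "pls") %dopar% {
  fit <- plsr(y ~ x, ncomp = 5, validation = "CV", subset = kns[j, ])
  pred <- predict(fit, ncomp = 5, newdata = x[j, , drop = FALSE])
  # assumption: request the 5-component CV estimate explicitly, without the intercept model
  rmsep <- RMSEP(fit, estimate = "CV", ncomp = 5, intercept = FALSE)$val[1, 1, 1]
  # the last expression is the iteration's return value: one named vector per sample
  c(pred = as.numeric(pred), rmsep = rmsep)
}

stopCluster(cl)
```

With this layout, out[, "pred"] would hold the predicted value for each sample and out[, "rmsep"] the matching cross-validated error. If the per-iteration results were of different types or lengths, returning a list and omitting .combine would be an alternative.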