Chris Hergarten
2013-May-10 13:01 UTC
[R] Allocating outputs from code using foreach and doPar
Dear R-users,
I am looking for a way to parallelize my PLSR predictions in order to
save processing time. I tried the "foreach" construct with
"doParallel" (see the second part of the code below), but I was unable
to assign both the predicted values and the model performance
parameter (RMSEP) to the output variable.
My code:
set.seed(10000)                       # generate some data...
mat <- replicate(100, rnorm(100))
y <- mat[, 1, drop = FALSE]           # response as a one-column matrix
x <- mat[, 2:100]
eD <- dist(x, method = "euclidean")   # distance matrix to find close samples
eDm <- as.matrix(eD)
kns <- matrix(NA, nrow(x), 10)        # empty matrix to hold the 10 closest samples
for (i in 1:nrow(eDm)) {              # identify closest samples and store them in kns
  kns[i, ] <- head(order(eDm[, i]), 11)[-1]
}
Up to this point I consider the code "safe", but the next part is
giving me trouble, since I have never used the "foreach" construct
before:
library(pls)
library(foreach)
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
out <- foreach(j = 1:nrow(mat), .combine = "rbind", .packages = "pls") %dopar% {
  pls <- plsr(y ~ x, ncomp = 5, validation = "CV", subset = kns[j, ])
  predict(pls, ncomp = 5, newdata = x[j, , drop = FALSE])
  RMSEP(pls, estimate = "CV")$val[1, 1, 5]
}
stopCluster(cl)
As I understand it, the third-to-last code line, starting with
"RMSEP(pls,...", simply overwrites the result of the "predict" line.
Somehow I was assuming the .combine option would take care of this?
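To illustrate what I suspect is going on, here is a minimal toy
sketch. It is not my real model: fake_pred() and fake_rmsep() are
made-up stand-ins for the predict() and RMSEP() calls, and I use the
sequential %do% so that no cluster is needed. If I understand
correctly, the loop body returns only the value of its last
expression, and .combine merely merges those per-iteration values, so
returning both numbers in one vector might be the way to keep them:

```r
library(foreach)

# Hypothetical stand-ins for the predict() and RMSEP() results of sample j.
fake_pred  <- function(j) j * 10
fake_rmsep <- function(j) j / 10

# Only the last expression of the block is returned, so the
# "prediction" is discarded; rbind stacks three scalars (3 x 1 matrix).
last_only <- foreach(j = 1:3, .combine = "rbind") %do% {
  fake_pred(j)
  fake_rmsep(j)
}

# Returning both values as one named vector keeps them; rbind stacks
# the vectors into a 3 x 2 matrix with columns "pred" and "rmsep".
both <- foreach(j = 1:3, .combine = "rbind") %do% {
  c(pred = fake_pred(j), rmsep = fake_rmsep(j))
}

print(dim(last_only))
print(dim(both))
```

In the real loop, I assume the two elements of the vector would be the
predicted value and the RMSEP for sample j, but I have not verified
this with %dopar%.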
Many thanks for your help!
Best, Chega