thr3ads.net - R help - [R] Allocating outputs from code using foreach and doPar [May 2013]


Chris Hergarten

2013-May-10 13:01 UTC

[R] Allocating outputs from code using foreach and doPar

Dear R-users,
I am looking for a solution to "parallelize" my PLSR predictions in
order to save processing time. I was trying to use the "foreach"
construct with "doPar" (cf. 2nd part of code below), but I was unable
to allocate the predicted values and the model performance parameters (RMSEP) to
the output variable (all in the 2nd part).
My code:
set.seed(10000)   # generate some data...
mat <- replicate(100, rnorm(100))
y <- as.matrix(mat[,1], drop=F)
x <- mat[,2:100]
eD <- dist(x, method = "euclidean")  # distance matrix to find close samples
eDm <- as.matrix(eD)
kns <- matrix(NA,nrow(x),10)  # empty matrix to allocate 10 closest samples
for (i in 1:nrow(eDm)) {   # identify closest samples in a loop and allocate to kns
  kns[i,] <- head(order(eDm[,i]), 11)[-1]
}
So far I consider the code as "safe", but the next part is challenging
me, since I never used the "foreach" construct before:
library(pls)
library(foreach)
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
out <- foreach(j = 1:nrow(mat), .combine="rbind", .packages="pls") %dopar% {
  pls <- plsr(y ~ x, ncomp=5, validation="CV", subset=kns[j,])
  predict(pls, ncomp=5, newdata=x[j,,drop=F])
  RMSEP(pls, estimate="CV")$val[1,1,5]
}
stopCluster(cl)
As I understand, the 3rd-to-last code line starting with
"RMSEP(pls,..." is simply overwriting the previously written data from
the "predict" code line. Somehow I was assuming the 
.combine option would take care of this?
Many thanks for your help!
Best, Chega
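
A minimal sketch of one way to get both values back, not a definitive answer. It relies on the fact that each foreach iteration returns only the value of the last expression in its body, and that .combine merges those per-iteration values. Returning a named vector with both numbers and combining with rbind should therefore give one row per sample. The names idx, train, fit, pred and err are made up for illustration. The explicit data frame replaces the subset= argument purely to keep the sketch self-contained. As far as I understand the pls package, the $val array from RMSEP() contains an intercept-only model first by default, so index 5 may not be the 5-component model; the sketch asks for ncomp = 5 with intercept = FALSE to sidestep that, which is an assumption about what was intended.

```r
library(pls)
library(foreach)
library(doParallel)

set.seed(10000)
mat <- replicate(100, rnorm(100))
y <- mat[, 1]
x <- mat[, 2:100]

# 10 nearest neighbours of each sample; position 1 of the ordering is the
# sample itself (distance 0), so positions 2 to 11 are kept
eDm <- as.matrix(dist(x, method = "euclidean"))
kns <- t(apply(eDm, 2, function(d) order(d)[2:11]))

cl <- makeCluster(2)
registerDoParallel(cl)

out <- foreach(j = seq_len(nrow(x)), .combine = rbind, .packages = "pls") %dopar% {
  idx <- kns[j, ]
  # local training set; I() keeps the predictor matrix as a single column
  train <- data.frame(y = y[idx], x = I(x[idx, , drop = FALSE]))
  fit <- plsr(y ~ x, ncomp = 5, data = train, validation = "CV")
  pred <- predict(fit, ncomp = 5, newdata = x[j, , drop = FALSE])
  err <- RMSEP(fit, estimate = "CV", ncomp = 5, intercept = FALSE)$val[1, 1, 1]
  # the last expression is what this iteration hands to .combine
  c(pred = as.numeric(pred), rmsep = as.numeric(err))
}

stopCluster(cl)
```

With this layout, out[, "pred"] would hold the predictions and out[, "rmsep"] the cross-validated errors. If the per-iteration results were of different shapes, returning a list and leaving out .combine would be an alternative.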