Jairaj Sathyanarayana
2013-Dec-06 07:27 UTC
[R] How to concatenate the results from parallelized nested foreach loops
Hi all,
I am working with data.table objects inside nested foreach loops, and I am
having trouble getting the combined results into the shape I want.
The code below uses sample data:
library(iterators)
library(data.table)
library(foreach)
library(doParallel)
registerDoParallel() #register a parallel backend so %dopar% runs in parallel
#generate dummy data
set.seed(1212)
sample1 <- data.frame(parentid=round((runif(50000, min=1, max=50000))),
childid=round(runif(100000, min=1, max=100000)))
length(unique(sample1$parentid))
#get unique parents
sample1uniq <- as.data.frame(unique(sample1$parentid))
names(sample1uniq) <- "parentid"
#convert original dataset to data.table
sample1 <- data.table(sample1)
setkey(sample1,parentid)
#convert unique ids to data.table
sample1uniq <- data.table(sample1uniq)
setkey(sample1uniq,parentid)
#a random sample of 5K users to scan against
sample2uniq_idx <- sample(1:nrow(sample1uniq), size=5000)
sample2uniq <- sample1uniq[sample2uniq_idx]
sample2uniq <- data.table(sample2uniq)
setkey(sample2uniq,parentid)
#construct iterators
sample1uniq_iter <- iter(sample1uniq)
sample2uniq_iter <- iter(sample2uniq)
outerresults <- foreach (x = sample1uniq_iter, .combine=rbind,
.packages=c('foreach','doParallel', 'data.table')) %dopar% {
b <- sample1[J(x)] #ith parent
b2 <- as.data.frame(b)[,2] #ith parent's children
foreach (y = sample2uniq_iter, .combine=rbind) %dopar% {
c <- sample1[J(y)] #jth parent
c2 <- as.data.frame(c)[,2] #jth parent's children
common <- length(intersect(b2, c2))
if (common>0) {
uni <- length(union(b2, c2))
results <- list(u1=x, u2=y, inter=common, union=uni)
}
}
}
Note that all tasks can be done in parallel with no dependency issues.
I was expecting the results to come out like this (made up):
u1 u2 inter union
 1  2    10    20
 1  3     4    10
 1  4     7    15
 1  5     6    10
 2  3    10    20
 2  4     4    10
 3  5     7    10
 4  5     6    10
But they don't. Do I need to implement a different combine function? Any
other ideas/help will be appreciated.
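For reference, here is a minimal sketch of the result shape I am after, on toy data rather than the data.table objects above. It makes several assumptions that are not in my code: it nests the loops with the `%:%` operator from foreach, it returns a one-row data.frame (instead of a list) from each iteration so that rbind can stack rows, and it keeps only pairs with x < y. The `children` list and `ids` vector are made up for illustration. It uses %do% so that it runs without a registered backend, and I have not checked how this approach behaves on the full data.

```r
library(foreach)

# Hypothetical toy data: each parent id maps to a vector of child ids.
children <- list(`1` = c(10, 11, 12), `2` = c(11, 12, 13), `3` = c(12, 20))
ids <- as.integer(names(children))

# %:% merges the two loops into a single stream of (x, y) tasks.
# %do% runs them sequentially; no parallel backend is needed for this sketch.
res <- foreach(x = ids, .combine = rbind) %:%
  foreach(y = ids, .combine = rbind) %do% {
    b2 <- children[[as.character(x)]]  # children of parent x
    c2 <- children[[as.character(y)]]  # children of parent y
    common <- length(intersect(b2, c2))
    # Assumption: only unordered pairs (x < y) with a non-empty overlap are kept.
    if (x < y && common > 0) {
      data.frame(u1 = x, u2 = y, inter = common,
                 union = length(union(b2, c2)))
    }  # otherwise the iteration yields NULL, which rbind drops
  }
print(res)
```

On this toy input the result has three rows, for the pairs (1,2), (1,3) and (2,3). Replacing %do% with %dopar% after registering a backend would presumably be the parallel variant.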
thx