Jairaj Sathyanarayana
2013-Dec-06 07:27 UTC
[R] How to concatenate the results from parallelized nested foreach loops
Hi all,

I am working with data.table objects inside nested foreach loops, and I am having trouble building the results object the way I would like. Code with sample data is below:

library(iterators)
library(data.table)
library(foreach)
library(doParallel)

#register a parallel backend; without one, %dopar% runs sequentially with a warning
registerDoParallel()

#generate dummy data
set.seed(1212)
sample1 <- data.frame(parentid = round(runif(50000, min = 1, max = 50000)),
                      childid  = round(runif(100000, min = 1, max = 100000)))
length(unique(sample1$parentid))   #number of unique parents

#get unique parents
sample1uniq <- as.data.frame(unique(sample1$parentid))
names(sample1uniq) <- "parentid"

#convert original dataset to data.table
sample1 <- data.table(sample1)
setkey(sample1, parentid)

#convert unique ids to data.table
sample1uniq <- data.table(sample1uniq)
setkey(sample1uniq, parentid)

#a random sample of 5K users to scan against
sample2uniq_idx <- sample(1:nrow(sample1uniq), size = 5000)
sample2uniq <- sample1uniq[sample2uniq_idx]
sample2uniq <- data.table(sample2uniq)
setkey(sample2uniq, parentid)

#construct iterators
sample1uniq_iter <- iter(sample1uniq)
sample2uniq_iter <- iter(sample2uniq)

outerresults <- foreach(x = sample1uniq_iter, .combine = rbind,
                        .packages = c('foreach', 'doParallel', 'data.table')) %dopar% {
  b <- sample1[J(x)]             #ith parent
  b2 <- as.data.frame(b)[, 2]    #ith parent's children
  foreach(y = sample2uniq_iter, .combine = rbind) %dopar% {
    cc <- sample1[J(y)]            #jth parent
    c2 <- as.data.frame(cc)[, 2]   #jth parent's children
    common <- length(intersect(b2, c2))
    if (common > 0) {
      uni <- length(union(b2, c2))
      results <- list(u1 = x, u2 = y, inter = common, union = uni)
    }
  }
}

Note that all tasks can run in parallel; there are no dependencies between them.

I was expecting results like this (values made up):

u1 u2 inter union
 1  2    10    20
 1  3     4    10
 1  4     7    15
 1  5     6    10
 2  3    10    20
 2  4     4    10
 3  5     7    10
 4  5     6    10

But that is not what I get. Do I need to write a different combine function? Any other ideas or help would be appreciated.

Thanks
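Here is a minimal sketch of one way to get the u1/u2/inter/union table. It has not been tried on the full 50K data set. Two things in the code above probably contribute to the wrong output. First, `iter()` on a data.frame (and therefore on a data.table) steps through columns by default, so `x` is the whole `parentid` column rather than a single id. Second, returning a `list` from the loop body makes `rbind` build a matrix of mode list instead of a data frame.

The sketch makes these changes:

- It uses foreach's `%:%` nesting operator, so only one `%dopar%` is needed.
- It loops over plain id vectors.
- It returns a one-row data.table per matching pair (NULL otherwise).
- It combines the results with `data.table::rbindlist()`.

The toy `edges` table, the helper `rbind_dt`, the two-worker cluster and the `x < y` filter are hypothetical choices, not part of the original code. The filter is there only to match the triangular layout of the expected output.

```r
library(foreach)
library(doParallel)   # also attaches 'parallel' (makeCluster, stopCluster)
library(data.table)

# Hypothetical toy data: each row links a parent id to one child id.
edges <- data.table(parentid = c(1, 1, 1, 2, 2, 3, 3, 4),
                    childid  = c(10, 11, 12, 10, 11, 12, 13, 20))
setkey(edges, parentid)

# Plain id vectors instead of iter(): iter() on a data.frame walks
# columns by default, so x would be a whole column, not one id.
ids1 <- unique(edges$parentid)   # outer parents
ids2 <- unique(edges$parentid)   # parents to scan against

# Hypothetical combine helper: rbindlist() skips NULL (no-match) results
# and returns a proper data.table rather than a list-matrix.
rbind_dt <- function(...) rbindlist(list(...))

cl <- makeCluster(2)   # assumed: two local workers
registerDoParallel(cl)

res <- foreach(x = ids1, .combine = rbind_dt, .packages = "data.table") %:%
  foreach(y = ids2, .combine = rbind_dt, .packages = "data.table") %dopar% {
    b2 <- edges[J(x), childid]   # children of parent x (keyed join)
    c2 <- edges[J(y), childid]   # children of parent y
    common <- length(intersect(b2, c2))
    # Assumed filter x < y: keep each unordered pair once.
    if (x < y && common > 0) {
      data.table(u1 = x, u2 = y, inter = common,
                 union = length(union(b2, c2)))
    }                            # otherwise the body returns NULL
  }

stopCluster(cl)
print(res)
```

To run this on the original data, `edges` would become `sample1`, and `ids1`/`ids2` would become `sample1uniq$parentid` and `sample2uniq$parentid`. Dropping the `x < y` condition keeps every ordered pair, as the original loop does.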