I am experimenting with parallel processing using foreach and seem to be missing something fundamental. Cool stuff. I've gone through the list and seen a couple of closely related issues, but nothing I've tried seems to work.

I know that the results from foreach are combined, but what if there is more than one variable within the loop? Below is a snippet (non-functioning) of code that I hope provides enough insight into what I am trying to do. The commented out lines are what I would be doing (successfully) if I wasn't trying to implement the %dopar%. The goal is to do statistics on the sequence of lambda vectors that were originally accumulated in the matrix lambdas using cbind.

Thanks in advance for any suggestions,
Dave

---------------snip
update_N <- function(sets, indexes, lam) {
    n <- length(indexes)-1    # count of events
    N <- rep(0, K)            # count of failures per node
    for (i in 1:n) {
        nodes <- as.numeric(sets[indexes[i]:(indexes[i+1]-1)])
        node <- resample(nodes, 1, prob=lam[nodes]/sum(lam[nodes]))
        N[node] = N[node] + 1
    }
    N
}

lambdas<- foreach(j=1:(2*burn_in), .combine=cbind) %dopar% {
    N <- update_N(min_sets, min_sets_indexes, lambda)
    lambda <- rgamma(K, shape=a+N, rate=bT)
    lambda
    if (j%%100==0) { print(j); print(lambda); print(N)}
#    if (j > burn_in) {
#        lambdas <- cbind(lambdas, lambda)
#    }
}
---------------snip

Hi,

On Tue, Feb 8, 2011 at 6:18 PM, Robinson, David G <drobin at sandia.gov> wrote:
[snip]
> lambdas<- foreach(j=1:(2*burn_in), .combine=cbind) %dopar% {
>     N <- update_N(min_sets, min_sets_indexes, lambda)
>     lambda <- rgamma(K, shape=a+N, rate=bT)
>     lambda
>     if (j%%100==0) { print(j); print(lambda); print(N)}
> #    if (j > burn_in) {
> #        lambdas <- cbind(lambdas, lambda)
> #    }
> }
[/snip]

Sorry -- I don't get what you're asking/trying to do.

Is it a coincidence that your commented block uses the same variable name as the one you are assigning the result of foreach() to?

Essentially, foreach will work just like an lapply ... if you changed foreach to lapply here, what do you expect that %dopar% {} block to return after each iteration?

I'm not sure if this is what you're asking, but if you want to return two elements per iteration in your loop, just return a list with two elements, and post-process it later.

I'd start by removing your .combine=cbind argument from the foreach() call and getting your code running so that the right "things" are returned as a normal list (or a list of lists, if you want to return > 1 thing per foreach iteration). Once that's done, you can try to auto 'cbind' your things if you think it's necessary.

Sorry if this isn't helpful .. it's not clear to me what you're trying to do, so I'm kind of stabbing in the dark here.

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
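
A minimal sketch of the "return a list, post-process later" suggestion, using base R only. Since foreach is described above as working like lapply, lapply() stands in for it here; with foreach the same body would go inside the %dopar% { ... } block. The values of K and n_iter, and the rpois()/rgamma() calls standing in for update_N() and the sampler, are made-up placeholders and not the poster's actual model.

```r
# Hypothetical stand-ins for the poster's quantities; the values are made up.
set.seed(1)
K <- 3        # number of nodes
n_iter <- 5   # number of iterations

# Each iteration returns ONE object: a list holding both values of interest.
results <- lapply(seq_len(n_iter), function(j) {
  N <- rpois(K, lambda = 2)                      # placeholder for update_N()
  lambda <- rgamma(K, shape = 1 + N, rate = 10)  # placeholder for a and bT
  list(lambda = lambda, N = N)                   # last expression = return value
})

# Post-process: pull each element out and bind the vectors as columns.
lambdas <- do.call(cbind, lapply(results, `[[`, "lambda"))
Ns      <- do.call(cbind, lapply(results, `[[`, "N"))

dim(lambdas)   # K rows, one column per iteration
```

Keeping the per-iteration result as a plain list first makes it easy to inspect what each iteration really returned before any automatic combining is switched on.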

Steve,

Thanks for taking the time to look at the question. My apologies for the confusing post. In an attempt to keep the post short, I seem to have confused the issue.

The variable of interest in each iteration is the vector lambda, and the goal is to collect all the lambda vectors and characterize the statistics of lambda over the course of the simulation (this is just a simple Gibbs sampler). In the serial processing world I simply use cbind to accumulate the lambda vectors into an array called lambdas (as performed in the commented-out commands). What I am trying to do now is use a combination of foreach/dopar to do the same type of accumulation.

I am not trying to capture any other variables from the loop except lambda. As you suggested, I have tried removing the .combine argument and simply collecting the resulting list. Unfortunately, the lambda vectors don't appear in the resulting list.

Thanks again for taking the time to try to figure this out.

Cheers,
Dave

On 2/8/11 7:47 PM, "Steve Lianoglou" <mailinglist.honeypot@gmail.com> wrote:
[snip]
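
For reference, the serial accumulation described above (the commented-out lines in the original snippet) can be sketched roughly as follows. This is only an illustration: K, burn_in, a and bT are given made-up values, and rpois() is a hypothetical placeholder for update_N(), which depends on data not shown in the thread.

```r
set.seed(42)
K <- 3             # hypothetical number of nodes
burn_in <- 10      # hypothetical burn-in length
a <- 1; bT <- 10   # hypothetical gamma shape offset and rate

lambda <- rep(1, K)   # starting value
lambdas <- NULL       # grows one column per retained draw

for (j in 1:(2 * burn_in)) {
  N <- rpois(K, lambda = 5 * lambda)   # placeholder for update_N(); uses the previous lambda
  lambda <- rgamma(K, shape = a + N, rate = bT)
  if (j > burn_in) {
    lambdas <- cbind(lambdas, lambda)  # keep post-burn-in draws only
  }
}

dim(lambdas)        # K rows, burn_in columns
rowMeans(lambdas)   # one summary statistic per node
```

When every iteration's lambda is returned instead (as a foreach or lapply call would do), the same burn-in filtering can be applied afterwards by dropping the first burn_in columns of the combined matrix.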

Hi David,

I'm CC-ing R-help in order to finish this one off ;-)

On Wed, Feb 9, 2011 at 10:59 AM, Robinson, David G <drobin at sandia.gov> wrote:
[snip]
> One of your comments pointed me in the right direction and I found the
> problem. I simply commented out the line " if (j%%100==0) { ...print(N)}"
> and the original program ran fine. Not sure I understand why, but... it
> runs.
[/snip]

It's because the last line of a "block" or "function" or whatever is the implicit return value of that block/function (as you already know -- the last line of your `update_N` function is `N`, which means that's the value you want that function to return).

The last line of the "block" inside your %dopar% { ... } was an if-statement and not the value `lambda` that you wanted to return. As a result, the return value of your block was the result of that if-statement.

Keep in mind that in R, even `if` statements return values, e.g.:

x <- if (FALSE) { 1 } else { 2 }

In the case above, x will be set to 2.

Does that make it clearer now why your lambda vector wasn't being returned (and further processed) after each iteration of your foreach loop?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
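
The effect described above can be reproduced in a small, base-R-only sketch. As before, lapply() stands in for foreach, and the two-element lambda and the names bad and good are made up for illustration. The only difference between the two versions is whether `lambda` or the if-statement comes last in the block.

```r
# Version 1: the if-statement is the last expression, so ITS value is returned.
# With no `else`, a FALSE condition yields NULL; a TRUE condition yields the
# value of the last expression inside the braces (here, what print() returns).
bad <- lapply(1:4, function(j) {
  lambda <- c(j, j * 10)
  lambda                          # evaluated, then discarded
  if (j %% 2 == 0) { print(j) }
})

# Version 2: the diagnostic print comes first and `lambda` is last,
# so `lambda` is what each iteration returns.
good <- lapply(1:4, function(j) {
  lambda <- c(j, j * 10)
  if (j %% 2 == 0) { print(j) }
  lambda
})

is.null(bad[[1]])                 # TRUE: the lambda vector was lost
lambdas <- do.call(cbind, good)   # one column per iteration
dim(lambdas)
```

Moving the progress print above the final `lambda` keeps the diagnostics while still returning the vector, which is an alternative to commenting the print line out.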