Benjamin Caldwell
2013-Apr-24  21:50 UTC
[R] getting started in parallel computing on a windows OS
Dear R help,

I have what I think is a fairly simple parallel problem, and I am getting bogged down in documentation and packages written for much more complex situations.

I have a big matrix [30^5, 5] (30^5 rows by 5 columns). I have a function that acts on each row of that matrix sequentially and outputs the 'best' result from the whole matrix (it compares the result from each row to the last and keeps the 'better' result). I would like to divide that large matrix into as many chunks as I have cores available, work through each chunk, and then output the result from each chunk.

I'm really having trouble making head or tail of how to do this on a Windows machine: lots of false starts on several different packages now. Basically, I have the function, and I can easily divide the matrix into chunks. I just need a way to process each chunk in parallel (other than manually opening a new R session for each core).

Any help much appreciated; after two days of trying to get this to work I'm pretty burnt out.

Thanks

*Ben Caldwell*
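Ben's chunk-per-core plan can be sketched with only the base `parallel` package (bundled with R since 2.14.0). This is a minimal sketch, not code from the thread: `score_row()`, `best_in_chunk()` and the random stand-in matrix are hypothetical, and 'better' is assumed to mean a larger score. Because `mclapply()` relies on forking, which Windows does not support, the sketch uses a socket (PSOCK) cluster from `makeCluster()`, which starts separate R worker processes for you.

```r
library(parallel)

# Hypothetical scoring function; here "better" is assumed to mean larger.
score_row <- function(row) sum(row^2)

# Sequential scan of one chunk, keeping the best row seen so far, in the
# same way the original function scans the whole matrix.
best_in_chunk <- function(m, score) {
  best_score <- -Inf
  best_row <- NA_integer_
  for (i in seq_len(nrow(m))) {
    s <- score(m[i, ])
    if (s > best_score) {
      best_score <- s
      best_row <- i
    }
  }
  list(score = best_score, row = m[best_row, ])
}

# Small random stand-in for the real 30^5-by-5 matrix.
set.seed(1)
big <- matrix(runif(1e4 * 5), ncol = 5)

n_workers <- 2                  # e.g. detectCores() on a real machine
cl <- makeCluster(n_workers)    # PSOCK (socket) cluster: works on Windows

# One contiguous block of row indices, and hence one chunk, per worker.
idx <- splitIndices(nrow(big), n_workers)
chunks <- lapply(idx, function(i) big[i, , drop = FALSE])

# Each worker scans its own chunk; the scoring function travels as an argument.
per_chunk <- parLapply(cl, chunks, best_in_chunk, score = score_row)
stopCluster(cl)

# Reduce the per-chunk winners to the overall winner.
chunk_scores <- vapply(per_chunk, function(r) r$score, numeric(1))
overall <- per_chunk[[which.max(chunk_scores)]]
```

PSOCK workers start with empty workspaces, so anything they need must be shipped explicitly, either as arguments (as here) or with `clusterExport()`. Each chunk is also copied to its worker, so with the full-size matrix the data will exist in memory more than once.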
Martin Morgan
2013-Apr-24  23:34 UTC
[R] getting started in parallel computing on a windows OS
Hi Ben, in your code from this morning you had a function

    fitting <- function(ndx.grd = two, dt.grd = one, ind.vr = 'ind', rsp.vr = 'res') {
        ## ... setup
        for (i in 1:length(ndx.grd[, 1])) {
            ## ... do work
        }
        ## ... collate results
    }

that you're trying to run in parallel. Obviously the ## ... represent lines I've removed. When you say something like

    y <- foreach(icount(length(two))) %dopar% fitting()

it's saying that you want to run fitting() length(two) times. So you're actually doing the same thing length(two) times, whereas you really want to divide the work that's inside fitting() into chunks, and do those on separate cores!

Conceptually what you'd like to do is

    fit_one <- function(idx, ndx.grd, dt.grd, ind.vr, rsp.vr) {
        ## ... do work on row idx _ONLY_
    }

and then evaluate with

    ## ... setup
    y <- foreach(idx = icount(nrow(two))) %dopar%
        fit_one(idx, two, one, "ind", "res")
    ## ... collate

so that fit_one fits just one of your combinations. foreach will worry about distributing the work. Make sure that fit_one works first, before trying to run this in parallel; your use of try(), trying to fit different data types (character, integer, numeric) into a matrix rather than a data.frame, and the type coercions all indicate that you're fighting with R rather than working with it.

Hope that helps,

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
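Martin's `foreach` approach also needs a registered parallel backend before `%dopar%` uses more than one core; on Windows, `doParallel` with a PSOCK cluster is one option. The sketch below uses hypothetical names (`fit_block()`, `score_row()`, the random `big` matrix) and assumes 'better' means a larger score. It keeps his "make the worker work on its own first" advice but hands each task a block of rows rather than a single row, on the assumption that per-task overhead would dominate with 30^5 one-row tasks.

```r
library(parallel)
library(foreach)
library(doParallel)

# Hypothetical scoring function; "better" is assumed to mean larger.
score_row <- function(row) sum(row^2)

# Worker in the spirit of fit_one(): it handles ONLY the rows it is given
# and returns the best of them as a one-row data.frame.
fit_block <- function(rows, m, score) {
  scores <- apply(m[rows, , drop = FALSE], 1, score)
  data.frame(row = rows[which.max(scores)], score = max(scores))
}

# Small random stand-in for the real 30^5-by-5 matrix.
set.seed(1)
big <- matrix(runif(1e4 * 5), ncol = 5)

# Check the worker serially before going parallel.
print(fit_block(1:10, big, score_row))

cl <- makeCluster(2)    # PSOCK cluster, works on Windows
registerDoParallel(cl)  # without a registered backend, %dopar% runs sequentially (with a warning)

# One block of row indices per worker; foreach exports the objects the
# expression refers to (fit_block, big, score_row) to the workers.
blocks <- splitIndices(nrow(big), getDoParWorkers())
res <- foreach(rows = blocks, .combine = rbind) %dopar%
  fit_block(rows, big, score_row)

stopCluster(cl)

# Pick the overall best from the per-block winners.
winner <- res[which.max(res$score), ]
```

`.combine = rbind` collects the per-block winners in order, and the final `which.max()` is the same "keep the better result" step applied once more across blocks.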