Hello,

I know of various methods for using multiple processors but am not sure which would be the best fit. Some things to note first: I'm running dependent simulations, so straightforward parallel code is out (multicore, doSNOW, etc.). I'm on Windows, I don't know C, and I don't plan on learning C or any of the *nix languages.

My main concern is running multiple analyses on large data sets. By large I mean that when I'm done running two simulations R is using ~3 GB of RAM, and the remaining ~3 GB is chewed up when I try to compute the Gelman-Rubin statistic to compare the two resulting samples, grinding the process to a halt. I'd like separate cores to run each analysis simultaneously; that would save time, and I'll have to tackle the BGR calculation problem another way. Can R temporarily write calculations to hard-disk space instead of RAM?

The second concern boils down to whether there is a way to split up dependent simulations. For example, at iteration t I feed a(t-2) into FUN1 to generate a(t), then feed a(t), b(t-1) and c(t-1) into FUN2 to simulate b(t) and c(t). I'd love to have one core run FUN1 and another run FUN2, and better yet, a third to run all the pre- and post-processing tidbits!

If anyone has suggestions on a direction I can look into, it would be appreciated.

Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-633-STAT(7828)
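For concreteness, a minimal serial sketch of the dependency structure described above; FUN1, FUN2, the starting values and the iteration count are all made up for illustration:

## Serial version of the dependency: a(t) from a(t-2), then b(t), c(t) from
## a(t), b(t-1), c(t-1). `cc` holds c(t) (avoiding the name of base R's c()).
FUN1 <- function(a.lag2) a.lag2 + rnorm(1)                            # hypothetical update
FUN2 <- function(a.t, b.lag1, c.lag1) c(b.lag1 + a.t, c.lag1 - a.t)   # hypothetical update

n.iter <- 100
a <- numeric(n.iter); b <- numeric(n.iter); cc <- numeric(n.iter)
a[1:2] <- rnorm(2)                          # arbitrary starting values

for (t in 3:n.iter) {
    a[t] <- FUN1(a[t - 2])                  # a(t) depends on a(t-2)
    bc   <- FUN2(a[t], b[t - 1], cc[t - 1]) # b(t), c(t) depend on a(t), b(t-1), c(t-1)
    b[t] <- bc[1]; cc[t] <- bc[2]
}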
On Wed, 8 Jun 2011, Robin Jeffries wrote:

> I know of various methods for using multiple processors but am not sure
> which would be the best fit. Some things to note first: I'm running
> dependent simulations, so straightforward parallel code is out
> (multicore, doSNOW, etc.). I'm on Windows, I don't know C, and I don't
> plan on learning C or any of the *nix languages.

By restricting yourself to one of the least capable OSes R runs on, you are making this harder for yourself.

> My main concern is running multiple analyses on large data sets. By large
> I mean that when I'm done running two simulations R is using ~3 GB of RAM,
> and the remaining ~3 GB is chewed up when I try to compute the
> Gelman-Rubin statistic to compare the two resulting samples, grinding the
> process to a halt. I'd like separate cores to run each analysis
> simultaneously; that would save time, and I'll have to tackle the BGR
> calculation problem another way. Can R temporarily write calculations to
> hard-disk space instead of RAM?

By using virtual memory (R does not in fact use RAM: it always uses virtual memory). With a 64-bit R you can use up to terabytes of VM, although Windows' disc access is slow. You will need to set the memory limit (max-mem-size) larger than your RAM size to enable this.

> The second concern boils down to whether there is a way to split up
> dependent simulations. For example, at iteration t I feed a(t-2) into FUN1
> to generate a(t), then feed a(t), b(t-1) and c(t-1) into FUN2 to simulate
> b(t) and c(t). I'd love to have one core run FUN1 and another run FUN2,

As stated, that is pointless: the core running FUN2 would be waiting for the results of FUN1. However, at time t FUN1 could generate a(t+1) from a(t-1) whilst FUN2 generates b(t) and c(t).

> and better yet, a third to run all the pre- and post-processing tidbits!

Look into package snow (with socket clusters). The overhead of what you ask for may be too high (POSIX OSes can use package multicore, which has a much lower overhead), but if the calculations are slow enough it may be worthwhile. There are Windows-oriented examples in package RSiena.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
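A rough sketch (not from the thread) of running the two chains side by side on a two-worker snow socket cluster; sim1() and sim2() are hypothetical wrappers around the two simulations:

library(snow)

cl <- makeSOCKcluster(c("localhost", "localhost"))   # two local worker processes

## Ship the (hypothetical) simulation functions to the workers; socket workers
## do not inherit the master's workspace.
clusterExport(cl, c("sim1", "sim2"))

## Each chain still runs serially on its own worker, but the two chains run
## at the same time; res is a list with the two sets of results.
res <- clusterApply(cl, 1:2, function(i) if (i == 1) sim1() else sim2())

stopCluster(cl)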
----------------------------------------
> From: rjeffries at ucla.edu
> Date: Wed, 8 Jun 2011 20:54:45 -0700
> To: r-help at r-project.org
> Subject: [R] Resources for utilizing multiple processors
>
> I know of various methods for using multiple processors but am not sure
> which would be the best fit. Some things to note first: I'm running
> dependent simulations, so straightforward parallel code is out
> (multicore, doSNOW, etc.). I'm on Windows, I don't know C, and I don't
> plan on learning C or any of the *nix languages.

Well, for the situation below you seem to want a function server. You could consider rApache and write this like a big web application. A web server, like a database, is not the first thing you think of for high-performance computing, but if your computationally intensive tasks are in native code this could be a reasonable overhead that requires little learning.

If you literally mean cores rather than separate machines, keep in mind that cores can end up fighting over shared resources such as memory (this post cites an IEEE article where adding cores makes things worse in a non-contrived case):

http://lists.boost.org/boost-users/2008/11/42263.php

I think people have mentioned packages like bigmemory (I forget the names exactly) that let you handle larger-than-RAM objects. Launching a bunch of threads and letting the virtual memory thrash can easily make things slower. A better approach would be a block-oriented implementation, where you do the memory/file management in R yourself, until there is a data frame that uses disk transparently and takes hints on expected access patterns (prefetch, etc.).
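As a hedged illustration of the bigmemory route: a file-backed big.matrix keeps the data in a memory-mapped file on disk rather than in R's heap. The dimensions and file names below are made up:

library(bigmemory)

## A file-backed big.matrix lives in a memory-mapped file, not in R's heap.
bm <- filebacked.big.matrix(nrow = 1e6, ncol = 50, type = "double",
                            backingfile    = "chain1.bin",
                            descriptorfile = "chain1.desc")

bm[1:10, 1] <- rnorm(10)          # writes go to the backing file on disk

desc <- describe(bm)              # small descriptor object that can be passed around
bm2  <- attach.big.matrix(desc)   # e.g. a second R process can attach the same data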
On 06/08/2011 08:54 PM, Robin Jeffries wrote:

> The second concern boils down to whether there is a way to split up
> dependent simulations. For example, at iteration t I feed a(t-2) into FUN1
> to generate a(t), then feed a(t), b(t-1) and c(t-1) into FUN2 to simulate
> b(t) and c(t). I'd love to have one core run FUN1 and another run FUN2,
> and better yet, a third to run all the pre- and post-processing tidbits!

If FUN1 is independent of b() and c(), perhaps the example at the bottom of ?socketConnection points in a useful direction: start one R process to calculate a(t) and send the result to a socket connection, then move on to a(t+1); start a second R process to read from the socket connection and run FUN2 at each t. You'll be able to overlap the computations and double the throughput. The 'pipeline' could be extended with pre- and post-processing workers too, though one would want to watch out for the complexity of managing this.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024
Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
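A rough sketch of that two-process pipeline (not from the thread): it assumes two R sessions on the same machine, a free port (10187 here is made up), and the same hypothetical FUN1/FUN2 stand-ins as before. Start the producer first; it waits until the consumer connects.

## --- R session 1 (producer): computes a(t) and streams it out ---
FUN1 <- function(a.lag2) a.lag2 + rnorm(1)            # hypothetical update step
n.iter <- 100
a <- numeric(n.iter); a[1:2] <- rnorm(2)
con <- socketConnection(port = 10187, server = TRUE, blocking = TRUE, open = "a+b")
for (t in 3:n.iter) {
    a[t] <- FUN1(a[t - 2])
    serialize(a[t], con)                              # send a(t) downstream, move on to a(t+1)
}
close(con)

## --- R session 2 (consumer): reads a(t) and runs FUN2 ---
FUN2 <- function(a.t, b.lag1, c.lag1) c(b.lag1 + a.t, c.lag1 - a.t)   # hypothetical update step
n.iter <- 100
b <- numeric(n.iter); cc <- numeric(n.iter)
con <- socketConnection(port = 10187, server = FALSE, blocking = TRUE, open = "a+b")
for (t in 3:n.iter) {
    a.t <- unserialize(con)                           # blocks until a(t) arrives
    bc  <- FUN2(a.t, b[t - 1], cc[t - 1])
    b[t] <- bc[1]; cc[t] <- bc[2]
}
close(con)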