Hi, this question is meant to be a bit vague, since I'm really not familiar with all the issues involved. It's also a problem that I think many others will have encountered, so there should be useful suggestions out there.

According to MASS, 2nd ed., p. 158, "A major issue is that S is designed to be able to back out from uncompleted calculations, so that memory used in intermediate calculations is retained until they are committed." The end of the paragraph says that recent versions of S are better, and I seem to remember that R is also better with memory management.

My question, I guess, is how bad loops are in R, and what the best way is to deal with memory if I want to use loops. Are there flags I can turn on or off, if I don't care about uncompleted calculations, to make memory management more efficient? I've already thought about pushing the looping into shell scripts, and hope that this will help somewhat.

The issue I'm facing is that it would be easier for me to write the program using a loop, but the loop itself would be a large one (looping over potentially millions of entries, say, with fairly nontrivial calculations per iteration). Right now I've tried to avoid this through some pre-processing followed by a more complex set of calculations than I would have needed had I used a loop. Of course it's hard to know which method is better without actually trying both, but I would like to hear your comments on loops in R before deciding whether to give the loop approach a try. Thanks.
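[As a concrete illustration of the trade-off the poster describes — this sketch is an editorial addition, not part of the original message — the usual memory advice for large loops in R is to preallocate the result rather than grow it inside the loop, and to vectorize when the calculation allows it. The `sqrt` computation here is a hypothetical stand-in for the "nontrivial calculations per iteration".]

```r
n <- 1e6
x <- runif(n)

## Bad: growing the result forces R to reallocate and copy
## `res` on (nearly) every iteration -- quadratic cost in n.
res <- numeric(0)
for (i in 1:100) res <- c(res, sqrt(x[i]))

## Better: allocate the full result once, then fill it in place.
res <- numeric(n)
for (i in seq_len(n)) res[i] <- sqrt(x[i])

## Best, when the per-entry calculation can be expressed on whole
## vectors: a single vectorized call into compiled code.
res <- sqrt(x)
```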
On Tue, 6 May 2003, R A F wrote:

> Hi, this question is meant to be a bit vague, since I'm really not
> familiar with all the issues involved. It's also a problem that I
> think many would have encountered and would have useful suggestions.
>
> According to MASS, 2nd ed., p. 158, "A major issue is that S is
> designed to be able to back out from uncompleted calculations, so
> that memory used in intermediate calculations is retained until
> they are committed." The end of the paragraph says that recent
> versions of S are better. And I seem to remember that R is also
> better with memory management.

That was written in 1996, when rich people had 64Mb of RAM and teaching labs often had 4 or 8Mb (and R would not run much of the code in the book and crashed quite often). Take a look at `S Programming' for a less ancient view.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
I'm afraid that I don't have your new book with Venables handy. So would it be fair to assume that there's no real need to avoid loops these days? Thanks again.

>From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>To: R A F <raf1729 at hotmail.com>
>CC: r-help at stat.math.ethz.ch
>Subject: Re: [R] Loops and memory
>Date: Tue, 6 May 2003 20:03:08 +0100 (BST)
>
>That was written in 1996, when rich people had 64Mb of RAM and teaching
>labs often had 4 or 8Mb (and R would not run much of the code in the book
>and crashed quite often). Take a look at `S Programming' for a less
>ancient view.
Interesting. The other day I was surprised by how much longer a for loop takes to add two vectors a and b compared to a + b. (I think I made a and b have a million entries.) I guess my problem is that I don't really know what the issues are, so it's not clear to me when and where loops should be avoided. I should try to get a copy of this new book to find out. Thanks again.

>From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>To: R A F <raf1729 at hotmail.com>
>CC: r-help at stat.math.ethz.ch
>Subject: Re: [R] Loops and memory
>Date: Tue, 6 May 2003 20:30:22 +0100 (BST)
>
>On Tue, 6 May 2003, R A F wrote:
>
> > I'm afraid that I don't have your new book with Venables handy. So
> > would it be fair to assume that there's no real need to avoid loops
> > these days?
>
>No, but the issues are different from those in 1996. It is a lot less
>common to have to avoid loops, simply because memory can often be
>squandered. But vectorizing calculations still pays off, sometimes
>handsomely: there is an example in that book of going from several hours
>to one second (and it's a real example).
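[The timing experiment the poster describes can be reproduced with the sketch below — an editorial addition, not part of the original thread. Exact timings depend on the machine and R version; the point is only the relative gap between the interpreted loop and the single vectorized call.]

```r
a <- runif(1e6)
b <- runif(1e6)

## Loop version: even with the result preallocated, each iteration
## pays R's interpreter overhead for the indexing and assignment.
loop_add <- function(a, b) {
  out <- numeric(length(a))
  for (i in seq_along(a)) out[i] <- a[i] + b[i]
  out
}

system.time(res_loop <- loop_add(a, b))  # loop over a million entries
system.time(res_vec  <- a + b)           # one call into compiled code

## Both compute the same elementwise sums.
identical(res_loop, res_vec)
```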