Full_Name: Hugh C. Pumphrey
Version: 1.4.1
OS: Linux (Debian Woody)
Submission from: (NULL) (129.215.133.170)

The function readChar() appears to have some type of problem with memory allocation. I don't know if "memory leak" is the correct term, but if one uses readChar() many times, the R binary grows in size until it eats all your memory and swap space.

The code enclosed below demonstrates the problem. As-is, it causes the size of R.bin to grow from 11 MB to 50 MB, even though there are no large arrays. The old Rstreams package did not do this. Also, the function readBin() does not suffer from this problem.

## Stress read/write of files in binary mode to see if there is a memory leak.
## readBin() seems to be innocent; the memory leak is in readChar().
header <- paste("This is a very long and boring text header which appears",
                "at the beginning of each chunk of an even longer and duller",
                "file which contains mostly binary data")
nchars <- nchar(header)
nrecs <- 2000
ntries <- 100

## Write a test file
stream <- file("/tmp/gunge")
open(stream, open = "wb")
for (irec in 1:nrecs) {
    ## This writes nul-terminated strings unless you use the eos = NULL option
    writeChar(header, stream, eos = NULL)
}
close(stream)

## Read in the file ntries times. In real applications one would be reading
## ntries _different_ files and calculating some summary statistics.
for (itry in 1:ntries) {
    stream <- file("/tmp/gunge")
    open(stream, open = "rb")
    if (itry %% 10 == 0) print(itry)
    for (irec in 1:nrecs) {
        iheader <- readChar(stream, nchars)
    }
    close(stream)
}

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
It's not really a memory leak, more that R_alloc was at the time incorrectly documented and do_readchar needs to reset vmax. (Rstreams used .C, and that did reset vmax, whereas .Internal does not.)

It's actually being done rather inefficiently, as this seems an unusual case. For 1.6.0 I am planning on using alloca (where available) for this sort of short-term buffer. I'll put the two-line fix in for 1.5.0.

On Fri, 26 Apr 2002 hcp@met.ed.ac.uk wrote:
> [snip]

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  stats.ox.ac.uk/~ripley
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
hcp@met.ed.ac.uk writes:

> The function readChar() appears to have some type of problem with memory
> allocation.
[snip]

Still there in a recent pre-1.5.0. It looks like a bunch of Vcells are allocated but remain protected from garbage collection. I doubt that this is something we'd dare try to fix before the release on Monday, though.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
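A minimal sketch of how one might check the retained-Vcells observation from inside R, instead of watching the size of R.bin. It runs a full garbage collection, counts the Vcells still in use, repeats the readChar() loop, collects again and reports the difference. It assumes a current R, where gc() returns a matrix with a "Vcells" row and a "used" column. The helpers retained_vcells() and read_all() are hypothetical names. A temporary file replaces /tmp/gunge.

```r
## Hypothetical helper (not part of R): Vcells still in use after calling
## f() n times, relative to before, each measured after a full gc().
retained_vcells <- function(f, n) {
    before <- gc()["Vcells", "used"]
    for (i in seq_len(n)) f()
    after <- gc()["Vcells", "used"]
    after - before
}

## Build a test file of fixed-length text records, as in the report.
header <- paste("This is a very long and boring text header which appears",
                "at the beginning of each chunk of an even longer and duller",
                "file which contains mostly binary data")
nchars <- nchar(header)
path <- tempfile("gunge")
con <- file(path, open = "wb")
for (irec in 1:2000) writeChar(header, con, eos = NULL)  # no nul terminator
close(con)

## Hypothetical helper: re-read every record once, discarding the results.
read_all <- function() {
    con <- file(path, open = "rb")
    on.exit(close(con))
    for (irec in 1:2000) readChar(con, nchars)
    invisible(NULL)
}

growth <- retained_vcells(read_all, 10)
print(growth)
unlink(path)
```

Suppose scratch buffers were being kept protected, as described above. Then the reported growth should rise roughly in proportion to n. A figure that stays small as n increases suggests nothing is being retained.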
A few days ago, I wrote:

> > The function readChar() appears to have some type of problem with
> > memory allocation.

On Fri, 26 Apr 2002 ripley@stats.ox.ac.uk wrote:

> It's not really a memory leak, more that R_alloc was at the time
> incorrectly documented and do_readchar needs to reset vmax.
[snip]
> I'll put the two-line fix in for 1.5.0.

I originally said that this problem didn't occur in readBin(). That seems to be true if you are reading 8-byte data. However, I stressed this a bit harder, and found that if you use the size=4 keyword then the problem _does_ appear with readBin(). I enclose an example below.

Apologies if the original bug report was misleading, and again, many thanks.

Hugh

## Stress read/write of files in binary mode to see if there is a
## memory problem. readBin() seems to be innocent if the default
## (8-byte) reals are used, but there is a problem reading 4-byte reals.

nrecs <- 2000
ntries <- 100
npts <- 300
## as.numeric() so that reals are written (1:npts on its own is integer)
gunge <- as.numeric(1:npts)

## Write a test file
stream <- file("/tmp/gunge")
open(stream, open = "wb")
for (irec in 1:nrecs) {
    ## The second argument of writeBin() is the connection to write to
    writeBin(gunge, stream, size = 4)
}
close(stream)

## Read in the file ntries times. In real applications one would be reading
## ntries _different_ files and calculating some summary statistics.
for (itry in 1:ntries) {
    stream <- file("/tmp/gunge")
    open(stream, open = "rb")
    if (itry %% 10 == 0) print(itry)
    for (irec in 1:nrecs) {
        ## Seems to be worse if reads are in several small chunks
        ## (10 + 20 + 10 + 20 + (npts - 60) = npts values per record)
        bunge <- readBin(stream, "numeric", 10, size = 4)
        bunge <- readBin(stream, "numeric", 20, size = 4)
        bunge <- readBin(stream, "numeric", 10, size = 4)
        bunge <- readBin(stream, "numeric", 20, size = 4)
        bunge <- readBin(stream, "numeric", npts - 60, size = 4)
    }
    close(stream)
}

============S=u=p=p=o=r=t===D=e=b=i=a=n===debian.org============
Dr. Hugh C. Pumphrey        | Tel. 0131-650-6026, Fax: 0131-650-5780
Institute for Meteorology   | Replace 0131 with +44-131 if outside UK
The University of Edinburgh | Email hcp@met.ed.ac.uk
EDINBURGH EH9 3JZ, Scotland | URL: met.ed.ac.uk/~hcp
============S=u=p=p=o=r=t==g=9=5==g95.sourceforge.net/=============
Yes, we know of a couple of other leaks, which will be fixed for 1.6.0 (and perhaps 1.5.1).

BDR

On Mon, 29 Apr 2002, H C Pumphrey wrote:

> I originally said that this problem didn't occur in readBin(). That seems
> to be true if you are reading 8-byte data. However, I stressed this a bit
> harder, and found that if you use the size=4 keyword then the problem
> _does_ appear with readBin().
[snip]