I have a data set of roughly 700MB which during processing grows up to 2GB
(I'm using a 4GB Linux box). After the work is done I clean up (rm()) and the
state is returned to 700MB. Yet I find I cannot run the same routine again, as
it claims to not be able to allocate memory even though gcinfo() claims there
is 1.1GB left.

At the start of the second time
==============================
           used  (Mb) gc trigger   (Mb)
Ncells  2261001  60.4    3493455   93.3
Vcells 98828592 754.1  279952797 2135.9

Before Failing
=============
Garbage collection 459 = 312+51+96 (level 0) ...
1222596 cons cells free (34%)
1101.7 Mbytes of heap free (51%)
Error: cannot allocate vector of size 559481 Kb

This looks like a fragmentation problem. Does anyone have a handle on this
situation (i.e. any workaround)? Is anyone working on improving R's
fragmentation problems?

On the other hand, is it possible there is a memory leak? In order to make my
functions work on this dataset I tried to eliminate copies by coding with
references (basic new.env() tricks). I presume that my cleanup returned the
temporary data (as evidenced by the gc output at the start of the second round
of processing). Is it possible that it was not really cleaned up and is
sitting around somewhere, even though gc() thinks it has been returned?

Thanks - any clues to follow up will be very helpful.
Nawaaz
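A minimal sketch of the kind of new.env() reference idiom mentioned above
(hypothetical names; not the poster's actual code): the big object lives
inside an environment, which is passed by reference, so a helper can rebind
the data in place rather than returning a modified copy to the caller.

## Sketch only: all object and function names here are made up for illustration.
make_ref <- function(x) {
  e <- new.env(parent = emptyenv())
  e$data <- x
  e
}

scale_in_place <- function(ref, by) {
  ref$data <- ref$data * by   # rebinds inside the environment; nothing large
  invisible(NULL)             # is returned to the caller
}

big <- make_ref(runif(1e6))   # stand-in for the real ~700MB data set
gcinfo(TRUE)                  # print a report at each garbage collection
scale_in_place(big, 2)
rm(big)
gc()                          # collect after dropping the reference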
BTW, I think this is really an R-devel question, and if you want to pursue
this please use that list. (See the posting guide as to why I think so.)

This looks like fragmentation of the address space: many of us are using
64-bit OSes with 2-4Gb of RAM precisely to avoid such fragmentation.

Notice (memory.c line 1829 in the current sources) that large vectors are
malloc-ed separately, so this is a malloc failure, and there is not a lot R
can do about how malloc fragments the (presumably, in your case, as you did
not say) 32-bit process address space.

The message

    1101.7 Mbytes of heap free (51%)

is a legacy of an earlier gc() and is not really `free': I believe it means
something like `may be allocated before garbage collection is triggered'; see
memory.c.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
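A quick back-of-the-envelope check on the failing allocation (arithmetic added
here, not in the original message): the error asks for one contiguous block of
559481 Kb, which in a fragmented 32-bit address space can be impossible to
find even when the total free space is much larger.

## Rough arithmetic for the failing request (illustrative only)
559481 * 1024        # bytes requested: about 573 million, roughly 0.53 GB
559481 * 1024 / 8    # about 71.6 million doubles in a single contiguous block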
Thanks Brian. I looked at the code (memory.c) after I sent out the first
email and noticed the malloc() call that you mention in your reply.

Looking into this code suggested a possible scenario where R would fail in
malloc() even if it had enough free heap address space. I noticed that if
there is enough heap address space (memory.c:1796, VHEAP_FREE() > alloc_size)
then the garbage collector is not run. So malloc() could fail (since there is
no more address space to use), even though R itself has enough free space it
could reclaim.

A simple fix is for R to try doing garbage collection if malloc() fails. I
hacked memory.c to look in R_GenHeap[LARGE_NODE_CLASS].New if malloc() fails
(in a very similar fashion to ReleaseLargeFreeVectors()). I did a "best-fit"
steal from this list and returned the block to allocVector(). This seemed to
fix my particular problem - the large vectors that I had allocated in the
previous round were still sitting in this list.

Of course, the right thing to do is to check if there are any free vectors of
the right size before calling malloc() - but it was simpler to do it my way
(because I did not have to worry about how efficient my best-fit was; memory
allocation was going to fail anyway).

I can look deeper into this and provide more details if needed.

Nawaaz
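A possible user-level workaround in the same spirit (a hedged suggestion, not
the memory.c patch described above): force a full gc() between runs, so that
if the collector releases the dead large vectors back to the allocator (as the
ReleaseLargeFreeVectors() path suggests), the next big malloc() has a better
chance of succeeding instead of waiting for the VHEAP_FREE() trigger.

## Sketch only: run_once() and the inputs are hypothetical stand-ins.
run_once <- function(input) {
  lapply(input, function(x) x * 2)
}

input <- replicate(5, runif(1e6), simplify = FALSE)
out1  <- run_once(input)
rm(out1)
gc(verbose = TRUE)      # explicit full collection between runs, so unreferenced
                        # large vectors are returned before the next big request
out2 <- run_once(input)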