Iago Mosqueira
2008-Feb-05 16:01 UTC
[Rd] Need for garbage collection after creating object
Hello,

After experiencing some difficulties with large arrays, I was surprised to see the apparent need for calls to gc() after creating fairly large arrays. For example, calling

a <- array(2, dim=c(10,10,10,10,10,100))

makes the memory usage of a fresh session of R jump from 13.8 Mb to 166.4 Mb. A call to gc() brought it down to 90.8 Mb,

> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   132619  3.6     350000   9.4   350000   9.4
Vcells 10086440 77.0   21335887 162.8 20086792 153.3

as expected by

> object.size(a)
[1] 80000136

Do I need to call gc() after creating every large array, or can I set up the system to do this more often or more efficiently?

Thanks very much,

Iago

$platform
[1] "i686-pc-linux-gnu"

$version.string
[1] "R version 2.6.1 (2007-11-26)"
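The figure reported by object.size(a) can be checked by hand: the array holds 10*10*10*10*10*100 = 1e7 doubles at 8 bytes each. The following is a minimal sketch of that arithmetic; the variable names (dims, n_elements, payload_bytes, overhead_bytes) are illustrative only, and the exact overhead on top of the raw data depends on the R version and platform.

```r
# Dimensions taken from the post above.
dims <- c(10, 10, 10, 10, 10, 100)

# Number of elements and the raw size of the data (doubles are 8 bytes each).
n_elements    <- prod(dims)        # 1e7 elements
payload_bytes <- n_elements * 8    # 8e7 bytes
payload_mb    <- payload_bytes / 2^20   # roughly 76.3 Mb

# Create the array and compare with what R reports.
a <- array(2, dim = dims)
reported_bytes <- as.numeric(object.size(a))

# The difference is the vector header plus the dim attribute;
# its exact value varies between R versions and platforms.
overhead_bytes <- reported_bytes - payload_bytes
```

So about 76 Mb of the "used" Vcells figure is the array itself; anything well above that after creation is temporary storage that has not yet been collected.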
Henrik Bengtsson
2008-Feb-05 18:12 UTC
[Rd] Need for garbage collection after creating object
On Feb 5, 2008 8:01 AM, Iago Mosqueira <iago.mosqueira at gmail.com> wrote:
> Hello,
>
> After experiencing some difficulties with large arrays, I was surprised
> to see the apparent need for calls to gc() after creating fairly large
> arrays. For example, calling
>
> a <- array(2, dim=c(10,10,10,10,10,100))
>
> makes the memory usage of a fresh session of R jump from 13.8 Mb to
> 166.4 Mb. A call to gc() brought it down to 90.8 Mb,
>
> > gc()
>            used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells   132619  3.6     350000   9.4   350000   9.4
> Vcells 10086440 77.0   21335887 162.8 20086792 153.3
>
> as expected by
>
> > object.size(a)
> [1] 80000136

I think the reason for this is that array() has to "expand" the input data to the right length internally:

data <- rep(data, length.out = vl)

The result is a so-called "NAMED" object internally, and when the following call to dim(data) <- dim occurs, the safest thing R can do is to create a copy. [Anyone, correct me if I'm wrong.] If you expand the input data yourself, you won't see that extra copy, e.g.

data <- 2
dim <- c(10,10,10,10,10,100)
data <- rep(data, length.out=prod(dim))
a <- array(data, dim=dim)

> Do I need to call gc() after creating every large array, or can I set up
> the system to do this more often or more efficiently?

The R garbage collector will free/deallocate that memory when "needed". However, calling gc() explicitly should minimize the risk of over-fragmented memory. Basically, if there are several blocks of garbage memory hanging around, you might end up in a situation where you have a lot of *total* memory available, but can only allocate small chunks of memory at any one time. Even calling gc() in that situation will not help; there is no mechanism that defragments memory in R. So calling gc() after large allocations will add some protection against that.
/Henrik
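Whether the extra copy described in the reply actually occurs can be measured rather than assumed. The following is a minimal sketch: peak_vcells_mb is a hypothetical helper (not part of R) that resets the "max used" statistics with gc(reset = TRUE), evaluates an expression, and reports how far the vector heap rose above the baseline. The thread concerns R 2.6.1; array() has been reimplemented since then, so the two figures may well be the same on a current version.

```r
# Hypothetical helper: peak vector-heap growth (in Mb) while evaluating expr,
# as recorded by gc() in its "max used" statistics.
peak_vcells_mb <- function(expr) {
  g0 <- gc(reset = TRUE)           # collect, then reset the "max used" counters
  baseline <- g0["Vcells", 2]      # column 2: Vcells currently used, in Mb
  force(expr)                      # evaluate the lazily passed expression
  g1 <- gc()
  peak <- g1["Vcells", ncol(g1)]   # last column: max used since the reset, in Mb
  unname(peak - baseline)
}

dims <- c(10, 10, 10, 10, 10, 100)

# Let array() expand the scalar itself.
peak_direct <- peak_vcells_mb(array(2, dim = dims))

# Expand the data first, as suggested in the reply above.
peak_expanded <- peak_vcells_mb(
  array(rep(2, length.out = prod(dims)), dim = dims)
)
```

Both values should be at least the roughly 76 Mb needed for the array itself; a value near twice that would indicate a temporary copy. The helper uses the last column of the gc() matrix rather than a fixed index because an extra "limit" column appears when memory limits have been set.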