Laurent Gautier wrote:
> 
> Hi,
> 
> I have quite some trouble with the package methods.
> "Environments" in R are a convenient way to emulate
> pointers (and avoid copies of large objects, or of
> large collections of objects). So far, so good,
> but the package methods is becoming more (and more)
> problematic to work with. Up to version R-1.7.0,
> slots that were environments were still references
> to an environment, but I discovered in a recent
> R-patched that this is not the case any longer:
> environments as slots are now copied (increasing
> the memory consumption by more than three fold in my case).
> The (excessive) duplication (as a simple example
> shown below demonstrates it) is now enforced
> (as environments are copied too) !!!
> 
> > m <- matrix(0, 600^2, 50)
> ## RSS of the R process is about 150MB
> > rm(m); gc()
>          used (Mb) gc trigger  (Mb)
> Ncells 364813  9.8     667722  17.9
> Vcells  85605  0.7   14858185 113.4
> ## RSS is now about 15 MB
> > library(methods)
> > setClass("A", representation(a="matrix"))
> [1] "A"
> > a <- new("A", a=matrix(0, 600^2, 50))
> ## The RSS will peak to 705 MB !!!!!!
> 
> Are there any plans to make "methods" usable with
> large datasets ?

The memory growth seems real, but its connection to "environments as
slots" is unclear.

The only recent change that sounds relevant is the modification to
ensure that methods are evaluated in an environment that reflects the
lexical scope of the method's definition.  That does create a new
environment for each call to a generic function, but has nothing to do
with slots being environments.

It's possible there is some sort of "memory leak" or extra copying
there, but I'm not familiar enough with the details of that code to say
for sure.

Notice that the following workaround has no bad effects on memory
(suggesting that the extra environment in evaluating generics may in
fact be relevant):

R> setClass("A", representation(a="matrix"))
[1] "A"
R> aa <- matrix(600^2, 50)
R> a1 <- new("A")
R> a1@a <- aa
R> gc()
         used (Mb) gc trigger (Mb)
Ncells 370247  9.9     531268 14.2
Vcells  87522  0.7     786432  6.0
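
For anyone who wants to repeat the comparison, here is a minimal sketch of
the two construction styles side by side. It assumes a current "methods"
package and uses a much smaller matrix than the 600^2 x 50 case so that it
runs quickly; the names a1 and a2 are only illustrative. Note that
`matrix(600^2, 50)` as typed above yields a 50 x 1 matrix filled with
360000, so the sketch passes the fill value 0 explicitly. Whether the
memory difference shows up will depend on the R version in use.

```r
library(methods)

setClass("A", representation(a = "matrix"))

# Illustrative size only: far smaller than the 600^2 x 50 matrix in the report.
aa <- matrix(0, 600, 50)

# Workaround: create the object from the prototype, then assign the slot.
a1 <- new("A")
a1@a <- aa

# One-step form from the original report: the matrix goes through new().
a2 <- new("A", a = aa)

# Both routes should give the same slot contents.
identical(a1@a, a2@a)
dim(a1@a)

# gc() before and after each construction can be compared by hand.
invisible(gc())
```

Calling gc() immediately before and after each of the two constructions
(with the full-size matrix) is the simplest way to see whether the one-step
form still costs extra memory.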

The general solution for dealing with large objects is likely to involve
some extensions to R to allow "reference" objects, for which the
programmer is responsible for any copying.

Environments themselves are not quite adequate for this purpose, since
different "references" to the same environment cannot have different
attributes.
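
To illustrate the point about attributes, here is a small sketch (the
names store, ref1, ref2 and the "label" attribute are made up for the
example). It shows that assigning an environment does not copy it, and
that an attribute set through one "reference" is seen through the other.

```r
# An environment used as a mutable "reference": assignment does not copy it.
store <- new.env()
assign("m", matrix(0, 3, 2), envir = store)

ref1 <- store
ref2 <- store

# A change made through one reference is visible through the other.
assign("m", matrix(1, 3, 2), envir = ref1)
get("m", envir = ref2)[1, 1]

# Attributes are stored on the environment itself, not on the reference,
# so ref1 and ref2 cannot carry different values for the same attribute.
attr(ref1, "label") <- "first"
attr(ref2, "label") <- "second"
attr(ref1, "label")
identical(ref1, ref2)
```

The last attribute assignment wins for both names, which is why an
environment alone cannot stand in for a reference object that needs its
own attributes.
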
John
> 
> L.
> 
> ______________________________________________
> R-devel@stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
-- 
John M. Chambers                  jmc@bell-labs.com
Bell Labs, Lucent Technologies    office: (908)582-2681
700 Mountain Avenue, Room 2C-282  fax:    (908)582-3340
Murray Hill, NJ  07974            web: http://www.cs.bell-labs.com/~jmc