There is no plan to change R's garbage collector, and I did not say
there was. What I wrote is:
    If R is built to use reference counting for determining sharing
    information this does not happen, so this is likely to change and not
    force a copy by 3.4.0.
So reference counting is to be used for determining sharing, _not_ for
memory management.
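
To make the distinction concrete, here is a toy model in Python. It is not
R's implementation, and the names Vector, Env, bind, alias and set_elt are
made up for illustration. The count kept on each vector is consulted only to
decide whether a modification must copy first; freeing unused vectors is
left entirely to the host language's collector.

```python
class Vector:
    """Toy vector payload that several variable names may share."""

    def __init__(self, values):
        self.values = list(values)
        self.bindings = 0  # number of names currently bound to this payload


class Env:
    """Toy environment mapping names to Vector payloads."""

    def __init__(self):
        self.vars = {}
        self.copies = 0  # how many copies modifications have forced

    def bind(self, name, vec):
        # Rebinding a name releases its claim on the old payload.
        old = self.vars.get(name)
        if old is not None:
            old.bindings -= 1
        vec.bindings += 1
        self.vars[name] = vec

    def alias(self, new, existing):
        # Like `y <- x`: no copy, both names share one payload.
        self.bind(new, self.vars[existing])

    def set_elt(self, name, i, value):
        vec = self.vars[name]
        if vec.bindings > 1:
            # Payload is shared, so copy before modifying.
            vec = Vector(vec.values)
            self.copies += 1
            self.bind(name, vec)
        # Payload is now known to be unshared: modify in place.
        vec.values[i] = value


env = Env()
env.bind("x", Vector([1, 2, 3]))
env.set_elt("x", 0, 10)   # x is unshared: modified in place
env.alias("y", "x")       # x and y now share one payload
env.set_elt("y", 1, 20)   # shared: y gets its own copy first
env.set_elt("x", 2, 30)   # x is unshared again: modified in place
```

In this sketch only the second modification copies, because that is the only
point at which the payload is reachable from more than one name. Nothing in
the model ever frees a Vector; the count plays no part in reclaiming memory.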
There is some work in progress to allow alternate representations for R
vectors that would for the most part behave like standard
vectors. There are however a lot of thorny issues: while it is nice if
passing such things to sum() or mean() behaves in the 'usual' way, it
is probably not so nice if passing to log() or to serialize() behaves
in the 'usual' way. We'll have to see over the next few months whether
these issues can be addressed in a reusable way.
Best,
luke
On Sat, 6 Aug 2016, frederik at ofb.net wrote:
> Dear R Devel,
>
> In a thread this morning Luke Tierney mentioned that R's way of
> garbage collecting is going to change soon in 3.4.0. I couldn't find
> this info on Google but I wanted to share what I had been discussing
> in another forum, in case now is not too late to raise considerations
> which could affect the design of planned changes to R's garbage
> collection facilities.
>
> I ran into a problem when trying to get R to quickly load some vectors
> from disk. R should be able to do this efficiently using memory
> mapping. There is a package 'ff' which implements efficient loading of
> disk-based vectors using memory mapping. It works pretty well, but the
> problem is that it creates a separate data type - the vectors are not
> "native" R vectors. There are some wrapper functions in a package
> 'ffbase' which allow people to use common functions like 'sum' on
> these 'ff' vectors. However, a new wrapper has to be written for every
> such function, and I guess the 'ffbase' authors do not have time to
> write wrappers that are as efficient as the native R functions - in my
> testing, there was a 10x slow-down for 'sum'.
>
> The situation is a bit wistful because an 'ff' vector and a native R
> vector are basically the same data type, they both store elements
> contiguously in memory. Apparently, what prevents 'ffbase' and 'ff'
> from creating native R vectors is the fact that it is impossible to
> assign a "finalizer" to a native R vector. We need a finalizer so that
> R can tell us when a vector is being freed, so we can unmap the
> associated memory/file. Ffbase maintainer Edwin de Jonge was even
> skeptical that CRAN would accept a package implementing the hack I had
> proposed to simulate native R vectors from mmap'ed 'ff' vectors. The
> issue is discussed here:
>
> https://github.com/edwindj/ffbase/issues/52
>
> Of course, weak references and external pointers allow finalizers to
> be assigned to objects, but as I understand it, such objects are
> separate types from vectors - there is no way in R to synthesize a
> native vector endowed with a finalizer - something which could be
> passed directly to built-in functions like 'sum'.
>
> I think a finalizer facility for vectors would be useful because it
> would allow us to take advantage of the memory mapping architecture
> present in all modern processors, to do fast copy-free operations on
> large disk-based data structures, without having to re-implement
> internal functions like 'sum' which are essentially the same algorithm
> no matter where the data is stored.
>
> Thank you,
>
> Frederick
>
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu