Hi everybody,

I have been interfacing some C++ library code into an R package, but I have run into optimization issues specific to memory management that require some insight into the GC.

One of the C++ libraries returns simple vectors of integers, doubles and complex values, which are allocated and managed by the library itself. I cannot know the length of the array beforehand, so I cannot pre-allocate that memory through the GC.

Right now I'm allocating via allocVector and copying all the data into it. However, this requires twice the amount of space (and time), and we're running out of memory when doing concurrent analysis.

What I would like to do is:

- "patch" the SEXP returned to R so that DATAPTR() points directly to the required address.
- create a normal LISTSXP in the package which holds a reference to all these objects, so that they are never garbage-collected.
- make these objects read-only, or at least ensure that they are never free()d or realloc()ed. Overwriting the contents is not a critical issue.

Would that approach work?
Are there any alternative approaches?
Any specific advice about making these objects read-only?

Thanks in advance.
On 05/07/2009 10:54 AM, Yuri D'Elia wrote:
> What I'd would like to do is:
>
> - "patch" the SEXP returned to R so that DATAPTR() points directly to
> the required address.

The normal way to do what you want is to use an "external pointer". R assumes that memory management for those is handled completely externally. External pointers can have finalizers, so when you no longer need the object, you can ask the external library to release it.

> - create a normal LISTSXP in the package, which holds a reference
> to all these objects, so that GC never takes place.

The list would hold the external pointers, which act like references.

> - turn these objects read-only, or, at least, ensure that they are
> never free()d or remalloc()ed. overwriting the contents is not a
> critical issue.

With external pointers, that won't happen. I wouldn't try to trick the memory manager into thinking that it allocated these things; that will likely just lead to problems.

Duncan Murdoch
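A minimal sketch of Duncan's external-pointer suggestion follows. It rests on several assumptions:

- `lib_divisors()` and `lib_free()` are hypothetical stand-ins for the real library. The first returns a library-owned buffer whose length the caller cannot know in advance; the second releases it.
- `vec_create` and `vec_elt` are illustrative `.Call` entry points.
- The R glue is compiled only when the hypothetical `USING_R` macro is defined, so the library part builds and tests on its own.

The glue uses the R API calls `R_MakeExternalPtr`, `R_RegisterCFinalizerEx`, `R_ExternalPtrAddr` and `R_ClearExternalPtr`. R never frees or reallocates the buffer; the finalizer hands it back to the library when the pointer is collected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// --- Hypothetical stand-in for the external C++ library ---------------
// The library owns this memory; R must never free or realloc it.
struct LibVector {
    double* data;
    std::size_t n;
};

// Returns the divisors of k; the caller cannot know the length in advance.
LibVector* lib_divisors(int k) {
    std::vector<double> tmp;
    for (int d = 1; d <= k; ++d)
        if (k % d == 0) tmp.push_back(d);
    LibVector* v = new LibVector;
    v->n = tmp.size();
    v->data = new double[v->n];
    std::copy(tmp.begin(), tmp.end(), v->data);
    return v;
}

// Releases a vector obtained from lib_divisors().
void lib_free(LibVector* v) {
    assert(v != nullptr);
    delete[] v->data;
    delete v;
}

// --- R glue, compiled only inside the package -------------------------
#ifdef USING_R  // hypothetical macro, e.g. set via PKG_CPPFLAGS
#define R_NO_REMAP
#include <Rinternals.h>

// Runs when R garbage-collects the external pointer (or at exit, since
// onexit = TRUE below) and gives the buffer back to the library.
static void vec_finalizer(SEXP ptr) {
    LibVector* v = static_cast<LibVector*>(R_ExternalPtrAddr(ptr));
    if (v == nullptr) return;  // already released
    lib_free(v);
    R_ClearExternalPtr(ptr);
}

// Wraps the library's buffer without copying it.
extern "C" SEXP vec_create(SEXP k) {
    LibVector* v = lib_divisors(Rf_asInteger(k));
    SEXP ptr = PROTECT(R_MakeExternalPtr(v, R_NilValue, R_NilValue));
    R_RegisterCFinalizerEx(ptr, vec_finalizer, TRUE);
    UNPROTECT(1);
    return ptr;
}

// Reads element i (1-based) straight from library-owned memory.
extern "C" SEXP vec_elt(SEXP ptr, SEXP i) {
    if (TYPEOF(ptr) != EXTPTRSXP) Rf_error("expected an external pointer");
    LibVector* v = static_cast<LibVector*>(R_ExternalPtrAddr(ptr));
    if (v == nullptr) Rf_error("vector has already been released");
    int idx = Rf_asInteger(i);
    if (idx == NA_INTEGER || idx < 1 || static_cast<std::size_t>(idx) > v->n)
        Rf_error("index out of range");
    return Rf_ScalarReal(v->data[idx - 1]);
}
#endif
```

Because R code can only reach the data through accessors like `vec_elt`, this also effectively keeps the buffers read-only from the R side. A list of such external pointers then plays the role of the LISTSXP in the original plan.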
If you are in control of the C++ library (i.e. it is not from a vendor), then you can also override the new operator of your object so that it allocates an SEXP. If you implement the PROTECT/UNPROTECT calls correctly, then GC will not be a problem.

The approach that I've taken with my time series library is to specify a storage policy as a template parameter. If you are using regular C++, then vectors of double/int are just allocated normally in C++. However, if you specify the R storage backend, then the constructor allocates an SEXP of doubles and sets the object's pointer to the first element in the vector. The object doesn't really know that it's using R's backend storage.

Sources here:
R backend storage policy: http://github.com/armstrtw/r.tslib.backend/tree/master
tslib: http://github.com/armstrtw/tslib/tree/master

-Whit

On Sun, Jul 5, 2009 at 10:54 AM, Yuri D'Elia <wavexx at users.sf.net> wrote:
> One of the C++ libraries returns simple vectors of integers, doubles and
> complex which are allocated and managed from the library itself.
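Below is a simplified sketch of the storage-policy idea. It is not code taken from r.tslib.backend or tslib, and its assumptions are these:

- `Series`, `HeapStorage` and `RDoubleStorage` are hypothetical names.
- The R backend is compiled only when the hypothetical `USING_R` macro is defined.
- The R backend keeps its vector alive with `R_PreserveObject`/`R_ReleaseObject` rather than the PROTECT stack, because the object's lifetime is not tied to a single C call.

```cpp
#include <cassert>
#include <cstddef>

// Default policy: ordinary C++ heap storage (zero-initialized).
template <typename T>
struct HeapStorage {
    T* data;
    std::size_t n;
    explicit HeapStorage(std::size_t len) : data(new T[len]()), n(len) {}
    ~HeapStorage() { delete[] data; }
    HeapStorage(const HeapStorage&) = delete;
    HeapStorage& operator=(const HeapStorage&) = delete;
};

#ifdef USING_R  // hypothetical macro, defined only when building the R package
#define R_NO_REMAP
#include <Rinternals.h>

// R backend (doubles only, for brevity): the buffer *is* an R numeric
// vector, so it can be handed back to R without a copy.
struct RDoubleStorage {
    SEXP sexp;
    double* data;
    std::size_t n;
    explicit RDoubleStorage(std::size_t len)
        : sexp(Rf_allocVector(REALSXP, static_cast<R_xlen_t>(len))),
          data(nullptr), n(len) {
        R_PreserveObject(sexp);  // shield from GC for this object's lifetime
        data = REAL(sexp);
    }
    ~RDoubleStorage() { R_ReleaseObject(sexp); }
    RDoubleStorage(const RDoubleStorage&) = delete;
    RDoubleStorage& operator=(const RDoubleStorage&) = delete;
};
#endif

// The container only sees `data` and `n`; it does not know which
// backend owns the memory.
template <typename T, typename Storage = HeapStorage<T>>
class Series {
public:
    explicit Series(std::size_t len) : store_(len) {}
    std::size_t size() const { return store_.n; }
    T& operator[](std::size_t i) { assert(i < store_.n); return store_.data[i]; }
    const T& operator[](std::size_t i) const { assert(i < store_.n); return store_.data[i]; }
    // Algorithms are written once and work with any backend.
    T sum() const {
        T total{};
        for (std::size_t i = 0; i < store_.n; ++i) total += store_.data[i];
        return total;
    }
    Storage& storage() { return store_; }
private:
    Storage store_;
};
```

Inside the package one would use `Series<double, RDoubleStorage>` instead. Returning `storage().sexp` from a `.Call` function hands R the vector itself. If any R allocation happens between the object's destruction and the return, PROTECT that SEXP first.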
On Jul 5, 2009, at 10:54 AM, Yuri D'Elia wrote:
> What I'd would like to do is:
>
> - "patch" the SEXP returned to R so that DATAPTR() points directly to
> the required address.

Why don't you just "patch" the library to use allocVector? That's the most reliable approach, and trivial to do. Messing around with the internal SEXP representation is asking for trouble, as it may change at any point without notice (note that all access goes through functions precisely to avoid that).

Cheers,
Simon
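When only the library knows the final length, one way to follow Simon's suggestion is to give the library an allocation hook. The sketch below makes these assumptions:

- `AllocFn`, `lib_divisors_into`, `malloc_alloc`, `r_alloc` and `divisors_r` are hypothetical names.
- The R part is compiled only when the hypothetical `USING_R` macro is defined.

Outside R the hook uses `malloc`. Inside R it uses `allocVector`, so the library writes straight into the vector that is returned and no second copy is made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocation hook: the patched library calls it once it knows n.
// `ctx` carries caller state, e.g. where to store the allocated object.
typedef double* (*AllocFn)(std::size_t n, void* ctx);

// Hypothetical patched library routine: writes the divisors of k into
// memory obtained from `alloc` and returns how many were written.
std::size_t lib_divisors_into(int k, AllocFn alloc, void* ctx) {
    std::size_t n = 0;
    for (int d = 1; d <= k; ++d)
        if (k % d == 0) ++n;  // size pass (a real library knows n its own way)
    double* out = alloc(n, ctx);
    assert(out != nullptr || n == 0);
    std::size_t j = 0;
    for (int d = 1; d <= k; ++d)
        if (k % d == 0) out[j++] = d;
    return n;
}

// Plain C++ allocator, used when the library runs outside R.
double* malloc_alloc(std::size_t n, void* ctx) {
    double* p = static_cast<double*>(std::malloc(n * sizeof(double)));
    *static_cast<double**>(ctx) = p;  // caller releases it with std::free
    return p;
}

#ifdef USING_R  // hypothetical macro, defined only when building the R package
#define R_NO_REMAP
#include <Rinternals.h>

// R allocator: the library writes straight into an R numeric vector.
static double* r_alloc(std::size_t n, void* ctx) {
    SEXP v = PROTECT(Rf_allocVector(REALSXP, static_cast<R_xlen_t>(n)));
    *static_cast<SEXP*>(ctx) = v;  // caller UNPROTECTs after the library returns
    return REAL(v);
}

// .Call entry point: no intermediate buffer, no copy.
extern "C" SEXP divisors_r(SEXP k) {
    SEXP result = R_NilValue;
    lib_divisors_into(Rf_asInteger(k), r_alloc, &result);
    if (result != R_NilValue) UNPROTECT(1);  // balances PROTECT in r_alloc
    return result;
}
#endif
```

Keeping the PROTECT/UNPROTECT pair split across the hook and its caller keeps the protection stack balanced even if a library could return without calling the hook. An R error inside `allocVector` long-jumps past the C++ frames, so no RAII objects should be live on that path.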