Yurii Aulchenko
2009-Sep-04 20:54 UTC
[Rd] asking for suggestions: interface for a C++ class
Dear All,

I would like some advice on designing an R library, and thought that R-devel would be the best place to ask, given how many R experts are around.

We are at an early stage of designing an R library which is effectively an interface to a C++ library providing fast access to large matrices stored on HDD as binary files. The core of the C++ library is a relatively sophisticated class, which we try to "mirror" using an S4 class in R. Basically, when a new object of that class is initiated, the C++ constructor is called and the essential elements of the new object are reflected as slots of the R object.

Now, as you can imagine, the problem is that if the R object is removed using, say, the "rm" command, and not our specifically designed one, the C++ object still hangs around in RAM until the R session is terminated. This is not nice, and may also be a problem, as the C++ object may allocate a large part of RAM. We can of course replace the generic "rm" and "delete" functions, but this is definitely not a nice solution.

This sounds like a rather common problem, but unfortunately I was not able to find a solution. I will greatly appreciate any suggestions.

Many thanks in advance,
Yurii
* On 2009-09-04 at 22:54 +0200 Yurii Aulchenko wrote:
> We are at an early stage of designing an R library, which is effectively an
> interface to a C++ library providing fast access to large matrices stored
> on HDD as binary files. The core of the C++ library is relatively
> sophisticated class, which we try to "mirror" using an S4 class in R.
> Basically when a new object of that class is initiated, the C++ constructor
> is called and essential elements of the new object are reflected as slots
> of the R object.

Have a look at external pointers as described in the Writing R Extensions Manual.

> Now as you can imagine the problem is that if the R object is removed using
> say "rm" command, and not our specifically designed one, the C++ object
> still hangs around in RAM until R session is terminated. This is not nice,
> and also may be a problem, as the C++ object may allocate large part of
> RAM. We can of course replace generic "rm" and "delete" functions, but this
> is definitely not a nice solution.

You likely want a less literal translation of your C++ object into R's S4 system. One slot should be an external pointer, which will give you the ability to define a finalizer to clean up when the R-level object gets gc'd.

+ seth

--
Seth Falcon | @sfalcon | http://userprimary.net/user
Simon Urbanek
2009-Sep-04 21:21 UTC
[Rd] asking for suggestions: interface for a C++ class
Yurii,

On Sep 4, 2009, at 16:54 , Yurii Aulchenko wrote:

> Dear All,
>
> I would like to have an advice for designing an R library, and
> thought that R-devel may be the best place to ask given so many
> people who are highly expert in R are around.
>
> We are at an early stage of designing an R library, which is
> effectively an interface to a C++ library providing fast access to
> large matrices stored on HDD as binary files.

[FWIW there are already several packages that do what you describe - see e.g. ff, bigmemory, nws, ...]

> The core of the C++ library is relatively sophisticated class, which
> we try to "mirror" using an S4 class in R. Basically when a new
> object of that class is initiated, the C++ constructor is called and
> essential elements of the new object are reflected as slots of the R
> object.
>
> Now as you can imagine the problem is that if the R object is
> removed using say "rm" command, and not our specifically designed
> one, the C++ object still hangs around in RAM until R session is
> terminated.

You must have some link between the S4 object and your C++ object - ideally an external pointer - so all you have to do is attach a finalizer to it via R_RegisterCFinalizer or R_RegisterCFinalizerEx. In that finalizer you simply free the C++ object and all is well. Note that R uses a garbage collector, so the object won't go away immediately after it goes out of scope - only after R thinks it needs to reclaim memory. You can use gc() to force garbage collection to test it.

> This is not nice, and also may be a problem, as the C++ object may
> allocate large part of RAM. We can of course replace generic "rm" and
> "delete" functions, but this is definitely not a nice solution.
>

... and it doesn't tackle the issue - objects can go out of scope by other means than just rm(), e.g.:

    f <- function() { ...; myGreatObject }
    f()
    # the great object is gone now since it was not assigned anywhere

> Sounds like rather common problem people may face, but unfortunately
> I was not able to find a solution.
>

Cheers,
Simon
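[Illustrative sketch, not part of the original thread.] The external-pointer-plus-finalizer approach that Seth and Simon describe might look roughly like the following. The R API calls used (R_MakeExternalPtr, R_RegisterCFinalizerEx, R_ExternalPtrAddr, R_ClearExternalPtr, PROTECT/UNPROTECT) are the ones documented in Writing R Extensions. Everything else is an assumption made for illustration: the class DiskMatrix, the helpers diskmatrix_make_handle and diskmatrix_close, and the small stand-in definitions that are only compiled when the R headers are not found, so that the sketch can be built and exercised outside R. The stand-ins, including standin_collect, do not reflect how R works internally.

```cpp
#include <cassert>
#include <string>

// Use the real R API when its headers are available; otherwise fall back to
// tiny stand-ins (NOT R's implementation) so the sketch compiles on its own.
#if defined(__has_include)
#  if __has_include(<Rinternals.h>)
#    include <Rinternals.h>
#    define HAVE_R_API 1
#  endif
#endif

#ifndef HAVE_R_API
struct SEXPREC { void* addr; void (*finalizer)(SEXPREC*); };
typedef SEXPREC* SEXP;
typedef void (*R_CFinalizer_t)(SEXP);
enum Rboolean { FALSE = 0, TRUE };
static SEXP R_NilValue = nullptr;
#define PROTECT(x) (x)
#define UNPROTECT(n) ((void)0)
inline SEXP R_MakeExternalPtr(void* p, SEXP, SEXP) { return new SEXPREC{p, nullptr}; }
inline void* R_ExternalPtrAddr(SEXP s) { return s->addr; }
inline void R_ClearExternalPtr(SEXP s) { s->addr = nullptr; }
inline void R_RegisterCFinalizerEx(SEXP s, R_CFinalizer_t f, Rboolean) { s->finalizer = f; }
// Hypothetical stand-in for the garbage collector reclaiming a dead handle.
inline void standin_collect(SEXP s) { if (s->finalizer) s->finalizer(s); delete s; }
#endif

// Hypothetical C++ class owning the large in-memory buffers.
struct DiskMatrix {
    static int live;                       // instances currently alive
    std::string path;
    explicit DiskMatrix(const std::string& p) : path(p) { ++live; }
    ~DiskMatrix() { --live; }
};
int DiskMatrix::live = 0;

// Finalizer: frees the C++ object once and clears the pointer, so a second
// call (explicit close followed by a later collection) is a harmless no-op.
static void diskmatrix_finalizer(SEXP handle) {
    DiskMatrix* m = static_cast<DiskMatrix*>(R_ExternalPtrAddr(handle));
    if (m == nullptr) return;              // already released
    delete m;
    R_ClearExternalPtr(handle);
}

// Creates the C++ object and wraps it in an external pointer with a finalizer.
// A .Call entry point would return this handle for storage in an S4 slot.
SEXP diskmatrix_make_handle(const char* path) {
    assert(path != nullptr);
    DiskMatrix* m = new DiskMatrix(path);
    SEXP handle = PROTECT(R_MakeExternalPtr(m, R_NilValue, R_NilValue));
    R_RegisterCFinalizerEx(handle, diskmatrix_finalizer, TRUE);  // TRUE: also run at exit
    UNPROTECT(1);
    return handle;
}

// Optional eager release for users who do not want to wait for the collector.
void diskmatrix_close(SEXP handle) { diskmatrix_finalizer(handle); }
```

With this layout, the S4 object would hold the handle in a single slot. Whether the object is dropped by rm() or simply never assigned, the finalizer runs when the collector reclaims the handle, and calling gc() from R is one way to check that during testing.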
Yurii,

Based on your brief description, we have already done this with bigmemory, and the ff package does something very similar but with an emphasis on a wide range of atomic data types and data frames. Unless you are planning something different from what has been done before, you shouldn't waste valuable time on it.

We (bigmemory) support shared memory and filebacked objects, and use the BOOST interprocess library to support all platforms. It's been a 2-3 year project with many unanticipated problems, but it's pretty stable and a good number of people seem to be using it at this point. We'll be releasing a fairly major redesign this Fall, and would appreciate suggestions and feedback.

Jay

--
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay