andre zege
2012-May-05 01:50 UTC
[Rd] looking for advice on bigmemory framework with C++ and java interoperability
I work with problems that have rather large data requirements -- typically a bunch of multi-gigabyte arrays. Given how generous R is with using memory, the only way for me to work with R has been to use bigmatrices from the bigmemory package. One thing that is somewhat missing is interoperability of bigmatrices with C++ and possibly Java. What I mean by that is an API that would allow reading and writing filebacked matrices from C++, and ideally Java, without being called from R. The ability to save Armadillo matrices into filebacked matrices and load them back into Armadillo would be another very useful thing. This would allow really smooth cooperation between various pieces of software. I would prefer to avoid using RInside for that.

I guess I could hack the bigmemory C++ code a bit, compile it into a C++ shared library, and that would do. I guess I could hack it a bit to work with Armadillo matrices as well. However, I don't want to reinvent the wheel, and if something like that already exists somewhere I would rather use it for the moment. I would very much welcome suggestions. If there is truly nothing like that, and someone with C++ or especially Java development experience is interested and wants to cooperate on this, let me know too.

Best
Andre

NB. I guess something like what I want -- access to the same disk caches from R, C++, and Java (and Python) -- exists in the HDF world. I don't know, however, how the performance of HDF compares with bigmemory matrices, which I have come to like and appreciate a lot. If someone could address simplicity of use and performance of HDF vs bigmemory, it would be very interesting.
Simon Urbanek
2012-May-05 02:10 UTC
[Rd] looking for advice on bigmemory framework with C++ and java interoperability
Andre,

bigmemory matrices are simply arrays of native types (typically doubles, but bm supports other types, too), so they are trivially readable/writable from both C++ (just read into memory and cast to the array type) and Java (e.g., a DoubleBuffer view on a ByteBuffer). So the question is what exactly is the problem?

Cheers,
Simon
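To make the C++ half of that suggestion concrete, here is a minimal sketch of reading and writing a headerless file of doubles. It assumes the backing file holds native-endian 8-byte doubles and nothing else; the function names `write_raw_doubles` and `read_raw_doubles` are made up for illustration and are not part of bigmemory.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: dump a vector of doubles as raw native-endian bytes,
// with no header, which is the layout assumed for the backing file.
void write_raw_doubles(const std::string& path, const std::vector<double>& values) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open for writing: " + path);
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(double)));
    if (!out) throw std::runtime_error("write failed: " + path);
}

// Hypothetical helper: read the whole file into memory and treat the bytes
// as an array of doubles.
std::vector<double> read_raw_doubles(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open for reading: " + path);

    // The stream was opened positioned at the end, so tellg() is the file size.
    const std::streamoff bytes = in.tellg();
    if (bytes < 0 || static_cast<std::size_t>(bytes) % sizeof(double) != 0) {
        throw std::runtime_error("file size is not a multiple of 8 bytes: " + path);
    }

    std::vector<double> values(static_cast<std::size_t>(bytes) / sizeof(double));
    in.seekg(0, std::ios::beg);
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(bytes));
    if (!in) throw std::runtime_error("read failed: " + path);
    assert(in.gcount() == static_cast<std::streamsize>(bytes));
    return values;
}
```

This copies the whole file into RAM, so it only suits matrices that fit in memory. The matrix dimensions are not stored in the file and have to come from somewhere else.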
Jay Emerson
2012-May-05 12:44 UTC
[Rd] looking for advice on bigmemory framework with C++ and java interoperability
On 4 May 2012 at 22:31, andre zege wrote:
| Simon, thanks for your comment. I guess there is no problem, i am
| apparently being lazy/busy and wondered if there is ready code that does
| it. You are right, i suppose -- i'll look at the c++ code for bigmatrix and
| will try to hack a solution.

> You may want to look at the documentation for 'external pointers' in the
> "Writing R Extensions" manual, and then consider Rcpp::XPtr, which
> provides an Rcpp-based route to using external pointers.

It's nice having others answering our questions before we can -- many thanks Simon/Dirk!

A big.matrix of dimension RxC is a column-major binary file of R*C elements of size 1, 2, 4, or 8 bytes, depending on the type of atomic element. Period, end of story, no header to worry about. So you can use it as you like from any language. Whether you can mmap it conveniently (if needed in shared memory or larger-than-RAM applications) is another story. We make use of the BOOST interprocess library for this.

For working in R, the existing R API should be sufficient (though it could always be expanded). For working in C++, the C++ API is pretty low-level and of course could benefit from ultimately being Rcpp-ified, for example. There are plenty of examples of working in C++ inside bigmemory/biganalytics/bigtabulate.

For Java... well, I don't code in Java. You can certainly make use of the data structure easily enough, but whether you can make use of the existing C++ API is something I simply can't answer.

I note that one really cool trick is when you have data from another source (e.g. many satellite images) which is already a simple binary file. You can do a trivial hack to create a big.matrix descriptor file, and attach.big.matrix() to it immediately. No traditional read.*() is necessary, and it is super fast.

Jay

--
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay
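For readers who want to try the layout described above from plain C++, the following is a rough sketch of mapping such a file and indexing it in column-major order. It uses POSIX `mmap` directly rather than the Boost interprocess library that bigmemory itself relies on, so it is a stand-in, not the package's method. It assumes a double-typed matrix whose dimensions the caller already knows; the class name `MappedDoubleMatrix` is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical read-only view of a headerless, column-major file of doubles.
class MappedDoubleMatrix {
public:
    MappedDoubleMatrix(const std::string& path, std::size_t nrow, std::size_t ncol)
        : nrow_(nrow), ncol_(ncol), bytes_(nrow * ncol * sizeof(double)) {
        if (bytes_ == 0) throw std::runtime_error("matrix must be non-empty");

        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("cannot open: " + path);

        // With no header, the file size must be exactly nrow * ncol * 8 bytes.
        struct stat st;
        if (::fstat(fd, &st) != 0 || static_cast<std::size_t>(st.st_size) != bytes_) {
            ::close(fd);
            throw std::runtime_error("file size does not match dimensions: " + path);
        }

        void* p = ::mmap(nullptr, bytes_, PROT_READ, MAP_SHARED, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
        data_ = static_cast<const double*>(p);
    }

    ~MappedDoubleMatrix() { ::munmap(const_cast<double*>(data_), bytes_); }

    MappedDoubleMatrix(const MappedDoubleMatrix&) = delete;
    MappedDoubleMatrix& operator=(const MappedDoubleMatrix&) = delete;

    // Column-major: element (row, col) sits at offset col * nrow + row.
    double operator()(std::size_t row, std::size_t col) const {
        assert(row < nrow_ && col < ncol_);
        return data_[col * nrow_ + row];
    }

    std::size_t nrow() const { return nrow_; }
    std::size_t ncol() const { return ncol_; }

private:
    std::size_t nrow_;
    std::size_t ncol_;
    std::size_t bytes_;
    const double* data_ = nullptr;
};
```

Because pages are loaded on demand, this approach also works when the file is larger than RAM. For the 1-, 2-, or 4-byte element types the same indexing applies with a different element type.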