Ing. Jaroslav Kuchař
2015-Dec-06 17:36 UTC
[Rd] How to efficiently share data (a dataframe) between R and Java
Dear all,

in our ongoing project we use Java implementations of several algorithms. We also provide a "wrapper" implemented as an R package using rJava (https://github.com/jaroslav-kuchar/rCBA). Based on our recent experiments, a significant portion of the time is spent copying a dataframe from R to Java. The Java implementation needs access to the source dataframe.

I have tested several approaches: calling a Java method row by row; serializing the whole dataframe to a temp file and parsing it in Java; or row-binding into a single vector and calling a single Java method. Each approach has its limitations, e.g. time-consuming row-by-row copying, serialization and parsing performance, or the memory limits of a single vector.

Is there an efficient way to copy a dataframe from R to Java, and another from Java to R?

Thanks for any help you can provide...

Regards,
Jaroslav
Dirk Eddelbuettel
2015-Dec-06 18:56 UTC
[Rd] How to efficiently share data (a dataframe) between R and Java
On 6 December 2015 at 18:36, Ing. Jaroslav Kuchař wrote:
| Is there an efficient approach how to copy a dataframe from R to Java
| and another one from Java to R?

Have you looked at the gold standard that is Rserve and its dedicated clients, starting with the Java one?

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
Simon Urbanek
2015-Dec-07 02:19 UTC
[Rd] How to efficiently share data (a dataframe) between R and Java
On Dec 6, 2015, at 12:36 PM, Ing. Jaroslav Kuchař <jaroslav.kuchar at fit.cvut.cz> wrote:
> Is there an efficient approach how to copy a dataframe from R to Java
> and another one from Java to R?

You can natively access structures on each side. The fastest way is to use the R representation (column-oriented) in Java. That is much faster than any kind of serialization or anything you mention above, since you pass the variables as a whole.

Typically, the bottleneck is Java applications that require very inefficient data structures. If you have control over the algorithms, you can simply use proper data structures and avoid that problem. If you don't have control, you'll have to add Java code that converts the data frame pushed to the Java side into whatever structure the Java code needs. The main point here is that you do NOT want to do any conversion on the R side.

Cheers,
Šimon
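As an illustration of the column-oriented approach described above, here is a minimal sketch of a Java-side holder that receives each data frame column as one whole array and builds a row-oriented view in Java only when an algorithm needs it. The class name `ColumnFrame` and all of its methods are hypothetical and are not part of rJava or rCBA; the R calls shown in the comment are an assumed usage of rJava's `.jnew`, `.jcall` and `.jarray`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical column-oriented holder for a data frame pushed from R.
 * Each column arrives as one array, so the R side makes one call per
 * column instead of one call per row or per cell.
 *
 * Assumed R-side usage with rJava (illustrative only):
 *   frame <- .jnew("ColumnFrame")
 *   .jcall(frame, "V", "addDoubleColumn", "x", .jarray(df$x))
 *   .jcall(frame, "V", "addStringColumn", "y", .jarray(as.character(df$y)))
 */
class ColumnFrame {
    // Column name -> double[], int[] or String[]; insertion order is kept.
    private final Map<String, Object> columns = new LinkedHashMap<>();
    private int rowCount = -1; // -1 until the first column is added

    public void addDoubleColumn(String name, double[] values) {
        put(name, values, values.length);
    }

    public void addIntColumn(String name, int[] values) {
        put(name, values, values.length);
    }

    public void addStringColumn(String name, String[] values) {
        put(name, values, values.length);
    }

    private void put(String name, Object values, int length) {
        // All columns of a data frame must have the same number of rows.
        if (rowCount >= 0 && length != rowCount) {
            throw new IllegalArgumentException(
                "column '" + name + "' has " + length + " rows, expected " + rowCount);
        }
        rowCount = length;
        columns.put(name, values);
    }

    public int rowCount() {
        return Math.max(rowCount, 0);
    }

    public List<String> columnNames() {
        return new ArrayList<>(columns.keySet());
    }

    /** Builds a row-oriented view on the Java side, not in R. */
    public List<Map<String, Object>> toRows() {
        List<Map<String, Object>> rows = new ArrayList<>(rowCount());
        for (int i = 0; i < rowCount(); i++) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : columns.entrySet()) {
                row.put(e.getKey(), cell(e.getValue(), i));
            }
            rows.add(row);
        }
        return rows;
    }

    // Reads one cell from a column array, boxing primitives as needed.
    private static Object cell(Object column, int i) {
        if (column instanceof double[]) return ((double[]) column)[i];
        if (column instanceof int[]) return ((int[]) column)[i];
        return ((String[]) column)[i];
    }
}
```

With this layout the number of R-to-Java calls grows with the number of columns rather than the number of rows. An algorithm that can work directly on the arrays can skip `toRows()` entirely.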
Ing. Jaroslav Kuchař
2015-Dec-15 17:50 UTC
[Rd] How to efficiently share data (a dataframe) between R and Java
Dear all,

thank you for your hints. I would prefer not to use Rserve, which Dirk mentioned.

@Simon: I have full control over the Java implementation, so I can adapt the code that I use for the communication R <-> Java.

> You can natively access structures on each side. The fastest way is to
> use R representation (column-oriented) in Java - that is much faster
> than any kind of serialization or anything you mention above since you
> pass the variables as a whole.

Could you please point me to any examples or documentation that could help? The main goal is to copy a full dataframe from R to Java.

Best regards,
Jaroslav
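For the reverse direction raised in the original question (Java to R), the same column-wise idea can be sketched as follows. This is only an illustration under stated assumptions: `ScoredRow` and `ColumnResult` are hypothetical names, not classes from rJava or rCBA, and the R calls in the comment assume rJava's `.jcall` with array return signatures.

```java
import java.util.List;

/** Hypothetical row object produced by some Java algorithm. */
class ScoredRow {
    final int id;
    final double score;
    final String label;

    ScoredRow(int id, double score, String label) {
        this.id = id;
        this.score = score;
        this.label = label;
    }
}

/**
 * Hypothetical result holder that hands data back to R column by column.
 * The rows are transposed into arrays once, on the Java side, so R can
 * fetch each column with a single call.
 *
 * Assumed R-side usage with rJava (illustrative only):
 *   res <- data.frame(
 *     id    = .jcall(result, "[I", "ids"),
 *     score = .jcall(result, "[D", "scores"),
 *     label = .jcall(result, "[S", "labels"))
 */
class ColumnResult {
    private final int[] ids;
    private final double[] scores;
    private final String[] labels;

    private ColumnResult(int[] ids, double[] scores, String[] labels) {
        this.ids = ids;
        this.scores = scores;
        this.labels = labels;
    }

    /** Transposes row objects into one array per column. */
    public static ColumnResult fromRows(List<ScoredRow> rows) {
        int n = rows.size();
        int[] ids = new int[n];
        double[] scores = new double[n];
        String[] labels = new String[n];
        for (int i = 0; i < n; i++) {
            ScoredRow r = rows.get(i);
            ids[i] = r.id;
            scores[i] = r.score;
            labels[i] = r.label;
        }
        return new ColumnResult(ids, scores, labels);
    }

    public int[] ids() { return ids; }
    public double[] scores() { return scores; }
    public String[] labels() { return labels; }
}
```

The design choice is the same as in Simon's advice: any reshaping between rows and columns happens in Java, and R only receives whole vectors that it can place directly into a data frame.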