Ing. Jaroslav Kuchař
2015-Dec-15 17:50 UTC
[Rd] How to efficiently share data (a dataframe) between R and Java
Dear all,

thank you for your hints. I would prefer not to use Rserve, which Dirk mentioned.

@Simon
I have full control over the Java implementation - I can adapt the code that I use for the R <-> Java communication.

> You can natively access structures on each side. The fastest way is to
> use R representation (column-oriented) in Java - that is much faster
> than any kind of serialization or anything you mention above since you
> pass the variables as a whole.

Could you please point me to more examples or documentation that can help me? The main goal is to copy a full dataframe from R to Java.

Best regards,
Jaroslav

On 2015-12-07 03:19, Simon Urbanek wrote:
> On Dec 6, 2015, at 12:36 PM, Ing. Jaroslav Kuchař
> <jaroslav.kuchar at fit.cvut.cz> wrote:
>
>> Dear all,
>>
>> in our ongoing project we use Java implementations of several
>> algorithms. We also provide a "wrapper" implemented as an R package
>> using rJava (https://github.com/jaroslav-kuchar/rCBA). Based on our
>> recent experiments, a significant portion of the time is spent on
>> copying a dataframe from R to Java. The Java implementation needs
>> access to the source dataframe.
>>
>> I have tested several approaches: calling a Java method row by row;
>> serializing the whole dataframe to a temp file and parsing it in Java;
>> or row-binding into a single vector and calling a single Java method.
>> Each approach has its limitations, e.g. time-consuming row-by-row
>> copying, serialization and parsing performance, or the memory limits
>> of a single vector.
>>
>> Is there an efficient way to copy a dataframe from R to Java, and
>> another one from Java to R?
>>
>> Thanks for any help you can provide...
>>
>
> You can natively access structures on each side. The fastest way is to
> use R representation (column-oriented) in Java - that is much faster
> than any kind of serialization or anything you mention above since you
> pass the variables as a whole.
>
> Typically, the bottleneck is Java applications, which may require very
> inefficient data structures. If you have control over the algorithms,
> you can simply use proper data structures and avoid that problem. If
> you don't have control, you'll have to add Java code that converts to
> whatever structure is needed by the Java code from the data frame
> pushed to the Java side. The main point here is that you do NOT want
> to do any conversion on the R side.
>
> Cheers,
> Šimon
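To illustrate the advice quoted above (this is not code from the thread): if an existing Java algorithm insists on row-wise records, the conversion can be done on the Java side after the columns have been pushed over as whole arrays. The sketch below assumes the columns arrive as an Object[] holding double[], int[] and String[] arrays; the class name ColumnsToRows and its methods are hypothetical and belong to neither rJava nor rCBA.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of rJava or rCBA.
class ColumnsToRows {

    // Reads element r of one column, boxing primitive values.
    static Object cell(Object column, int r) {
        if (column instanceof double[]) return ((double[]) column)[r];
        if (column instanceof int[]) return ((int[]) column)[r];
        if (column instanceof String[]) return ((String[]) column)[r];
        throw new IllegalArgumentException("unsupported column type: " + column.getClass());
    }

    // Builds one Object[] per row from column arrays of equal length.
    static List<Object[]> toRows(Object[] columns) {
        List<Object[]> rows = new ArrayList<>();
        if (columns.length == 0) return rows;
        int nRows = Array.getLength(columns[0]);
        for (Object column : columns) {
            if (Array.getLength(column) != nRows) {
                throw new IllegalArgumentException("columns differ in length");
            }
        }
        for (int r = 0; r < nRows; r++) {
            Object[] row = new Object[columns.length];
            for (int c = 0; c < columns.length; c++) {
                row[c] = cell(columns[c], r);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Stand-in for columns pushed from R (made-up values).
        Object[] columns = {
            new double[] {5.1, 4.9},
            new int[] {1, 1},
            new String[] {"setosa", "setosa"}
        };
        for (Object[] row : toRows(columns)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Boxing every cell is exactly the kind of inefficient structure the reply warns about, so a conversion like this only makes sense when the downstream Java code cannot be changed to read the column arrays directly.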
Simon Urbanek
2015-Dec-15 22:15 UTC
[Rd] How to efficiently share data (a dataframe) between R and Java
You can pass the entire df, example:

    > data(iris)
    > iris$sp = as.character(iris$Species)
    > o=.jarray(lapply(iris, .jarray))
    > .jcall("C",,"df",o)
    df, 6 variables
    [0]: double[150]
    [1]: double[150]
    [2]: double[150]
    [3]: double[150]
    [4]: int[150]
    [5]: String[150]

Java code:

    public class C {
        static void df(Object df[]) {
            int n;
            System.out.println("df, " + (n = df.length) + " variables");
            int i = 0;
            while (i < n) {
                if (df[i] instanceof double[]) {
                    double d[] = (double[]) df[i];
                    System.out.println("["+i+"]: double["+d.length+"]");
                } else if (df[i] instanceof int[]) {
                    int d[] = (int[]) df[i];
                    System.out.println("["+i+"]: int["+d.length+"]");
                } else if (df[i] instanceof String[]) {
                    String s[] = (String[]) df[i];
                    System.out.println("["+i+"]: String["+s.length+"]");
                } else {
                    System.out.println("["+i+"]: some other type...");
                }
                i++;
            }
        }
    }

Normally, you wouldn't pass the entire df but instead have methods for the types you care about as the modeling function - that's a more Java-like approach, but either is valid and there is no difference in efficiency.

Cheers,
Simon
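The typed-method alternative mentioned in this reply could look roughly like the following sketch (not code from the thread). It assumes the R side passes a numeric column as double[], a factor as its integer codes, and the factor levels as a separate String[]; the class TypedColumns and the method meanByLevel are hypothetical names chosen for illustration. R factor codes are 1-based, and R's NA_integer_ is represented by the value Integer.MIN_VALUE.

```java
import java.util.Arrays;

// Hypothetical class and method names; only the idea of typed parameters
// comes from the thread.
class TypedColumns {

    // Mean of `values` per factor level.
    // `codes` holds R factor codes (1-based); `levels` holds the level labels.
    static double[] meanByLevel(double[] values, int[] codes, String[] levels) {
        if (values.length != codes.length) {
            throw new IllegalArgumentException("values and codes differ in length");
        }
        double[] sum = new double[levels.length];
        int[] count = new int[levels.length];
        for (int i = 0; i < values.length; i++) {
            if (codes[i] == Integer.MIN_VALUE) continue; // NA factor entry, skipped
            int k = codes[i] - 1;                        // shift to 0-based index
            sum[k] += values[i];
            count[k]++;
        }
        double[] mean = new double[levels.length];
        for (int k = 0; k < mean.length; k++) {
            mean[k] = count[k] == 0 ? Double.NaN : sum[k] / count[k];
        }
        return mean;
    }

    public static void main(String[] args) {
        // Made-up values standing in for one numeric column and one factor.
        double[] values = {1.0, 3.0, 10.0};
        int[] codes = {1, 1, 2};
        String[] levels = {"a", "b"};
        System.out.println(Arrays.toString(meanByLevel(values, codes, levels)));
    }
}
```

From R, such a method would presumably be called along the lines of .jcall("TypedColumns", "[D", "meanByLevel", .jarray(iris$Sepal.Length), .jarray(as.integer(iris$Species)), .jarray(levels(iris$Species))); that call is an untested assumption. Passing the levels separately matters because, in the transcript above, the factor column arrives in Java as a plain int[] with no labels attached.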
Ing. Jaroslav Kuchař
2015-Dec-17 06:53 UTC
[Rd] How to efficiently share data (a dataframe) between R and Java
Thank you for the example. Based on my recent experiments, this solution is the most efficient.

Cheers,
Jaroslav