Hi everyone,

Has anyone written a parser in Java for either the ASCII or binary format
produced by save()? I need to parse a single large 2D array that is
structured like this:

list(
  "32609_1" = c(-9549.39231289146, -9574.07159324482, ... ),
  "32610_2" = c(-6369.12526971635, -6403.99620977124, ... ),
  "32618_2" = c(-2138.29095689061, -2057.9229403233, ... ),
  ...
)

Or, given that I'm dealing with just a single array, would it be better to
roll my own I/O using write.table or write.matrix from the MASS package?

Thanks,
David
On Wed, 5 Dec 2007, David Coppit wrote:

> Hi everyone,
>
> Has anyone written a parser in Java for either the ASCII or binary format
> produced by save()? I need to parse a single large 2D array that is
> structured like this:
>
> list(
>   "32609_1" = c(-9549.39231289146, -9574.07159324482, ... ),
>   "32610_2" = c(-6369.12526971635, -6403.99620977124, ... ),
>   "32618_2" = c(-2138.29095689061, -2057.9229403233, ... ),
>   ...
> )
>
> Or, given that I'm dealing with just a single array, would it be better to
> roll my own I/O using write.table or write.matrix from the MASS package?

It would be much easier. The save() format is far more complex than you
need. However, I would use writeBin() to write a binary file and read
that in in Java, avoiding the binary -> ASCII -> binary conversion.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
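One detail worth keeping in mind with the writeBin() route is byte order. writeBin() defaults to the platform's native byte order (endian = .Platform$endian) and writes doubles at 8 bytes each, while Java's DataInputStream always reads big-endian. So either pass endian = "big" on the R side, or decode with an explicit byte order on the Java side. The following is only a minimal sketch of the second option, assuming the file holds nothing but raw 8-byte doubles; the class and method names (RawDoubles, decode, encode) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not part of R or of any library mentioned in this thread.
final class RawDoubles {

    /** Decodes consecutive 8-byte IEEE 754 doubles stored in the given byte order. */
    static double[] decode(byte[] raw, ByteOrder order) {
        if (raw.length % Double.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of 8 bytes");
        }
        double[] out = new double[raw.length / Double.BYTES];
        // The view buffer inherits the byte order set on the ByteBuffer.
        ByteBuffer.wrap(raw).order(order).asDoubleBuffer().get(out);
        return out;
    }

    /** Encodes doubles the same way; used here only to fabricate sample bytes. */
    static byte[] encode(double[] values, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES).order(order);
        buf.asDoubleBuffer().put(values);
        return buf.array();
    }

    public static void main(String[] args) {
        double[] sample = {-9549.39231289146, -9574.07159324482};
        // Stand-in for bytes written by writeBin(x, con) on a little-endian machine.
        byte[] littleEndianBytes = encode(sample, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Arrays.toString(decode(littleEndianBytes, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

Decoding the same bytes with the wrong ByteOrder does not fail; it silently yields different numbers, so it seems safer to fix the byte order explicitly on both sides.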
David Coppit wrote:

> Hi everyone,
>
> Has anyone written a parser in Java for either the ASCII or binary format
> produced by save()?

You might want to consider using the hdf5 package to save the array in
HDF5 format. There are HDF5 libraries for Java as well
<http://hdf.ncsa.uiuc.edu/hdf-java-html/>. I have never used the Java
libraries, but HDF5 works quite well for transferring data between R and
Python.
On 12/5/07 12:15 PM, "Prof Brian Ripley" <ripley@stats.ox.ac.uk> wrote:

> On Wed, 5 Dec 2007, David Coppit wrote:
>
>> Or, given that I'm dealing with just a single array, would it be better to
>> roll my own I/O using write.table or write.matrix from the MASS package?
>
> It would be much easier. The save() format is far more complex than you
> need. However, I would use writeBin() to write a binary file and read
> that in in Java, avoiding the binary -> ASCII -> binary conversion.

Thanks for the suggestion; writeBin works quite well. For posterity, here's
what I did:

On the R side:

# Assumes that there are no special values in the tofList, such as NA_REAL,
# R_PosInf, R_NegInf, ISNAN, R_FINITE. See the "R Data Import/Export" manual.
saveListAsBinary <- function( tofList, filename )
{
  outConn <- file( filename, "wb" )

  # seq_along() is safe for an empty list, unlike 1:length()
  for (m in seq_along(tofList)) {
    # Name, written as a NUL-terminated string
    writeBin(names(tofList)[[m]], outConn)
    # Element count as a 4-byte big-endian integer
    writeBin(length(tofList[[m]]), outConn, size = 4, endian = "big")
    # Values, narrowed from 8-byte doubles to 4-byte big-endian floats
    writeBin(tofList[[m]], outConn, size = 4, endian = "big")
  }

  close(outConn)
}

saveListAsBinary(myList, "outfile.RDat")

On the Java side:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.util.ArrayList;

public static void read_R_Output(String filename, ArrayList<String> names,
    ArrayList<ArrayList<Float>> data)
{
  // DataInputStream reads big-endian values, matching endian = "big" in R.
  // try-with-resources closes the stream even if reading fails.
  try (DataInputStream dataInputStream = new DataInputStream(
      new BufferedInputStream(new FileInputStream(filename)))) {

    boolean endOfFile = false;

    while (!endOfFile) {
      try {
        // Read the NUL-terminated name
        StringBuilder sb = new StringBuilder();

        byte c;
        while ((c = dataInputStream.readByte()) != 0)
          sb.append((char)c);

        int cols = dataInputStream.readInt();

        ArrayList<Float> row = new ArrayList<Float>(cols);

        for (int i = 0; i < cols; i++)
          row.add(dataInputStream.readFloat());

        // Add name and row together so a truncated record leaves
        // the two lists the same length
        names.add(sb.toString());
        data.add(row);
      } catch (EOFException e) {
        endOfFile = true;
      }
    }
  } catch (Exception e) {
    e.printStackTrace();
  }
}

Regards,
David
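The record layout used above (a NUL-terminated name, a 4-byte big-endian count, then that many 4-byte big-endian floats) can be checked without R by writing the same bytes from Java and reading them back. What follows is a rough sketch along those lines, not David's code: the class name NamedFloatRecords and its methods are hypothetical, names are assumed to be plain ASCII, and the reader differs from the original in that it treats end-of-file as normal only at a record boundary and lets an EOFException propagate if a record is cut short.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the layout; names are assumed to be ASCII.
final class NamedFloatRecords {

    /** Writes each entry as: name bytes, NUL, int32 count, float32 values (big-endian). */
    static byte[] write(Map<String, float[]> rows) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<String, float[]> entry : rows.entrySet()) {
                out.write(entry.getKey().getBytes(StandardCharsets.US_ASCII));
                out.writeByte(0);
                out.writeInt(entry.getValue().length);
                for (float value : entry.getValue()) {
                    out.writeFloat(value);
                }
            }
        }
        return bytes.toByteArray();
    }

    /** Reads records until a clean end of stream; a truncated record throws EOFException. */
    static Map<String, float[]> read(InputStream source) throws IOException {
        Map<String, float[]> rows = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(source))) {
            int first;
            // read() returns -1 only when no further record starts here.
            while ((first = in.read()) != -1) {
                ByteArrayOutputStream name = new ByteArrayOutputStream();
                for (int b = first; b != 0; b = in.readUnsignedByte()) {
                    name.write(b);
                }
                float[] values = new float[in.readInt()];
                for (int i = 0; i < values.length; i++) {
                    values[i] = in.readFloat();
                }
                rows.put(new String(name.toByteArray(), StandardCharsets.US_ASCII), values);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Map<String, float[]> rows = new LinkedHashMap<>();
        rows.put("32609_1", new float[] {-9549.392f, -9574.071f});
        Map<String, float[]> back = read(new ByteArrayInputStream(write(rows)));
        System.out.println(back.keySet() + " " + back.get("32609_1").length);
    }
}
```

Returning a map of float[] rather than filling ArrayList<Float> arguments is only a design preference here; it avoids boxing every value, which may matter for a large array. Since the R side writes with size = 4, values come back with single precision either way.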
Dear David,

You may also consider using the biocep project's tools and frameworks.
They provide an advanced bridge that allows you to exchange any standard
R object and any mapped S4 object between R and Java. The object
extracted to Java (an RList for your data) can be serialized to a file
(saved as a Java object). You can then read the serialized object very
easily from Java, and this is obviously much faster than parsing and more
elegant than a "proprietary" serialization/deserialization.

The easiest way to achieve this is to create the serialized RList via the
Virtual R Workbench (Universal IDE for R). You can run the R workbench on
any kind of OS via the biocep sources:

http://www.ebi.ac.uk/microarray-srv/frontendapp/BIOCEP_README.txt

or just use this link to get the software installed on your machine if
you are using Mac OS or Windows:

http://www.ebi.ac.uk/microarray-srv/frontendapp/rworkbench.jnlp

Create your data via the workbench R Console (you may also use the script
editor or just read your list from a file). Go to the menu "Java", choose
"Save R/Java Object to local file", select the name you've given to your
list, and choose a file name.

To retrieve your list in Java, all you need is to add RJB.jar to your
class path and use this simple code:

RList l = (RList) new ObjectInputStream(new FileInputStream("J:/list.ser")).readObject();

And that's it. This works for much more complex R objects as well
(ExpressionSet, etc.).

Best wishes,
Karim