Hi there,

I have a Java process that writes HDF5 files with the following
approximate structure:

group "xxx" {
   group "yyy" {
      dataset {}
      dataset {}
   }
   group "zzz" {
      dataset {}
      dataset {}
   }
}

where dataset is a rank one dataspace having a compound datatype defined as:

H5T_UNIX_TIME, float, float, float, float

I have tried R packages h5r and hdf5 in an attempt to read the file,
but examining the source of h5r and reading documentation for hdf5
here: http://xweb.geos.ed.ac.uk/~hcp/Rhdf5.html leads me to believe
that compound datatypes are not supported by these packages. My guess
is that mapping arbitrary type definitions in HDF5 to available types
in R might be somewhat tricky. Incidentally, h5dump has trouble
displaying the data in the dataset, but I think this is to do with the
time 'field':

DATASPACE SIMPLE { ( 33 ) / ( H5S_UNLIMITED ) }
DATA {
   h5dump error: unable to print data
}

Since I am new to both HDF5 and R, I wonder if there are better
approaches to storing the information I have that will allow me to use
either h5r or hdf5 packages unmodified. I expect I can contribute
changes to either of the packages that will allow me to do what I
describe but fall short of general compound datatype support.

Any comments or advice gratefully received.

Regards...Jeremy
Brian G. Peterson
2011-Dec-01 10:41 UTC
[Rd] HDF5 compound data types and h5r/hdf5 R packages
On Thu, 2011-12-01 at 19:22 +1300, Jeremy Reeve wrote:
> Since I am new to both HDF5 and R I wonder if there are better
> approaches to storing the information I have that will allow me to use
> either h5r or hdf5 packages unmodified. I expect I can contribute
> changes to either of the packages that will allow me to do what I
> describe but fall short of general compound datatype support.

You should contact the maintainers of the two hdf5 packages you
reference. Hopefully one of them will be open to assisting you,
advising on your immediate problem, and collaborating on an extension
of the package.

If that doesn't work, perhaps the R Bioconductor list would be a better
place for the discussion, since I believe that HDF5 files are mostly
used in the geological and biological sciences.

Regards,

- Brian

--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock
Dear Jeremy!

I have written a package, rhdf5, that can handle compound data types as
well. The H5T_UNIX_TIME type is not yet supported, but it should be
possible to read the floats into a data.frame. I have put the
H5T_UNIX_TIME type on my list and will add it in the coming weeks.

Currently, you can download the package from
http://www-huber.embl.de/users/befische/rhdf5/
It will appear on Bioconductor in the next couple of days.

Best,
Bernd

> From: Jeremy Reeve <jeremy.reeve1 at gmail.com>
> Date: Wed, Nov 30, 2011 at 10:22 PM
> Subject: [Rd] HDF5 compound data types and h5r/hdf5 R packages
> To: r-devel at r-project.org

--
Bernd Fischer
EMBL Heidelberg
Meyerhofstraße 1
69117 Heidelberg
Tel: +49 [0] 6221 387-8131
E-Mail: bernd.fischer at embl.de
Homepage: http://www.ebi.ac.uk/~bfischer
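
On the question of storing the data differently: one option, which nobody in this thread proposed and which is offered here only as an assumption, is to write the timestamp as a plain 64-bit integer (seconds since the epoch) instead of H5T_UNIX_TIME, since integer and float members tend to be handled more widely by HDF5 tools. The sketch below does not call any HDF5 library. It only illustrates, in plain Java, the in-memory record layout that such a compound type (for example one H5T_STD_I64LE member followed by four H5T_IEEE_F32LE members) would correspond to. The class name RecordLayout and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of any HDF5 library: lays out one record as
// an 8-byte integer timestamp followed by four 4-byte floats (24 bytes total).
public class RecordLayout {
    static final int FIELDS = 4;
    static final int RECORD_SIZE = Long.BYTES + FIELDS * Float.BYTES;

    // Pack all records back to back into a little-endian buffer.
    static ByteBuffer pack(long[] epochSeconds, float[][] values) {
        ByteBuffer buf = ByteBuffer.allocate(epochSeconds.length * RECORD_SIZE)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < epochSeconds.length; i++) {
            buf.putLong(epochSeconds[i]);      // time as a plain integer
            for (int j = 0; j < FIELDS; j++) {
                buf.putFloat(values[i][j]);    // the four float members
            }
        }
        buf.flip();                            // make the buffer readable
        return buf;
    }

    // Read the timestamp of the given record (absolute read, no position change).
    static long timestampAt(ByteBuffer buf, int record) {
        return buf.getLong(record * RECORD_SIZE);
    }

    // Read one of the four float members of the given record.
    static float valueAt(ByteBuffer buf, int record, int field) {
        return buf.getFloat(record * RECORD_SIZE + Long.BYTES + field * Float.BYTES);
    }

    public static void main(String[] args) {
        long[] times = {1322720520L, 1322720580L};   // made-up sample values
        float[][] vals = {{1.0f, 2.0f, 3.0f, 4.0f}, {5.0f, 6.0f, 7.0f, 8.0f}};
        ByteBuffer buf = pack(times, vals);
        System.out.println(buf.remaining() + " bytes in " + times.length + " records");
        System.out.println(timestampAt(buf, 1) + " " + valueAt(buf, 1, 2));
    }
}
```

Whether a given reader (h5r, hdf5, rhdf5, or h5dump) copes better with this layout would still need to be checked against the actual file; on the R side the integer seconds could then be converted to a date-time after reading.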