Dear all, I'm trying to fix a subtle bug in the hdf5 package. This package provides an interfaces to the HDF5 library and hence allows one to load data into R from files in the HDF5 format. The bug appeared during a period in which R changed but the package did not. I include below both the R and C code, stripped of everything except what is needed to show the bug. What is supposed to happen is (*) the user calls R function hdf5load() (*) hdf5load() calls C function do_hdf5load() (*) do_hdf5load() opens the HDF5 file recording its HDF5 file id (fid) (*) do_hdf5load() calls C function setup_onexit, passing fid to it (*) setup_onexit sets up the on.exit call to be R function hdf5cleanup with fid as its argument (*) C function do_hdf5load() walks the HDF5 file's tree structure of groups of groups of [...] of datasets, mapping them to an R list of lists of [...] of array variables. This recursive procedure may have a variety of exit points buried inside itself. (*) C function do_hdf5load() exits for some reason. R function hdf5load() therefore exits but before doing so it calls its on.exit code (which is hdf5cleanup(fid) with the right value of fid), closing the file. The problem is that when do_hdf5load() and hdf5load() exit, hdf5cleanup() is usually not called, meaning that the file is left open. You might not notice this, but if you are processing a few year's worth of data, which is stored at 1 file per day, you may end up with the system limit number of files open and be unable to open any more. I have a suspicion that the problem dates to a change in R at 2.8.0. If you do help(on.exit) it notes under "Details" that: "Where ?expr? was evaluated changed in R 2.8.0 ..." But it is not clear how I should modify the C code to force hdf5cleanup() to be reliably called when do_hdf5load() exits. Any help appreciated. Hugh (possibly the nearest thing to a maintainer that the hdf5 package currently has) (R and C code follow) #---------------------------------------------------------------- "hdf5load" <- function (file, load = TRUE, verbosity = 0, tidy = FALSE) { call <- sys.call() .External("do_hdf5load", call, sys.frame(sys.parent()), file, load, as.integer (verbosity), as.logical(tidy), PACKAGE="hdf5") } "hdf5cleanup" <- function (fid) { call <- sys.call() print("In hdf5cleanup: calling do_hdf5cleanup") invisible(.External("do_hdf5cleanup", call, sys.frame(sys.parent()), fid, PACKAGE="hdf5")) } #---------------------------------------------------------------- /*---------------------------------------------------------------*/ SEXP do_hdf5load (SEXP args) { /* Code to process args snipped */ if ((fid = H5Fopen (path, H5F_ACC_RDONLY, H5P_DEFAULT)) < 0) errorcall (call, "unable to open HDF file: %s", path); setup_onexit (fid, env); /* Messy code to walk tree structure of file snipped */ } /* The following function shown in its entirety */ setup_onexit (hid_t fid, SEXP env) { eval (lang2 (install ("on.exit"), lang2 (install ("hdf5cleanup"), ScalarInteger (fid))), env); } SEXP do_hdf5cleanup (SEXP args) { /* Code to process args snipped */ /* various cleanup things done including this: */ H5Fclose(fid) } /*---------------------------------------------------------------*/ -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Looks like you should be using finalizers instead. See the RODBC package for an example of this. On Fri, 4 Mar 2011, H C Pumphrey wrote:> Dear all, > > I'm trying to fix a subtle bug in the hdf5 package. This package provides an > interfaces to the HDF5 library and hence allows one to load data into R from > files in the HDF5 format. The bug appeared during a period in which R changed > but the package did not. > > I include below both the R and C code, stripped of everything except what is > needed to show the bug. What is supposed to happen is > > (*) the user calls R function hdf5load() > (*) hdf5load() calls C function do_hdf5load() > (*) do_hdf5load() opens the HDF5 file recording its HDF5 file id (fid) > (*) do_hdf5load() calls C function setup_onexit, passing fid to it > (*) setup_onexit sets up the on.exit call to be R function hdf5cleanup with > fid as its argument > (*) C function do_hdf5load() walks the HDF5 file's tree structure of groups > of groups of [...] of datasets, mapping them to an R list of lists of [...] > of array variables. This recursive procedure may have a variety of exit > points buried inside itself. > (*) C function do_hdf5load() exits for some reason. R function hdf5load() > therefore exits but before doing so it calls its on.exit code (which is > hdf5cleanup(fid) with the right value of fid), closing the file. > > The problem is that when do_hdf5load() and hdf5load() exit, hdf5cleanup() is > usually not called, meaning that the file is left open. You might not notice > this, but if you are processing a few year's worth of data, which is stored > at 1 file per day, you may end up with the system limit number of files open > and be unable to open any more. > > I have a suspicion that the problem dates to a change in R at 2.8.0. If you > do help(on.exit) it notes under "Details" that: "Where ?expr? was evaluated > changed in R 2.8.0 ..." But it is not clear how I should modify the C code to > force hdf5cleanup() to be reliably called when do_hdf5load() exits. > Any help appreciated. > > Hugh (possibly the nearest thing to a maintainer that the hdf5 package > currently has) > > (R and C code follow) > > #---------------------------------------------------------------- > "hdf5load" <- function (file, load = TRUE, verbosity = 0, tidy = FALSE) > { > call <- sys.call() > .External("do_hdf5load", call, sys.frame(sys.parent()), file, load, > as.integer (verbosity), as.logical(tidy), > PACKAGE="hdf5") > } > > "hdf5cleanup" <- function (fid) > { > call <- sys.call() > print("In hdf5cleanup: calling do_hdf5cleanup") > invisible(.External("do_hdf5cleanup", call, sys.frame(sys.parent()), fid, > PACKAGE="hdf5")) > } > #---------------------------------------------------------------- > > > /*---------------------------------------------------------------*/ > SEXP do_hdf5load (SEXP args) > { > /* Code to process args snipped */ > if ((fid = H5Fopen (path, H5F_ACC_RDONLY, H5P_DEFAULT)) < 0) > errorcall (call, "unable to open HDF file: %s", path); > > setup_onexit (fid, env); > /* Messy code to walk tree structure of file snipped */ > } > > /* The following function shown in its entirety */ > setup_onexit (hid_t fid, SEXP env) > { > eval (lang2 (install ("on.exit"), > lang2 (install ("hdf5cleanup"), > ScalarInteger (fid))), > env); > } > > SEXP > do_hdf5cleanup (SEXP args) > { > /* Code to process args snipped */ > /* various cleanup things done including this: */ > H5Fclose(fid) > } > /*---------------------------------------------------------------*/-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
H C Pumphrey wrote:> I'm trying to fix a subtle bug in the hdf5 package. This package > provides an interfaces to the HDF5 library and hence allows one to load > data into R from files in the HDF5 format. The bug appeared during a > period in which R changed but the package did not. [details snipped]I considered Prof. Ripley's suggestion to use finalizers but I'm not a good enough C programmer and rapidly became bogged down. For the moment I have fixed the bug in a more crude and obvious manner, by putting the relevant clean-up code at the end of both hdf5save() and hdf5load() and removing all references to the on.exit mechanism. This now means that if things work properly you can open and close as many HDF5 files as you like. (I imagine that the downside is that your file may be left open if some sorts of errors occur. But I think this was probably happening anyway, given the fact that the on.exit mechanism wasn't working. ) I have also altered the code behind hdf5load() so that it ignores soft links inside a HDF5 file that don't have a target. Ideally your file should not contain such things. But if it does, then it is better to skirt round them than to fail whenever you meet one. I will try uploading the package to CRAN at some point, but I think I'll use it myself for a bit to see whether it has any other nasties. If anyone wants to try it in the meantime you can get it at http://xweb.geos.ed.ac.uk/~hcp (scroll down to the bottom). I'd be particularly interested to know if it can be built on Windows/MacOS and, if not, why not. Hugh -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.