Christopher Walker
2011-May-04 20:47 UTC
[Lustre-discuss] problem reading HDF files on 1.8.5 filesystem
Hello,

We have a user who is trying to post-process HDF files in R. Her script goes through a number (~2500) of files in a directory, opening and reading the contents. This usually goes fine, but occasionally the script dies with:

HDF5-DIAG: Error detected in HDF5 (1.9.4) thread 46944713368080:
  #000: H5F.c line 1560 in H5Fopen(): unable to open file
    major: File accessability
    minor: Unable to open file
  #001: H5F.c line 1337 in H5F_open(): unable to read superblock
    major: File accessability
    minor: Read failed
  #002: H5Fsuper.c line 542 in H5F_super_read(): truncated file
    major: File accessability
    minor: File has been truncated
Error in hdf5load(file = myfile, load = FALSE, verbosity = 0, tidy = TRUE) :
  unable to open HDF file:
/n/scratch2/moorcroft_lab/nlevine/Moore_sites_final/met/LT_spinup/ms67/analy/s67-E-1628-04-00-000000-g01.h5
HDF5-DIAG: Error detected in HDF5 (1.9.4) thread 46944713368080:
  #000: H5F.c line 2012 in H5Fclose(): decrementing file ID failed
    major: Object atom
    minor: Unable to close file
  #001: H5I.c line 1340 in H5I_dec_ref(): can't locate ID
    major: Object atom
    minor: Unable to find atom information (already closed?)
Error in hdf5cleanup(16778754L) : unable to close HDF file

But this file definitely does exist -- any stat or ls command shows it without a problem. Further, once I 'ls' this file, if I rerun the same script, it successfully reads this file, but then dies on the next one with the same error. If I 'ls' the entire directory, the script runs to completion without a problem.
strace output shows:

open("/n/scratch2/moorcroft_lab/nlevine/Moore_sites_final/met/LT_spinup/ms67/analy/s67-E-1628-04-00-000000-g01.h5", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
lseek(3, 0, SEEK_SET)                   = 0
read(3, "\211HDF\r\n\32\n", 8)          = 8
read(3, "\0", 1)                        = 1
read(3, "\0\0\0\0\10\10\0\4\0\20\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377@"..., 87) = 87
close(3)                                = 0
write(2, "HDF5-DIAG: Error detected in HDF"..., 42) = 42
etc.

which initially looks fine to me, followed by an abrupt close.

NFS filesystems and our 1.6.7.2 filesystem have no such problems -- any suggestions?

Thanks very much,
Chris
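The oddity in that trace -- fstat() reporting st_size=0 while the subsequent reads all succeed -- can be checked outside of R/HDF5 with a small shell sketch. This is a hypothetical helper (not from the thread), and it assumes GNU coreutils for `stat -c`:

```shell
#!/bin/sh
# Compare the size reported by stat(2) with the number of bytes a
# sequential read actually returns. On a healthy filesystem the two
# always agree; a stat size of 0 on a file whose contents read back
# fine matches the strace above.
check_size() {
    stat_size=$(stat -c %s "$1")
    read_size=$(wc -c < "$1")
    if [ "$stat_size" -ne "$read_size" ]; then
        echo "MISMATCH: $1 stat=$stat_size read=$read_size"
    else
        echo "ok: $1 size=$stat_size"
    fi
}

# Demo on a throwaway file containing the 8-byte HDF5 signature seen in
# the strace; on the affected mount you would instead run something like
#   for f in /path/to/analy/*.h5; do check_size "$f"; done
tmp=$(mktemp)
printf '\211HDF\r\n\032\n' > "$tmp"
check_size "$tmp"
rm -f "$tmp"
```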
David Dillow
2011-May-04 21:06 UTC
[Lustre-discuss] problem reading HDF files on 1.8.5 filesystem
On Wed, 2011-05-04 at 16:47 -0400, Christopher Walker wrote:
> We have a user who is trying to post-process HDF files in R. Her script
> goes through a number (~2500) of files in a directory, opening and
> reading the contents. This usually goes fine, but occasionally the
> script dies with:
> [...]
> open("/n/scratch2/moorcroft_lab/nlevine/Moore_sites_final/met/LT_spinup/ms67/analy/s67-E-1628-04-00-000000-g01.h5",
> O_RDONLY) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0

This thinks it is a zero-length file, but I'll bet you see the right length later. Perhaps you could test Johann's suggestion from http://jira.whamcloud.com/browse/LU-274 to see if that helps, and report the results in Jira?
Larry
2011-May-05 02:05 UTC
[Lustre-discuss] problem reading HDF files on 1.8.5 filesystem
try mounting the lustre filesystem with -o flock or -o localflock

On Thu, May 5, 2011 at 4:47 AM, Christopher Walker <cwalker at fas.harvard.edu> wrote:
> Hello,
>
> We have a user who is trying to post-process HDF files in R. Her script
> goes through a number (~2500) of files in a directory, opening and
> reading the contents. This usually goes fine, but occasionally the
> script dies with:
> [...]
> But this file definitely does exist -- any stat or ls command shows it
> without a problem. Further, once I 'ls' this file, if I rerun the same
> script, it successfully reads this file, but then dies on the next one
> with the same error. If I 'ls' the entire directory, the script runs to
> completion without a problem.
> [...]
> NFS filesystems and our 1.6.7.2 filesystem have no such problems -- any
> suggestions?
>
> Thanks very much,
> Chris
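For reference, flock and localflock are client-side mount options: -o flock enables cluster-wide-coherent POSIX file locking (at some performance cost), while -o localflock makes locks coherent only within a single client node. A minimal sketch of how they would be applied, with placeholder MGS node, filesystem name, and mount point (adjust all three for your site):

```shell
# Client mount with cluster-wide flock support (placeholder names):
mount -t lustre -o flock mgsnode@tcp0:/scratch2 /n/scratch2

# Or, for locking that is only coherent within this one client:
mount -t lustre -o localflock mgsnode@tcp0:/scratch2 /n/scratch2
```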
Christopher Walker
2011-May-05 03:57 UTC
[Lustre-discuss] problem reading HDF files on 1.8.5 filesystem
Hi Larry,

Everything below is with the filesystem mounted with localflock. This does indeed look a lot like the bug referred to by David Dillow (thanks!)

Chris

On 5/4/11 10:05 PM, Larry wrote:
> try mounting the lustre filesystem with -o flock or -o localflock
>
> On Thu, May 5, 2011 at 4:47 AM, Christopher Walker
> <cwalker at fas.harvard.edu> wrote:
>> We have a user who is trying to post-process HDF files in R. Her script
>> goes through a number (~2500) of files in a directory, opening and
>> reading the contents. This usually goes fine, but occasionally the
>> script dies with:
>> [...]
>> strace output shows:
>>
>> open("/n/scratch2/moorcroft_lab/nlevine/Moore_sites_final/met/LT_spinup/ms67/analy/s67-E-1628-04-00-000000-g01.h5",
>> O_RDONLY) = 3
>> fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>> [...]
>> which initially looks fine to me, followed by an abrupt close.
>>
>> NFS filesystems and our 1.6.7.2 filesystem have no such problems -- any
>> suggestions?
Peter Kjellström
2011-May-05 11:17 UTC
[Lustre-discuss] problem reading HDF files on 1.8.5 filesystem
On Wednesday, May 04, 2011 10:47:26 PM Christopher Walker wrote:
> We have a user who is trying to post-process HDF files in R. Her script
> goes through a number (~2500) of files in a directory, opening and
> reading the contents. This usually goes fine, but occasionally the
> script dies with:
>
> HDF5-DIAG: Error detected in HDF5 (1.9.4) thread 46944713368080:
>   #000: H5F.c line 1560 in H5Fopen(): unable to open file
>     major: File accessability
>     minor: Unable to open file
> [...]
> fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0

Seems just like https://bugzilla.lustre.org/show_bug.cgi?id=24458, which is a real pain for us atm. Lustre returns file size == 0 for a stat of a non-zero file.

> NFS filesystems and our 1.6.7.2 filesystem have no such problems -- any
> suggestions?

This is what we've done: downgraded all our clients from 1.8.5 (patchless) to 1.6.7.1 (patchless).

/Peter
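Short of downgrading, Christopher's observation earlier in the thread -- that an 'ls' of the whole directory lets the script run to completion -- suggests a crude interim workaround: force a fresh stat of every entry immediately before the R job starts. A hypothetical job-script fragment (function and script names are placeholders):

```shell
#!/bin/sh
# Crude workaround based on the observation in this thread: listing the
# directory refreshes the client's cached file attributes, after which
# the post-processing script reads every file successfully.
warm_stat_cache() {
    # Force a fresh stat of every entry in the given directory.
    ls -l "$1" > /dev/null
}

# In the job script, warm the cache before starting R, e.g.:
#   warm_stat_cache /n/scratch2/.../analy
#   Rscript postprocess.R /n/scratch2/.../analy
```

This does not fix the underlying bug (the stat can presumably still go stale between the listing and the read), but it matches the behaviour reported above.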