I have a multithreaded application running on FreeBSD 4.9, 4.10 and
-STABLE that I'm having an issue with.
The application writes large numbers of small files over an NFS mount,
and at random we're seeing fclose() return a failure code (-1) with
errno set to EBADF.
We have no idea what may be causing the problem. The NFS server appears
to be functioning fine, with no errors at all, and it works perfectly
for tons of other clients.
At first we thought the fd might be getting munged somehow, but
here is the weird part.
If the code is changed to do an fflush() on the stream immediately
before we issue the fclose(), fflush() NEVER returns an error and always
completes successfully. However, completely at random, fclose() will
still return an error with errno set to EBADF.
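
For reference, a minimal sketch of the flush-then-close sequence
described above. This is not the application's actual code: the helper
name flush_and_close() and its path argument are made up for
illustration. It only shows one way to log which of the two calls
failed, and with what errno.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not the real application code): flush the stream,
 * then close it, logging which step failed and the errno it left behind.
 * Returns 0 on success, -1 on failure with errno set from the first
 * call that failed.
 */
static int
flush_and_close(FILE *f, const char *path)
{
    int fd = fileno(f);     /* note the descriptor number before fclose() */
    int saved = 0;

    if (fflush(f) == EOF) {
        saved = errno;
        fprintf(stderr, "fflush(%s) fd=%d: %s\n", path, fd, strerror(saved));
    }

    /* Always call fclose() so the FILE is released even after a failed flush. */
    if (fclose(f) == EOF) {
        int e = errno;
        fprintf(stderr, "fclose(%s) fd=%d: %s\n", path, fd, strerror(e));
        if (saved == 0)
            saved = e;
    }

    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return 0;
}
```

Logging the descriptor number alongside errno at least makes it possible
to match a failed fclose() against whatever other tracing is available.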
There are hundreds of gigs and inodes available on the NFS server, and
writes from all the other NFS clients work fine at the same time. (This
is a six-server mail cluster.)
We've double checked the compile flags, and I've gone through all the
libc calls I can think of. I've also linked my own debugging into the
libc_r close function, and it's not showing ANY closes occurring
between the fopen() and the fclose() that fails.
We've also checked the flags in the FILE structure (FILE *f); they are
still correct, so it has not been munged by anything.
There are lots of conditions under which EBADF is returned by the
kernel, and I suspect one of them is not really a sign of a bad
file handle but means something else. I don't know of any way to find
out what is really occurring, or whether it is serious or just a faulty
return code.
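
One possible way to narrow this down, sketched here under assumptions:
probe the descriptor right before the fclose(). fcntl(fd, F_GETFD)
fails with EBADF only when fd is not an open descriptor, so if the
probe succeeds and the close still fails with EBADF, the error is
probably coming from somewhere other than a stale descriptor. The
helper name fd_is_open() is made up for illustration, and in a threaded
program there is still a window between the probe and the close.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical probe: returns 1 if fd refers to an open descriptor,
 * 0 if the system reports EBADF for it, and -1 on any other error.
 */
static int
fd_is_open(int fd)
{
    if (fcntl(fd, F_GETFD) != -1)
        return 1;
    return (errno == EBADF) ? 0 : -1;
}
```

Calling it as fd_is_open(fileno(f)) just before fclose(f) and logging
the result next to the fclose() return value would show whether the two
ever disagree.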
Doing a ktrace on this may be the only option, but the problem is that
the application is SO busy, and the failure so random, that it'd be
impossible to find if/when it happens; i.e., thousands and thousands of
files can be written successfully before we actually see a failed one.
Any help or guidance would be greatly appreciated.
TIA
--
Robert Blayzor, BOFH
INOC, LLC
rblayzor@inoc.net
PGP: http://www.inoc.net/~dev/
Key fingerprint = 1E02 DABE F989 BC03 3DF5 0E93 8D02 9D0B CB1A A7B0
Never underestimate the bandwidth of a station wagon full of tapes. -
Jackson