David Wolfskill
2008-Nov-25 18:16 UTC
bsdtar vs. NFS: Couldn't visit directory: No such file or directory
Running an 8-core RELENG_7_1/i386 system (updated this morning), trying
to tar up a directory hierarchy rooted at a directory named "sb2" in a
file system that is NFS-mounted (exported from a NetApp Filer); I have
the following logged:

@ 1227662967 [Tue Nov 25 17:29:27 2008] Starting "tar zcpf sb2.tgz sb2" in /homes/dwolf/bspace
tar: sb2/src/vendor/berkeley-db/os/CVS: Couldn't visit directory: No such file or directory
tar: sb2/src/vendor/berkeley-db/mutex: Couldn't visit directory: No such file or directory
...
tar: sb2/src/bsd/lib/libgdchart: Couldn't visit directory: No such file or directory
tar: sb2/src/bsd/lib/libgd: Couldn't visit directory: No such file or directory
@ 1227665194 [Tue Nov 25 18:06:34 2008] Ending "tar zcpf sb2.tgz sb2" in /homes/dwolf/bspace

(I elided a couple dozen or so of the whines.)

I looked from a different NFS client host and saw each of the allegedly
nonexistent files or directories for which I cared to look.

I then see that tar(1) took 1924.05 seconds to do this, and exited with
a status code of 0.  (I ran it under the auspices of /usr/bin/time.)

Now, the reason I was doing this was to make a pristine archive of that
hierarchy, so after I did some things in the hierarchy, I could blow it
away, restore from the pristine archive, and repeat the performance: the
intent is to be able to get reproducible results (both timing and
output) from several repetitions of a several-hour-long process.

The script I cobbled up to do the work checks the status code when
tar(1) completes, and terminates the process if it sees that there was a
non-zero status code at that point (among others).

Since tar(1) is exiting with a status code of 0, the script has no way
to tell that something went (dreadfully) wrong in trying to create the
archive, and blithely carries on... which is doomed to failure.

Some questions:

* Is it both intentional and appropriate for tar(1) to exit with a
  status code of 0 in this circumstance?  The code that issues the
  whine is in write.c, around lines 662-663 in rev. 1.63.2.10.

* It may be argued that telling tar(1) to go look in a file or
  directory, then claiming that it doesn't exist, is rather bad form;
  I certainly wouldn't disagree, yet I don't know what I can do to
  prevent it.  I'm certain that it's not a case of some process on
  some other NFS client modifying that directory hierarchy during the
  tar(1) run.  Is there anything that may be done to prevent it?  Is
  there something broken in FreeBSD's NFS client implementation as
  of RELENG_7_1 that might be causing this?  Perhaps it is an artifact
  of some sort of caching?

* Does it matter that the NFS mount is being "managed" by amd(8)?

* Am I using tar(1) appropriately?  Is there some other tool (e.g.
  cpio(1)) that might have more appropriate behavior for the intended
  usage?

* Might it help to defer the compression to a point subsequent to the
  creation of the archive proper?

Thanks....

Peace,
david
-- 
David H. Wolfskill				david@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
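
Until the exit status can be trusted, one possible stopgap for a script like the one described above is to treat any output on stderr as a failure as well. This is only a minimal sketch, assuming POSIX sh and mktemp(1); the function name run_strict is hypothetical and is not something tar(1) provides.

```sh
#!/bin/sh
# run_strict CMD [ARGS...]   (hypothetical helper, not part of tar)
# Run CMD, pass its diagnostics through, and report failure if either
# the exit status is non-zero or anything at all was written to stderr.
run_strict() {
    errlog=$(mktemp "${TMPDIR:-/tmp}/run_strict.XXXXXX") || return 2
    "$@" 2>"$errlog"
    rc=$?
    # Replay the captured diagnostics so they still reach the log.
    cat "$errlog" >&2
    if [ "$rc" -eq 0 ] && [ -s "$errlog" ]; then
        rc=1    # the command claimed success but complained anyway
    fi
    rm -f "$errlog"
    return "$rc"
}

# Example use in a driver script:
#   run_strict tar zcf sb2.tgz sb2 || exit 1
```

The obvious limitation is that this misfires for any command that writes progress or verbose listings to stderr, so it only suits invocations that are normally silent.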
David Wolfskill
2008-Nov-26 11:04 UTC
bsdtar vs. NFS: Couldn't visit directory: No such file or directory
On Wed, Nov 26, 2008 at 10:49:24AM -0800, Tim Kientzle wrote:
> ...
> >I then see that tar(1) took 1924.05 seconds to do this, and exited with
> >a status code of 0.  (I ran it under the auspices of /usr/bin/time.)
>
> I agree that this does seem wrong.

Thank you: I managed to acquire a cold or some such thing, so nothing
between my ears is working right, and I was wondering if I'd managed to
completely lose track of reality, there.... :-}

> Since you explicitly called out the time required for the
> operation, did you have any concerns about the performance?

Probably, but the first order of business would seem to be a matter of
ensuring proper operation.  That done, I expect that NFS performance
(vs. that of tar(1)) will be a gating factor -- but also fully expect to
measure & report. :-}

> >* Is it both intentional and appropriate for tar(1) to exit with a
> >  status code of 0 in this circumstance?  The code that issues the
> >  whine is in write.c, around lines 662-663 in rev. 1.63.2.10.
>
> As you pointed out, automated scripts need to be able
> to trust the exit code to know whether everything
> went okay.  Based on that, I would agree this is inappropriate,
> though perhaps someone has an argument to the contrary.
> I'll take a closer look.

Excellent; thank you!

> ...
> >* Am I using tar(1) appropriately?  Is there some other tool (e.g.
> >  cpio(1)) that might have more appropriate behavior for the intended
> >  usage?
>
> tar(1) seems appropriate here.

Good; I have been using it for similar things rather longer than I
really want to think about. :-}

> >* Might it help to defer the compression to a point subsequent to the
> >  creation of the archive proper?
>
> That should have no effect.

That's what I thought, but I'm sure you're familiar with the expression
"grasping at straws."  And I'm confident that you're far more familiar
with tar(1)'s internal workings than I ever will be. :-)

> Only odd thing I see in your usage is that the 'p' modifier
> has no effect when used with 'c'.  (bsdtar always records
> everything it can when creating the archive, limited only by
> what the underlying format can represent.)

OK -- but that ought not be harmful, yes?

> If you can reproduce this on a smaller test case, I think
> some of the folks working on NFS support might find detailed
> tcpdump output to be interesting reading.

I'll see what I can do; such details of the case that catalyzed this
thread would certainly not be appropriate for public disclosure.

I will, of course, be happy to test. :-}

Thank you very much, Tim!

Peace,
david
Tim Kientzle
2008-Nov-26 11:07 UTC
bsdtar vs. NFS: Couldn't visit directory: No such file or directory
David Wolfskill wrote:
> Running an 8-core RELENG_7_1/i386 system (updated this morning), trying
> to tar up a directory hierarchy rooted at a directory named "sb2" in a
> file system that is NFS-mounted (exported from a NetApp Filer); I have
> the following logged:
>
> @ 1227662967 [Tue Nov 25 17:29:27 2008] Starting "tar zcpf sb2.tgz sb2" in /homes/dwolf/bspace
> tar: sb2/src/vendor/berkeley-db/os/CVS: Couldn't visit directory: No such file or directory
> tar: sb2/src/vendor/berkeley-db/mutex: Couldn't visit directory: No such file or directory
> ...
>
> I then see that tar(1) took 1924.05 seconds to do this, and exited with
> a status code of 0.  (I ran it under the auspices of /usr/bin/time.)

I agree that this does seem wrong.

Since you explicitly called out the time required for the
operation, did you have any concerns about the performance?

> * Is it both intentional and appropriate for tar(1) to exit with a
>   status code of 0 in this circumstance?  The code that issues the
>   whine is in write.c, around lines 662-663 in rev. 1.63.2.10.

As you pointed out, automated scripts need to be able
to trust the exit code to know whether everything
went okay.  Based on that, I would agree this is inappropriate,
though perhaps someone has an argument to the contrary.
I'll take a closer look.

> * It may be argued that telling tar(1) to go look in a file or
>   directory, then claiming that it doesn't exist, is rather bad form;
>   I certainly wouldn't disagree, yet I don't know what I can do to
>   prevent it.  I'm certain that it's not a case of some process on
>   some other NFS client modifying that directory hierarchy during the
>   tar(1) run.  Is there anything that may be done to prevent it?  Is
>   there something broken in FreeBSD's NFS client implementation as
>   of RELENG_7_1 that might be causing this?  Perhaps it is an artifact
>   of some sort of caching?
>
> * Does it matter that the NFS mount is being "managed" by amd(8)?

No idea about the underlying cause.  Hopefully someone else
can chime in about whether there is some known NFS issue
that may be at work here.

> * Am I using tar(1) appropriately?  Is there some other tool (e.g.
>   cpio(1)) that might have more appropriate behavior for the intended
>   usage?

tar(1) seems appropriate here.

> * Might it help to defer the compression to a point subsequent to the
>   creation of the archive proper?

That should have no effect.

Only odd thing I see in your usage is that the 'p' modifier
has no effect when used with 'c'.  (bsdtar always records
everything it can when creating the archive, limited only by
what the underlying format can represent.)

If you can reproduce this on a smaller test case, I think
some of the folks working on NFS support might find detailed
tcpdump output to be interesting reading.

Tim
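
For anyone who still wants to rule compression out while chasing this, the archive can be written uncompressed first and compressed as a separate step, so that each step's exit status is checked on its own. This is a rough sketch only, assuming POSIX sh with tar(1) and gzip(1) on the path; the function name archive_then_compress is hypothetical, and per the reply above it should not change what tar sees over NFS.

```sh
#!/bin/sh
# archive_then_compress DIR OUT.tgz   (hypothetical helper)
# Two-step equivalent of "tar zcf OUT.tgz DIR": write a plain tar
# archive first, then gzip it, checking each step separately.
archive_then_compress() {
    src=$1
    out=$2
    tmp="${out%.tgz}.tar"    # intermediate uncompressed archive
    # Step 1: build the uncompressed archive; stop if tar reports failure.
    if ! tar cf "$tmp" "$src"; then
        rm -f "$tmp"
        return 1
    fi
    # Step 2: compress it; drop partial output if gzip fails.
    if ! gzip -c "$tmp" > "$out"; then
        rm -f "$tmp" "$out"
        return 1
    fi
    rm -f "$tmp"
}

# Example: archive_then_compress sb2 sb2.tgz || exit 1
```

Note that this needs room for the uncompressed archive on disk, which may matter for a large hierarchy.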
Tim Kientzle
2008-Nov-26 11:37 UTC
bsdtar vs. NFS: Couldn't visit directory: No such file or directory
David Wolfskill wrote:
> Running an 8-core RELENG_7_1/i386 system (updated this morning), trying
> to tar up a directory hierarchy rooted at a directory named "sb2" in a
> file system that is NFS-mounted (exported from a NetApp Filer); I have
> the following logged:
>
> @ 1227662967 [Tue Nov 25 17:29:27 2008] Starting "tar zcpf sb2.tgz sb2" in /homes/dwolf/bspace
> tar: sb2/src/vendor/berkeley-db/os/CVS: Couldn't visit directory: No such file or directory
> ...

> * It may be argued that telling tar(1) to go look in a file or
>   directory, then claiming that it doesn't exist, is rather bad form;

Couple of quick notes about what tar is seeing here that
may help diagnose the NFS issue.

First, tar(1) has its own directory-traversal code.
Here's how that code works:
  * Tar reads all of the elements of the 'os' directory.
  * As it reads, it calls lstat() on each one.  (stat() would
    have been used if you'd requested a -L logical traversal.)
  * Any that are directories ('CVS' in this case) are put
    onto a work queue for later attention.
  * At some later point, the 'CVS' entry is popped from the
    work queue and tar invokes chdir("CVS") to visit that directory.
  * It then uses opendir(".") to read entries from that directory.

A few key points:
  1) An lstat() of the 'CVS' directory here succeeded.
  2) Some time may have elapsed between when the lstat() was
     invoked (and clearly succeeded, else 'CVS' wouldn't have
     been identified as a directory and put onto the work queue)
     and when the chdir() call was made.  That may be interacting
     badly with some underlying cache expiration or network
     variability.
  3) The error here is being triggered by a failed chdir()
     system call.

I hope this helps someone to understand the NFS issue that
you're seeing.  Meanwhile, I'll look at the tar issue.

Thanks very much for reporting this.  This sort of problem
is extremely hard to test for and doesn't get reported very
often when it does occur.

Hmmmmm.... I just noticed that there's currently no error
handling if a chdir("..") operation fails when you try to
ascend back to a parent directory.  I should probably fix
that, too, while I'm in here.

Tim
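
The queue-then-visit pattern described above can be imitated in a few lines of shell, which may be handy for poking at the NFS mount without involving tar at all. This is a loose sketch and not bsdtar's actual code: the function name walk is hypothetical, it uses test(1) and cd in a subshell rather than lstat() and chdir(), it addresses directories by path from the starting directory instead of changing into them, and it assumes file names contain no newlines. What it does share with the description is the gap between noticing a directory and visiting it, and it returns a non-zero status if any visit fails.

```sh
#!/bin/sh
# walk ROOT   (hypothetical helper, not bsdtar's implementation)
# Breadth-first walk that separates "notice a directory" from
# "visit it later".  Prints each directory visited and returns
# non-zero if any queued directory could not be entered.
walk() {
    queue=$(mktemp "${TMPDIR:-/tmp}/walk.XXXXXX") || return 2
    failed=0
    printf '%s\n' "$1" > "$queue"
    lineno=1
    # The queue file grows while it is being read, one line per pass.
    while dir=$(sed -n "${lineno}p" "$queue") && [ -n "$dir" ]; do
        lineno=$((lineno + 1))
        # Deferred visit: the step that failed in the report above.
        if ! ( cd "$dir" 2>/dev/null ); then
            echo "walk: $dir: Couldn't visit directory" >&2
            failed=1
            continue
        fi
        echo "$dir"
        # Examine each entry without following symlinks and queue
        # the real directories for later attention.
        for entry in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
            if [ -d "$entry" ] && [ ! -L "$entry" ]; then
                printf '%s\n' "$entry" >> "$queue"
            fi
        done
    done
    rm -f "$queue"
    return "$failed"
}

# Example: walk sb2 > /dev/null || echo "traversal had errors" >&2
```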