Hello all,

in the setup I am trying to build, I want snapshots of a file system
replicated from host "replsource" to host "repltarget", and from there
NFS-mounted on host "nfsclient" so that the snapshots can be accessed
directly:

replsource# zfs create pool1/nfsw
replsource# mkdir /pool1/nfsw/lala
replsource# zfs snapshot pool1/nfsw@snap1
replsource# zfs send pool1/nfsw@snap1 | \
              ssh repltarget zfs receive -d pool1

(A "pool1" exists on repltarget as well.)

repltarget# zfs set sharenfs=ro=nfsclient pool1/nfsw

nfsclient# mount repltarget:/pool1/nfsw/.zfs/snapshot /mnt/nfsw/
nfsclient# cd /mnt/nfsw/snap1
nfsclient# access ./lala
access("./lala", R_OK | X_OK) == 0

So far, so good. But now I see the following:

(Wait a bit, for instance 3 minutes, then replicate another snapshot.)

replsource# zfs snapshot pool1/nfsw@snap2
replsource# zfs send -i pool1/nfsw@snap1 pool1/nfsw@snap2 | \
              ssh repltarget zfs receive pool1/nfsw

(The PWD of the shell on nfsclient is still /mnt/nfsw/snap1.)

nfsclient# access ./lala
access("./lala", R_OK | X_OK) == -1

(If you think that is surprising, watch this:)

nfsclient# ls /mnt/nfsw
snap1  snap2
nfsclient# access ./lala
access("./lala", R_OK | X_OK) == 0

The "access" program does exactly the access(2) call illustrated in its
output.

The weird thing is that a directory can be accessed, then cannot be
accessed after the exported file system on repltarget has been updated
by a zfs recv, and then can be accessed again after an ls of the
mounted directory.

In a snoop I see that, when the access(2) fails, the nfsclient gets a
"Stale NFS file handle" response, which gets translated to an ENOENT.

My problem is that the application accessing the contents of the
NFS-mounted snapshot cannot find the content any more after the file
system on repltarget has been updated.

Is this a known problem? More importantly, is there a known workaround?

All machines are running SunOS 5.10 Generic_127128-11 i86pc. If more
information could be helpful, I'll gladly provide it.

Regards, Juergen.
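
P.S. In case anyone wants to reproduce the snoop observation: a capture
along the following lines should show the stale file handle reply (a
sketch only; the interface name e1000g0 and the capture file path are
placeholders, substitute your own):

nfsclient# snoop -d e1000g0 -o /var/tmp/nfsw.cap host repltarget
(in a second shell, repeat the failing check)
nfsclient# cd /mnt/nfsw/snap1 && access ./lala
(stop the capture, then decode the NFS traffic and look for the stale handle)
nfsclient# snoop -i /var/tmp/nfsw.cap -V rpc nfs | grep -i stale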
Jürgen,

> In a snoop I see that, when the access(2) fails, the nfsclient gets
> a "Stale NFS file handle" response, which gets translated to an
> ENOENT.

What happens if you use the noac NFS mount option on the client? I'd
not recommend using it in production environments unless you really
need to, but this looks like an NFS client caching issue.

Is this an NFSv3 or NFSv4 mount? What happens if you use one or the
other?

Please provide nfsstat -m output.

Nils
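
P.S. Something along these lines is what I mean (a sketch only; paths
are taken from your commands, and make sure no shell still has its
working directory inside the mount before unmounting):

nfsclient# umount /mnt/nfsw
nfsclient# mount -o vers=3,noac repltarget:/pool1/nfsw/.zfs/snapshot /mnt/nfsw
nfsclient# nfsstat -m /mnt/nfsw
(then repeat the incremental send/receive and re-run the access check)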
(Haven't I already written an answer to this? Anyway, I cannot find it.)

Nils Goroll <slink@schokola.de> writes:

>> In a snoop I see that, when the access(2) fails, the nfsclient gets
>> a "Stale NFS file handle" response, which gets translated to an
>> ENOENT.
>
> What happens if you use the noac NFS mount option on the client?

No change. (I'll skip your other questions, because:)

In the meantime a colleague of mine has found the apparent root cause
of the problem. The zfs man page reads, under "zfs receive":

    If an incremental stream is received, then the destination file
    system must already exist, and its most recent snapshot must match
    the incremental stream's source. The destination file system is
    unmounted and cannot be accessed during the receive operation.

I still think there might be an NFS issue involved, as in my
understanding a temporary unmount should not affect the NFS mount much
if a server reboot doesn't. But the exported file system being
unmounted in between makes this behaviour much more plausible, and it
leaves us with little hope that this might be resolved very soon.

Mounting the file system directly from the primary source is a feasible
workaround, so the problem is no longer an issue for me at the moment.

Of course, thanks for your help anyway!

Regards, Juergen.
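
P.S. For anyone else hitting this, a sketch of that workaround
(assuming the same sharenfs setting is acceptable on replsource):

replsource# zfs set sharenfs=ro=nfsclient pool1/nfsw

nfsclient# umount /mnt/nfsw
nfsclient# mount replsource:/pool1/nfsw/.zfs/snapshot /mnt/nfsw
(the snapshots are now read on the host where they are created, so the
exported file system is never unmounted by a zfs receive)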
(I found the saved draft of the answer I thought I had sent; I send it
just for completeness's sake.)

------------------------

Nils Goroll <slink@schokola.de> writes:

> What happens if you use the noac NFS mount option on the client?

That does not seem to change the behaviour. (I have not tried it with
this test setup, but it happens with noac in the "real" scenario, too.)

> I'd not recommend using it in production environments unless you
> really need to, but this looks like an NFS client caching issue.

The "real" scenario is indeed a production environment, but with very
low traffic, so we thought noac could be an option.

> Is this an NFSv3 or NFSv4 mount? What happens if you use one or the
> other?

This is v3; I have not tried v4 yet. We don't have v4 in use for
reasons I don't know (but which I am sure exist and are valid for our
environment).

> Please provide nfsstat -m output.

/mnt/nfsw from repltarget:/pool1/nfsw/.zfs/snapshot
 Flags: vers=3,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
 Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

------------------------

As mentioned in the previous post, the problem is no longer an issue
for me. Still, I'd be curious to hear more about it if something turns
up.

Regards, Juergen.
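
P.S. If we ever get around to the v3/v4 comparison, I would expect the
test to look like this (untested on our side; it assumes NFSv4 is
enabled on repltarget and that the snapshot directory can be mounted
the same way under v4):

nfsclient# umount /mnt/nfsw
nfsclient# mount -o vers=4 repltarget:/pool1/nfsw/.zfs/snapshot /mnt/nfsw
nfsclient# nfsstat -m /mnt/nfsw
(repeat the incremental send/receive on replsource/repltarget, then)
nfsclient# cd /mnt/nfsw/snap1 && access ./lala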