Brett Randall
2016-Mar-10 07:18 UTC
[Gluster-users] Gluster (3.6.3) NFS READDIR failing intermittently from Finder on Mac OS X (10.10 and 10.11)
Hi all,

I have a problem which is doing my head in.

We are running Gluster 3.6.3 with the in-built NFS server, across 8 servers. We share our volume out with SMB, AFP and Gluster's NFS server.

In most cases, NFS works fine. Everything is visible and accessible from the terminal. But from Finder on our Macs, we are having a consistent problem.

Firstly, we are mounting the share from the command line:

    $ mount -t nfs -o rw,intr,nolock,tcp 10.0.19.31:/glusvol ./glusvol

We then open Finder and traverse to the folder in question (about 7 levels deep). I see about 20-30 items, but I know there are 100+ items in there. This is the case on multiple folders. If I open a terminal, go to that folder, and create a new empty file, the folder refreshes in Finder and I can see everything. However, unmount and remount and everything is gone again (although sometimes it displays all files for a few seconds before most of them disappear). I've repeated this on three different Macs of varying origin and OS version.

I've started Wireshark on my Mac and monitored what is happening. It appears that there is an initial NFS READDIR Call to the NFS server with cookie set to 0. The READDIR Reply contains the filename of every file in the folder. Then there is another READDIR Call with cookie set to 4096, which happens to be the last cookie listed in the previous reply. Curiously, the reply to this call lists all the files that I *cannot* see in Finder, but doesn't include the ones I can see. Then there are a whole lot of LOOKUP Calls while it looks at all the files that I *can* see. Then it stops at the 24th file, the last file I can see in Finder. It then issues another READDIR Call with a cookie of 680. The Reply is "NFS3ERR_BAD_COOKIE". Looking through the previous replies, the only time that cookie was issued was in the FIRST reply. And again, the file in question with that cookie number is the LAST file that I can see in Finder.

Surely, Finder cannot be THIS broken? I can see all files in that folder fine when I mount via AFP or SMB but not via NFS. But it all works fine from Terminal. We're experimenting with updating Gluster to 3.7.8 and moving to NFS Ganesha in the hope that moving to NFSv4 fixes it, but does anyone have any idea what's happening? I'm happy to send the .pcapng file to someone if it's helpful.

I also have a .pcapng of when we create a file in the folder and Finder refreshes to show everything in there. The only interesting thing that I noticed in that file is that the cookie number at the end of the READDIR is much larger than anything I was seeing in the failed listings (17179869176). I tried forcing 32-bit inode sizes in Gluster NFS options (the closest thing I could find to NFS's native 32-bit cookie size restriction), just in case that was part of it. It wouldn't make sense, but I tried anyway, and it made no difference.

Thanks in advance.

Brett
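For readers unfamiliar with how READDIR continuation works, the exchange described above can be approximated with a toy model. This is only a sketch under stated assumptions: `ToyDirectory`, `list_all` and the cookie values are hypothetical, and the code is neither Gluster's NFS server nor Finder's client logic. It shows the general idea only: the client passes cookie 0 first, then resumes from the cookie of the last entry it received, and the server answers NFS3ERR_BAD_COOKIE when it does not recognise the cookie. A real NFSv3 server may also reject a cookie for other reasons, for example a cookie verifier mismatch, which this model ignores.

```python
from dataclasses import dataclass

NFS3_OK = "NFS3_OK"
NFS3ERR_BAD_COOKIE = "NFS3ERR_BAD_COOKIE"


@dataclass
class ToyDirectory:
    """Hypothetical server-side directory: (name, cookie) pairs in listing order."""
    entries: list

    def readdir(self, cookie, max_entries):
        """Return (status, entries, eof) for one READDIR-style call."""
        if cookie == 0:
            start = 0  # cookie 0 means "start from the beginning"
        else:
            positions = [i for i, (_, c) in enumerate(self.entries) if c == cookie]
            if not positions:
                # The server does not recognise this cookie.
                return NFS3ERR_BAD_COOKIE, [], False
            start = positions[0] + 1  # resume after the entry that owns the cookie
        chunk = self.entries[start:start + max_entries]
        eof = start + len(chunk) >= len(self.entries)
        return NFS3_OK, chunk, eof


def list_all(directory, max_entries):
    """Client loop: keep calling readdir with the last cookie seen until eof."""
    names, cookie = [], 0
    while True:
        status, chunk, eof = directory.readdir(cookie, max_entries)
        if status != NFS3_OK:
            raise RuntimeError(status)
        names.extend(name for name, _ in chunk)
        if eof or not chunk:
            return names
        cookie = chunk[-1][1]  # cookie of the last entry in this reply


# Hypothetical directory with 100 files and made-up cookie values.
toy = ToyDirectory([(f"file{i}", 8 * (i + 1)) for i in range(100)])
print(len(list_all(toy, 24)))      # all 100 names arrive over several calls
print(toy.readdir(12345, 24)[0])   # a cookie the toy server never issued
```

In this model a client that always resumes from the most recent reply lists the whole directory. A client that resumes from a cookie the server no longer accepts gets an error part-way through, which would look like a truncated listing.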
Niels de Vos
2016-Mar-10 08:55 UTC
[Gluster-users] Gluster (3.6.3) NFS READDIR failing intermittently from Finder on Mac OS X (10.10 and 10.11)
On Thu, Mar 10, 2016 at 06:18:44PM +1100, Brett Randall wrote:
> Hi all
>
> I have a problem which is doing my head in.
>
> We are running Gluster 3.6.3 with the in-built NFS server, across 8 servers. We share our volume out with SMB, AFP and Gluster's NFS server.
>
> In most cases, NFS works fine. Everything is visible and accessible from the terminal. But from Finder on our Macs, we are having a consistent problem.
>
> Firstly, we are mounting the share from the command line:
>
> $ mount -t nfs -o rw,intr,nolock,tcp 10.0.19.31:/glusvol ./glusvol
>
> We then open Finder and traverse to the folder in question (about 7 levels deep). I see about 20-30 items, but I know there are 100+ items in there. This is the case on multiple folders. If I open a terminal, go to that folder, and create a new empty file, the folder refreshes in Finder and I can see everything. However, dismount and remount and everything is gone again (although sometimes it displays all files for a few seconds before most of them disappear). I've repeated this on three different Macs of varying origin and OS version.
>
> I've started Wireshark on my Mac and monitored what is happening. It appears that there is an initial NFS READDIR Call to the NFS server with cookie set to 0. The READDIR Reply contains the filename of every file in the folder. Then there is another READDIR call with cookie set to 4096, which happens to be the last cookie listed in the previous reply. Curiously, the reply to this call lists all the files that I *cannot* see in Finder. But doesn't include the ones I can see. Then there are a whole lot of LOOKUP Calls while it looks at all the files that I *can* see. Then it stops at the 24th file, the last file I can see in Finder. It then issues another READDIR Call with a Cookie of 680. The Reply is "NFS3ERR_BAD_COOKIE". Looking through the previous replies, the only time that cookie was issued was in the FIRST reply. And again, the file in question with that cookie number is the LAST file that I can see in Finder.
>
> Surely, Finder cannot be THIS broken? I can see all files in that folder fine when I mount via AFP or SMB but not via NFS. But it all works fine from Terminal. We're experimenting with updating Gluster to 3.7.8 and moving to NFS Ganesha in the hope that moving to NFSv4 fixes it, but does anyone have any idea what's happening? I'm happy to send the .pcapng file to someone if it's helpful. I also have a .pcapng of when we create a file in the folder and Finder refreshes to show everything in there. The only interesting thing that I noticed in that file is that the cookie number at the end of the READDIR is much larger than anything I was seeing in the failed listings (17179869176). I tried forcing 32-bit inode sizes in Gluster NFS options (the closest thing I could find to NFS's native 32-bit cookie size restriction) with no joy, just in case that was part of it, which wouldn't make sense but tried anyway and no difference.

It is possible that Finder does not follow the NFSv3 specification correctly. I have seen that some other OS's expect the cookie or inode to be 32-bit. This is the case for most filesystems, but Gluster uses 64-bit values. A subsequent READDIR(P) would use a partial cookie for continuation, and that can result in very strange behaviour.

Only exposing 32-bit inodes over Gluster/NFS might be the solution for you. You can enable this with

    # gluster volume set ${VOLUME} nfs.enable-ino32 on

Unmount and re-mount the NFS-export after changing this option.

It is possible that the NFSv4 client on Mac OS X handles things better, but it could have the same issues too.

HTH,
Niels
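The "partial cookie" effect mentioned in the reply can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, a client that keeps only the low 32 bits of each cookie. Whether Finder actually does this is not established in the thread. The function names are hypothetical, and the three cookie values are the ones quoted from the packet captures.

```python
MASK32 = 0xFFFFFFFF  # low 32 bits


def truncate_to_32_bits(cookie):
    """What a client storing cookies in a 32-bit field would keep (assumption)."""
    return cookie & MASK32


def survives_32_bit_client(cookie):
    """True if the cookie comes back unchanged after 32-bit truncation."""
    return truncate_to_32_bits(cookie) == cookie


# Cookie values reported from the captures in this thread.
for cookie in (680, 4096, 17179869176):
    print(cookie, hex(cookie), truncate_to_32_bits(cookie),
          survives_32_bit_client(cookie))
```

The small cookies 680 and 4096 fit in 32 bits and pass through unchanged. 17179869176 is 0x3FFFFFFF8, so a 32-bit client would send back 0xFFFFFFF8 (4294967288), a value the server never issued.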