Erik Jacobson
2020-Jan-25 20:36 UTC
[Gluster-users] gluster NFS hang observed mounting or umounting at scale
Hello. I'm sorry this got long :(

So we are using gluster, as mentioned in posts several months ago, in some cluster management solutions. The solutions use a typical head node and a set of leader nodes to manage the cluster. Compute nodes, sometimes thousands of them, mount NFS roots from one of several leader nodes. Each leader node is a gluster server and a gluster NFS server. While we have some bugs to work out, I have not often had systems big enough to test with to report the issues to you and get help.

Recently, we extended our RO NFS solution to include RW NFS mounts. To make them fast, we create image files, put XFS filesystems on them, and mount them as loopback. Then we over-mount the writable paths like /etc, /var, and similar into this XFS-on-top-of-NFS image. It has been scaling great. However, when we started getting around 1,000 nodes using both a 3x1 (more common) and a 3x24 (some testing) gluster setup, some nodes get stuck. The problem shows up as "NFS server not responding" for the read-write mount point. It also shows up as a hang trying to mount the XFS image file (which resides on the NFS server).

I wrote some test scripts to try to duplicate this problem on a smaller cluster and had some success: one node showed the failure mode. I have 367 compute nodes, 3 leader/gluster/NFS servers, and one head node. In this example, /tmp/mnt1 is a read-write NFS mount that contains an XFS image:

  /tmp/mnt1/r41c4t3n3/rhel8.0/xfs-test.img

The scripts try to emulate what our cluster manager does: creating the sparse file, creating an XFS filesystem on it, and growing it (a rough sketch of the sequence is below). I ran them a few times and one node developed the problem and could not unmount /tmp/mnt1. It reported "NFS server not responding".
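For reference, the per-node sequence the scripts emulate is roughly the following. This is a sketch only; the server name, export path, mount point, and sizes here are illustrative guesses, and the exact commands are in the attached test scripts:

  # Rough sketch only -- the real commands live in the attached scripts.
  # "leader1", the export path, /tmp/imgmnt, and the sizes are assumptions.
  mkdir -p /tmp/mnt1
  mount -t nfs -o vers=3 leader1:/cm_shared/image/images_rw_nfs /tmp/mnt1

  IMG=/tmp/mnt1/r41c4t3n3/rhel8.0/xfs-test.img
  mkdir -p "$(dirname "$IMG")"
  truncate -s 10G "$IMG"           # create the sparse image file on the RW NFS mount
  mkfs.xfs "$IMG"                  # put an XFS filesystem on the image file

  mkdir -p /tmp/imgmnt             # hypothetical loopback mount point
  mount -o loop "$IMG" /tmp/imgmnt # loopback-mount the XFS image on top of NFS
  # Writable paths such as /etc and /var are then over-mounted from here.

  # Growing the image later is roughly:
  truncate -s 20G "$IMG" && xfs_growfs /tmp/imgmnt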
The gluster NFS log has this entry:

[2020-01-25 19:07:33.085806] E [MSGID: 109040] [dht-helper.c:1388:dht_migration_complete_check_task] 0-cm_shared-dht: 19bd72f0-6863-4f1d-80dc-a426db8670b8: failed to lookup the file on cm_shared-dht [Stale file handle]
[2020-01-25 19:07:33.085848] W [MSGID: 112199] [nfs3-helpers.c:3578:nfs3_log_commit_res] 0-nfs-nfsv3: /image/images_rw_nfs/r41c4t1n1/rhel8.0/xfs-test.img => (XID: cb501b58, COMMIT: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), wverf: 1579988225

Using a gluster mount, xfs-test.img does not seem to exist at that path (even though nothing should have deleted it). It does not show up in "ls" from the NFS client nor from a gluster mount. However, if you do something like this on the file from an NFS path:

  test -r /tmp/mnt1/r41c4t3n3/rhel8.0/xfs-test.img

you hang. The same is true from a different compute node/NFS client using the same path above (and the same gluster NFS server).

My workaround is to purge everything in the gluster shared storage .../r41c4t3n3/* and reboot the node (r41c4t3n3 in this case). It will then be re-created/resolved. However, it isn't desirable because the node's writable persistent storage is killed in the process (meaning data loss).

So this is an unconventional solution, but it has been working very well for us, and one solution will grow to 5,120 servers soon with this. Hoping you can help me work through this. I'm not an expert on gluster and gluster NFS internals, and I know I haven't sent enough information; I'm hoping you can tell me what would be helpful to provide.

I have added a compressed tarball attachment with some data. My nfs logs on all three servers are more than 1.3GB in size even though I set up the servers today, so I included a 10,000-line tail of one of them in the tarball (the gluster server acting as the NFS server for the test case that failed). I included all the glusterfs logs excluding nfs.log (all servers). I have the volume status and settings too. I also put in my test case so you can see how it worked to duplicate the problem the factory reported to us on the bigger system.

Gluster version info: glusterfs 4.1.6. I have one patch for STACK_UNWIND_STRICT/MDC_STACK_UNWIND that fixes an md-cache-statfs issue that was fixed in a newer version. I included the patch in the attachment.

Note: I am happy to try newer versions of gluster, but this one has been stable. I've tried newer versions in the past and lost stability at scale (and we don't have in-house systems the size customers have), so I tread carefully. Now is a good time to try newer versions if the community thinks it would help, as I may get one day with a 1,000-node system (possibly Monday). ganesha was not stable for this workload, so that's why I'm still using GNFS.

For the test case details:

- node-mount-rw-and-rw-image.sh:
  * helper script copied to all the test nodes/clients (367 nodes)
  * Mounts the RW NFS path to a temp location
  * Creates the sparse file and the XFS image on it, and mounts it (roughly the sequence sketched above)
- mount-rw-and-rw-image.sh:
  * runs on the head node
  * Copies the above test script to all 367 compute nodes
  * runs the test script in parallel
- umount-rw-and-rw-image.sh:
  * unmounts the test mounts
  * After running mount-rw-and-rw-image.sh and this script several times, one node of the 367 developed the problem.

We hit it with larger numbers on the 1,000-node systems in the factory that I can't debug on.

Erik

-------------- next part --------------
A non-text attachment was scrubbed...
Name: collection.tar.xz
Type: application/x-xz
Size: 447296 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200125/8942b4df/attachment-0001.xz>
Erik Jacobson
2020-Jan-26 00:51 UTC
[Gluster-users] gluster NFS hang observed mounting or umounting at scale
> The gluster NFS log has this entry:
> [2020-01-25 19:07:33.085806] E [MSGID: 109040] [dht-helper.c:1388:dht_migration_complete_check_task] 0-cm_shared-dht: 19bd72f0-6863-4f1d-80dc-a426db8670b8: failed to lookup the file on cm_shared-dht [Stale file handle]
> [2020-01-25 19:07:33.085848] W [MSGID: 112199] [nfs3-helpers.c:3578:nfs3_log_commit_res] 0-nfs-nfsv3: /image/images_rw_nfs/r41c4t1n1/rhel8.0/xfs-test.img => (XID: cb501b58, COMMIT: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), wverf: 1579988225

I've done more digging. I have access to an actual system that is failing (instead of my test case above). It appears to be the same issue, so that's good. (My access goes away in a couple of hours.)

The nodes don't hang at the mount, but rather at a check in the code for the existence of the image file. I'm not sure if the "Holes=1" message I share below is a problem or not; the file does indeed start sparse. Restarting 'glusterd' on the problem server allows the node to boot. However, the problem image file does seem to disappear from the face of the earth as far as I can tell (it doesn't exist in the gluster mount at the same path).

Searching for all messages in nfs.log related to r17c3t6n3 (the problem node with the problem xfs.img file), I see:

[root@leader1 glusterfs]# grep r17c3t6n3 nfs.log
[2020-01-24 12:29:42.412019] W [MSGID: 112199] [nfs3-helpers.c:3494:nfs3_log_write_res] 0-nfs-nfsv3: /image/images_rw_nfs/r17c3t6n3/rhel8.0/xfs.img => (XID: ca68a5fc, WRITE: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), count: 0, STABLE,wverf: 1579664973
[2020-01-25 04:57:10.199988] W [MSGID: 112199] [nfs3-helpers.c:3494:nfs3_log_write_res] 0-nfs-nfsv3: /image/images_rw_nfs/r17c3t6n3/rhel8.0/xfs.img => (XID: 1ec43ce0, WRITE: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), count: 0, STABLE,wverf: 1579664973 [Invalid argument]
[2020-01-25 04:57:10.200431] W [MSGID: 112199] [nfs3-helpers.c:3494:nfs3_log_write_res] 0-nfs-nfsv3: /image/images_rw_nfs/r17c3t6n3/rhel8.0/xfs.img => (XID: 20c43ce0, WRITE: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), count: 0, STABLE,wverf: 1579664973
[2020-01-25 04:57:10.200695] W [MSGID: 112199] [nfs3-helpers.c:3494:nfs3_log_write_res] 0-nfs-nfsv3: /image/images_rw_nfs/r17c3t6n3/rhel8.0/xfs.img => (XID: 21c43ce0, WRITE: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), count: 0, STABLE,wverf: 1579664973
[2020-01-25 04:57:10.200827] W [MSGID: 112199] [nfs3-helpers.c:3494:nfs3_log_write_res] 0-nfs-nfsv3: /image/images_rw_nfs/r17c3t6n3/rhel8.0/xfs.img => (XID: 1fc43ce0, WRITE: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), count: 0, STABLE,wverf: 1579664973
[2020-01-25 04:57:10.201808] W [MSGID: 112199] [nfs3-helpers.c:3494:nfs3_log_write_res] 0-nfs-nfsv3: /image/images_rw_nfs/r17c3t6n3/rhel8.0/xfs.img => (XID: 22c43ce0, WRITE: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), count: 0, STABLE,wverf: 1579664973 [Invalid argument]
[2020-01-25 23:32:09.629807] I [MSGID: 109063] [dht-layout.c:693:dht_layout_normalize] 0-cm_shared-dht: Found anomalies in /image/images_rw_nfs/r17c3t6n3/rhel8.0 (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0
[2020-01-26 02:42:33.712684] W [MSGID: 112199] [nfs3-helpers.c:3494:nfs3_log_write_res] 0-nfs-nfsv3: /image/images_rw_nfs/r17c3t6n3/rhel8.0/xfs.img => (XID: a0ca8fc3, WRITE: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), count: 0, STABLE,wverf: 1579664973

r17c3t4n1 is another case:

[2020-01-25 23:19:46.729427] I [MSGID: 109063] [dht-layout.c:693:dht_layout_normalize] 0-cm_shared-dht: Found anomalies in /image/images_rw_nfs/r17c3t4n1/rhel8.0 (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0
[2020-01-26 02:42:43.907163] W [MSGID: 112199] [nfs3-helpers.c:3494:nfs3_log_write_res] 0-nfs-nfsv3: /image/images_rw_nfs/r17c3t4n1/rhel8.0/xfs.img => (XID: a77e7e7d, WRITE: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), count: 0, STABLE,wverf: 1579664973
[2020-01-26 02:42:43.908045] W [MSGID: 112199] [nfs3-helpers.c:3494:nfs3_log_write_res] 0-nfs-nfsv3: /image/images_rw_nfs/r17c3t4n1/rhel8.0/xfs.img => (XID: a87e7e7d, WRITE: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), count: 0, STABLE,wverf: 1579664973
[2020-01-26 02:42:43.908194] W [MSGID: 112199] [nfs3-helpers.c:3494:nfs3_log_write_res] 0-nfs-nfsv3: /image/images_rw_nfs/r17c3t4n1/rhel8.0/xfs.img => (XID: a67e7e7d, WRITE: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), count: 0, STABLE,wverf: 1579664973
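For completeness, the glusterd restart mentioned above is just the stock service restart on the leader that is acting as the gNFS server for the node, followed by a check that the volume's NFS server came back; the unit and CLI names below assume a standard packaged install:

  # On the problem leader/gNFS server; unit name assumes a standard gluster package
  systemctl restart glusterd
  # Confirm the gluster NFS server for the volume is back online
  gluster volume status cm_shared nfs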