Douglas Colkitt
2013-Feb-18 19:29 UTC
[Gluster-users] Directory metadata inconsistencies and missing output ("mismatched layout" and "no dentry for inode" error)
Hi, I'm running into a rather strange and frustrating bug and wondering if anyone on the mailing list might have some insight into what might be causing it. I'm running a cluster of two dozen nodes, where the processing nodes are also the gluster bricks (using the SLURM resource manager). Each node has the gluster volume mounted natively (FUSE, not NFS). All nodes are running v3.2.7. Each job on a node runs a shell script (runGroupGen) like so:

    containerDir=$1
    groupNum=$2
    mkdir -p $containerDir
    ./generateGroupGen.py $groupNum >$containerDir/$groupNum.out

I then run the following jobs:

    runGroupGen [glusterDirectory] 1
    runGroupGen [glusterDirectory] 2
    runGroupGen [glusterDirectory] 3
    ...

Typically about 200 jobs launch within milliseconds of each other, so the glusterfs/FUSE mount receives a large number of near-simultaneous directory-create and file-create calls in a very short window.

Some of the output files inside the directory exist but contain no output. When this occurs it is always the case that either all jobs on a node behave normally or all of them fail to produce output. It should be noted that the processes themselves generate no error messages, and all processes on a no-output node exit without an error code. In that sense the failure is silent, but it corrupts the data, which is dangerous. The only indication of a problem is a set of errors (on the no-output nodes) in /var/log/distrib-glusterfs.log of the form:

    [2013-02-18 05:55:31.382279] E [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-volume1-client-16: remote operation failed: Stale NFS file handle
    [2013-02-18 05:55:31.382302] E [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-volume1-client-17: remote operation failed: Stale NFS file handle
    [2013-02-18 05:55:31.382327] E [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-volume1-client-18: remote operation failed: Stale NFS file handle
    [2013-02-18 05:55:31.640791] W [inode.c:1044:inode_path] (-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(+0xe8fd) [0x7fa8341868fd] (-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(+0xa6bb) [0x7fa8341826bb] (-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(fuse_loc_fill+0x1c6) [0x7fa83417d156]))) 0-volume1/inode: no dentry for non-root inode -69777006931: 0a37836d-e9e5-4cc1-8bd2-e8a49947959b
    [2013-02-18 05:55:31.640865] W [fuse-bridge.c:561:fuse_getattr] 0-glusterfs-fuse: 2298073: GETATTR 140360215569520 (fuse_loc_fill() failed)
    [2013-02-18 05:55:31.641672] W [inode.c:1044:inode_path] (-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(+0xe8fd) [0x7fa8341868fd] (-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(+0xa6bb) [0x7fa8341826bb] (-->/usr/lib/glusterfs/3.2.7/xlator/mount/fuse.so(fuse_loc_fill+0x1c6) [0x7fa83417d156]))) 0-volume1/inode: no dentry for non-root inode -69777006931: 0a37836d-e9e5-4cc1-8bd2-e8a49947959b
    [2013-02-18 05:55:31.641724] W [fuse-bridge.c:561:fuse_getattr] 0-glusterfs-fuse: 2298079: GETATTR 140360215569520 (fuse_loc_fill() failed)
    ...

Sometimes on these events, and sometimes not, there will also be log entries (on both normal and abnormal nodes) of the form:

    [2013-02-18 03:35:28.679681] I [dht-common.c:525:dht_revalidate_cbk] 0-volume1-dht: mismatching layouts for /inSample/pred/20110831

I understand from reading the mailing list that the dentry errors and the mismatched-layout errors are both non-fatal warnings and that the metadata eventually becomes internally consistent regardless.
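For reference, the layouts that warning is complaining about can be compared by dumping the dht layout xattr directly on each brick. This is only a sketch: the brick path below is a placeholder for whatever your brick directories actually are, and the trusted.* xattrs have to be read as root on the brick filesystem, not through the FUSE mount.

    # Run on each brick server; /data/brick1 is a placeholder brick path.
    # Dumps the dht layout range assigned to this brick's copy of the
    # directory named in the "mismatching layouts" warning.
    getfattr -n trusted.glusterfs.dht -e hex \
        /data/brick1/inSample/pred/20110831

Comparing the hex values across bricks would show whether the layout ranges overlap or leave gaps, which is what the revalidate warning suggests.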
But these errors only appear at times when I'm slamming the glusterfs volume with the creation of a bunch of small files in a very short burst, as described above, so their presence seems related to the failure.

I think the issue is almost assuredly related to the delayed propagation of glusterfs directory metadata. Several nodes are creating the directory simultaneously, and this produces inconsistencies in the dht layout information. My hypothesis is that, while Node A is still writing, the process that resolves the inconsistencies and propagates the metadata from Node B leaves the location Node A is writing to disconnected from its supposed path (hence the "no dentry" errors).

I've made some effort to go through the glusterfs source code, particularly the dht-related files. The way dht normalizes anomalies could be the problem, but I've failed to find anything specific.

Has anyone else run into a problem like this, or have insight into what might be causing it or how to avoid it?
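For what it's worth, one avoidance strategy that follows from this hypothesis, sketched here as an untested idea rather than a confirmed fix, is to create the container directory from a single client before the parallel jobs are launched, so only one mkdir sets the layout instead of 200 racing mkdir -p calls:

    #!/bin/bash
    # precreate.sh -- hypothetical workaround sketch, run once on one
    # client before submitting the job burst.
    containerDir=$1
    # Create the directory from exactly one mount so a single dht layout
    # is written for it.
    mkdir -p "$containerDir"
    # Stat it back through the same mount before the other clients start
    # writing into it. (Assumption: this triggers a fresh lookup; it is
    # not a documented guarantee that it forces a revalidate everywhere.)
    stat "$containerDir" > /dev/null

The jobs themselves could then keep their mkdir -p, which becomes a no-op once the directory already exists.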
Anand Avati
2013-Feb-18 19:54 UTC
[Gluster-users] Directory metadata inconsistencies and missing output ("mismatched layout" and "no dentry for inode" error)
A similar issue was fixed in the master branch recently. Can you apply http://review.gluster.org/4459 to your source, rebuild, retest, and see if the issue gets fixed for you? It is quite a trivial patch and might even apply cleanly on the 3.2.7 source.

Avati
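For anyone wanting to try this against 3.2.7, here is a rough sketch of fetching the change from Gerrit and rebuilding. The fetch URL and the patchset number in the ref are assumptions, so check the review page for http://review.gluster.org/4459 for the exact values, and the cherry-pick may not apply cleanly to the 3.2.7 tree.

    # Sketch only: apply Gerrit change 4459 to a local glusterfs checkout.
    cd glusterfs
    # The remote URL and the trailing patchset number ("/1") are assumed;
    # verify both on the review page before fetching.
    git fetch http://review.gluster.org/glusterfs refs/changes/59/4459/1
    git cherry-pick FETCH_HEAD     # may need manual fix-ups against 3.2.7
    # Rebuild and reinstall (standard autotools build)
    ./autogen.sh && ./configure && make && sudo make install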