Marc Seeger
2013-Jun-03 09:07 UTC
[Gluster-users] Fuse client dying after "gfid different on subvolume" ?
Hey gluster-users,

I just stumbled on a problem in our current test setup of gluster 3.3.2. This is a simple replicated setup with 2 bricks (on XFS) in 1 volume, running glusterfs version 3.3.2qa3 on ubuntu lucid. The client mounting this volume on /mnt/gfs sits on a mother machine and is using fuse (version 2.8.1-1.1ubuntu3.1).

In the glusterfs fuse client mount log:

[2013-06-02 21:23:26.677069] W [afr-common.c:1196:afr_detect_self_heal_by_iatt] 0-test-fs-cluster-1-replicate-0: /home/filesshared/README.txt.lock: gfid different on subvolume
[2013-06-02 21:23:26.677069] I [afr-self-heal-common.c:1970:afr_sh_post_nb_entrylk_gfid_sh_cbk] 0-test-fs-cluster-1-replicate-0: Non blocking entrylks failed.
[2013-06-02 21:23:26.697068] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-test-fs-cluster-1-client-0: remote operation failed: File exists. Path: /home/filesshared/README.txt.lock (00000000-0000-0000-0000-000000000000)
[2013-06-02 21:23:26.697068] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-test-fs-cluster-1-client-1: remote operation failed: File exists. Path: /home/filesshared/README.txt.lock (00000000-0000-0000-0000-000000000000)
[2013-06-02 21:23:26.697068] W [inode.c:914:inode_lookup] (-->/usr/lib/glusterfs/3.3.2qa3/xlator/debug/io-stats.so(io_stats_lookup_cbk+0xff) [0x7fb16c310d8f] (-->/usr/lib/glusterfs/3.3.2qa3/xlator/mount/fuse.so(+0xf248) [0x7fb16fa95248] (-->/usr/lib/glusterfs/3.3.2qa3/xlator/mount/fuse.so(+0xf0b1) [0x7fb16fa950b1]))) 0-fuse: inode not found

What the application side was doing when this happened:
1. It created /home/filesshared
2. It created /mnt/gfs/home/filesshared
3. It deleted /home/filesshared and replaced it with a symlink from /home/filesshared to /mnt/gfs/home/filesshared
4. It tried to write some files

Here's the log for that:

2013-06-02T21:23:26+00:00 daemon.notice web-14 f-c-w[4842]: deploying filesshared.prod
2013-06-02T21:23:26+00:00 daemon.notice web-14 f-c-w[4842]: creating directory: dir=/home/filesshared, user=0, group=filesshared, mode=0550
2013-06-02T21:23:26+00:00 daemon.notice web-14 f-c-w[4842]: creating directory: dir=/mnt/gfs/home/filesshared, user=filesshared, group=filesshared, mode=0700
2013-06-02T21:23:26+00:00 daemon.notice web-14 f-c-w[4842]: created /home/filesshared -> /mnt/gfs/home/filesshared
2013-06-02T21:23:26+00:00 daemon.notice web-14 f-c-w[4842]: PHP Warning: stat(): stat failed for /home/filesshared/README.txt.lock in /usr/ah/lib/ah-lib.php on line 701
2013-06-02T21:23:27+00:00 daemon.notice web-14 f-c-w[4842]: PHP Warning: stat(): stat failed for /home/filesshared/README.txt.lock in /usr/ah/lib/ah-lib.php on line 701
2013-06-02T21:23:27+00:00 daemon.notice web-14 f-c-w[4842]: PHP Warning: stat(): stat failed for /home/filesshared/README.txt.lock in /usr/ah/lib/ah-lib.php on line 701
2013-06-02T21:23:28+00:00 daemon.notice web-14 f-c-w[4842]: PHP Warning: stat(): stat failed for /home/filesshared/README.txt.lock in /usr/ah/lib/ah-lib.php on line 701

What this resulted in:
The mount point became completely unresponsive. In PHP, file_exists('/mnt/gfs') returns false and stat() calls fail; in Ruby, File.directory?('/mnt/gfs') returns false. This can be solved by calling "umount /mnt/gfs" and then remounting the share from fstab ("mount /mnt/gfs").

I could not find any relevant log entries on the bricks themselves. Sadly, I also wasn't able to come up with a test case to reproduce it.
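For what it's worth, the deploy sequence boils down to something like the following PHP sketch (reconstructed from the syslog lines above; the actual deploy tooling and the lock handling in ah-lib.php differ in detail, so treat the exact calls as assumptions):

<?php
// Sketch of the deploy sequence, reconstructed from the syslog above.
// Paths, owners and modes are taken from the log; everything else is assumed.

$local   = '/home/filesshared';          // path the application uses
$backing = '/mnt/gfs/home/filesshared';  // backing directory on the gluster mount

// 1. create the local directory (root:filesshared, mode 0550)
mkdir($local, 0550, true);
chgrp($local, 'filesshared');

// 2. create the backing directory on the gluster volume
mkdir($backing, 0700, true);
chown($backing, 'filesshared');
chgrp($backing, 'filesshared');

// 3. replace the local directory with a symlink into the gluster mount
rmdir($local);
symlink($backing, $local);

// 4. write files through the symlink; the stat() on the .lock file is
//    where the PHP warnings above start appearing
$lock = $local . '/README.txt.lock';
if (@stat($lock) === false) {
    touch($lock);
}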
It seems somewhat similar to http://gluster.org/pipermail/gluster-users/2013-March/035662.html. I initially thought this might have been fixed by http://review.gluster.org/#/c/4689/, but the qa branch we run already has that fix backported.

Any idea what could cause this behaviour?

Cheers,
Marc
Marc Seeger
2013-Jun-05 15:51 UTC
[Gluster-users] Fuse client dying after "gfid different on subvolume" ?
And another one:

[2013-06-05 09:39:23.281555] W [afr-common.c:1196:afr_detect_self_heal_by_iatt] 0-test-fs-cluster-1-replicate-0: /home/qarshared78/.drush/qarshared78.aliases.drushrc.php.lock: gfid different on subvolume
[2013-06-05 09:39:23.281555] I [afr-self-heal-common.c:1970:afr_sh_post_nb_entrylk_gfid_sh_cbk] 0-test-fs-cluster-1-replicate-0: Non blocking entrylks failed.
[2013-06-05 09:39:23.281555] W [inode.c:914:inode_lookup] (-->/usr/lib/glusterfs/3.3.2qa3/xlator/debug/io-stats.so(io_stats_lookup_cbk+0xff) [0x7fe9c8481d8f] (-->/usr/lib/glusterfs/3.3.2qa3/xlator/mount/fuse.so(+0xf248) [0x7fe9cbc06248] (-->/usr/lib/glusterfs/3.3.2qa3/xlator/mount/fuse.so(+0xf0b1) [0x7fe9cbc060b1]))) 0-fuse: inode not found

Unmounting and remounting fixes the problem, but until then the volume mount doesn't respond anymore. :-/
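In the meantime the manual workaround could be scripted roughly like this (just a sketch of the umount/mount step described above, assuming it runs as root, e.g. from cron, and that /mnt/gfs has an fstab entry):

<?php
// Workaround sketch: detect the wedged mount and remount it.
// Assumes root privileges and an fstab entry for /mnt/gfs.

$mountpoint = '/mnt/gfs';

clearstatcache();
// When the fuse client dies, file_exists() on the mount point returns false,
// which is exactly the symptom we see from PHP and Ruby.
if (!file_exists($mountpoint) || !is_dir($mountpoint)) {
    exec('umount ' . escapeshellarg($mountpoint));
    exec('mount ' . escapeshellarg($mountpoint));
}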