Marc Seeger
2013-Mar-07 00:08 UTC
[Gluster-users] Possible reason for "meta-data data entry missing-entry gfid self-heal failed"?
Hey, in our testing we sometimes run into a problem with GlusterFS breaking. The most recent occurrence was two processes on two machines trying to stat the same lock file:

2013-03-06T16:41:27+00:00 daemon.notice : creating directory: dir=/home/vcltest464/.drush, user=10036, group=10036, mode=0700
2013-03-06T16:41:27+00:00 daemon.notice : PHP Warning: stat(): stat failed for /home/vcltest464/.drush/vcltest464.aliases.drushrc.php.lock in [...].php on line 695
2013-03-06T16:41:27+00:00 daemon.notice : PHP Warning: stat(): stat failed for /home/vcltest464/.drush/vcltest464.aliases.drushrc.php.lock in [...].php on line 695
2013-03-06T16:41:27+00:00 daemon.notice : PHP Warning: stat(): stat failed for /home/vcltest464/.drush/vcltest464.aliases.drushrc.php.lock in [...].php on line 695

(This repeated a few thousand times; a bug in the locking code caused it when the stat failed.)

It ends up as something like this in the gluster log files:

[2013-03-06 16:34:36.875559] W [client3_1-fops.c:2457:client3_1_link_cbk] 0-remote8: remote operation failed: File exists (00000000-0000-0000-0000-000000000000 -> /home/vcltest418/prod)
[2013-03-06 16:34:36.875559] W [client3_1-fops.c:2457:client3_1_link_cbk] 0-remote7: remote operation failed: File exists (00000000-0000-0000-0000-000000000000 -> /home/vcltest418/prod)
[2013-03-06 16:36:24.809098] W [client3_1-fops.c:327:client3_1_mkdir_cbk] 0-remote8: remote operation failed: File exists. Path: /vcltest473/php_sessions (00000000-0000-0000-0000-000000000000)
[2013-03-06 16:36:24.809098] W [client3_1-fops.c:327:client3_1_mkdir_cbk] 0-remote7: remote operation failed: File exists. Path: /vcltest473/php_sessions (00000000-0000-0000-0000-000000000000)
[2013-03-06 16:36:24.809098] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 9061: MKDIR() /vcltest473/php_sessions => -1 (File exists)
[2013-03-06 16:36:26.179144] I [afr-self-heal-common.c:1189:afr_sh_missing_entry_call_impunge_recreate] 0-replicate0: no missing files - /home/vcltest473/.drush/vcltest473.aliases.drushrc.php.lock. proceeding to metadata check
[2013-03-06 16:36:34.899435] I [afr-self-heal-entry.c:2333:afr_sh_entry_fix] 0-replicate0: /vcltest473/livedev: Performing conservative merge
[2013-03-06 16:41:02.118580] W [client3_1-fops.c:327:client3_1_mkdir_cbk] 0-remote8: remote operation failed: File exists. Path: /vcltest723/files-private (00000000-0000-0000-0000-000000000000)
[2013-03-06 16:41:02.118580] W [client3_1-fops.c:327:client3_1_mkdir_cbk] 0-remote7: remote operation failed: File exists. Path: /vcltest723/files-private (00000000-0000-0000-0000-000000000000)
[2013-03-06 16:41:02.118580] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 12435: MKDIR() /vcltest723/files-private => -1 (File exists)
[2013-03-06 16:41:27.179425] I [afr-self-heal-common.c:1941:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-replicate0: Non blocking entrylks failed.
[2013-03-06 16:41:27.179425] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 0-replicate0: background meta-data data entry missing-entry gfid self-heal failed on /home/vcltest464/.drush/vcltest464.aliases.drushrc.php.lock
[2013-03-06 16:41:27.179425] W [inode.c:914:inode_lookup] (-->/usr/lib/libglusterfs.so.0(default_lookup_cbk+0xc1) [0x7f93c6ac7d01] (-->/usr/lib/glusterfs/3.3.1/xlator/mount/fuse.so(+0xf198) [0x7f93c45ef198] (-->/usr/lib/glusterfs/3.3.1/xlator/mount/fuse.so(+0xeffb) [0x7f93c45eeffb]))) 0-fuse: inode not found

After this, the mountpoint stopped responding to file_exists(), which usually means the client died. Any idea what could have caused this behaviour?
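For context on the locking pattern involved: a stat()-then-create sequence is inherently racy when two machines share one filesystem, because both can see the stat fail and then both create the lock. A minimal sketch of a race-free alternative, in shell rather than the PHP actually used (which is not shown in the thread); the lock path here is a placeholder, and whether O_EXCL-style creation is fully honored over a given FUSE mount should be verified:

```shell
# Sketch: atomic lock-file acquisition with no separate existence check.
# "set -C" (noclobber) makes ": > file" fail if the file already exists,
# in a single atomic open, so two processes cannot both acquire the lock.
# LOCKFILE is a placeholder path, not the path from the thread.
LOCKFILE="/tmp/example-drush.$$.lock"

if ( set -C; : > "$LOCKFILE" ) 2>/dev/null; then
    echo "lock acquired"
    # ... critical section guarded by the lock ...
    rm -f "$LOCKFILE"    # release the lock
else
    echo "lock already held"
fi
```

The point of the pattern is that the existence check and the creation are one operation, so the "stat failed" window above never exists.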
Cheers,
Marc
Marc Seeger
2013-Mar-07 00:14 UTC
[Gluster-users] Possible reason for "meta-data data entry missing-entry gfid self-heal failed"?
Almost forgot: these operations were done on a symlinked directory (/home is linked to /mnt/gfs/home, where /mnt/gfs is the gluster mountpoint).

On Mar 7, 2013, at 1:08 AM, Marc Seeger <marc.seeger at acquia.com> wrote:
> [quoted message of 2013-Mar-07 00:08 trimmed; see above]
Marc Seeger
2013-Mar-18 16:42 UTC
[Gluster-users] Possible reason for "meta-data data entry missing-entry gfid self-heal failed"?
Sadly, we keep seeing those. The logs display the same pattern:

[2013-03-18 05:22:49.174382] I [afr-self-heal-common.c:1941:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-replicate0: Non blocking entrylks failed.
[2013-03-18 05:22:49.174382] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 0-replicate0: background meta-data data entry missing-entry gfid self-heal failed on /home/vcltest256/.drush/vcltest256.aliases.drushrc.php.lock
[2013-03-18 05:22:49.174382] W [inode.c:914:inode_lookup] (-->/usr/lib/libglusterfs.so.0(default_lookup_cbk+0xc1) [0x7f14a4cd6d01] (-->/usr/lib/glusterfs/3.3.1/xlator/mount/fuse.so(+0xf198) [0x7f14a27fe198] (-->/usr/lib/glusterfs/3.3.1/xlator/mount/fuse.so(+0xeffb) [0x7f14a27fdffb]))) 0-fuse: inode not found

On Mar 7, 2013, at 1:14 AM, Marc Seeger <marc.seeger at acquia.com> wrote:
> [quoted messages of 2013-Mar-07 trimmed; see above]
Pranith Kumar K
2013-Mar-20 02:06 UTC
[Gluster-users] Possible reason for "meta-data data entry missing-entry gfid self-heal failed"?
On 03/18/2013 10:12 PM, Marc Seeger wrote:
> [quoted message of 2013-Mar-18 16:42 trimmed; see above]

Hi Marc,

Could you please give us the steps to re-create this issue? A script would be even better. Also give us the "gluster volume info" output, so that I can try it with a configuration similar to yours.

Pranith
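The diagnostics being requested can be gathered with the standard CLI. A minimal sketch — the volume name "myvol" is a placeholder, and the heal-info subcommand assumes GlusterFS 3.3 or later (the version shown in the logs):

```shell
# Collect basic diagnostics for a replicated volume.
# "myvol" is a placeholder volume name, not taken from the thread.
VOLNAME=myvol

if command -v gluster >/dev/null 2>&1; then
    gluster volume info "$VOLNAME"        # brick layout, replica count, options
    gluster volume heal "$VOLNAME" info   # entries still pending self-heal
else
    echo "gluster CLI not found; run this on a storage node" >&2
fi
```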