Mark Ruys
2013-Nov-24 20:17 UTC
[Gluster-users] Unable to self-heal contents of '<gfid:00000000-0000-0000-0000-000000000001>'
So I decided to bite the bullet and upgraded from 3.3 to 3.4. Somehow this was a painful process for me (the glusterfs daemon refused to start), so I decided to configure our Gluster pool from scratch. Everything seems to work nicely, except for the self-heal daemon. In the logs, I get the following line every 10 minutes:

[2013-11-24 19:50:34.495204] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-GLUSTER-SHARE-replicate-0: Unable to self-heal contents of '<gfid:00000000-0000-0000-0000-000000000001>' (possible split-brain). Please delete the file from all but the preferred subvolume.
- Pending matrix:  [ [ 0 2 ] [ 2 0 ] ]

I've removed and recreated .glusterfs/00/00/00000000-0000-0000-0000-000000000001, but that doesn't seem to make a difference.

How do I fix the self-heal daemon?

Mark

# find . -name 00000000-0000-0000-0000-000000000001 -ls
1447202    0 ----------   2 root     root        0 Nov 23 22:35 ./export-share-1/.glusterfs/indices/xattrop/00000000-0000-0000-0000-000000000001
1319116    0 lrwxrwxrwx   1 root     root        8 Nov 23 22:35 ./export-share-1/.glusterfs/00/00/00000000-0000-0000-0000-000000000001 -> ../../..

Brick 1:

# getfattr -m . -d -e hex export-share-1
# file: export-share-1
trusted.afr.GLUSTER-SHARE-client-0=0x000000000000000000000000
trusted.afr.GLUSTER-SHARE-client-1=0x000000000000000200000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000000000000
trusted.glusterfs.volume-id=0xe6eb05aabe3b456cbf3027275faa529c

Brick 2:

# getfattr -m . -d -e hex export-share-2
# file: export-share-2
trusted.afr.GLUSTER-SHARE-client-0=0x000000000000000200000000
trusted.afr.GLUSTER-SHARE-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size=0x0000000000000000
trusted.glusterfs.volume-id=0xe6eb05aabe3b456cbf3027275faa529c
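A minimal sketch of how the same symptom can be inspected from the CLI, assuming the volume is named GLUSTER-SHARE as in the log above: the heal command can list the entries the self-heal daemon considers split-brain or still pending.

# gluster volume heal GLUSTER-SHARE info split-brain
# gluster volume heal GLUSTER-SHARE info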
Ravishankar N
2013-Nov-25 11:22 UTC
[Gluster-users] Unable to self-heal contents of '<gfid:00000000-0000-0000-0000-000000000001>'
On 11/25/2013 01:47 AM, Mark Ruys wrote:
> [...]
> Brick 1:
> trusted.afr.GLUSTER-SHARE-client-0=0x000000000000000000000000
> trusted.afr.GLUSTER-SHARE-client-1=0x000000000000000200000000
> [...]
> Brick 2:
> trusted.afr.GLUSTER-SHARE-client-0=0x000000000000000200000000
> trusted.afr.GLUSTER-SHARE-client-1=0x000000000000000000000000
> [...]

From the afr extended attributes, it seems you have hit a metadata split-brain of the top-level (brick) directory (the one with gfid 01). If you are able to perform I/O on all files from the mount point without error (EIO) and the file contents are identical on both bricks (check with md5sum), you can safely clear the afr extended attributes of the bricks:

setfattr -n trusted.afr.GLUSTER-SHARE-client-0 -v 0x000000000000000000000000 /export-share-1
setfattr -n trusted.afr.GLUSTER-SHARE-client-1 -v 0x000000000000000000000000 /export-share-1
setfattr -n trusted.afr.GLUSTER-SHARE-client-0 -v 0x000000000000000000000000 /export-share-2
setfattr -n trusted.afr.GLUSTER-SHARE-client-1 -v 0x000000000000000000000000 /export-share-2

Thanks,
Ravi
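A minimal sketch of the verification and follow-up steps described above, assuming the brick paths /export-share-1 and /export-share-2 and the volume name GLUSTER-SHARE from this thread; the file name is only a placeholder:

# md5sum /export-share-1/somefile /export-share-2/somefile
# gluster volume heal GLUSTER-SHARE full
# gluster volume heal GLUSTER-SHARE info

The checksums should match before the setfattr commands are run; the heal commands afterwards trigger a full self-heal and confirm that no entries remain pending.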