Thanks Ravi for your answer.

Stupid question but how do I delete the trusted.afr xattrs on this brick?

And when you say "this brick", do you mean the brick on the arbiter node (node 3 in my case)?

------- Original Message -------
On April 9, 2018 1:24 PM, Ravishankar N <ravishankar at redhat.com> wrote:

> On 04/09/2018 04:36 PM, mabi wrote:
> > As I was suggested in the past by this mailing list, I now ran a stat and getfattr on one of the problematic files on all nodes and at the end a stat on the fuse mount directly. The output is below:
> >
> > NODE1:
> >
> > STAT:
> > File: '/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
> > Size: 0  Blocks: 38  IO Block: 131072  regular empty file
> > Device: 23h/35d  Inode: 6822549  Links: 2
> > Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)  Gid: (20909/ UNKNOWN)
> > Access: 2018-04-09 08:58:54.311556621 +0200
> > Modify: 2018-04-09 08:58:54.311556621 +0200
> > Change: 2018-04-09 08:58:54.423555611 +0200
> > Birth: -
> >
> > GETFATTR:
> > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w=
> > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
> > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ=
> > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ=
> >
> > NODE2:
> >
> > STAT:
> > File: '/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
> > Size: 0  Blocks: 38  IO Block: 131072  regular empty file
> > Device: 24h/36d  Inode: 6825876  Links: 2
> > Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)  Gid: (20909/ UNKNOWN)
> > Access: 2018-04-09 08:58:54.311775605 +0200
> > Modify: 2018-04-09 08:58:54.311775605 +0200
> > Change: 2018-04-09 08:58:54.423774007 +0200
> > Birth: -
> >
> > GETFATTR:
> > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w=
> > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
> > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ=
> > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ=
> >
> > NODE3:
> >
> > STAT:
> > File: /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile
> > Size: 0  Blocks: 8  IO Block: 4096  regular empty file
> > Device: ca11h/51729d  Inode: 404058268  Links: 2
> > Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)  Gid: (20909/ UNKNOWN)
> > Access: 2018-04-05 16:16:55.292341447 +0200
> > Modify: 2018-04-05 16:16:55.292341447 +0200
> > Change: 2018-04-09 08:58:54.428074177 +0200
> > Birth: -
> >
> > GETFATTR:
> > trusted.afr.dirty=0sAAAAAAAAAAAAAAAA
> > trusted.afr.myvol-private-client-0=0sAAAAAQAAAAAAAAAA
> > trusted.afr.myvol-private-client-1=0sAAAAAQAAAAAAAAAA
>
> Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed
> by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and
> launch heal, that should fix it. But the file seems to have no content
> on both data bricks as well, so you might want to check if that was
> expected.
>
> -Ravi
>
> > trusted.bit-rot.version=0sAgAAAAAAAABavUW2AAGBaA=
> > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w=
> > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ=
> > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ=
> >
> > CLIENT GLUSTER MOUNT:
> >
> > STAT:
> > File: '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
> > Size: 0  Blocks: 0  IO Block: 131072  regular empty file
> > Device: 1eh/30d  Inode: 13600685574951729371  Links: 1
> > Access: (0644/-rw-r--r--)  Uid: (20909/nch20909)  Gid: (20909/nch20909)
> > Access: 2018-04-09 08:58:54.311556621 +0200
> > Modify: 2018-04-09 08:58:54.311556621 +0200
> > Change: 2018-04-09 08:58:54.423555611 +0200
> > Birth: -
> >
> > ------- Original Message -------
> > On April 9, 2018 9:49 AM, mabi <mabi at protonmail.ch> wrote:
> >
> > > Here would be also the corresponding log entries on a gluster node brick log file:
> > >
> > > [2018-04-09 06:58:47.363536] W [MSGID: 113093] [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: removing gfid2path xattr failed on /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a: key = trusted.gfid2path.2529bb66b56be110 [No data available]
> > > [2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE [No such file or directory]
> > >
> > > Hope that helps to find out the issue.
> > >
> > > ------- Original Message -------
> > > On April 9, 2018 9:37 AM, mabi <mabi at protonmail.ch> wrote:
> > >
> > > > Hello,
> > > >
> > > > Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbiter) cluster to 3.12.7 and this morning I got a warning that 9 files on one of my volumes are not synced. Indeed, checking that volume with a "volume heal info" shows that the third node (the arbiter node) has 9 files to be healed, but they are not being healed automatically.
> > > >
> > > > All nodes were always online and there was no network interruption, so I am wondering if this might not really be a split-brain issue but something else.
> > > >
> > > > I found some interesting log entries in the client log file (/var/log/glusterfs/myvol-private.log) which I have included below in this mail. It looks like some renaming has gone wrong because a directory is not empty.
> > > >
> > > > For your information, I upgraded my GlusterFS in offline mode and the upgrade went smoothly.
> > > >
> > > > What can I do to fix that issue?
> > > >
> > > > Best regards,
> > > > Mabi
> > > >
> > > > [2018-04-09 06:58:46.906089] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-myvol-private-dht: renaming /dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/azipfile.zip (hash=myvol-private-replicate-0/cache=myvol-private-replicate-0) => /dir1/di2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip (hash=myvol-private-replicate-0/cache=<nul>)
> > > > [2018-04-09 06:58:53.692440] W [MSGID: 114031] [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote operation failed [Directory not empty]
> > > > [2018-04-09 06:58:53.714129] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-1: remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
> > > > [2018-04-09 06:58:53.714161] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-0: remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
> > > > [2018-04-09 06:58:53.715638] W [MSGID: 114031] [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote operation failed [Directory not empty]
> > > > [2018-04-09 06:58:53.750372] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-myvol-private-replicate-0: performing metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156
> > > > [2018-04-09 06:58:53.757677] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] sinks=0 1
> > > > [2018-04-09 06:58:53.775939] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: performing entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156
> > > > [2018-04-09 06:58:53.776237] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-myvol-private-replicate-0: performing metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c
> > > > [2018-04-09 06:58:53.781762] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c. sources=[2] sinks=0 1
> > > > [2018-04-09 06:58:53.796950] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] sinks=0 1
> > > > [2018-04-09 06:58:53.812682] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: performing entry selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c
> > > > [2018-04-09 06:58:53.879382] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: Failing READ on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain observed. [Input/output error]
> > > > [2018-04-09 06:58:53.881514] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: Failing FGETXATTR on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain observed. [Input/output error]
> > > > [2018-04-09 06:58:53.890073] W [MSGID: 108027] [afr-common.c:2798:afr_discover_done] 0-myvol-private-replicate-0: no read subvols for (null)
> >
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
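For readers hitting the same symptom, a minimal sketch of how the state Ravi describes can be inspected before removing anything. This is not from the thread itself: the volume name and arbiter brick path are taken from the output above and should be adjusted to your own setup.

    # list the entries the self-heal daemon still reports (run on any node)
    gluster volume heal myvol-private info

    # on the arbiter (node 3), dump all trusted.* xattrs of one problematic file
    # directly on the brick; non-zero trusted.afr.<volname>-client-0/1 values here,
    # while the two data bricks show no trusted.afr xattrs at all, is the pattern
    # Ravi points at above (arbiter wrongly marked as the only good copy)
    getfattr -d -m . -e hex \
      /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile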
Ravishankar N
2018-Apr-09 11:46 UTC
[Gluster-users] New 3.12.7 possible split-brain on replica 3
On 04/09/2018 05:09 PM, mabi wrote:
> Thanks Ravi for your answer.
>
> Stupid question but how do I delete the trusted.afr xattrs on this brick?
>
> And when you say "this brick", do you mean the brick on the arbiter node (node 3 in my case)?

Sorry, I should have been clearer. Yes, the brick on the 3rd node:

`setfattr -x trusted.afr.myvol-private-client-0 /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`

`setfattr -x trusted.afr.myvol-private-client-1 /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`

After doing this for all files, run `gluster volume heal <volname>`.

HTH,
Ravi

> ------- Original Message -------
> On April 9, 2018 1:24 PM, Ravishankar N <ravishankar at redhat.com> wrote:
>
>> On 04/09/2018 04:36 PM, mabi wrote:
>>> [...]
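Ravi's two setfattr lines above cover a single file. For the nine reported entries, something along these lines could be scripted on the arbiter node. This is an untested sketch: the file list (/tmp/heal-list.txt) is an assumption to be filled in from your own `gluster volume heal myvol-private info` output, and the brick root is the arbiter path shown in the NODE3 stat earlier in the thread.

    # run as root on node 3 (the arbiter); BRICK is the arbiter brick root
    BRICK=/srv/glusterfs/myvol-private/brick
    # /tmp/heal-list.txt: one affected path per line, relative to the brick root
    while read -r relpath; do
        setfattr -x trusted.afr.myvol-private-client-0 "$BRICK/$relpath"
        setfattr -x trusted.afr.myvol-private-client-1 "$BRICK/$relpath"
    done < /tmp/heal-list.txt

    # then kick off an index heal
    gluster volume heal myvol-private

Note that the afr xattr names follow the <volname>-client-<N> pattern, so they differ from volume to volume.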
Again thanks, that worked and I now have no more unsynced files.

You mentioned that this bug has been fixed in 3.13. Would it be possible to backport it to 3.12? I am asking because 3.13 is not a long-term release and as such I would not like to have to upgrade to 3.13.

------- Original Message -------
On April 9, 2018 1:46 PM, Ravishankar N <ravishankar at redhat.com> wrote:

> On 04/09/2018 05:09 PM, mabi wrote:
> > Thanks Ravi for your answer.
> >
> > Stupid question but how do I delete the trusted.afr xattrs on this brick?
> >
> > And when you say "this brick", do you mean the brick on the arbiter node (node 3 in my case)?
>
> Sorry, I should have been clearer. Yes, the brick on the 3rd node:
>
> `setfattr -x trusted.afr.myvol-private-client-0 /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`
>
> `setfattr -x trusted.afr.myvol-private-client-1 /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`
>
> After doing this for all files, run `gluster volume heal <volname>`.
>
> HTH,
> Ravi
>
> > ------- Original Message -------
> > On April 9, 2018 1:24 PM, Ravishankar N <ravishankar at redhat.com> wrote:
> > > [...]
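To double-check that the cluster really is clean again after the heal, as reported above, the usual follow-up is to re-run the heal info commands and confirm that every brick shows zero entries; the volume name below is the one used throughout this thread.

    gluster volume heal myvol-private info
    gluster volume heal myvol-private info split-brain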