As suggested in the past on this mailing list, I now ran a stat and getfattr on one of the problematic files on all nodes, and at the end a stat on the FUSE mount directly. The output is below:

NODE1:

STAT:
  File: '/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
  Size: 0           Blocks: 38         IO Block: 131072 regular empty file
Device: 23h/35d     Inode: 6822549     Links: 2
Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)   Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
 Birth: -

GETFATTR:
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ==

NODE2:

STAT:
  File: '/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
  Size: 0           Blocks: 38         IO Block: 131072 regular empty file
Device: 24h/36d     Inode: 6825876     Links: 2
Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)   Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311775605 +0200
Modify: 2018-04-09 08:58:54.311775605 +0200
Change: 2018-04-09 08:58:54.423774007 +0200
 Birth: -

GETFATTR:
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ==

NODE3:

STAT:
  File: /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile
  Size: 0             Blocks: 8          IO Block: 4096   regular empty file
Device: ca11h/51729d  Inode: 404058268   Links: 2
Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)   Gid: (20909/ UNKNOWN)
Access: 2018-04-05 16:16:55.292341447 +0200
Modify: 2018-04-05 16:16:55.292341447 +0200
Change: 2018-04-09 08:58:54.428074177 +0200
 Birth: -

GETFATTR:
trusted.afr.dirty=0sAAAAAAAAAAAAAAAA
trusted.afr.myvol-private-client-0=0sAAAAAQAAAAAAAAAA
trusted.afr.myvol-private-client-1=0sAAAAAQAAAAAAAAAA
trusted.bit-rot.version=0sAgAAAAAAAABavUW2AAGBaA==
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAAAAAAAAAAAAAAAAAAAAAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAAAAAQ==

CLIENT GLUSTER MOUNT:

STAT:
  File: '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
  Size: 0           Blocks: 0          IO Block: 131072 regular empty file
Device: 1eh/30d     Inode: 13600685574951729371  Links: 1
Access: (0644/-rw-r--r--)  Uid: (20909/nch20909)   Gid: (20909/nch20909)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
 Birth: -
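The output above was presumably collected with commands along these lines, run as root (trusted.* xattrs are hidden from regular users); the paths are the ones from this thread, and the brick path differs per node (NODE3's bricks live under /srv/glusterfs instead of /data):

  # On each gluster node, against the brick copy of the file (not the mount):
  F='/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
  stat "$F"
  getfattr -d -m . "$F"   # binary xattr values print base64-encoded, with a "0s" prefix

  # And once on the client, against the FUSE mount:
  stat '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'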
------- Original Message -------
On April 9, 2018 9:49 AM, mabi <mabi at protonmail.ch> wrote:

> Here would also be the corresponding log entries from a brick log file on one of the gluster nodes:
>
> [2018-04-09 06:58:47.363536] W [MSGID: 113093] [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: removing gfid2path xattr failed on /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a: key = trusted.gfid2path.2529bb66b56be110 [No data available]
>
> [2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 0-myvol-private-posix: opendir failed on /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE [No such file or directory]
>
> Hope that helps to find out the issue.
>
> ------- Original Message -------
> On April 9, 2018 9:37 AM, mabi <mabi at protonmail.ch> wrote:
>
> > Hello,
> >
> > Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbiter) cluster to 3.12.7, and this morning I got a warning that 9 files on one of my volumes are not synced. Indeed, checking that volume with "volume heal info" shows that the third node (the arbiter node) has 9 files to be healed, but they are not being healed automatically.
> >
> > All nodes were always online and there was no network interruption, so I am wondering whether this might not really be a split-brain issue but something else.
> >
> > I found some interesting log entries in the client log file (/var/log/glusterfs/myvol-private.log), which I have included below in this mail. It looks like some renaming has gone wrong because a directory is not empty.
> >
> > For your information, I upgraded my GlusterFS in offline mode and the upgrade went smoothly.
> >
> > What can I do to fix this issue?
> >
> > Best regards,
> > Mabi
> >
> > [2018-04-09 06:58:46.906089] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 0-myvol-private-dht: renaming /dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/azipfile.zip (hash=myvol-private-replicate-0/cache=myvol-private-replicate-0) => /dir1/di2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip (hash=myvol-private-replicate-0/cache=<nul>)
> >
> > [2018-04-09 06:58:53.692440] W [MSGID: 114031] [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote operation failed [Directory not empty]
> >
> > [2018-04-09 06:58:53.714129] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-1: remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
> >
> > [2018-04-09 06:58:53.714161] W [MSGID: 114031] [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-0: remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
> >
> > [2018-04-09 06:58:53.715638] W [MSGID: 114031] [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote operation failed [Directory not empty]
> >
> > [2018-04-09 06:58:53.750372] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-myvol-private-replicate-0: performing metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156
> >
> > [2018-04-09 06:58:53.757677] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] sinks=0 1
> >
> > [2018-04-09 06:58:53.775939] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: performing entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156
> >
> > [2018-04-09 06:58:53.776237] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-myvol-private-replicate-0: performing metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c
> >
> > [2018-04-09 06:58:53.781762] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c. sources=[2] sinks=0 1
> >
> > [2018-04-09 06:58:53.796950] I [MSGID: 108026] [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: Completed entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] sinks=0 1
> >
> > [2018-04-09 06:58:53.812682] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: performing entry selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c
> >
> > [2018-04-09 06:58:53.879382] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: Failing READ on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain observed. [Input/output error]
> >
> > [2018-04-09 06:58:53.881514] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: Failing FGETXATTR on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain observed. [Input/output error]
> >
> > [2018-04-09 06:58:53.890073] W [MSGID: 108027] [afr-common.c:2798:afr_discover_done] 0-myvol-private-replicate-0: no read subvols for (null)
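For reference, the "volume heal info" check mentioned in the first message corresponds to commands along these lines (volume name as it appears in the log messages above):

  gluster volume heal myvol-private info
  # list only the entries gluster itself has classified as split-brain:
  gluster volume heal myvol-private info split-brain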
Ravishankar N
2018-Apr-09 11:24 UTC
[Gluster-users] New 3.12.7 possible split-brain on replica 3
On 04/09/2018 04:36 PM, mabi wrote:
> As suggested in the past on this mailing list, I now ran a stat and getfattr on one of the problematic files on all nodes, and at the end a stat on the FUSE mount directly. The output is below:
>
> [... NODE1, NODE2 and client output trimmed; unchanged from the message quoted above ...]
>
> NODE3:
>
> GETFATTR:
> trusted.afr.dirty=0sAAAAAAAAAAAAAAAA
> trusted.afr.myvol-private-client-0=0sAAAAAQAAAAAAAAAA
> trusted.afr.myvol-private-client-1=0sAAAAAQAAAAAAAAAA

Looks like you hit the bug of the arbiter becoming the heal source (BZ 1482064), fixed by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and launch heal; that should fix it. But the file seems to have no content on both data bricks as well, so you might want to check whether that was expected.
-Ravi

> [... remainder of the quoted message trimmed ...]
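Some context on Ravi's suggestion: a trusted.afr.<volume>-client-N value is 12 bytes, three big-endian 32-bit counters of pending data, metadata and entry operations. Decoding NODE3's values shows one pending data operation blamed on each data brick, which is why the arbiter presents itself as the heal source. What follows is a rough, untested sketch of the decode and of the fix Ravi describes; the brick path is taken from NODE3's stat output above, so adjust to your own layout:

  # Decode one of the afr xattrs from NODE3 (drop the leading "0s" marker):
  echo AAAAAQAAAAAAAAAA | base64 -d | xxd
  # 00000000: 0000 0001 0000 0000 0000 0000
  # i.e. 1 pending data operation, 0 metadata, 0 entry

  # On the arbiter node (node 3), as root, remove the stale afr xattrs
  # from the brick copy of the file:
  F='/srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
  setfattr -x trusted.afr.myvol-private-client-0 "$F"
  setfattr -x trusted.afr.myvol-private-client-1 "$F"

  # Then launch a heal from any node:
  gluster volume heal myvol-private

Afterwards, "gluster volume heal myvol-private info" should show the entry count draining away.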
Thanks Ravi for your answer. Stupid question, but how do I delete the trusted.afr xattrs on this brick? And when you say "this brick", do you mean the brick on the arbiter node (node 3 in my case)?

------- Original Message -------
On April 9, 2018 1:24 PM, Ravishankar N <ravishankar at redhat.com> wrote:

> On 04/09/2018 04:36 PM, mabi wrote:
> > [... stat and getfattr output trimmed; see the earlier messages ...]
>
> Looks like you hit the bug of the arbiter becoming the heal source (BZ 1482064), fixed by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and launch heal; that should fix it. But the file seems to have no content on both data bricks as well, so you might want to check whether that was expected.
> -Ravi
>
> > [... remainder of the quoted thread trimmed ...]