Anh Vo
2018-Jul-03  18:54 UTC
[Gluster-users] Failed to mount nfs due to split-brain and Input/Output Error
I am trying to mount a Gluster volume over NFS, and mount.nfs fails. nfs.log shows the entries below. Heal info does not show the mentioned gfid (00000000-0000-0000-0000-000000000001) as being in split-brain.

[2018-07-03 18:16:27.694953] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: c3ac3cc5, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
[2018-07-03 18:16:28.204685] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: c4ac3cc5, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
The message "E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-gv0-replicate-0: Failing STAT on gfid 00000000-0000-0000-0000-000000000001: split-brain observed. [Input/output error]" repeated 2 times between [2018-07-03 18:16:27.694903] and [2018-07-03 18:17:02.310689]
[2018-07-03 18:17:02.310722] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 2a6f2526, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
[2018-07-03 18:17:02.628990] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-gv0-replicate-0: Failing STAT on gfid 00000000-0000-0000-0000-000000000001: split-brain observed. [Input/output error]
[2018-07-03 18:17:02.629023] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 2b6f2526, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
[2018-07-03 18:17:00.398601] I [MSGID: 108031] [afr-common.c:2458:afr_local_discovery_cbk] 0-gv0-replicate-0: selecting local read_child gv0-client-2
[2018-07-03 18:17:01.666671] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-gv0-replicate-0: no read subvols for /
[2018-07-03 18:51:43.509385] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-gv0-replicate-0: no read subvols for /
[2018-07-03 18:51:43.936826] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-gv0-replicate-0: Failing STAT on gfid 00000000-0000-0000-0000-000000000001: split-brain observed. [Input/output error]
[2018-07-03 18:51:43.936868] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 19b1731e, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
[2018-07-03 18:51:44.278901] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 1ab1731e, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
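As a rough aid for reading the log excerpt above, the following Python sketch counts entries by severity and MSGID and collects the gfids they mention. The regular expressions are inferred from the lines quoted in this post only, not taken from any Gluster specification, and the names ENTRY, GFID, ROOT_GFID, SAMPLE and summarize are made up for illustration. The all-zero gfid ending in 1 is the one assigned to the volume root, which is consistent with the "/" path in the NFS warnings.

```python
import re
from collections import Counter

# Header layout inferred from the quoted lines (an assumption, not a spec):
# [timestamp] LEVEL [MSGID: n] [file:line:function] ...
ENTRY = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) \[MSGID: (?P<msgid>\d+)\] "
    r"\[(?P<src>[^\]]+)\]"
)
GFID = re.compile(r"gfid ([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})")

# gfid of the volume root directory.
ROOT_GFID = "00000000-0000-0000-0000-000000000001"

# Three entries copied from the log above.
SAMPLE = """\
[2018-07-03 18:17:02.628990] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-gv0-replicate-0: Failing STAT on gfid 00000000-0000-0000-0000-000000000001: split-brain observed. [Input/output error]
[2018-07-03 18:17:02.629023] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 2b6f2526, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
[2018-07-03 18:17:01.666671] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-gv0-replicate-0: no read subvols for /
"""


def summarize(log_text):
    """Count entries per (level, msgid) and collect every gfid mentioned."""
    counts = Counter()
    for match in ENTRY.finditer(log_text):
        counts[(match.group("level"), int(match.group("msgid")))] += 1
    gfids = set(GFID.findall(log_text))
    return counts, gfids


counts, gfids = summarize(SAMPLE)
for (level, msgid), n in sorted(counts.items()):
    print(f"{level} MSGID {msgid}: {n}")
print("mentions volume root:", ROOT_GFID in gfids)
```

Lines of the form 'The message "..." repeated N times' carry no leading timestamp, so this sketch deliberately does not count them as separate entries.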
Anh Vo
2018-Jul-03  20:17 UTC
[Gluster-users] Failed to mount nfs due to split-brain and Input/Output Error
Actually, we just discovered that the heal info command returns different results depending on which node of our 3-replica setup it is run on. When we run it on node2, "/" is not reported as being in split-brain, but when I run it on node0 or node1 I see:

x@gfs-vm001:~$ sudo gluster volume heal gv0 info | tee heal-info
Brick gfs-vm000:/gluster/brick/brick0
<gfid:81289110-867b-42ff-ba3b-1373a187032b>
/ - Is in split-brain
Status: Connected
Number of entries: 2

Brick gfs-vm001:/gluster/brick/brick0
/ - Is in split-brain
<gfid:81289110-867b-42ff-ba3b-1373a187032b>
Status: Connected
Number of entries: 2

Brick gfs-vm002:/gluster/brick/brick0
/ - Is in split-brain
Status: Connected
Number of entries: 1

I ran getfattr -d -m . -e hex /gluster/brick/brick0 on all three nodes, and node2 has slightly different attributes:

node0:
sudo getfattr -d -m . -e hex /gluster/brick/brick0
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick/brick0
trusted.afr.gv0-client-2=0x000000000000000100000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

node1:
sudo getfattr -d -m . -e hex /gluster/brick/brick0
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick/brick0
trusted.afr.gv0-client-2=0x000000000000000100000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

node2:
sudo getfattr -d -m . -e hex /gluster/brick/brick0
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick/brick0
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-0=0x000000000000000200000000
trusted.afr.gv0-client-1=0x000000000000000200000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

Where do I go from here?

Thanks

On Tue, Jul 3, 2018 at 11:54 AM, Anh Vo <vtqanh at gmail.com> wrote:
> I am trying to mount nfs to gluster volume and got mount.nfs failure.
> Looking at nfs.log I am seeing these entries
>
> Heal info does not show the mentioned gfid ( 00000000-0000-0000-0000-
> 000000000001 ) being in split-brain.
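To make the getfattr output above easier to compare, here is a small Python sketch that decodes the trusted.afr.* values. It relies on the usual AFR changelog layout, in which each 12-byte value holds three big-endian 32-bit pending counters for data, metadata and entry operations, in that order. The mapping of node0, node1 and node2 to gv0-client-0, gv0-client-1 and gv0-client-2 is an assumption based on the brick order in the heal info output, and the names decode_afr, XATTRS, blames and mutual are invented for this example.

```python
import struct


def decode_afr(hex_value):
    """Split a 12-byte trusted.afr.* value into (data, metadata, entry) counters."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    return struct.unpack(">III", raw)


# Values copied from the getfattr output above, keyed as
# {brick index: {client index named in the xattr: value}}.
# Assumption: node N hosts the brick that AFR calls gv0-client-N.
XATTRS = {
    0: {2: "0x000000000000000100000000"},
    1: {2: "0x000000000000000100000000"},
    2: {
        0: "0x000000000000000200000000",
        1: "0x000000000000000200000000",
        2: "0x000000000000000000000000",
    },
}


def blames(xattrs, kind):
    """Return (accuser, accused) pairs whose counter of the given kind is non-zero."""
    index = {"data": 0, "metadata": 1, "entry": 2}[kind]
    pairs = set()
    for accuser, per_client in xattrs.items():
        for accused, value in per_client.items():
            if accuser != accused and decode_afr(value)[index] != 0:
                pairs.add((accuser, accused))
    return pairs


def mutual(pairs):
    """Keep only the pairs of bricks that accuse each other."""
    return {(a, b) for (a, b) in pairs if a < b and (b, a) in pairs}


for kind in ("data", "metadata", "entry"):
    found = blames(XATTRS, kind)
    print(kind, "blames:", sorted(found), "mutual:", sorted(mutual(found)))
```

Read this way, the only non-zero counters are metadata counters: node0 and node1 each record pending metadata operations against client-2, while node2 records them against client-0 and client-1. Bricks accusing each other like this is the pattern that the "split-brain observed" message in nfs.log refers to, here on the metadata of "/".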