Anh Vo
2018-Jul-03 20:17 UTC
[Gluster-users] Failed to mount nfs due to split-brain and Input/Output Error
Actually, we just discovered that the heal info command returns different results depending on which node of our 3-replica setup it is run on. When we run it on node2, "/" is not reported as being in split-brain, but when I run it on node0 and node1 I see:

x@gfs-vm001:~$ sudo gluster volume heal gv0 info | tee heal-info
Brick gfs-vm000:/gluster/brick/brick0
<gfid:81289110-867b-42ff-ba3b-1373a187032b>
/ - Is in split-brain

Status: Connected
Number of entries: 2

Brick gfs-vm001:/gluster/brick/brick0
/ - Is in split-brain

<gfid:81289110-867b-42ff-ba3b-1373a187032b>
Status: Connected
Number of entries: 2

Brick gfs-vm002:/gluster/brick/brick0
/ - Is in split-brain

Status: Connected
Number of entries: 1

I ran getfattr -d -m . -e hex /gluster/brick/brick0 on all three nodes, and node2 has slightly different attributes:

node0:
sudo getfattr -d -m . -e hex /gluster/brick/brick0
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick/brick0
trusted.afr.gv0-client-2=0x000000000000000100000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

node1:
sudo getfattr -d -m . -e hex /gluster/brick/brick0
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick/brick0
trusted.afr.gv0-client-2=0x000000000000000100000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

node2:
sudo getfattr -d -m . -e hex /gluster/brick/brick0
getfattr: Removing leading '/' from absolute path names
# file: gluster/brick/brick0
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gv0-client-0=0x000000000000000200000000
trusted.afr.gv0-client-1=0x000000000000000200000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

Where do I go from here?

Thanks

On Tue, Jul 3, 2018 at 11:54 AM, Anh Vo <vtqanh at gmail.com> wrote:
> I am trying to mount nfs to gluster volume and got mount.nfs failure.
> Looking at nfs.log I am seeing these entries
>
> Heal info does not show the mentioned gfid (00000000-0000-0000-0000-000000000001) being in split-brain.
>
> [2018-07-03 18:16:27.694953] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: c3ac3cc5, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
> [2018-07-03 18:16:28.204685] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: c4ac3cc5, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
> The message "E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-gv0-replicate-0: Failing STAT on gfid 00000000-0000-0000-0000-000000000001: split-brain observed. [Input/output error]" repeated 2 times between [2018-07-03 18:16:27.694903] and [2018-07-03 18:17:02.310689]
> [2018-07-03 18:17:02.310722] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 2a6f2526, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
> [2018-07-03 18:17:02.628990] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-gv0-replicate-0: Failing STAT on gfid 00000000-0000-0000-0000-000000000001: split-brain observed. [Input/output error]
> [2018-07-03 18:17:02.629023] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 2b6f2526, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
> [2018-07-03 18:17:00.398601] I [MSGID: 108031] [afr-common.c:2458:afr_local_discovery_cbk] 0-gv0-replicate-0: selecting local read_child gv0-client-2
> [2018-07-03 18:17:01.666671] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-gv0-replicate-0: no read subvols for /
> [2018-07-03 18:51:43.509385] W [MSGID: 108027] [afr-common.c:2821:afr_discover_done] 0-gv0-replicate-0: no read subvols for /
> [2018-07-03 18:51:43.936826] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-gv0-replicate-0: Failing STAT on gfid 00000000-0000-0000-0000-000000000001: split-brain observed. [Input/output error]
> [2018-07-03 18:51:43.936868] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 19b1731e, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
> [2018-07-03 18:51:44.278901] W [MSGID: 112199] [nfs3-helpers.c:3414:nfs3_log_common_res] 0-nfs-nfsv3: / => (XID: 1ab1731e, FSINFO: NFS: 5(I/O error), POSIX: 5(Input/output error))
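
A note on reading the trusted.afr.* values in the getfattr dumps above: each value is 12 bytes, which AFR treats as three big-endian 32-bit counters of pending data, metadata, and entry operations against the named client. The sketch below decodes the values posted in this thread. The function name decode_afr_changelog is made up for illustration, and the dictionary simply copies the hex strings from the dumps.

```python
import struct


def decode_afr_changelog(hex_value: str) -> dict:
    """Split a trusted.afr.* value into its three pending counters.

    Hypothetical helper for illustration; not part of any Gluster tool.
    """
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit big-endian integers: data, metadata, entry.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the getfattr output on the brick root of each node.
xattrs = {
    "node0": {"gv0-client-2": "0x000000000000000100000000"},
    "node1": {"gv0-client-2": "0x000000000000000100000000"},
    "node2": {
        "gv0-client-0": "0x000000000000000200000000",
        "gv0-client-1": "0x000000000000000200000000",
        "gv0-client-2": "0x000000000000000000000000",
    },
}

for node, attrs in xattrs.items():
    for client, value in attrs.items():
        print(node, client, decode_afr_changelog(value))
```

Only the metadata counter is non-zero anywhere: node0 and node1 each record one pending metadata operation against gv0-client-2, while node2 records two against gv0-client-0 and gv0-client-1. If gv0-client-N maps to the brick on nodeN (the usual ordering, but not confirmed in this thread), the bricks are blaming each other, which is consistent with a metadata split-brain on "/".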
Ravishankar N
2018-Jul-04 03:02 UTC
[Gluster-users] Failed to mount nfs due to split-brain and Input/Output Error
Hi,

What version of gluster are you using?

1. The afr xattrs on '/' indicate a metadata split-brain. You can resolve it using one of the policies listed in https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
For example: gluster volume heal gv0 split-brain latest-mtime /

2. Is the file corresponding to the other gfid (81289110-867b-42ff-ba3b-1373a187032b) present on all bricks? What do the getfattr outputs for this file indicate?

3. As for the discrepancy in the output of heal info, is node2 connected to the other nodes? Does heal info still print the details of all 3 bricks when you run it on node2?

-Ravi
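
For question 2, a gfid that heal info prints without a path can usually be located on each brick under the .glusterfs directory, where the entry sits at .glusterfs/<first two hex characters>/<next two>/<full gfid>. A minimal sketch of that path construction follows; gfid_backend_path is a hypothetical helper name, and the brick root is taken from the heal info output in this thread.

```python
import uuid
from pathlib import PurePosixPath


def gfid_backend_path(brick_root: str, gfid: str) -> PurePosixPath:
    """Build the .glusterfs path for a gfid on a brick.

    Hypothetical helper for illustration; not part of any Gluster tool.
    """
    # Validate and normalise to the lowercase hyphenated form.
    canonical = str(uuid.UUID(gfid))
    return (
        PurePosixPath(brick_root)
        / ".glusterfs"
        / canonical[:2]
        / canonical[2:4]
        / canonical
    )


print(gfid_backend_path("/gluster/brick/brick0",
                        "81289110-867b-42ff-ba3b-1373a187032b"))
```

Running the same getfattr -d -m . -e hex command against that path on each of the three bricks would show whether the entry exists everywhere and what its afr xattrs say.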