Gambit15
2018-Jul-01 20:15 UTC
[Gluster-users] Files not healing & missing their extended attributes - Help!
Hi Ashish,

The output is below. It's a replica 2+1 volume. The arbiter is offline for maintenance at the moment; however, quorum is met and no files are reported as in split-brain (it hosts VMs, so files aren't accessed concurrently).

=====================
[root@v0 glusterfs]# gluster volume info engine

Volume Name: engine
Type: Replicate
Volume ID: 279737d3-3e5a-4ee9-8d4a-97edcca42427
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: s0:/gluster/engine/brick
Brick2: s1:/gluster/engine/brick
Brick3: s2:/gluster/engine/arbiter (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
performance.low-prio-threads: 32

=====================
[root@v0 glusterfs]# gluster volume heal engine info

Brick s0:/gluster/engine/brick
/__DIRECT_IO_TEST__
/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
/98495dbc-a29c-4893-b6a0-0aa70860d0c9
<LIST TRUNCATED FOR BREVITY>
Status: Connected
Number of entries: 34

Brick s1:/gluster/engine/brick
<SAME AS ABOVE - TRUNCATED FOR BREVITY>
Status: Connected
Number of entries: 34

Brick s2:/gluster/engine/arbiter
Status: Ponto final de transporte não está conectado (Transport endpoint is not connected)
Number of entries: -

=====================

=== PEER V0 ===
[root@v0 glusterfs]# getfattr -m . -d -e hex /gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
getfattr: Removing leading '/' from absolute path names
# file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-2=0x0000000000000000000024e8
trusted.gfid=0xdb9afb92d2bc49ed8e34dcd437ba7be2
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@v0 glusterfs]# getfattr -m . -d -e hex /gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/*
getfattr: Removing leading '/' from absolute path names
# file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.lockspace
security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f743a733000

# file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.metadata
security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f743a733000

=== PEER V1 ===
[root@v1 glusterfs]# getfattr -m . -d -e hex /gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
getfattr: Removing leading '/' from absolute path names
# file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-2=0x0000000000000000000024ec
trusted.gfid=0xdb9afb92d2bc49ed8e34dcd437ba7be2
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

=====================
cmd_history.log-20180701:
[2018-07-01 03:11:38.461175] : volume heal engine full : SUCCESS
[2018-07-01 03:11:51.151891] : volume heal data full : SUCCESS

glustershd.log-20180701:
<LOGS FROM 06/01 TRUNCATED>
[2018-07-01 07:15:04.779122] I [MSGID: 100011] [glusterfsd.c:1396:reincarnate] 0-glusterfsd: Fetching the volume file from server...
glustershd.log:
[2018-07-01 07:15:04.779693] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

That's the *only* message in glustershd.log today.

=====================
[root@v0 glusterfs]# gluster volume status engine
Status of volume: engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick s0:/gluster/engine/brick              49154     0          Y       2816
Brick s1:/gluster/engine/brick              49154     0          Y       3995
Self-heal Daemon on localhost               N/A       N/A        Y       2919
Self-heal Daemon on s1                      N/A       N/A        Y       4013

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks

=====================

Okay, so actually only the ha_agent directory is listed for healing (not its contents), and it does have its attributes set.

Many thanks for the reply!

On 1 July 2018 at 15:34, Ashish Pandey <aspandey at redhat.com> wrote:

> You have not even talked about the volume type and configuration, and this
> issue would require a lot of other information to fix it.
>
> 1 - What is the type of volume and its configuration?
> 2 - Provide the gluster v <volname> info output.
> 3 - Heal info output.
> 4 - getxattr of one of the files which needs healing, from all the bricks.
> 5 - What led to the healing of the file?
> 6 - gluster v <volname> status
> 7 - glustershd.log output just after you run a full heal or index heal.
>
> ----
> Ashish
>
> ------------------------------
> *From: *"Gambit15" <dougti+gluster at gmail.com>
> *To: *"gluster-users" <gluster-users at gluster.org>
> *Sent: *Sunday, July 1, 2018 11:50:16 PM
> *Subject: *[Gluster-users] Files not healing & missing their extended attributes - Help!
>
> Hi Guys,
> I had to restart our datacenter yesterday, but since doing so a number of
> the files on my gluster share have been stuck, marked as healing. After no
> signs of progress, I manually set off a full heal last night, but after
> 24hrs, nothing's happened.
>
> The gluster logs all look normal, and there are no messages about failed
> connections or heal processes kicking off.
>
> I checked the listed files' extended attributes on their bricks today, and
> they only show the selinux attribute. There's none of the trusted.*
> attributes I'd expect.
> The healthy files on the bricks do have their extended attributes, though.
>
> I'm guessing that perhaps the files somehow lost their attributes, and
> gluster is no longer able to work out what to do with them? It's not logged
> any errors, warnings, or anything else out of the ordinary though, so I've
> no idea what the problem is or how to resolve it.
>
> I've got 16 hours to get this sorted before the start of work, Monday.
> Help!
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
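[As an aside, the per-brick xattr dumps above can be gathered in one pass. This is only a sketch, not from the thread: it assumes passwordless SSH from the admin host to the brick servers s0, s1 and s2, and uses the brick paths shown in the volume info. It simply prints one getfattr block per brick so the trusted.afr.* counters and trusted.gfid can be compared side by side.]

# Sketch: compare the xattrs of one entry needing heal across all three bricks (assumes SSH access).
ENTRY=98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
for B in s0:/gluster/engine/brick s1:/gluster/engine/brick s2:/gluster/engine/arbiter; do
    HOST=${B%%:*}; BRICK=${B#*:}
    echo "=== $HOST ==="
    ssh "$HOST" "getfattr -m . -d -e hex $BRICK/$ENTRY"
done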
Ashish Pandey
2018-Jul-02 02:37 UTC
[Gluster-users] Files not healing & missing their extended attributes - Help!
The only problem at the moment is that the arbiter brick is offline. Your only concern should be completing the arbiter brick's maintenance as soon as possible. Bring that brick up, start a full heal or an index heal, and the volume will return to a healthy state.

---
Ashish

----- Original Message -----
From: "Gambit15" <dougti+gluster at gmail.com>
To: "Ashish Pandey" <aspandey at redhat.com>
Cc: "gluster-users" <gluster-users at gluster.org>
Sent: Monday, July 2, 2018 1:45:01 AM
Subject: Re: [Gluster-users] Files not healing & missing their extended attributes - Help!

<QUOTED MESSAGE TRUNCATED FOR BREVITY - IDENTICAL TO THE REPLY ABOVE>
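[For anyone hitting the same situation, the sequence Ashish describes maps onto the standard Gluster CLI roughly as below. This is a sketch only: the volume name is taken from this thread, the service commands assume a systemd-based host, and exact behaviour can differ between Gluster releases.]

# On the arbiter node (s2), once maintenance is finished, make sure the management daemon is running:
systemctl start glusterd

# Check whether the arbiter brick process came back up (Online should show Y for s2):
gluster volume status engine

# If the brick still shows offline, force-start the volume to respawn any missing brick processes:
gluster volume start engine force

# Trigger a heal and watch the entry counts drain:
gluster volume heal engine full
gluster volume heal engine info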