thr3ads.net - Gluster users - [Gluster-users] Files not healing & missing their extended attributes

Karthik Subrahmanya

2018-Jul-04 09:26 UTC

[Gluster-users] Files not healing & missing their extended attributes - Help!

Hi,
From the logs you have pasted, it looks like those files are in GFID split-brain.
They should have the GFIDs assigned on both the data bricks but they will
be different.

Can you please paste the getfattr output of those files and their parent
from all the bricks again?
Which version of gluster are you using?

If you are using a version higher than or equal to 3.12, gfid split-brains
can be resolved using the methods (except method 4) explained in the
"Resolution of split-brain using gluster CLI" section in [1].
Also note that for gfid split-brain resolution using CLI you have to pass
the name of the file as argument and not the GFID.
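
As a rough sketch of what those CLI invocations look like for this volume (assuming 3.12 or later): the helper name `split_brain_cmd` is hypothetical and only prints the command as a dry run, while the volume name, brick and file path are taken from this thread. The file is given as a path relative to the volume root, not as a GFID.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints a split-brain resolution
# command instead of running it, so it can be reviewed first.
split_brain_cmd() {
    vol=$1
    policy=$2
    shift 2
    printf 'gluster volume heal %s split-brain %s' "$vol" "$policy"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Path as seen from the volume root (taken from the client log).
FILE=/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.metadata

# Keep the copy with the most recent modification time.
split_brain_cmd engine latest-mtime "$FILE"
# Keep the copy on a brick chosen by the administrator.
split_brain_cmd engine source-brick s0:/gluster/engine/brick "$FILE"
```

Which brick to treat as the source is a judgment call for the administrator; the sketch above does not decide that.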

If it is lower than 3.12 (please consider upgrading, since those versions
are EOL), you have to resolve it manually as explained in [2].

[1] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
[2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain

Thanks & Regards,
Karthik

On Wed, Jul 4, 2018 at 1:59 AM Gambit15 <dougti+gluster at gmail.com> wrote:
> On 1 July 2018 at 22:37, Ashish Pandey <aspandey at redhat.com> wrote:
>
>>
>> The only problem at the moment is that the arbiter brick is offline. You
>> should only be concerned with completing the maintenance of the arbiter
>> brick ASAP. Bring this brick up, start a full heal or index heal, and the
>> volume will be in a healthy state.
>>
>
> Doesn't the arbiter only resolve split-brain situations? None of the files
> that have been marked for healing are marked as in split-brain.
>
> The arbiter has now been brought back up, however the problem continues.
>
> I've found the following information in the client log:
>
> [2018-07-03 19:09:29.245089] W [MSGID: 108008]
> [afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check]
> 0-engine-replicate-0: GFID mismatch for
> <gfid:db9afb92-d2bc-49ed-8e34-dcd437ba7be2>/hosted-engine.metadata
> 5e95ba8c-2f12-49bf-be2d-b4baf210d366 on engine-client-1 and
> b9cd7613-3b96-415d-a549-1dc788a4f94d on engine-client-0
> [2018-07-03 19:09:29.245585] W [fuse-bridge.c:471:fuse_entry_cbk]
> 0-glusterfs-fuse: 10430040: LOOKUP()
> /98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.metadata => -1
> (Input/output error)
> [2018-07-03 19:09:30.619000] W [MSGID: 108008]
> [afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check]
> 0-engine-replicate-0: GFID mismatch for
> <gfid:db9afb92-d2bc-49ed-8e34-dcd437ba7be2>/hosted-engine.lockspace
> 8e86902a-c31c-4990-b0c5-0318807edb8f on engine-client-1 and
> e5899a4c-dc5d-487e-84b0-9bbc73133c25 on engine-client-0
> [2018-07-03 19:09:30.619360] W [fuse-bridge.c:471:fuse_entry_cbk]
> 0-glusterfs-fuse: 10430656: LOOKUP()
> /98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.lockspace => -1
> (Input/output error)
>
> As you can see from the logs I posted previously, neither of those two
> files, on either of the two servers, have any of gluster's extended
> attributes set.
>
> The arbiter doesn't have any record of the files in question, as they were
> created after it went offline.
>
> How do I fix this? Is it possible to locate the correct gfids somewhere &
> redefine them on the files manually?
>
> Cheers,
>  Doug
>
> ------------------------------
>> *From: *"Gambit15" <dougti+gluster at gmail.com>
>> *To: *"Ashish Pandey" <aspandey at redhat.com>
>> *Cc: *"gluster-users" <gluster-users at gluster.org>
>> *Sent: *Monday, July 2, 2018 1:45:01 AM
>> *Subject: *Re: [Gluster-users] Files not healing & missing their
>> extended attributes - Help!
>>
>>
>> Hi Ashish,
>>
>> The output is below. It's a rep 2+1 volume. The arbiter is offline for
>> maintenance at the moment, however quorum is met & no files are reported as
>> in split-brain (it hosts VMs, so files aren't accessed concurrently).
>>
>> =====================
>>
>> [root at v0 glusterfs]# gluster volume info engine
>>
>> Volume Name: engine
>> Type: Replicate
>> Volume ID: 279737d3-3e5a-4ee9-8d4a-97edcca42427
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: s0:/gluster/engine/brick
>> Brick2: s1:/gluster/engine/brick
>> Brick3: s2:/gluster/engine/arbiter (arbiter)
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> performance.low-prio-threads: 32
>>
>> =====================
>>
>> [root at v0 glusterfs]# gluster volume heal engine info
>> Brick s0:/gluster/engine/brick
>> /__DIRECT_IO_TEST__
>> /98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
>> /98495dbc-a29c-4893-b6a0-0aa70860d0c9
>> <LIST TRUNCATED FOR BREVITY>
>> Status: Connected
>> Number of entries: 34
>>
>> Brick s1:/gluster/engine/brick
>> <SAME AS ABOVE - TRUNCATED FOR BREVITY>
>> Status: Connected
>> Number of entries: 34
>>
>> Brick s2:/gluster/engine/arbiter
>> Status: Transport endpoint is not connected
>> Number of entries: -
>>
>> =====================
>>
>> === PEER V0 ===
>>
>> [root at v0 glusterfs]# getfattr -m . -d -e hex
>> /gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
>> getfattr: Removing leading '/' from absolute path names
>> # file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.engine-client-2=0x0000000000000000000024e8
>> trusted.gfid=0xdb9afb92d2bc49ed8e34dcd437ba7be2
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>
>> [root at v0 glusterfs]# getfattr -m . -d -e hex
>> /gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/*
>> getfattr: Removing leading '/' from absolute path names
>> # file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.lockspace
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f743a733000
>>
>> # file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.metadata
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f743a733000
>>
>> === PEER V1 ===
>>
>> [root at v1 glusterfs]# getfattr -m . -d -e hex
>> /gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
>> getfattr: Removing leading '/' from absolute path names
>> # file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.engine-client-2=0x0000000000000000000024ec
>> trusted.gfid=0xdb9afb92d2bc49ed8e34dcd437ba7be2
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>
>> =====================
>>
>> cmd_history.log-20180701:
>>
>> [2018-07-01 03:11:38.461175]  : volume heal engine full : SUCCESS
>> [2018-07-01 03:11:51.151891]  : volume heal data full : SUCCESS
>>
>> glustershd.log-20180701:
>> <LOGS FROM 06/01 TRUNCATED>
>> [2018-07-01 07:15:04.779122] I [MSGID: 100011]
>> [glusterfsd.c:1396:reincarnate] 0-glusterfsd: Fetching the volume file from
>> server...
>>
>> glustershd.log:
>> [2018-07-01 07:15:04.779693] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
>> 0-glusterfs: No change in volfile, continuing
>>
>> That's the *only* message in glustershd.log today.
>>
>> =====================
>>
>> [root at v0 glusterfs]# gluster volume status engine
>> Status of volume: engine
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick s0:/gluster/engine/brick              49154     0          Y       2816
>> Brick s1:/gluster/engine/brick              49154     0          Y       3995
>> Self-heal Daemon on localhost               N/A       N/A        Y       2919
>> Self-heal Daemon on s1                      N/A       N/A        Y       4013
>>
>> Task Status of Volume engine
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> =====================
>>
>> Okay, so actually only the directory ha_agent is listed for healing (not
>> its contents), & that does have attributes set.
>>
>> Many thanks for the reply!
>>
>>
>> On 1 July 2018 at 15:34, Ashish Pandey <aspandey at redhat.com> wrote:
>>
>>> You have not mentioned the volume type or configuration, and
>>> fixing this issue will require a lot of other information.
>>>
>>> 1 - What is the type of volume and its config?
>>> 2 - Provide the gluster v <volname> info output
>>> 3 - Heal info output
>>> 4 - getxattr output of one of the files that needs healing, from all the
>>> bricks.
>>> 5 - What led to the healing of the file?
>>> 6 - gluster v <volname> status
>>> 7 - glustershd.log output just after you run full heal or index heal
>>>
>>> ----
>>> Ashish
>>>
>>> ------------------------------
>>> *From: *"Gambit15" <dougti+gluster at gmail.com>
>>> *To: *"gluster-users" <gluster-users at gluster.org>
>>> *Sent: *Sunday, July 1, 2018 11:50:16 PM
>>> *Subject: *[Gluster-users] Files not healing & missing their
>>> extended attributes - Help!
>>>
>>>
>>> Hi Guys,
>>>  I had to restart our datacenter yesterday, but since doing so a number
>>> of the files on my gluster share have been stuck, marked as healing. After
>>> no signs of progress, I manually set off a full heal last night, but after
>>> 24hrs, nothing's happened.
>>>
>>> The gluster logs all look normal, and there're no messages about failed
>>> connections or heal processes kicking off.
>>>
>>> I checked the listed files' extended attributes on their bricks today,
>>> and they only show the selinux attribute. There's none of the trusted.*
>>> attributes I'd expect.
>>> The healthy files on the bricks do have their extended attributes though.
>>>
>>> I'm guessing that perhaps the files somehow lost their attributes, and
>>> gluster is no longer able to work out what to do with them? It's not logged
>>> any errors, warnings, or anything else out of the normal though, so I've no
>>> idea what the problem is or how to resolve it.
>>>
>>> I've got 16 hours to get this sorted before the start of work, Monday.
>>> Help!
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>

Gambit15

2018-Jul-05 00:39 UTC

[Gluster-users] Files not healing & missing their extended attributes - Help!

Hi Karthik,
 Many thanks for the response!

On 4 July 2018 at 05:26, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
> Hi,
>
> From the logs you have pasted it looks like those files are in GFID
> split-brain.
> They should have the GFIDs assigned on both the data bricks but they will
> be different.
>
> Can you please paste the getfattr output of those files and their parent
> from all the bricks again?
>
The files don't have any attributes set, however I did manage to find their
corresponding entries in .glusterfs

=================================

[root at v0 .glusterfs]# getfattr -m . -d -e hex
/gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
getfattr: Removing leading '/' from absolute path names
# file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-2=0x0000000000000000000024ea
trusted.gfid=0xdb9afb92d2bc49ed8e34dcd437ba7be2
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root at v0 .glusterfs]# getfattr -m . -d -e hex
/gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/*
getfattr: Removing leading '/' from absolute path names
# file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.lockspace
security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f743a733000

# file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.metadata
security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f743a733000

[root at v0 .glusterfs]# ls -l
/gluster/engine/brick/.glusterfs/db/9a/db9afb92-d2bc-49ed-8e34-dcd437ba7be2/
total 0
lrwxrwxrwx. 2 vdsm kvm 132 Jun 30 14:55 hosted-engine.lockspace ->
/var/run/vdsm/storage/98495dbc-a29c-4893-b6a0-0aa70860d0c9/2502aff4-6c67-4643-b681-99f2c87e793d/03919182-6be2-4cbc-aea2-b9d68422a800
lrwxrwxrwx. 2 vdsm kvm 132 Jun 30 14:55 hosted-engine.metadata ->
/var/run/vdsm/storage/98495dbc-a29c-4893-b6a0-0aa70860d0c9/99510501-6bdc-485a-98e8-c2f82ff8d519/71fa7e6c-cdfb-4da8-9164-2404b518d0ee

=================================
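
For readers decoding the `trusted.afr.engine-client-2` value in the getfattr output above: to the best of my understanding, AFR changelog attributes pack three 32-bit big-endian counters (pending data, metadata and entry operations) into 12 bytes. A small sketch under that assumption; `decode_afr` is a hypothetical helper name, not a gluster tool.

```shell
#!/bin/sh
# Split a 12-byte AFR changelog xattr (as printed by getfattr -e hex)
# into its three counters. Assumes the data/metadata/entry layout.
decode_afr() {
    hex=${1#0x}
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((0x$data))" "$((0x$meta))" "$((0x$entry))"
}

decode_afr 0x0000000000000000000024ea
# prints: data=0 metadata=0 entry=9450
```

Read this way, the non-zero entry counter on the `ha_agent` directory blames brick index 2 (the arbiter), which would be consistent with entries having been created in that directory while the arbiter was offline.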
Again, here are the relevant client log entries:

[2018-07-03 19:09:29.245089] W [MSGID: 108008]
[afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check]
0-engine-replicate-0: GFID mismatch for
<gfid:db9afb92-d2bc-49ed-8e34-dcd437ba7be2>/hosted-engine.metadata
5e95ba8c-2f12-49bf-be2d-b4baf210d366 on engine-client-1 and
b9cd7613-3b96-415d-a549-1dc788a4f94d on engine-client-0
[2018-07-03 19:09:29.245585] W [fuse-bridge.c:471:fuse_entry_cbk]
0-glusterfs-fuse: 10430040: LOOKUP()
/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.metadata => -1
(Input/output error)
[2018-07-03 19:09:30.619000] W [MSGID: 108008]
[afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check]
0-engine-replicate-0: GFID mismatch for
<gfid:db9afb92-d2bc-49ed-8e34-dcd437ba7be2>/hosted-engine.lockspace
8e86902a-c31c-4990-b0c5-0318807edb8f on engine-client-1 and
e5899a4c-dc5d-487e-84b0-9bbc73133c25 on engine-client-0
[2018-07-03 19:09:30.619360] W [fuse-bridge.c:471:fuse_entry_cbk]
0-glusterfs-fuse: 10430656: LOOKUP()
/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.lockspace =>
-1 (Input/output error)

[root at v0 .glusterfs]# find . -type f | grep -E
"5e95ba8c-2f12-49bf-be2d-b4baf210d366|8e86902a-c31c-4990-b0c5-0318807edb8f|b9cd7613-3b96-415d-a549-1dc788a4f94d|e5899a4c-dc5d-487e-84b0-9bbc73133c25"
[root at v0 .glusterfs]#
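
As an alternative to grepping the whole `.glusterfs` tree, the expected location of each GFID entry can be computed directly, since a brick stores it under `.glusterfs/<first two hex chars>/<next two>/<gfid>` (the same layout as the `db/9a/db9afb92-...` path shown earlier). A minimal sketch; `extract_gfids` and `gfid_to_path` are hypothetical helper names, and the log line is shortened from the client log above.

```shell
#!/bin/sh
# Print every GFID-shaped token found on standard input, one per line.
extract_gfids() {
    grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
}

# Build the expected .glusterfs entry for a GFID on a given brick.
gfid_to_path() {
    brick=$1
    gfid=$2
    d1=$(printf '%s' "$gfid" | cut -c1-2)
    d2=$(printf '%s' "$gfid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$gfid"
}

LINE='GFID mismatch for <gfid:db9afb92-d2bc-49ed-8e34-dcd437ba7be2>/hosted-engine.metadata 5e95ba8c-2f12-49bf-be2d-b4baf210d366 on engine-client-1 and b9cd7613-3b96-415d-a549-1dc788a4f94d on engine-client-0'

# List where each GFID from the log line would live on this brick.
printf '%s\n' "$LINE" | extract_gfids | while read -r gfid; do
    gfid_to_path /gluster/engine/brick "$gfid"
done
```

Each printed path can then be checked with `ls -l` on the brick; a missing path means the brick has no entry for that GFID.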

=================================
> Which version of gluster you are using?
>
3.8.5
An upgrade is on the books, however I had to go back on my last attempt as
3.12 didn't work with 3.8 & I was unable to do a live rolling upgrade. Once
I've got this GFID mess sorted out, I'll give a full upgrade a go as I've
already had to fail over this cluster's services to another cluster.

> If you are using a version higher than or equal to 3.12 gfid split brains
> can be resolved using the methods (except method 4)
> explained in the "Resolution of split-brain using gluster CLI" section in
> [1].
> Also note that for gfid split-brain resolution using CLI you have to pass
> the name of the file as argument and not the GFID.
>
> If it is lower than 3.12 (Please consider upgrading them since they are
> EOL) you have to resolve it manually as explained in [2]
>
> [1] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
> [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain
>
"The user needs to remove either file '1' on brick-a or the file '1' on
brick-b to resolve the split-brain. In addition, the corresponding
gfid-link file also needs to be removed."
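
The quoted procedure can be sketched as a dry run that only prints what would be removed from the one brick chosen as the bad copy. The helper name `removal_plan` and the brick, path and GFID values are placeholders for illustration; in a real case the GFID would come from the file's `trusted.gfid` attribute on that brick, which the files in this thread do not have.

```shell
#!/bin/sh
# Dry run (hypothetical helper): list the two paths the manual procedure
# removes on ONE brick: the file itself and its gfid-link in .glusterfs.
removal_plan() {
    brick=$1
    relpath=$2
    gfid=$3
    d1=$(printf '%s' "$gfid" | cut -c1-2)
    d2=$(printf '%s' "$gfid" | cut -c3-4)
    printf 'rm %s/%s\n' "$brick" "$relpath"
    printf 'rm %s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$gfid"
}

# Placeholder values for illustration only.
removal_plan /bricks/brick-b dir/1 0a1b2c3d-0000-4000-8000-000000000001
```

Printing the plan first makes it easier to confirm that both paths are on the intended brick before anything is deleted.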

Okay, so as you can see above, the files don't have a trusted.gfid
attribute, & on the brick I didn't find any files in .glusterfs named after
the GFIDs reported in the client log. I did however find the symlinked files
in a .glusterfs directory under the parent directory's GFID.

[root at v0 .glusterfs]# ls -l
/gluster/engine/brick/.glusterfs/db/9a/db9afb92-d2bc-49ed-8e34-dcd437ba7be2/
total 0
lrwxrwxrwx. 2 vdsm kvm 132 Jun 30 14:55 hosted-engine.lockspace ->
/var/run/vdsm/storage/98495dbc-a29c-4893-b6a0-0aa70860d0c9/2502aff4-6c67-4643-b681-99f2c87e793d/03919182-6be2-4cbc-aea2-b9d68422a800
lrwxrwxrwx. 2 vdsm kvm 132 Jun 30 14:55 hosted-engine.metadata ->
/var/run/vdsm/storage/98495dbc-a29c-4893-b6a0-0aa70860d0c9/99510501-6bdc-485a-98e8-c2f82ff8d519/71fa7e6c-cdfb-4da8-9164-2404b518d0ee


So if I delete those two symlinks & the files they point to, on one of the
two bricks, will that resolve the split brain? Is that correct?

> Thanks & Regards,
> Karthik
>
