No, sorry, it's working fine. I may have missed some step because of which I saw that problem. /.shard is also healing fine now.

Let me know if it works for you.

-Krutika

On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <kdhananj at redhat.com> wrote:

> OK I just hit the other issue too, where .shard doesn't get healed. :)
>
> Investigating as to why that is the case. Give me some time.
>
> -Krutika
>
> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>
>> Just figured the steps Anuradha has provided won't work if granular entry heal is on.
>> So when you bring down a brick and create fake2 under / of the volume, the granular entry heal feature causes sh to remember only the fact that 'fake2' needs to be recreated on the offline brick (because changelogs are granular).
>>
>> In this case, we would be required to indicate to self-heal-daemon that the entire directory tree from '/' needs to be repaired on the brick that contains no data.
>>
>> To fix this, I did the following (for users who use granular entry self-healing):
>>
>> 1. Kill the last brick process in the replica (/bricks/3)
>>
>> 2. [root at server-3 ~]# rm -rf /bricks/3
>>
>> 3. [root at server-3 ~]# mkdir /bricks/3
>>
>> 4. Create a new dir on the mount point:
>> [root at client-1 ~]# mkdir /mnt/fake
>>
>> 5. Set some fake xattr on the root of the volume, and not the 'fake' directory itself.
>> [root at client-1 ~]# setfattr -n "user.some-name" -v "some-value" /mnt
>>
>> 6. Make sure there's no io happening on your volume.
>>
>> 7. Check the pending xattrs on the brick directories of the two good copies (on bricks 1 and 2); you should be seeing the same value on both bricks in the trusted.afr.<VOLNAME>-client-2 line shown below.
>> (Note that the client-<num> xattr key will have the same last digit as the index of the brick that is down, when counting from 0. So if the first brick is the one that is down, it would read trusted.afr.*-client-0; if the second brick is the one that is empty and down, it would read trusted.afr.*-client-1, and so on.)
>>
>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1
>> # file: 1
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.rep-client-2=0x000000000000000100000001
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2
>> # file: 2
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.rep-client-2=0x000000000000000100000001
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> 8. Flip the 8th digit of the trusted.afr.<VOLNAME>-client-2 value to 1.
>>
>> [root at server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/1
>> [root at server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/2
>>
>> 9. Get the xattrs again and check that the xattrs are set properly now:
>>
>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1
>> # file: 1
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.rep-client-2=0x000000010000000100000001
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2
>> # file: 2
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.rep-client-2=0x000000010000000100000001
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> 10. Force-start the volume.
>>
>> [root at server-1 ~]# gluster volume start rep force
>> volume start: rep: success
>>
>> 11. Monitor the heal-info command to ensure the number of entries keeps growing.
>>
>> 12. Keep monitoring as in step 11; eventually the number of entries needing heal must come down to 0.
>> Also, the checksums of the files on the previously empty brick should now match the copies on the other two bricks.
>>
>> Could you check if the above steps work for you, in your test environment?
>>
>> You caught a nice bug in the manual steps to follow when granular entry-heal is enabled and an empty brick needs heal. Thanks for reporting it. :) We will fix the documentation appropriately.
>>
>> -Krutika
>>
>> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>
>>> Tried this.
>>>
>>> With me, only 'fake2' gets healed after I bring the 'empty' brick back up, and it stops there unless I do a 'heal-full'.
>>>
>>> Is that what you're seeing as well?
>>>
>>> -Krutika
>>>
>>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage <dgossage at carouselchecks.com> wrote:
>>>
>>>> Same issue. Brought up glusterd on problem node; heal count still stuck at 6330.
>>>>
>>>> Ran gluster v heal GLUSTER1 full
>>>>
>>>> glustershd on the problem node shows a sweep starting and finishing in seconds. The other 2 nodes show no activity in the log. They should start a sweep too, shouldn't they?
>>>>
>>>> Tried starting from scratch:
>>>>
>>>> kill -15 brickpid
>>>> rm -Rf /brick
>>>> mkdir -p /brick
>>>> mkdir /gsmount/fake2
>>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2
>>>>
>>>> Heals visible dirs instantly then stops.
>>>>
>>>> gluster v heal GLUSTER1 full
>>>>
>>>> See the sweep start on the problem node and end almost instantly. No files added to the heal list, no files healed, no more logging.
>>>>
>>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>
>>>> Same results no matter which node you run the command on. Still stuck with 6330 files showing as needing heal out of 19k. Logs still show no heals are occurring.
>>>>
>>>> Is there a way to forcibly reset any prior heal data? Could it be stuck on some past failed heal start?
>>>>
>>>> *David Gossage*
>>>> *Carousel Checks Inc.
| System Administrator* >>>> *Office* 708.613.2284 >>>> >>>> On Tue, Aug 30, 2016 at 10:03 AM, David Gossage < >>>> dgossage at carouselchecks.com> wrote: >>>> >>>>> On Tue, Aug 30, 2016 at 10:02 AM, David Gossage < >>>>> dgossage at carouselchecks.com> wrote: >>>>> >>>>>> updated test server to 3.8.3 >>>>>> >>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>> Options Reconfigured: >>>>>> cluster.granular-entry-heal: on >>>>>> performance.readdir-ahead: on >>>>>> performance.read-ahead: off >>>>>> nfs.disable: on >>>>>> nfs.addr-namelookup: off >>>>>> nfs.enable-ino32: off >>>>>> cluster.background-self-heal-count: 16 >>>>>> cluster.self-heal-window-size: 1024 >>>>>> performance.quick-read: off >>>>>> performance.io-cache: off >>>>>> performance.stat-prefetch: off >>>>>> cluster.eager-lock: enable >>>>>> network.remote-dio: on >>>>>> cluster.quorum-type: auto >>>>>> cluster.server-quorum-type: server >>>>>> storage.owner-gid: 36 >>>>>> storage.owner-uid: 36 >>>>>> server.allow-insecure: on >>>>>> features.shard: on >>>>>> features.shard-block-size: 64MB >>>>>> performance.strict-o-direct: off >>>>>> cluster.locking-scheme: granular >>>>>> >>>>>> kill -15 brickpid >>>>>> rm -Rf /gluster2/brick3 >>>>>> mkdir -p /gluster2/brick3/1 >>>>>> mkdir mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10 >>>>>> \:_glustershard/fake2 >>>>>> setfattr -n "user.some-name" -v "some-value" >>>>>> /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2 >>>>>> gluster v start glustershard force >>>>>> >>>>>> at this point brick process starts and all visible files including >>>>>> new dir are made on brick >>>>>> handful of shards are in heal statistics still but no .shard >>>>>> directory created and no increase in shard count >>>>>> >>>>>> gluster v heal glustershard >>>>>> >>>>>> At this point still no increase in count or dir made no additional >>>>>> activity in logs for healing generated. waited few minutes tailing logs to >>>>>> check if anything kicked in. >>>>>> >>>>>> gluster v heal glustershard full >>>>>> >>>>>> gluster shards added to list and heal commences. logs show full >>>>>> sweep starting on all 3 nodes. though this time it only shows as finishing >>>>>> on one which looks to be the one that had brick deleted. >>>>>> >>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026] >>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>> glustershard-client-0 >>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026] >>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>> glustershard-client-1 >>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026] >>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>> glustershard-client-2 >>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026] >>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>> glustershard-client-2 >>>>>> >>>>> >>>>> Just realized its still healing so that may be why sweep on 2 other >>>>> bricks haven't replied as finished. >>>>> >>>>>> >>>>>> >>>>>> my hope is that later tonight a full heal will work on production. >>>>>> Is it possible self-heal daemon can get stale or stop listening but still >>>>>> show as active? 
Would stopping and starting self-heal daemon from gluster >>>>>> cli before doing these heals be helpful? >>>>>> >>>>>> >>>>>> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage < >>>>>> dgossage at carouselchecks.com> wrote: >>>>>> >>>>>>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage < >>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>> >>>>>>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay < >>>>>>>> kdhananj at redhat.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay < >>>>>>>>> kdhananj at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage < >>>>>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>>>>> >>>>>>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay < >>>>>>>>>>> kdhananj at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Could you also share the glustershd logs? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I'll get them when I get to work sure >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I tried the same steps that you mentioned multiple times, but >>>>>>>>>>>> heal is running to completion without any issues. >>>>>>>>>>>> >>>>>>>>>>>> It must be said that 'heal full' traverses the files and >>>>>>>>>>>> directories in a depth-first order and does heals also in the same order. >>>>>>>>>>>> But if it gets interrupted in the middle (say because self-heal-daemon was >>>>>>>>>>>> either intentionally or unintentionally brought offline and then brought >>>>>>>>>>>> back up), self-heal will only pick up the entries that are so far marked as >>>>>>>>>>>> new-entries that need heal which it will find in indices/xattrop directory. >>>>>>>>>>>> What this means is that those files and directories that were not visited >>>>>>>>>>>> during the crawl, will remain untouched and unhealed in this second >>>>>>>>>>>> iteration of heal, unless you execute a 'heal-full' again. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> So should it start healing shards as it crawls or not until >>>>>>>>>>> after it crawls the entire .shard directory? At the pace it was going that >>>>>>>>>>> could be a week with one node appearing in the cluster but with no shard >>>>>>>>>>> files if anything tries to access a file on that node. From my experience >>>>>>>>>>> other day telling it to heal full again did nothing regardless of node used. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> Crawl is started from '/' of the volume. Whenever self-heal >>>>>>>>> detects during the crawl that a file or directory is present in some >>>>>>>>> brick(s) and absent in others, it creates the file on the bricks where it >>>>>>>>> is absent and marks the fact that the file or directory might need >>>>>>>>> data/entry and metadata heal too (this also means that an index is created >>>>>>>>> under .glusterfs/indices/xattrop of the src bricks). And the data/entry and >>>>>>>>> metadata heal are picked up and done in >>>>>>>>> >>>>>>>> the background with the help of these indices. >>>>>>>>> >>>>>>>> >>>>>>>> Looking at my 3rd node as example i find nearly an exact same >>>>>>>> number of files in xattrop dir as reported by heal count at time I brought >>>>>>>> down node2 to try and alleviate read io errors that seemed to occur from >>>>>>>> what I was guessing as attempts to use the node with no shards for reads. >>>>>>>> >>>>>>>> Also attached are the glustershd logs from the 3 nodes, along with >>>>>>>> the test node i tried yesterday with same results. 
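
(For anyone who wants to repeat that comparison between the index directory and heal info, a rough sketch follows. The brick path and volume name are taken from the configs quoted earlier in this thread and are assumptions to adjust for your own layout; also note that the xattrop-<gfid> base file usually present in that directory is a placeholder, not a pending entry.)

    # count the index entries the self-heal daemon would pick up on this brick
    find /gluster1/BRICK1/1/.glusterfs/indices/xattrop -maxdepth 1 -type f ! -name 'xattrop-*' | wc -l

    # compare with what gluster reports for the same volume, per brick
    gluster volume heal GLUSTER1 info | grep 'Number of entries'
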
>>>>>>>> >>>>>>> >>>>>>> Looking at my own logs I notice that a full sweep was only ever >>>>>>> recorded in glustershd.log on 2nd node with missing directory. I believe I >>>>>>> should have found a sweep begun on every node correct? >>>>>>> >>>>>>> On my test dev when it did work I do see that >>>>>>> >>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>> glustershard-client-0 >>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>> glustershard-client-1 >>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>> glustershard-client-2 >>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>> glustershard-client-2 >>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>> glustershard-client-1 >>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>> glustershard-client-0 >>>>>>> >>>>>>> While when looking at past few days of the 3 prod nodes i only found >>>>>>> that on my 2nd node >>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>> starting full sweep on subvol GLUSTER1-client-1 >>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>> finished full sweep on subvol GLUSTER1-client-1 >>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>> starting full sweep on subvol GLUSTER1-client-1 >>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>> finished full sweep on subvol GLUSTER1-client-1 >>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>> starting full sweep on subvol GLUSTER1-client-1 >>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>> finished full sweep on subvol GLUSTER1-client-1 >>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>> starting full sweep on subvol GLUSTER1-client-1 >>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>> finished full sweep on subvol GLUSTER1-client-1 >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> My suspicion is that this is what happened on your setup. Could >>>>>>>>>>>> you confirm if that was the case? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Brick was brought online with force start then a full heal >>>>>>>>>>> launched. 
Hours later after it became evident that it was not adding new >>>>>>>>>>> files to heal I did try restarting self-heal daemon and relaunching full >>>>>>>>>>> heal again. But this was after the heal had basically already failed to >>>>>>>>>>> work as intended. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> OK. How did you figure it was not adding any new files? I need to >>>>>>>>>> know what places you were monitoring to come to this conclusion. >>>>>>>>>> >>>>>>>>>> -Krutika >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> As for those logs, I did manager to do something that caused >>>>>>>>>>>> these warning messages you shared earlier to appear in my client and server >>>>>>>>>>>> logs. >>>>>>>>>>>> Although these logs are annoying and a bit scary too, they >>>>>>>>>>>> didn't do any harm to the data in my volume. Why they appear just after a >>>>>>>>>>>> brick is replaced and under no other circumstances is something I'm still >>>>>>>>>>>> investigating. >>>>>>>>>>>> >>>>>>>>>>>> But for future, it would be good to follow the steps Anuradha >>>>>>>>>>>> gave as that would allow self-heal to at least detect that it has some >>>>>>>>>>>> repairing to do whenever it is restarted whether intentionally or otherwise. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I followed those steps as described on my test box and ended up >>>>>>>>>>> with exact same outcome of adding shards at an agonizing slow pace and no >>>>>>>>>>> creation of .shard directory or heals on shard directory. Directories >>>>>>>>>>> visible from mount healed quickly. This was with one VM so it has only 800 >>>>>>>>>>> shards as well. After hours at work it had added a total of 33 shards to >>>>>>>>>>> be healed. I sent those logs yesterday as well though not the glustershd. >>>>>>>>>>> >>>>>>>>>>> Does replace-brick command copy files in same manner? For these >>>>>>>>>>> purposes I am contemplating just skipping the heal route. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> -Krutika >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage < >>>>>>>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> attached brick and client logs from test machine where same >>>>>>>>>>>>> behavior occurred not sure if anything new is there. 
its still on 3.8.2 >>>>>>>>>>>>> >>>>>>>>>>>>> Number of Bricks: 1 x 3 = 3 >>>>>>>>>>>>> Transport-type: tcp >>>>>>>>>>>>> Bricks: >>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>>>>>>>>> Options Reconfigured: >>>>>>>>>>>>> cluster.locking-scheme: granular >>>>>>>>>>>>> performance.strict-o-direct: off >>>>>>>>>>>>> features.shard-block-size: 64MB >>>>>>>>>>>>> features.shard: on >>>>>>>>>>>>> server.allow-insecure: on >>>>>>>>>>>>> storage.owner-uid: 36 >>>>>>>>>>>>> storage.owner-gid: 36 >>>>>>>>>>>>> cluster.server-quorum-type: server >>>>>>>>>>>>> cluster.quorum-type: auto >>>>>>>>>>>>> network.remote-dio: on >>>>>>>>>>>>> cluster.eager-lock: enable >>>>>>>>>>>>> performance.stat-prefetch: off >>>>>>>>>>>>> performance.io-cache: off >>>>>>>>>>>>> performance.quick-read: off >>>>>>>>>>>>> cluster.self-heal-window-size: 1024 >>>>>>>>>>>>> cluster.background-self-heal-count: 16 >>>>>>>>>>>>> nfs.enable-ino32: off >>>>>>>>>>>>> nfs.addr-namelookup: off >>>>>>>>>>>>> nfs.disable: on >>>>>>>>>>>>> performance.read-ahead: off >>>>>>>>>>>>> performance.readdir-ahead: on >>>>>>>>>>>>> cluster.granular-entry-heal: on >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage < >>>>>>>>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur < >>>>>>>>>>>>>> atalur at redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ----- Original Message ----- >>>>>>>>>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>>>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com> >>>>>>>>>>>>>>> > Cc: "gluster-users at gluster.org List" < >>>>>>>>>>>>>>> Gluster-users at gluster.org>, "Krutika Dhananjay" < >>>>>>>>>>>>>>> kdhananj at redhat.com> >>>>>>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM >>>>>>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier >>>>>>>>>>>>>>> Slow >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur < >>>>>>>>>>>>>>> atalur at redhat.com> wrote: >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > > Response inline. >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > ----- Original Message ----- >>>>>>>>>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>>>>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>>>>>>>>> > > > Cc: "gluster-users at gluster.org List" < >>>>>>>>>>>>>>> Gluster-users at gluster.org> >>>>>>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM >>>>>>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing >>>>>>>>>>>>>>> Glacier Slow >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > Could you attach both client and brick logs? Meanwhile >>>>>>>>>>>>>>> I will try these >>>>>>>>>>>>>>> > > steps >>>>>>>>>>>>>>> > > > out on my machines and see if it is easily recreatable. 
>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > -Krutika >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage < >>>>>>>>>>>>>>> > > dgossage at carouselchecks.com >>>>>>>>>>>>>>> > > > > wrote: >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > Centos 7 Gluster 3.8.3 >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>> > > > Options Reconfigured: >>>>>>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full >>>>>>>>>>>>>>> > > > cluster.self-heal-daemon: on >>>>>>>>>>>>>>> > > > cluster.locking-scheme: granular >>>>>>>>>>>>>>> > > > features.shard-block-size: 64MB >>>>>>>>>>>>>>> > > > features.shard: on >>>>>>>>>>>>>>> > > > performance.readdir-ahead: on >>>>>>>>>>>>>>> > > > storage.owner-uid: 36 >>>>>>>>>>>>>>> > > > storage.owner-gid: 36 >>>>>>>>>>>>>>> > > > performance.quick-read: off >>>>>>>>>>>>>>> > > > performance.read-ahead: off >>>>>>>>>>>>>>> > > > performance.io-cache: off >>>>>>>>>>>>>>> > > > performance.stat-prefetch: on >>>>>>>>>>>>>>> > > > cluster.eager-lock: enable >>>>>>>>>>>>>>> > > > network.remote-dio: enable >>>>>>>>>>>>>>> > > > cluster.quorum-type: auto >>>>>>>>>>>>>>> > > > cluster.server-quorum-type: server >>>>>>>>>>>>>>> > > > server.allow-insecure: on >>>>>>>>>>>>>>> > > > cluster.self-heal-window-size: 1024 >>>>>>>>>>>>>>> > > > cluster.background-self-heal-count: 16 >>>>>>>>>>>>>>> > > > performance.strict-write-ordering: off >>>>>>>>>>>>>>> > > > nfs.disable: on >>>>>>>>>>>>>>> > > > nfs.addr-namelookup: off >>>>>>>>>>>>>>> > > > nfs.enable-ino32: off >>>>>>>>>>>>>>> > > > cluster.granular-entry-heal: on >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no issues. >>>>>>>>>>>>>>> > > > Following steps detailed in previous recommendations >>>>>>>>>>>>>>> began proces of >>>>>>>>>>>>>>> > > > replacing and healngbricks one node at a time. >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > 1) kill pid of brick >>>>>>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10 >>>>>>>>>>>>>>> > > > 3) recreate directory of brick >>>>>>>>>>>>>>> > > > 4) gluster volume start <> force >>>>>>>>>>>>>>> > > > 5) gluster volume heal <> full >>>>>>>>>>>>>>> > > Hi, >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > I'd suggest that full heal is not used. There are a few >>>>>>>>>>>>>>> bugs in full heal. >>>>>>>>>>>>>>> > > Better safe than sorry ;) >>>>>>>>>>>>>>> > > Instead I'd suggest the following steps: >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > Currently I brought the node down by systemctl stop >>>>>>>>>>>>>>> glusterd as I was >>>>>>>>>>>>>>> > getting sporadic io issues and a few VM's paused so hoping >>>>>>>>>>>>>>> that will help. >>>>>>>>>>>>>>> > I may wait to do this till around 4PM when most work is >>>>>>>>>>>>>>> done in case it >>>>>>>>>>>>>>> > shoots load up. >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > > 1) kill pid of brick >>>>>>>>>>>>>>> > > 2) to configuring of brick that you need >>>>>>>>>>>>>>> > > 3) recreate brick dir >>>>>>>>>>>>>>> > > 4) while the brick is still down, from the mount point: >>>>>>>>>>>>>>> > > a) create a dummy non existent dir under / of mount. 
>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > so if noee 2 is down brick, pick node for example 3 and >>>>>>>>>>>>>>> make a test dir >>>>>>>>>>>>>>> > under its brick directory that doesnt exist on 2 or should >>>>>>>>>>>>>>> I be dong this >>>>>>>>>>>>>>> > over a gluster mount? >>>>>>>>>>>>>>> You should be doing this over gluster mount. >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > > b) set a non existent extended attribute on / of >>>>>>>>>>>>>>> mount. >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > Could you give me an example of an attribute to set? >>>>>>>>>>>>>>> I've read a tad on >>>>>>>>>>>>>>> > this, and looked up attributes but haven't set any yet >>>>>>>>>>>>>>> myself. >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" >>>>>>>>>>>>>>> <path-to-mount> >>>>>>>>>>>>>>> > Doing these steps will ensure that heal happens only from >>>>>>>>>>>>>>> updated brick to >>>>>>>>>>>>>>> > > down brick. >>>>>>>>>>>>>>> > > 5) gluster v start <> force >>>>>>>>>>>>>>> > > 6) gluster v heal <> >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > Will it matter if somewhere in gluster the full heal >>>>>>>>>>>>>>> command was run other >>>>>>>>>>>>>>> > day? Not sure if it eventually stops or times out. >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> full heal will stop once the crawl is done. So if you want >>>>>>>>>>>>>>> to trigger heal again, >>>>>>>>>>>>>>> run gluster v heal <>. Actually even brick up or volume >>>>>>>>>>>>>>> start force should >>>>>>>>>>>>>>> trigger the heal. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Did this on test bed today. its one server with 3 bricks on >>>>>>>>>>>>>> same machine so take that for what its worth. also it still runs 3.8.2. >>>>>>>>>>>>>> Maybe ill update and re-run test. >>>>>>>>>>>>>> >>>>>>>>>>>>>> killed brick >>>>>>>>>>>>>> deleted brick dir >>>>>>>>>>>>>> recreated brick dir >>>>>>>>>>>>>> created fake dir on gluster mount >>>>>>>>>>>>>> set suggested fake attribute on it >>>>>>>>>>>>>> ran volume start <> force >>>>>>>>>>>>>> >>>>>>>>>>>>>> looked at files it said needed healing and it was just 8 >>>>>>>>>>>>>> shards that were modified for few minutes I ran through steps >>>>>>>>>>>>>> >>>>>>>>>>>>>> gave it few minutes and it stayed same >>>>>>>>>>>>>> ran gluster volume <> heal >>>>>>>>>>>>>> >>>>>>>>>>>>>> it healed all the directories and files you can see over >>>>>>>>>>>>>> mount including fakedir. >>>>>>>>>>>>>> >>>>>>>>>>>>>> same issue for shards though. it adds more shards to heal at >>>>>>>>>>>>>> glacier pace. slight jump in speed if I stat every file and dir in VM >>>>>>>>>>>>>> running but not all shards. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33 out of >>>>>>>>>>>>>> 800 and probably wont finish adding for few days at rate it goes. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > > 1st node worked as expected took 12 hours to heal 1TB >>>>>>>>>>>>>>> data. Load was >>>>>>>>>>>>>>> > > little >>>>>>>>>>>>>>> > > > heavy but nothing shocking. >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > About an hour after node 1 finished I began same >>>>>>>>>>>>>>> process on node2. Heal >>>>>>>>>>>>>>> > > > proces kicked in as before and the files in >>>>>>>>>>>>>>> directories visible from >>>>>>>>>>>>>>> > > mount >>>>>>>>>>>>>>> > > > and .glusterfs healed in short time. 
Then it began >>>>>>>>>>>>>>> crawl of .shard adding >>>>>>>>>>>>>>> > > > those files to heal count at which point the entire >>>>>>>>>>>>>>> proces ground to a >>>>>>>>>>>>>>> > > halt >>>>>>>>>>>>>>> > > > basically. After 48 hours out of 19k shards it has >>>>>>>>>>>>>>> added 5900 to heal >>>>>>>>>>>>>>> > > list. >>>>>>>>>>>>>>> > > > Load on all 3 machnes is negligible. It was suggested >>>>>>>>>>>>>>> to change this >>>>>>>>>>>>>>> > > value >>>>>>>>>>>>>>> > > > to full cluster.data-self-heal-algorithm and restart >>>>>>>>>>>>>>> volume which I >>>>>>>>>>>>>>> > > did. No >>>>>>>>>>>>>>> > > > efffect. Tried relaunching heal no effect, despite any >>>>>>>>>>>>>>> node picked. I >>>>>>>>>>>>>>> > > > started each VM and performed a stat of all files from >>>>>>>>>>>>>>> within it, or a >>>>>>>>>>>>>>> > > full >>>>>>>>>>>>>>> > > > virus scan and that seemed to cause short small spikes >>>>>>>>>>>>>>> in shards added, >>>>>>>>>>>>>>> > > but >>>>>>>>>>>>>>> > > > not by much. Logs are showing no real messages >>>>>>>>>>>>>>> indicating anything is >>>>>>>>>>>>>>> > > going >>>>>>>>>>>>>>> > > > on. I get hits to brick log on occasion of null >>>>>>>>>>>>>>> lookups making me think >>>>>>>>>>>>>>> > > its >>>>>>>>>>>>>>> > > > not really crawling shards directory but waiting for a >>>>>>>>>>>>>>> shard lookup to >>>>>>>>>>>>>>> > > add >>>>>>>>>>>>>>> > > > it. I'll get following in brick log but not constant >>>>>>>>>>>>>>> and sometime >>>>>>>>>>>>>>> > > multiple >>>>>>>>>>>>>>> > > > for same shard. >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] >>>>>>>>>>>>>>> > > > [server-resolve.c:569:server_resolve] >>>>>>>>>>>>>>> 0-GLUSTER1-server: no resolution >>>>>>>>>>>>>>> > > type >>>>>>>>>>>>>>> > > > for (null) (LOOKUP) >>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] >>>>>>>>>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] >>>>>>>>>>>>>>> 0-GLUSTER1-server: 12591783: >>>>>>>>>>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00 >>>>>>>>>>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) >>>>>>>>>>>>>>> ==> (Invalid >>>>>>>>>>>>>>> > > > argument) [Invalid argument] >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > This one repeated about 30 times in row then nothing >>>>>>>>>>>>>>> for 10 minutes then >>>>>>>>>>>>>>> > > one >>>>>>>>>>>>>>> > > > hit for one different shard by itself. >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > How can I determine if Heal is actually running? How >>>>>>>>>>>>>>> can I kill it or >>>>>>>>>>>>>>> > > force >>>>>>>>>>>>>>> > > > restart? Does node I start it from determine which >>>>>>>>>>>>>>> directory gets >>>>>>>>>>>>>>> > > crawled to >>>>>>>>>>>>>>> > > > determine heals? >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > David Gossage >>>>>>>>>>>>>>> > > > Carousel Checks Inc. 
| System Administrator
Office 708.613.2284

--
Thanks,
Anuradha.
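
Condensed into one place, the repair procedure for an empty brick when granular entry-heal is enabled, as described in Krutika's steps above, looks roughly like this. It is a sketch only: the volume name 'rep', the /bricks/1..3 paths and the client-2 index all come from her example, so substitute your own names, check the current trusted.afr values on your good bricks first, and make sure no I/O is running on the volume.

    # on the node whose brick is being rebuilt (server-3 in the example)
    kill -15 <pid-of-brick-process>
    rm -rf /bricks/3
    mkdir /bricks/3

    # from a client mount, while the volume is idle
    mkdir /mnt/fake
    setfattr -n "user.some-name" -v "some-value" /mnt

    # on each of the two good bricks, after inspecting the current values with
    # getfattr -d -m . -e hex <brick-path>, flip the 8th hex digit to 1
    setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/1
    setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/2

    # bring the brick back and watch the pending-heal count rise, then drain to 0
    gluster volume start rep force
    watch -n 60 'gluster volume heal rep info | grep "Number of entries"'
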
On Wed, Aug 31, 2016 at 3:50 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:

> No, sorry, it's working fine. I may have missed some step because of which
> I saw that problem. /.shard is also healing fine now.
>
> Let me know if it works for you.
>
> -Krutika
>
>>> 6. Make sure there's no io happening on your volume.

I'll test this on dev today. But for my case in production this means I'll need to shut down every VM after work for this heal? Will the fact I have 6k files already listed as needing heals affect anything?
| System Administrator >>>>>>>>>>>>>>>> > > > Office 708.613.2284 >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > _______________________________________________ >>>>>>>>>>>>>>>> > > > Gluster-users mailing list >>>>>>>>>>>>>>>> > > > Gluster-users at gluster.org >>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > _______________________________________________ >>>>>>>>>>>>>>>> > > > Gluster-users mailing list >>>>>>>>>>>>>>>> > > > Gluster-users at gluster.org >>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > -- >>>>>>>>>>>>>>>> > > Thanks, >>>>>>>>>>>>>>>> > > Anuradha. >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Anuradha. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160831/8e2dd801/attachment.html>
Just as a test I did not shut down the one VM on the cluster, since finding a window before the weekend where I can shut down all VMs and fit in a full heal is unlikely, so I wanted to see what occurs.

kill -15 brick pid
rm -Rf /gluster2/brick1/1
mkdir /gluster2/brick1/1
mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake3
setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard

getfattr -d -m . -e hex /gluster2/brick2/1
# file: gluster2/brick2/1
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000001
trusted.afr.glustershard-client-0=0x000000000000000200000000
trusted.afr.glustershard-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
user.some-name=0x736f6d652d76616c7565

getfattr -d -m . -e hex /gluster2/brick3/1
# file: gluster2/brick3/1
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000001
trusted.afr.glustershard-client-0=0x000000000000000200000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
user.some-name=0x736f6d652d76616c7565

setfattr -n trusted.afr.glustershard-client-0 -v 0x000000010000000200000000 /gluster2/brick2/1
setfattr -n trusted.afr.glustershard-client-0 -v 0x000000010000000200000000 /gluster2/brick3/1

getfattr -d -m . -e hex /gluster2/brick3/1/
getfattr: Removing leading '/' from absolute path names
# file: gluster2/brick3/1/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.glustershard-client-0=0x000000010000000200000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
user.some-name=0x736f6d652d76616c7565

getfattr -d -m . -e hex /gluster2/brick2/1/
getfattr: Removing leading '/' from absolute path names
# file: gluster2/brick2/1/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.glustershard-client-0=0x000000010000000200000000
trusted.afr.glustershard-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
user.some-name=0x736f6d652d76616c7565

gluster v start glustershard force

Gluster heal counts climbed up and down a little as it healed everything in the visible gluster mount and in .glusterfs for the visible mount files, then stalled with around 15 shards and the fake3 directory still in the list.

getfattr -d -m . -e hex /gluster2/brick2/1/
getfattr: Removing leading '/' from absolute path names
# file: gluster2/brick2/1/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.glustershard-client-0=0x000000010000000000000000
trusted.afr.glustershard-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
user.some-name=0x736f6d652d76616c7565

getfattr -d -m . -e hex /gluster2/brick3/1/
getfattr: Removing leading '/' from absolute path names
# file: gluster2/brick3/1/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.glustershard-client-0=0x000000010000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
user.some-name=0x736f6d652d76616c7565

getfattr -d -m . -e hex /gluster2/brick1/1/
getfattr: Removing leading '/' from absolute path names
# file: gluster2/brick1/1/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
user.some-name=0x736f6d652d76616c7565

The heal count stayed the same for a while, then I ran

gluster v heal glustershard full

Heals jump up to 700 as shards actually get read in as needing heals. glustershd shows 3 sweeps started, one per brick.

It heals the shards and things look OK. heal <> info shows 0 files, but statistics heal-info shows 1 left for bricks 2 and 3. Perhaps because I didn't stop the running VM?

# file: gluster2/brick1/1/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
user.some-name=0x736f6d652d76616c7565

# file: gluster2/brick2/1/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.glustershard-client-0=0x000000010000000000000000
trusted.afr.glustershard-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
user.some-name=0x736f6d652d76616c7565

# file: gluster2/brick3/1/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.glustershard-client-0=0x000000010000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
user.some-name=0x736f6d652d76616c7565

Metadata split-brain? heal <> info split-brain shows no files or entries. If I had thought ahead I would have checked the values returned by getfattr beforehand, although I do know heal-count was returning 0 at the time.

I'm assuming I need to shut down the VMs and put the volume in maintenance from oVirt to prevent any IO. Does that need to last for the whole heal, or can I re-activate at some point to bring the VMs back up?
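As an aside on reading those trusted.afr values, since they come up again below: each trusted.afr.<VOLNAME>-client-N attribute packs three 32-bit counters, in order, for pending data, metadata and entry operations against brick N. So 0x000000000000000200000000 decodes to 0 pending data, 2 pending metadata and 0 pending entry heals, while 0x000000010000000200000000 adds 1 pending data heal. The tiny bash helper below is only an illustrative sketch, not part of gluster, and the name decode_afr is made up; it just splits the hex string by hand:

decode_afr() {
    # strip the leading 0x, then print each 8-hex-digit field as a decimal counter
    local v=${1#0x}
    printf 'data=%d metadata=%d entry=%d\n' "0x${v:0:8}" "0x${v:8:8}" "0x${v:16:8}"
}

decode_afr 0x000000000000000200000000    # data=0 metadata=2 entry=0
decode_afr 0x000000010000000200000000    # data=1 metadata=2 entry=0

A non-zero entry counter is what tells the self-heal daemon that names under that directory still need to be recreated on the marked brick.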
*David Gossage* *Carousel Checks Inc. | System Administrator* *Office* 708.613.2284 On Wed, Aug 31, 2016 at 3:50 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:> No, sorry, it's working fine. I may have missed some step because of which > i saw that problem. /.shard is also healing fine now. > > Let me know if it works for you. > > -Krutika > > On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <kdhananj at redhat.com> > wrote: > >> OK I just hit the other issue too, where .shard doesn't get healed. :) >> >> Investigating as to why that is the case. Give me some time. >> >> -Krutika >> >> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <kdhananj at redhat.com> >> wrote: >> >>> Just figured the steps Anuradha has provided won't work if granular >>> entry heal is on. >>> So when you bring down a brick and create fake2 under / of the volume, >>> granular entry heal feature causes >>> sh to remember only the fact that 'fake2' needs to be recreated on the >>> offline brick (because changelogs are granular). >>> >>> In this case, we would be required to indicate to self-heal-daemon that >>> the entire directory tree from '/' needs to be repaired on the brick that >>> contains no data. >>> >>> To fix this, I did the following (for users who use granular entry >>> self-healing): >>> >>> 1. Kill the last brick process in the replica (/bricks/3) >>> >>> 2. [root at server-3 ~]# rm -rf /bricks/3 >>> >>> 3. [root at server-3 ~]# mkdir /bricks/3 >>> >>> 4. Create a new dir on the mount point: >>> [root at client-1 ~]# mkdir /mnt/fake >>> >>> 5. Set some fake xattr on the root of the volume, and not the 'fake' >>> directory itself. >>> [root at client-1 ~]# setfattr -n "user.some-name" -v "some-value" /mnt >>> >>> 6. Make sure there's no io happening on your volume. >>> >>> 7. Check the pending xattrs on the brick directories of the two good >>> copies (on bricks 1 and 2), you should be seeing same values as the one >>> marked in red in both bricks. >>> (note that the client-<num> xattr key will have the same last digit as >>> the index of the brick that is down, when counting from 0. So if the first >>> brick is the one that is down, it would read trusted.afr.*-client-0; if the >>> second brick is the one that is empty and down, it would read >>> trusted.afr.*-client-1 and so on). >>> >>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1 >>> # file: 1 >>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>> 23a6574635f72756e74696d655f743a733000 >>> trusted.afr.dirty=0x000000000000000000000000 >>> *trusted.afr.rep-client-2=0x000000000000000100000001* >>> trusted.gfid=0x00000000000000000000000000000001 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b >>> >>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2 >>> # file: 2 >>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>> 23a6574635f72756e74696d655f743a733000 >>> trusted.afr.dirty=0x000000000000000000000000 >>> *trusted.afr.rep-client-2=0x000**000000000000100000001* >>> trusted.gfid=0x00000000000000000000000000000001 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b >>> >>> 8. Flip the 8th digit in the trusted.afr.<VOLNAME>-client-2 to a 1. 
>>> >>> [root at server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v >>> *0x000000010000000100000001* /bricks/1 >>> [root at server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v >>> *0x000000010000000100000001* /bricks/2 >>> >>> 9. Get the xattrs again and check the xattrs are set properly now >>> >>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1 >>> # file: 1 >>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>> 23a6574635f72756e74696d655f743a733000 >>> trusted.afr.dirty=0x000000000000000000000000 >>> *trusted.afr.rep-client-2=0x000**000010000000100000001* >>> trusted.gfid=0x00000000000000000000000000000001 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b >>> >>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2 >>> # file: 2 >>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>> 23a6574635f72756e74696d655f743a733000 >>> trusted.afr.dirty=0x000000000000000000000000 >>> *trusted.afr.rep-client-2=0x000**000010000000100000001* >>> trusted.gfid=0x00000000000000000000000000000001 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b >>> >>> 10. Force-start the volume. >>> >>> [root at server-1 ~]# gluster volume start rep force >>> volume start: rep: success >>> >>> 11. Monitor heal-info command to ensure the number of entries keeps >>> growing. >>> >>> 12. Keep monitoring with step 10 and eventually the number of entries >>> needing heal must come down to 0. >>> Also the checksums of the files on the previously empty brick should now >>> match with the copies on the other two bricks. >>> >>> Could you check if the above steps work for you, in your test >>> environment? >>> >>> You caught a nice bug in the manual steps to follow when granular >>> entry-heal is enabled and an empty brick needs heal. Thanks for reporting >>> it. :) We will fix the documentation appropriately. >>> >>> -Krutika >>> >>> >>> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay <kdhananj at redhat.com >>> > wrote: >>> >>>> Tried this. >>>> >>>> With me, only 'fake2' gets healed after i bring the 'empty' brick back >>>> up and it stops there unless I do a 'heal-full'. >>>> >>>> Is that what you're seeing as well? >>>> >>>> -Krutika >>>> >>>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage < >>>> dgossage at carouselchecks.com> wrote: >>>> >>>>> Same issue brought up glusterd on problem node heal count still stuck >>>>> at 6330. >>>>> >>>>> Ran gluster v heal GUSTER1 full >>>>> >>>>> glustershd on problem node shows a sweep starting and finishing in >>>>> seconds. Other 2 nodes show no activity in log. They should start a sweep >>>>> too shouldn't they? >>>>> >>>>> Tried starting from scratch >>>>> >>>>> kill -15 brickpid >>>>> rm -Rf /brick >>>>> mkdir -p /brick >>>>> mkdir mkdir /gsmount/fake2 >>>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2 >>>>> >>>>> Heals visible dirs instantly then stops. >>>>> >>>>> gluster v heal GLUSTER1 full >>>>> >>>>> see sweep star on problem node and end almost instantly. 
no files >>>>> added t heal list no files healed no more logging >>>>> >>>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026] >>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>> starting full sweep on subvol GLUSTER1-client-1 >>>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026] >>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>> finished full sweep on subvol GLUSTER1-client-1 >>>>> >>>>> same results no matter which node you run command on. Still stuck >>>>> with 6330 files showing needing healed out of 19k. still showing in logs >>>>> no heals are occuring. >>>>> >>>>> Is their a way to forcibly reset any prior heal data? Could it be >>>>> stuck on some past failed heal start? >>>>> >>>>> >>>>> >>>>> >>>>> *David Gossage* >>>>> *Carousel Checks Inc. | System Administrator* >>>>> *Office* 708.613.2284 >>>>> >>>>> On Tue, Aug 30, 2016 at 10:03 AM, David Gossage < >>>>> dgossage at carouselchecks.com> wrote: >>>>> >>>>>> On Tue, Aug 30, 2016 at 10:02 AM, David Gossage < >>>>>> dgossage at carouselchecks.com> wrote: >>>>>> >>>>>>> updated test server to 3.8.3 >>>>>>> >>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>>> Options Reconfigured: >>>>>>> cluster.granular-entry-heal: on >>>>>>> performance.readdir-ahead: on >>>>>>> performance.read-ahead: off >>>>>>> nfs.disable: on >>>>>>> nfs.addr-namelookup: off >>>>>>> nfs.enable-ino32: off >>>>>>> cluster.background-self-heal-count: 16 >>>>>>> cluster.self-heal-window-size: 1024 >>>>>>> performance.quick-read: off >>>>>>> performance.io-cache: off >>>>>>> performance.stat-prefetch: off >>>>>>> cluster.eager-lock: enable >>>>>>> network.remote-dio: on >>>>>>> cluster.quorum-type: auto >>>>>>> cluster.server-quorum-type: server >>>>>>> storage.owner-gid: 36 >>>>>>> storage.owner-uid: 36 >>>>>>> server.allow-insecure: on >>>>>>> features.shard: on >>>>>>> features.shard-block-size: 64MB >>>>>>> performance.strict-o-direct: off >>>>>>> cluster.locking-scheme: granular >>>>>>> >>>>>>> kill -15 brickpid >>>>>>> rm -Rf /gluster2/brick3 >>>>>>> mkdir -p /gluster2/brick3/1 >>>>>>> mkdir mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10 >>>>>>> \:_glustershard/fake2 >>>>>>> setfattr -n "user.some-name" -v "some-value" >>>>>>> /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2 >>>>>>> gluster v start glustershard force >>>>>>> >>>>>>> at this point brick process starts and all visible files including >>>>>>> new dir are made on brick >>>>>>> handful of shards are in heal statistics still but no .shard >>>>>>> directory created and no increase in shard count >>>>>>> >>>>>>> gluster v heal glustershard >>>>>>> >>>>>>> At this point still no increase in count or dir made no additional >>>>>>> activity in logs for healing generated. waited few minutes tailing logs to >>>>>>> check if anything kicked in. >>>>>>> >>>>>>> gluster v heal glustershard full >>>>>>> >>>>>>> gluster shards added to list and heal commences. logs show full >>>>>>> sweep starting on all 3 nodes. though this time it only shows as finishing >>>>>>> on one which looks to be the one that had brick deleted. 
>>>>>>> >>>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>> glustershard-client-0 >>>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>> glustershard-client-1 >>>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>> glustershard-client-2 >>>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026] >>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>> glustershard-client-2 >>>>>>> >>>>>> >>>>>> Just realized its still healing so that may be why sweep on 2 other >>>>>> bricks haven't replied as finished. >>>>>> >>>>>>> >>>>>>> >>>>>>> my hope is that later tonight a full heal will work on production. >>>>>>> Is it possible self-heal daemon can get stale or stop listening but still >>>>>>> show as active? Would stopping and starting self-heal daemon from gluster >>>>>>> cli before doing these heals be helpful? >>>>>>> >>>>>>> >>>>>>> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage < >>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>> >>>>>>>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage < >>>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>>> >>>>>>>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay < >>>>>>>>> kdhananj at redhat.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay < >>>>>>>>>> kdhananj at redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage < >>>>>>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay < >>>>>>>>>>>> kdhananj at redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Could you also share the glustershd logs? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I'll get them when I get to work sure >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I tried the same steps that you mentioned multiple times, but >>>>>>>>>>>>> heal is running to completion without any issues. >>>>>>>>>>>>> >>>>>>>>>>>>> It must be said that 'heal full' traverses the files and >>>>>>>>>>>>> directories in a depth-first order and does heals also in the same order. >>>>>>>>>>>>> But if it gets interrupted in the middle (say because self-heal-daemon was >>>>>>>>>>>>> either intentionally or unintentionally brought offline and then brought >>>>>>>>>>>>> back up), self-heal will only pick up the entries that are so far marked as >>>>>>>>>>>>> new-entries that need heal which it will find in indices/xattrop directory. >>>>>>>>>>>>> What this means is that those files and directories that were not visited >>>>>>>>>>>>> during the crawl, will remain untouched and unhealed in this second >>>>>>>>>>>>> iteration of heal, unless you execute a 'heal-full' again. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> So should it start healing shards as it crawls or not until >>>>>>>>>>>> after it crawls the entire .shard directory? At the pace it was going that >>>>>>>>>>>> could be a week with one node appearing in the cluster but with no shard >>>>>>>>>>>> files if anything tries to access a file on that node. 
From my experience >>>>>>>>>>>> other day telling it to heal full again did nothing regardless of node used. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Crawl is started from '/' of the volume. Whenever self-heal >>>>>>>>>> detects during the crawl that a file or directory is present in some >>>>>>>>>> brick(s) and absent in others, it creates the file on the bricks where it >>>>>>>>>> is absent and marks the fact that the file or directory might need >>>>>>>>>> data/entry and metadata heal too (this also means that an index is created >>>>>>>>>> under .glusterfs/indices/xattrop of the src bricks). And the data/entry and >>>>>>>>>> metadata heal are picked up and done in >>>>>>>>>> >>>>>>>>> the background with the help of these indices. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Looking at my 3rd node as example i find nearly an exact same >>>>>>>>> number of files in xattrop dir as reported by heal count at time I brought >>>>>>>>> down node2 to try and alleviate read io errors that seemed to occur from >>>>>>>>> what I was guessing as attempts to use the node with no shards for reads. >>>>>>>>> >>>>>>>>> Also attached are the glustershd logs from the 3 nodes, along with >>>>>>>>> the test node i tried yesterday with same results. >>>>>>>>> >>>>>>>> >>>>>>>> Looking at my own logs I notice that a full sweep was only ever >>>>>>>> recorded in glustershd.log on 2nd node with missing directory. I believe I >>>>>>>> should have found a sweep begun on every node correct? >>>>>>>> >>>>>>>> On my test dev when it did work I do see that >>>>>>>> >>>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>> glustershard-client-0 >>>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>> glustershard-client-1 >>>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>> glustershard-client-2 >>>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>> glustershard-client-2 >>>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>> glustershard-client-1 >>>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>> glustershard-client-0 >>>>>>>> >>>>>>>> While when looking at past few days of the 3 prod nodes i only >>>>>>>> found that on my 2nd node >>>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>>> starting full sweep on subvol GLUSTER1-client-1 >>>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>>> finished full sweep on subvol GLUSTER1-client-1 >>>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>>> starting full sweep on subvol GLUSTER1-client-1 >>>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026] >>>>>>>> 
[afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>>> finished full sweep on subvol GLUSTER1-client-1 >>>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>>> starting full sweep on subvol GLUSTER1-client-1 >>>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>>> finished full sweep on subvol GLUSTER1-client-1 >>>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>>> starting full sweep on subvol GLUSTER1-client-1 >>>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>>>> finished full sweep on subvol GLUSTER1-client-1 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> My suspicion is that this is what happened on your setup. >>>>>>>>>>>>> Could you confirm if that was the case? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Brick was brought online with force start then a full heal >>>>>>>>>>>> launched. Hours later after it became evident that it was not adding new >>>>>>>>>>>> files to heal I did try restarting self-heal daemon and relaunching full >>>>>>>>>>>> heal again. But this was after the heal had basically already failed to >>>>>>>>>>>> work as intended. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> OK. How did you figure it was not adding any new files? I need >>>>>>>>>>> to know what places you were monitoring to come to this conclusion. >>>>>>>>>>> >>>>>>>>>>> -Krutika >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> As for those logs, I did manager to do something that caused >>>>>>>>>>>>> these warning messages you shared earlier to appear in my client and server >>>>>>>>>>>>> logs. >>>>>>>>>>>>> Although these logs are annoying and a bit scary too, they >>>>>>>>>>>>> didn't do any harm to the data in my volume. Why they appear just after a >>>>>>>>>>>>> brick is replaced and under no other circumstances is something I'm still >>>>>>>>>>>>> investigating. >>>>>>>>>>>>> >>>>>>>>>>>>> But for future, it would be good to follow the steps Anuradha >>>>>>>>>>>>> gave as that would allow self-heal to at least detect that it has some >>>>>>>>>>>>> repairing to do whenever it is restarted whether intentionally or otherwise. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I followed those steps as described on my test box and ended up >>>>>>>>>>>> with exact same outcome of adding shards at an agonizing slow pace and no >>>>>>>>>>>> creation of .shard directory or heals on shard directory. Directories >>>>>>>>>>>> visible from mount healed quickly. This was with one VM so it has only 800 >>>>>>>>>>>> shards as well. After hours at work it had added a total of 33 shards to >>>>>>>>>>>> be healed. I sent those logs yesterday as well though not the glustershd. >>>>>>>>>>>> >>>>>>>>>>>> Does replace-brick command copy files in same manner? For >>>>>>>>>>>> these purposes I am contemplating just skipping the heal route. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> -Krutika >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage < >>>>>>>>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> attached brick and client logs from test machine where same >>>>>>>>>>>>>> behavior occurred not sure if anything new is there. 
its still on 3.8.2 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Number of Bricks: 1 x 3 = 3 >>>>>>>>>>>>>> Transport-type: tcp >>>>>>>>>>>>>> Bricks: >>>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>>>>>>>>>> Options Reconfigured: >>>>>>>>>>>>>> cluster.locking-scheme: granular >>>>>>>>>>>>>> performance.strict-o-direct: off >>>>>>>>>>>>>> features.shard-block-size: 64MB >>>>>>>>>>>>>> features.shard: on >>>>>>>>>>>>>> server.allow-insecure: on >>>>>>>>>>>>>> storage.owner-uid: 36 >>>>>>>>>>>>>> storage.owner-gid: 36 >>>>>>>>>>>>>> cluster.server-quorum-type: server >>>>>>>>>>>>>> cluster.quorum-type: auto >>>>>>>>>>>>>> network.remote-dio: on >>>>>>>>>>>>>> cluster.eager-lock: enable >>>>>>>>>>>>>> performance.stat-prefetch: off >>>>>>>>>>>>>> performance.io-cache: off >>>>>>>>>>>>>> performance.quick-read: off >>>>>>>>>>>>>> cluster.self-heal-window-size: 1024 >>>>>>>>>>>>>> cluster.background-self-heal-count: 16 >>>>>>>>>>>>>> nfs.enable-ino32: off >>>>>>>>>>>>>> nfs.addr-namelookup: off >>>>>>>>>>>>>> nfs.disable: on >>>>>>>>>>>>>> performance.read-ahead: off >>>>>>>>>>>>>> performance.readdir-ahead: on >>>>>>>>>>>>>> cluster.granular-entry-heal: on >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage < >>>>>>>>>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur < >>>>>>>>>>>>>>> atalur at redhat.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ----- Original Message ----- >>>>>>>>>>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>>>>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com> >>>>>>>>>>>>>>>> > Cc: "gluster-users at gluster.org List" < >>>>>>>>>>>>>>>> Gluster-users at gluster.org>, "Krutika Dhananjay" < >>>>>>>>>>>>>>>> kdhananj at redhat.com> >>>>>>>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM >>>>>>>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier >>>>>>>>>>>>>>>> Slow >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur < >>>>>>>>>>>>>>>> atalur at redhat.com> wrote: >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > > Response inline. >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > ----- Original Message ----- >>>>>>>>>>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>>>>>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>>>>>>>>>> > > > Cc: "gluster-users at gluster.org List" < >>>>>>>>>>>>>>>> Gluster-users at gluster.org> >>>>>>>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM >>>>>>>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing >>>>>>>>>>>>>>>> Glacier Slow >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > Could you attach both client and brick logs? >>>>>>>>>>>>>>>> Meanwhile I will try these >>>>>>>>>>>>>>>> > > steps >>>>>>>>>>>>>>>> > > > out on my machines and see if it is easily >>>>>>>>>>>>>>>> recreatable. 
>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > -Krutika >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage < >>>>>>>>>>>>>>>> > > dgossage at carouselchecks.com >>>>>>>>>>>>>>>> > > > > wrote: >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > Centos 7 Gluster 3.8.3 >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>>> > > > Options Reconfigured: >>>>>>>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full >>>>>>>>>>>>>>>> > > > cluster.self-heal-daemon: on >>>>>>>>>>>>>>>> > > > cluster.locking-scheme: granular >>>>>>>>>>>>>>>> > > > features.shard-block-size: 64MB >>>>>>>>>>>>>>>> > > > features.shard: on >>>>>>>>>>>>>>>> > > > performance.readdir-ahead: on >>>>>>>>>>>>>>>> > > > storage.owner-uid: 36 >>>>>>>>>>>>>>>> > > > storage.owner-gid: 36 >>>>>>>>>>>>>>>> > > > performance.quick-read: off >>>>>>>>>>>>>>>> > > > performance.read-ahead: off >>>>>>>>>>>>>>>> > > > performance.io-cache: off >>>>>>>>>>>>>>>> > > > performance.stat-prefetch: on >>>>>>>>>>>>>>>> > > > cluster.eager-lock: enable >>>>>>>>>>>>>>>> > > > network.remote-dio: enable >>>>>>>>>>>>>>>> > > > cluster.quorum-type: auto >>>>>>>>>>>>>>>> > > > cluster.server-quorum-type: server >>>>>>>>>>>>>>>> > > > server.allow-insecure: on >>>>>>>>>>>>>>>> > > > cluster.self-heal-window-size: 1024 >>>>>>>>>>>>>>>> > > > cluster.background-self-heal-count: 16 >>>>>>>>>>>>>>>> > > > performance.strict-write-ordering: off >>>>>>>>>>>>>>>> > > > nfs.disable: on >>>>>>>>>>>>>>>> > > > nfs.addr-namelookup: off >>>>>>>>>>>>>>>> > > > nfs.enable-ino32: off >>>>>>>>>>>>>>>> > > > cluster.granular-entry-heal: on >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no >>>>>>>>>>>>>>>> issues. >>>>>>>>>>>>>>>> > > > Following steps detailed in previous recommendations >>>>>>>>>>>>>>>> began proces of >>>>>>>>>>>>>>>> > > > replacing and healngbricks one node at a time. >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > 1) kill pid of brick >>>>>>>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10 >>>>>>>>>>>>>>>> > > > 3) recreate directory of brick >>>>>>>>>>>>>>>> > > > 4) gluster volume start <> force >>>>>>>>>>>>>>>> > > > 5) gluster volume heal <> full >>>>>>>>>>>>>>>> > > Hi, >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > I'd suggest that full heal is not used. There are a few >>>>>>>>>>>>>>>> bugs in full heal. >>>>>>>>>>>>>>>> > > Better safe than sorry ;) >>>>>>>>>>>>>>>> > > Instead I'd suggest the following steps: >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > Currently I brought the node down by systemctl stop >>>>>>>>>>>>>>>> glusterd as I was >>>>>>>>>>>>>>>> > getting sporadic io issues and a few VM's paused so >>>>>>>>>>>>>>>> hoping that will help. >>>>>>>>>>>>>>>> > I may wait to do this till around 4PM when most work is >>>>>>>>>>>>>>>> done in case it >>>>>>>>>>>>>>>> > shoots load up. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > > 1) kill pid of brick >>>>>>>>>>>>>>>> > > 2) to configuring of brick that you need >>>>>>>>>>>>>>>> > > 3) recreate brick dir >>>>>>>>>>>>>>>> > > 4) while the brick is still down, from the mount point: >>>>>>>>>>>>>>>> > > a) create a dummy non existent dir under / of mount. 
>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > so if noee 2 is down brick, pick node for example 3 and >>>>>>>>>>>>>>>> make a test dir >>>>>>>>>>>>>>>> > under its brick directory that doesnt exist on 2 or >>>>>>>>>>>>>>>> should I be dong this >>>>>>>>>>>>>>>> > over a gluster mount? >>>>>>>>>>>>>>>> You should be doing this over gluster mount. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > > b) set a non existent extended attribute on / of >>>>>>>>>>>>>>>> mount. >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Could you give me an example of an attribute to set? >>>>>>>>>>>>>>>> I've read a tad on >>>>>>>>>>>>>>>> > this, and looked up attributes but haven't set any yet >>>>>>>>>>>>>>>> myself. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" >>>>>>>>>>>>>>>> <path-to-mount> >>>>>>>>>>>>>>>> > Doing these steps will ensure that heal happens only from >>>>>>>>>>>>>>>> updated brick to >>>>>>>>>>>>>>>> > > down brick. >>>>>>>>>>>>>>>> > > 5) gluster v start <> force >>>>>>>>>>>>>>>> > > 6) gluster v heal <> >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Will it matter if somewhere in gluster the full heal >>>>>>>>>>>>>>>> command was run other >>>>>>>>>>>>>>>> > day? Not sure if it eventually stops or times out. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> full heal will stop once the crawl is done. So if you want >>>>>>>>>>>>>>>> to trigger heal again, >>>>>>>>>>>>>>>> run gluster v heal <>. Actually even brick up or volume >>>>>>>>>>>>>>>> start force should >>>>>>>>>>>>>>>> trigger the heal. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Did this on test bed today. its one server with 3 bricks on >>>>>>>>>>>>>>> same machine so take that for what its worth. also it still runs 3.8.2. >>>>>>>>>>>>>>> Maybe ill update and re-run test. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> killed brick >>>>>>>>>>>>>>> deleted brick dir >>>>>>>>>>>>>>> recreated brick dir >>>>>>>>>>>>>>> created fake dir on gluster mount >>>>>>>>>>>>>>> set suggested fake attribute on it >>>>>>>>>>>>>>> ran volume start <> force >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> looked at files it said needed healing and it was just 8 >>>>>>>>>>>>>>> shards that were modified for few minutes I ran through steps >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> gave it few minutes and it stayed same >>>>>>>>>>>>>>> ran gluster volume <> heal >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> it healed all the directories and files you can see over >>>>>>>>>>>>>>> mount including fakedir. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> same issue for shards though. it adds more shards to heal >>>>>>>>>>>>>>> at glacier pace. slight jump in speed if I stat every file and dir in VM >>>>>>>>>>>>>>> running but not all shards. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33 out >>>>>>>>>>>>>>> of 800 and probably wont finish adding for few days at rate it goes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > > 1st node worked as expected took 12 hours to heal 1TB >>>>>>>>>>>>>>>> data. Load was >>>>>>>>>>>>>>>> > > little >>>>>>>>>>>>>>>> > > > heavy but nothing shocking. >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > About an hour after node 1 finished I began same >>>>>>>>>>>>>>>> process on node2. Heal >>>>>>>>>>>>>>>> > > > proces kicked in as before and the files in >>>>>>>>>>>>>>>> directories visible from >>>>>>>>>>>>>>>> > > mount >>>>>>>>>>>>>>>> > > > and .glusterfs healed in short time. 
Then it began >>>>>>>>>>>>>>>> crawl of .shard adding >>>>>>>>>>>>>>>> > > > those files to heal count at which point the entire >>>>>>>>>>>>>>>> proces ground to a >>>>>>>>>>>>>>>> > > halt >>>>>>>>>>>>>>>> > > > basically. After 48 hours out of 19k shards it has >>>>>>>>>>>>>>>> added 5900 to heal >>>>>>>>>>>>>>>> > > list. >>>>>>>>>>>>>>>> > > > Load on all 3 machnes is negligible. It was suggested >>>>>>>>>>>>>>>> to change this >>>>>>>>>>>>>>>> > > value >>>>>>>>>>>>>>>> > > > to full cluster.data-self-heal-algorithm and restart >>>>>>>>>>>>>>>> volume which I >>>>>>>>>>>>>>>> > > did. No >>>>>>>>>>>>>>>> > > > efffect. Tried relaunching heal no effect, despite >>>>>>>>>>>>>>>> any node picked. I >>>>>>>>>>>>>>>> > > > started each VM and performed a stat of all files >>>>>>>>>>>>>>>> from within it, or a >>>>>>>>>>>>>>>> > > full >>>>>>>>>>>>>>>> > > > virus scan and that seemed to cause short small >>>>>>>>>>>>>>>> spikes in shards added, >>>>>>>>>>>>>>>> > > but >>>>>>>>>>>>>>>> > > > not by much. Logs are showing no real messages >>>>>>>>>>>>>>>> indicating anything is >>>>>>>>>>>>>>>> > > going >>>>>>>>>>>>>>>> > > > on. I get hits to brick log on occasion of null >>>>>>>>>>>>>>>> lookups making me think >>>>>>>>>>>>>>>> > > its >>>>>>>>>>>>>>>> > > > not really crawling shards directory but waiting for >>>>>>>>>>>>>>>> a shard lookup to >>>>>>>>>>>>>>>> > > add >>>>>>>>>>>>>>>> > > > it. I'll get following in brick log but not constant >>>>>>>>>>>>>>>> and sometime >>>>>>>>>>>>>>>> > > multiple >>>>>>>>>>>>>>>> > > > for same shard. >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] >>>>>>>>>>>>>>>> > > > [server-resolve.c:569:server_resolve] >>>>>>>>>>>>>>>> 0-GLUSTER1-server: no resolution >>>>>>>>>>>>>>>> > > type >>>>>>>>>>>>>>>> > > > for (null) (LOOKUP) >>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] >>>>>>>>>>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] >>>>>>>>>>>>>>>> 0-GLUSTER1-server: 12591783: >>>>>>>>>>>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00 >>>>>>>>>>>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) >>>>>>>>>>>>>>>> ==> (Invalid >>>>>>>>>>>>>>>> > > > argument) [Invalid argument] >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > This one repeated about 30 times in row then nothing >>>>>>>>>>>>>>>> for 10 minutes then >>>>>>>>>>>>>>>> > > one >>>>>>>>>>>>>>>> > > > hit for one different shard by itself. >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > How can I determine if Heal is actually running? How >>>>>>>>>>>>>>>> can I kill it or >>>>>>>>>>>>>>>> > > force >>>>>>>>>>>>>>>> > > > restart? Does node I start it from determine which >>>>>>>>>>>>>>>> directory gets >>>>>>>>>>>>>>>> > > crawled to >>>>>>>>>>>>>>>> > > > determine heals? >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > David Gossage >>>>>>>>>>>>>>>> > > > Carousel Checks Inc. 
| System Administrator >>>>>>>>>>>>>>>> > > > Office 708.613.2284 >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > _______________________________________________ >>>>>>>>>>>>>>>> > > > Gluster-users mailing list >>>>>>>>>>>>>>>> > > > Gluster-users at gluster.org >>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > _______________________________________________ >>>>>>>>>>>>>>>> > > > Gluster-users mailing list >>>>>>>>>>>>>>>> > > > Gluster-users at gluster.org >>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > -- >>>>>>>>>>>>>>>> > > Thanks, >>>>>>>>>>>>>>>> > > Anuradha. >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Anuradha. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160831/9192c4ca/attachment.html>
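For anyone following the heal-count discrepancy in the message above (heal info showing 0 while statistics heal-info shows 1), here is a minimal set of cross-checks, sketched with the volume name and brick path from this thread; the glustershd log location is the usual default and is assumed rather than confirmed here:

gluster volume heal glustershard info
gluster volume heal glustershard statistics heal-count
# entries under the xattrop index (typically everything except the xattrop-<gfid> base file) are gfids still marked for heal
ls /gluster2/brick2/1/.glusterfs/indices/xattrop
# check whether the self-heal daemon started and finished a full sweep on every subvolume
grep "full sweep" /var/log/glusterfs/glustershd.log

Comparing the three usually makes it clearer whether something is genuinely still pending or the stray count is transient, for example from ongoing VM writes.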
On Wed, Aug 31, 2016 at 8:13 PM, David Gossage <dgossage at carouselchecks.com> wrote:> Just as a test I did not shut down the one VM on the cluster as finding a > window before weekend where I can shut down all VM's and fit in a full heal > is unlikely so wanted to see what occurs. > > > kill -15 brick pid > rm -Rf /gluster2/brick1/1 > mkdir /gluster2/brick1/1 > mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake3 > setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterS > D/192.168.71.10\:_glustershard > > getfattr -d -m . -e hex /gluster2/brick2/1 > # file: gluster2/brick2/1 > security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 > 23a756e6c6162656c65645f743a733000 > trusted.afr.dirty=0x000000000000000000000001 > trusted.afr.glustershard-client-0=0x000000000000000200000000 >
This is unusual. The last digit ought to have been 1 on account of "fake3" being created while the first brick is offline. This discussion is becoming unnecessarily lengthy. Mind if we discuss this and sort it out on IRC today? At least the communication will be continuous and in real time. I'm kdhananjay on #gluster (Freenode). Ping me when you're online. -Krutika
> trusted.afr.glustershard-client-2=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 > user.some-name=0x736f6d652d76616c7565 > > getfattr -d -m . -e hex /gluster2/brick3/1 > # file: gluster2/brick3/1 > security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 > 23a756e6c6162656c65645f743a733000 > trusted.afr.dirty=0x000000000000000000000001 > trusted.afr.glustershard-client-0=0x000000000000000200000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 > user.some-name=0x736f6d652d76616c7565 > > setfattr -n trusted.afr.glustershard-client-0 -v > 0x000000010000000200000000 /gluster2/brick2/1 > setfattr -n trusted.afr.glustershard-client-0 -v > 0x000000010000000200000000 /gluster2/brick3/1 > > getfattr -d -m . -e hex /gluster2/brick3/1/ > getfattr: Removing leading '/' from absolute path names > # file: gluster2/brick3/1/ > security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 > 23a756e6c6162656c65645f743a733000 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.glustershard-client-0=0x000000010000000200000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 > user.some-name=0x736f6d652d76616c7565 > > getfattr -d -m .
-e hex /gluster2/brick2/1/ > getfattr: Removing leading '/' from absolute path names > # file: gluster2/brick2/1/ > security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 > 23a756e6c6162656c65645f743a733000 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.glustershard-client-0=0x000000010000000200000000 > trusted.afr.glustershard-client-2=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 > user.some-name=0x736f6d652d76616c7565 > > gluster v start glustershard force > > gluster heal counts climbed up and down a little as it healed everything > in visible gluster mount and .glusterfs for visible mount files then > stalled with around 15 shards and the fake3 directory still in list > > getfattr -d -m . -e hex /gluster2/brick2/1/ > getfattr: Removing leading '/' from absolute path names > # file: gluster2/brick2/1/ > security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 > 23a756e6c6162656c65645f743a733000 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.glustershard-client-0=0x000000010000000000000000 > trusted.afr.glustershard-client-2=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 > user.some-name=0x736f6d652d76616c7565 > > getfattr -d -m . -e hex /gluster2/brick3/1/ > getfattr: Removing leading '/' from absolute path names > # file: gluster2/brick3/1/ > security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 > 23a756e6c6162656c65645f743a733000 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.glustershard-client-0=0x000000010000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 > user.some-name=0x736f6d652d76616c7565 > > getfattr -d -m . -e hex /gluster2/brick1/1/ > getfattr: Removing leading '/' from absolute path names > # file: gluster2/brick1/1/ > security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 > 23a756e6c6162656c65645f743a733000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 > user.some-name=0x736f6d652d76616c7565 > > heal count stayed same for awhile then ran > > gluster v heal glustershard full > > heals jump up to 700 as shards actually get read in as needing heals. > glustershd shows 3 sweeps started one per brick > > It heals shards things look ok heal <> info shows 0 files but statistics > heal-info shows 1 left for brick 2 and 3. perhaps cause I didnt stop vm > running? 
> > # file: gluster2/brick1/1/ > security.selinux=0x756e636f6e66696e65645f753a6f > 626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 > user.some-name=0x736f6d652d76616c7565 > > # file: gluster2/brick2/1/ > security.selinux=0x756e636f6e66696e65645f753a6f > 626a6563745f723a756e6c6162656c65645f743a733000 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.glustershard-client-0=0x000000010000000000000000 > trusted.afr.glustershard-client-2=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 > user.some-name=0x736f6d652d76616c7565 > > # file: gluster2/brick3/1/ > security.selinux=0x756e636f6e66696e65645f753a6f > 626a6563745f723a756e6c6162656c65645f743a733000 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.glustershard-client-0=0x000000010000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 > user.some-name=0x736f6d652d76616c7565 > > meta-data split-brain? heal <> info split-brain shows no files or > entries. If I had thought ahead I would have checked the values returned > by getfattr before, although I do know heal-count was returning 0 at the > time > > > Assuming I need to shut down vm's and put volume in maintenance from ovirt > to prevent any io. Does it need to occur for whole heal or can I > re-activate at some point to bring VM's back up? > > > > > *David Gossage* > *Carousel Checks Inc. | System Administrator* > *Office* 708.613.2284 > > On Wed, Aug 31, 2016 at 3:50 AM, Krutika Dhananjay <kdhananj at redhat.com> > wrote: > >> No, sorry, it's working fine. I may have missed some step because of >> which i saw that problem. /.shard is also healing fine now. >> >> Let me know if it works for you. >> >> -Krutika >> >> On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <kdhananj at redhat.com> >> wrote: >> >>> OK I just hit the other issue too, where .shard doesn't get healed. :) >>> >>> Investigating as to why that is the case. Give me some time. >>> >>> -Krutika >>> >>> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <kdhananj at redhat.com >>> > wrote: >>> >>>> Just figured the steps Anuradha has provided won't work if granular >>>> entry heal is on. >>>> So when you bring down a brick and create fake2 under / of the volume, >>>> granular entry heal feature causes >>>> sh to remember only the fact that 'fake2' needs to be recreated on the >>>> offline brick (because changelogs are granular). >>>> >>>> In this case, we would be required to indicate to self-heal-daemon that >>>> the entire directory tree from '/' needs to be repaired on the brick that >>>> contains no data. >>>> >>>> To fix this, I did the following (for users who use granular entry >>>> self-healing): >>>> >>>> 1. Kill the last brick process in the replica (/bricks/3) >>>> >>>> 2. [root at server-3 ~]# rm -rf /bricks/3 >>>> >>>> 3. [root at server-3 ~]# mkdir /bricks/3 >>>> >>>> 4. Create a new dir on the mount point: >>>> [root at client-1 ~]# mkdir /mnt/fake >>>> >>>> 5. Set some fake xattr on the root of the volume, and not the 'fake' >>>> directory itself. 
>>>> [root at client-1 ~]# setfattr -n "user.some-name" -v "some-value" >>>> /mnt >>>> >>>> 6. Make sure there's no io happening on your volume. >>>> >>>> 7. Check the pending xattrs on the brick directories of the two good >>>> copies (on bricks 1 and 2), you should be seeing same values as the one >>>> marked in red in both bricks. >>>> (note that the client-<num> xattr key will have the same last digit as >>>> the index of the brick that is down, when counting from 0. So if the first >>>> brick is the one that is down, it would read trusted.afr.*-client-0; if the >>>> second brick is the one that is empty and down, it would read >>>> trusted.afr.*-client-1 and so on). >>>> >>>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1 >>>> # file: 1 >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a6574635f72756e74696d655f743a733000 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> *trusted.afr.rep-client-2=0x000000000000000100000001* >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b >>>> >>>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2 >>>> # file: 2 >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a6574635f72756e74696d655f743a733000 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> *trusted.afr.rep-client-2=0x000**000000000000100000001* >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b >>>> >>>> 8. Flip the 8th digit in the trusted.afr.<VOLNAME>-client-2 to a 1. >>>> >>>> [root at server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v >>>> *0x000000010000000100000001* /bricks/1 >>>> [root at server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v >>>> *0x000000010000000100000001* /bricks/2 >>>> >>>> 9. Get the xattrs again and check the xattrs are set properly now >>>> >>>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1 >>>> # file: 1 >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a6574635f72756e74696d655f743a733000 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> *trusted.afr.rep-client-2=0x000**000010000000100000001* >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b >>>> >>>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2 >>>> # file: 2 >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a6574635f72756e74696d655f743a733000 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> *trusted.afr.rep-client-2=0x000**000010000000100000001* >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b >>>> >>>> 10. Force-start the volume. >>>> >>>> [root at server-1 ~]# gluster volume start rep force >>>> volume start: rep: success >>>> >>>> 11. Monitor heal-info command to ensure the number of entries keeps >>>> growing. >>>> >>>> 12. Keep monitoring with step 10 and eventually the number of entries >>>> needing heal must come down to 0. >>>> Also the checksums of the files on the previously empty brick should >>>> now match with the copies on the other two bricks. 
>>>> >>>> Could you check if the above steps work for you, in your test >>>> environment? >>>> >>>> You caught a nice bug in the manual steps to follow when granular >>>> entry-heal is enabled and an empty brick needs heal. Thanks for reporting >>>> it. :) We will fix the documentation appropriately. >>>> >>>> -Krutika >>>> >>>> >>>> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay < >>>> kdhananj at redhat.com> wrote: >>>> >>>>> Tried this. >>>>> >>>>> With me, only 'fake2' gets healed after i bring the 'empty' brick back >>>>> up and it stops there unless I do a 'heal-full'. >>>>> >>>>> Is that what you're seeing as well? >>>>> >>>>> -Krutika >>>>> >>>>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage < >>>>> dgossage at carouselchecks.com> wrote: >>>>> >>>>>> Same issue brought up glusterd on problem node heal count still stuck >>>>>> at 6330. >>>>>> >>>>>> Ran gluster v heal GUSTER1 full >>>>>> >>>>>> glustershd on problem node shows a sweep starting and finishing in >>>>>> seconds. Other 2 nodes show no activity in log. They should start a sweep >>>>>> too shouldn't they? >>>>>> >>>>>> Tried starting from scratch >>>>>> >>>>>> kill -15 brickpid >>>>>> rm -Rf /brick >>>>>> mkdir -p /brick >>>>>> mkdir mkdir /gsmount/fake2 >>>>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2 >>>>>> >>>>>> Heals visible dirs instantly then stops. >>>>>> >>>>>> gluster v heal GLUSTER1 full >>>>>> >>>>>> see sweep star on problem node and end almost instantly. no files >>>>>> added t heal list no files healed no more logging >>>>>> >>>>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026] >>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>> starting full sweep on subvol GLUSTER1-client-1 >>>>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026] >>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: >>>>>> finished full sweep on subvol GLUSTER1-client-1 >>>>>> >>>>>> same results no matter which node you run command on. Still stuck >>>>>> with 6330 files showing needing healed out of 19k. still showing in logs >>>>>> no heals are occuring. >>>>>> >>>>>> Is their a way to forcibly reset any prior heal data? Could it be >>>>>> stuck on some past failed heal start? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> *David Gossage* >>>>>> *Carousel Checks Inc. 
| System Administrator* >>>>>> *Office* 708.613.2284 >>>>>> >>>>>> On Tue, Aug 30, 2016 at 10:03 AM, David Gossage < >>>>>> dgossage at carouselchecks.com> wrote: >>>>>> >>>>>>> On Tue, Aug 30, 2016 at 10:02 AM, David Gossage < >>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>> >>>>>>>> updated test server to 3.8.3 >>>>>>>> >>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>>>> Options Reconfigured: >>>>>>>> cluster.granular-entry-heal: on >>>>>>>> performance.readdir-ahead: on >>>>>>>> performance.read-ahead: off >>>>>>>> nfs.disable: on >>>>>>>> nfs.addr-namelookup: off >>>>>>>> nfs.enable-ino32: off >>>>>>>> cluster.background-self-heal-count: 16 >>>>>>>> cluster.self-heal-window-size: 1024 >>>>>>>> performance.quick-read: off >>>>>>>> performance.io-cache: off >>>>>>>> performance.stat-prefetch: off >>>>>>>> cluster.eager-lock: enable >>>>>>>> network.remote-dio: on >>>>>>>> cluster.quorum-type: auto >>>>>>>> cluster.server-quorum-type: server >>>>>>>> storage.owner-gid: 36 >>>>>>>> storage.owner-uid: 36 >>>>>>>> server.allow-insecure: on >>>>>>>> features.shard: on >>>>>>>> features.shard-block-size: 64MB >>>>>>>> performance.strict-o-direct: off >>>>>>>> cluster.locking-scheme: granular >>>>>>>> >>>>>>>> kill -15 brickpid >>>>>>>> rm -Rf /gluster2/brick3 >>>>>>>> mkdir -p /gluster2/brick3/1 >>>>>>>> mkdir mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10 >>>>>>>> \:_glustershard/fake2 >>>>>>>> setfattr -n "user.some-name" -v "some-value" >>>>>>>> /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2 >>>>>>>> gluster v start glustershard force >>>>>>>> >>>>>>>> at this point brick process starts and all visible files including >>>>>>>> new dir are made on brick >>>>>>>> handful of shards are in heal statistics still but no .shard >>>>>>>> directory created and no increase in shard count >>>>>>>> >>>>>>>> gluster v heal glustershard >>>>>>>> >>>>>>>> At this point still no increase in count or dir made no additional >>>>>>>> activity in logs for healing generated. waited few minutes tailing logs to >>>>>>>> check if anything kicked in. >>>>>>>> >>>>>>>> gluster v heal glustershard full >>>>>>>> >>>>>>>> gluster shards added to list and heal commences. logs show full >>>>>>>> sweep starting on all 3 nodes. though this time it only shows as finishing >>>>>>>> on one which looks to be the one that had brick deleted. >>>>>>>> >>>>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>> glustershard-client-0 >>>>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>> glustershard-client-1 >>>>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>> glustershard-client-2 >>>>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026] >>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>> glustershard-client-2 >>>>>>>> >>>>>>> >>>>>>> Just realized its still healing so that may be why sweep on 2 other >>>>>>> bricks haven't replied as finished. >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> my hope is that later tonight a full heal will work on production. 
>>>>>>>> Is it possible self-heal daemon can get stale or stop listening but still >>>>>>>> show as active? Would stopping and starting self-heal daemon from gluster >>>>>>>> cli before doing these heals be helpful? >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage < >>>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>>> >>>>>>>>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage < >>>>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>>>> >>>>>>>>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay < >>>>>>>>>> kdhananj at redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay < >>>>>>>>>>> kdhananj at redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage < >>>>>>>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay < >>>>>>>>>>>>> kdhananj at redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Could you also share the glustershd logs? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I'll get them when I get to work sure >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I tried the same steps that you mentioned multiple times, but >>>>>>>>>>>>>> heal is running to completion without any issues. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It must be said that 'heal full' traverses the files and >>>>>>>>>>>>>> directories in a depth-first order and does heals also in the same order. >>>>>>>>>>>>>> But if it gets interrupted in the middle (say because self-heal-daemon was >>>>>>>>>>>>>> either intentionally or unintentionally brought offline and then brought >>>>>>>>>>>>>> back up), self-heal will only pick up the entries that are so far marked as >>>>>>>>>>>>>> new-entries that need heal which it will find in indices/xattrop directory. >>>>>>>>>>>>>> What this means is that those files and directories that were not visited >>>>>>>>>>>>>> during the crawl, will remain untouched and unhealed in this second >>>>>>>>>>>>>> iteration of heal, unless you execute a 'heal-full' again. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> So should it start healing shards as it crawls or not until >>>>>>>>>>>>> after it crawls the entire .shard directory? At the pace it was going that >>>>>>>>>>>>> could be a week with one node appearing in the cluster but with no shard >>>>>>>>>>>>> files if anything tries to access a file on that node. From my experience >>>>>>>>>>>>> other day telling it to heal full again did nothing regardless of node used. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> Crawl is started from '/' of the volume. Whenever self-heal >>>>>>>>>>> detects during the crawl that a file or directory is present in some >>>>>>>>>>> brick(s) and absent in others, it creates the file on the bricks where it >>>>>>>>>>> is absent and marks the fact that the file or directory might need >>>>>>>>>>> data/entry and metadata heal too (this also means that an index is created >>>>>>>>>>> under .glusterfs/indices/xattrop of the src bricks). And the data/entry and >>>>>>>>>>> metadata heal are picked up and done in >>>>>>>>>>> >>>>>>>>>> the background with the help of these indices. 
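In case it helps to cross-check, here is a rough sketch for comparing that on-disk index with what heal info reports. The brick path is just an example from this thread, and the xattrop-<uuid> file inside that directory is the base link rather than a pending entry, so it is filtered out.

BRICK=/gluster1/BRICK1/1
# entries currently queued for index-based self-heal on this brick
ls "$BRICK/.glusterfs/indices/xattrop" | grep -cv '^xattrop-'
# what self-heal reports per brick for the volume
gluster volume heal GLUSTER1 info | grep 'Number of entries'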
>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Looking at my 3rd node as example i find nearly an exact same >>>>>>>>>> number of files in xattrop dir as reported by heal count at time I brought >>>>>>>>>> down node2 to try and alleviate read io errors that seemed to occur from >>>>>>>>>> what I was guessing as attempts to use the node with no shards for reads. >>>>>>>>>> >>>>>>>>>> Also attached are the glustershd logs from the 3 nodes, along >>>>>>>>>> with the test node i tried yesterday with same results. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Looking at my own logs I notice that a full sweep was only ever >>>>>>>>> recorded in glustershd.log on 2nd node with missing directory. I believe I >>>>>>>>> should have found a sweep begun on every node correct? >>>>>>>>> >>>>>>>>> On my test dev when it did work I do see that >>>>>>>>> >>>>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>> glustershard-client-0 >>>>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>> glustershard-client-1 >>>>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>> glustershard-client-2 >>>>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>>> glustershard-client-2 >>>>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>>> glustershard-client-1 >>>>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>>> glustershard-client-0 >>>>>>>>> >>>>>>>>> While when looking at past few days of the 3 prod nodes i only >>>>>>>>> found that on my 2nd node >>>>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1 >>>>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1 >>>>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1 >>>>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1 >>>>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1 >>>>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1 >>>>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1 
>>>>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> My suspicion is that this is what happened on your setup. >>>>>>>>>>>>>> Could you confirm if that was the case? >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Brick was brought online with force start then a full heal >>>>>>>>>>>>> launched. Hours later after it became evident that it was not adding new >>>>>>>>>>>>> files to heal I did try restarting self-heal daemon and relaunching full >>>>>>>>>>>>> heal again. But this was after the heal had basically already failed to >>>>>>>>>>>>> work as intended. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> OK. How did you figure it was not adding any new files? I need >>>>>>>>>>>> to know what places you were monitoring to come to this conclusion. >>>>>>>>>>>> >>>>>>>>>>>> -Krutika >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> As for those logs, I did manager to do something that caused >>>>>>>>>>>>>> these warning messages you shared earlier to appear in my client and server >>>>>>>>>>>>>> logs. >>>>>>>>>>>>>> Although these logs are annoying and a bit scary too, they >>>>>>>>>>>>>> didn't do any harm to the data in my volume. Why they appear just after a >>>>>>>>>>>>>> brick is replaced and under no other circumstances is something I'm still >>>>>>>>>>>>>> investigating. >>>>>>>>>>>>>> >>>>>>>>>>>>>> But for future, it would be good to follow the steps Anuradha >>>>>>>>>>>>>> gave as that would allow self-heal to at least detect that it has some >>>>>>>>>>>>>> repairing to do whenever it is restarted whether intentionally or otherwise. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I followed those steps as described on my test box and ended >>>>>>>>>>>>> up with exact same outcome of adding shards at an agonizing slow pace and >>>>>>>>>>>>> no creation of .shard directory or heals on shard directory. Directories >>>>>>>>>>>>> visible from mount healed quickly. This was with one VM so it has only 800 >>>>>>>>>>>>> shards as well. After hours at work it had added a total of 33 shards to >>>>>>>>>>>>> be healed. I sent those logs yesterday as well though not the glustershd. >>>>>>>>>>>>> >>>>>>>>>>>>> Does replace-brick command copy files in same manner? For >>>>>>>>>>>>> these purposes I am contemplating just skipping the heal route. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> -Krutika >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage < >>>>>>>>>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> attached brick and client logs from test machine where same >>>>>>>>>>>>>>> behavior occurred not sure if anything new is there. 
its still on 3.8.2 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Number of Bricks: 1 x 3 = 3 >>>>>>>>>>>>>>> Transport-type: tcp >>>>>>>>>>>>>>> Bricks: >>>>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>>>>>>>>>>> Options Reconfigured: >>>>>>>>>>>>>>> cluster.locking-scheme: granular >>>>>>>>>>>>>>> performance.strict-o-direct: off >>>>>>>>>>>>>>> features.shard-block-size: 64MB >>>>>>>>>>>>>>> features.shard: on >>>>>>>>>>>>>>> server.allow-insecure: on >>>>>>>>>>>>>>> storage.owner-uid: 36 >>>>>>>>>>>>>>> storage.owner-gid: 36 >>>>>>>>>>>>>>> cluster.server-quorum-type: server >>>>>>>>>>>>>>> cluster.quorum-type: auto >>>>>>>>>>>>>>> network.remote-dio: on >>>>>>>>>>>>>>> cluster.eager-lock: enable >>>>>>>>>>>>>>> performance.stat-prefetch: off >>>>>>>>>>>>>>> performance.io-cache: off >>>>>>>>>>>>>>> performance.quick-read: off >>>>>>>>>>>>>>> cluster.self-heal-window-size: 1024 >>>>>>>>>>>>>>> cluster.background-self-heal-count: 16 >>>>>>>>>>>>>>> nfs.enable-ino32: off >>>>>>>>>>>>>>> nfs.addr-namelookup: off >>>>>>>>>>>>>>> nfs.disable: on >>>>>>>>>>>>>>> performance.read-ahead: off >>>>>>>>>>>>>>> performance.readdir-ahead: on >>>>>>>>>>>>>>> cluster.granular-entry-heal: on >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage < >>>>>>>>>>>>>>> dgossage at carouselchecks.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur < >>>>>>>>>>>>>>>> atalur at redhat.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ----- Original Message ----- >>>>>>>>>>>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>>>>>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com> >>>>>>>>>>>>>>>>> > Cc: "gluster-users at gluster.org List" < >>>>>>>>>>>>>>>>> Gluster-users at gluster.org>, "Krutika Dhananjay" < >>>>>>>>>>>>>>>>> kdhananj at redhat.com> >>>>>>>>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM >>>>>>>>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing >>>>>>>>>>>>>>>>> Glacier Slow >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur < >>>>>>>>>>>>>>>>> atalur at redhat.com> wrote: >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > > Response inline. >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > ----- Original Message ----- >>>>>>>>>>>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>>>>>>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>>>>>>>>>>> > > > Cc: "gluster-users at gluster.org List" < >>>>>>>>>>>>>>>>> Gluster-users at gluster.org> >>>>>>>>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM >>>>>>>>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing >>>>>>>>>>>>>>>>> Glacier Slow >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > Could you attach both client and brick logs? >>>>>>>>>>>>>>>>> Meanwhile I will try these >>>>>>>>>>>>>>>>> > > steps >>>>>>>>>>>>>>>>> > > > out on my machines and see if it is easily >>>>>>>>>>>>>>>>> recreatable. 
>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > -Krutika >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage < >>>>>>>>>>>>>>>>> > > dgossage at carouselchecks.com >>>>>>>>>>>>>>>>> > > > > wrote: >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > Centos 7 Gluster 3.8.3 >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>>>> > > > Options Reconfigured: >>>>>>>>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full >>>>>>>>>>>>>>>>> > > > cluster.self-heal-daemon: on >>>>>>>>>>>>>>>>> > > > cluster.locking-scheme: granular >>>>>>>>>>>>>>>>> > > > features.shard-block-size: 64MB >>>>>>>>>>>>>>>>> > > > features.shard: on >>>>>>>>>>>>>>>>> > > > performance.readdir-ahead: on >>>>>>>>>>>>>>>>> > > > storage.owner-uid: 36 >>>>>>>>>>>>>>>>> > > > storage.owner-gid: 36 >>>>>>>>>>>>>>>>> > > > performance.quick-read: off >>>>>>>>>>>>>>>>> > > > performance.read-ahead: off >>>>>>>>>>>>>>>>> > > > performance.io-cache: off >>>>>>>>>>>>>>>>> > > > performance.stat-prefetch: on >>>>>>>>>>>>>>>>> > > > cluster.eager-lock: enable >>>>>>>>>>>>>>>>> > > > network.remote-dio: enable >>>>>>>>>>>>>>>>> > > > cluster.quorum-type: auto >>>>>>>>>>>>>>>>> > > > cluster.server-quorum-type: server >>>>>>>>>>>>>>>>> > > > server.allow-insecure: on >>>>>>>>>>>>>>>>> > > > cluster.self-heal-window-size: 1024 >>>>>>>>>>>>>>>>> > > > cluster.background-self-heal-count: 16 >>>>>>>>>>>>>>>>> > > > performance.strict-write-ordering: off >>>>>>>>>>>>>>>>> > > > nfs.disable: on >>>>>>>>>>>>>>>>> > > > nfs.addr-namelookup: off >>>>>>>>>>>>>>>>> > > > nfs.enable-ino32: off >>>>>>>>>>>>>>>>> > > > cluster.granular-entry-heal: on >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no >>>>>>>>>>>>>>>>> issues. >>>>>>>>>>>>>>>>> > > > Following steps detailed in previous recommendations >>>>>>>>>>>>>>>>> began proces of >>>>>>>>>>>>>>>>> > > > replacing and healngbricks one node at a time. >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > 1) kill pid of brick >>>>>>>>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10 >>>>>>>>>>>>>>>>> > > > 3) recreate directory of brick >>>>>>>>>>>>>>>>> > > > 4) gluster volume start <> force >>>>>>>>>>>>>>>>> > > > 5) gluster volume heal <> full >>>>>>>>>>>>>>>>> > > Hi, >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > I'd suggest that full heal is not used. There are a >>>>>>>>>>>>>>>>> few bugs in full heal. >>>>>>>>>>>>>>>>> > > Better safe than sorry ;) >>>>>>>>>>>>>>>>> > > Instead I'd suggest the following steps: >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > Currently I brought the node down by systemctl stop >>>>>>>>>>>>>>>>> glusterd as I was >>>>>>>>>>>>>>>>> > getting sporadic io issues and a few VM's paused so >>>>>>>>>>>>>>>>> hoping that will help. >>>>>>>>>>>>>>>>> > I may wait to do this till around 4PM when most work is >>>>>>>>>>>>>>>>> done in case it >>>>>>>>>>>>>>>>> > shoots load up. >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > > 1) kill pid of brick >>>>>>>>>>>>>>>>> > > 2) to configuring of brick that you need >>>>>>>>>>>>>>>>> > > 3) recreate brick dir >>>>>>>>>>>>>>>>> > > 4) while the brick is still down, from the mount point: >>>>>>>>>>>>>>>>> > > a) create a dummy non existent dir under / of mount. 
>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > so if noee 2 is down brick, pick node for example 3 and >>>>>>>>>>>>>>>>> make a test dir >>>>>>>>>>>>>>>>> > under its brick directory that doesnt exist on 2 or >>>>>>>>>>>>>>>>> should I be dong this >>>>>>>>>>>>>>>>> > over a gluster mount? >>>>>>>>>>>>>>>>> You should be doing this over gluster mount. >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > > b) set a non existent extended attribute on / of >>>>>>>>>>>>>>>>> mount. >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > Could you give me an example of an attribute to set? >>>>>>>>>>>>>>>>> I've read a tad on >>>>>>>>>>>>>>>>> > this, and looked up attributes but haven't set any yet >>>>>>>>>>>>>>>>> myself. >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" >>>>>>>>>>>>>>>>> <path-to-mount> >>>>>>>>>>>>>>>>> > Doing these steps will ensure that heal happens only >>>>>>>>>>>>>>>>> from updated brick to >>>>>>>>>>>>>>>>> > > down brick. >>>>>>>>>>>>>>>>> > > 5) gluster v start <> force >>>>>>>>>>>>>>>>> > > 6) gluster v heal <> >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > Will it matter if somewhere in gluster the full heal >>>>>>>>>>>>>>>>> command was run other >>>>>>>>>>>>>>>>> > day? Not sure if it eventually stops or times out. >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> full heal will stop once the crawl is done. So if you want >>>>>>>>>>>>>>>>> to trigger heal again, >>>>>>>>>>>>>>>>> run gluster v heal <>. Actually even brick up or volume >>>>>>>>>>>>>>>>> start force should >>>>>>>>>>>>>>>>> trigger the heal. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Did this on test bed today. its one server with 3 bricks >>>>>>>>>>>>>>>> on same machine so take that for what its worth. also it still runs >>>>>>>>>>>>>>>> 3.8.2. Maybe ill update and re-run test. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> killed brick >>>>>>>>>>>>>>>> deleted brick dir >>>>>>>>>>>>>>>> recreated brick dir >>>>>>>>>>>>>>>> created fake dir on gluster mount >>>>>>>>>>>>>>>> set suggested fake attribute on it >>>>>>>>>>>>>>>> ran volume start <> force >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> looked at files it said needed healing and it was just 8 >>>>>>>>>>>>>>>> shards that were modified for few minutes I ran through steps >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> gave it few minutes and it stayed same >>>>>>>>>>>>>>>> ran gluster volume <> heal >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> it healed all the directories and files you can see over >>>>>>>>>>>>>>>> mount including fakedir. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> same issue for shards though. it adds more shards to heal >>>>>>>>>>>>>>>> at glacier pace. slight jump in speed if I stat every file and dir in VM >>>>>>>>>>>>>>>> running but not all shards. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33 out >>>>>>>>>>>>>>>> of 800 and probably wont finish adding for few days at rate it goes. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > > 1st node worked as expected took 12 hours to heal >>>>>>>>>>>>>>>>> 1TB data. Load was >>>>>>>>>>>>>>>>> > > little >>>>>>>>>>>>>>>>> > > > heavy but nothing shocking. >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > About an hour after node 1 finished I began same >>>>>>>>>>>>>>>>> process on node2. 
Heal >>>>>>>>>>>>>>>>> > > > proces kicked in as before and the files in >>>>>>>>>>>>>>>>> directories visible from >>>>>>>>>>>>>>>>> > > mount >>>>>>>>>>>>>>>>> > > > and .glusterfs healed in short time. Then it began >>>>>>>>>>>>>>>>> crawl of .shard adding >>>>>>>>>>>>>>>>> > > > those files to heal count at which point the entire >>>>>>>>>>>>>>>>> proces ground to a >>>>>>>>>>>>>>>>> > > halt >>>>>>>>>>>>>>>>> > > > basically. After 48 hours out of 19k shards it has >>>>>>>>>>>>>>>>> added 5900 to heal >>>>>>>>>>>>>>>>> > > list. >>>>>>>>>>>>>>>>> > > > Load on all 3 machnes is negligible. It was >>>>>>>>>>>>>>>>> suggested to change this >>>>>>>>>>>>>>>>> > > value >>>>>>>>>>>>>>>>> > > > to full cluster.data-self-heal-algorithm and >>>>>>>>>>>>>>>>> restart volume which I >>>>>>>>>>>>>>>>> > > did. No >>>>>>>>>>>>>>>>> > > > efffect. Tried relaunching heal no effect, despite >>>>>>>>>>>>>>>>> any node picked. I >>>>>>>>>>>>>>>>> > > > started each VM and performed a stat of all files >>>>>>>>>>>>>>>>> from within it, or a >>>>>>>>>>>>>>>>> > > full >>>>>>>>>>>>>>>>> > > > virus scan and that seemed to cause short small >>>>>>>>>>>>>>>>> spikes in shards added, >>>>>>>>>>>>>>>>> > > but >>>>>>>>>>>>>>>>> > > > not by much. Logs are showing no real messages >>>>>>>>>>>>>>>>> indicating anything is >>>>>>>>>>>>>>>>> > > going >>>>>>>>>>>>>>>>> > > > on. I get hits to brick log on occasion of null >>>>>>>>>>>>>>>>> lookups making me think >>>>>>>>>>>>>>>>> > > its >>>>>>>>>>>>>>>>> > > > not really crawling shards directory but waiting for >>>>>>>>>>>>>>>>> a shard lookup to >>>>>>>>>>>>>>>>> > > add >>>>>>>>>>>>>>>>> > > > it. I'll get following in brick log but not constant >>>>>>>>>>>>>>>>> and sometime >>>>>>>>>>>>>>>>> > > multiple >>>>>>>>>>>>>>>>> > > > for same shard. >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] >>>>>>>>>>>>>>>>> > > > [server-resolve.c:569:server_resolve] >>>>>>>>>>>>>>>>> 0-GLUSTER1-server: no resolution >>>>>>>>>>>>>>>>> > > type >>>>>>>>>>>>>>>>> > > > for (null) (LOOKUP) >>>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] >>>>>>>>>>>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] >>>>>>>>>>>>>>>>> 0-GLUSTER1-server: 12591783: >>>>>>>>>>>>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00 >>>>>>>>>>>>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) >>>>>>>>>>>>>>>>> ==> (Invalid >>>>>>>>>>>>>>>>> > > > argument) [Invalid argument] >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > This one repeated about 30 times in row then nothing >>>>>>>>>>>>>>>>> for 10 minutes then >>>>>>>>>>>>>>>>> > > one >>>>>>>>>>>>>>>>> > > > hit for one different shard by itself. >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > How can I determine if Heal is actually running? How >>>>>>>>>>>>>>>>> can I kill it or >>>>>>>>>>>>>>>>> > > force >>>>>>>>>>>>>>>>> > > > restart? Does node I start it from determine which >>>>>>>>>>>>>>>>> directory gets >>>>>>>>>>>>>>>>> > > crawled to >>>>>>>>>>>>>>>>> > > > determine heals? >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > David Gossage >>>>>>>>>>>>>>>>> > > > Carousel Checks Inc. 
| System Administrator >>>>>>>>>>>>>>>>> > > > Office 708.613.2284 >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > _______________________________________________ >>>>>>>>>>>>>>>>> > > > Gluster-users mailing list >>>>>>>>>>>>>>>>> > > > Gluster-users at gluster.org >>>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > _______________________________________________ >>>>>>>>>>>>>>>>> > > > Gluster-users mailing list >>>>>>>>>>>>>>>>> > > > Gluster-users at gluster.org >>>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > -- >>>>>>>>>>>>>>>>> > > Thanks, >>>>>>>>>>>>>>>>> > > Anuradha. >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Anuradha. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
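On the replace-brick question raised earlier in the thread: a hedged sketch of the 'commit force' form, which as far as I know is the only replace-brick mode still supported on 3.7/3.8. The hostname and the new brick path are placeholders, and the new path must differ from the old one; after the commit, pending changelogs are set on the surviving copies, so the new brick gets rebuilt by the regular index-based self-heal rather than a 'heal full' crawl.

gluster volume replace-brick glustershard \
  192.168.71.12:/gluster2/brick3/1 192.168.71.12:/gluster2/brick3-new/1 \
  commit force
gluster volume heal glustershard info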