No, sorry, it's working fine. I may have missed a step earlier, which is
probably why I saw that problem. /.shard is also healing fine now.
Let me know if it works for you.
-Krutika
On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <kdhananj at redhat.com>
wrote:
> OK I just hit the other issue too, where .shard doesn't get healed. :)
>
> Investigating as to why that is the case. Give me some time.
>
> -Krutika
>
> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <kdhananj at redhat.com>
> wrote:
>
>> Just figured out that the steps Anuradha provided won't work if granular
>> entry heal is on.
>> So when you bring down a brick and create fake2 under / of the volume, the
>> granular entry heal feature causes the self-heal daemon to remember only
>> the fact that 'fake2' needs to be recreated on the offline brick (because
>> changelogs are granular).
>>
>> In this case, we would need to indicate to the self-heal daemon that the
>> entire directory tree from '/' needs to be repaired on the brick that
>> contains no data.
>>
>> To fix this, I did the following (for users who use granular entry
>> self-healing):
>>
>> 1. Kill the last brick process in the replica (/bricks/3)
>>
>> 2. [root at server-3 ~]# rm -rf /bricks/3
>>
>> 3. [root at server-3 ~]# mkdir /bricks/3
>>
>> 4. Create a new dir on the mount point:
>> [root at client-1 ~]# mkdir /mnt/fake
>>
>> 5. Set some fake xattr on the root of the volume, and not on the 'fake'
>> directory itself.
>> [root at client-1 ~]# setfattr -n "user.some-name" -v "some-value" /mnt
>>
>> 6. Make sure there's no I/O happening on your volume.
>>
>> 7. Check the pending xattrs on the brick directories of the two good
>> copies (on bricks 1 and 2); you should see the same value for the
>> trusted.afr.rep-client-2 xattr on both bricks in the output below.
>> (Note that the client-<num> xattr key will have the same last digit as
>> the index of the brick that is down, counting from 0. So if the first
>> brick is the one that is down, it would read trusted.afr.*-client-0; if
>> the second brick is the one that is empty and down, it would read
>> trusted.afr.*-client-1, and so on.)
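>>
>> (For instance, with the example volume 'rep' used in these steps, you can
>> confirm which index corresponds to the empty brick by listing the bricks
>> in order -- the hostnames and paths below are just this example's:
>>
>> [root at client-1 ~]# gluster volume info rep | grep -E '^Brick[0-9]'
>> Brick1: server-1:/bricks/1    -> trusted.afr.rep-client-0
>> Brick2: server-2:/bricks/2    -> trusted.afr.rep-client-1
>> Brick3: server-3:/bricks/3    -> trusted.afr.rep-client-2  (the down, empty one)
>>
>> The arrows are only annotations, not part of the command output.)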
>>
>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1
>> # file: 1
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.rep-client-2=0x000000000000000100000001
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2
>> # file: 2
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.rep-client-2=0x000000000000000100000001
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> 8. Flip the 8th digit in the trusted.afr.<VOLNAME>-client-2 xattr to a 1.
>>
>> [root at server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/1
>> [root at server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/2
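>>
>> (A quick note on why it's the 8th digit, assuming the usual AFR changelog
>> layout of three big-endian 32-bit counters -- data, metadata, entry -- in
>> that order:
>>
>> 0x 00000001 00000001 00000001
>>      data   metadata  entry     <- pending-heal counts against client-2
>>
>> You can re-read the value before and after the change with, e.g.:
>> [root at server-1 ~]# getfattr -n trusted.afr.rep-client-2 -e hex /bricks/1 )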
>>
>> 9. Get the xattrs again and check that they are now set properly:
>>
>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1
>> # file: 1
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.rep-client-2=0x000000010000000100000001
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2
>> # file: 2
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.rep-client-2=0x000000010000000100000001
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> 10. Force-start the volume.
>>
>> [root at server-1 ~]# gluster volume start rep force
>> volume start: rep: success
>>
>> 11. Monitor the heal-info output ('gluster volume heal rep info') to
>> ensure the number of entries keeps growing.
>>
>> 12. Keep monitoring as in step 11; eventually the number of entries
>> needing heal must come down to 0.
>> Also, the checksums of the files on the previously empty brick should now
>> match the copies on the other two bricks.
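>>
>> (For convenience, the whole sequence in one place -- a rough sketch only,
>> reusing the example names from above ('rep', /bricks/3, client-2);
>> substitute your own volume name, brick paths and client index:
>>
>> # on the node whose brick is being rebuilt (server-3 in this example), steps 1-3
>> kill -15 <brick-pid>
>> rm -rf /bricks/3 && mkdir /bricks/3
>> # from a client mount, steps 4-5
>> mkdir /mnt/fake
>> setfattr -n "user.some-name" -v "some-value" /mnt
>> # on server-1 and server-2 respectively, step 8 (verify with getfattr, steps 7 and 9)
>> setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/1
>> setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/2
>> # restart the volume and watch the heal, steps 10-12
>> gluster volume start rep force
>> gluster volume heal rep info )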
>>
>> Could you check if the above steps work for you, in your test environment?
>>
>> You caught a nice bug in the manual steps to follow when granular
>> entry-heal is enabled and an empty brick needs heal. Thanks for reporting
>> it. :) We will fix the documentation appropriately.
>>
>> -Krutika
>>
>>
>> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay <kdhananj at redhat.com>
>> wrote:
>>
>>> Tried this.
>>>
>>> With me, only 'fake2' gets healed after I bring the 'empty' brick back
>>> up, and it stops there unless I do a 'heal-full'.
>>>
>>> Is that what you're seeing as well?
>>>
>>> -Krutika
>>>
>>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage <
>>> dgossage at carouselchecks.com> wrote:
>>>
>>>> Same issue. Brought glusterd back up on the problem node; heal count is
>>>> still stuck at 6330.
>>>>
>>>> Ran gluster v heal GLUSTER1 full
>>>>
>>>> glustershd on the problem node shows a sweep starting and finishing in
>>>> seconds. The other 2 nodes show no activity in their logs. They should
>>>> start a sweep too, shouldn't they?
>>>>
>>>> Tried starting from scratch
>>>>
>>>> kill -15 brickpid
>>>> rm -Rf /brick
>>>> mkdir -p /brick
>>>> mkdir /gsmount/fake2
>>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2
>>>>
>>>> It heals the visible dirs instantly, then stops.
>>>>
>>>> gluster v heal GLUSTER1 full
>>>>
>>>> See the sweep start on the problem node and end almost instantly. No
>>>> files added to the heal list, no files healed, no more logging.
>>>>
>>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>
>>>> Same results no matter which node you run the command on. Still stuck
>>>> with 6330 files showing as needing heal out of 19k, and the logs still
>>>> show that no heals are occurring.
>>>>
>>>> Is there a way to forcibly reset any prior heal data? Could it be stuck
>>>> on some past failed heal start?
>>>>
>>>>
>>>>
>>>>
>>>> David Gossage
>>>> Carousel Checks Inc. | System Administrator
>>>> Office 708.613.2284
>>>>
>>>> On Tue, Aug 30, 2016 at 10:03 AM, David Gossage <
>>>> dgossage at carouselchecks.com> wrote:
>>>>
>>>>> On Tue, Aug 30, 2016 at 10:02 AM, David Gossage <
>>>>> dgossage at carouselchecks.com> wrote:
>>>>>
>>>>>> updated test server to 3.8.3
>>>>>>
>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>> Options Reconfigured:
>>>>>> cluster.granular-entry-heal: on
>>>>>> performance.readdir-ahead: on
>>>>>> performance.read-ahead: off
>>>>>> nfs.disable: on
>>>>>> nfs.addr-namelookup: off
>>>>>> nfs.enable-ino32: off
>>>>>> cluster.background-self-heal-count: 16
>>>>>> cluster.self-heal-window-size: 1024
>>>>>> performance.quick-read: off
>>>>>> performance.io-cache: off
>>>>>> performance.stat-prefetch: off
>>>>>> cluster.eager-lock: enable
>>>>>> network.remote-dio: on
>>>>>> cluster.quorum-type: auto
>>>>>> cluster.server-quorum-type: server
>>>>>> storage.owner-gid: 36
>>>>>> storage.owner-uid: 36
>>>>>> server.allow-insecure: on
>>>>>> features.shard: on
>>>>>> features.shard-block-size: 64MB
>>>>>> performance.strict-o-direct: off
>>>>>> cluster.locking-scheme: granular
>>>>>>
>>>>>> kill -15 brickpid
>>>>>> rm -Rf /gluster2/brick3
>>>>>> mkdir -p /gluster2/brick3/1
>>>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>> setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>> gluster v start glustershard force
>>>>>>
>>>>>> At this point the brick process starts and all visible files, including
>>>>>> the new dir, are made on the brick.
>>>>>> A handful of shards are still in the heal statistics, but no .shard
>>>>>> directory is created and there is no increase in the shard count.
>>>>>>
>>>>>> gluster v heal glustershard
>>>>>>
>>>>>> At this point still no increase in the count, no directory made, and no
>>>>>> additional healing activity generated in the logs. Waited a few minutes
>>>>>> tailing logs to check if anything kicked in.
>>>>>>
>>>>>> gluster v heal glustershard full
>>>>>>
>>>>>> Shards are added to the list and heal commences. Logs show a full sweep
>>>>>> starting on all 3 nodes, though this time it only shows as finishing on
>>>>>> one, which looks to be the one that had its brick deleted.
>>>>>>
>>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-0
>>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-1
>>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-2
>>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-2
>>>>>>
>>>>>
>>>>> Just realized it's still healing, so that may be why the sweeps on the 2
>>>>> other bricks haven't reported as finished.
>>>>>
>>>>>>
>>>>>>
>>>>>> My hope is that later tonight a full heal will work on production.
>>>>>> Is it possible the self-heal daemon can get stale or stop listening but
>>>>>> still show as active? Would stopping and starting the self-heal daemon
>>>>>> from the gluster CLI before doing these heals be helpful?
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage <
>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>
>>>>>>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <
>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>
>>>>>>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <
>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <
>>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <
>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <
>>>>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Could you also share the glustershd logs?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'll get them when I get to work, sure.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I tried the same steps that you mentioned multiple times, but heal
>>>>>>>>>>>> is running to completion without any issues.
>>>>>>>>>>>>
>>>>>>>>>>>> It must be said that 'heal full' traverses the files and
>>>>>>>>>>>> directories in a depth-first order and does heals also in the same
>>>>>>>>>>>> order. But if it gets interrupted in the middle (say because
>>>>>>>>>>>> self-heal-daemon was either intentionally or unintentionally
>>>>>>>>>>>> brought offline and then brought back up), self-heal will only pick
>>>>>>>>>>>> up the entries that are so far marked as new entries that need
>>>>>>>>>>>> heal, which it will find in the indices/xattrop directory. What
>>>>>>>>>>>> this means is that those files and directories that were not
>>>>>>>>>>>> visited during the crawl will remain untouched and unhealed in this
>>>>>>>>>>>> second iteration of heal, unless you execute a 'heal-full' again.
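>>>>>>>>>>>>
>>>>>>>>>>>> (A rough way to see what is currently queued for background heal,
>>>>>>>>>>>> assuming the standard brick layout, is to count the entries in a
>>>>>>>>>>>> source brick's index directory, e.g. something like:
>>>>>>>>>>>> ls /gluster1/BRICK1/1/.glusterfs/indices/xattrop | wc -l
>>>>>>>>>>>> which should roughly track that brick's count from
>>>>>>>>>>>> 'gluster v heal <VOLNAME> info'.)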
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So should it start healing shards as it crawls, or not until after
>>>>>>>>>>> it crawls the entire .shard directory? At the pace it was going,
>>>>>>>>>>> that could be a week, with one node appearing in the cluster but
>>>>>>>>>>> with no shard files if anything tries to access a file on that node.
>>>>>>>>>>> From my experience the other day, telling it to heal full again did
>>>>>>>>>>> nothing regardless of the node used.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Crawl is started from '/' of the volume. Whenever self-heal detects
>>>>>>>>> during the crawl that a file or directory is present in some brick(s)
>>>>>>>>> and absent in others, it creates the file on the bricks where it is
>>>>>>>>> absent and marks the fact that the file or directory might need
>>>>>>>>> data/entry and metadata heal too (this also means that an index is
>>>>>>>>> created under .glusterfs/indices/xattrop of the src bricks). And the
>>>>>>>>> data/entry and metadata heal are picked up and done in the background
>>>>>>>>> with the help of these indices.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Looking at my 3rd node as an example, I find nearly the exact same
>>>>>>>> number of files in the xattrop dir as reported by the heal count at
>>>>>>>> the time I brought down node2 to try to alleviate the read io errors
>>>>>>>> that seemed to occur from what I was guessing were attempts to use the
>>>>>>>> node with no shards for reads.
>>>>>>>>
>>>>>>>> Also attached are the glustershd logs from the 3 nodes, along with the
>>>>>>>> test node I tried yesterday with the same results.
>>>>>>>>
>>>>>>>
>>>>>>> Looking at my own logs, I notice that a full sweep was only ever
>>>>>>> recorded in glustershd.log on the 2nd node with the missing directory.
>>>>>>> I believe I should have found a sweep begun on every node, correct?
>>>>>>>
>>>>>>> On my test dev when it did work I do see that
>>>>>>>
>>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-0
>>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-1
>>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-2
>>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-2
>>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-1
>>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-0
>>>>>>>
>>>>>>> While looking at the past few days on the 3 prod nodes, I only found
>>>>>>> the following, and only on my 2nd node:
>>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> My suspicion is that this is what happened on your setup. Could
>>>>>>>>>>>> you confirm if that was the case?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The brick was brought online with force start, then a full heal
>>>>>>>>>>> launched. Hours later, after it became evident that it was not
>>>>>>>>>>> adding new files to heal, I did try restarting the self-heal daemon
>>>>>>>>>>> and relaunching full heal again. But this was after the heal had
>>>>>>>>>>> basically already failed to work as intended.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> OK. How did you figure it was not adding any new files? I need to
>>>>>>>>>> know what places you were monitoring to come to this conclusion.
>>>>>>>>>>
>>>>>>>>>> -Krutika
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> As for those logs, I did manage to do something that caused these
>>>>>>>>>>>> warning messages you shared earlier to appear in my client and
>>>>>>>>>>>> server logs.
>>>>>>>>>>>> Although these logs are annoying and a bit scary too, they didn't
>>>>>>>>>>>> do any harm to the data in my volume. Why they appear just after a
>>>>>>>>>>>> brick is replaced and under no other circumstances is something I'm
>>>>>>>>>>>> still investigating.
>>>>>>>>>>>>
>>>>>>>>>>>> But for the future, it would be good to follow the steps Anuradha
>>>>>>>>>>>> gave, as that would allow self-heal to at least detect that it has
>>>>>>>>>>>> some repairing to do whenever it is restarted, whether
>>>>>>>>>>>> intentionally or otherwise.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I followed those steps as described on my test box and ended up
>>>>>>>>>>> with the exact same outcome: shards being added at an agonizingly
>>>>>>>>>>> slow pace and no creation of the .shard directory or heals on the
>>>>>>>>>>> shard directory. Directories visible from the mount healed quickly.
>>>>>>>>>>> This was with one VM, so it has only 800 shards as well. After hours
>>>>>>>>>>> at work it had added a total of 33 shards to be healed. I sent those
>>>>>>>>>>> logs yesterday as well, though not the glustershd logs.
>>>>>>>>>>>
>>>>>>>>>>> Does the replace-brick command copy files in the same manner? For
>>>>>>>>>>> these purposes I am contemplating just skipping the heal route.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -Krutika
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <
>>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Attached brick and client logs from the test machine where the
>>>>>>>>>>>>> same behavior occurred; not sure if anything new is there. It's
>>>>>>>>>>>>> still on 3.8.2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>>> cluster.locking-scheme: granular
>>>>>>>>>>>>> performance.strict-o-direct: off
>>>>>>>>>>>>> features.shard-block-size: 64MB
>>>>>>>>>>>>> features.shard: on
>>>>>>>>>>>>> server.allow-insecure: on
>>>>>>>>>>>>> storage.owner-uid: 36
>>>>>>>>>>>>> storage.owner-gid: 36
>>>>>>>>>>>>> cluster.server-quorum-type: server
>>>>>>>>>>>>> cluster.quorum-type: auto
>>>>>>>>>>>>> network.remote-dio: on
>>>>>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>>>>> performance.io-cache: off
>>>>>>>>>>>>> performance.quick-read: off
>>>>>>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>>>>>>> nfs.enable-ino32: off
>>>>>>>>>>>>> nfs.addr-namelookup: off
>>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <
>>>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <
>>>>>>>>>>>>>> atalur at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ----- Original
Message -----
>>>>>>>>>>>>>>> > From:
"David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>>>>>>>> > To:
"Anuradha Talur" <atalur at redhat.com>
>>>>>>>>>>>>>>> > Cc:
"gluster-users at gluster.org List" <
>>>>>>>>>>>>>>> Gluster-users at
gluster.org>, "Krutika Dhananjay" <
>>>>>>>>>>>>>>> kdhananj at
redhat.com>
>>>>>>>>>>>>>>> > Sent: Monday,
August 29, 2016 5:12:42 PM
>>>>>>>>>>>>>>> > Subject: Re:
[Gluster-users] 3.8.3 Shards Healing Glacier
>>>>>>>>>>>>>>> Slow
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On Mon, Aug
29, 2016 at 5:39 AM, Anuradha Talur <
>>>>>>>>>>>>>>> atalur at
redhat.com> wrote:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > > Response
inline.
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > -----
Original Message -----
>>>>>>>>>>>>>>> > > >
From: "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>>>>>>>>> > > > To:
"David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>>>>>>>> > > > Cc:
"gluster-users at gluster.org List" <
>>>>>>>>>>>>>>> Gluster-users at
gluster.org>
>>>>>>>>>>>>>>> > > >
Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>>>>>>>>>> > > >
Subject: Re: [Gluster-users] 3.8.3 Shards Healing
>>>>>>>>>>>>>>> Glacier Slow
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
Could you attach both client and brick logs? Meanwhile
>>>>>>>>>>>>>>> I will try these
>>>>>>>>>>>>>>> > > steps
>>>>>>>>>>>>>>> > > > out
on my machines and see if it is easily recreatable.
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
-Krutika
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > On
Mon, Aug 29, 2016 at 2:31 PM, David Gossage <
>>>>>>>>>>>>>>> > > dgossage
at carouselchecks.com
>>>>>>>>>>>>>>> > > > >
wrote:
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
Centos 7 Gluster 3.8.3
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>> > > >
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>> > > >
Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>> > > >
Options Reconfigured:
>>>>>>>>>>>>>>> > > >
cluster.data-self-heal-algorithm: full
>>>>>>>>>>>>>>> > > >
cluster.self-heal-daemon: on
>>>>>>>>>>>>>>> > > >
cluster.locking-scheme: granular
>>>>>>>>>>>>>>> > > >
features.shard-block-size: 64MB
>>>>>>>>>>>>>>> > > >
features.shard: on
>>>>>>>>>>>>>>> > > >
performance.readdir-ahead: on
>>>>>>>>>>>>>>> > > >
storage.owner-uid: 36
>>>>>>>>>>>>>>> > > >
storage.owner-gid: 36
>>>>>>>>>>>>>>> > > >
performance.quick-read: off
>>>>>>>>>>>>>>> > > >
performance.read-ahead: off
>>>>>>>>>>>>>>> > > >
performance.io-cache: off
>>>>>>>>>>>>>>> > > >
performance.stat-prefetch: on
>>>>>>>>>>>>>>> > > >
cluster.eager-lock: enable
>>>>>>>>>>>>>>> > > >
network.remote-dio: enable
>>>>>>>>>>>>>>> > > >
cluster.quorum-type: auto
>>>>>>>>>>>>>>> > > >
cluster.server-quorum-type: server
>>>>>>>>>>>>>>> > > >
server.allow-insecure: on
>>>>>>>>>>>>>>> > > >
cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>>> > > >
cluster.background-self-heal-count: 16
>>>>>>>>>>>>>>> > > >
performance.strict-write-ordering: off
>>>>>>>>>>>>>>> > > >
nfs.disable: on
>>>>>>>>>>>>>>> > > >
nfs.addr-namelookup: off
>>>>>>>>>>>>>>> > > >
nfs.enable-ino32: off
>>>>>>>>>>>>>>> > > >
cluster.granular-entry-heal: on
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no issues.
>>>>>>>>>>>>>>> > > > Following steps detailed in previous recommendations, began
>>>>>>>>>>>>>>> > > > process of replacing and healing bricks one node at a time.
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > 1) kill pid of brick
>>>>>>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>>>>>>>>>> > > > 3) recreate directory of brick
>>>>>>>>>>>>>>> > > > 4) gluster volume start <> force
>>>>>>>>>>>>>>> > > > 5) gluster volume heal <> full
>>>>>>>>>>>>>>> > > Hi,
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > I'd suggest that full heal is not used. There are a few
>>>>>>>>>>>>>>> > > bugs in full heal. Better safe than sorry ;)
>>>>>>>>>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > Currently I brought the node down by systemctl stop glusterd
>>>>>>>>>>>>>>> > as I was getting sporadic io issues and a few VM's paused so
>>>>>>>>>>>>>>> > hoping that will help. I may wait to do this till around 4PM
>>>>>>>>>>>>>>> > when most work is done in case it shoots load up.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > > 1) kill pid of brick
>>>>>>>>>>>>>>> > > 2) do the configuring of brick that you need
>>>>>>>>>>>>>>> > > 3) recreate brick dir
>>>>>>>>>>>>>>> > > 4) while the brick is still down, from the mount point:
>>>>>>>>>>>>>>> > >    a) create a dummy non existent dir under / of mount.
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > so if node 2 is the down brick, pick node for example 3 and
>>>>>>>>>>>>>>> > make a test dir under its brick directory that doesn't exist
>>>>>>>>>>>>>>> > on 2, or should I be doing this over a gluster mount?
>>>>>>>>>>>>>>> You should be doing this over the gluster mount.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > >    b) set a non existent extended attribute on / of mount.
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Could you give me an example of an attribute to set? I've read
>>>>>>>>>>>>>>> > a tad on this, and looked up attributes but haven't set any
>>>>>>>>>>>>>>> > yet myself.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
>>>>>>>>>>>>>>> > > Doing these steps will ensure that heal happens only from
>>>>>>>>>>>>>>> > > updated brick to down brick.
>>>>>>>>>>>>>>> > > 5) gluster v start <> force
>>>>>>>>>>>>>>> > > 6) gluster v heal <>
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Will it matter if somewhere in gluster the full heal command
>>>>>>>>>>>>>>> > was run the other day? Not sure if it eventually stops or
>>>>>>>>>>>>>>> > times out.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> full heal will stop once the crawl is done. So if you want to
>>>>>>>>>>>>>>> trigger heal again, run gluster v heal <>. Actually even brick
>>>>>>>>>>>>>>> up or volume start force should trigger the heal.
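>>>>>>>>>>>>>>> (With the volume from this thread, that would be something along
>>>>>>>>>>>>>>> the lines of 'gluster v heal GLUSTER1' or
>>>>>>>>>>>>>>> 'gluster v start GLUSTER1 force'.)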
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Did this on the test bed today. It's one server with 3 bricks on
>>>>>>>>>>>>>> the same machine, so take that for what it's worth. Also it still
>>>>>>>>>>>>>> runs 3.8.2. Maybe I'll update and re-run the test.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> killed brick
>>>>>>>>>>>>>> deleted brick dir
>>>>>>>>>>>>>> recreated brick dir
>>>>>>>>>>>>>> created fake dir on gluster mount
>>>>>>>>>>>>>> set suggested fake attribute on it
>>>>>>>>>>>>>> ran volume start <> force
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looked at the files it said needed healing, and it was just the 8
>>>>>>>>>>>>>> shards that were modified in the few minutes I ran through the steps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Gave it a few minutes and it stayed the same.
>>>>>>>>>>>>>> Ran gluster volume heal <>.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It healed all the directories and files you can see over the
>>>>>>>>>>>>>> mount, including fakedir.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Same issue for shards though. It adds more shards to heal at a
>>>>>>>>>>>>>> glacier pace. Slight jump in speed if I stat every file and dir in
>>>>>>>>>>>>>> the running VM, but not all shards.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33 out of 800,
>>>>>>>>>>>>>> and probably won't finish adding for a few days at the rate it goes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > > 1st node worked as expected, took 12 hours to heal 1TB of
>>>>>>>>>>>>>>> > > > data. Load was a little heavy but nothing shocking.
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > About an hour after node 1 finished I began the same
>>>>>>>>>>>>>>> > > > process on node2. The heal process kicked in as before and
>>>>>>>>>>>>>>> > > > the files in directories visible from the mount and
>>>>>>>>>>>>>>> > > > .glusterfs healed in a short time. Then it began the crawl
>>>>>>>>>>>>>>> > > > of .shard, adding those files to the heal count, at which
>>>>>>>>>>>>>>> > > > point the entire process basically ground to a halt. After
>>>>>>>>>>>>>>> > > > 48 hours, out of 19k shards it has added 5900 to the heal
>>>>>>>>>>>>>>> > > > list.
>>>>>>>>>>>>>>> > > > Load on all 3 machines is negligible. It was suggested to
>>>>>>>>>>>>>>> > > > change cluster.data-self-heal-algorithm to full and restart
>>>>>>>>>>>>>>> > > > the volume, which I did. No effect. Tried relaunching heal,
>>>>>>>>>>>>>>> > > > no effect, regardless of the node picked. I started each VM
>>>>>>>>>>>>>>> > > > and performed a stat of all files from within it, or a full
>>>>>>>>>>>>>>> > > > virus scan, and that seemed to cause short small spikes in
>>>>>>>>>>>>>>> > > > shards added, but not by much. Logs are showing no real
>>>>>>>>>>>>>>> > > > messages indicating anything is going on. I get hits in the
>>>>>>>>>>>>>>> > > > brick log on occasion of null lookups, making me think it's
>>>>>>>>>>>>>>> > > > not really crawling the shards directory but waiting for a
>>>>>>>>>>>>>>> > > > shard lookup to add it. I'll get the following in the brick
>>>>>>>>>>>>>>> > > > log, but not constantly, and sometimes multiple times for
>>>>>>>>>>>>>>> > > > the same shard.
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no resolution type for (null) (LOOKUP)
>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: 12591783: LOOKUP (null) (00000000-0000-0000-0000-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) ==> (Invalid argument) [Invalid argument]
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > This one repeated about 30 times in a row, then nothing
>>>>>>>>>>>>>>> > > > for 10 minutes, then one hit for one different shard by
>>>>>>>>>>>>>>> > > > itself.
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > How can I determine if heal is actually running? How can I
>>>>>>>>>>>>>>> > > > kill it or force a restart? Does the node I start it from
>>>>>>>>>>>>>>> > > > determine which directory gets crawled to determine heals?
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > David Gossage
>>>>>>>>>>>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>>>>>>>>>>>> > > > Office 708.613.2284
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > _______________________________________________
>>>>>>>>>>>>>>> > > > Gluster-users mailing list
>>>>>>>>>>>>>>> > > > Gluster-users at gluster.org
>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > --
>>>>>>>>>>>>>>> > > Thanks,
>>>>>>>>>>>>>>> > > Anuradha.
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Anuradha.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>