On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:> > > On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhananj at redhat.com> > wrote: > >> >> >> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage < >> dgossage at carouselchecks.com> wrote: >> >>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <kdhananj at redhat.com> >>> wrote: >>> >>>> Could you also share the glustershd logs? >>>> >>> >>> I'll get them when I get to work sure >>> >> >>> >>>> >>>> I tried the same steps that you mentioned multiple times, but heal is >>>> running to completion without any issues. >>>> >>>> It must be said that 'heal full' traverses the files and directories in >>>> a depth-first order and does heals also in the same order. But if it gets >>>> interrupted in the middle (say because self-heal-daemon was either >>>> intentionally or unintentionally brought offline and then brought back up), >>>> self-heal will only pick up the entries that are so far marked as >>>> new-entries that need heal which it will find in indices/xattrop directory. >>>> What this means is that those files and directories that were not visited >>>> during the crawl, will remain untouched and unhealed in this second >>>> iteration of heal, unless you execute a 'heal-full' again. >>>> >>> >>> So should it start healing shards as it crawls or not until after it >>> crawls the entire .shard directory? At the pace it was going that could be >>> a week with one node appearing in the cluster but with no shard files if >>> anything tries to access a file on that node. From my experience other day >>> telling it to heal full again did nothing regardless of node used. >>> >> > Crawl is started from '/' of the volume. Whenever self-heal detects during > the crawl that a file or directory is present in some brick(s) and absent > in others, it creates the file on the bricks where it is absent and marks > the fact that the file or directory might need data/entry and metadata heal > too (this also means that an index is created under > .glusterfs/indices/xattrop of the src bricks). And the data/entry and > metadata heal are picked up and done in >the background with the help of these indices.>Looking at my 3rd node as example i find nearly an exact same number of files in xattrop dir as reported by heal count at time I brought down node2 to try and alleviate read io errors that seemed to occur from what I was guessing as attempts to use the node with no shards for reads. Also attached are the glustershd logs from the 3 nodes, along with the test node i tried yesterday with same results.> > >>> >>>> My suspicion is that this is what happened on your setup. Could you >>>> confirm if that was the case? >>>> >>> >>> Brick was brought online with force start then a full heal launched. >>> Hours later after it became evident that it was not adding new files to >>> heal I did try restarting self-heal daemon and relaunching full heal again. >>> But this was after the heal had basically already failed to work as >>> intended. >>> >> >> OK. How did you figure it was not adding any new files? I need to know >> what places you were monitoring to come to this conclusion. >> >> -Krutika >> >> >>> >>> >>>> As for those logs, I did manager to do something that caused these >>>> warning messages you shared earlier to appear in my client and server logs. >>>> Although these logs are annoying and a bit scary too, they didn't do >>>> any harm to the data in my volume. 
Why they appear just after a brick is >>>> replaced and under no other circumstances is something I'm still >>>> investigating. >>>> >>>> But for future, it would be good to follow the steps Anuradha gave as >>>> that would allow self-heal to at least detect that it has some repairing to >>>> do whenever it is restarted whether intentionally or otherwise. >>>> >>> >>> I followed those steps as described on my test box and ended up with >>> exact same outcome of adding shards at an agonizing slow pace and no >>> creation of .shard directory or heals on shard directory. Directories >>> visible from mount healed quickly. This was with one VM so it has only 800 >>> shards as well. After hours at work it had added a total of 33 shards to >>> be healed. I sent those logs yesterday as well though not the glustershd. >>> >>> Does replace-brick command copy files in same manner? For these >>> purposes I am contemplating just skipping the heal route. >>> >>> >>>> -Krutika >>>> >>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage < >>>> dgossage at carouselchecks.com> wrote: >>>> >>>>> attached brick and client logs from test machine where same behavior >>>>> occurred not sure if anything new is there. its still on 3.8.2 >>>>> >>>>> Number of Bricks: 1 x 3 = 3 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>> Options Reconfigured: >>>>> cluster.locking-scheme: granular >>>>> performance.strict-o-direct: off >>>>> features.shard-block-size: 64MB >>>>> features.shard: on >>>>> server.allow-insecure: on >>>>> storage.owner-uid: 36 >>>>> storage.owner-gid: 36 >>>>> cluster.server-quorum-type: server >>>>> cluster.quorum-type: auto >>>>> network.remote-dio: on >>>>> cluster.eager-lock: enable >>>>> performance.stat-prefetch: off >>>>> performance.io-cache: off >>>>> performance.quick-read: off >>>>> cluster.self-heal-window-size: 1024 >>>>> cluster.background-self-heal-count: 16 >>>>> nfs.enable-ino32: off >>>>> nfs.addr-namelookup: off >>>>> nfs.disable: on >>>>> performance.read-ahead: off >>>>> performance.readdir-ahead: on >>>>> cluster.granular-entry-heal: on >>>>> >>>>> >>>>> >>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage < >>>>> dgossage at carouselchecks.com> wrote: >>>>> >>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com> >>>>>>> > To: "Anuradha Talur" <atalur at redhat.com> >>>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>, >>>>>>> "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM >>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow >>>>>>> > >>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <atalur at redhat.com> >>>>>>> wrote: >>>>>>> > >>>>>>> > > Response inline. >>>>>>> > > >>>>>>> > > ----- Original Message ----- >>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com> >>>>>>> > > > Cc: "gluster-users at gluster.org List" < >>>>>>> Gluster-users at gluster.org> >>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM >>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow >>>>>>> > > > >>>>>>> > > > Could you attach both client and brick logs? 
Meanwhile I will >>>>>>> try these >>>>>>> > > steps >>>>>>> > > > out on my machines and see if it is easily recreatable. >>>>>>> > > > >>>>>>> > > > -Krutika >>>>>>> > > > >>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage < >>>>>>> > > dgossage at carouselchecks.com >>>>>>> > > > > wrote: >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > Centos 7 Gluster 3.8.3 >>>>>>> > > > >>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>>>>>> > > > Options Reconfigured: >>>>>>> > > > cluster.data-self-heal-algorithm: full >>>>>>> > > > cluster.self-heal-daemon: on >>>>>>> > > > cluster.locking-scheme: granular >>>>>>> > > > features.shard-block-size: 64MB >>>>>>> > > > features.shard: on >>>>>>> > > > performance.readdir-ahead: on >>>>>>> > > > storage.owner-uid: 36 >>>>>>> > > > storage.owner-gid: 36 >>>>>>> > > > performance.quick-read: off >>>>>>> > > > performance.read-ahead: off >>>>>>> > > > performance.io-cache: off >>>>>>> > > > performance.stat-prefetch: on >>>>>>> > > > cluster.eager-lock: enable >>>>>>> > > > network.remote-dio: enable >>>>>>> > > > cluster.quorum-type: auto >>>>>>> > > > cluster.server-quorum-type: server >>>>>>> > > > server.allow-insecure: on >>>>>>> > > > cluster.self-heal-window-size: 1024 >>>>>>> > > > cluster.background-self-heal-count: 16 >>>>>>> > > > performance.strict-write-ordering: off >>>>>>> > > > nfs.disable: on >>>>>>> > > > nfs.addr-namelookup: off >>>>>>> > > > nfs.enable-ino32: off >>>>>>> > > > cluster.granular-entry-heal: on >>>>>>> > > > >>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no issues. >>>>>>> > > > Following steps detailed in previous recommendations began >>>>>>> proces of >>>>>>> > > > replacing and healngbricks one node at a time. >>>>>>> > > > >>>>>>> > > > 1) kill pid of brick >>>>>>> > > > 2) reconfigure brick from raid6 to raid10 >>>>>>> > > > 3) recreate directory of brick >>>>>>> > > > 4) gluster volume start <> force >>>>>>> > > > 5) gluster volume heal <> full >>>>>>> > > Hi, >>>>>>> > > >>>>>>> > > I'd suggest that full heal is not used. There are a few bugs in >>>>>>> full heal. >>>>>>> > > Better safe than sorry ;) >>>>>>> > > Instead I'd suggest the following steps: >>>>>>> > > >>>>>>> > > Currently I brought the node down by systemctl stop glusterd as >>>>>>> I was >>>>>>> > getting sporadic io issues and a few VM's paused so hoping that >>>>>>> will help. >>>>>>> > I may wait to do this till around 4PM when most work is done in >>>>>>> case it >>>>>>> > shoots load up. >>>>>>> > >>>>>>> > >>>>>>> > > 1) kill pid of brick >>>>>>> > > 2) to configuring of brick that you need >>>>>>> > > 3) recreate brick dir >>>>>>> > > 4) while the brick is still down, from the mount point: >>>>>>> > > a) create a dummy non existent dir under / of mount. >>>>>>> > > >>>>>>> > >>>>>>> > so if noee 2 is down brick, pick node for example 3 and make a >>>>>>> test dir >>>>>>> > under its brick directory that doesnt exist on 2 or should I be >>>>>>> dong this >>>>>>> > over a gluster mount? >>>>>>> You should be doing this over gluster mount. >>>>>>> > >>>>>>> > > b) set a non existent extended attribute on / of mount. >>>>>>> > > >>>>>>> > >>>>>>> > Could you give me an example of an attribute to set? I've read a >>>>>>> tad on >>>>>>> > this, and looked up attributes but haven't set any yet myself. >>>>>>> > >>>>>>> Sure. 
setfattr -n "user.some-name" -v "some-value" <path-to-mount> >>>>>>> > Doing these steps will ensure that heal happens only from updated >>>>>>> brick to >>>>>>> > > down brick. >>>>>>> > > 5) gluster v start <> force >>>>>>> > > 6) gluster v heal <> >>>>>>> > > >>>>>>> > >>>>>>> > Will it matter if somewhere in gluster the full heal command was >>>>>>> run other >>>>>>> > day? Not sure if it eventually stops or times out. >>>>>>> > >>>>>>> full heal will stop once the crawl is done. So if you want to >>>>>>> trigger heal again, >>>>>>> run gluster v heal <>. Actually even brick up or volume start force >>>>>>> should >>>>>>> trigger the heal. >>>>>>> >>>>>> >>>>>> Did this on test bed today. its one server with 3 bricks on same >>>>>> machine so take that for what its worth. also it still runs 3.8.2. Maybe >>>>>> ill update and re-run test. >>>>>> >>>>>> killed brick >>>>>> deleted brick dir >>>>>> recreated brick dir >>>>>> created fake dir on gluster mount >>>>>> set suggested fake attribute on it >>>>>> ran volume start <> force >>>>>> >>>>>> looked at files it said needed healing and it was just 8 shards that >>>>>> were modified for few minutes I ran through steps >>>>>> >>>>>> gave it few minutes and it stayed same >>>>>> ran gluster volume <> heal >>>>>> >>>>>> it healed all the directories and files you can see over mount >>>>>> including fakedir. >>>>>> >>>>>> same issue for shards though. it adds more shards to heal at glacier >>>>>> pace. slight jump in speed if I stat every file and dir in VM running but >>>>>> not all shards. >>>>>> >>>>>> It started with 8 shards to heal and is now only at 33 out of 800 and >>>>>> probably wont finish adding for few days at rate it goes. >>>>>> >>>>>> >>>>>> >>>>>>> > > >>>>>>> > > > 1st node worked as expected took 12 hours to heal 1TB data. >>>>>>> Load was >>>>>>> > > little >>>>>>> > > > heavy but nothing shocking. >>>>>>> > > > >>>>>>> > > > About an hour after node 1 finished I began same process on >>>>>>> node2. Heal >>>>>>> > > > proces kicked in as before and the files in directories >>>>>>> visible from >>>>>>> > > mount >>>>>>> > > > and .glusterfs healed in short time. Then it began crawl of >>>>>>> .shard adding >>>>>>> > > > those files to heal count at which point the entire proces >>>>>>> ground to a >>>>>>> > > halt >>>>>>> > > > basically. After 48 hours out of 19k shards it has added 5900 >>>>>>> to heal >>>>>>> > > list. >>>>>>> > > > Load on all 3 machnes is negligible. It was suggested to >>>>>>> change this >>>>>>> > > value >>>>>>> > > > to full cluster.data-self-heal-algorithm and restart volume >>>>>>> which I >>>>>>> > > did. No >>>>>>> > > > efffect. Tried relaunching heal no effect, despite any node >>>>>>> picked. I >>>>>>> > > > started each VM and performed a stat of all files from within >>>>>>> it, or a >>>>>>> > > full >>>>>>> > > > virus scan and that seemed to cause short small spikes in >>>>>>> shards added, >>>>>>> > > but >>>>>>> > > > not by much. Logs are showing no real messages indicating >>>>>>> anything is >>>>>>> > > going >>>>>>> > > > on. I get hits to brick log on occasion of null lookups making >>>>>>> me think >>>>>>> > > its >>>>>>> > > > not really crawling shards directory but waiting for a shard >>>>>>> lookup to >>>>>>> > > add >>>>>>> > > > it. I'll get following in brick log but not constant and >>>>>>> sometime >>>>>>> > > multiple >>>>>>> > > > for same shard. 
>>>>>>> > > > >>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] >>>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no >>>>>>> resolution >>>>>>> > > type >>>>>>> > > > for (null) (LOOKUP) >>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] >>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: >>>>>>> 12591783: >>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00 >>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) ==> >>>>>>> (Invalid >>>>>>> > > > argument) [Invalid argument] >>>>>>> > > > >>>>>>> > > > This one repeated about 30 times in row then nothing for 10 >>>>>>> minutes then >>>>>>> > > one >>>>>>> > > > hit for one different shard by itself. >>>>>>> > > > >>>>>>> > > > How can I determine if Heal is actually running? How can I >>>>>>> kill it or >>>>>>> > > force >>>>>>> > > > restart? Does node I start it from determine which directory >>>>>>> gets >>>>>>> > > crawled to >>>>>>> > > > determine heals? >>>>>>> > > > >>>>>>> > > > David Gossage >>>>>>> > > > Carousel Checks Inc. | System Administrator >>>>>>> > > > Office 708.613.2284 >>>>>>> > > > >>>>>>> > > > _______________________________________________ >>>>>>> > > > Gluster-users mailing list >>>>>>> > > > Gluster-users at gluster.org >>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > _______________________________________________ >>>>>>> > > > Gluster-users mailing list >>>>>>> > > > Gluster-users at gluster.org >>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>> > > >>>>>>> > > -- >>>>>>> > > Thanks, >>>>>>> > > Anuradha. >>>>>>> > > >>>>>>> > >>>>>>> >>>>>>> -- >>>>>>> Thanks, >>>>>>> Anuradha. >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd-node1 Type: application/octet-stream Size: 322716 bytes Desc: not available URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0002.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd-node2.gz Type: application/x-gzip Size: 645489 bytes Desc: not available URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0002.gz> -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd-node3.gz Type: application/x-gzip Size: 296635 bytes Desc: not available URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0003.gz> -------------- next part -------------- A non-text attachment was scrubbed... Name: glustershd-testnode Type: application/octet-stream Size: 20910 bytes Desc: not available URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0003.obj>
On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <dgossage at carouselchecks.com> wrote:> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhananj at redhat.com> > wrote: > >> >> >> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhananj at redhat.com> >> wrote: >> >>> >>> >>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage < >>> dgossage at carouselchecks.com> wrote: >>> >>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <kdhananj at redhat.com >>>> > wrote: >>>> >>>>> Could you also share the glustershd logs? >>>>> >>>> >>>> I'll get them when I get to work sure >>>> >>> >>>> >>>>> >>>>> I tried the same steps that you mentioned multiple times, but heal is >>>>> running to completion without any issues. >>>>> >>>>> It must be said that 'heal full' traverses the files and directories >>>>> in a depth-first order and does heals also in the same order. But if it >>>>> gets interrupted in the middle (say because self-heal-daemon was either >>>>> intentionally or unintentionally brought offline and then brought back up), >>>>> self-heal will only pick up the entries that are so far marked as >>>>> new-entries that need heal which it will find in indices/xattrop directory. >>>>> What this means is that those files and directories that were not visited >>>>> during the crawl, will remain untouched and unhealed in this second >>>>> iteration of heal, unless you execute a 'heal-full' again. >>>>> >>>> >>>> So should it start healing shards as it crawls or not until after it >>>> crawls the entire .shard directory? At the pace it was going that could be >>>> a week with one node appearing in the cluster but with no shard files if >>>> anything tries to access a file on that node. From my experience other day >>>> telling it to heal full again did nothing regardless of node used. >>>> >>> >> Crawl is started from '/' of the volume. Whenever self-heal detects >> during the crawl that a file or directory is present in some brick(s) and >> absent in others, it creates the file on the bricks where it is absent and >> marks the fact that the file or directory might need data/entry and >> metadata heal too (this also means that an index is created under >> .glusterfs/indices/xattrop of the src bricks). And the data/entry and >> metadata heal are picked up and done in >> > the background with the help of these indices. >> > > Looking at my 3rd node as example i find nearly an exact same number of > files in xattrop dir as reported by heal count at time I brought down node2 > to try and alleviate read io errors that seemed to occur from what I was > guessing as attempts to use the node with no shards for reads. > > Also attached are the glustershd logs from the 3 nodes, along with the > test node i tried yesterday with same results. >Is it possible you just need to spam the heal full command? Wait for a certain amount of time for it to time out? The test server that I did yesterday that stopped at listing 33 shards then healing none of them stlll had 33 shards in list this morning. I issued another heal full and it jumped up and found the missing shards. On the one hand its reassuring that if I just spam the command enough eventually it will heal. It's also disconcerting that if I spam the command enough times the heal will start. I can't test if same behavior would occur on live node as I expect if it did kick in heals I'd have 12 hours of high load during copy again perhaps. But I can test if it happens after last shift. 
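For reference, the heal tracking I have been basing this on is roughly along these lines (volume name and brick path are the ones from the volume info quoted earlier, so treat the exact paths as examples; the xattrop directory also holds a base xattrop-<uuid> entry, so that count is approximate):

# entries the self-heal daemon still considers pending
gluster volume heal GLUSTER1 info
gluster volume heal GLUSTER1 statistics heal-count

# pending-heal indices on a source brick
ls /gluster1/BRICK1/1/.glusterfs/indices/xattrop | wc -l

# re-trigger the crawl
gluster volume heal GLUSTER1 full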
Though I lost track of how many times I tried restarting heal full over Saturday and Sunday when it looked to be doing nothing from all heal tracking commands documented.>> >>>> >>>>> My suspicion is that this is what happened on your setup. Could you >>>>> confirm if that was the case? >>>>> >>>> >>>> Brick was brought online with force start then a full heal launched. >>>> Hours later after it became evident that it was not adding new files to >>>> heal I did try restarting self-heal daemon and relaunching full heal again. >>>> But this was after the heal had basically already failed to work as >>>> intended. >>>> >>> >>> OK. How did you figure it was not adding any new files? I need to know >>> what places you were monitoring to come to this conclusion. >>> >>> -Krutika >>> >>> >>>> >>>> >>>>> As for those logs, I did manager to do something that caused these >>>>> warning messages you shared earlier to appear in my client and server logs. >>>>> Although these logs are annoying and a bit scary too, they didn't do >>>>> any harm to the data in my volume. Why they appear just after a brick is >>>>> replaced and under no other circumstances is something I'm still >>>>> investigating. >>>>> >>>>> But for future, it would be good to follow the steps Anuradha gave as >>>>> that would allow self-heal to at least detect that it has some repairing to >>>>> do whenever it is restarted whether intentionally or otherwise. >>>>> >>>> >>>> I followed those steps as described on my test box and ended up with >>>> exact same outcome of adding shards at an agonizing slow pace and no >>>> creation of .shard directory or heals on shard directory. Directories >>>> visible from mount healed quickly. This was with one VM so it has only 800 >>>> shards as well. After hours at work it had added a total of 33 shards to >>>> be healed. I sent those logs yesterday as well though not the glustershd. >>>> >>>> Does replace-brick command copy files in same manner? For these >>>> purposes I am contemplating just skipping the heal route. >>>> >>>> >>>>> -Krutika >>>>> >>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage < >>>>> dgossage at carouselchecks.com> wrote: >>>>> >>>>>> attached brick and client logs from test machine where same behavior >>>>>> occurred not sure if anything new is there. 
its still on 3.8.2 >>>>>> >>>>>> Number of Bricks: 1 x 3 = 3 >>>>>> Transport-type: tcp >>>>>> Bricks: >>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>> Options Reconfigured: >>>>>> cluster.locking-scheme: granular >>>>>> performance.strict-o-direct: off >>>>>> features.shard-block-size: 64MB >>>>>> features.shard: on >>>>>> server.allow-insecure: on >>>>>> storage.owner-uid: 36 >>>>>> storage.owner-gid: 36 >>>>>> cluster.server-quorum-type: server >>>>>> cluster.quorum-type: auto >>>>>> network.remote-dio: on >>>>>> cluster.eager-lock: enable >>>>>> performance.stat-prefetch: off >>>>>> performance.io-cache: off >>>>>> performance.quick-read: off >>>>>> cluster.self-heal-window-size: 1024 >>>>>> cluster.background-self-heal-count: 16 >>>>>> nfs.enable-ino32: off >>>>>> nfs.addr-namelookup: off >>>>>> nfs.disable: on >>>>>> performance.read-ahead: off >>>>>> performance.readdir-ahead: on >>>>>> cluster.granular-entry-heal: on >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage < >>>>>> dgossage at carouselchecks.com> wrote: >>>>>> >>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> ----- Original Message ----- >>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com> >>>>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>, >>>>>>>> "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM >>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow >>>>>>>> > >>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur < >>>>>>>> atalur at redhat.com> wrote: >>>>>>>> > >>>>>>>> > > Response inline. >>>>>>>> > > >>>>>>>> > > ----- Original Message ----- >>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>> > > > Cc: "gluster-users at gluster.org List" < >>>>>>>> Gluster-users at gluster.org> >>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM >>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow >>>>>>>> > > > >>>>>>>> > > > Could you attach both client and brick logs? Meanwhile I will >>>>>>>> try these >>>>>>>> > > steps >>>>>>>> > > > out on my machines and see if it is easily recreatable. 
>>>>>>>> > > > >>>>>>>> > > > -Krutika >>>>>>>> > > > >>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage < >>>>>>>> > > dgossage at carouselchecks.com >>>>>>>> > > > > wrote: >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > Centos 7 Gluster 3.8.3 >>>>>>>> > > > >>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>>>>>>> > > > Options Reconfigured: >>>>>>>> > > > cluster.data-self-heal-algorithm: full >>>>>>>> > > > cluster.self-heal-daemon: on >>>>>>>> > > > cluster.locking-scheme: granular >>>>>>>> > > > features.shard-block-size: 64MB >>>>>>>> > > > features.shard: on >>>>>>>> > > > performance.readdir-ahead: on >>>>>>>> > > > storage.owner-uid: 36 >>>>>>>> > > > storage.owner-gid: 36 >>>>>>>> > > > performance.quick-read: off >>>>>>>> > > > performance.read-ahead: off >>>>>>>> > > > performance.io-cache: off >>>>>>>> > > > performance.stat-prefetch: on >>>>>>>> > > > cluster.eager-lock: enable >>>>>>>> > > > network.remote-dio: enable >>>>>>>> > > > cluster.quorum-type: auto >>>>>>>> > > > cluster.server-quorum-type: server >>>>>>>> > > > server.allow-insecure: on >>>>>>>> > > > cluster.self-heal-window-size: 1024 >>>>>>>> > > > cluster.background-self-heal-count: 16 >>>>>>>> > > > performance.strict-write-ordering: off >>>>>>>> > > > nfs.disable: on >>>>>>>> > > > nfs.addr-namelookup: off >>>>>>>> > > > nfs.enable-ino32: off >>>>>>>> > > > cluster.granular-entry-heal: on >>>>>>>> > > > >>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no issues. >>>>>>>> > > > Following steps detailed in previous recommendations began >>>>>>>> proces of >>>>>>>> > > > replacing and healngbricks one node at a time. >>>>>>>> > > > >>>>>>>> > > > 1) kill pid of brick >>>>>>>> > > > 2) reconfigure brick from raid6 to raid10 >>>>>>>> > > > 3) recreate directory of brick >>>>>>>> > > > 4) gluster volume start <> force >>>>>>>> > > > 5) gluster volume heal <> full >>>>>>>> > > Hi, >>>>>>>> > > >>>>>>>> > > I'd suggest that full heal is not used. There are a few bugs in >>>>>>>> full heal. >>>>>>>> > > Better safe than sorry ;) >>>>>>>> > > Instead I'd suggest the following steps: >>>>>>>> > > >>>>>>>> > > Currently I brought the node down by systemctl stop glusterd as >>>>>>>> I was >>>>>>>> > getting sporadic io issues and a few VM's paused so hoping that >>>>>>>> will help. >>>>>>>> > I may wait to do this till around 4PM when most work is done in >>>>>>>> case it >>>>>>>> > shoots load up. >>>>>>>> > >>>>>>>> > >>>>>>>> > > 1) kill pid of brick >>>>>>>> > > 2) to configuring of brick that you need >>>>>>>> > > 3) recreate brick dir >>>>>>>> > > 4) while the brick is still down, from the mount point: >>>>>>>> > > a) create a dummy non existent dir under / of mount. >>>>>>>> > > >>>>>>>> > >>>>>>>> > so if noee 2 is down brick, pick node for example 3 and make a >>>>>>>> test dir >>>>>>>> > under its brick directory that doesnt exist on 2 or should I be >>>>>>>> dong this >>>>>>>> > over a gluster mount? >>>>>>>> You should be doing this over gluster mount. >>>>>>>> > >>>>>>>> > > b) set a non existent extended attribute on / of mount. >>>>>>>> > > >>>>>>>> > >>>>>>>> > Could you give me an example of an attribute to set? I've read >>>>>>>> a tad on >>>>>>>> > this, and looked up attributes but haven't set any yet myself. >>>>>>>> > >>>>>>>> Sure. 
setfattr -n "user.some-name" -v "some-value" <path-to-mount> >>>>>>>> > Doing these steps will ensure that heal happens only from updated >>>>>>>> brick to >>>>>>>> > > down brick. >>>>>>>> > > 5) gluster v start <> force >>>>>>>> > > 6) gluster v heal <> >>>>>>>> > > >>>>>>>> > >>>>>>>> > Will it matter if somewhere in gluster the full heal command was >>>>>>>> run other >>>>>>>> > day? Not sure if it eventually stops or times out. >>>>>>>> > >>>>>>>> full heal will stop once the crawl is done. So if you want to >>>>>>>> trigger heal again, >>>>>>>> run gluster v heal <>. Actually even brick up or volume start force >>>>>>>> should >>>>>>>> trigger the heal. >>>>>>>> >>>>>>> >>>>>>> Did this on test bed today. its one server with 3 bricks on same >>>>>>> machine so take that for what its worth. also it still runs 3.8.2. Maybe >>>>>>> ill update and re-run test. >>>>>>> >>>>>>> killed brick >>>>>>> deleted brick dir >>>>>>> recreated brick dir >>>>>>> created fake dir on gluster mount >>>>>>> set suggested fake attribute on it >>>>>>> ran volume start <> force >>>>>>> >>>>>>> looked at files it said needed healing and it was just 8 shards that >>>>>>> were modified for few minutes I ran through steps >>>>>>> >>>>>>> gave it few minutes and it stayed same >>>>>>> ran gluster volume <> heal >>>>>>> >>>>>>> it healed all the directories and files you can see over mount >>>>>>> including fakedir. >>>>>>> >>>>>>> same issue for shards though. it adds more shards to heal at >>>>>>> glacier pace. slight jump in speed if I stat every file and dir in VM >>>>>>> running but not all shards. >>>>>>> >>>>>>> It started with 8 shards to heal and is now only at 33 out of 800 >>>>>>> and probably wont finish adding for few days at rate it goes. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> > > >>>>>>>> > > > 1st node worked as expected took 12 hours to heal 1TB data. >>>>>>>> Load was >>>>>>>> > > little >>>>>>>> > > > heavy but nothing shocking. >>>>>>>> > > > >>>>>>>> > > > About an hour after node 1 finished I began same process on >>>>>>>> node2. Heal >>>>>>>> > > > proces kicked in as before and the files in directories >>>>>>>> visible from >>>>>>>> > > mount >>>>>>>> > > > and .glusterfs healed in short time. Then it began crawl of >>>>>>>> .shard adding >>>>>>>> > > > those files to heal count at which point the entire proces >>>>>>>> ground to a >>>>>>>> > > halt >>>>>>>> > > > basically. After 48 hours out of 19k shards it has added 5900 >>>>>>>> to heal >>>>>>>> > > list. >>>>>>>> > > > Load on all 3 machnes is negligible. It was suggested to >>>>>>>> change this >>>>>>>> > > value >>>>>>>> > > > to full cluster.data-self-heal-algorithm and restart volume >>>>>>>> which I >>>>>>>> > > did. No >>>>>>>> > > > efffect. Tried relaunching heal no effect, despite any node >>>>>>>> picked. I >>>>>>>> > > > started each VM and performed a stat of all files from within >>>>>>>> it, or a >>>>>>>> > > full >>>>>>>> > > > virus scan and that seemed to cause short small spikes in >>>>>>>> shards added, >>>>>>>> > > but >>>>>>>> > > > not by much. Logs are showing no real messages indicating >>>>>>>> anything is >>>>>>>> > > going >>>>>>>> > > > on. I get hits to brick log on occasion of null lookups >>>>>>>> making me think >>>>>>>> > > its >>>>>>>> > > > not really crawling shards directory but waiting for a shard >>>>>>>> lookup to >>>>>>>> > > add >>>>>>>> > > > it. I'll get following in brick log but not constant and >>>>>>>> sometime >>>>>>>> > > multiple >>>>>>>> > > > for same shard. 
>>>>>>>> > > > >>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] >>>>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no >>>>>>>> resolution >>>>>>>> > > type >>>>>>>> > > > for (null) (LOOKUP) >>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] >>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: >>>>>>>> 12591783: >>>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00 >>>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) >>>>>>>> ==> (Invalid >>>>>>>> > > > argument) [Invalid argument] >>>>>>>> > > > >>>>>>>> > > > This one repeated about 30 times in row then nothing for 10 >>>>>>>> minutes then >>>>>>>> > > one >>>>>>>> > > > hit for one different shard by itself. >>>>>>>> > > > >>>>>>>> > > > How can I determine if Heal is actually running? How can I >>>>>>>> kill it or >>>>>>>> > > force >>>>>>>> > > > restart? Does node I start it from determine which directory >>>>>>>> gets >>>>>>>> > > crawled to >>>>>>>> > > > determine heals? >>>>>>>> > > > >>>>>>>> > > > David Gossage >>>>>>>> > > > Carousel Checks Inc. | System Administrator >>>>>>>> > > > Office 708.613.2284 >>>>>>>> > > > >>>>>>>> > > > _______________________________________________ >>>>>>>> > > > Gluster-users mailing list >>>>>>>> > > > Gluster-users at gluster.org >>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > _______________________________________________ >>>>>>>> > > > Gluster-users mailing list >>>>>>>> > > > Gluster-users at gluster.org >>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>> > > >>>>>>>> > > -- >>>>>>>> > > Thanks, >>>>>>>> > > Anuradha. >>>>>>>> > > >>>>>>>> > >>>>>>>> >>>>>>>> -- >>>>>>>> Thanks, >>>>>>>> Anuradha. >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/35fd16da/attachment.html>
On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <dgossage at carouselchecks.com> wrote:> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhananj at redhat.com> > wrote: > >> >> >> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhananj at redhat.com> >> wrote: >> >>> >>> >>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage < >>> dgossage at carouselchecks.com> wrote: >>> >>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <kdhananj at redhat.com >>>> > wrote: >>>> >>>>> Could you also share the glustershd logs? >>>>> >>>> >>>> I'll get them when I get to work sure >>>> >>> >>>> >>>>> >>>>> I tried the same steps that you mentioned multiple times, but heal is >>>>> running to completion without any issues. >>>>> >>>>> It must be said that 'heal full' traverses the files and directories >>>>> in a depth-first order and does heals also in the same order. But if it >>>>> gets interrupted in the middle (say because self-heal-daemon was either >>>>> intentionally or unintentionally brought offline and then brought back up), >>>>> self-heal will only pick up the entries that are so far marked as >>>>> new-entries that need heal which it will find in indices/xattrop directory. >>>>> What this means is that those files and directories that were not visited >>>>> during the crawl, will remain untouched and unhealed in this second >>>>> iteration of heal, unless you execute a 'heal-full' again. >>>>> >>>> >>>> So should it start healing shards as it crawls or not until after it >>>> crawls the entire .shard directory? At the pace it was going that could be >>>> a week with one node appearing in the cluster but with no shard files if >>>> anything tries to access a file on that node. From my experience other day >>>> telling it to heal full again did nothing regardless of node used. >>>> >>> >> Crawl is started from '/' of the volume. Whenever self-heal detects >> during the crawl that a file or directory is present in some brick(s) and >> absent in others, it creates the file on the bricks where it is absent and >> marks the fact that the file or directory might need data/entry and >> metadata heal too (this also means that an index is created under >> .glusterfs/indices/xattrop of the src bricks). And the data/entry and >> metadata heal are picked up and done in >> > the background with the help of these indices. >> > > Looking at my 3rd node as example i find nearly an exact same number of > files in xattrop dir as reported by heal count at time I brought down node2 > to try and alleviate read io errors that seemed to occur from what I was > guessing as attempts to use the node with no shards for reads. > > Also attached are the glustershd logs from the 3 nodes, along with the > test node i tried yesterday with same results. >Looking at my own logs I notice that a full sweep was only ever recorded in glustershd.log on 2nd node with missing directory. I believe I should have found a sweep begun on every node correct? 
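(To check this I am just grepping the self-heal daemon log on each node for the sweep messages, assuming the default log location, e.g.:

grep 'full sweep on subvol' /var/log/glusterfs/glustershd.log

which matches both the "starting full sweep" and "finished full sweep" entries.)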
On my test dev, when it did work, I do see that:

[2016-08-30 13:56:25.223333] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-0
[2016-08-30 13:56:25.223522] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-1
[2016-08-30 13:56:25.224616] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-2
[2016-08-30 14:18:48.333740] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-2
[2016-08-30 14:18:48.356008] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-1
[2016-08-30 14:18:49.637811] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-0

Whereas, looking at the past few days on the 3 prod nodes, I only found the following, on my 2nd node:

[2016-08-27 01:26:42.638772] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
[2016-08-27 11:37:01.732366] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
[2016-08-27 12:58:34.597228] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
[2016-08-27 12:59:28.041173] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
[2016-08-27 20:03:42.560188] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
[2016-08-27 20:03:44.278274] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
[2016-08-27 21:00:42.603315] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
[2016-08-27 21:00:46.148674] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1

> >> >>>> >>>>> My suspicion is that this is what happened on your setup. Could you >>>>> confirm if that was the case? >>>> >>>> Brick was brought online with force start then a full heal launched. >>>> Hours later after it became evident that it was not adding new files to >>>> heal I did try restarting self-heal daemon and relaunching full heal again. >>>> But this was after the heal had basically already failed to work as >>>> intended. >>> >>> OK. How did you figure it was not adding any new files? I need to know >>> what places you were monitoring to come to this conclusion. >>> >>> -Krutika >>> >>> >>>> >>>> >>>>> As for those logs, I did manage to do something that caused these >>>>> warning messages you shared earlier to appear in my client and server logs. >>>>> Although these logs are annoying and a bit scary too, they didn't do >>>>> any harm to the data in my volume. Why they appear just after a brick is >>>>> replaced and under no other circumstances is something I'm still >>>>> investigating.
>>>>> >>>>> But for future, it would be good to follow the steps Anuradha gave as >>>>> that would allow self-heal to at least detect that it has some repairing to >>>>> do whenever it is restarted whether intentionally or otherwise. >>>>> >>>> >>>> I followed those steps as described on my test box and ended up with >>>> exact same outcome of adding shards at an agonizing slow pace and no >>>> creation of .shard directory or heals on shard directory. Directories >>>> visible from mount healed quickly. This was with one VM so it has only 800 >>>> shards as well. After hours at work it had added a total of 33 shards to >>>> be healed. I sent those logs yesterday as well though not the glustershd. >>>> >>>> Does replace-brick command copy files in same manner? For these >>>> purposes I am contemplating just skipping the heal route. >>>> >>>> >>>>> -Krutika >>>>> >>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage < >>>>> dgossage at carouselchecks.com> wrote: >>>>> >>>>>> attached brick and client logs from test machine where same behavior >>>>>> occurred not sure if anything new is there. its still on 3.8.2 >>>>>> >>>>>> Number of Bricks: 1 x 3 = 3 >>>>>> Transport-type: tcp >>>>>> Bricks: >>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>> Options Reconfigured: >>>>>> cluster.locking-scheme: granular >>>>>> performance.strict-o-direct: off >>>>>> features.shard-block-size: 64MB >>>>>> features.shard: on >>>>>> server.allow-insecure: on >>>>>> storage.owner-uid: 36 >>>>>> storage.owner-gid: 36 >>>>>> cluster.server-quorum-type: server >>>>>> cluster.quorum-type: auto >>>>>> network.remote-dio: on >>>>>> cluster.eager-lock: enable >>>>>> performance.stat-prefetch: off >>>>>> performance.io-cache: off >>>>>> performance.quick-read: off >>>>>> cluster.self-heal-window-size: 1024 >>>>>> cluster.background-self-heal-count: 16 >>>>>> nfs.enable-ino32: off >>>>>> nfs.addr-namelookup: off >>>>>> nfs.disable: on >>>>>> performance.read-ahead: off >>>>>> performance.readdir-ahead: on >>>>>> cluster.granular-entry-heal: on >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage < >>>>>> dgossage at carouselchecks.com> wrote: >>>>>> >>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> ----- Original Message ----- >>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com> >>>>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>, >>>>>>>> "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM >>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow >>>>>>>> > >>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur < >>>>>>>> atalur at redhat.com> wrote: >>>>>>>> > >>>>>>>> > > Response inline. >>>>>>>> > > >>>>>>>> > > ----- Original Message ----- >>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com> >>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com> >>>>>>>> > > > Cc: "gluster-users at gluster.org List" < >>>>>>>> Gluster-users at gluster.org> >>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM >>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow >>>>>>>> > > > >>>>>>>> > > > Could you attach both client and brick logs? 
Meanwhile I will >>>>>>>> try these >>>>>>>> > > steps >>>>>>>> > > > out on my machines and see if it is easily recreatable. >>>>>>>> > > > >>>>>>>> > > > -Krutika >>>>>>>> > > > >>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage < >>>>>>>> > > dgossage at carouselchecks.com >>>>>>>> > > > > wrote: >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > Centos 7 Gluster 3.8.3 >>>>>>>> > > > >>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>>>>>>> > > > Options Reconfigured: >>>>>>>> > > > cluster.data-self-heal-algorithm: full >>>>>>>> > > > cluster.self-heal-daemon: on >>>>>>>> > > > cluster.locking-scheme: granular >>>>>>>> > > > features.shard-block-size: 64MB >>>>>>>> > > > features.shard: on >>>>>>>> > > > performance.readdir-ahead: on >>>>>>>> > > > storage.owner-uid: 36 >>>>>>>> > > > storage.owner-gid: 36 >>>>>>>> > > > performance.quick-read: off >>>>>>>> > > > performance.read-ahead: off >>>>>>>> > > > performance.io-cache: off >>>>>>>> > > > performance.stat-prefetch: on >>>>>>>> > > > cluster.eager-lock: enable >>>>>>>> > > > network.remote-dio: enable >>>>>>>> > > > cluster.quorum-type: auto >>>>>>>> > > > cluster.server-quorum-type: server >>>>>>>> > > > server.allow-insecure: on >>>>>>>> > > > cluster.self-heal-window-size: 1024 >>>>>>>> > > > cluster.background-self-heal-count: 16 >>>>>>>> > > > performance.strict-write-ordering: off >>>>>>>> > > > nfs.disable: on >>>>>>>> > > > nfs.addr-namelookup: off >>>>>>>> > > > nfs.enable-ino32: off >>>>>>>> > > > cluster.granular-entry-heal: on >>>>>>>> > > > >>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no issues. >>>>>>>> > > > Following steps detailed in previous recommendations began >>>>>>>> proces of >>>>>>>> > > > replacing and healngbricks one node at a time. >>>>>>>> > > > >>>>>>>> > > > 1) kill pid of brick >>>>>>>> > > > 2) reconfigure brick from raid6 to raid10 >>>>>>>> > > > 3) recreate directory of brick >>>>>>>> > > > 4) gluster volume start <> force >>>>>>>> > > > 5) gluster volume heal <> full >>>>>>>> > > Hi, >>>>>>>> > > >>>>>>>> > > I'd suggest that full heal is not used. There are a few bugs in >>>>>>>> full heal. >>>>>>>> > > Better safe than sorry ;) >>>>>>>> > > Instead I'd suggest the following steps: >>>>>>>> > > >>>>>>>> > > Currently I brought the node down by systemctl stop glusterd as >>>>>>>> I was >>>>>>>> > getting sporadic io issues and a few VM's paused so hoping that >>>>>>>> will help. >>>>>>>> > I may wait to do this till around 4PM when most work is done in >>>>>>>> case it >>>>>>>> > shoots load up. >>>>>>>> > >>>>>>>> > >>>>>>>> > > 1) kill pid of brick >>>>>>>> > > 2) to configuring of brick that you need >>>>>>>> > > 3) recreate brick dir >>>>>>>> > > 4) while the brick is still down, from the mount point: >>>>>>>> > > a) create a dummy non existent dir under / of mount. >>>>>>>> > > >>>>>>>> > >>>>>>>> > so if noee 2 is down brick, pick node for example 3 and make a >>>>>>>> test dir >>>>>>>> > under its brick directory that doesnt exist on 2 or should I be >>>>>>>> dong this >>>>>>>> > over a gluster mount? >>>>>>>> You should be doing this over gluster mount. >>>>>>>> > >>>>>>>> > > b) set a non existent extended attribute on / of mount. >>>>>>>> > > >>>>>>>> > >>>>>>>> > Could you give me an example of an attribute to set? I've read >>>>>>>> a tad on >>>>>>>> > this, and looked up attributes but haven't set any yet myself. 
>>>>>>>> > >>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount> >>>>>>>> > Doing these steps will ensure that heal happens only from updated >>>>>>>> brick to >>>>>>>> > > down brick. >>>>>>>> > > 5) gluster v start <> force >>>>>>>> > > 6) gluster v heal <> >>>>>>>> > > >>>>>>>> > >>>>>>>> > Will it matter if somewhere in gluster the full heal command was >>>>>>>> run other >>>>>>>> > day? Not sure if it eventually stops or times out. >>>>>>>> > >>>>>>>> full heal will stop once the crawl is done. So if you want to >>>>>>>> trigger heal again, >>>>>>>> run gluster v heal <>. Actually even brick up or volume start force >>>>>>>> should >>>>>>>> trigger the heal. >>>>>>>> >>>>>>> >>>>>>> Did this on test bed today. its one server with 3 bricks on same >>>>>>> machine so take that for what its worth. also it still runs 3.8.2. Maybe >>>>>>> ill update and re-run test. >>>>>>> >>>>>>> killed brick >>>>>>> deleted brick dir >>>>>>> recreated brick dir >>>>>>> created fake dir on gluster mount >>>>>>> set suggested fake attribute on it >>>>>>> ran volume start <> force >>>>>>> >>>>>>> looked at files it said needed healing and it was just 8 shards that >>>>>>> were modified for few minutes I ran through steps >>>>>>> >>>>>>> gave it few minutes and it stayed same >>>>>>> ran gluster volume <> heal >>>>>>> >>>>>>> it healed all the directories and files you can see over mount >>>>>>> including fakedir. >>>>>>> >>>>>>> same issue for shards though. it adds more shards to heal at >>>>>>> glacier pace. slight jump in speed if I stat every file and dir in VM >>>>>>> running but not all shards. >>>>>>> >>>>>>> It started with 8 shards to heal and is now only at 33 out of 800 >>>>>>> and probably wont finish adding for few days at rate it goes. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> > > >>>>>>>> > > > 1st node worked as expected took 12 hours to heal 1TB data. >>>>>>>> Load was >>>>>>>> > > little >>>>>>>> > > > heavy but nothing shocking. >>>>>>>> > > > >>>>>>>> > > > About an hour after node 1 finished I began same process on >>>>>>>> node2. Heal >>>>>>>> > > > proces kicked in as before and the files in directories >>>>>>>> visible from >>>>>>>> > > mount >>>>>>>> > > > and .glusterfs healed in short time. Then it began crawl of >>>>>>>> .shard adding >>>>>>>> > > > those files to heal count at which point the entire proces >>>>>>>> ground to a >>>>>>>> > > halt >>>>>>>> > > > basically. After 48 hours out of 19k shards it has added 5900 >>>>>>>> to heal >>>>>>>> > > list. >>>>>>>> > > > Load on all 3 machnes is negligible. It was suggested to >>>>>>>> change this >>>>>>>> > > value >>>>>>>> > > > to full cluster.data-self-heal-algorithm and restart volume >>>>>>>> which I >>>>>>>> > > did. No >>>>>>>> > > > efffect. Tried relaunching heal no effect, despite any node >>>>>>>> picked. I >>>>>>>> > > > started each VM and performed a stat of all files from within >>>>>>>> it, or a >>>>>>>> > > full >>>>>>>> > > > virus scan and that seemed to cause short small spikes in >>>>>>>> shards added, >>>>>>>> > > but >>>>>>>> > > > not by much. Logs are showing no real messages indicating >>>>>>>> anything is >>>>>>>> > > going >>>>>>>> > > > on. I get hits to brick log on occasion of null lookups >>>>>>>> making me think >>>>>>>> > > its >>>>>>>> > > > not really crawling shards directory but waiting for a shard >>>>>>>> lookup to >>>>>>>> > > add >>>>>>>> > > > it. I'll get following in brick log but not constant and >>>>>>>> sometime >>>>>>>> > > multiple >>>>>>>> > > > for same shard. 
>>>>>>>> > > > >>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] >>>>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no >>>>>>>> resolution >>>>>>>> > > type >>>>>>>> > > > for (null) (LOOKUP) >>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] >>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: >>>>>>>> 12591783: >>>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00 >>>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) >>>>>>>> ==> (Invalid >>>>>>>> > > > argument) [Invalid argument] >>>>>>>> > > > >>>>>>>> > > > This one repeated about 30 times in row then nothing for 10 >>>>>>>> minutes then >>>>>>>> > > one >>>>>>>> > > > hit for one different shard by itself. >>>>>>>> > > > >>>>>>>> > > > How can I determine if Heal is actually running? How can I >>>>>>>> kill it or >>>>>>>> > > force >>>>>>>> > > > restart? Does node I start it from determine which directory >>>>>>>> gets >>>>>>>> > > crawled to >>>>>>>> > > > determine heals? >>>>>>>> > > > >>>>>>>> > > > David Gossage >>>>>>>> > > > Carousel Checks Inc. | System Administrator >>>>>>>> > > > Office 708.613.2284 >>>>>>>> > > > >>>>>>>> > > > _______________________________________________ >>>>>>>> > > > Gluster-users mailing list >>>>>>>> > > > Gluster-users at gluster.org >>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > _______________________________________________ >>>>>>>> > > > Gluster-users mailing list >>>>>>>> > > > Gluster-users at gluster.org >>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users >>>>>>>> > > >>>>>>>> > > -- >>>>>>>> > > Thanks, >>>>>>>> > > Anuradha. >>>>>>>> > > >>>>>>>> > >>>>>>>> >>>>>>>> -- >>>>>>>> Thanks, >>>>>>>> Anuradha. >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/4af75814/attachment.html>