Updated test server to 3.8.3.
Brick1: 192.168.71.10:/gluster2/brick1/1
Brick2: 192.168.71.11:/gluster2/brick2/1
Brick3: 192.168.71.12:/gluster2/brick3/1
Options Reconfigured:
cluster.granular-entry-heal: on
performance.readdir-ahead: on
performance.read-ahead: off
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: off
cluster.background-self-heal-count: 16
cluster.self-heal-window-size: 1024
performance.quick-read: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-gid: 36
storage.owner-uid: 36
server.allow-insecure: on
features.shard: on
features.shard-block-size: 64MB
performance.strict-o-direct: off
cluster.locking-scheme: granular
kill -15 brickpid
rm -Rf /gluster2/brick3
mkdir -p /gluster2/brick3/1
mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
gluster v start glustershard force
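For repeatability, the steps above can be sketched as one script. This is only a sketch of what I ran: the volume name, brick path, and mount path are from this test setup, and the awk line is just one way to pull the brick PID out of `gluster volume status` rather than guessing it.

```shell
#!/bin/sh
# Sketch of the brick-reset procedure; paths and volume name are from this
# test setup -- adjust for your environment.
VOL=glustershard
BRICK=192.168.71.12:/gluster2/brick3/1
MNT=/rhev/data-center/mnt/glusterSD/192.168.71.10:_glustershard

# Pull this brick's PID out of "gluster volume status" (brick rows look like
# "Brick <host:path> <port> <rdma-port> <online> <pid>").
PID=$(gluster volume status "$VOL" |
    awk -v b="$BRICK" '$1 == "Brick" && $2 == b {print $NF}')
kill -15 "$PID"              # stop only this brick process

rm -Rf /gluster2/brick3      # wipe the old brick
mkdir -p /gluster2/brick3/1  # recreate the brick directory

# From the client mount, create a dummy dir and xattr so the surviving
# bricks are marked as heal sources for the emptied one.
mkdir "$MNT/fake2"
setfattr -n user.some-name -v some-value "$MNT/fake2"

gluster v start "$VOL" force # restart the brick; heal should follow
```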
At this point the brick process starts, and all files visible from the mount,
including the new dir, are created on the brick.
A handful of shards still show in heal statistics, but no .shard directory is
created and there is no increase in the shard count.
gluster v heal glustershard
At this point there is still no increase in count, no dir made, and no
additional heal activity generated in the logs. I waited a few minutes tailing
logs to check if anything kicked in.
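Rather than eyeballing the logs, one rough way to watch whether anything kicks in (assuming the volume name from this test) is to poll the pending-entry count from heal info:

```shell
# Poll the total number of entries pending heal once a minute; the awk sums
# the per-brick "Number of entries:" lines printed by heal info.
while true; do
    date
    gluster v heal glustershard info |
        awk '/^Number of entries:/ {sum += $NF} END {print "pending:", sum+0}'
    sleep 60
done
```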
gluster v heal glustershard full
Shards are added to the list and the heal commences. Logs show a full sweep
starting on all 3 nodes, though this time it only shows as finishing on one,
which looks to be the node that had its brick deleted.
[2016-08-30 14:45:33.098589] I [MSGID: 108026]
[afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
starting full sweep on subvol glustershard-client-0
[2016-08-30 14:45:33.099492] I [MSGID: 108026]
[afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
starting full sweep on subvol glustershard-client-1
[2016-08-30 14:45:33.100093] I [MSGID: 108026]
[afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
starting full sweep on subvol glustershard-client-2
[2016-08-30 14:52:29.760213] I [MSGID: 108026]
[afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
finished full sweep on subvol glustershard-client-2
My hope is that later tonight a full heal will work on production. Is it
possible the self-heal daemon can go stale or stop listening but still show as
active? Would stopping and starting the self-heal daemon from the gluster CLI
before doing these heals be helpful?
On Tue, Aug 30, 2016 at 9:29 AM, David Gossage <dgossage at carouselchecks.com>
wrote:
> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <
> dgossage at carouselchecks.com> wrote:
>
>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhananj at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhananj at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <
>>>> dgossage at carouselchecks.com> wrote:
>>>>
>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <
>>>>> kdhananj at redhat.com> wrote:
>>>>>
>>>>>> Could you also share the glustershd logs?
>>>>>>
>>>>>
>>>>> I'll get them when I get to work sure
>>>>>
>>>>
>>>>>
>>>>>>
>>>>>> I tried the same steps that you mentioned multiple times, but heal is
>>>>>> running to completion without any issues.
>>>>>>
>>>>>> It must be said that 'heal full' traverses the files and directories
>>>>>> in a depth-first order and does heals also in the same order. But if it
>>>>>> gets interrupted in the middle (say because self-heal-daemon was either
>>>>>> intentionally or unintentionally brought offline and then brought back
>>>>>> up), self-heal will only pick up the entries that are so far marked as
>>>>>> new-entries that need heal, which it will find in the indices/xattrop
>>>>>> directory. What this means is that those files and directories that
>>>>>> were not visited during the crawl will remain untouched and unhealed in
>>>>>> this second iteration of heal, unless you execute a 'heal full' again.
>>>>>>
>>>>>
>>>>> So should it start healing shards as it crawls, or not until after it
>>>>> crawls the entire .shard directory? At the pace it was going that could
>>>>> be a week, with one node appearing in the cluster but with no shard
>>>>> files if anything tries to access a file on that node. From my
>>>>> experience the other day, telling it to heal full again did nothing
>>>>> regardless of the node used.
>>>>>
>>>>
>>> Crawl is started from '/' of the volume. Whenever self-heal detects
>>> during the crawl that a file or directory is present in some brick(s) and
>>> absent in others, it creates the file on the bricks where it is absent and
>>> marks the fact that the file or directory might need data/entry and
>>> metadata heal too (this also means that an index is created under
>>> .glusterfs/indices/xattrop of the src bricks). And the data/entry and
>>> metadata heal are picked up and done in the background with the help of
>>> these indices.
>>>
>>
>> Looking at my 3rd node as an example, I find nearly the exact same number
>> of files in the xattrop dir as reported by heal count at the time I brought
>> down node2 to try and alleviate the read io errors that seemed to occur
>> from what I was guessing were attempts to use the node with no shards for
>> reads.
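That comparison can be done mechanically. A rough sketch, using a brick path from this thread: the xattrop directory holds one gfid-named hard link per entry needing heal, plus an `xattrop-<uuid>` base file that should be excluded from the count:

```shell
# Count pending-heal indices on a brick (skip the xattrop-<uuid> base file),
# then compare with "gluster v heal <vol> info" output for that brick.
find /gluster2/brick3/1/.glusterfs/indices/xattrop \
    -type f ! -name 'xattrop-*' | wc -l
```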
>>
>> Also attached are the glustershd logs from the 3 nodes, along with the
>> test node I tried yesterday with the same results.
>>
>
> Looking at my own logs I notice that a full sweep was only ever recorded
> in glustershd.log on the 2nd node with the missing directory. I believe I
> should have found a sweep begun on every node, correct?
>
> On my test dev, when it did work, I do see that:
>
> [2016-08-30 13:56:25.223333] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
> starting full sweep on subvol glustershard-client-0
> [2016-08-30 13:56:25.223522] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
> starting full sweep on subvol glustershard-client-1
> [2016-08-30 13:56:25.224616] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
> starting full sweep on subvol glustershard-client-2
> [2016-08-30 14:18:48.333740] I [MSGID: 108026]
> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
> finished full sweep on subvol glustershard-client-2
> [2016-08-30 14:18:48.356008] I [MSGID: 108026]
> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
> finished full sweep on subvol glustershard-client-1
> [2016-08-30 14:18:49.637811] I [MSGID: 108026]
> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
> finished full sweep on subvol glustershard-client-0
>
> While looking at the past few days of logs on the 3 prod nodes, I only
> found the following, on my 2nd node:
> [2016-08-27 01:26:42.638772] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
> starting full sweep on subvol GLUSTER1-client-1
> [2016-08-27 11:37:01.732366] I [MSGID: 108026]
> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
> finished full sweep on subvol GLUSTER1-client-1
> [2016-08-27 12:58:34.597228] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
> starting full sweep on subvol GLUSTER1-client-1
> [2016-08-27 12:59:28.041173] I [MSGID: 108026]
> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
> finished full sweep on subvol GLUSTER1-client-1
> [2016-08-27 20:03:42.560188] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
> starting full sweep on subvol GLUSTER1-client-1
> [2016-08-27 20:03:44.278274] I [MSGID: 108026]
> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
> finished full sweep on subvol GLUSTER1-client-1
> [2016-08-27 21:00:42.603315] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
> starting full sweep on subvol GLUSTER1-client-1
> [2016-08-27 21:00:46.148674] I [MSGID: 108026]
> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
> finished full sweep on subvol GLUSTER1-client-1
>
>
>
>
>
>>
>>>
>>>>>
>>>>>> My suspicion is that this is what happened on your setup. Could you
>>>>>> confirm if that was the case?
>>>>>>
>>>>>
>>>>> Brick was brought online with force start, then a full heal launched.
>>>>> Hours later, after it became evident that it was not adding new files
>>>>> to heal, I did try restarting the self-heal daemon and relaunching full
>>>>> heal again. But this was after the heal had basically already failed to
>>>>> work as intended.
>>>>>
>>>>
>>>> OK. How did you figure it was not adding any new files? I need to know
>>>> what places you were monitoring to come to this conclusion.
>>>>
>>>> -Krutika
>>>>
>>>>
>>>>>
>>>>>
>>>>>> As for those logs, I did manage to do something that caused these
>>>>>> warning messages you shared earlier to appear in my client and server
>>>>>> logs. Although these logs are annoying and a bit scary too, they didn't
>>>>>> do any harm to the data in my volume. Why they appear just after a
>>>>>> brick is replaced and under no other circumstances is something I'm
>>>>>> still investigating.
>>>>>>
>>>>>> But for the future, it would be good to follow the steps Anuradha gave,
>>>>>> as that would allow self-heal to at least detect that it has some
>>>>>> repairing to do whenever it is restarted, whether intentionally or
>>>>>> otherwise.
>>>>>>
>>>>>
>>>>> I followed those steps as described on my test box and ended up with
>>>>> the exact same outcome: shards added at an agonizingly slow pace and no
>>>>> creation of the .shard directory or heals on the shard directory.
>>>>> Directories visible from the mount healed quickly. This was with one VM,
>>>>> so it has only 800 shards as well. After hours at work it had added a
>>>>> total of 33 shards to be healed. I sent those logs yesterday as well,
>>>>> though not the glustershd.
>>>>>
>>>>> Does the replace-brick command copy files in the same manner? For these
>>>>> purposes I am contemplating just skipping the heal route.
>>>>>
>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <
>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>
>>>>>>> Attached brick and client logs from the test machine where the same
>>>>>>> behavior occurred; not sure if anything new is there. It's still on
>>>>>>> 3.8.2.
>>>>>>>
>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>> Options Reconfigured:
>>>>>>> cluster.locking-scheme: granular
>>>>>>> performance.strict-o-direct: off
>>>>>>> features.shard-block-size: 64MB
>>>>>>> features.shard: on
>>>>>>> server.allow-insecure: on
>>>>>>> storage.owner-uid: 36
>>>>>>> storage.owner-gid: 36
>>>>>>> cluster.server-quorum-type: server
>>>>>>> cluster.quorum-type: auto
>>>>>>> network.remote-dio: on
>>>>>>> cluster.eager-lock: enable
>>>>>>> performance.stat-prefetch: off
>>>>>>> performance.io-cache: off
>>>>>>> performance.quick-read: off
>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>> cluster.background-self-heal-count: 16
>>>>>>> nfs.enable-ino32: off
>>>>>>> nfs.addr-namelookup: off
>>>>>>> nfs.disable: on
>>>>>>> performance.read-ahead: off
>>>>>>> performance.readdir-ahead: on
>>>>>>> cluster.granular-entry-heal: on
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <
>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>
>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com>
>>>>>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>, "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>> >
>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <atalur at redhat.com> wrote:
>>>>>>>>> >
>>>>>>>>> > > Response inline.
>>>>>>>>> > >
>>>>>>>>> > > ----- Original Message -----
>>>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>> > > > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>
>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>> > > >
>>>>>>>>> > > > Could you attach both client and brick logs? Meanwhile I will
>>>>>>>>> > > > try these steps out on my machines and see if it is easily
>>>>>>>>> > > > recreatable.
>>>>>>>>> > > >
>>>>>>>>> > > > -Krutika
>>>>>>>>> > > >
>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <
>>>>>>>>> > > > dgossage at carouselchecks.com> wrote:
>>>>>>>>> > > >
>>>>>>>>> > > >
>>>>>>>>> > > >
>>>>>>>>> > > > Centos 7 Gluster 3.8.3
>>>>>>>>> > > >
>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>>>> > > > Options Reconfigured:
>>>>>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>>>>>> > > > cluster.self-heal-daemon: on
>>>>>>>>> > > > cluster.locking-scheme: granular
>>>>>>>>> > > > features.shard-block-size: 64MB
>>>>>>>>> > > > features.shard: on
>>>>>>>>> > > > performance.readdir-ahead: on
>>>>>>>>> > > > storage.owner-uid: 36
>>>>>>>>> > > > storage.owner-gid: 36
>>>>>>>>> > > > performance.quick-read: off
>>>>>>>>> > > > performance.read-ahead: off
>>>>>>>>> > > > performance.io-cache: off
>>>>>>>>> > > > performance.stat-prefetch: on
>>>>>>>>> > > > cluster.eager-lock: enable
>>>>>>>>> > > > network.remote-dio: enable
>>>>>>>>> > > > cluster.quorum-type: auto
>>>>>>>>> > > > cluster.server-quorum-type: server
>>>>>>>>> > > > server.allow-insecure: on
>>>>>>>>> > > > cluster.self-heal-window-size: 1024
>>>>>>>>> > > > cluster.background-self-heal-count: 16
>>>>>>>>> > > > performance.strict-write-ordering: off
>>>>>>>>> > > > nfs.disable: on
>>>>>>>>> > > > nfs.addr-namelookup: off
>>>>>>>>> > > > nfs.enable-ino32: off
>>>>>>>>> > > > cluster.granular-entry-heal: on
>>>>>>>>> > > >
>>>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no issues.
>>>>>>>>> > > > Following the steps detailed in previous recommendations,
>>>>>>>>> > > > began the process of replacing and healing bricks one node at
>>>>>>>>> > > > a time.
>>>>>>>>> > > >
>>>>>>>>> > > > 1) kill pid of brick
>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>>>> > > > 3) recreate directory of brick
>>>>>>>>> > > > 4) gluster volume start <> force
>>>>>>>>> > > > 5) gluster volume heal <> full
>>>>>>>>> > > Hi,
>>>>>>>>> > >
>>>>>>>>> > > I'd suggest that full heal is not used. There are a few bugs
>>>>>>>>> > > in full heal. Better safe than sorry ;)
>>>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>>>> > >
>>>>>>>>> > Currently I brought the node down by systemctl stop glusterd as I
>>>>>>>>> > was getting sporadic io issues and a few VM's paused, so hoping
>>>>>>>>> > that will help. I may wait to do this till around 4PM when most
>>>>>>>>> > work is done in case it shoots load up.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > > 1) kill pid of brick
>>>>>>>>> > > 2) do the configuring of brick that you need
>>>>>>>>> > > 3) recreate brick dir
>>>>>>>>> > > 4) while the brick is still down, from the mount point:
>>>>>>>>> > > a) create a dummy non-existent dir under / of mount.
>>>>>>>>> > >
>>>>>>>>> >
>>>>>>>>> > So if node 2 is the down brick, pick a node, for example 3, and
>>>>>>>>> > make a test dir under its brick directory that doesn't exist on 2,
>>>>>>>>> > or should I be doing this over a gluster mount?
>>>>>>>>> You should be doing this over the gluster mount.
>>>>>>>>> >
>>>>>>>>> > > b) set a non-existent extended attribute on / of mount.
>>>>>>>>> > >
>>>>>>>>> >
>>>>>>>>> > Could you give me an example of an attribute to set? I've read a
>>>>>>>>> > tad on this, and looked up attributes but haven't set any yet
>>>>>>>>> > myself.
>>>>>>>>> >
>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
>>>>>>>>> > > Doing these steps will ensure that heal happens only from
>>>>>>>>> > > updated brick to down brick.
>>>>>>>>> > > 5) gluster v start <> force
>>>>>>>>> > > 6) gluster v heal <>
>>>>>>>>> > >
>>>>>>>>> >
>>>>>>>>> > Will it matter if somewhere in gluster the full heal command was
>>>>>>>>> > run the other day? Not sure if it eventually stops or times out.
>>>>>>>>> >
>>>>>>>>> full heal will stop once the crawl is done. So if you want to
>>>>>>>>> trigger heal again, run gluster v heal <>. Actually even brick up
>>>>>>>>> or volume start force should trigger the heal.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Did this on the test bed today. It's one server with 3 bricks on the
>>>>>>>> same machine, so take that for what it's worth. Also it still runs
>>>>>>>> 3.8.2; maybe I'll update and re-run the test.
>>>>>>>>
>>>>>>>> killed brick
>>>>>>>> deleted brick dir
>>>>>>>> recreated brick dir
>>>>>>>> created fake dir on gluster mount
>>>>>>>> set suggested fake attribute on it
>>>>>>>> ran volume start <> force
>>>>>>>>
>>>>>>>> looked at the files it said needed healing and it was just the 8
>>>>>>>> shards that were modified during the few minutes I ran through the
>>>>>>>> steps
>>>>>>>>
>>>>>>>> gave it a few minutes and it stayed the same
>>>>>>>> ran gluster volume <> heal
>>>>>>>>
>>>>>>>> it healed all the directories and files you can see over the mount,
>>>>>>>> including fakedir.
>>>>>>>>
>>>>>>>> same issue for the shards though. it adds more shards to heal at a
>>>>>>>> glacier pace. slight jump in speed if I stat every file and dir in
>>>>>>>> the running VM, but not all shards.
>>>>>>>>
>>>>>>>> It started with 8 shards to heal and is now only at 33 out of 800,
>>>>>>>> and probably won't finish adding for a few days at the rate it goes.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> > >
>>>>>>>>> > > > 1st node worked as expected, took 12 hours to heal 1TB data.
>>>>>>>>> > > > Load was a little heavy but nothing shocking.
>>>>>>>>> > > >
>>>>>>>>> > > > About an hour after node 1 finished I began the same process
>>>>>>>>> > > > on node2. Heal process kicked in as before, and the files in
>>>>>>>>> > > > directories visible from the mount and .glusterfs healed in a
>>>>>>>>> > > > short time. Then it began the crawl of .shard, adding those
>>>>>>>>> > > > files to the heal count, at which point the entire process
>>>>>>>>> > > > basically ground to a halt. After 48 hours, out of 19k shards
>>>>>>>>> > > > it has added 5900 to the heal list. Load on all 3 machines is
>>>>>>>>> > > > negligible. It was suggested to change
>>>>>>>>> > > > cluster.data-self-heal-algorithm to full and restart the
>>>>>>>>> > > > volume, which I did. No effect. Tried relaunching heal, no
>>>>>>>>> > > > effect, regardless of the node picked. I started each VM and
>>>>>>>>> > > > performed a stat of all files from within it, or a full virus
>>>>>>>>> > > > scan, and that seemed to cause short small spikes in shards
>>>>>>>>> > > > added, but not by much. Logs are showing no real messages
>>>>>>>>> > > > indicating anything is going on. I get hits to the brick log
>>>>>>>>> > > > on occasion of null lookups, making me think it's not really
>>>>>>>>> > > > crawling the shards directory but waiting for a shard lookup
>>>>>>>>> > > > to add it. I'll get the following in the brick log, but not
>>>>>>>>> > > > constant, and sometimes multiple for the same shard.
>>>>>>>>> > > >
>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009]
>>>>>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no
>>>>>>>>> > > > resolution type for (null) (LOOKUP)
>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050]
>>>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server:
>>>>>>>>> > > > 12591783: LOOKUP (null)
>>>>>>>>> > > > (00000000-0000-0000-0000-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221)
>>>>>>>>> > > > ==> (Invalid argument) [Invalid argument]
>>>>>>>>> > > >
>>>>>>>>> > > > This one repeated about 30 times in a row, then nothing for 10
>>>>>>>>> > > > minutes, then one hit for a different shard by itself.
>>>>>>>>> > > >
>>>>>>>>> > > > How can I determine if heal is actually running? How can I
>>>>>>>>> > > > kill it or force a restart? Does the node I start it from
>>>>>>>>> > > > determine which directory gets crawled to determine heals?
>>>>>>>>> > > >
>>>>>>>>> > > > David Gossage
>>>>>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>>>>>> > > > Office 708.613.2284
>>>>>>>>> > > >
>>>>>>>>> > > > _______________________________________________
>>>>>>>>> > > > Gluster-users mailing list
>>>>>>>>> > > > Gluster-users at gluster.org
>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>> > >
>>>>>>>>> > > --
>>>>>>>>> > > Thanks,
>>>>>>>>> > > Anuradha.
>>>>>>>>> > >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks,
>>>>>>>>> Anuradha.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>