On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>
>
> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>
>>
>>
>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <
>> dgossage at carouselchecks.com> wrote:
>>
>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>>
>>>> Could you also share the glustershd logs?
>>>>
>>>
>>> Sure, I'll get them when I get to work.
>>>
>>
>>>
>>>>
>>>> I tried the same steps that you mentioned multiple times, but heal is
>>>> running to completion without any issues.
>>>>
>>>> It must be said that 'heal full' traverses the files and directories in
>>>> a depth-first order and does heals also in the same order. But if it gets
>>>> interrupted in the middle (say because self-heal-daemon was either
>>>> intentionally or unintentionally brought offline and then brought back up),
>>>> self-heal will only pick up the entries that are so far marked as new
>>>> entries that need heal, which it will find in the indices/xattrop directory.
>>>> What this means is that those files and directories that were not visited
>>>> during the crawl will remain untouched and unhealed in this second
>>>> iteration of heal, unless you execute 'heal full' again.
>>>>
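[A minimal shell sketch for re-triggering and watching that crawl, assuming the volume name GLUSTER1 and brick path /gluster1/BRICK1/1 seen in this thread; adjust to your layout:

    gluster volume heal GLUSTER1 full                         # restart the depth-first crawl from /
    gluster volume heal GLUSTER1 statistics heal-count        # entries the self-heal daemon still has queued
    ls /gluster1/BRICK1/1/.glusterfs/indices/xattrop | wc -l  # rough count of index entries on one brick
]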
>>>
>>> So should it start healing shards as it crawls, or not until after it
>>> has crawled the entire .shard directory? At the pace it was going that
>>> could be a week, with one node appearing in the cluster but having no
>>> shard files if anything tries to access a file on that node. From my
>>> experience the other day, telling it to heal full again did nothing,
>>> regardless of which node I ran it from.
>>>
>>
> Crawl is started from '/' of the volume. Whenever self-heal detects during
> the crawl that a file or directory is present in some brick(s) and absent
> in others, it creates the file on the bricks where it is absent and marks
> the fact that the file or directory might need data/entry and metadata heal
> too (this also means that an index is created under
> .glusterfs/indices/xattrop of the src bricks). And the data/entry and
> metadata heal are picked up and done in the background with the help of
> these indices.
>
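[A hedged aside on those markers: the per-file pending-heal state is kept in the trusted.afr.* extended attributes on each brick copy and can be inspected directly, e.g.

    getfattr -d -m . -e hex /gluster1/BRICK1/1/.shard/<gfid>.<n>

where the shard path is only a placeholder; non-zero trusted.afr.<volname>-client-* values mean data/entry/metadata heal is still pending against that brick.]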
Looking at my 3rd node as an example, I find nearly the exact same number of
files in the xattrop dir as reported by the heal count at the time I brought
down node2 to try and alleviate the read io errors, which I was guessing
occurred from attempts to use the node with no shards for reads.
Also attached are the glustershd logs from the 3 nodes, along with the test
node I tried yesterday with the same results.
>
>
>>>
>>>> My suspicion is that this is what happened on your setup. Could you
>>>> confirm if that was the case?
>>>>
>>>
>>> The brick was brought online with a force start, then a full heal was
>>> launched. Hours later, after it became evident that it was not adding new
>>> files to heal, I did try restarting the self-heal daemon and relaunching
>>> the full heal again. But this was after the heal had basically already
>>> failed to work as intended.
>>>
>>
>> OK. How did you figure it was not adding any new files? I need to know
>> what places you were monitoring to come to this conclusion.
>>
>> -Krutika
>>
>>
>>>
>>>
>>>> As for those logs, I did manage to do something that caused these
>>>> warning messages you shared earlier to appear in my client and server logs.
>>>> Although these logs are annoying and a bit scary too, they didn't do
>>>> any harm to the data in my volume. Why they appear just after a brick is
>>>> replaced and under no other circumstances is something I'm still
>>>> investigating.
>>>>
>>>> But for the future, it would be good to follow the steps Anuradha gave, as
>>>> that would allow self-heal to at least detect that it has some repairing to
>>>> do whenever it is restarted, whether intentionally or otherwise.
>>>>
>>>
>>> I followed those steps as described on my test box and ended up with the
>>> exact same outcome: shards added at an agonizingly slow pace and no
>>> creation of the .shard directory or heals on the shard directory.
>>> Directories visible from the mount healed quickly. This was with one VM,
>>> so it has only 800 shards as well. After hours at work it had added a
>>> total of 33 shards to be healed. I sent those logs yesterday as well,
>>> though not the glustershd ones.
>>>
>>> Does the replace-brick command copy files in the same manner? For these
>>> purposes I am contemplating just skipping the heal route.
>>>
>>>
>>>> -Krutika
>>>>
>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <
>>>> dgossage at carouselchecks.com> wrote:
>>>>
>>>>> Attached brick and client logs from the test machine where the same
>>>>> behavior occurred; not sure if anything new is there. It's still on 3.8.2.
>>>>>
>>>>> Number of Bricks: 1 x 3 = 3
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>> Options Reconfigured:
>>>>> cluster.locking-scheme: granular
>>>>> performance.strict-o-direct: off
>>>>> features.shard-block-size: 64MB
>>>>> features.shard: on
>>>>> server.allow-insecure: on
>>>>> storage.owner-uid: 36
>>>>> storage.owner-gid: 36
>>>>> cluster.server-quorum-type: server
>>>>> cluster.quorum-type: auto
>>>>> network.remote-dio: on
>>>>> cluster.eager-lock: enable
>>>>> performance.stat-prefetch: off
>>>>> performance.io-cache: off
>>>>> performance.quick-read: off
>>>>> cluster.self-heal-window-size: 1024
>>>>> cluster.background-self-heal-count: 16
>>>>> nfs.enable-ino32: off
>>>>> nfs.addr-namelookup: off
>>>>> nfs.disable: on
>>>>> performance.read-ahead: off
>>>>> performance.readdir-ahead: on
>>>>> cluster.granular-entry-heal: on
>>>>>
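[A hedged example of the option change discussed further down in this thread; the volume name is a placeholder:

    gluster volume set <volname> cluster.data-self-heal-algorithm full

'full' makes self-heal copy whole files rather than compute rolling checksums, which tends to be the cheaper choice for large VM images and shards.]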
>>>>>
>>>>>
>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <
>>>>> dgossage at carouselchecks.com> wrote:
>>>>>
>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> > From: "David Gossage" <dgossage
at carouselchecks.com>
>>>>>>> > To: "Anuradha Talur" <atalur at
redhat.com>
>>>>>>> > Cc: "gluster-users at gluster.org
List" <Gluster-users at gluster.org>,
>>>>>>> "Krutika Dhananjay" <kdhananj at
redhat.com>
>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards
Healing Glacier Slow
>>>>>>> >
>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <atalur at redhat.com>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> > > Response inline.
>>>>>>> > >
>>>>>>> > > ----- Original Message -----
>>>>>>> > > > From: "Krutika Dhananjay"
<kdhananj at redhat.com>
>>>>>>> > > > To: "David Gossage"
<dgossage at carouselchecks.com>
>>>>>>> > > > Cc: "gluster-users at
gluster.org List" <
>>>>>>> Gluster-users at gluster.org>
>>>>>>> > > > Sent: Monday, August 29, 2016
3:55:04 PM
>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3
Shards Healing Glacier Slow
>>>>>>> > > >
>>>>>>> > > > Could you attach both client and brick logs? Meanwhile I will
>>>>>>> > > > try these steps out on my machines and see if it is easily
>>>>>>> > > > recreatable.
>>>>>>> > > >
>>>>>>> > > > -Krutika
>>>>>>> > > >
>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <
>>>>>>> > > > dgossage at carouselchecks.com> wrote:
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > Centos 7 Gluster 3.8.3
>>>>>>> > > >
>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>> > > > Options Reconfigured:
>>>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>>>> > > > cluster.self-heal-daemon: on
>>>>>>> > > > cluster.locking-scheme: granular
>>>>>>> > > > features.shard-block-size: 64MB
>>>>>>> > > > features.shard: on
>>>>>>> > > > performance.readdir-ahead: on
>>>>>>> > > > storage.owner-uid: 36
>>>>>>> > > > storage.owner-gid: 36
>>>>>>> > > > performance.quick-read: off
>>>>>>> > > > performance.read-ahead: off
>>>>>>> > > > performance.io-cache: off
>>>>>>> > > > performance.stat-prefetch: on
>>>>>>> > > > cluster.eager-lock: enable
>>>>>>> > > > network.remote-dio: enable
>>>>>>> > > > cluster.quorum-type: auto
>>>>>>> > > > cluster.server-quorum-type: server
>>>>>>> > > > server.allow-insecure: on
>>>>>>> > > > cluster.self-heal-window-size: 1024
>>>>>>> > > > cluster.background-self-heal-count: 16
>>>>>>> > > > performance.strict-write-ordering: off
>>>>>>> > > > nfs.disable: on
>>>>>>> > > > nfs.addr-namelookup: off
>>>>>>> > > > nfs.enable-ino32: off
>>>>>>> > > > cluster.granular-entry-heal: on
>>>>>>> > > >
>>>>>>> > > > Friday did a rolling upgrade from 3.8.3->3.8.3, no issues.
>>>>>>> > > > Following steps detailed in previous recommendations, began the
>>>>>>> > > > process of replacing and healing bricks one node at a time.
>>>>>>> > > >
>>>>>>> > > > 1) kill pid of brick
>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>> > > > 3) recreate directory of brick
>>>>>>> > > > 4) gluster volume start <> force
>>>>>>> > > > 5) gluster volume heal <> full
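[A small hedged sketch for step 1 above: the brick PID can be read from the volume status output instead of ps; volume name and PID are placeholders:

    gluster volume status <volname>          # the PID column lists each brick process
    kill <pid-of-the-brick-being-replaced>
]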
>>>>>>> > > Hi,
>>>>>>> > >
>>>>>>> > > I'd suggest that full heal is not used. There are a few bugs in
>>>>>>> > > full heal. Better safe than sorry ;)
>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>> > >
>>>>>>> > Currently I brought the node down by systemctl stop glusterd, as I was
>>>>>>> > getting sporadic io issues and a few VMs paused, so hoping that will
>>>>>>> > help. I may wait to do this till around 4PM when most work is done, in
>>>>>>> > case it shoots the load up.
>>>>>>> >
>>>>>>> >
>>>>>>> > > 1) kill pid of brick
>>>>>>> > > 2) do the reconfiguring of the brick that you need
>>>>>>> > > 3) recreate brick dir
>>>>>>> > > 4) while the brick is still down, from the mount point:
>>>>>>> > >    a) create a dummy non-existent dir under / of mount.
>>>>>>> > >
>>>>>>> >
>>>>>>> > so if node 2 is the down brick, do I pick, for example, node 3 and
>>>>>>> > make a test dir under its brick directory that doesn't exist on 2,
>>>>>>> > or should I be doing this over a gluster mount?
>>>>>>> You should be doing this over gluster mount.
>>>>>>> >
>>>>>>> > >    b) set a non-existent extended attribute on / of mount.
>>>>>>> > >
>>>>>>> >
>>>>>>> > Could you give me an example of an attribute to set? I've read a tad
>>>>>>> > on this, and looked up attributes, but haven't set any yet myself.
>>>>>>> >
>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
>>>>>>> > > Doing these steps will ensure that heal happens only from the
>>>>>>> > > updated bricks to the down brick.
>>>>>>> > > 5) gluster v start <> force
>>>>>>> > > 6) gluster v heal <>
>>>>>>> > >
>>>>>>> >
>>>>>>> > Will it matter if somewhere in gluster the full heal command was run
>>>>>>> > the other day? Not sure if it eventually stops or times out.
>>>>>>> >
>>>>>>> Full heal will stop once the crawl is done. So if you want to trigger
>>>>>>> heal again, run gluster v heal <>. Actually, even brick up or volume
>>>>>>> start force should trigger the heal.
>>>>>>>
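[Pulling those steps together as one hedged shell sketch; the brick path comes from the test volume above, while the volume name, mount point, and dummy dir/xattr names are placeholders:

    kill <brick-pid>                                  # 1) stop only the brick being replaced
    # 2)-3) rebuild the underlying storage, then recreate the empty brick dir
    mkdir -p /gluster2/brick2/1
    # 4) from a client mount of the volume, while the brick is still down
    mkdir /mnt/<volname>/dummy-heal-dir
    setfattr -n user.dummy-heal-marker -v 1 /mnt/<volname>
    # 5)-6) bring the brick back and let index-based heal repair it
    gluster volume start <volname> force
    gluster volume heal <volname>
]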
>>>>>>
>>>>>> Did this on the test bed today. It's one server with 3 bricks on the
>>>>>> same machine, so take that for what it's worth. Also, it still runs
>>>>>> 3.8.2. Maybe I'll update and re-run the test.
>>>>>>
>>>>>> killed brick
>>>>>> deleted brick dir
>>>>>> recreated brick dir
>>>>>> created fake dir on gluster mount
>>>>>> set suggested fake attribute on it
>>>>>> ran volume start <> force
>>>>>>
>>>>>> looked at the files it said needed healing, and it was just the 8 shards
>>>>>> that were modified during the few minutes I ran through the steps
>>>>>>
>>>>>> gave it a few minutes and it stayed the same
>>>>>> ran gluster volume heal <>
>>>>>>
>>>>>> it healed all the directories and files you can see over the mount,
>>>>>> including the fake dir.
>>>>>>
>>>>>> same issue for shards though. it adds more shards to heal at a glacial
>>>>>> pace. slight jump in speed if I stat every file and dir in the running
>>>>>> VM, but not all shards.
>>>>>>
>>>>>> It started with 8 shards to heal and is now only at 33 out of 800, and
>>>>>> probably won't finish adding for a few days at the rate it goes.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> > >
>>>>>>> > > > 1st node worked as expected, took 12 hours to heal 1TB of data.
>>>>>>> > > > Load was a little heavy but nothing shocking.
>>>>>>> > > >
>>>>>>> > > > About an hour after node 1 finished, I began the same process on
>>>>>>> > > > node2. The heal process kicked in as before, and the files in
>>>>>>> > > > directories visible from the mount and .glusterfs healed in a
>>>>>>> > > > short time. Then it began the crawl of .shard, adding those files
>>>>>>> > > > to the heal count, at which point the entire process basically
>>>>>>> > > > ground to a halt. After 48 hours, out of 19k shards it has added
>>>>>>> > > > 5900 to the heal list. Load on all 3 machines is negligible.
>>>>>>> > > > It was suggested to change cluster.data-self-heal-algorithm to
>>>>>>> > > > full and restart the volume, which I did. No effect. Tried
>>>>>>> > > > relaunching the heal, no effect, regardless of which node was
>>>>>>> > > > picked. I started each VM and performed a stat of all files from
>>>>>>> > > > within it, or a full virus scan, and that seemed to cause short
>>>>>>> > > > small spikes in shards added, but not by much. Logs are showing no
>>>>>>> > > > real messages indicating anything is going on. I get hits to the
>>>>>>> > > > brick log on occasion of null lookups, making me think it's not
>>>>>>> > > > really crawling the shards directory but waiting for a shard
>>>>>>> > > > lookup to add it. I'll get the following in the brick log, but not
>>>>>>> > > > constant, and sometimes multiple for the same shard.
>>>>>>> > > >
>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009]
>>>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no
>>>>>>> > > > resolution type for (null) (LOOKUP)
>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050]
>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server:
>>>>>>> > > > 12591783: LOOKUP (null)
>>>>>>> > > > (00000000-0000-0000-0000-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221)
>>>>>>> > > > ==> (Invalid argument) [Invalid argument]
>>>>>>> > > > This one repeated about 30 times in a row, then nothing for 10
>>>>>>> > > > minutes, then one hit for a different shard by itself.
>>>>>>> > > >
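[A hedged way to see how often those two messages are actually hitting the brick log; the path below follows the usual /var/log/glusterfs/bricks/<brick-path-with-dashes>.log naming, so adjust it to your brick:

    grep -c 'server_resolve\|server_lookup_cbk' /var/log/glusterfs/bricks/gluster1-BRICK1-1.log
]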
>>>>>>> > > > How can I determine if heal is actually running? How can I kill
>>>>>>> > > > it or force a restart? Does the node I start it from determine
>>>>>>> > > > which directory gets crawled to determine heals?
>>>>>>> > > >
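[Hedged pointers for those questions, limited to commands known to exist in 3.8; the volume name is a placeholder:

    gluster volume status <volname>         # shows whether the Self-heal Daemon is online on each node, with its PID
    gluster volume heal <volname> info      # lists the entries currently queued for heal
    gluster volume start <volname> force    # respawns any brick or self-heal daemon process that is down

For 'heal full', the crawl is reportedly performed by only one of the self-heal daemons, not necessarily on the node the command was issued from.]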
>>>>>>> > > > David Gossage
>>>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>>>> > > > Office 708.613.2284
>>>>>>> > > >
>>>>>>> > >
>>>>>>> > > --
>>>>>>> > > Thanks,
>>>>>>> > > Anuradha.
>>>>>>> > >
>>>>>>> >
>>>>>>>
>>>>>>> --
>>>>>>> Thanks,
>>>>>>> Anuradha.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
[Attachments: glustershd-node1, glustershd-node2.gz, glustershd-node3.gz, glustershd-testnode]