On Tue, Aug 30, 2016 at 10:02 AM, David Gossage <dgossage at carouselchecks.com> wrote:
> updated test server to 3.8.3
>
> Brick1: 192.168.71.10:/gluster2/brick1/1
> Brick2: 192.168.71.11:/gluster2/brick2/1
> Brick3: 192.168.71.12:/gluster2/brick3/1
> Options Reconfigured:
> cluster.granular-entry-heal: on
> performance.readdir-ahead: on
> performance.read-ahead: off
> nfs.disable: on
> nfs.addr-namelookup: off
> nfs.enable-ino32: off
> cluster.background-self-heal-count: 16
> cluster.self-heal-window-size: 1024
> performance.quick-read: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: on
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-gid: 36
> storage.owner-uid: 36
> server.allow-insecure: on
> features.shard: on
> features.shard-block-size: 64MB
> performance.strict-o-direct: off
> cluster.locking-scheme: granular
>
> kill -15 brickpid
> rm -Rf /gluster2/brick3
> mkdir -p /gluster2/brick3/1
> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
> setfattr -n "user.some-name" -v "some-value" \
>   /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
> gluster v start glustershard force
>
> at this point the brick process starts and all visible files, including
> the new dir, are made on the brick
> a handful of shards are still in heal statistics, but no .shard directory
> is created and no increase in shard count
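>
> (to watch that count I just use the CLI, if I have the syntax right:
>
> gluster volume status glustershard
> gluster volume heal glustershard statistics heal-count
>
> the first to confirm the brick process came back, the second for the
> per-brick count of entries pending heal)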
>
> gluster v heal glustershard
>
> At this point there is still no increase in the count, no dir made, and
> no additional healing activity generated in the logs. Waited a few
> minutes tailing logs to check if anything kicked in.
>
> gluster v heal glustershard full
>
> gluster shards are added to the list and the heal commences. Logs show a
> full sweep starting on all 3 nodes, though this time it only shows as
> finishing on one, which looks to be the one that had its brick deleted.
>
> [2016-08-30 14:45:33.098589] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
> starting full sweep on subvol glustershard-client-0
> [2016-08-30 14:45:33.099492] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
> starting full sweep on subvol glustershard-client-1
> [2016-08-30 14:45:33.100093] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
> starting full sweep on subvol glustershard-client-2
> [2016-08-30 14:52:29.760213] I [MSGID: 108026]
> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
> finished full sweep on subvol glustershard-client-2
>
Just realized it's still healing, so that may be why the sweeps on the 2
other bricks haven't reported as finished.
>
>
> My hope is that later tonight a full heal will work on production. Is it
> possible the self-heal daemon can go stale or stop listening but still
> show as active? Would stopping and starting the self-heal daemon from the
> gluster CLI before doing these heals be helpful?
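>
> (if restarting it is worth trying, I assume toggling the volume option is
> the supported way to bounce shd, something like:
>
> gluster volume set glustershard cluster.self-heal-daemon off
> gluster volume set glustershard cluster.self-heal-daemon on
> gluster volume status glustershard shd
>
> with the last command to confirm it shows online again)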
>
>
> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage <
> dgossage at carouselchecks.com> wrote:
>
>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <
>> dgossage at carouselchecks.com> wrote:
>>
>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhananj at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhananj at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <
>>>>> dgossage at carouselchecks.com> wrote:
>>>>>
>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <
>>>>>> kdhananj at redhat.com> wrote:
>>>>>>
>>>>>>> Could you also share the glustershd logs?
>>>>>>>
>>>>>>
>>>>>> I'll get them when I get to work, sure.
>>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> I tried the same steps that you mentioned multiple times, but heal
>>>>>>> is running to completion without any issues.
>>>>>>>
>>>>>>> It must be said that 'heal full' traverses the files and directories
>>>>>>> in a depth-first order and also heals them in the same order. But if
>>>>>>> it gets interrupted in the middle (say because self-heal-daemon was
>>>>>>> either intentionally or unintentionally brought offline and then
>>>>>>> brought back up), self-heal will only pick up the entries that are
>>>>>>> so far marked as new entries that need heal, which it will find in
>>>>>>> the indices/xattrop directory. What this means is that those files
>>>>>>> and directories that were not visited during the crawl will remain
>>>>>>> untouched and unhealed in this second iteration of heal, unless you
>>>>>>> execute a 'heal full' again.
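>>>>>>>
>>>>>>> In CLI terms, for your volume:
>>>>>>>
>>>>>>> gluster volume heal glustershard       # index heal: only entries
>>>>>>>                                        # under .glusterfs/indices/xattrop
>>>>>>> gluster volume heal glustershard full  # restarts the crawl from '/'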
>>>>>>>
>>>>>>
>>>>>> So should it start healing shards as it crawls, or not until after
>>>>>> it crawls the entire .shard directory? At the pace it was going that
>>>>>> could be a week, with one node appearing in the cluster but with no
>>>>>> shard files if anything tries to access a file on that node. From my
>>>>>> experience the other day, telling it to heal full again did nothing,
>>>>>> regardless of the node used.
>>>>>>
>>>>>
>>>> Crawl is started from '/' of the volume. Whenever self-heal detects
>>>> during the crawl that a file or directory is present in some brick(s)
>>>> and absent in others, it creates the file on the bricks where it is
>>>> absent and marks the fact that the file or directory might need
>>>> data/entry and metadata heal too (this also means that an index is
>>>> created under .glusterfs/indices/xattrop of the src bricks). And the
>>>> data/entry and metadata heal are picked up and done in the background
>>>> with the help of these indices.
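>>>>
>>>> If you want a rough view of that backlog, you can count the index
>>>> entries directly on a source brick and compare with heal info, e.g.
>>>> (test-volume brick path used for illustration):
>>>>
>>>> find /gluster2/brick1/1/.glusterfs/indices/xattrop -type f | wc -l
>>>> gluster volume heal glustershard info | grep 'Number of entries'
>>>>
>>>> The xattrop directory also contains a base xattrop-<gfid> file that the
>>>> entries are hardlinked to, so treat the count as approximate.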
>>>>
>>>
>>> Looking at my 3rd node as an example, I find nearly the exact same
>>> number of files in the xattrop dir as reported by the heal count at the
>>> time I brought down node 2 to try and alleviate the read IO errors,
>>> which I was guessing came from attempts to use the node with no shards
>>> for reads.
>>>
>>> Also attached are the glustershd logs from the 3 nodes, along with the
>>> test node I tried yesterday with the same results.
>>>
>>
>> Looking at my own logs I notice that a full sweep was only ever recorded
>> in glustershd.log on the 2nd node with the missing directory. I believe
>> I should have found a sweep begun on every node, correct?
>>
>> On my test dev, when it did work, I do see that:
>>
>> [2016-08-30 13:56:25.223333] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
>> starting full sweep on subvol glustershard-client-0
>> [2016-08-30 13:56:25.223522] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
>> starting full sweep on subvol glustershard-client-1
>> [2016-08-30 13:56:25.224616] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
>> starting full sweep on subvol glustershard-client-2
>> [2016-08-30 14:18:48.333740] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
>> finished full sweep on subvol glustershard-client-2
>> [2016-08-30 14:18:48.356008] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
>> finished full sweep on subvol glustershard-client-1
>> [2016-08-30 14:18:49.637811] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
>> finished full sweep on subvol glustershard-client-0
>>
>> Whereas looking at the past few days on the 3 prod nodes, I only found
>> this on my 2nd node:
>> [2016-08-27 01:26:42.638772] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> starting full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 11:37:01.732366] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> finished full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 12:58:34.597228] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> starting full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 12:59:28.041173] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> finished full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 20:03:42.560188] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> starting full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 20:03:44.278274] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> finished full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 21:00:42.603315] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> starting full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 21:00:46.148674] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> finished full sweep on subvol GLUSTER1-client-1
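>>
>> (for reference I was just grepping the default shd log on each node:
>>
>> grep 'full sweep' /var/log/glusterfs/glustershd.log
>>
>> assuming the logs are in the default location)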
>>
>>
>>
>>
>>
>>>
>>>>
>>>>>>
>>>>>>> My suspicion is that this is what happened on your setup. Could you
>>>>>>> confirm if that was the case?
>>>>>>>
>>>>>>
>>>>>> The brick was brought online with a force start, then a full heal
>>>>>> launched. Hours later, after it became evident that it was not adding
>>>>>> new files to heal, I did try restarting the self-heal daemon and
>>>>>> relaunching the full heal again. But this was after the heal had
>>>>>> basically already failed to work as intended.
>>>>>>
>>>>>
>>>>> OK. How did you figure it was not adding any new files? I need to
>>>>> know what places you were monitoring to come to this conclusion.
>>>>>
>>>>> -Krutika
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> As for those logs, I did manage to do something that caused the
>>>>>>> warning messages you shared earlier to appear in my client and
>>>>>>> server logs. Although these logs are annoying and a bit scary too,
>>>>>>> they didn't do any harm to the data in my volume. Why they appear
>>>>>>> just after a brick is replaced and under no other circumstances is
>>>>>>> something I'm still investigating.
>>>>>>>
>>>>>>> But for the future, it would be good to follow the steps Anuradha
>>>>>>> gave, as that would allow self-heal to at least detect that it has
>>>>>>> some repairing to do whenever it is restarted, whether intentionally
>>>>>>> or otherwise.
>>>>>>>
>>>>>>
>>>>>> I followed those steps as described on my test box and ended up with
>>>>>> the exact same outcome: shards added at an agonizingly slow pace and
>>>>>> no creation of the .shard directory or heals on the shard directory.
>>>>>> Directories visible from the mount healed quickly. This was with one
>>>>>> VM, so it has only 800 shards as well. After hours at work it had
>>>>>> added a total of 33 shards to be healed. I sent those logs yesterday
>>>>>> as well, though not the glustershd logs.
>>>>>>
>>>>>> Does the replace-brick command copy files in the same manner? For
>>>>>> these purposes I am contemplating just skipping the heal route.
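>>>>>>
>>>>>> If I do go that route, I assume the syntax would be something like
>>>>>> the following, with the new brick path made up for illustration:
>>>>>>
>>>>>> gluster volume replace-brick glustershard \
>>>>>>     192.168.71.12:/gluster2/brick3/1 \
>>>>>>     192.168.71.12:/gluster2/newbrick/1 commit force
>>>>>>
>>>>>> though whether that copies shards any differently than self-heal does
>>>>>> is exactly what I don't know.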
>>>>>>
>>>>>>
>>>>>>> -Krutika
>>>>>>>
>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <
>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>
>>>>>>>> Attached brick and client logs from the test machine where the
>>>>>>>> same behavior occurred; not sure if anything new is there. It's
>>>>>>>> still on 3.8.2.
>>>>>>>>
>>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>>> Transport-type: tcp
>>>>>>>> Bricks:
>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>> Options Reconfigured:
>>>>>>>> cluster.locking-scheme: granular
>>>>>>>> performance.strict-o-direct: off
>>>>>>>> features.shard-block-size: 64MB
>>>>>>>> features.shard: on
>>>>>>>> server.allow-insecure: on
>>>>>>>> storage.owner-uid: 36
>>>>>>>> storage.owner-gid: 36
>>>>>>>> cluster.server-quorum-type: server
>>>>>>>> cluster.quorum-type: auto
>>>>>>>> network.remote-dio: on
>>>>>>>> cluster.eager-lock: enable
>>>>>>>> performance.stat-prefetch: off
>>>>>>>> performance.io-cache: off
>>>>>>>> performance.quick-read: off
>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>> nfs.enable-ino32: off
>>>>>>>> nfs.addr-namelookup: off
>>>>>>>> nfs.disable: on
>>>>>>>> performance.read-ahead: off
>>>>>>>> performance.readdir-ahead: on
>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage
>>>>>>>> <dgossage at carouselchecks.com> wrote:
>>>>>>>>
>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com>
>>>>>>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>,
>>>>>>>>>> > "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>>> >
>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur
>>>>>>>>>> > <atalur at redhat.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > > Response inline.
>>>>>>>>>> > >
>>>>>>>>>> > > ----- Original Message -----
>>>>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>>> > > > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>
>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>>> > > >
>>>>>>>>>> > > > Could you attach both client and brick logs? Meanwhile I
>>>>>>>>>> > > > will try these steps out on my machines and see if it is
>>>>>>>>>> > > > easily recreatable.
>>>>>>>>>> > > >
>>>>>>>>>> > > > -Krutika
>>>>>>>>>> > > >
>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage
>>>>>>>>>> > > > <dgossage at carouselchecks.com> wrote:
>>>>>>>>>> > > >
>>>>>>>>>> > > > Centos 7 Gluster 3.8.3
>>>>>>>>>> > > >
>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>>>>> > > > Options Reconfigured:
>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>>>>>>> > > > cluster.self-heal-daemon: on
>>>>>>>>>> > > > cluster.locking-scheme: granular
>>>>>>>>>> > > > features.shard-block-size: 64MB
>>>>>>>>>> > > > features.shard: on
>>>>>>>>>> > > > performance.readdir-ahead: on
>>>>>>>>>> > > > storage.owner-uid: 36
>>>>>>>>>> > > > storage.owner-gid: 36
>>>>>>>>>> > > > performance.quick-read: off
>>>>>>>>>> > > > performance.read-ahead: off
>>>>>>>>>> > > > performance.io-cache: off
>>>>>>>>>> > > > performance.stat-prefetch: on
>>>>>>>>>> > > > cluster.eager-lock: enable
>>>>>>>>>> > > > network.remote-dio: enable
>>>>>>>>>> > > > cluster.quorum-type: auto
>>>>>>>>>> > > > cluster.server-quorum-type: server
>>>>>>>>>> > > > server.allow-insecure: on
>>>>>>>>>> > > > cluster.self-heal-window-size: 1024
>>>>>>>>>> > > > cluster.background-self-heal-count: 16
>>>>>>>>>> > > > performance.strict-write-ordering: off
>>>>>>>>>> > > > nfs.disable: on
>>>>>>>>>> > > > nfs.addr-namelookup: off
>>>>>>>>>> > > > nfs.enable-ino32: off
>>>>>>>>>> > > > cluster.granular-entry-heal: on
>>>>>>>>>> > > >
>>>>>>>>>> > > > Friday did rolling upgrade from 3.8.2->3.8.3, no issues.
>>>>>>>>>> > > > Following steps detailed in previous recommendations, began
>>>>>>>>>> > > > process of replacing and healing bricks one node at a time.
>>>>>>>>>> > > >
>>>>>>>>>> > > > 1) kill pid of brick
>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>>>>> > > > 3) recreate directory of brick
>>>>>>>>>> > > > 4) gluster volume start <> force
>>>>>>>>>> > > > 5) gluster volume heal <> full
>>>>>>>>>> > > Hi,
>>>>>>>>>> > >
>>>>>>>>>> > > I'd suggest that full heal is not used. There are a few bugs
>>>>>>>>>> > > in full heal. Better safe than sorry ;)
>>>>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>>>>> > >
>>>>>>>>>> > Currently I brought the node down by systemctl stop glusterd,
>>>>>>>>>> > as I was getting sporadic IO issues and a few VMs paused, so
>>>>>>>>>> > I'm hoping that will help. I may wait to do this till around
>>>>>>>>>> > 4 PM, when most work is done, in case it shoots the load up.
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > > 1) kill pid of brick
>>>>>>>>>> > > 2) do whatever reconfiguring of the brick you need
>>>>>>>>>> > > 3) recreate brick dir
>>>>>>>>>> > > 4) while the brick is still down, from the mount point:
>>>>>>>>>> > >    a) create a dummy non-existent dir under / of the mount.
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>> > So if node 2 has the down brick, do I pick a node, for example
>>>>>>>>>> > 3, and make a test dir under its brick directory that doesn't
>>>>>>>>>> > exist on 2, or should I be doing this over a gluster mount?
>>>>>>>>>> You should be doing this over the gluster mount.
>>>>>>>>>> >
>>>>>>>>>> > >    b) set a non-existent extended attribute on / of the mount.
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>> > Could you give me an example of an attribute to set? I've read
>>>>>>>>>> > a tad on this, and looked up attributes, but haven't set any
>>>>>>>>>> > yet myself.
>>>>>>>>>> >
>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
>>>>>>>>>> > > Doing these steps will ensure that heal happens only from the
>>>>>>>>>> > > updated brick to the down brick.
>>>>>>>>>> > > 5) gluster v start <> force
>>>>>>>>>> > > 6) gluster v heal <>
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>> > Will it matter if somewhere in gluster the full heal command
>>>>>>>>>> > was run the other day? Not sure if it eventually stops or times
>>>>>>>>>> > out.
>>>>>>>>>> >
>>>>>>>>>> Full heal will stop once the crawl is done. So if you want to
>>>>>>>>>> trigger heal again, run gluster v heal <>. Actually, even bringing
>>>>>>>>>> the brick up or volume start force should trigger the heal.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Did this on the test bed today. It's one server with 3 bricks on
>>>>>>>>> the same machine, so take that for what it's worth. Also it still
>>>>>>>>> runs 3.8.2. Maybe I'll update and re-run the test.
>>>>>>>>>
>>>>>>>>> killed brick
>>>>>>>>> deleted brick dir
>>>>>>>>> recreated brick dir
>>>>>>>>> created fake dir on gluster mount
>>>>>>>>> set suggested fake attribute on it
>>>>>>>>> ran volume start <> force
>>>>>>>>>
>>>>>>>>> Looked at the files it said needed healing, and it was just the 8
>>>>>>>>> shards that were modified in the few minutes I ran through the
>>>>>>>>> steps.
>>>>>>>>>
>>>>>>>>> Gave it a few minutes and it stayed the same.
>>>>>>>>> Ran gluster v heal <>.
>>>>>>>>>
>>>>>>>>> It healed all the directories and files you can see over the
>>>>>>>>> mount, including the fake dir.
>>>>>>>>>
>>>>>>>>> Same issue for shards though. It adds more shards to heal at a
>>>>>>>>> glacial pace, with a slight jump in speed if I stat every file and
>>>>>>>>> dir in the running VM, but not all shards.
>>>>>>>>>
>>>>>>>>> It started with 8 shards to heal and is now only at 33 out of 800,
>>>>>>>>> and probably won't finish adding for a few days at the rate it
>>>>>>>>> goes.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> > >
>>>>>>>>>> > > > 1st node worked as expected, took 12 hours to heal 1TB of
>>>>>>>>>> > > > data. Load was a little heavy but nothing shocking.
>>>>>>>>>> > > >
>>>>>>>>>> > > > About an hour after node 1 finished I began the same
>>>>>>>>>> > > > process on node 2. The heal process kicked in as before and
>>>>>>>>>> > > > the files in directories visible from the mount and
>>>>>>>>>> > > > .glusterfs healed in a short time. Then it began the crawl
>>>>>>>>>> > > > of .shard, adding those files to the heal count, at which
>>>>>>>>>> > > > point the entire process basically ground to a halt. After
>>>>>>>>>> > > > 48 hours, out of 19k shards it has added 5900 to the heal
>>>>>>>>>> > > > list. Load on all 3 machines is negligible. It was
>>>>>>>>>> > > > suggested to change cluster.data-self-heal-algorithm to
>>>>>>>>>> > > > full and restart the volume, which I did. No effect. Tried
>>>>>>>>>> > > > relaunching the heal, no effect, regardless of the node
>>>>>>>>>> > > > picked. I started each VM and performed a stat of all files
>>>>>>>>>> > > > from within it, or a full virus scan, and that seemed to
>>>>>>>>>> > > > cause short small spikes in shards added, but not by much.
>>>>>>>>>> > > > Logs are showing no real messages indicating anything is
>>>>>>>>>> > > > going on. I get hits to the brick log on occasion of null
>>>>>>>>>> > > > lookups, making me think it's not really crawling the
>>>>>>>>>> > > > shards directory but waiting for a shard lookup to add it.
>>>>>>>>>> > > > I'll get the following in the brick log, but not
>>>>>>>>>> > > > constantly, and sometimes multiple times for the same
>>>>>>>>>> > > > shard.
>>>>>>>>>> > > >
>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009]
>>>>>>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server:
>>>>>>>>>> > > > no resolution type for (null) (LOOKUP)
>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050]
>>>>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server:
>>>>>>>>>> > > > 12591783: LOOKUP (null)
>>>>>>>>>> > > > (00000000-0000-0000-0000-000000000000/
>>>>>>>>>> > > > 241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) ==> (Invalid
>>>>>>>>>> > > > argument) [Invalid argument]
>>>>>>>>>> > > >
>>>>>>>>>> > > > This one repeated about 30 times in a row, then nothing for
>>>>>>>>>> > > > 10 minutes, then one hit for a different shard by itself.
>>>>>>>>>> > > >
>>>>>>>>>> > > > How can I determine if heal is actually running? How can I
>>>>>>>>>> > > > kill it or force a restart? Does the node I start it from
>>>>>>>>>> > > > determine which directory gets crawled to determine heals?
>>>>>>>>>> > > >
>>>>>>>>>> > > > David Gossage
>>>>>>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>>>>>>> > > > Office 708.613.2284
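>>>>>>>>>> > > >
>>>>>>>>>> > > > (the closest I've found to a progress check is
>>>>>>>>>> > > > 'gluster volume heal GLUSTER1 statistics', which appears to
>>>>>>>>>> > > > show per-brick crawl start/end times and counts, if I'm
>>>>>>>>>> > > > reading it right)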
>>>>>>>>>> > > >
>>>>>>>>>> > > >
>>>>>>>>>> > > > _______________________________________________
>>>>>>>>>> > > > Gluster-users mailing list
>>>>>>>>>> > > > Gluster-users at gluster.org
>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>> > >
>>>>>>>>>> > > --
>>>>>>>>>> > > Thanks,
>>>>>>>>>> > > Anuradha.
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Thanks,
>>>>>>>>>> Anuradha.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>