Going to top-post with the solution Krutika Dhananjay came up with. These steps
were much less volatile, could be done with the volume still actively in use,
and were also much less prone to accidental destruction.
My use case was wiping a brick and recreating it with the same directory
structure so that I could change the underlying RAID setup of the disks making
up the brick. The problem was that getting the shards to heal failed 99% of
the time.
These are the steps provided, which have been working well.
1) kill brick pid on server that you want to replace
kill -15 <brickpid>
2) do brick maintenance which in my case was:
zpool destroy <ZFSPOOL>
zpool create (options) yada yada disks
3) make sure original path to brick exists
mkdir /path/to/brick
4) set extended attribute on new brick path (not over gluster mount)
setfattr -n trusted.afr.dirty -v 0x000000000000000000000001 /path/to/brick
5) create a mount point to volume
mkdir /mnt-brick-test
glusterfs --volfile-id=<VOLNAME> --volfile-server=<valid host or ip of an active gluster server> --client-pid=-6 /mnt-brick-test
6) set an extended attribute on the gluster network mount. VOLNAME is the
gluster volume; KILLEDBRICK# is the index of the brick on the server needing
heal. Indexes start from 0 and gluster v info should display the bricks in order
setfattr -n trusted.replace-brick -v VOLNAME-client-KILLEDBRICK# /mnt-brick-test
7) gluster heal should now show the / root of the gluster volume in its output
gluster v heal VOLNAME info
8) force start volume to bring up killed brick
gluster v start VOLNAME force
9) optionally watch heal progress and drink beer while you wait and hope
nothing blows up
watch -n 10 gluster v heal VOLNAME statistics heal-count
10) unmount gluster network mount from server
umount /mnt-brick-test
11) Praise the developers for their efforts
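
The value passed in step 6 and the trusted.afr.dirty value from step 4 are the two places in the steps above where a typo is easy to make. The following is a minimal sketch, assuming POSIX sh and assuming the usual AFR changelog layout of three 32-bit big-endian pending counters (data, metadata, entry) packed into the 12-byte xattr. The function names replace_brick_value and decode_afr are hypothetical helpers for illustration, not gluster commands.

```sh
#!/bin/sh
# Hypothetical helpers; neither function is part of gluster.

# Build the value for "setfattr -n trusted.replace-brick -v ..." (step 6).
# $1 = volume name, $2 = zero-based index of the killed brick.
replace_brick_value() (
    volname=$1
    index=$2
    case $index in
        ''|*[!0-9]*)
            echo "brick index must be a non-negative integer" >&2
            exit 1 ;;
    esac
    printf '%s-client-%s\n' "$volname" "$index"
)

# Split a trusted.afr.* value, as printed by "getfattr -e hex", into its
# three pending counters (assumed order: data, metadata, entry).
# $1 = hex string such as 0x000000000000000000000001
decode_afr() (
    hex=${1#0x}
    case $hex in
        ''|*[!0-9a-fA-F]*)
            echo "not a hex value: $1" >&2
            exit 1 ;;
    esac
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        exit 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    # printf converts the 0x-prefixed fields to decimal.
    printf 'data=%d metadata=%d entry=%d\n' "0x$data" "0x$meta" "0x$entry"
)
```

For example, replace_brick_value glustershard 0 prints glustershard-client-0, and decode_afr 0x000000000000000000000001 prints data=0 metadata=0 entry=1, which is the entry-pending marker set on the new brick path in step 4.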
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
On Thu, Sep 1, 2016 at 2:29 PM, David Gossage <dgossage at carouselchecks.com> wrote:
> On Thu, Sep 1, 2016 at 12:09 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>
>>
>>
>> On Wed, Aug 31, 2016 at 8:13 PM, David Gossage <
>> dgossage at carouselchecks.com> wrote:
>>
>>> Just as a test I did not shut down the one VM on the cluster as
>>> finding a window before the weekend where I can shut down all VMs and fit
>>> in a full heal is unlikely, so I wanted to see what occurs.
>>>
>>>
>>> kill -15 brick pid
>>> rm -Rf /gluster2/brick1/1
>>> mkdir /gluster2/brick1/1
>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake3
>>> setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard
>>>
>>> getfattr -d -m . -e hex /gluster2/brick2/1
>>> # file: gluster2/brick2/1
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>> 23a756e6c6162656c65645f743a733000
>>> trusted.afr.dirty=0x000000000000000000000001
>>> trusted.afr.glustershard-client-0=0x000000000000000200000000
>>>
>>
>> This is unusual. The last digit ought to have been 1 on account of
>> "fake3" being created while the first brick is offline.
>>
>> This discussion is becoming unnecessarily lengthy. Mind if we discuss this
>> and sort it out on IRC today? At least the communication will be continuous
>> and in real time. I'm kdhananjay on #gluster (Freenode). Ping me when
>> you're online.
>>
>> -Krutika
>>
>
> Thanks for assistance this morning. Looks like I lost connection in IRC
> and didn't realize it so sorry if you came back looking for me. Let me
> know when the steps you worked out have been reviewed and if it's found
> safe for production use and I'll give a try.
>
>
>
>>
>>
>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>> user.some-name=0x736f6d652d76616c7565
>>>
>>> getfattr -d -m . -e hex /gluster2/brick3/1
>>> # file: gluster2/brick3/1
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>> 23a756e6c6162656c65645f743a733000
>>> trusted.afr.dirty=0x000000000000000000000001
>>> trusted.afr.glustershard-client-0=0x000000000000000200000000
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>> user.some-name=0x736f6d652d76616c7565
>>>
>>> setfattr -n trusted.afr.glustershard-client-0 -v
>>> 0x000000010000000200000000 /gluster2/brick2/1
>>> setfattr -n trusted.afr.glustershard-client-0 -v
>>> 0x000000010000000200000000 /gluster2/brick3/1
>>>
>>> getfattr -d -m . -e hex /gluster2/brick3/1/
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: gluster2/brick3/1/
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>> 23a756e6c6162656c65645f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.glustershard-client-0=0x000000010000000200000000
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>> user.some-name=0x736f6d652d76616c7565
>>>
>>> getfattr -d -m . -e hex /gluster2/brick2/1/
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: gluster2/brick2/1/
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>> 23a756e6c6162656c65645f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.glustershard-client-0=0x000000010000000200000000
>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>> user.some-name=0x736f6d652d76616c7565
>>>
>>> gluster v start glustershard force
>>>
>>> gluster heal counts climbed up and down a little as it healed
>>> everything in the visible gluster mount and .glusterfs for visible mount
>>> files, then stalled with around 15 shards and the fake3 directory still in
>>> the list
>>>
>>> getfattr -d -m . -e hex /gluster2/brick2/1/
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: gluster2/brick2/1/
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>> 23a756e6c6162656c65645f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>> user.some-name=0x736f6d652d76616c7565
>>>
>>> getfattr -d -m . -e hex /gluster2/brick3/1/
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: gluster2/brick3/1/
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>> 23a756e6c6162656c65645f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>> user.some-name=0x736f6d652d76616c7565
>>>
>>> getfattr -d -m . -e hex /gluster2/brick1/1/
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: gluster2/brick1/1/
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>> 23a756e6c6162656c65645f743a733000
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>> user.some-name=0x736f6d652d76616c7565
>>>
>>> heal count stayed same for awhile then ran
>>>
>>> gluster v heal glustershard full
>>>
>>> heals jump up to 700 as shards actually get read in as needing
>>> heals. glustershd shows 3 sweeps started, one per brick.
>>>
>>> It heals the shards and things look OK. heal <> info shows 0 files, but
>>> statistics heal-info shows 1 left for bricks 2 and 3. Perhaps because I
>>> didn't stop the VM that was running?
>>>
>>> # file: gluster2/brick1/1/
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>> 23a756e6c6162656c65645f743a733000
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>> user.some-name=0x736f6d652d76616c7565
>>>
>>> # file: gluster2/brick2/1/
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>> 23a756e6c6162656c65645f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>> user.some-name=0x736f6d652d76616c7565
>>>
>>> # file: gluster2/brick3/1/
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>> 23a756e6c6162656c65645f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>> user.some-name=0x736f6d652d76616c7565
>>>
>>> meta-data split-brain? heal <> info split-brain shows no
>>> files or entries. If I had thought ahead I would have checked the values
>>> returned by getfattr before, although I do know heal-count was returning 0
>>> at the time.
>>>
>>>
>>> I'm assuming I need to shut down the VMs and put the volume in maintenance
>>> from oVirt to prevent any IO. Does that need to hold for the whole heal, or
>>> can I re-activate at some point to bring the VMs back up?
>>>
>>>
>>>
>>>
>>> *David Gossage*
>>> *Carousel Checks Inc. | System Administrator*
>>> *Office* 708.613.2284
>>>
>>> On Wed, Aug 31, 2016 at 3:50 AM, Krutika Dhananjay <kdhananj at
>>> redhat.com> wrote:
>>>
>>>> No, sorry, it's working fine. I may have missed some step
>>>> because of which I saw that problem. /.shard is also healing fine now.
>>>>
>>>> Let me know if it works for you.
>>>>
>>>> -Krutika
>>>>
>>>> On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <
>>>> kdhananj at redhat.com> wrote:
>>>>
>>>>> OK I just hit the other issue too, where .shard doesn't
>>>>> get healed. :)
>>>>>
>>>>> Investigating as to why that is the case. Give me some time.
>>>>>
>>>>> -Krutika
>>>>>
>>>>> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <
>>>>> kdhananj at redhat.com> wrote:
>>>>>
>>>>>> Just figured the steps Anuradha has provided won't
>>>>>> work if granular entry heal is on.
>>>>>> So when you bring down a brick and create fake2 under / of the
>>>>>> volume, the granular entry heal feature causes self-heal (sh) to
>>>>>> remember only the fact that 'fake2' needs to be recreated on
>>>>>> the offline brick (because changelogs are granular).
>>>>>>
>>>>>> In this case, we would be required to indicate to self-heal-daemon
>>>>>> that the entire directory tree from '/' needs to be repaired on the
>>>>>> brick that contains no data.
>>>>>>
>>>>>> To fix this, I did the following (for users who use
>>>>>> granular entry self-healing):
>>>>>>
>>>>>> 1. Kill the last brick process in the replica (/bricks/3)
>>>>>>
>>>>>> 2. [root at server-3 ~]# rm -rf /bricks/3
>>>>>>
>>>>>> 3. [root at server-3 ~]# mkdir /bricks/3
>>>>>>
>>>>>> 4. Create a new dir on the mount point:
>>>>>> [root at client-1 ~]# mkdir /mnt/fake
>>>>>>
>>>>>> 5. Set some fake xattr on the root of the volume, and not the 'fake'
>>>>>> directory itself.
>>>>>> [root at client-1 ~]# setfattr -n "user.some-name" -v "some-value" /mnt
>>>>>>
>>>>>> 6. Make sure there's no IO happening on your volume.
>>>>>>
>>>>>> 7. Check the pending xattrs on the brick directories of the two good
>>>>>> copies (on bricks 1 and 2); you should be seeing the same values as the
>>>>>> one marked in red (shown between asterisks) in both bricks.
>>>>>> (note that the client-<num> xattr key will have the same last digit
>>>>>> as the index of the brick that is down, when counting from 0. So if the
>>>>>> first brick is the one that is down, it would read trusted.afr.*-client-0;
>>>>>> if the second brick is the one that is empty and down, it would read
>>>>>> trusted.afr.*-client-1 and so on).
>>>>>>
>>>>>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1
>>>>>> # file: 1
>>>>>>
>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>>> 23a6574635f72756e74696d655f743a733000
>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>> *trusted.afr.rep-client-2=0x000000000000000100000001*
>>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>
>>>>>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2
>>>>>> # file: 2
>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>>> 23a6574635f72756e74696d655f743a733000
>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>> *trusted.afr.rep-client-2=0x000000000000000100000001*
>>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>
>>>>>> 8. Flip the 8th digit in the
>>>>>> trusted.afr.<VOLNAME>-client-2 to a 1.
>>>>>>
>>>>>> [root at server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/1
>>>>>> [root at server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/2
>>>>>>
>>>>>> 9. Get the xattrs again and check the xattrs are set
>>>>>> properly now
>>>>>>
>>>>>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1
>>>>>> # file: 1
>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>>> 23a6574635f72756e74696d655f743a733000
>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>> *trusted.afr.rep-client-2=0x000000010000000100000001*
>>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>
>>>>>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2
>>>>>> # file: 2
>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>>> 23a6574635f72756e74696d655f743a733000
>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>> *trusted.afr.rep-client-2=0x000000010000000100000001*
>>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>
>>>>>> 10. Force-start the volume.
>>>>>>
>>>>>> [root at server-1 ~]# gluster volume start rep force
>>>>>> volume start: rep: success
>>>>>>
>>>>>> 11. Monitor heal-info command to ensure the number of
>>>>>> entries keeps growing.
>>>>>>
>>>>>> 12. Keep monitoring with step 11 and eventually the number of entries
>>>>>> needing heal must come down to 0.
>>>>>> Also the checksums of the files on the previously empty brick should
>>>>>> now match with the copies on the other two bricks.
>>>>>>
>>>>>> Could you check if the above steps work for you, in your test
>>>>>> environment?
>>>>>>
>>>>>> You caught a nice bug in the manual steps to follow when granular
>>>>>> entry-heal is enabled and an empty brick needs heal. Thanks for reporting
>>>>>> it. :) We will fix the documentation appropriately.
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay
<
>>>>>> kdhananj at redhat.com> wrote:
>>>>>>
>>>>>>> Tried this.
>>>>>>>
>>>>>>> With me, only 'fake2' gets healed after i
>>>>>>> bring the 'empty' brick back up, and it stops there unless I do a
>>>>>>> 'heal-full'.
>>>>>>>
>>>>>>> Is that what you're seeing as well?
>>>>>>>
>>>>>>> -Krutika
>>>>>>>
>>>>>>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage <
>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>
>>>>>>>> Same issue brought up glusterd on problem node
>>>>>>>> heal count still stuck at 6330.
>>>>>>>>
>>>>>>>> Ran gluster v heal GLUSTER1 full
>>>>>>>>
>>>>>>>> glustershd on problem node shows a sweep starting and finishing in
>>>>>>>> seconds. Other 2 nodes show no activity in log. They should start a
>>>>>>>> sweep too, shouldn't they?
>>>>>>>>
>>>>>>>> Tried starting from scratch
>>>>>>>>
>>>>>>>> kill -15 brickpid
>>>>>>>> rm -Rf /brick
>>>>>>>> mkdir -p /brick
>>>>>>>> mkdir /gsmount/fake2
>>>>>>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2
>>>>>>>>
>>>>>>>> Heals visible dirs instantly then stops.
>>>>>>>>
>>>>>>>> gluster v heal GLUSTER1 full
>>>>>>>>
>>>>>>>> see sweep start on problem node and end almost instantly. no files
>>>>>>>> added to heal list, no files healed, no more logging
>>>>>>>>
>>>>>>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>
>>>>>>>> same results no matter which node you run
>>>>>>>> command on. Still stuck with 6330 files showing as needing heal out of
>>>>>>>> 19k. Logs still show no heals are occurring.
>>>>>>>>
>>>>>>>> Is there a way to forcibly reset any prior heal data? Could it be
>>>>>>>> stuck on some past failed heal start?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *David Gossage*
>>>>>>>> *Carousel Checks Inc. | System Administrator*
>>>>>>>> *Office* 708.613.2284
>>>>>>>>
>>>>>>>> On Tue, Aug 30, 2016 at 10:03 AM, David Gossage
<
>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Aug 30, 2016 at 10:02 AM, David
Gossage <
>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>
>>>>>>>>>> updated test server to 3.8.3
>>>>>>>>>>
>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>>>> Options Reconfigured:
>>>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>> nfs.disable: on
>>>>>>>>>> nfs.addr-namelookup: off
>>>>>>>>>> nfs.enable-ino32: off
>>>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>>>> performance.quick-read: off
>>>>>>>>>> performance.io-cache: off
>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>>> network.remote-dio: on
>>>>>>>>>> cluster.quorum-type: auto
>>>>>>>>>> cluster.server-quorum-type: server
>>>>>>>>>> storage.owner-gid: 36
>>>>>>>>>> storage.owner-uid: 36
>>>>>>>>>> server.allow-insecure: on
>>>>>>>>>> features.shard: on
>>>>>>>>>> features.shard-block-size: 64MB
>>>>>>>>>> performance.strict-o-direct: off
>>>>>>>>>> cluster.locking-scheme: granular
>>>>>>>>>>
>>>>>>>>>> kill -15 brickpid
>>>>>>>>>> rm -Rf /gluster2/brick3
>>>>>>>>>> mkdir -p /gluster2/brick3/1
>>>>>>>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>>>>>> setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>>>>>> gluster v start glustershard force
>>>>>>>>>>
>>>>>>>>>> at this point brick process starts and
>>>>>>>>>> all visible files including new dir are made on brick
>>>>>>>>>> handful of shards are in heal statistics still but no .shard
>>>>>>>>>> directory created and no increase in shard count
>>>>>>>>>>
>>>>>>>>>> gluster v heal glustershard
>>>>>>>>>>
>>>>>>>>>> At this point still no increase in count or dir made, and no
>>>>>>>>>> additional activity in logs for healing generated. Waited a few
>>>>>>>>>> minutes tailing logs to check if anything kicked in.
>>>>>>>>>>
>>>>>>>>>> gluster v heal glustershard full
>>>>>>>>>>
>>>>>>>>>> gluster shards added to list and heal commences. Logs show full
>>>>>>>>>> sweep starting on all 3 nodes, though this time it only shows as
>>>>>>>>>> finishing on one, which looks to be the one that had the brick deleted.
>>>>>>>>>>
>>>>>>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026]
>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>>>> glustershard-client-0
>>>>>>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026]
>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>>>> glustershard-client-1
>>>>>>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026]
>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>>>> glustershard-client-2
>>>>>>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026]
>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol
>>>>>>>>>> glustershard-client-2
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Just realized its still healing so that may
>>>>>>>>> be why the sweeps on the 2 other bricks haven't reported as finished.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> my hope is that later tonight a full
>>>>>>>>>> heal will work on production. Is it possible the self-heal daemon
>>>>>>>>>> can get stale or stop listening but still show as active? Would
>>>>>>>>>> stopping and starting the self-heal daemon from the gluster CLI
>>>>>>>>>> before doing these heals be helpful?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 30, 2016 at 9:29 AM, David
>>>>>>>>>> Gossage <dgossage at carouselchecks.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <
>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <
>>>>>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <
>>>>>>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <
>>>>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <
>>>>>>>>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Could you also
>>>>>>>>>>>>>>>> share the glustershd logs?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'll get them when I get to work, sure.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I tried the
>>>>>>>>>>>>>>>> same steps that you mentioned multiple times,
>>>>>>>>>>>>>>>> but heal is running to completion without any issues.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It must be said that 'heal full' traverses the files and
>>>>>>>>>>>>>>>> directories in a depth-first order and does heals also in the
>>>>>>>>>>>>>>>> same order. But if it gets interrupted in the middle (say
>>>>>>>>>>>>>>>> because self-heal-daemon was either intentionally or
>>>>>>>>>>>>>>>> unintentionally brought offline and then brought back up),
>>>>>>>>>>>>>>>> self-heal will only pick up the entries that are so far marked
>>>>>>>>>>>>>>>> as new-entries that need heal, which it will find in the
>>>>>>>>>>>>>>>> indices/xattrop directory. What this means is that those files
>>>>>>>>>>>>>>>> and directories that were not visited during the crawl will
>>>>>>>>>>>>>>>> remain untouched and unhealed in this second iteration of
>>>>>>>>>>>>>>>> heal, unless you execute a 'heal-full' again.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So should it start
>>>>>>>>>>>>>>> healing shards as it crawls, or not until after it crawls the
>>>>>>>>>>>>>>> entire .shard directory? At the pace it was going that could
>>>>>>>>>>>>>>> be a week with one node appearing in the cluster but with no
>>>>>>>>>>>>>>> shard files if anything tries to access a file on that node.
>>>>>>>>>>>>>>> From my experience the other day, telling it to heal full
>>>>>>>>>>>>>>> again did nothing regardless of node used.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Crawl is started from
>>>>>>>>>>>>> '/' of the volume. Whenever self-heal detects during the crawl
>>>>>>>>>>>>> that a file or directory is present in some brick(s) and absent
>>>>>>>>>>>>> in others, it creates the file on the bricks where it is absent
>>>>>>>>>>>>> and marks the fact that the file or directory might need
>>>>>>>>>>>>> data/entry and metadata heal too (this also means that an index
>>>>>>>>>>>>> is created under .glusterfs/indices/xattrop of the src bricks).
>>>>>>>>>>>>> And the data/entry and metadata heal are picked up and done in
>>>>>>>>>>>>> the background with the help of these indices.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at my 3rd node as
>>>>>>>>>>>> example I find nearly the exact same number of files in the
>>>>>>>>>>>> xattrop dir as reported by heal count at the time I brought down
>>>>>>>>>>>> node2 to try and alleviate read IO errors that seemed to occur
>>>>>>>>>>>> from what I was guessing were attempts to use the node with no
>>>>>>>>>>>> shards for reads.
>>>>>>>>>>>>
>>>>>>>>>>>> Also attached are the glustershd logs from the 3 nodes, along
>>>>>>>>>>>> with the test node I tried yesterday with same results.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Looking at my own logs I notice
>>>>>>>>>>> that a full sweep was only ever recorded in glustershd.log on the
>>>>>>>>>>> 2nd node with the missing directory. I believe I should have found
>>>>>>>>>>> a sweep begun on every node, correct?
>>>>>>>>>>>
>>>>>>>>>>> On my test dev when it did work I do see that
>>>>>>>>>>>
>>>>>>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>>>>> glustershard-client-0
>>>>>>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>>>>> glustershard-client-1
>>>>>>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>>>>> glustershard-client-2
>>>>>>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol
>>>>>>>>>>> glustershard-client-2
>>>>>>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol
>>>>>>>>>>> glustershard-client-1
>>>>>>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol
>>>>>>>>>>> glustershard-client-0
>>>>>>>>>>>
>>>>>>>>>>> While when looking at past few days
>>>>>>>>>>> of the 3 prod nodes I only found that on my 2nd node
>>>>>>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My suspicion is
>>>>>>>>>>>>>>>> that this is what happened on your setup. Could you confirm
>>>>>>>>>>>>>>>> if that was the case?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Brick was brought
>>>>>>>>>>>>>>> online with force start, then a full heal launched. Hours
>>>>>>>>>>>>>>> later, after it became evident that it was not adding new
>>>>>>>>>>>>>>> files to heal, I did try restarting the self-heal daemon and
>>>>>>>>>>>>>>> relaunching full heal again. But this was after the heal had
>>>>>>>>>>>>>>> basically already failed to work as intended.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> OK. How did you figure
>>>>>>>>>>>>>> it was not adding any new files? I need to know what places
>>>>>>>>>>>>>> you were monitoring to come to this conclusion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Krutika
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As for those
>>>>>>>>>>>>>>>> logs, I did manage to do something that caused these warning
>>>>>>>>>>>>>>>> messages you shared earlier to appear in my client and server
>>>>>>>>>>>>>>>> logs. Although these logs are annoying and a bit scary too,
>>>>>>>>>>>>>>>> they didn't do any harm to the data in my volume. Why they
>>>>>>>>>>>>>>>> appear just after a brick is replaced and under no other
>>>>>>>>>>>>>>>> circumstances is something I'm still investigating.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But for future, it would be good to follow the steps Anuradha
>>>>>>>>>>>>>>>> gave as that would allow self-heal to at least detect that it
>>>>>>>>>>>>>>>> has some repairing to do whenever it is restarted, whether
>>>>>>>>>>>>>>>> intentionally or otherwise.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I followed those
>>>>>>>>>>>>>>> steps as described on my test box and ended up with the exact
>>>>>>>>>>>>>>> same outcome of adding shards at an agonizingly slow pace and
>>>>>>>>>>>>>>> no creation of the .shard directory or heals on the shard
>>>>>>>>>>>>>>> directory. Directories visible from the mount healed quickly.
>>>>>>>>>>>>>>> This was with one VM so it has only 800 shards as well. After
>>>>>>>>>>>>>>> hours at work it had added a total of 33 shards to be healed.
>>>>>>>>>>>>>>> I sent those logs yesterday as well, though not the glustershd.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does the replace-brick command copy files in the same manner?
>>>>>>>>>>>>>>> For these purposes I am contemplating just skipping the heal
>>>>>>>>>>>>>>> route.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Krutika
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Aug 30,
>>>>>>>>>>>>>>>> 2016 at 2:22 AM, David Gossage <dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Attached are brick and client logs from the test machine where
>>>>>>>>>>>>>>>>> the same behavior occurred. Not sure if anything new is there.
>>>>>>>>>>>>>>>>> It's still on 3.8.2
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>>>>>>> cluster.locking-scheme: granular
>>>>>>>>>>>>>>>>> performance.strict-o-direct: off
>>>>>>>>>>>>>>>>> features.shard-block-size: 64MB
>>>>>>>>>>>>>>>>> features.shard: on
>>>>>>>>>>>>>>>>> server.allow-insecure: on
>>>>>>>>>>>>>>>>> storage.owner-uid: 36
>>>>>>>>>>>>>>>>> storage.owner-gid: 36
>>>>>>>>>>>>>>>>> cluster.server-quorum-type: server
>>>>>>>>>>>>>>>>> cluster.quorum-type: auto
>>>>>>>>>>>>>>>>> network.remote-dio: on
>>>>>>>>>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>>>>>>>>> performance.io-cache: off
>>>>>>>>>>>>>>>>> performance.quick-read: off
>>>>>>>>>>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>>>>>>>>>>> nfs.enable-ino32: off
>>>>>>>>>>>>>>>>> nfs.addr-namelookup: off
>>>>>>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage
>>>>>>>>>>>>>>>>> <dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur
>>>>>>>>>>>>>>>>>> <atalur at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>>>>>>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com>
>>>>>>>>>>>>>>>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>,
>>>>>>>>>>>>>>>>>>> >     "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>>>>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>>>>>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <atalur at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > > Response inline.
>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>> > > ----- Original Message -----
>>>>>>>>>>>>>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>>>>>>>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>>>>>>>>>>>> > > > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>
>>>>>>>>>>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>>>>>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>> > > > Could you attach both client and brick logs? Meanwhile I will try these
>>>>>>>>>>>>>>>>>>> > > > steps out on my machines and see if it is easily recreatable.
>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>> > > > -Krutika
>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>> > > > Centos 7 Gluster 3.8.3
>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>>>>> > > > Options Reconfigured:
>>>>>>>>>>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>>>>>>>>>>>>>>>> > > > cluster.self-heal-daemon: on
>>>>>>>>>>>>>>>>>>> > > > cluster.locking-scheme: granular
>>>>>>>>>>>>>>>>>>> > > > features.shard-block-size: 64MB
>>>>>>>>>>>>>>>>>>> > > > features.shard: on
>>>>>>>>>>>>>>>>>>> > > > performance.readdir-ahead: on
>>>>>>>>>>>>>>>>>>> > > > storage.owner-uid: 36
>>>>>>>>>>>>>>>>>>> > > > storage.owner-gid: 36
>>>>>>>>>>>>>>>>>>> > > > performance.quick-read: off
>>>>>>>>>>>>>>>>>>> > > > performance.read-ahead: off
>>>>>>>>>>>>>>>>>>> > > > performance.io-cache: off
>>>>>>>>>>>>>>>>>>> > > > performance.stat-prefetch: on
>>>>>>>>>>>>>>>>>>> > > > cluster.eager-lock: enable
>>>>>>>>>>>>>>>>>>> > > > network.remote-dio: enable
>>>>>>>>>>>>>>>>>>> > > > cluster.quorum-type: auto
>>>>>>>>>>>>>>>>>>> > > > cluster.server-quorum-type: server
>>>>>>>>>>>>>>>>>>> > > > server.allow-insecure: on
>>>>>>>>>>>>>>>>>>> > > > cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>>>>>>> > > > cluster.background-self-heal-count: 16
>>>>>>>>>>>>>>>>>>> > > > performance.strict-write-ordering: off
>>>>>>>>>>>>>>>>>>> > > > nfs.disable: on
>>>>>>>>>>>>>>>>>>> > > > nfs.addr-namelookup: off
>>>>>>>>>>>>>>>>>>> > > > nfs.enable-ino32: off
>>>>>>>>>>>>>>>>>>> > > > cluster.granular-entry-heal: on
>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no issues.
>>>>>>>>>>>>>>>>>>> > > > Following steps detailed in previous recommendations, began process of
>>>>>>>>>>>>>>>>>>> > > > replacing and healing bricks one node at a time.
>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>> > > > 1) kill pid of brick
>>>>>>>>>>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>>>>>>>>>>>>>> > > > 3) recreate directory of brick
>>>>>>>>>>>>>>>>>>> > > > 4) gluster volume start <> force
>>>>>>>>>>>>>>>>>>> > > > 5) gluster volume heal <> full
>>>>>>>>>>>>>>>>>>> > > Hi,
>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>> > > I'd suggest that full heal is not used. There are a few bugs in full heal.
>>>>>>>>>>>>>>>>>>> > > Better safe than sorry ;)
>>>>>>>>>>>>>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>> > Currently I brought the node down by systemctl stop glusterd as I was
>>>>>>>>>>>>>>>>>>> > getting sporadic io issues and a few VMs paused, so hoping that will help.
>>>>>>>>>>>>>>>>>>> > I may wait to do this till around 4PM when most work is done in case it
>>>>>>>>>>>>>>>>>>> > shoots load up.
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > > 1) kill pid of brick
>>>>>>>>>>>>>>>>>>> > > 2) do the configuring of the brick that you need
>>>>>>>>>>>>>>>>>>> > > 3) recreate brick dir
>>>>>>>>>>>>>>>>>>> > > 4) while the brick is still down, from the mount point:
>>>>>>>>>>>>>>>>>>> > >    a) create a dummy non existent dir under / of mount.
>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > So if node 2 is the down brick, pick a node, for example 3, and make a test dir
>>>>>>>>>>>>>>>>>>> > under its brick directory that doesn't exist on 2, or should I be doing this
>>>>>>>>>>>>>>>>>>> > over a gluster mount?
>>>>>>>>>>>>>>>>>>> You should be doing this over gluster mount.
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > >    b) set a non existent extended attribute on / of mount.
>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Could you give me an example of an attribute to set? I've read a tad on
>>>>>>>>>>>>>>>>>>> > this, and looked up attributes but haven't set any yet myself.
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
>>>>>>>>>>>>>>>>>>> > > Doing these steps will ensure that heal happens only from updated brick to
>>>>>>>>>>>>>>>>>>> > > down brick.
>>>>>>>>>>>>>>>>>>> > > 5) gluster v start <> force
>>>>>>>>>>>>>>>>>>> > > 6) gluster v heal <>
>>>>>>>>>>>>>>>>>>> > >
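For anyone wanting to script the sequence Anuradha lists above, here is a minimal dry-run sketch. It is only an illustration of those six steps as quoted, not a tested procedure, and note that later in this thread the same steps still left shard healing slow. The function names (run, heal_replaced_brick), the DRY_RUN switch and the dummy directory name heal-marker-dir are made up for this example; the brick pid, brick directory and mount point are passed in by hand. With DRY_RUN left at its default of 1 the commands are only printed.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1 (the default here), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# heal_replaced_brick VOLNAME BRICK_PID BRICK_DIR MOUNT_POINT
# MOUNT_POINT must be a gluster client mount of the volume, not a brick path.
heal_replaced_brick() {
    vol=$1 pid=$2 brick=$3 mnt=$4
    run kill -15 "$pid"                          # 1) stop the brick process
    # 2) reconfigure the underlying storage by hand here (site specific)
    run mkdir -p "$brick"                        # 3) recreate the brick directory
    # 4) while the brick is still down, touch the root of the gluster mount
    run mkdir "$mnt/heal-marker-dir"             #    a) dummy dir (hypothetical name)
    run setfattr -n user.some-name -v some-value "$mnt"   # b) dummy xattr
    run gluster volume start "$vol" force        # 5) bring the killed brick back
    run gluster volume heal "$vol"               # 6) trigger heal (not "full")
}

# Example (prints the commands only):
# heal_replaced_brick GLUSTER1 12345 /gluster1/BRICK1/1 /mnt/gluster1
```

Printing first and running second seemed the safer design here, since step 2 destroys data on the brick and the order of the remaining steps matters.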
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Will it matter if somewhere in gluster the full heal command was run the
>>>>>>>>>>>>>>>>>>> > other day? Not sure if it eventually stops or times out.
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> Full heal will stop once the crawl is done. So if you want to trigger heal
>>>>>>>>>>>>>>>>>>> again, run gluster v heal <>. Actually even brick up or volume start force
>>>>>>>>>>>>>>>>>>> should trigger the heal.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Did this on test bed today. It's one server with 3 bricks on the same
>>>>>>>>>>>>>>>>>> machine, so take that for what it's worth. Also, it still runs 3.8.2.
>>>>>>>>>>>>>>>>>> Maybe I'll update and re-run the test.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> killed brick
>>>>>>>>>>>>>>>>>> deleted brick dir
>>>>>>>>>>>>>>>>>> recreated brick dir
>>>>>>>>>>>>>>>>>> created fake dir on gluster mount
>>>>>>>>>>>>>>>>>> set suggested fake attribute on it
>>>>>>>>>>>>>>>>>> ran volume start <> force
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Looked at the files it said needed healing, and it was just the 8 shards
>>>>>>>>>>>>>>>>>> that were modified during the few minutes I ran through the steps.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Gave it a few minutes and it stayed the same.
>>>>>>>>>>>>>>>>>> ran gluster volume heal <>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It healed all the directories and files you can see over the mount,
>>>>>>>>>>>>>>>>>> including fakedir.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Same issue for shards though. It adds more shards to heal at a glacier
>>>>>>>>>>>>>>>>>> pace. Slight jump in speed if I stat every file and dir in the running
>>>>>>>>>>>>>>>>>> VM, but not all shards.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33 out of 800, and
>>>>>>>>>>>>>>>>>> probably won't finish adding for a few days at the rate it goes.
>>>>>>>>>>>>>>>>>>
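To put a rough number on "at the rate it goes": a small sketch that projects how long queuing would take if the average rate so far simply continued. The function name eta_hours is made up for this note, and the linear-rate assumption is mine; the thread itself shows the rate is bursty. The sample figures are the production ones from the original report quoted further down (5900 of roughly 19k shards queued after 48 hours), with 19k taken as exactly 19000.

```shell
#!/bin/sh
# eta_hours TOTAL QUEUED ELAPSED_HOURS
# Prints the whole hours (rounded up) still needed to queue all TOTAL entries,
# assuming the average rate observed so far (QUEUED / ELAPSED_HOURS) holds.
eta_hours() {
    total=$1 queued=$2 elapsed=$3
    if [ "$queued" -le 0 ]; then
        echo "unknown"          # no progress yet, so no rate to extrapolate
        return 1
    fi
    remaining=$((total - queued))
    # ceiling of remaining * elapsed / queued, using integer arithmetic only
    echo $(( (remaining * elapsed + queued - 1) / queued ))
}

# 19000 total, 5900 queued after 48 hours: prints 107
eta_hours 19000 5900 48
```

So on those numbers it would take roughly another four and a half days just to finish adding shards to the heal list, before counting the heal time itself.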
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
> >
>>>>>>>>>>>>>>>>>>> > > > 1st node worked as expected; took 12 hours to heal 1TB data. Load was a
>>>>>>>>>>>>>>>>>>> > > > little heavy but nothing shocking.
>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>> > > > About an hour after node 1 finished I began the same process on node 2.
>>>>>>>>>>>>>>>>>>> > > > The heal process kicked in as before, and the files in directories visible
>>>>>>>>>>>>>>>>>>> > > > from the mount and .glusterfs healed in a short time. Then it began the
>>>>>>>>>>>>>>>>>>> > > > crawl of .shard, adding those files to the heal count, at which point the
>>>>>>>>>>>>>>>>>>> > > > entire process basically ground to a halt. After 48 hours, out of 19k
>>>>>>>>>>>>>>>>>>> > > > shards it has added 5900 to the heal list. Load on all 3 machines is
>>>>>>>>>>>>>>>>>>> > > > negligible. It was suggested to change cluster.data-self-heal-algorithm
>>>>>>>>>>>>>>>>>>> > > > to full and restart the volume, which I did. No effect. Tried relaunching
>>>>>>>>>>>>>>>>>>> > > > heal; no effect, regardless of which node was picked. I started each VM
>>>>>>>>>>>>>>>>>>> > > > and performed a stat of all files from within it, or a full virus scan,
>>>>>>>>>>>>>>>>>>> > > > and that seemed to cause short small spikes in shards added, but not by
>>>>>>>>>>>>>>>>>>> > > > much. Logs are showing no real messages indicating anything is going on.
>>>>>>>>>>>>>>>>>>> > > > I get hits to the brick log on occasion of null lookups, making me think
>>>>>>>>>>>>>>>>>>> > > > it's not really crawling the shards directory but waiting for a shard
>>>>>>>>>>>>>>>>>>> > > > lookup to add it. I'll get the following in the brick log, but not
>>>>>>>>>>>>>>>>>>> > > > constantly, and sometimes multiple times for the same shard.
>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no resolution type for (null) (LOOKUP)
>>>>>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: 12591783: LOOKUP (null) (00000000-0000-0000-0000-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) ==> (Invalid argument) [Invalid argument]
>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>> > > > This one repeated about 30 times in a row, then nothing for 10 minutes,
>>>>>>>>>>>>>>>>>>> > > > then one hit for one different shard by itself.
>>>>>>>>>>>>>>>>>>> > > >
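To see how often each shard shows up in those failed lookups, something like the following could be run over a brick log. This is a sketch only: the function name count_null_lookups is invented here, and it assumes each entry sits on one physical line in the real log file (the excerpt above is wrapped by the mail client) and that the shard name appears as <gfid>.<number> just before a closing parenthesis, as in the MSGID 115050 line quoted above.

```shell
#!/bin/sh
# count_null_lookups: read a brick log on stdin and print "SHARD COUNT" for
# every shard named in a MSGID 115050 (failed LOOKUP) entry, sorted by name.
count_null_lookups() {
    grep 'MSGID: 115050' |
        # keep only "<gfid>.<shard-number>)" ; the all-zero parent gfid is
        # followed by "/" rather than ".", so it does not match
        grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}\.[0-9]+\)' |
        tr -d ')' |
        sort | uniq -c |
        awk '{ print $2, $1 }'
}

# Example; the log path is a placeholder, use your own brick log:
# count_null_lookups < /path/to/brick.log
```

A shard that appears 30 times in a burst and then never again would look quite different in this output from a steady trickle across many shards.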
>>>>>>>>>>>>>>>>>>> > > > How can I determine if heal is actually running? How can I kill it or
>>>>>>>>>>>>>>>>>>> > > > force a restart? Does the node I start it from determine which directory
>>>>>>>>>>>>>>>>>>> > > > gets crawled to determine heals?
>>>>>>>>>>>>>>>>>>> > > >
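On the "is heal actually running" question, one low-tech approach is to poll the pending count and see whether it moves. The sketch below is an assumption-laden illustration, not an official method: the function names are made up, and it assumes that gluster volume heal VOLNAME statistics heal-count prints one "Number of entries: N" line per brick. A count that changes between polls suggests the self-heal daemon is doing something; a flat count does not prove it is idle.

```shell
#!/bin/sh
# sum_heal_entries: read heal-count output on stdin and print the total of all
# per-brick "Number of entries: N" lines (non-numeric values count as 0).
sum_heal_entries() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# watch_heal_count VOLNAME [INTERVAL_SECONDS]
# Prints a timestamped total every INTERVAL_SECONDS (default 60) until stopped.
watch_heal_count() {
    vol=$1
    interval=${2:-60}
    prev=""
    while :; do
        now=$(gluster volume heal "$vol" statistics heal-count | sum_heal_entries)
        echo "$(date '+%F %T') pending=$now previous=${prev:-n/a}"
        prev=$now
        sleep "$interval"
    done
}

# Example: watch_heal_count GLUSTER1 300
```

Logging the totals with timestamps also gives the elapsed time and counts needed to estimate a rate afterwards.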
>>>>>>>>>>>>>>>>>>> > > > David Gossage
>>>>>>>>>>>>>>>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>>>>>>>>>>>>>>>> > > > Office 708.613.2284
>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>> > > > _______________________________________________
>>>>>>>>>>>>>>>>>>> > > > Gluster-users mailing list
>>>>>>>>>>>>>>>>>>> > > > Gluster-users at gluster.org
>>>>>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>> > > --
>>>>>>>>>>>>>>>>>>> > > Thanks,
>>>>>>>>>>>>>>>>>>> > > Anuradha.
>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Anuradha.
>>>>>>>>>>>>>>>>>>>