On Tue, Sep 6, 2016 at 7:27 PM, David Gossage <dgossage at carouselchecks.com>
wrote:
> Going to top post with the solution Krutika Dhananjay came up with. His
> steps were much less volatile, could be done with the volume still being
> actively used, and were also much less prone to accidental destruction.
>
> My use case was the desire to wipe a brick and recreate it with the same
> directory structure so as to change the underlying RAID setup of the disks
> making up the brick. The problem was that getting the shards to heal was
> failing 99% of the time.
>
>
Hi,
Thank you for posting this before I could get around to it. Also thanks to
Pranith for suggesting the additional precautionary 'trusted.afr.dirty'
step (step 4 below) and reviewing the steps once.
IIUC the newly-introduced reset-brick command serves as an alternative to
all this lengthy process listed below.
@Pranith,
Is the above statement correct? If so, do we know which releases will have
the reset-brick command/feature?
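For anyone reading along, and with the caveat that I am going from memory of
the reset-brick design (so treat the exact syntax as an assumption until
Pranith confirms), the idea is that it collapses the whole procedure into
something like the following, with VOLNAME and HOST:/path/to/brick as
placeholders:

gluster volume reset-brick VOLNAME HOST:/path/to/brick start
(do the brick maintenance, keeping the same brick path)
gluster volume reset-brick VOLNAME HOST:/path/to/brick HOST:/path/to/brick commit force

I believe it was targeted for 3.9, but that is exactly the part I'd like
confirmed.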
> These are the steps he provided, which have been working well.
>
Err.. she. :)
-Krutika
>
> 1) kill brick pid on server that you want to replace
> kill -15 <brickpid>
>
> 2) do brick maintenance which in my case was:
> zpool destroy <ZFSPOOL>
> zpool create (options) yada yada disks
>
> 3) make sure original path to brick exists
> mkdir /path/to/brick
>
> 4) set extended attribute on new brick path (not over gluster mount)
> setfattr -n trusted.afr.dirty -v 0x000000000000000000000001 /path/to/brick
>
> 5) create a mount point to the volume
> mkdir /mnt-brick-test
> glusterfs --volfile-id=<VOLNAME> --volfile-server=<valid host or ip of an
> active gluster server> --client-pid=-6 /mnt-brick-test
>
> 6) set an extended attribute on the gluster network mount. VOLNAME is the
> gluster volume; KILLEDBRICK# is the index of the brick needing heal. The
> indices start from 0, and gluster v info should display the bricks in order
> (see the worked example after step 11).
> setfattr -n trusted.replace-brick -v VOLNAME-client-KILLEDBRICK#
> /mnt-brick-test
>
> 7) gluster heal info should now show the / root of the gluster volume in its output
> gluster v heal VOLNAME info
>
> 8) force start volume to bring up killed brick
> gluster v start VOLNAME force
>
> 9) optionally watch heal progress and drink beer while you wait and hope
> nothing blows up
> watch -n 10 gluster v heal VOLNAME statistics heal-count
>
> 10) unmount gluster network mount from server
> umount /mnt-brick-test
>
> 11) Praise the developers for their efforts
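>
> As a worked example for step 6 (with a hypothetical volume name), if
> gluster v info myvol lists the bricks in this order:
> Brick1: host1:/path/to/brick
> Brick2: host2:/path/to/brick
> Brick3: host3:/path/to/brick
> and the brick you killed was Brick2, then counting from 0 its index is 1,
> and the command would be:
> setfattr -n trusted.replace-brick -v myvol-client-1 /mnt-brick-test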
>
> *David Gossage*
> *Carousel Checks Inc. | System Administrator*
> *Office* 708.613.2284
>
> On Thu, Sep 1, 2016 at 2:29 PM, David Gossage <dgossage at carouselchecks.com>
> wrote:
>
>> On Thu, Sep 1, 2016 at 12:09 AM, Krutika Dhananjay <kdhananj at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Aug 31, 2016 at 8:13 PM, David Gossage <
>>> dgossage at carouselchecks.com> wrote:
>>>
>>>> Just as a test I did not shut down the one VM on the cluster, as finding
>>>> a window before the weekend where I can shut down all VMs and fit in a
>>>> full heal is unlikely, so I wanted to see what occurs.
>>>>
>>>>
>>>> kill -15 brick pid
>>>> rm -Rf /gluster2/brick1/1
>>>> mkdir /gluster2/brick1/1
>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake3
>>>> setfattr -n "user.some-name" -v "some-value"
>>>> /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard
>>>>
>>>> getfattr -d -m . -e hex /gluster2/brick2/1
>>>> # file: gluster2/brick2/1
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>> 23a756e6c6162656c65645f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000001
>>>> trusted.afr.glustershard-client-0=0x000000000000000200000000
>>>>
>>>
>>> This is unusual. The last digit ought to have been 1 (e.g.
>>> 0x000000000000000200000001) on account of "fake3" being created while the
>>> first brick was offline.
>>>
>>> This discussion is becoming unnecessarily lengthy. Mind if we discuss this
>>> and sort it out on IRC today? At least the communication will be
>>> continuous and in real-time. I'm kdhananjay on #gluster (Freenode). Ping
>>> me when you're online.
>>>
>>> -Krutika
>>>
>>
>> Thanks for the assistance this morning. Looks like I lost connection in
>> IRC and didn't realize it, so sorry if you came back looking for me. Let
>> me know when the steps you worked out have been reviewed and found safe
>> for production use, and I'll give them a try.
>>
>>
>>
>>>
>>>
>>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>> user.some-name=0x736f6d652d76616c7565
>>>>
>>>> getfattr -d -m . -e hex /gluster2/brick3/1
>>>> # file: gluster2/brick3/1
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>> 23a756e6c6162656c65645f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000001
>>>> trusted.afr.glustershard-client-0=0x000000000000000200000000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>> user.some-name=0x736f6d652d76616c7565
>>>>
>>>> setfattr -n trusted.afr.glustershard-client-0 -v
>>>> 0x000000010000000200000000 /gluster2/brick2/1
>>>> setfattr -n trusted.afr.glustershard-client-0 -v
>>>> 0x000000010000000200000000 /gluster2/brick3/1
>>>>
>>>> getfattr -d -m . -e hex /gluster2/brick3/1/
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: gluster2/brick3/1/
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>> 23a756e6c6162656c65645f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.glustershard-client-0=0x000000010000000200000000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>> user.some-name=0x736f6d652d76616c7565
>>>>
>>>> getfattr -d -m . -e hex /gluster2/brick2/1/
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: gluster2/brick2/1/
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>> 23a756e6c6162656c65645f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.glustershard-client-0=0x000000010000000200000000
>>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>> user.some-name=0x736f6d652d76616c7565
>>>>
>>>> gluster v start glustershard force
>>>>
>>>> gluster heal counts climbed up and down a little as it healed
>>>> everything in the visible gluster mount and .glusterfs for the visible
>>>> mount files, then stalled with around 15 shards and the fake3 directory
>>>> still in the list.
>>>>
>>>> getfattr -d -m . -e hex /gluster2/brick2/1/
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: gluster2/brick2/1/
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>> 23a756e6c6162656c65645f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>> user.some-name=0x736f6d652d76616c7565
>>>>
>>>> getfattr -d -m . -e hex /gluster2/brick3/1/
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: gluster2/brick3/1/
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>> 23a756e6c6162656c65645f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>> user.some-name=0x736f6d652d76616c7565
>>>>
>>>> getfattr -d -m . -e hex /gluster2/brick1/1/
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: gluster2/brick1/1/
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>> 23a756e6c6162656c65645f743a733000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>> user.some-name=0x736f6d652d76616c7565
>>>>
>>>> heal count stayed the same for a while, then I ran
>>>>
>>>> gluster v heal glustershard full
>>>>
>>>> heals jump up to 700 as shards actually get read in as needing heals.
>>>> glustershd shows 3 sweeps started, one per brick.
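>>>>
>>>> (A quick way to confirm the sweeps, assuming the default log location, is
>>>> something like:
>>>> grep "full sweep" /var/log/glusterfs/glustershd.log
>>>> on each node; each brick's sweep shows up as a starting/finished pair.)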
>>>>
>>>> It heals shards and things look OK; heal <> info shows 0 files, but
>>>> statistics heal-count shows 1 left for bricks 2 and 3. Perhaps because I
>>>> didn't stop the running VM?
>>>>
>>>> # file: gluster2/brick1/1/
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>> 23a756e6c6162656c65645f743a733000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>> user.some-name=0x736f6d652d76616c7565
>>>>
>>>> # file: gluster2/brick2/1/
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>> 23a756e6c6162656c65645f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>> user.some-name=0x736f6d652d76616c7565
>>>>
>>>> # file: gluster2/brick3/1/
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>> 23a756e6c6162656c65645f743a733000
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>> user.some-name=0x736f6d652d76616c7565
>>>>
>>>> meta-data split-brain? heal <> info split-brain shows no files or
>>>> entries. If I had thought ahead I would have checked the values returned
>>>> by getfattr before, although I do know heal-count was returning 0 at the
>>>> time.
>>>>
>>>>
>>>> Assuming I need to shut down VMs and put the volume in maintenance from
>>>> oVirt to prevent any IO: does it need to stay that way for the whole
>>>> heal, or can I re-activate at some point to bring the VMs back up?
>>>>
>>>>
>>>>
>>>>
>>>> *David Gossage*
>>>> *Carousel Checks Inc. | System Administrator*
>>>> *Office* 708.613.2284
>>>>
>>>> On Wed, Aug 31, 2016 at 3:50 AM, Krutika Dhananjay <kdhananj at redhat.com>
>>>> wrote:
>>>>
>>>>> No, sorry, it's working fine. I may have missed some step because of
>>>>> which I saw that problem. /.shard is also healing fine now.
>>>>>
>>>>> Let me know if it works for you.
>>>>>
>>>>> -Krutika
>>>>>
>>>>> On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <
>>>>> kdhananj at redhat.com> wrote:
>>>>>
>>>>>> OK, I just hit the other issue too, where .shard doesn't get healed. :)
>>>>>>
>>>>>> Investigating as to why that is the case. Give me some time.
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay
<
>>>>>> kdhananj at redhat.com> wrote:
>>>>>>
>>>>>>> Just figured the steps Anuradha has provided won't work if granular
>>>>>>> entry heal is on.
>>>>>>> So when you bring down a brick and create fake2 under / of the
>>>>>>> volume, granular entry heal feature causes
>>>>>>> sh to remember only the fact that 'fake2' needs to be recreated on
>>>>>>> the offline brick (because changelogs are granular).
>>>>>>>
>>>>>>> In this case, we would be required to indicate to self-heal-daemon
>>>>>>> that the entire directory tree from '/' needs to be repaired on the
>>>>>>> brick that contains no data.
>>>>>>>
>>>>>>> To fix this, I did the following (for users who use granular entry
>>>>>>> self-healing):
>>>>>>>
>>>>>>> 1. Kill the last brick process in the replica (/bricks/3)
>>>>>>>
>>>>>>> 2. [root at server-3 ~]# rm -rf /bricks/3
>>>>>>>
>>>>>>> 3. [root at server-3 ~]# mkdir /bricks/3
>>>>>>>
>>>>>>> 4. Create a new dir on the mount point:
>>>>>>> [root at client-1 ~]# mkdir /mnt/fake
>>>>>>>
>>>>>>> 5. Set some fake xattr on the root of the volume, and not the 'fake'
>>>>>>> directory itself.
>>>>>>> [root at client-1 ~]# setfattr -n "user.some-name" -v "some-value"
>>>>>>> /mnt
>>>>>>>
>>>>>>> 6. Make sure there's no IO happening on your volume.
>>>>>>>
>>>>>>> 7. Check the pending xattrs on the brick directories of the two good
>>>>>>> copies (on bricks 1 and 2); you should be seeing the same values as
>>>>>>> the one marked in red in both bricks.
>>>>>>> (note that the client-<num> xattr key will have the same last digit
>>>>>>> as the index of the brick that is down, when counting from 0. So if
>>>>>>> the first brick is the one that is down, it would read
>>>>>>> trusted.afr.*-client-0; if the second brick is the one that is empty
>>>>>>> and down, it would read trusted.afr.*-client-1, and so on).
>>>>>>>
>>>>>>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1
>>>>>>> # file: 1
>>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>>> *trusted.afr.rep-client-2=0x000000000000000100000001*
>>>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>>
>>>>>>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2
>>>>>>> # file: 2
>>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>>> *trusted.afr.rep-client-2=0x000000000000000100000001*
>>>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>>
>>>>>>> 8. Flip the 8th digit in the trusted.afr.<VOLNAME>-client-2 value to a 1.
>>>>>>>
>>>>>>> [root at server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v
>>>>>>> *0x000000010000000100000001* /bricks/1
>>>>>>> [root at server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v
>>>>>>> *0x000000010000000100000001* /bricks/2
>>>>>>>
>>>>>>> 9. Get the xattrs again and check that the xattrs are set properly now
>>>>>>>
>>>>>>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1
>>>>>>> # file: 1
>>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>>> *trusted.afr.rep-client-2=0x000000010000000100000001*
>>>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>>
>>>>>>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2
>>>>>>> # file: 2
>>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>>> *trusted.afr.rep-client-2=0x000000010000000100000001*
>>>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>>
>>>>>>> 10. Force-start the volume.
>>>>>>>
>>>>>>> [root at server-1 ~]# gluster volume start rep force
>>>>>>> volume start: rep: success
>>>>>>>
>>>>>>> 11. Monitor the heal-info command to ensure the number of entries
>>>>>>> keeps growing.
>>>>>>>
>>>>>>> 12. Keep monitoring as in step 11, and eventually the number of
>>>>>>> entries needing heal must come down to 0.
>>>>>>> Also the checksums of the files on the previously empty brick should
>>>>>>> now match with the copies on the other two bricks.
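>>>>>>>
>>>>>>> (In case it helps to reason about these values: to my understanding
>>>>>>> the 12-byte trusted.afr.* value is three 32-bit counters for pending
>>>>>>> data, metadata and entry operations, in that order. A rough sketch for
>>>>>>> pulling them apart, reusing the brick path and xattr name from the
>>>>>>> example above:
>>>>>>> v=$(getfattr -d -m trusted.afr.rep-client-2 -e hex /bricks/1 2>/dev/null | grep -o '0x.*')
>>>>>>> echo "data=${v:2:8} metadata=${v:10:8} entry=${v:18:8}"
>>>>>>> So the edit in step 8 just sets the data-pending counter to 1.)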
>>>>>>>
>>>>>>> Could you check if the above steps work for you, in your test
>>>>>>> environment?
>>>>>>>
>>>>>>> You caught a nice bug in the manual steps to follow when granular
>>>>>>> entry-heal is enabled and an empty brick needs heal. Thanks for
>>>>>>> reporting it. :) We will fix the documentation appropriately.
>>>>>>>
>>>>>>> -Krutika
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay
<
>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>
>>>>>>>> Tried this.
>>>>>>>>
>>>>>>>> With me, only 'fake2' gets healed after I bring the 'empty' brick
>>>>>>>> back up, and it stops there unless I do a 'heal-full'.
>>>>>>>>
>>>>>>>> Is that what you're seeing as well?
>>>>>>>>
>>>>>>>> -Krutika
>>>>>>>>
>>>>>>>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage
<
>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>
>>>>>>>>> Same issue. Brought glusterd up on the problem node; heal count
>>>>>>>>> still stuck at 6330.
>>>>>>>>>
>>>>>>>>> Ran gluster v heal GLUSTER1 full
>>>>>>>>>
>>>>>>>>> glustershd on the problem node shows a sweep starting and finishing
>>>>>>>>> in seconds. The other 2 nodes show no activity in the log. They
>>>>>>>>> should start a sweep too, shouldn't they?
>>>>>>>>>
>>>>>>>>> Tried starting from scratch
>>>>>>>>>
>>>>>>>>> kill -15 brickpid
>>>>>>>>> rm -Rf /brick
>>>>>>>>> mkdir -p /brick
>>>>>>>>> mkdir /gsmount/fake2
>>>>>>>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2
>>>>>>>>>
>>>>>>>>> Heals visible dirs instantly then stops.
>>>>>>>>>
>>>>>>>>> gluster v heal GLUSTER1 full
>>>>>>>>>
>>>>>>>>> See the sweep start on the problem node and end almost instantly. No
>>>>>>>>> files added to the heal list, no files healed, no more logging.
>>>>>>>>>
>>>>>>>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026]
>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026]
>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>
>>>>>>>>> Same results no matter which node you run the command on. Still
>>>>>>>>> stuck with 6330 files showing as needing heal out of 19k. Logs still
>>>>>>>>> show no heals are occurring.
>>>>>>>>>
>>>>>>>>> Is there a way to forcibly reset any prior heal data? Could it be
>>>>>>>>> stuck on some past failed heal start?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *David Gossage*
>>>>>>>>> *Carousel Checks Inc. | System
Administrator*
>>>>>>>>> *Office* 708.613.2284
>>>>>>>>>
>>>>>>>>> On Tue, Aug 30, 2016 at 10:03 AM, David
Gossage <
>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, Aug 30, 2016 at 10:02 AM, David
Gossage <
>>>>>>>>>> dgossage at carouselchecks.com>
wrote:
>>>>>>>>>>
>>>>>>>>>>> updated test server to 3.8.3
>>>>>>>>>>>
>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>> nfs.addr-namelookup: off
>>>>>>>>>>> nfs.enable-ino32: off
>>>>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>>>>> performance.quick-read: off
>>>>>>>>>>> performance.io-cache: off
>>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>>>> network.remote-dio: on
>>>>>>>>>>> cluster.quorum-type: auto
>>>>>>>>>>> cluster.server-quorum-type: server
>>>>>>>>>>> storage.owner-gid: 36
>>>>>>>>>>> storage.owner-uid: 36
>>>>>>>>>>> server.allow-insecure: on
>>>>>>>>>>> features.shard: on
>>>>>>>>>>> features.shard-block-size: 64MB
>>>>>>>>>>> performance.strict-o-direct: off
>>>>>>>>>>> cluster.locking-scheme: granular
>>>>>>>>>>>
>>>>>>>>>>> kill -15 brickpid
>>>>>>>>>>> rm -Rf /gluster2/brick3
>>>>>>>>>>> mkdir -p /gluster2/brick3/1
>>>>>>>>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>>>>>>> setfattr -n "user.some-name" -v "some-value"
>>>>>>>>>>> /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>>>>>>> gluster v start glustershard force
>>>>>>>>>>>
>>>>>>>>>>> at this point the brick process starts and all visible files,
>>>>>>>>>>> including the new dir, are made on the brick.
>>>>>>>>>>> A handful of shards are still in heal statistics, but no .shard
>>>>>>>>>>> directory is created and there is no increase in shard count.
>>>>>>>>>>>
>>>>>>>>>>> gluster v heal glustershard
>>>>>>>>>>>
>>>>>>>>>>> At this point still no increase in count, no dir made, and no
>>>>>>>>>>> additional healing activity generated in the logs. Waited a few
>>>>>>>>>>> minutes tailing logs to check if anything kicked in.
>>>>>>>>>>>
>>>>>>>>>>> gluster v heal glustershard full
>>>>>>>>>>>
>>>>>>>>>>> Shards get added to the list and heal commences. Logs show a full
>>>>>>>>>>> sweep starting on all 3 nodes, though this time it only shows as
>>>>>>>>>>> finishing on one, which looks to be the one that had the brick
>>>>>>>>>>> deleted.
>>>>>>>>>>>
>>>>>>>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-0
>>>>>>>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-1
>>>>>>>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-2
>>>>>>>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026]
>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-2
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Just realized it's still healing, so that may be why the sweep on
>>>>>>>>>> the 2 other bricks hasn't been reported as finished.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> My hope is that later tonight a full heal will work on
>>>>>>>>>>> production. Is it possible the self-heal daemon can get stale or
>>>>>>>>>>> stop listening but still show as active? Would stopping and
>>>>>>>>>>> starting the self-heal daemon from the gluster CLI before doing
>>>>>>>>>>> these heals be helpful?
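>>>>>>>>>>>
>>>>>>>>>>> (What I have in mind for that, if it matters, is just toggling the
>>>>>>>>>>> option, e.g.:
>>>>>>>>>>> gluster v set glustershard cluster.self-heal-daemon off
>>>>>>>>>>> gluster v set glustershard cluster.self-heal-daemon on
>>>>>>>>>>> unless there's a better way to restart just glustershd.)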
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 30, 2016 at 9:29 AM,
David Gossage <
>>>>>>>>>>> dgossage at carouselchecks.com>
wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 30, 2016 at 8:52
AM, David Gossage <
>>>>>>>>>>>> dgossage at
carouselchecks.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Aug 30, 2016 at
8:01 AM, Krutika Dhananjay <
>>>>>>>>>>>>> kdhananj at redhat.com>
wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at
6:20 PM, Krutika Dhananjay <
>>>>>>>>>>>>>> kdhananj at
redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Aug 30,
2016 at 6:07 PM, David Gossage <
>>>>>>>>>>>>>>> dgossage at
carouselchecks.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Aug 30,
2016 at 7:18 AM, Krutika Dhananjay <
>>>>>>>>>>>>>>>> kdhananj at
redhat.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Could you
also share the glustershd logs?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'll get
them when I get to work sure
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I tried the same steps that you mentioned multiple times,
>>>>>>>>>>>>>>>>> but heal is running to completion without any issues.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It must be said that 'heal full' traverses the files and
>>>>>>>>>>>>>>>>> directories in a depth-first order and does heals also in
>>>>>>>>>>>>>>>>> the same order. But if it gets interrupted in the middle
>>>>>>>>>>>>>>>>> (say because self-heal-daemon was either intentionally or
>>>>>>>>>>>>>>>>> unintentionally brought offline and then brought back up),
>>>>>>>>>>>>>>>>> self-heal will only pick up the entries that are so far
>>>>>>>>>>>>>>>>> marked as new entries that need heal, which it will find in
>>>>>>>>>>>>>>>>> the indices/xattrop directory. What this means is that those
>>>>>>>>>>>>>>>>> files and directories that were not visited during the crawl
>>>>>>>>>>>>>>>>> will remain untouched and unhealed in this second iteration
>>>>>>>>>>>>>>>>> of heal, unless you execute a 'heal-full' again.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So should it start healing shards as it crawls, or not until
>>>>>>>>>>>>>>>> after it crawls the entire .shard directory? At the pace it
>>>>>>>>>>>>>>>> was going, that could be a week with one node appearing in
>>>>>>>>>>>>>>>> the cluster but with no shard files if anything tries to
>>>>>>>>>>>>>>>> access a file on that node. From my experience the other day,
>>>>>>>>>>>>>>>> telling it to heal full again did nothing regardless of the
>>>>>>>>>>>>>>>> node used.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Crawl is started from '/' of the volume. Whenever self-heal
>>>>>>>>>>>>>> detects during the crawl that a file or directory is present in
>>>>>>>>>>>>>> some brick(s) and absent in others, it creates the file on the
>>>>>>>>>>>>>> bricks where it is absent and marks the fact that the file or
>>>>>>>>>>>>>> directory might need data/entry and metadata heal too (this
>>>>>>>>>>>>>> also means that an index is created under
>>>>>>>>>>>>>> .glusterfs/indices/xattrop of the src bricks). And the
>>>>>>>>>>>>>> data/entry and metadata heal are picked up and done in the
>>>>>>>>>>>>>> background with the help of these indices.
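>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (A rough way to see how many entries are queued for index heal
>>>>>>>>>>>>>> on a brick, assuming your brick layout, would be something like:
>>>>>>>>>>>>>> ls /gluster1/BRICK1/1/.glusterfs/indices/xattrop | wc -l
>>>>>>>>>>>>>> which should roughly track the heal-count output, give or take
>>>>>>>>>>>>>> the base xattrop-* file.)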
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looking at my 3rd node as an example, I find nearly the exact
>>>>>>>>>>>>> same number of files in the xattrop dir as reported by heal
>>>>>>>>>>>>> count at the time I brought down node2 to try and alleviate read
>>>>>>>>>>>>> IO errors that seemed to occur from what I was guessing were
>>>>>>>>>>>>> attempts to use the node with no shards for reads.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also attached are the glustershd logs from the 3 nodes, along
>>>>>>>>>>>>> with the test node I tried yesterday with the same results.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at my own logs I notice that a full sweep was only ever
>>>>>>>>>>>> recorded in glustershd.log on the 2nd node with the missing
>>>>>>>>>>>> directory. I believe I should have found a sweep begun on every
>>>>>>>>>>>> node, correct?
>>>>>>>>>>>>
>>>>>>>>>>>> On my test dev when it did work I do see that
>>>>>>>>>>>>
>>>>>>>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-0
>>>>>>>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-1
>>>>>>>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-2
>>>>>>>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-2
>>>>>>>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-1
>>>>>>>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-0
>>>>>>>>>>>>
>>>>>>>>>>>> While looking at the past few days on the 3 prod nodes, I only
>>>>>>>>>>>> found the following, on my 2nd node:
>>>>>>>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My suspicion is that this is what happened on your setup.
>>>>>>>>>>>>>>>>> Could you confirm if that was the case?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The brick was brought online with a force start, then a full
>>>>>>>>>>>>>>>> heal launched. Hours later, after it became evident that it
>>>>>>>>>>>>>>>> was not adding new files to heal, I did try restarting the
>>>>>>>>>>>>>>>> self-heal daemon and relaunching a full heal again. But this
>>>>>>>>>>>>>>>> was after the heal had basically already failed to work as
>>>>>>>>>>>>>>>> intended.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> OK. How did you figure it was not adding any new files? I
>>>>>>>>>>>>>>> need to know what places you were monitoring to come to this
>>>>>>>>>>>>>>> conclusion.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Krutika
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As for those logs, I did manage to do something that caused
>>>>>>>>>>>>>>>>> these warning messages you shared earlier to appear in my
>>>>>>>>>>>>>>>>> client and server logs.
>>>>>>>>>>>>>>>>> Although these logs are annoying and a bit scary too, they
>>>>>>>>>>>>>>>>> didn't do any harm to the data in my volume. Why they appear
>>>>>>>>>>>>>>>>> just after a brick is replaced and under no other
>>>>>>>>>>>>>>>>> circumstances is something I'm still investigating.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> But for the future, it would be good to follow the steps
>>>>>>>>>>>>>>>>> Anuradha gave, as that would allow self-heal to at least
>>>>>>>>>>>>>>>>> detect that it has some repairing to do whenever it is
>>>>>>>>>>>>>>>>> restarted, whether intentionally or otherwise.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I followed those steps as described on my test box and ended
>>>>>>>>>>>>>>>> up with the exact same outcome of adding shards at an
>>>>>>>>>>>>>>>> agonizingly slow pace and no creation of the .shard directory
>>>>>>>>>>>>>>>> or heals on the shard directory. Directories visible from the
>>>>>>>>>>>>>>>> mount healed quickly. This was with one VM, so it has only
>>>>>>>>>>>>>>>> 800 shards as well. After hours at work it had added a total
>>>>>>>>>>>>>>>> of 33 shards to be healed. I sent those logs yesterday as
>>>>>>>>>>>>>>>> well, though not the glustershd.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does the replace-brick command copy files in the same manner?
>>>>>>>>>>>>>>>> For these purposes I am contemplating just skipping the heal
>>>>>>>>>>>>>>>> route.
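>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (For concreteness, the form I'm assuming I'd use, with a new
>>>>>>>>>>>>>>>> brick path as a placeholder, is something like:
>>>>>>>>>>>>>>>> gluster volume replace-brick glustershard 192.168.71.12:/gluster2/brick3/1 \
>>>>>>>>>>>>>>>> 192.168.71.12:/gluster2/newbrick/1 commit force
>>>>>>>>>>>>>>>> if that ends up being the saner path.)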
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Krutika
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Aug
30, 2016 at 2:22 AM, David Gossage <
>>>>>>>>>>>>>>>>> dgossage at
carouselchecks.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
attached brick and client logs from test machine where
>>>>>>>>>>>>>>>>>> same
behavior occurred not sure if anything new is there. its still on
>>>>>>>>>>>>>>>>>> 3.8.2
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Number
of Bricks: 1 x 3 = 3
>>>>>>>>>>>>>>>>>>
Transport-type: tcp
>>>>>>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>>>>>>> Brick1:
192.168.71.10:/gluster2/brick1/1
>>>>>>>>>>>>>>>>>> Brick2:
192.168.71.11:/gluster2/brick2/1
>>>>>>>>>>>>>>>>>> Brick3:
192.168.71.12:/gluster2/brick3/1
>>>>>>>>>>>>>>>>>> Options
Reconfigured:
>>>>>>>>>>>>>>>>>>
cluster.locking-scheme: granular
>>>>>>>>>>>>>>>>>>
performance.strict-o-direct: off
>>>>>>>>>>>>>>>>>>
features.shard-block-size: 64MB
>>>>>>>>>>>>>>>>>>
features.shard: on
>>>>>>>>>>>>>>>>>>
server.allow-insecure: on
>>>>>>>>>>>>>>>>>>
storage.owner-uid: 36
>>>>>>>>>>>>>>>>>>
storage.owner-gid: 36
>>>>>>>>>>>>>>>>>>
cluster.server-quorum-type: server
>>>>>>>>>>>>>>>>>>
cluster.quorum-type: auto
>>>>>>>>>>>>>>>>>>
network.remote-dio: on
>>>>>>>>>>>>>>>>>>
cluster.eager-lock: enable
>>>>>>>>>>>>>>>>>>
performance.stat-prefetch: off
>>>>>>>>>>>>>>>>>>
performance.io-cache: off
>>>>>>>>>>>>>>>>>>
performance.quick-read: off
>>>>>>>>>>>>>>>>>>
cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>>>>>>
cluster.background-self-heal-count: 16
>>>>>>>>>>>>>>>>>>
nfs.enable-ino32: off
>>>>>>>>>>>>>>>>>>
nfs.addr-namelookup: off
>>>>>>>>>>>>>>>>>>
nfs.disable: on
>>>>>>>>>>>>>>>>>>
performance.read-ahead: off
>>>>>>>>>>>>>>>>>>
performance.readdir-ahead: on
>>>>>>>>>>>>>>>>>>
cluster.granular-entry-heal: on
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon,
Aug 29, 2016 at 2:20 PM, David Gossage <
>>>>>>>>>>>>>>>>>>
dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On
Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <
>>>>>>>>>>>>>>>>>>>
atalur at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
----- Original Message -----
>>>>>>>>>>>>>>>>>>>>
> From: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>>>>>>>>>>>>>
> To: "Anuradha Talur" <atalur at redhat.com>
>>>>>>>>>>>>>>>>>>>>
> Cc: "gluster-users at gluster.org List" <
>>>>>>>>>>>>>>>>>>>>
Gluster-users at gluster.org>, "Krutika Dhananjay" <
>>>>>>>>>>>>>>>>>>>>
kdhananj at redhat.com>
>>>>>>>>>>>>>>>>>>>>
> Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>>>>>>>>>>>>>>>
> Subject: Re: [Gluster-users] 3.8.3 Shards Healing
>>>>>>>>>>>>>>>>>>>>
Glacier Slow
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
> On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <
>>>>>>>>>>>>>>>>>>>>
atalur at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
> > Response inline.
>>>>>>>>>>>>>>>>>>>>
> >
>>>>>>>>>>>>>>>>>>>>
> > ----- Original Message -----
>>>>>>>>>>>>>>>>>>>>
> > > From: "Krutika Dhananjay" <kdhananj at
redhat.com>
>>>>>>>>>>>>>>>>>>>>
> > > To: "David Gossage" <dgossage at
carouselchecks.com>
>>>>>>>>>>>>>>>>>>>>
> > > Cc: "gluster-users at gluster.org List" <
>>>>>>>>>>>>>>>>>>>>
Gluster-users at gluster.org>
>>>>>>>>>>>>>>>>>>>>
> > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>>>>>>>>>>>>>>>
> > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing
>>>>>>>>>>>>>>>>>>>>
Glacier Slow
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > Could you attach both client and brick logs?
>>>>>>>>>>>>>>>>>>>>
Meanwhile I will try these
>>>>>>>>>>>>>>>>>>>>
> > steps
>>>>>>>>>>>>>>>>>>>>
> > > out on my machines and see if it is easily
>>>>>>>>>>>>>>>>>>>>
recreatable.
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > -Krutika
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <
>>>>>>>>>>>>>>>>>>>>
> > dgossage at carouselchecks.com
>>>>>>>>>>>>>>>>>>>>
> > > > wrote:
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > Centos 7 Gluster 3.8.3
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>>>>>>
> > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>>>>>>
> > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>>>>>>
> > > Options Reconfigured:
>>>>>>>>>>>>>>>>>>>>
> > > cluster.data-self-heal-algorithm: full
>>>>>>>>>>>>>>>>>>>>
> > > cluster.self-heal-daemon: on
>>>>>>>>>>>>>>>>>>>>
> > > cluster.locking-scheme: granular
>>>>>>>>>>>>>>>>>>>>
> > > features.shard-block-size: 64MB
>>>>>>>>>>>>>>>>>>>>
> > > features.shard: on
>>>>>>>>>>>>>>>>>>>>
> > > performance.readdir-ahead: on
>>>>>>>>>>>>>>>>>>>>
> > > storage.owner-uid: 36
>>>>>>>>>>>>>>>>>>>>
> > > storage.owner-gid: 36
>>>>>>>>>>>>>>>>>>>>
> > > performance.quick-read: off
>>>>>>>>>>>>>>>>>>>>
> > > performance.read-ahead: off
>>>>>>>>>>>>>>>>>>>>
> > > performance.io-cache: off
>>>>>>>>>>>>>>>>>>>>
> > > performance.stat-prefetch: on
>>>>>>>>>>>>>>>>>>>>
> > > cluster.eager-lock: enable
>>>>>>>>>>>>>>>>>>>>
> > > network.remote-dio: enable
>>>>>>>>>>>>>>>>>>>>
> > > cluster.quorum-type: auto
>>>>>>>>>>>>>>>>>>>>
> > > cluster.server-quorum-type: server
>>>>>>>>>>>>>>>>>>>>
> > > server.allow-insecure: on
>>>>>>>>>>>>>>>>>>>>
> > > cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>>>>>>>>
> > > cluster.background-self-heal-count: 16
>>>>>>>>>>>>>>>>>>>>
> > > performance.strict-write-ordering: off
>>>>>>>>>>>>>>>>>>>>
> > > nfs.disable: on
>>>>>>>>>>>>>>>>>>>>
> > > nfs.addr-namelookup: off
>>>>>>>>>>>>>>>>>>>>
> > > nfs.enable-ino32: off
>>>>>>>>>>>>>>>>>>>>
> > > cluster.granular-entry-heal: on
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > Friday did rolling upgrade from 3.8.3->3.8.3 no
>>>>>>>>>>>>>>>>>>>>
issues.
>>>>>>>>>>>>>>>>>>>>
> > > Following steps detailed in previous
>>>>>>>>>>>>>>>>>>>>
recommendations began proces of
>>>>>>>>>>>>>>>>>>>>
> > > replacing and healngbricks one node at a time.
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > 1) kill pid of brick
>>>>>>>>>>>>>>>>>>>>
> > > 2) reconfigure brick from raid6 to raid10
>>>>>>>>>>>>>>>>>>>>
> > > 3) recreate directory of brick
>>>>>>>>>>>>>>>>>>>>
> > > 4) gluster volume start <> force
>>>>>>>>>>>>>>>>>>>>
> > > 5) gluster volume heal <> full
>>>>>>>>>>>>>>>>>>>>
> > Hi,
>>>>>>>>>>>>>>>>>>>>
> >
>>>>>>>>>>>>>>>>>>>>
> > I'd suggest that full heal is not used. There are a
>>>>>>>>>>>>>>>>>>>>
few bugs in full heal.
>>>>>>>>>>>>>>>>>>>>
> > Better safe than sorry ;)
>>>>>>>>>>>>>>>>>>>>
> > Instead I'd suggest the following steps:
>>>>>>>>>>>>>>>>>>>>
> >
>>>>>>>>>>>>>>>>>>>>
> > Currently I brought the node down by systemctl stop
>>>>>>>>>>>>>>>>>>>>
glusterd as I was
>>>>>>>>>>>>>>>>>>>>
> getting sporadic io issues and a few VM's paused so
>>>>>>>>>>>>>>>>>>>>
hoping that will help.
>>>>>>>>>>>>>>>>>>>>
> I may wait to do this till around 4PM when most work
>>>>>>>>>>>>>>>>>>>>
is done in case it
>>>>>>>>>>>>>>>>>>>>
> shoots load up.
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
> > 1) kill pid of brick
>>>>>>>>>>>>>>>>>>>>
> > 2) to configuring of brick that you need
>>>>>>>>>>>>>>>>>>>>
> > 3) recreate brick dir
>>>>>>>>>>>>>>>>>>>>
> > 4) while the brick is still down, from the mount
>>>>>>>>>>>>>>>>>>>>
point:
>>>>>>>>>>>>>>>>>>>>
> > a) create a dummy non existent dir under / of
>>>>>>>>>>>>>>>>>>>>
mount.
>>>>>>>>>>>>>>>>>>>>
> >
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
> so if noee 2 is down brick, pick node for example 3
>>>>>>>>>>>>>>>>>>>>
and make a test dir
>>>>>>>>>>>>>>>>>>>>
> under its brick directory that doesnt exist on 2 or
>>>>>>>>>>>>>>>>>>>>
should I be dong this
>>>>>>>>>>>>>>>>>>>>
> over a gluster mount?
>>>>>>>>>>>>>>>>>>>> You should be doing this over gluster mount.
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
> > b) set a non existent extended attribute on / of
>>>>>>>>>>>>>>>>>>>>
mount.
>>>>>>>>>>>>>>>>>>>>
> >
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
> Could you give me an example of an attribute to set?
>>>>>>>>>>>>>>>>>>>>
I've read a tad on
>>>>>>>>>>>>>>>>>>>>
> this, and looked up attributes but haven't set any
>>>>>>>>>>>>>>>>>>>>
yet myself.
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value"
>>>>>>>>>>>>>>>>>>>> <path-to-mount>
>>>>>>>>>>>>>>>>>>>>
> Doing these steps will ensure that heal happens only
>>>>>>>>>>>>>>>>>>>>
from updated brick to
>>>>>>>>>>>>>>>>>>>>
> > down brick.
>>>>>>>>>>>>>>>>>>>>
> > 5) gluster v start <> force
>>>>>>>>>>>>>>>>>>>>
> > 6) gluster v heal <>
>>>>>>>>>>>>>>>>>>>>
> >
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
> Will it matter if somewhere in gluster the full heal
>>>>>>>>>>>>>>>>>>>>
command was run other
>>>>>>>>>>>>>>>>>>>>
> day? Not sure if it eventually stops or times out.
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>> Full heal will stop once the crawl is done. So if you
>>>>>>>>>>>>>>>>>>>> want to trigger heal again, run gluster v heal <>.
>>>>>>>>>>>>>>>>>>>> Actually, even brick up or volume start force should
>>>>>>>>>>>>>>>>>>>> trigger the heal.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Did this on the test bed today. It's one server with 3
>>>>>>>>>>>>>>>>>>> bricks on the same machine, so take that for what it's
>>>>>>>>>>>>>>>>>>> worth. Also it still runs 3.8.2. Maybe I'll update and
>>>>>>>>>>>>>>>>>>> re-run the test.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> killed brick
>>>>>>>>>>>>>>>>>>> deleted brick dir
>>>>>>>>>>>>>>>>>>> recreated brick dir
>>>>>>>>>>>>>>>>>>> created fake dir on gluster mount
>>>>>>>>>>>>>>>>>>> set suggested fake attribute on it
>>>>>>>>>>>>>>>>>>> ran volume start <> force
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> looked at the files it said needed healing and it was just
>>>>>>>>>>>>>>>>>>> 8 shards that were modified for the few minutes I ran
>>>>>>>>>>>>>>>>>>> through the steps
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> gave it a few minutes and it stayed the same
>>>>>>>>>>>>>>>>>>> ran gluster volume <> heal
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> it healed all the directories and files you can see over
>>>>>>>>>>>>>>>>>>> the mount, including fakedir.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> same issue for shards though. it adds more shards to heal
>>>>>>>>>>>>>>>>>>> at a glacial pace. slight jump in speed if I stat every
>>>>>>>>>>>>>>>>>>> file and dir in the running VM, but not all shards.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33 out
>>>>>>>>>>>>>>>>>>> of 800, and probably won't finish adding for a few days at
>>>>>>>>>>>>>>>>>>> the rate it's going.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
> >
>>>>>>>>>>>>>>>>>>>>
> > > 1st node worked as expected took 12 hours to heal
>>>>>>>>>>>>>>>>>>>>
1TB data. Load was
>>>>>>>>>>>>>>>>>>>>
> > little
>>>>>>>>>>>>>>>>>>>>
> > > heavy but nothing shocking.
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > About an hour after node 1 finished I began same
>>>>>>>>>>>>>>>>>>>>
process on node2. Heal
>>>>>>>>>>>>>>>>>>>>
> > > proces kicked in as before and the files in
>>>>>>>>>>>>>>>>>>>>
directories visible from
>>>>>>>>>>>>>>>>>>>>
> > mount
>>>>>>>>>>>>>>>>>>>>
> > > and .glusterfs healed in short time. Then it
>>>>>>>>>>>>>>>>>>>>
began crawl of .shard adding
>>>>>>>>>>>>>>>>>>>>
> > > those files to heal count at which point the
>>>>>>>>>>>>>>>>>>>>
entire proces ground to a
>>>>>>>>>>>>>>>>>>>>
> > halt
>>>>>>>>>>>>>>>>>>>>
> > > basically. After 48 hours out of 19k shards it
>>>>>>>>>>>>>>>>>>>>
has added 5900 to heal
>>>>>>>>>>>>>>>>>>>>
> > list.
>>>>>>>>>>>>>>>>>>>>
> > > Load on all 3 machnes is negligible. It was
>>>>>>>>>>>>>>>>>>>>
suggested to change this
>>>>>>>>>>>>>>>>>>>>
> > value
>>>>>>>>>>>>>>>>>>>>
> > > to full cluster.data-self-heal-algorithm and
>>>>>>>>>>>>>>>>>>>>
restart volume which I
>>>>>>>>>>>>>>>>>>>>
> > did. No
>>>>>>>>>>>>>>>>>>>>
> > > efffect. Tried relaunching heal no effect,
>>>>>>>>>>>>>>>>>>>>
despite any node picked. I
>>>>>>>>>>>>>>>>>>>>
> > > started each VM and performed a stat of all files
>>>>>>>>>>>>>>>>>>>>
from within it, or a
>>>>>>>>>>>>>>>>>>>>
> > full
>>>>>>>>>>>>>>>>>>>>
> > > virus scan and that seemed to cause short small
>>>>>>>>>>>>>>>>>>>>
spikes in shards added,
>>>>>>>>>>>>>>>>>>>>
> > but
>>>>>>>>>>>>>>>>>>>>
> > > not by much. Logs are showing no real messages
>>>>>>>>>>>>>>>>>>>>
indicating anything is
>>>>>>>>>>>>>>>>>>>>
> > going
>>>>>>>>>>>>>>>>>>>>
> > > on. I get hits to brick log on occasion of null
>>>>>>>>>>>>>>>>>>>>
lookups making me think
>>>>>>>>>>>>>>>>>>>>
> > its
>>>>>>>>>>>>>>>>>>>>
> > > not really crawling shards directory but waiting
>>>>>>>>>>>>>>>>>>>>
for a shard lookup to
>>>>>>>>>>>>>>>>>>>>
> > add
>>>>>>>>>>>>>>>>>>>>
> > > it. I'll get following in brick log but not
>>>>>>>>>>>>>>>>>>>>
constant and sometime
>>>>>>>>>>>>>>>>>>>>
> > multiple
>>>>>>>>>>>>>>>>>>>>
> > > for same shard.
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > [2016-08-29 08:31:57.478125] W [MSGID: 115009]
>>>>>>>>>>>>>>>>>>>>
> > > [server-resolve.c:569:server_resolve]
>>>>>>>>>>>>>>>>>>>>
0-GLUSTER1-server: no resolution
>>>>>>>>>>>>>>>>>>>>
> > type
>>>>>>>>>>>>>>>>>>>>
> > > for (null) (LOOKUP)
>>>>>>>>>>>>>>>>>>>>
> > > [2016-08-29 08:31:57.478170] E [MSGID: 115050]
>>>>>>>>>>>>>>>>>>>>
> > > [server-rpc-fops.c:156:server_lookup_cbk]
>>>>>>>>>>>>>>>>>>>>
0-GLUSTER1-server: 12591783:
>>>>>>>>>>>>>>>>>>>>
> > > LOOKUP (null) (00000000-0000-0000-00
>>>>>>>>>>>>>>>>>>>>
> > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221)
>>>>>>>>>>>>>>>>>>>>
==> (Invalid
>>>>>>>>>>>>>>>>>>>>
> > > argument) [Invalid argument]
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > This one repeated about 30 times in row then
>>>>>>>>>>>>>>>>>>>>
nothing for 10 minutes then
>>>>>>>>>>>>>>>>>>>>
> > one
>>>>>>>>>>>>>>>>>>>>
> > > hit for one different shard by itself.
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > How can I determine if Heal is actually running?
>>>>>>>>>>>>>>>>>>>>
How can I kill it or
>>>>>>>>>>>>>>>>>>>>
> > force
>>>>>>>>>>>>>>>>>>>>
> > > restart? Does node I start it from determine
>>>>>>>>>>>>>>>>>>>>
which directory gets
>>>>>>>>>>>>>>>>>>>>
> > crawled to
>>>>>>>>>>>>>>>>>>>>
> > > determine heals?
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > David Gossage
>>>>>>>>>>>>>>>>>>>>
> > > Carousel Checks Inc. | System Administrator
>>>>>>>>>>>>>>>>>>>>
> > > Office 708.613.2284
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > _______________________________________________
>>>>>>>>>>>>>>>>>>>>
> > > Gluster-users mailing list
>>>>>>>>>>>>>>>>>>>>
> > > Gluster-users at gluster.org
>>>>>>>>>>>>>>>>>>>>
> > > http://www.gluster.org/mailman
>>>>>>>>>>>>>>>>>>>>
/listinfo/gluster-users
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > >
>>>>>>>>>>>>>>>>>>>>
> > > _______________________________________________
>>>>>>>>>>>>>>>>>>>>
> > > Gluster-users mailing list
>>>>>>>>>>>>>>>>>>>>
> > > Gluster-users at gluster.org
>>>>>>>>>>>>>>>>>>>>
> > > http://www.gluster.org/mailman
>>>>>>>>>>>>>>>>>>>>
/listinfo/gluster-users
>>>>>>>>>>>>>>>>>>>>
> >
>>>>>>>>>>>>>>>>>>>>
> > --
>>>>>>>>>>>>>>>>>>>>
> > Thanks,
>>>>>>>>>>>>>>>>>>>>
> > Anuradha.
>>>>>>>>>>>>>>>>>>>>
> >
>>>>>>>>>>>>>>>>>>>>
>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
--
>>>>>>>>>>>>>>>>>>>>
Thanks,
>>>>>>>>>>>>>>>>>>>>
Anuradha.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>