Danny Webb
2017-Jul-25 15:25 UTC
[Gluster-users] recovering from a replace-brick gone wrong
Hi All,

I have a 4 node cluster with a 4 brick distribute replica 2 volume on it, running version 3.9.0-2 on CentOS 7. I use the cluster to provide shared volumes in a virtual environment, as our storage only serves block storage. For some reason I decided to create the bricks for this volume directly on the block devices rather than abstracting them with LVM for easier space management. The bricks have surpassed 90% utilisation and we have started seeing load increase on one of the clients and on two of the nodes/bricks, most likely due to DHT lookups.

In an effort to rework the bricks and migrate them to LVM-backed mounts, I issued a replace-brick command to swap the direct mount out for a new, empty LVM-backed mount. Immediately after I issued this command, load jumped to ~150 on the two clients (high-throughput Apache servers) even though CPU utilisation was minimal. The clients were logging a flood of metadata self-heal messages:

[2017-07-23 20:38:04.803241] I [MSGID: 108026] [afr-self-heal-common.c:1077:afr_log_selfheal] 2-gv_cdn_001-replicate-1: Completed metadata selfheal on b44e9eb5-f886-4222-940d-593866c210ff. sources=[0] sinks=
[2017-07-23 20:38:04.803736] I [MSGID: 108026] [afr-self-heal-common.c:1077:afr_log_selfheal] 2-gv_cdn_001-replicate-1: Completed metadata selfheal on 97ad48bc-8873-4700-9f82-47130cd031a1. sources=[0] sinks=
[2017-07-23 20:38:04.837770] I [MSGID: 108026] [afr-self-heal-common.c:1077:afr_log_selfheal] 2-gv_cdn_001-replicate-1: Completed metadata selfheal on 041e58e2-9afe-4f43-ba0b-d11e80b5053b. sources=[0] sinks=

To "fix" the situation I had to kill off the new brick process, so I am now left with a distribute replica volume that has one of its replica sets in a degraded state. Is there any way I can re-add the old brick and get back to a normal working state? If I do this migration again I'd probably just do a direct dd of the ext4 file system onto the new mount while the brick was offline.

Cheers,
Danny

Danny Webb
Senior Linux and Virtualisation Engineer
The Hut Group <http://www.thehutgroup.com/>
Email: Danny.Webb at thehutgroup.com
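
For concreteness, the replace-brick described above would have taken roughly the form below. The volume name is taken from the log messages; the host name and brick paths are made up, and as far as I know 3.9 only accepts the "commit force" variant, which immediately starts self-heal from the surviving replica onto the empty brick, consistent with the metadata self-heal messages in the log excerpt.

    # hypothetical host name and brick paths; gv_cdn_001 is the volume named in the logs
    gluster volume replace-brick gv_cdn_001 \
        node03:/bricks/gv_cdn_001/brick \
        node03:/bricks/gv_cdn_001_lvm/brick \
        commit force

    # the new empty brick is then populated by self-heal from its replica partner;
    # progress can be watched with:
    gluster volume heal gv_cdn_001 info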
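
And a rough sketch of the offline dd approach mentioned at the end, again with hypothetical device names and brick paths. Because the copy is block-for-block, the .glusterfs directory and the trusted.* xattrs (GFIDs) come across unchanged, so the brick can come back at the same path and only changes made while it was down should need healing.

    # find the PID of just this brick process and stop it
    gluster volume status gv_cdn_001      # lists each brick's PID
    kill <brick-pid>                      # placeholder for the PID shown above

    # unmount the old brick and copy the ext4 filesystem onto the new logical volume
    umount /bricks/gv_cdn_001
    dd if=/dev/sdb of=/dev/vg_bricks/gv_cdn_001 bs=4M conv=fsync

    # mount the copy at the original brick path, grow ext4 into the larger LV if needed,
    # and restart only the downed brick
    mount /dev/vg_bricks/gv_cdn_001 /bricks/gv_cdn_001
    resize2fs /dev/vg_bricks/gv_cdn_001
    gluster volume start gv_cdn_001 force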