Danny Webb
2017-Jul-25 15:25 UTC
[Gluster-users] recovering from a replace-brick gone wrong
Hi All,

I have a 4-node cluster with a 4-brick distribute replica 2 volume on it, running version 3.9.0-2 on CentOS 7. I use the cluster to provide shared volumes in a virtual environment, as our storage only serves block storage. For some reason I decided to make the bricks for this volume directly on the block device rather than abstracting them with LVM for easy space management. The bricks have surpassed 90% utilisation and we have started seeing load increase on one of the clients and on two of the nodes / bricks, most likely due to DHT lookups.

In an effort to rework the bricks and migrate them to LVM-backed mounts, I issued a replace-brick command to swap the direct mount out for a new, empty LVM mount. Immediately after I issued this command, load jumped to ~150 on the two clients (high-throughput Apache servers) even though CPU utilisation was minimal. I could see the clients logging a flood of metadata self-heals:

[2017-07-23 20:38:04.803241] I [MSGID: 108026] [afr-self-heal-common.c:1077:afr_log_selfheal] 2-gv_cdn_001-replicate-1: Completed metadata selfheal on b44e9eb5-f886-4222-940d-593866c210ff. sources=[0] sinks=
[2017-07-23 20:38:04.803736] I [MSGID: 108026] [afr-self-heal-common.c:1077:afr_log_selfheal] 2-gv_cdn_001-replicate-1: Completed metadata selfheal on 97ad48bc-8873-4700-9f82-47130cd031a1. sources=[0] sinks=
[2017-07-23 20:38:04.837770] I [MSGID: 108026] [afr-self-heal-common.c:1077:afr_log_selfheal] 2-gv_cdn_001-replicate-1: Completed metadata selfheal on 041e58e2-9afe-4f43-ba0b-d11e80b5053b. sources=[0] sinks=

In order to "fix" the situation I had to kill off the new brick process, so I'm now left with a distribute replica volume in which one of the replica sets is in a degraded state. Is there any way I can re-add the old brick to get back to a normal working state? If I do this migration again I'd probably just do a direct dd of the ext4 file system onto the new mount while the brick was offline.

Cheers,
Danny

Danny Webb
Senior Linux and Virtualisation Engineer
The Hut Group <http://www.thehutgroup.com/>
Tel:
Email: Danny.Webb at thehutgroup.com
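For anyone landing on this thread with the same question, here is a rough sketch of one way the old brick might be swapped back in: strip the brick root's old volume identity so glusterd will accept the path again, replace-brick back onto it with commit force (the only replace-brick mode in 3.9), and let self-heal reconcile whatever changed while it was out. The volume name gv_cdn_001 comes from the log lines above; the hostname node02 and the brick paths are hypothetical placeholders, and this is an untested sketch rather than a verified procedure.

# Volume name gv_cdn_001 is taken from the self-heal logs above; node02 and the
# brick paths below are hypothetical placeholders for the affected replica set.

# Confirm which bricks the volume currently references and which one is offline.
gluster volume status gv_cdn_001
gluster volume heal gv_cdn_001 info

# The old brick root still carries the volume-id xattr from before the
# replace-brick, so glusterd will refuse to reuse the path until it is removed.
# (Leave the data and the .glusterfs directory alone so the existing files survive.
# If glusterd still rejects the path, the root's trusted.gfid xattr may need
# removing as well.)
setfattr -x trusted.glusterfs.volume-id /bricks/old_direct/brick

# Swap the empty (and now killed) LVM brick back out for the old path.
gluster volume replace-brick gv_cdn_001 \
    node02:/bricks/new_lvm/brick node02:/bricks/old_direct/brick commit force

# Self-heal should then only need to reconcile whatever changed while the old
# brick was out of the volume; monitor it with:
gluster volume heal gv_cdn_001 info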
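And a similarly rough sketch of the offline migration Danny describes at the end: copying the ext4 filesystem block-for-block onto an LVM logical volume while the brick process is down, then growing it. The device names, volume group and mount point are hypothetical; the key assumption is that the brick comes back at the same path, so the volume definition never changes.

# Hypothetical names: /dev/sdb1 is the old direct-on-disk brick device,
# vg_gluster/lv_brick the new logical volume, /bricks/brick01 the brick mount.

# Stop just this brick's glusterfsd (PID is shown by `gluster volume status`)
# so the filesystem is quiescent during the copy.
gluster volume status gv_cdn_001
kill <glusterfsd-pid>            # placeholder for the PID reported above

# Block-copy the ext4 filesystem onto the (larger) logical volume.
dd if=/dev/sdb1 of=/dev/vg_gluster/lv_brick bs=4M conv=fsync

# Check the copied filesystem and grow it to fill the new LV.
e2fsck -f /dev/vg_gluster/lv_brick
resize2fs /dev/vg_gluster/lv_brick

# Remount the same brick path from the LV (update /etc/fstab to match) and
# restart the offline brick; keeping the path identical means no volume changes.
umount /bricks/brick01
mount /dev/vg_gluster/lv_brick /bricks/brick01
gluster volume start gv_cdn_001 force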
Apparently Analogous Threads
- not healing one file
- [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements
- Gluster endless heal
- New 3.12.7 possible split-brain on replica 3