Diego Zuccato
2020-Oct-26 13:56 UTC
[Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
On 26/10/20 14:46, mabi wrote:
>> I solved it by "degrading" the volume to replica 2, then cleared the
>> arbiter bricks and upgraded again to replica 3 arbiter 1.
> Thanks Diego for pointing out this workaround. How much data do you have
> on that volume in terms of TB and files? I have around 3TB of data in
> 10 million files, so I am a bit worried about taking such drastic measures.

The volume is built from 26 10TB disks with genetic data. I currently don't
have exact numbers, but it's still at the beginning, so a bit less than
10TB is actually used.
But you're only removing the arbiters; you always have two copies of your
files. The worst that can happen is a split-brain condition (avoidable by
requiring a 2-node quorum, in which case the worst is that the volume goes
read-only).

> How bad was the load on your volume after re-adding the arbiter brick?
> And how long did it take to sync/heal?

IIRC it took about 3 days, but the arbiters are on a VM (8 CPUs, 8GB RAM)
that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM.

> Would another workaround, such as turning off quotas on that problematic
> volume, work? That sounds much less scary but I don't know if that would work...

I don't know, sorry.

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
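In command terms, the degrade-and-re-add workaround described above roughly corresponds to the standard GlusterFS brick-management sequence below. This is only a sketch; the volume name gv0 and the arbiter brick arbiter1:/srv/gv0/brick are placeholders, not names taken from this thread:

    # Drop the arbiter brick, turning the volume into a plain replica 2
    gluster volume remove-brick gv0 replica 2 arbiter1:/srv/gv0/brick force

    # On the arbiter node: clear the old brick directory so it can be reused
    # (stale .glusterfs metadata and xattrs would otherwise get in the way)
    rm -rf /srv/gv0/brick && mkdir -p /srv/gv0/brick

    # Re-add the brick as an arbiter, going back to replica 3 arbiter 1
    gluster volume add-brick gv0 replica 3 arbiter 1 arbiter1:/srv/gv0/brick

    # Follow the self-heal that repopulates the arbiter
    gluster volume heal gv0 info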
mabi
2020-Oct-26 14:09 UTC
[Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)
------- Original Message -------
On Monday, October 26, 2020 2:56 PM, Diego Zuccato <diego.zuccato at unibo.it> wrote:

> The volume is built from 26 10TB disks with genetic data. I currently don't
> have exact numbers, but it's still at the beginning, so a bit less than
> 10TB is actually used.
> But you're only removing the arbiters; you always have two copies of your
> files. The worst that can happen is a split-brain condition (avoidable by
> requiring a 2-node quorum, in which case the worst is that the volume goes
> read-only).

Right, seen like that, this sounds reasonable. Do you actually remember the
exact command you ran in order to remove the brick? I was thinking this
should be it:

gluster volume remove-brick <VOLNAME> <BRICK> force

but should I use "force" or "start"?

> IIRC it took about 3 days, but the arbiters are on a VM (8 CPUs, 8GB RAM)
> that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM.

That's quite long, I must say, and I am in the same situation as you: my
arbiter is a VM.
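As a general note on the two modes: "start" kicks off a data migration and is meant for shrinking a distribute volume, while dropping a replica copy (such as an arbiter) is a one-step operation with "force" plus an explicit, lower replica count. A rough sketch, reusing the <VOLNAME> and <BRICK> placeholders from the message above:

    # Removing a replica/arbiter copy: no data migration, one-shot with "force",
    # and the new (lower) replica count must be stated explicitly
    gluster volume remove-brick <VOLNAME> replica 2 <BRICK> force

    # Shrinking a distribute volume instead uses "start", which migrates data
    # off the brick, followed by "status" checks and a final "commit"
    gluster volume remove-brick <VOLNAME> <BRICK> start
    gluster volume remove-brick <VOLNAME> <BRICK> status
    gluster volume remove-brick <VOLNAME> <BRICK> commit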