On 3/15/20 11:07 AM, Strahil Nikolov wrote:
> On March 15, 2020 11:50:32 AM GMT+02:00, Alexander Iliev
> <ailiev+gluster at mamul.org> wrote:
>> Hi list,
>>
>> I was having some issues with one of my Gluster nodes so I ended up
>> re-installing it. Now I want to re-add the bricks for my main volume
>> and
>> I'm having the following issue - when I try to add the bricks I get:
>>
>>> # gluster volume add-brick store1 replica 3 <bricks ...>
>>> volume add-brick: failed: Pre Validation failed on 172.31.35.132.
>>> Volume name store1 rebalance is in progress. Please retry after
>>> completion
>>
>> But then when I check the rebalance status I get:
>>
>>> # gluster volume rebalance store1 status
>>> volume rebalance: store1: failed: Rebalance not started for volume
>>> store1.
>>
>> And if I try to start the rebalancing I get:
>>
>>> # gluster volume rebalance store1 start
>>> volume rebalance: store1: failed: Rebalance on store1 is already
>>> started
>>
>> Looking at the logs of the first node, when I try to start the
>> rebalance
>> operation I see this:
>>
>>> [2020-03-15 09:41:31.883651] E [MSGID: 106276]
>>> [glusterd-rpc-ops.c:1200:__glusterd_stage_op_cbk] 0-management:
>>> Received stage RJT from uuid: 9476b8bb-d7ee-489a-b083-875805343e67
>>
>> On the second node, however, the logs indicate that a rebalance
>> operation is indeed in progress:
>>
>>> [2020-03-15 09:47:34.190042] I [MSGID: 109081]
>>> [dht-common.c:5868:dht_setxattr] 0-store1-dht: fixing the layout of
>>> /redacted
>>> [2020-03-15 09:47:34.775691] I
>>> [dht-rebalance.c:3285:gf_defrag_process_dir] 0-store1-dht: migrate
>>> data called on /redacted
>>> [2020-03-15 09:47:36.019403] I
>>> [dht-rebalance.c:3480:gf_defrag_process_dir] 0-store1-dht: Migration
>>> operation on dir /redacted took 1.24 secs
>>
>>
>> Some background on what led to this situation:
>>
>> The volume was originally a replica 3 distributed replicated volume on
>> three nodes. In order to detach the faulty node I lowered the replica
>> count to 2 and removed the bricks from that node from the volume. I
>> cleaned up the storage (formatted the bricks and cleaned the
>> trusted.gfid and trusted.glusterfs.volume-id extended attributes) and
>> purged the gluster packages from the system, then I re-installed the
>> gluster packages and did a `gluster peer probe` from another node.
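>>
>> For reference, the sequence I followed was roughly the following (the
>> node name, device and brick paths below are placeholders, not the
>> real ones):
>>
>>> # gluster volume remove-brick store1 replica 2 node3:/data/brick1 node3:/data/brick2 force
>>
>> then, on the faulty node itself (repeating the xattr and format steps
>> for each of its bricks):
>>
>>> # setfattr -x trusted.gfid /data/brick1
>>> # setfattr -x trusted.glusterfs.volume-id /data/brick1
>>> # mkfs.xfs -f /dev/vg0/brick1
>>> # yum remove glusterfs-server
>>> # yum install glusterfs-server
>>> # systemctl enable --now glusterd
>>
>> and finally, from one of the healthy nodes:
>>
>>> # gluster peer probe node3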
>>
>> I'm running Gluster 6.6 on CentOS 7.7 on all nodes.
>>
>> I feel stuck at this point, so any guidance will be greatly
>> appreciated.
>>
>> Thanks!
>>
>> Best regards,
>
> Hey Alex,
>
> Did you try to go to the second node (the one that thinks a rebalance
> is running) and stop the rebalance?
>
> gluster volume rebalance VOLNAME stop
>
> Then add the new bricks (and increase the replica count), and after
> the heal is over, rebalance again.

Hey Strahil,

Thanks for the suggestion. I just tried it, but unfortunately the result
is pretty much the same - when I try to stop the rebalance on the second
node, it reports that no rebalance is in progress:
> # gluster volume rebalance store1 stop
> volume rebalance: store1: failed: Rebalance not started for volume
> store1.
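
Just to double-check I have the order right, the sequence I'm aiming for
based on your suggestion is roughly this (the new brick paths below are
placeholders):

> # gluster volume rebalance store1 stop
> # gluster volume add-brick store1 replica 3 node3:/data/brick1 node3:/data/brick2
> # gluster volume heal store1 info
> # gluster volume rebalance store1 start

i.e. waiting for the heal info to show no pending entries before
starting the rebalance - but as shown above, it already fails at the
first step for me.
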
Best regards,
--
alexander iliev