thr3ads.net - Gluster users - [Gluster-users] Rebalancing newly added bricks [Sep 2019]


Strahil

2019-Sep-12 07:00 UTC

[Gluster-users] Rebalancing newly added bricks

Hi Nithya,

Thanks for the detailed explanation.
It makes sense.

Best Regards,
Strahil Nikolov

On Sep 12, 2019 08:18, Nithya Balachandran <nbalacha at redhat.com> wrote:
>
> On Wed, 11 Sep 2019 at 09:47, Strahil <hunter86_bg at yahoo.com> wrote:
>>
>> Hi Nithya,
>>
>> I was just reminded of your previous e-mail, which left me with the
>> impression that old volumes need that.
>> This is the one I mean:
>>
>> > It looks like this is a replicate volume. If that is the case then
>> > yes, you are running an old version of Gluster for which this was
>> > the default behaviour.
>
>
> Hi Strahil,
>
> I'm providing a little more detail here, which I hope will explain things.
> Rebalance has always been a volume-wide operation: a rebalance start operation
> will start rebalance processes on all nodes of the volume. However, different
> processes would behave differently. In earlier releases, all nodes would crawl
> the bricks and update the directory layouts. However, only one node in each
> replica/disperse set would actually migrate files, so the rebalance status would
> only show one node doing any "work" (scanning, rebalancing, etc.).
> However, this one node will process all the files in its replica sets. Rerunning
> rebalance on other nodes would make no difference, as it will always be the same
> node that ends up migrating files.
> So, for instance, for a replicate volume with server1:/brick1,
> server2:/brick2 and server3:/brick3 in that order, only the rebalance process on
> server1 would migrate files. In newer releases, all 3 nodes would migrate files.
>
> The rebalance status does not capture the directory operations of fixing
> layouts, which is why it looks like the other nodes are not doing anything.
>
> Hope this helps.
>
> Regards,
> Nithya
>>
>> > Regards,
>> > Nithya
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Sep 9, 2019 06:36, Nithya Balachandran <nbalacha at redhat.com> wrote:
>>>
>>>
>>>
>>> On Sat, 7 Sep 2019 at 00:03, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
>>>>
>>>> As was mentioned, you might have to run rebalance on the other node,
>>>> but it is better to wait until this node is done.
>>>>
>>>
>>> Hi Strahil,
>>>
>>> Rebalance does not need to be run on the other node: the operation
>>> is a volume-wide one. Only a single node per replica set would migrate
>>> files in the version used in this case.
>>>
>>> Regards,
>>> Nithya
>>>
>>>> Best Regards,
>>>> Strahil Nikolov
>>>>
>>>> On Friday, 6 September 2019, 15:29:20 GMT+3, Herb Burnswell <herbert.burnswell at gmail.com> wrote:
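The replica-set behaviour Nithya describes can be illustrated with a small bash sketch. It assumes the bricks are passed in volume order and that consecutive groups of `replica` bricks form one replica set; the function name `migrating_nodes` and its arguments are hypothetical, and it does not query Gluster at all. It only shows which nodes would migrate files under the older rule (first brick's node per set) versus the newer one (every node in the set).

```shell
#!/usr/bin/env bash
# Hypothetical illustration, not a Gluster tool.
# Usage: migrating_nodes <replica_count> <old|new> <host:/brick>...
migrating_nodes() {
  local replica=$1 mode=$2
  shift 2
  local bricks=("$@")
  local i j
  # Walk the brick list one replica set at a time.
  for ((i = 0; i < ${#bricks[@]}; i += replica)); do
    if [[ $mode == old ]]; then
      # Older releases: only the node of the first brick in each set migrates.
      echo "${bricks[i]%%:*}"
    else
      # Newer releases: every node in the replica set migrates files.
      for ((j = i; j < i + replica && j < ${#bricks[@]}; j++)); do
        echo "${bricks[j]%%:*}"
      done
    fi
  done
}

migrating_nodes 3 old server1:/brick1 server2:/brick2 server3:/brick3
# prints server1
migrating_nodes 3 new server1:/brick1 server2:/brick2 server3:/brick3
# prints server1, server2 and server3, one per line
```

This mirrors the server1/server2/server3 example in the thread; in both modes the directory-layout crawl still runs on every node, which the sketch does not model.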

Herb Burnswell

2019-Sep-13 19:54 UTC


[Gluster-users] Rebalancing newly added bricks

Hi,

Well, our rebalance seems to have failed. Here is the output:

# gluster vol rebalance tank status
     Node  Rebalanced-files    size  scanned  failures  skipped     status  run time in h:m:s
---------  ----------------  ------  -------  --------  -------  ---------  -----------------
localhost           1348706  57.8TB  2234439         9        6     failed           190:24:3
  serverB                 0  0Bytes        7         0        0  completed           63:47:55
volume rebalance: tank: success
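With more nodes, the status table gets harder to scan by eye. A minimal sketch for flagging nodes that did not finish, assuming the column layout shown above (node name in the first column, status second-to-last, run time in h:m:s last) with each node on a single line; `unfinished_nodes` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the node name of every row whose status is not
# "completed". Data rows are recognised by a last field shaped like h:m:s,
# which skips the header, the dashed separator and the trailing summary line.
unfinished_nodes() {
  awk '$NF ~ /^[0-9]+:[0-9]+:[0-9]+$/ && $(NF-1) != "completed" { print $1 }'
}

# Usage on a live system:
#   gluster volume rebalance tank status | unfinished_nodes
```

For the output above it would print only `localhost`.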

# gluster vol status tank
Status of volume: tank
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick serverA:/gluster_bricks/data1       49162     0          Y       20318
Brick serverB:/gluster_bricks/data1       49166     0          Y       3432
Brick serverA:/gluster_bricks/data2       49163     0          Y       20323
Brick serverB:/gluster_bricks/data2       49167     0          Y       3435
Brick serverA:/gluster_bricks/data3       49164     0          Y       4625
Brick serverA:/gluster_bricks/data4       49165     0          Y       4644
Brick serverA:/gluster_bricks/data5       49166     0          Y       5088
Brick serverA:/gluster_bricks/data6       49167     0          Y       5128
Brick serverB:/gluster_bricks/data3       49168     0          Y       22314
Brick serverB:/gluster_bricks/data4       49169     0          Y       22345
Brick serverB:/gluster_bricks/data5       49170     0          Y       22889
Brick serverB:/gluster_bricks/data6       49171     0          Y       22932
Self-heal Daemon on localhost               N/A       N/A        Y       6202
Self-heal Daemon on serverB               N/A       N/A        Y       22981

Task Status of Volume tank
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : eec64343-8e0d-4523-ad05-5678f9eb9eb2
Status               : failed

# df -hP |grep data
/dev/mapper/gluster_vg-gluster_lv1_data   60T   31T   29T  52% /gluster_bricks/data1
/dev/mapper/gluster_vg-gluster_lv2_data   60T   31T   29T  51% /gluster_bricks/data2
/dev/mapper/gluster_vg-gluster_lv3_data   60T   15T   46T  24% /gluster_bricks/data3
/dev/mapper/gluster_vg-gluster_lv4_data   60T   15T   46T  24% /gluster_bricks/data4
/dev/mapper/gluster_vg-gluster_lv5_data   60T   15T   45T  25% /gluster_bricks/data5
/dev/mapper/gluster_vg-gluster_lv6_data   60T   15T   45T  25% /gluster_bricks/data6


The rebalance log on serverA shows a disconnect from serverB:

[2019-09-08 15:41:44.285591] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-tank-client-10: server <serverB>:49170 has not responded in the last 42 seconds, disconnecting.
[2019-09-08 15:41:44.285739] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-tank-client-10: disconnected from tank-client-10. Client process will keep trying to connect to glusterd until brick's port is available
[2019-09-08 15:41:44.286023] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7ff986e8b132] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7ff986c5299e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7ff986c52aae] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7ff986c54220] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x2b0)[0x7ff986c54ce0] )))))
0-tank-client-10: forced unwinding frame type(GlusterFS 3.3)
op(FXATTROP(34)) called at 2019-09-08 15:40:44.040333 (xid=0x7f8cfac)
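To see whether the disconnect above was a one-off or recurred during the 190-hour run, the ping-timeout events can be pulled out of the rebalance log. This is a sketch that assumes the exact message format of the first log line above; `ping_timeouts` is a hypothetical helper, and the log path in the usage comment is an assumption about the default location. The 42 seconds in the message matches the default value of the `network.ping-timeout` volume option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<timestamp> <server:port>" for every
# "has not responded" ping-timeout event. Reads the files given as
# arguments, or stdin when called without arguments.
ping_timeouts() {
  sed -n 's/^\[\([^]]*\)] C \[rpc-clnt-ping\.c:[^]]*] [^:]*: server \(.*\) has not responded.*/\1 \2/p' "$@"
}

# Usage, assuming the default log location (adjust for your install):
#   ping_timeouts /var/log/glusterfs/tank-rebalance.log
# The timeout currently in effect can be checked with:
#   gluster volume get tank network.ping-timeout
```

The port in each result (49170 here) can be matched against the `gluster vol status tank` output to identify the brick; in this thread it is serverB:/gluster_bricks/data5.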

Does this type of failure cause data corruption?  What is the best course
of action at this point?

Thanks,

HB
