I have 2 data centers in two different regions, and each DC has 3 servers. I have created a GlusterFS volume with 4 replicas. This is the gluster volume info output:

Volume Name: test-halo
Type: Replicate
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: 10.0.0.1:/mnt/test1
Brick2: 10.0.0.3:/mnt/test2
Brick3: 10.0.0.5:/mnt/test3
Brick4: 10.0.0.6:/mnt/test4
Options Reconfigured:
cluster.halo-shd-max-latency: 5
cluster.halo-max-latency: 10
cluster.quorum-count: 2
cluster.quorum-type: fixed
cluster.halo-enabled: yes
transport.address-family: inet
nfs.disable: on

The bricks with IPs 10.0.0.1 and 10.0.0.3 are in region A, and the bricks with IPs 10.0.0.5 and 10.0.0.6 are in region B.

When I mount the volume in region A, I expect the data to be stored first on brick1 and brick2, and then copied asynchronously to region B, onto brick3 and brick4.

Am I right? Is this what halo claims to do?

If yes, unfortunately this is not what happens for me. No matter whether I mount the volume in region A or in region B, all the data is copied to brick3 and brick4, and no data is copied to brick1 and brick2.

The ping times to the brick IPs from region A are as follows:
ping 10.0.0.1 & 10.0.0.3: below time=0.500 ms
ping 10.0.0.5 & 10.0.0.6: more than time=20 ms

What is the logic halo uses to select the bricks to write to? If it is the access time, then when I mount the volume in region A the ping time to brick1 and brick2 is below 0.5 ms, yet halo selects brick3 and brick4!

The GlusterFS version is: glusterfs 3.12.4

I really need to use the halo feature, but I have not managed to get this case working. Can anyone help me soon?

Thanks a lot
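P.S. In case it helps, this is roughly how I created the volume and set the halo options (a sketch reconstructed from the volume info above, not a verbatim copy of my session):

# create a 4-way replica: two bricks in region A, two in region B
gluster volume create test-halo replica 4 \
  10.0.0.1:/mnt/test1 10.0.0.3:/mnt/test2 \
  10.0.0.5:/mnt/test3 10.0.0.6:/mnt/test4

# enable halo and set the thresholds/quorum shown in the volume info
gluster volume set test-halo cluster.halo-enabled yes
gluster volume set test-halo cluster.halo-max-latency 10
gluster volume set test-halo cluster.halo-shd-max-latency 5
gluster volume set test-halo cluster.quorum-type fixed
gluster volume set test-halo cluster.quorum-count 2

gluster volume start test-halo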
I have mounted the halo GlusterFS volume in debug mode, and the output is as follows:
.
.
.
[2018-02-05 11:42:48.282473] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-test-halo-client-1: Ping latency is 0ms
[2018-02-05 11:42:48.282502] D [MSGID: 0] [afr-common.c:5025:afr_get_halo_latency] 0-test-halo-replicate-0: Using halo latency 10
[2018-02-05 11:42:48.282525] D [MSGID: 0] [afr-common.c:4820:__afr_handle_ping_event] 0-test-halo-client-1: Client ping @ 140032933708544 ms
.
.
.
[2018-02-05 11:42:48.393776] D [MSGID: 0] [afr-common.c:4803:find_worst_up_child] 0-test-halo-replicate-0: Found worst up child (1) @ 140032933708544 ms latency
[2018-02-05 11:42:48.393803] D [MSGID: 0] [afr-common.c:4903:__afr_handle_child_up_event] 0-test-halo-replicate-0: Marking child 1 down, doesn't meet halo threshold (10), and > halo_min_replicas (2)
.
.
.

I read this debug output as follows: the real ping time to test-halo-client-1 (brick2) is about 0.5 ms, yet halo treats it as not being under the halo threshold of 10 ms (the log even records its latency as 140032933708544 ms) and marks the child down, and this wrong decision leads to the bad brick selection.

I cannot set the halo threshold to 0 because:

# gluster vol set test-halo cluster.halo-max-latency 0
volume set: failed: '0' in 'option halo-max-latency 0' is out of range [1 - 99999]

So I think the range [1 - 99999] should be changed to [0 - 99999], so that I can get the desired brick selection from the halo feature. Am I right? If not, why does halo decide to mark down the best brick, which has a ping time below 0.5 ms?
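P.S. For anyone who wants to reproduce this: I got the debug messages by mounting with the debug log level and then reading the FUSE client log. A rough sketch of what I did (the mount point is just an example path, and I mount from 10.0.0.1 here only for illustration):

# mount the volume with debug logging; the messages end up in the
# client log under /var/log/glusterfs/
mount -t glusterfs -o log-level=DEBUG 10.0.0.1:/test-halo /mnt/test-halo

# list the halo-related options and their current values (the log above
# refers to a halo_min_replicas setting of 2, so there should be a
# matching cluster option in this list)
gluster volume get test-halo all | grep -i halo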
I have checked further and mounted the volume in another region (region C). The ping times from region C are as follows:
ping 10.0.0.1 & 10.0.0.3: below time=12 ms
ping 10.0.0.5 & 10.0.0.6: more than time=32 ms

I expect the bricks with the lower ping time to be selected at write time, but the brick selection is still not as desired, and the bricks with the higher ping time are selected. I changed cluster.halo-max-latency to 20, but this did not change anything.

One more thing: my previous email did not report the right result. I thought that changing the range to [0 - 99999] would make everything work, but today's experiments show that I was wrong.

Any help will be appreciated ;)
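P.S. In case it is useful, this is roughly how I am checking which bricks receive the writes (the test file name and the client mount point are just examples; the brick paths are the ones from the volume info):

# on the client mount
echo halo-test > /mnt/test-halo/halo-test.txt

# on the servers, look in the brick directories directly
ls -l /mnt/test1/halo-test.txt   # 10.0.0.1, region A
ls -l /mnt/test2/halo-test.txt   # 10.0.0.3, region A
ls -l /mnt/test3/halo-test.txt   # 10.0.0.5, region B
ls -l /mnt/test4/halo-test.txt   # 10.0.0.6, region B

With the behaviour I described above, the file only shows up under the region B bricks.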