Adrian Gruntkowski
2015-Oct-09 11:50 UTC
[Gluster-users] Problem with quorum on a replicated volume with 2 bricks and 3 nodes in trusted pool
Hello everyone,

I'm trying to set up quorum on my cluster and have hit an issue where taking down one node blocks writes on the affected volume. The thing is, I have 3 servers where 2 volumes are set up in a cross-over manner, like this:

[Server1: vol1]<--->[Server2: vol1 vol2]<--->[Server3: vol2]

The trusted pool contains 3 servers, so AFAIK taking down, for example, "Server3" shouldn't take down "vol2", but it does, with a "quorum not met" message in the logs:

[2015-10-09 11:12:55.386736] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-system_mail1-client-1: server 172.16.11.112:49152 has not responded in the last 42 seconds, disconnecting.
[2015-10-09 11:12:55.387213] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x19a)[0x7f950b98340a] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f950b74e4df] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f950b74e5fe] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f950b74fdcc] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f950b750578] ))))) 0-system_mail1-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-10-09 11:12:00.087425 (xid=0x517)
[2015-10-09 11:12:55.387238] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-system_mail1-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
[2015-10-09 11:12:55.387429] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x19a)[0x7f950b98340a] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f950b74e4df] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f950b74e5fe] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f950b74fdcc] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f950b750578] ))))) 0-system_mail1-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-10-09 11:12:07.374032 (xid=0x518)
[2015-10-09 11:12:55.387591] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x19a)[0x7f950b98340a] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f950b74e4df] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f950b74e5fe] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x9c)[0x7f950b74fdcc] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f950b750578] ))))) 0-system_mail1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-10-09 11:12:13.381487 (xid=0x519)
[2015-10-09 11:12:55.387614] W [rpc-clnt-ping.c:204:rpc_clnt_ping_cbk] 0-system_mail1-client-1: socket disconnected
[2015-10-09 11:12:55.387624] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-system_mail1-client-1: disconnected from system_mail1-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2015-10-09 11:12:55.387635] W [MSGID: 108001] [afr-common.c:4043:afr_notify] 0-system_mail1-replicate-0: Client-quorum is not met
[2015-10-09 11:12:55.387959] I [socket.c:3362:socket_submit_request] 0-system_mail1-client-1: not connected (priv->connected = 0)
[2015-10-09 11:12:55.387972] W [rpc-clnt.c:1571:rpc_clnt_submit] 0-system_mail1-client-1: failed to submit rpc-request (XID: 0x51a Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (system_mail1-client-1)
[2015-10-09 11:12:55.387982] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-system_mail1-client-1: remote operation failed. Path: /images (a63d0ff2-cb42-4cee-9df7-477459539788) [Transport endpoint is not connected]
[2015-10-09 11:12:55.388653] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-system_mail1-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]
[2015-10-09 11:13:03.245909] I [MSGID: 108031] [afr-common.c:1745:afr_local_discovery_cbk] 0-system_mail1-replicate-0: selecting local read_child system_mail1-client-0
[2015-10-09 11:13:04.734547] W [fuse-bridge.c:1937:fuse_create_cbk] 0-glusterfs-fuse: 1253: /x => -1 (Read-only file system)
[2015-10-09 11:13:10.419069] E [socket.c:2332:socket_connect_finish] 0-system_mail1-client-1: connection to 172.16.11.112:24007 failed (Connection timed out)
[2015-10-09 11:12:55.387447] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-system_mail1-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]

(Another weird thing is that the glusterfs version reported in the logs is 3.7.3, when the Debian packages installed on my system are for 3.7.3 - I don't know if it's meant to be that way.)
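For reference, the packaged and running versions can be compared with standard Debian/GlusterFS commands (a generic sketch, not taken from this setup):

dpkg -s glusterfs-server | grep '^Version'   # version of the installed Debian package
glusterfs --version                          # version the running binary reports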
Below is the output of "gluster volume info" on one of the servers (there are 4 volumes in my actual setup):

Volume Name: data_mail1
Type: Replicate
Volume ID: c2833dbe-aaa5-49d0-91d3-5abb44efb48c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/data/mail1
Brick2: mail-rep:/GFS/data/mail1
Options Reconfigured:
cluster.quorum-count: 2
auth.allow: 127.0.0.1,172.16.11.*,172.16.12.*
performance.readdir-ahead: on
cluster.server-quorum-type: server
cluster.quorum-type: fixed
cluster.server-quorum-ratio: 51%

Volume Name: data_www1
Type: Replicate
Volume ID: 385a7052-3ab5-42c2-93bc-6e10c4e7c0f1
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/data/www1
Brick2: web-rep:/GFS/data/www1
Options Reconfigured:
cluster.quorum-count: 2
auth.allow: 127.0.0.1,172.16.11.*,172.16.12.*
performance.readdir-ahead: on
cluster.server-quorum-type: server
cluster.quorum-type: fixed
cluster.server-quorum-ratio: 51%

Volume Name: system_mail1
Type: Replicate
Volume ID: 82dc0617-d855-4bf0-b5e5-c4147ca15779
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/system/mail1
Brick2: mail-rep:/GFS/system/mail1
Options Reconfigured:
cluster.quorum-count: 2
auth.allow: 127.0.0.1,172.16.11.*,172.16.12.*
performance.readdir-ahead: on
cluster.server-quorum-type: server
cluster.quorum-type: fixed
cluster.server-quorum-ratio: 51%

Volume Name: system_www1
Type: Replicate
Volume ID: 83868eb5-7b32-4e80-882c-e83361b267b9
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: cluster-rep:/GFS/system/www1
Brick2: web-rep:/GFS/system/www1
Options Reconfigured:
cluster.quorum-count: 2
auth.allow: 127.0.0.1,172.16.11.*,172.16.12.*
performance.readdir-ahead: on
cluster.server-quorum-type: server
cluster.quorum-type: fixed
cluster.server-quorum-ratio: 51%

I have also tried switching between "cluster.quorum-type" = fixed|auto, with the same result.

What am I missing? Is there a way to add a "fake brick" to meet the quorum requirements without holding a 3rd replica of the data?

--
Regards,
Adrian
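For reference, the quorum-related options shown above map to "gluster volume set" commands along these lines (a reconstruction from the volume info, not necessarily the exact commands that were run; repeat per volume):

gluster volume set data_mail1 cluster.quorum-type fixed
gluster volume set data_mail1 cluster.quorum-count 2
gluster volume set data_mail1 cluster.server-quorum-type server
# server-quorum-ratio is a cluster-wide option, set on "all":
gluster volume set all cluster.server-quorum-ratio 51%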
Ravishankar N
2015-Oct-09 14:17 UTC
[Gluster-users] Problem with quorum on a replicated volume with 2 bricks and 3 nodes in trusted pool
On 10/09/2015 05:20 PM, Adrian Gruntkowski wrote:
> Hello everyone,
>
> I'm trying to set up quorum on my cluster and hit an issue where
> taking down one node blocks writes on the affected volume. The thing
> is, I have 3 servers where 2 volumes are set up in a cross-over
> manner, like this:
>
> [Server1: vol1]<--->[Server2: vol1 vol2]<--->[Server3: vol2]
>
> The trusted pool contains 3 servers, so AFAIK taking down, for
> example, "Server3" shouldn't take down "vol2", but it does, with a
> "quorum not met" message in the logs.
>
> [log excerpt and version note snipped; see the original message above]
> Below is the output of "gluster volume info" on one of the servers
> (there are 4 volumes in my actual setup):
>
> [volume info for the four volumes snipped; see the original message above]
>
> I have also tried switching between "cluster.quorum-type" =
> fixed|auto, with the same result.

The client-quorum options (cluster.quorum-type and cluster.quorum-count) apply to AFR (the replicate translator). You have set quorum-type to 'fixed' with a count of '2', which means that both bricks of the volume(s) need to be online to meet quorum; if only one brick is up, the volume becomes read-only. For replica 2 volumes, there is no way to enable client-quorum without losing high availability (i.e. the volume still being writable when only one brick is up).

> What am I missing? Is there a way to add a "fake brick" to meet the
> quorum requirements without holding a 3rd replica of the data?

You can try using arbiter volumes [1], which are a good compromise between replica-2 and normal replica-3 volumes. Also, server-quorum (cluster.server-quorum-type and cluster.server-quorum-ratio) doesn't help much in avoiding data split-brains, so IMO you might as well disable it.

-Ravi

[1] https://github.com/gluster/glusterfs-specs/blob/master/done/Features/afr-arbiter-volumes.md
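A minimal sketch of what those two suggestions could look like for "vol2" from the original topology, assuming hypothetical brick paths (see [1] for the full arbiter documentation):

# replica-3 volume whose third brick is an arbiter: it stores only file
# names and metadata, never file data, so no 3rd copy of the data is kept
gluster volume create vol2 replica 3 arbiter 1 \
    server2:/bricks/vol2 server3:/bricks/vol2 server1:/bricks/vol2-arbiter
gluster volume start vol2

# and server-quorum can be disabled per volume:
gluster volume set vol2 cluster.server-quorum-type none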