thr3ads.net - Gluster users - [Gluster-users] Replicated volume does not work if comprised of more than 16 bricks [Dec 2019]

Vitaly Pyslar

2019-Dec-13 12:34 UTC

[Gluster-users] Replicated volume does not work if comprised of more than 16 bricks

Hello.

I use a Gluster replicated volume as shared storage for an Elasticsearch snapshot
repository. Recently I've added more nodes to the cluster and ran into a
problem: I can't add more than 16 bricks to the volume. When I try to add a new
brick, the command fails with an error. The file es_snapshots-add-brick-mount.log
contains the following messages:

[2019-12-13 10:44:42.659062] E [MSGID: 114058] [client-handshake.c:1456:client_query_portmap_cbk] 0-es_snapshots-client-16: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2019-12-13 10:44:42.659081] I [socket.c:864:__socket_shutdown] 0-es_snapshots-client-16: intentional socket shutdown(27)
[2019-12-13 10:44:42.659275] I [MSGID: 114018] [client.c:2347:client_rpc_notify] 0-es_snapshots-client-16: disconnected from es_snapshots-client-16. Client process will keep trying to connect to glusterd until brick's port is available
[2019-12-13 10:44:42.659301] E [MSGID: 108006] [afr-common.c:5357:__afr_handle_child_down_event] 0-es_snapshots-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
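
To make log excerpts like the one above easier to scan, here is a minimal Python sketch that splits such a line into its fields and pulls out the client index. The regular expression is my own guess at the layout based only on the lines quoted here, and the helper names parse_line and client_index are hypothetical. It also assumes the client translators are numbered from zero, in which case client-16 would be the 17th brick.

```python
import re

# Assumed layout, inferred from the quoted lines only:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
# The MSGID field is optional (the socket.c line has none).
LOG_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>[^:]+):\s*(?P<msg>.*)"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def client_index(domain):
    """Extract N from a domain such as '0-es_snapshots-client-N', or None."""
    m = re.search(r"-client-(\d+)$", domain)
    return int(m.group(1)) if m else None


sample = (
    "[2019-12-13 10:44:42.659062] E [MSGID: 114058] "
    "[client-handshake.c:1456:client_query_portmap_cbk] "
    "0-es_snapshots-client-16: failed to get the port number for remote subvolume."
)
entry = parse_line(sample)
idx = client_index(entry["domain"])
print(entry["level"], entry["msgid"], idx)
```

Filtering the parsed entries on level "E" gives just the error lines, which is usually the quickest way to see which client failed first.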

The brick process does not start. Messages from glusterd.log:

[2019-12-13 10:44:42.631969] I [MSGID: 106578] [glusterd-brick-ops.c:1024:glusterd_op_perform_add_bricks] 0-management: replica-count is set 17
[2019-12-13 10:44:42.632004] I [MSGID: 106578] [glusterd-brick-ops.c:1033:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
[2019-12-13 10:44:42.703263] E [MSGID: 106053] [glusterd-utils.c:13751:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected]
[2019-12-13 10:44:42.732214] E [MSGID: 106073] [glusterd-brick-ops.c:2051:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
[2019-12-13 10:44:42.732244] E [MSGID: 106122] [glusterd-mgmt.c:317:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed.
[2019-12-13 10:44:42.732253] E [MSGID: 106122] [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick

If I delete the volume and recreate it from scratch with all bricks at once,
the volume starts successfully, and the gluster volume status command shows that
all bricks are running and all TCP ports are available. But when I try to
mount it with the Gluster native client, the mount fails with the error:

[2019-12-12 12:10:48.513177] E [fuse-bridge.c:5235:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)

If I remove one brick from the volume, then I can add the new one. But if the 
volume is comprised of more than 16 bricks then adding new bricks always 
fails.
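
For anyone trying to reproduce this, the failing step can be sketched as below. This is only an illustration: the helper add_brick_command is hypothetical, and the host name and brick path are placeholders, not my real ones. It assumes a pure replicated volume, where adding one brick means raising the replica count by one (the glusterd log above reports "replica-count is set 17").

```python
def add_brick_command(volume, current_replicas, new_brick):
    """Build the argv list for growing a pure replicated volume by one brick.

    Follows the CLI form: gluster volume add-brick VOLNAME replica COUNT NEW-BRICK
    """
    return [
        "gluster", "volume", "add-brick", volume,
        "replica", str(current_replicas + 1), new_brick,
    ]


# Placeholder host and path, for illustration only.
cmd = add_brick_command("es_snapshots", 16, "node17:/data/brick")
print(" ".join(cmd))
```

The list can be passed to subprocess.run on a node where the gluster CLI is installed. With 16 existing bricks it yields the replica 17 request that fails for me.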

My OS is Ubuntu 16.04 and Gluster version is 7.0.

I suppose this could be a bug, but I couldn't find any related issues. Does
anyone use replicated volumes with more than 16 bricks?
