Vitaly Pyslar
2019-Dec-13 12:34 UTC
[Gluster-users] Replicated volume does not work if comprised of more than 16 bricks
Hello.

I use a Gluster replicated volume as shared storage for an Elasticsearch snapshot repository. I recently added more nodes to the cluster and found that I can't add more than 16 bricks to the volume. When I try to add a new brick, the command fails with an error.

The file es_snapshots-add-brick-mount.log contains the following messages:

[2019-12-13 10:44:42.659062] E [MSGID: 114058] [client-handshake.c:1456:client_query_portmap_cbk] 0-es_snapshots-client-16: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2019-12-13 10:44:42.659081] I [socket.c:864:__socket_shutdown] 0-es_snapshots-client-16: intentional socket shutdown(27)
[2019-12-13 10:44:42.659275] I [MSGID: 114018] [client.c:2347:client_rpc_notify] 0-es_snapshots-client-16: disconnected from es_snapshots-client-16. Client process will keep trying to connect to glusterd until brick's port is available
[2019-12-13 10:44:42.659301] E [MSGID: 108006] [afr-common.c:5357:__afr_handle_child_down_event] 0-es_snapshots-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.

The brick process is not starting.

Messages from glusterd.log:

[2019-12-13 10:44:42.631969] I [MSGID: 106578] [glusterd-brick-ops.c:1024:glusterd_op_perform_add_bricks] 0-management: replica-count is set 17
[2019-12-13 10:44:42.632004] I [MSGID: 106578] [glusterd-brick-ops.c:1033:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
[2019-12-13 10:44:42.703263] E [MSGID: 106053] [glusterd-utils.c:13751:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected]
[2019-12-13 10:44:42.732214] E [MSGID: 106073] [glusterd-brick-ops.c:2051:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
[2019-12-13 10:44:42.732244] E [MSGID: 106122] [glusterd-mgmt.c:317:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed.
[2019-12-13 10:44:42.732253] E [MSGID: 106122] [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick

If I delete the volume and recreate it from scratch with all bricks at once, the volume starts successfully, and 'gluster volume status' shows that all bricks are running and all TCP ports are available. But when I try to mount it with the Gluster native client, the mount fails with this error:

[2019-12-12 12:10:48.513177] E [fuse-bridge.c:5235:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)

If I remove one brick from the volume, I can then add the new one. But if the volume comprises more than 16 bricks, adding new bricks always fails.

My OS is Ubuntu 16.04 and the Gluster version is 7.0. I suspect this is a bug, but I couldn't find any related issues. Does anyone use replicated volumes with more than 16 bricks?
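
In case it helps anyone comparing logs from several nodes, here is a minimal Python sketch for pulling only the error-level entries out of these log files. The entry layout it assumes (timestamp, one-letter level, optional MSGID, source location, domain, message) is inferred solely from the lines quoted above, and the names ENTRY, parse_entry and errors_only are my own, not part of Gluster.

```python
import re

# Layout assumed from the quoted entries only:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
# The MSGID field is optional (the socket.c entry above has none).
ENTRY = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<source>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+"
    r"(?P<message>.*)"
)


def parse_entry(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Mail wrapping sometimes leaves a space inside "file.c: 1234:func"; drop it.
    entry["source"] = re.sub(r"\s+", "", entry["source"])
    return entry


def errors_only(lines):
    """Keep only the parsed entries whose level is 'E'."""
    parsed = (parse_entry(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry["level"] == "E"]
```

Passing an open log file to errors_only, for example errors_only(open("glusterd.log")), returns a list of dicts that can be printed or grouped by msgid. Lines that do not match the assumed layout are skipped silently.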