Jamie Lawrence
2019-Mar-07 19:32 UTC
[Gluster-users] Cannot write more than 512 bytes to gluster vol
I just stood up a new cluster running 4.1.7, my first experience with version 4. It is a simple replica 3 volume: gluster v create la1_db_1 replica 3 \ gluster-10g-1:/gluster-bricks/la1_db_1/la1_db_1 \ gluster-10g-2:/gluster-bricks/la1_db_1/la1_db_1 \ gluster-10g-3:/gluster-bricks/la1_db_1/la1_db_1 gluster v set la1_db_1 storage.owner-uid 130 gluster v set la1_db_1 storage.owner-gid 130 gluster v set la1_db_1 server.allow-insecure on gluster v set la1_db_1 auth.allow [various IPs] After mounting on a client, everything appears fine until you try to use it. dd if=/dev/zero of=/path/on/client/foo will write 512 bytes and then hang until timeout, at which point it declares "Transport not connected". Notably, if I mount the volume on one of the gluster machines over the same interface, it behave like it should. That led me to investigate packet filtering, which is configured correctly, and in any case, after flushing all rules on all involved machines, the issue persists. cli.log contains a lot of: [2019-03-06 17:00:02.893553] I [cli.c:773:main] 0-cli: Started running /sbin/gluster with version 4.1.7 [2019-03-06 17:00:02.897199] I [cli-cmd-volume.c:2375:cli_check_gsync_present] 0-: geo-replication not installed [2019-03-06 17:00:02.897545] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-03-06 17:00:02.897617] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now [2019-03-06 17:00:02.897678] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0 [2019-03-06 17:00:02.898244] I [input.c:31:cli_batch] 0-: Exiting with: 0 [2019-03-06 17:00:02.922637] I [cli.c:773:main] 0-cli: Started running /sbin/gluster with version 4.1.7 [2019-03-06 17:00:02.926599] I [cli-cmd-volume.c:2375:cli_check_gsync_present] 0-: geo-replication not installed [2019-03-06 17:00:02.926906] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-03-06 17:00:02.926956] I [socket.c:2632:socket_event_handler] 0-transport: EPOLLERR - disconnecting now [2019-03-06 17:00:02.927113] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glusterfs: error returned while attempting to connect to host:(null), port:0 [2019-03-06 17:00:02.927573] I [input.c:31:cli_batch] 0-: Exiting with: 0 The client log is more interesting, I just don't know what to make of it: [2019-03-07 19:18:36.674687] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.674726] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.674752] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.674806] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-la1_db_1-client-0: changing port to 49152 (from 0) [2019-03-07 19:18:36.674815] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-la1_db_1-client-1: changing port to 49152 (from 0) [2019-03-07 19:18:36.674927] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-la1_db_1-client-2: changing port to 49152 (from 0) [2019-03-07 19:18:36.675012] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.675054] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.675155] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.675203] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.675243] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.675306] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:18:36.675563] I [MSGID: 114046] [client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-1: Connected to la1_db_1-client-1, attached to remote volume '/gluster-bricks/la1_db_1/la1_db_1'. [2019-03-07 19:18:36.675573] I [MSGID: 108005] [afr-common.c:5336:__afr_handle_child_up_event] 0-la1_db_1-replicate-0: Subvolume 'la1_db_1-client-1' came back up; going online. [2019-03-07 19:18:36.675722] I [MSGID: 114046] [client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-0: Connected to la1_db_1-client-0, attached to remote volume '/gluster-bricks/la1_db_1/la1_db_1'. [2019-03-07 19:18:36.675728] I [MSGID: 114046] [client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-2: Connected to la1_db_1-client-2, attached to remote volume '/gluster-bricks/la1_db_1/la1_db_1'. [2019-03-07 19:18:36.675743] I [MSGID: 108002] [afr-common.c:5611:afr_notify] 0-la1_db_1-replicate-0: Client-quorum is met [2019-03-07 19:18:36.676564] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26 [2019-03-07 19:18:36.676578] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0 [2019-03-07 19:18:36.677623] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-la1_db_1-dht: Directory selfheal failed: Unable to form layout for directory / [2019-03-07 19:20:01.674361] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-0: server 172.16.0.171:49152 has not responded in the last 42 seconds, disconnecting. [2019-03-07 19:20:01.674440] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-1: server 172.16.0.172:49152 has not responded in the last 42 seconds, disconnecting. [2019-03-07 19:20:01.674458] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-2: server 172.16.0.174:49152 has not responded in the last 42 seconds, disconnecting. [2019-03-07 19:20:01.674455] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-la1_db_1-client-0: disconnected from la1_db_1-client-0. Client process will keep trying to connect to glusterd until brick's port is available [2019-03-07 19:20:01.674530] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-la1_db_1-client-1: disconnected from la1_db_1-client-1. Client process will keep trying to connect to glusterd until brick's port is available [2019-03-07 19:20:01.674557] W [MSGID: 108001] [afr-common.c:5618:afr_notify] 0-la1_db_1-replicate-0: Client-quorum is not met [2019-03-07 19:20:01.674776] E [rpc-clnt.c:348:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 0-la1_db_1-client-0: forced unwinding frame type(GlusterFS 4.x v1) op(LOOKUP(27)) called at 2019-03-07 19:18:47.415290 (xid=0x11) [2019-03-07 19:20:01.674798] W [MSGID: 114031] [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-0: remote operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) [Transport endpoint is not connected] [2019-03-07 19:20:01.674808] E [rpc-clnt.c:348:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 0-la1_db_1-client-1: forced unwinding frame type(GlusterFS 4.x v1) op(LOOKUP(27)) called at 2019-03-07 19:18:47.415303 (xid=0x15) [2019-03-07 19:20:01.674830] W [MSGID: 114031] [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-1: remote operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) [Transport endpoint is not connected] [2019-03-07 19:20:01.674888] E [rpc-clnt.c:348:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 0-la1_db_1-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-07 19:19:19.670189 (xid=0x12) [2019-03-07 19:20:01.674902] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] 0-la1_db_1-client-0: socket disconnected [2019-03-07 19:20:01.674916] E [rpc-clnt.c:348:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 0-la1_db_1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-07 19:19:19.670216 (xid=0x16) [2019-03-07 19:20:01.674932] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] 0-la1_db_1-client-1: socket disconnected [2019-03-07 19:20:01.674935] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-la1_db_1-client-2: disconnected from la1_db_1-client-2. Client process will keep trying to connect to glusterd until brick's port is available [2019-03-07 19:20:01.674947] E [MSGID: 108006] [afr-common.c:5413:__afr_handle_child_down_event] 0-la1_db_1-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. [2019-03-07 19:20:01.675040] E [rpc-clnt.c:348:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 0-la1_db_1-client-2: forced unwinding frame type(GlusterFS 4.x v1) op(LOOKUP(27)) called at 2019-03-07 19:18:47.415315 (xid=0x11) [2019-03-07 19:20:01.675055] W [MSGID: 114031] [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-2: remote operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) [Transport endpoint is not connected] [2019-03-07 19:20:01.675077] E [MSGID: 101046] [dht-common.c:1905:dht_revalidate_cbk] 0-la1_db_1-dht: dict is null [2019-03-07 19:20:01.675097] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/foo: failed to resolve (Transport endpoint is not connected) [2019-03-07 19:20:01.675164] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675176] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675179] E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null [2019-03-07 19:20:01.675194] W [fuse-bridge.c:565:fuse_entry_cbk] 0-glusterfs-fuse: 18: LOOKUP() /foo => -1 (Transport endpoint is not connected) [2019-03-07 19:20:01.675280] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675282] E [rpc-clnt.c:348:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) 0-la1_db_1-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-07 19:19:19.670241 (xid=0x12) [2019-03-07 19:20:01.675308] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] 0-la1_db_1-client-2: socket disconnected [2019-03-07 19:20:01.675310] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675314] E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null [2019-03-07 19:20:01.675332] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/foo: failed to resolve (Transport endpoint is not connected) [2019-03-07 19:20:01.675379] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675390] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675393] E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null [2019-03-07 19:20:01.675405] W [fuse-bridge.c:565:fuse_entry_cbk] 0-glusterfs-fuse: 20: LOOKUP() /foo => -1 (Transport endpoint is not connected) [2019-03-07 19:20:01.675447] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675470] I [MSGID: 108006] [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up [2019-03-07 19:20:01.675472] E [MSGID: 101046] [dht-common.c:1905:dht_revalidate_cbk] 0-la1_db_1-dht: dict is null [2019-03-07 19:20:01.675485] W [fuse-bridge.c:899:fuse_attr_cbk] 0-glusterfs-fuse: 21: LOOKUP() / => -1 (Transport endpoint is not connected) [2019-03-07 19:20:11.675946] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.675989] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676040] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676101] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676138] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-la1_db_1-client-0: changing port to 49152 (from 0) [2019-03-07 19:20:11.676261] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676271] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-la1_db_1-client-1: changing port to 49152 (from 0) [2019-03-07 19:20:11.676293] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676358] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-la1_db_1-client-2: changing port to 49152 (from 0) [2019-03-07 19:20:11.676455] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-0: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676477] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676571] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676684] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-1: error returned while attempting to connect to host:(null), port:0 [2019-03-07 19:20:11.676753] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-la1_db_1-client-2: error returned while attempting to connect to host:(null), port:0 This is a 10G network that is otherwise performing flawlessly, and I'm seeing no errors on the switches; as mentioned, I've tried this both in our normal configuration and with iptables flushed. Not entirely sure what to try next; any suggestions? Thanks, -j
Poornima Gurusiddaiah
2019-Mar-08 01:09 UTC
[Gluster-users] Cannot write more than 512 bytes to gluster vol
>From the client log, looks like the host is null and port is 0, hence theclient is not able to connect to the bricks(Gluster volume). The client tries to connect to Glusterd daemon on the host specified in the mount command to get the hosts and port(volfile) on which the bricks are running. Have you set the firewall rules for opening the ports required by Gluster? Also can you share the complete client log preferably with TRACE log level? #gluster vol set volname client-log-level TRACE Please reset it back once you have collected the log, else this can slow down and also fill up the logs dir. #gluster vol reset volname client-log-level Regards, Poornima On Fri, Mar 8, 2019, 1:03 AM Jamie Lawrence <jlawrence at squaretrade.com> wrote:> I just stood up a new cluster running 4.1.7, my first experience with > version 4. It is a simple replica 3 volume: > > gluster v create la1_db_1 replica 3 \ > gluster-10g-1:/gluster-bricks/la1_db_1/la1_db_1 \ > gluster-10g-2:/gluster-bricks/la1_db_1/la1_db_1 \ > gluster-10g-3:/gluster-bricks/la1_db_1/la1_db_1 > > gluster v set la1_db_1 storage.owner-uid 130 > gluster v set la1_db_1 storage.owner-gid 130 > gluster v set la1_db_1 server.allow-insecure on > gluster v set la1_db_1 auth.allow [various IPs] > > After mounting on a client, everything appears fine until you try to use > it. > > dd if=/dev/zero of=/path/on/client/foo > > will write 512 bytes and then hang until timeout, at which point it > declares "Transport not connected". > > Notably, if I mount the volume on one of the gluster machines over the > same interface, it behave like it should. That led me to investigate packet > filtering, which is configured correctly, and in any case, after flushing > all rules on all involved machines, the issue persists. > > cli.log contains a lot of: > > [2019-03-06 17:00:02.893553] I [cli.c:773:main] 0-cli: Started running > /sbin/gluster with version 4.1.7 > [2019-03-06 17:00:02.897199] I > [cli-cmd-volume.c:2375:cli_check_gsync_present] 0-: geo-replication not > installed > [2019-03-06 17:00:02.897545] I [MSGID: 101190] > [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-03-06 17:00:02.897617] I [socket.c:2632:socket_event_handler] > 0-transport: EPOLLERR - disconnecting now > [2019-03-06 17:00:02.897678] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glusterfs: error returned while attempting to connect to host:(null), > port:0 > [2019-03-06 17:00:02.898244] I [input.c:31:cli_batch] 0-: Exiting with: 0 > [2019-03-06 17:00:02.922637] I [cli.c:773:main] 0-cli: Started running > /sbin/gluster with version 4.1.7 > [2019-03-06 17:00:02.926599] I > [cli-cmd-volume.c:2375:cli_check_gsync_present] 0-: geo-replication not > installed > [2019-03-06 17:00:02.926906] I [MSGID: 101190] > [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-03-06 17:00:02.926956] I [socket.c:2632:socket_event_handler] > 0-transport: EPOLLERR - disconnecting now > [2019-03-06 17:00:02.927113] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glusterfs: error returned while attempting to connect to host:(null), > port:0 > [2019-03-06 17:00:02.927573] I [input.c:31:cli_batch] 0-: Exiting with: 0 > > The client log is more interesting, I just don't know what to make of it: > > [2019-03-07 19:18:36.674687] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.674726] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.674752] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.674806] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-la1_db_1-client-0: changing port to 49152 (from 0) > [2019-03-07 19:18:36.674815] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-la1_db_1-client-1: changing port to 49152 (from 0) > [2019-03-07 19:18:36.674927] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-la1_db_1-client-2: changing port to 49152 (from 0) > [2019-03-07 19:18:36.675012] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.675054] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.675155] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.675203] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.675243] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.675306] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:18:36.675563] I [MSGID: 114046] > [client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-1: > Connected to la1_db_1-client-1, attached to remote volume > '/gluster-bricks/la1_db_1/la1_db_1'. > [2019-03-07 19:18:36.675573] I [MSGID: 108005] > [afr-common.c:5336:__afr_handle_child_up_event] 0-la1_db_1-replicate-0: > Subvolume 'la1_db_1-client-1' came back up; going online. > [2019-03-07 19:18:36.675722] I [MSGID: 114046] > [client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-0: > Connected to la1_db_1-client-0, attached to remote volume > '/gluster-bricks/la1_db_1/la1_db_1'. > [2019-03-07 19:18:36.675728] I [MSGID: 114046] > [client-handshake.c:1095:client_setvolume_cbk] 0-la1_db_1-client-2: > Connected to la1_db_1-client-2, attached to remote volume > '/gluster-bricks/la1_db_1/la1_db_1'. > [2019-03-07 19:18:36.675743] I [MSGID: 108002] > [afr-common.c:5611:afr_notify] 0-la1_db_1-replicate-0: Client-quorum is met > [2019-03-07 19:18:36.676564] I [fuse-bridge.c:4294:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel > 7.26 > [2019-03-07 19:18:36.676578] I [fuse-bridge.c:4927:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-03-07 19:18:36.677623] I [MSGID: 109005] > [dht-selfheal.c:2342:dht_selfheal_directory] 0-la1_db_1-dht: Directory > selfheal failed: Unable to form layout for directory / > [2019-03-07 19:20:01.674361] C > [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-0: > server 172.16.0.171:49152 has not responded in the last 42 seconds, > disconnecting. > [2019-03-07 19:20:01.674440] C > [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-1: > server 172.16.0.172:49152 has not responded in the last 42 seconds, > disconnecting. > [2019-03-07 19:20:01.674458] C > [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-la1_db_1-client-2: > server 172.16.0.174:49152 has not responded in the last 42 seconds, > disconnecting. > [2019-03-07 19:20:01.674455] I [MSGID: 114018] > [client.c:2254:client_rpc_notify] 0-la1_db_1-client-0: disconnected from > la1_db_1-client-0. Client process will keep trying to connect to glusterd > until brick's port is available > [2019-03-07 19:20:01.674530] I [MSGID: 114018] > [client.c:2254:client_rpc_notify] 0-la1_db_1-client-1: disconnected from > la1_db_1-client-1. Client process will keep trying to connect to glusterd > until brick's port is available > [2019-03-07 19:20:01.674557] W [MSGID: 108001] > [afr-common.c:5618:afr_notify] 0-la1_db_1-replicate-0: Client-quorum is not > met > [2019-03-07 19:20:01.674776] E [rpc-clnt.c:348:saved_frames_unwind] (--> > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) > 0-la1_db_1-client-0: forced unwinding frame type(GlusterFS 4.x v1) > op(LOOKUP(27)) called at 2019-03-07 19:18:47.415290 (xid=0x11) > [2019-03-07 19:20:01.674798] W [MSGID: 114031] > [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-0: > remote operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) > [Transport endpoint is not connected] > [2019-03-07 19:20:01.674808] E [rpc-clnt.c:348:saved_frames_unwind] (--> > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) > 0-la1_db_1-client-1: forced unwinding frame type(GlusterFS 4.x v1) > op(LOOKUP(27)) called at 2019-03-07 19:18:47.415303 (xid=0x15) > [2019-03-07 19:20:01.674830] W [MSGID: 114031] > [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-1: > remote operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) > [Transport endpoint is not connected] > [2019-03-07 19:20:01.674888] E [rpc-clnt.c:348:saved_frames_unwind] (--> > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) > 0-la1_db_1-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) > called at 2019-03-07 19:19:19.670189 (xid=0x12) > [2019-03-07 19:20:01.674902] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] > 0-la1_db_1-client-0: socket disconnected > [2019-03-07 19:20:01.674916] E [rpc-clnt.c:348:saved_frames_unwind] (--> > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) > 0-la1_db_1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) > called at 2019-03-07 19:19:19.670216 (xid=0x16) > [2019-03-07 19:20:01.674932] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] > 0-la1_db_1-client-1: socket disconnected > [2019-03-07 19:20:01.674935] I [MSGID: 114018] > [client.c:2254:client_rpc_notify] 0-la1_db_1-client-2: disconnected from > la1_db_1-client-2. Client process will keep trying to connect to glusterd > until brick's port is available > [2019-03-07 19:20:01.674947] E [MSGID: 108006] > [afr-common.c:5413:__afr_handle_child_down_event] 0-la1_db_1-replicate-0: > All subvolumes are down. Going offline until atleast one of them comes back > up. > [2019-03-07 19:20:01.675040] E [rpc-clnt.c:348:saved_frames_unwind] (--> > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) > 0-la1_db_1-client-2: forced unwinding frame type(GlusterFS 4.x v1) > op(LOOKUP(27)) called at 2019-03-07 19:18:47.415315 (xid=0x11) > [2019-03-07 19:20:01.675055] W [MSGID: 114031] > [client-rpc-fops_v2.c:2540:client4_0_lookup_cbk] 0-la1_db_1-client-2: > remote operation failed. Path: /foo (ef9860a9-a366-49a4-9305-08cdb164395b) > [Transport endpoint is not connected] > [2019-03-07 19:20:01.675077] E [MSGID: 101046] > [dht-common.c:1905:dht_revalidate_cbk] 0-la1_db_1-dht: dict is null > [2019-03-07 19:20:01.675097] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] > 0-fuse: 00000000-0000-0000-0000-000000000001/foo: failed to resolve > (Transport endpoint is not connected) > [2019-03-07 19:20:01.675164] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675176] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675179] E [MSGID: 101046] > [dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null > [2019-03-07 19:20:01.675194] W [fuse-bridge.c:565:fuse_entry_cbk] > 0-glusterfs-fuse: 18: LOOKUP() /foo => -1 (Transport endpoint is not > connected) > [2019-03-07 19:20:01.675280] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675282] E [rpc-clnt.c:348:saved_frames_unwind] (--> > /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcfbed74ddb] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7fcfbeb44021] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7fcfbeb4414e] (--> > /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7fcfbeb456be] > (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7fcfbeb46268] ))))) > 0-la1_db_1-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) > called at 2019-03-07 19:19:19.670241 (xid=0x12) > [2019-03-07 19:20:01.675308] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] > 0-la1_db_1-client-2: socket disconnected > [2019-03-07 19:20:01.675310] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675314] E [MSGID: 101046] > [dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null > [2019-03-07 19:20:01.675332] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] > 0-fuse: 00000000-0000-0000-0000-000000000001/foo: failed to resolve > (Transport endpoint is not connected) > [2019-03-07 19:20:01.675379] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675390] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675393] E [MSGID: 101046] > [dht-common.c:1502:dht_lookup_dir_cbk] 0-la1_db_1-dht: dict is null > [2019-03-07 19:20:01.675405] W [fuse-bridge.c:565:fuse_entry_cbk] > 0-glusterfs-fuse: 20: LOOKUP() /foo => -1 (Transport endpoint is not > connected) > [2019-03-07 19:20:01.675447] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675470] I [MSGID: 108006] > [afr-common.c:5677:afr_local_init] 0-la1_db_1-replicate-0: no subvolumes up > [2019-03-07 19:20:01.675472] E [MSGID: 101046] > [dht-common.c:1905:dht_revalidate_cbk] 0-la1_db_1-dht: dict is null > [2019-03-07 19:20:01.675485] W [fuse-bridge.c:899:fuse_attr_cbk] > 0-glusterfs-fuse: 21: LOOKUP() / => -1 (Transport endpoint is not connected) > [2019-03-07 19:20:11.675946] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.675989] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676040] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676101] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676138] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-la1_db_1-client-0: changing port to 49152 (from 0) > [2019-03-07 19:20:11.676261] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676271] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-la1_db_1-client-1: changing port to 49152 (from 0) > [2019-03-07 19:20:11.676293] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676358] I [rpc-clnt.c:2105:rpc_clnt_reconfig] > 0-la1_db_1-client-2: changing port to 49152 (from 0) > [2019-03-07 19:20:11.676455] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-0: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676477] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676571] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676684] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-1: error returned while attempting to connect to > host:(null), port:0 > [2019-03-07 19:20:11.676753] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-la1_db_1-client-2: error returned while attempting to connect to > host:(null), port:0 > > > This is a 10G network that is otherwise performing flawlessly, and I'm > seeing no errors on the switches; as mentioned, I've tried this both in our > normal configuration and with iptables flushed. Not entirely sure what to > try next; any suggestions? > > Thanks, > > -j > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190308/f51b0e1f/attachment.html>