TomK
2018-May-08 02:28 UTC
[Gluster-users] volume start: gv01: failed: Quorum not met. Volume operation not allowed.
On 4/11/2018 11:54 AM, Alex K wrote:

Hey Guys,

Returning to this topic: after disabling the quorum:

cluster.quorum-type: none
cluster.server-quorum-type: none

I've run into a number of gluster errors (see below).

I'm using gluster as the backend for my NFS storage. I have gluster running on two nodes, nfs01 and nfs02. It's mounted on /n on each host, and the path /n is in turn shared out by NFS Ganesha. It's a two-node setup with quorum disabled, as noted below:

[root at nfs02 ganesha]# mount|grep gv01
nfs02:/gv01 on /n type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root at nfs01 glusterfs]# mount|grep gv01
nfs01:/gv01 on /n type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

Gluster always reports as working, no matter when I type the two commands below:

[root at nfs01 glusterfs]# gluster volume info

Volume Name: gv01
Type: Replicate
Volume ID: e5ccc75e-5192-45ac-b410-a34ebd777666
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nfs01:/bricks/0/gv01
Brick2: nfs02:/bricks/0/gv01
Options Reconfigured:
cluster.server-quorum-type: none
cluster.quorum-type: none
server.event-threads: 8
client.event-threads: 8
performance.readdir-ahead: on
performance.write-behind-window-size: 8MB
performance.io-thread-count: 16
performance.cache-size: 1GB
nfs.trusted-sync: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet

[root at nfs01 glusterfs]# gluster status
unrecognized word: status (position 0)
[root at nfs01 glusterfs]# gluster volume status
Status of volume: gv01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick nfs01:/bricks/0/gv01                  49152     0          Y       1422
Brick nfs02:/bricks/0/gv01                  49152     0          Y       1422
Self-heal Daemon on localhost               N/A       N/A        Y       1248
Self-heal Daemon on nfs02.nix.my.dom        N/A       N/A        Y       1251

Task Status of Volume gv01
------------------------------------------------------------------------------
There are no active volume tasks

[root at nfs01 glusterfs]#

[root at nfs01 glusterfs]# rpm -aq|grep -Ei gluster
glusterfs-3.13.2-2.el7.x86_64
glusterfs-devel-3.13.2-2.el7.x86_64
glusterfs-fuse-3.13.2-2.el7.x86_64
glusterfs-api-devel-3.13.2-2.el7.x86_64
centos-release-gluster313-1.0-1.el7.centos.noarch
python2-gluster-3.13.2-2.el7.x86_64
glusterfs-client-xlators-3.13.2-2.el7.x86_64
glusterfs-server-3.13.2-2.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.9.x86_64
glusterfs-cli-3.13.2-2.el7.x86_64
centos-release-gluster312-1.0-1.el7.centos.noarch
python2-glusterfs-api-1.1-1.el7.noarch
glusterfs-libs-3.13.2-2.el7.x86_64
glusterfs-extra-xlators-3.13.2-2.el7.x86_64
glusterfs-api-3.13.2-2.el7.x86_64
[root at nfs01 glusterfs]#

The short of it is that everything works, and mounts on guests work, as long as I don't try to write to the NFS share from my clients. If I try to write to the share, everything comes apart like this:

-sh-4.2$ pwd
/n/my.dom/tom
-sh-4.2$ ls -altri
total 6258
11715278280495367299 -rw-------. 1 tom at my.dom tom at my.dom     231 Feb 17 20:15 .bashrc
10937819299152577443 -rw-------. 1 tom at my.dom tom at my.dom     193 Feb 17 20:15 .bash_profile
10823746994379198104 -rw-------. 1 tom at my.dom tom at my.dom      18 Feb 17 20:15 .bash_logout
10718721668898812166 drwxr-xr-x. 3 root       root          4096 Mar  5 02:46 ..
12008425472191154054 drwx------. 2 tom at my.dom tom at my.dom    4096 Mar 18 03:07 .ssh
13763048923429182948 -rw-rw-r--. 1 tom at my.dom tom at my.dom 6359568 Mar 25 22:38 opennebula-cores.tar.gz
11674701370106210511 -rw-rw-r--. 1 tom at my.dom tom at my.dom       4 Apr  9 23:25 meh.txt
 9326637590629964475 -rw-r--r--. 1 tom at my.dom tom at my.dom   24970 May  1 01:30 nfs-trace-working.dat.gz
 9337343577229627320 -rw-------. 1 tom at my.dom tom at my.dom    3734 May  1 23:38 .bash_history
11438151930727967183 drwx------. 3 tom at my.dom tom at my.dom    4096 May  1 23:58 .
 9865389421596220499 -rw-r--r--. 1 tom at my.dom tom at my.dom    4096 May  1 23:58 .meh.txt.swp
-sh-4.2$ touch test.txt
-sh-4.2$ vi test.txt
-sh-4.2$ ls -altri
ls: cannot open directory .: Permission denied
-sh-4.2$ ls -altri
ls: cannot open directory .: Permission denied
-sh-4.2$ ls -altri

This is followed by a slew of other errors in apps using the gluster volume. These errors include:

02/05/2018 23:10:52 : epoch 5aea7bd5 : nfs02.nix.my.dom : ganesha.nfsd-5891[svc_12] nfs_rpc_process_request :DISP :INFO :Could not authenticate request... rejecting with AUTH_STAT=RPCSEC_GSS_CREDPROBLEM

==> ganesha-gfapi.log <==
[2018-05-03 04:32:18.009245] I [MSGID: 114021] [client.c:2369:notify] 0-gv01-client-0: current graph is no longer active, destroying rpc_client
[2018-05-03 04:32:18.009338] I [MSGID: 114021] [client.c:2369:notify] 0-gv01-client-1: current graph is no longer active, destroying rpc_client
[2018-05-03 04:32:18.009499] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-gv01-client-0: disconnected from gv01-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-05-03 04:32:18.009557] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-gv01-client-1: disconnected from gv01-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-05-03 04:32:18.009610] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.

[2018-05-01 22:43:06.412067] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-05-01 22:43:55.554833] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket

So I'm wondering: is this due to the two-node gluster setup, as it seems to be, and what is it that I really need to do here? Should I go with the recommended 3-node setup, which would include a proper quorum, to avoid this? Or is there more to this, and it really doesn't matter that I have a 2-node gluster cluster without a quorum, with the problem being due to something else entirely?

Again, any time I check the gluster volumes, everything checks out. The results of both 'gluster volume info' and 'gluster volume status' are always as I pasted above, fully working.

I'm also using the Linux KDC, FreeIPA, with this solution as well.

--
Cheers,
Tom K.
-------------------------------------------------------------------------------------
Living on earth is expensive, but it includes a free trip around the sun.
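For reference, the quorum options above would normally have been set with the standard volume-set commands, and the volume is mounted over FUSE on each node. A minimal sketch, assuming the hostnames and volume name shown above (the exact commands aren't in this thread):

[root at nfs01 ~]# gluster volume set gv01 cluster.quorum-type none
[root at nfs01 ~]# gluster volume set gv01 cluster.server-quorum-type none
[root at nfs01 ~]# mount -t glusterfs nfs01:/gv01 /n

The ganesha-gfapi.log entries further down suggest Ganesha reaches the volume through the GLUSTER FSAL (gfapi) rather than re-exporting the FUSE mount; a minimal export block in ganesha.conf typically looks something like the following (values are illustrative, not taken from this setup):

EXPORT {
    Export_Id = 1;
    Path = "/";
    Pseudo = "/n";
    Access_Type = RW;
    Squash = No_Root_Squash;
    FSAL {
        Name = GLUSTER;
        Hostname = "nfs01";
        Volume = "gv01";
    }
}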
[root at nfs01 glusterfs]# cat /etc/glusterfs/glusterd.vol volume management type mgmt/glusterd option working-directory /var/lib/glusterd option transport-type socket,rdma option transport.socket.keepalive-time 10 option transport.socket.keepalive-interval 2 option transport.socket.read-fail-log off option ping-timeout 0 option event-threads 1 option cluster.quorum-type none option cluster.server-quorum-type none # option lock-timer 180 # option transport.address-family inet6 # option base-port 49152 # option max-port 65535 end-volume [root at nfs01 glusterfs]# [root at nfs02 glusterfs]# grep -E " E " *.log glusterd.log:[2018-04-30 06:37:51.315618] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-04-30 06:37:51.315696] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-04-30 06:40:37.994481] E [socket.c:2374:socket_connect_finish] 0-management: connection to 192.168.0.131:24007 failed (Connection refused); disconnecting socket glusterd.log:[2018-05-01 04:56:19.231954] E [socket.c:2374:socket_connect_finish] 0-management: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket glusterd.log:[2018-05-01 22:43:04.195366] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-01 22:43:04.195445] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-02 02:46:32.397585] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-02 02:46:32.397653] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-02 03:16:10.937203] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-02 03:16:10.937261] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-02 03:57:20.918315] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-02 03:57:20.918400] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-05 01:37:24.981265] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-05 01:37:24.981346] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-07 03:04:20.053473] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-07 03:04:20.053553] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 
listeners failed, continuing with succeeded transport glustershd.log:[2018-04-30 06:37:53.671466] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. glustershd.log:[2018-04-30 06:40:41.694799] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. glustershd.log:[2018-05-01 04:55:57.191783] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket glustershd.log:[2018-05-01 05:10:55.207027] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. glustershd.log:[2018-05-01 22:43:06.313941] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. glustershd.log:[2018-05-02 03:16:12.884697] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. n.log:[2018-05-01 04:56:01.191877] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket n.log:[2018-05-01 05:10:56.448375] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket n.log:[2018-05-01 22:43:06.412067] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. n.log:[2018-05-01 22:43:55.554833] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket n.log:[2018-05-02 03:16:12.919833] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. 
n.log:[2018-05-05 01:38:37.389091] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-05 01:38:37.389171] E [fuse-bridge.c:4271:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) n.log:[2018-05-05 01:38:46.974945] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-05 01:38:46.975012] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 2: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed n.log:[2018-05-05 01:38:47.010671] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-05 01:38:47.010731] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed n.log:[2018-05-07 03:05:48.552793] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-07 03:05:48.552872] E [fuse-bridge.c:4271:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) n.log:[2018-05-07 03:05:56.084586] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-07 03:05:56.084655] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 2: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed n.log:[2018-05-07 03:05:56.148767] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-07 03:05:56.148825] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed [root at nfs02 glusterfs]# ganesha-gfapi.log:[2018-04-08 03:45:25.440067] E [socket.c:2369:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 03:45:28.455560] E [socket.c:2369:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 03:45:29.145764] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 03:51:15.529380] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 03:51:29.754070] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 03:51:40.633012] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 04:36:28.005490] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 04:37:09.038708] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. 
Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 04:37:09.039432] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 04:37:09.044188] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 04:37:09.044484] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 17:17:02.093164] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 17:17:29.123148] E [socket.c:2369:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 17:17:50.135169] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 17:18:03.290346] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 17:18:14.202118] E [socket.c:2369:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 17:19:39.014330] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 17:32:21.714643] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 17:32:21.734187] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null ganesha-gfapi.log:[2018-04-08 20:35:30.005234] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 20:55:29.009144] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 20:57:52.009895] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 21:00:29.004716] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 21:01:01.205704] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. 
Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 21:01:01.209797] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null ganesha-gfapi.log:[2018-04-09 04:41:02.006926] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-10 03:20:40.011967] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-10 03:30:33.057576] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-13 02:13:01.005629] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-14 21:41:18.313290] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-15 03:01:37.005636] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-15 03:02:37.319050] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-15 03:43:02.719856] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-15 20:36:31.143742] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 00:02:38.697700] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:16:38.383945] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:25:30.904382] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:25:57.432071] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-16 05:26:00.122608] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:30:20.172115] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-17 05:07:05.006133] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. 
ganesha-gfapi.log:[2018-04-17 05:08:39.004624] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-20 04:58:55.043976] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-20 05:07:22.762457] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-20 05:09:18.710446] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-20 05:09:21.489724] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:22:16.791636] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:22:16.797525] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:22:16.797565] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-28 07:36:29.927497] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-28 07:36:31.215686] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:36:31.216287] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-30 06:37:02.005127] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-30 06:37:53.985563] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-05-01 04:55:57.191787] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-05-01 05:10:55.595474] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. 
ganesha-gfapi.log:[2018-05-01 05:10:56.620226] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-05-01 22:42:26.005472] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-01 22:43:06.423349] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-05-02 03:16:12.930652] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-05-03 02:43:03.021549] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-03 03:00:01.034676] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-03 03:59:28.006170] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-05 01:38:47.474503] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f327e4d2f0b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f327e297e7e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f327e297f9e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f327e299720] (--> /usr/lib64/glusterfs/3.13.2/xlator/protocol/client.so(fini+0x28)[0x7f32701a7c88] ))))) 0-gv01-client-0: forced unwinding frame type(GlusterFS Handshake) op(SETVOLUME(1)) called at 2018-05-05 01:38:46.968501 (xid=0x5) ganesha-gfapi.log:[2018-05-05 01:38:47.474834] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. 
ganesha-gfapi.log:[2018-05-05 01:38:47.474960] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f327e4d2f0b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f327e297e7e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f327e297f9e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f327e299720] (--> /usr/lib64/glusterfs/3.13.2/xlator/protocol/client.so(fini+0x28)[0x7f32701a7c88] ))))) 0-gv01-client-1: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2018-05-05 01:38:46.965204 (xid=0x2) ganesha-gfapi.log:[2018-05-05 01:38:50.457456] E [rpc-clnt.c:417:rpc_clnt_reconnect] 0-gv01-client-1: Error adding to timer event queue ganesha-gfapi.log:[2018-05-07 03:05:58.522295] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f45af6d6f0b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f45af49be7e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f45af49bf9e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f45af49d720] (--> /usr/lib64/glusterfs/3.13.2/xlator/protocol/client.so(fini+0x28)[0x7f45a0db6c88] ))))) 0-gv01-client-1: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2018-05-07 03:05:56.080210 (xid=0x2) ganesha-gfapi.log:[2018-05-07 03:05:59.504926] E [rpc-clnt.c:417:rpc_clnt_reconnect] 0-gv01-client-1: Error adding to timer event queue ganesha-gfapi.log:[2018-05-07 03:05:59.505274] E [rpc-clnt.c:417:rpc_clnt_reconnect] 0-gv01-client-0: Error adding to timer event queue [root at nfs02 ganesha]# [root at nfs02 ganesha]#> > > On Wed, Apr 11, 2018 at 4:35 AM, TomK <tomkcpr at mdevsys.com > <mailto:tomkcpr at mdevsys.com>> wrote: > > On 4/9/2018 2:45 AM, Alex K wrote: > Hey Alex, > > With two nodes, the setup works but both sides go down when one node > is missing.? Still I set the below two params to none and that > solved my issue: > > cluster.quorum-type: none > cluster.server-quorum-type: none > > yes this disables quorum so as to avoid the issue. Glad that this > helped. Bare in in mind though that it is easier to face split-brain > issues with quorum is disabled, that's why 3 nodes at least are > recommended. Just to note that I have also a 2 node cluster which is > running without issues for long time. > > Thank you for that. > > Cheers, > Tom > > Hi, > > You need 3 nodes at least to have quorum enabled. In 2 node > setup you need to disable quorum so as to be able to still use > the volume when one of the nodes go down. > > On Mon, Apr 9, 2018, 09:02 TomK <tomkcpr at mdevsys.com > <mailto:tomkcpr at mdevsys.com> <mailto:tomkcpr at mdevsys.com > <mailto:tomkcpr at mdevsys.com>>> wrote: > > ? ? Hey All, > > ? ? In a two node glusterfs setup, with one node down, can't > use the second > ? ? node to mount the volume.? I understand this is expected > behaviour? > ? ? Anyway to allow the secondary node to function then > replicate what > ? ? changed to the first (primary) when it's back online?? Or > should I just > ? ? go for a third node to allow for this? > > ? ? Also, how safe is it to set the following to none? > > ? ? cluster.quorum-type: auto > ? ? cluster.server-quorum-type: server > > > ? ? [root at nfs01 /]# gluster volume start gv01 > ? ? volume start: gv01: failed: Quorum not met. Volume > operation not > ? ? allowed. > ? ? [root at nfs01 /]# > > > ? ? [root at nfs01 /]# gluster volume status > ? ? Status of volume: gv01 > ? ? Gluster process? ? ? ? ? ? ? ? ? ? ? ? ? ? ?TCP Port? RDMA > Port? ? ?Online? 
Pid > > ------------------------------------------------------------------------------ > ? ? Brick nfs01:/bricks/0/gv01? ? ? ? ? ? ? ? ? N/A? ? ? ?N/A > ? ? ? N? ? ? ? ? ?N/A > ? ? Self-heal Daemon on localhost? ? ? ? ? ? ? ?N/A? ? ? ?N/A > ? ? ? Y > ? ? 25561 > > ? ? Task Status of Volume gv01 > > ------------------------------------------------------------------------------ > ? ? There are no active volume tasks > > ? ? [root at nfs01 /]# > > > ? ? [root at nfs01 /]# gluster volume info > > ? ? Volume Name: gv01 > ? ? Type: Replicate > ? ? Volume ID: e5ccc75e-5192-45ac-b410-a34ebd777666 > ? ? Status: Started > ? ? Snapshot Count: 0 > ? ? Number of Bricks: 1 x 2 = 2 > ? ? Transport-type: tcp > ? ? Bricks: > ? ? Brick1: nfs01:/bricks/0/gv01 > ? ? Brick2: nfs02:/bricks/0/gv01 > ? ? Options Reconfigured: > ? ? transport.address-family: inet > ? ? nfs.disable: on > ? ? performance.client-io-threads: off > ? ? nfs.trusted-sync: on > ? ? performance.cache-size: 1GB > ? ? performance.io-thread-count: 16 > ? ? performance.write-behind-window-size: 8MB > ? ? performance.readdir-ahead: on > ? ? client.event-threads: 8 > ? ? server.event-threads: 8 > ? ? cluster.quorum-type: auto > ? ? cluster.server-quorum-type: server > ? ? [root at nfs01 /]# > > > > > ? ? ==> n.log <=> ? ? [2018-04-09 05:08:13.704156] I [MSGID: 100030] > [glusterfsd.c:2556:main] > ? ? 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs > version > ? ? 3.13.2 (args: /usr/sbin/glusterfs --process-name fuse > ? ? --volfile-server=nfs01 --volfile-id=/gv01 /n) > ? ? [2018-04-09 05:08:13.711255] W [MSGID: 101002] > ? ? [options.c:995:xl_opt_validate] 0-glusterfs: option > 'address-family' is > ? ? deprecated, preferred is 'transport.address-family', > continuing with > ? ? correction > ? ? [2018-04-09 05:08:13.728297] W [socket.c:3216:socket_connect] > ? ? 0-glusterfs: Error disabling sockopt IPV6_V6ONLY: "Protocol not > ? ? available" > ? ? [2018-04-09 05:08:13.729025] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 1 > ? ? [2018-04-09 05:08:13.737757] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 2 > ? ? [2018-04-09 05:08:13.738114] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 3 > ? ? [2018-04-09 05:08:13.738203] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 4 > ? ? [2018-04-09 05:08:13.738324] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 5 > ? ? [2018-04-09 05:08:13.738330] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 6 > ? ? [2018-04-09 05:08:13.738655] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 7 > ? ? [2018-04-09 05:08:13.738742] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 8 > ? ? [2018-04-09 05:08:13.739460] W [MSGID: 101174] > ? ? [graph.c:363:_log_if_unknown_option] 0-gv01-readdir-ahead: > option > ? ? 'parallel-readdir' is not recognized > ? ? [2018-04-09 05:08:13.739787] I [MSGID: 114020] > [client.c:2360:notify] > ? ? 0-gv01-client-0: parent translators are ready, attempting > connect on > ? ? transport > ? ? 
[2018-04-09 05:08:13.747040] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > ? ? [2018-04-09 05:08:13.747372] I [MSGID: 114020] > [client.c:2360:notify] > ? ? 0-gv01-client-1: parent translators are ready, attempting > connect on > ? ? transport > ? ? [2018-04-09 05:08:13.747883] E [MSGID: 114058] > ? ? [client-handshake.c:1571:client_query_portmap_cbk] > 0-gv01-client-0: > ? ? failed to get the port number for remote subvolume. Please > run 'gluster > ? ? volume status' on server to see if brick process is running. > ? ? [2018-04-09 05:08:13.748026] I [MSGID: 114018] > ? ? [client.c:2285:client_rpc_notify] 0-gv01-client-0: > disconnected from > ? ? gv01-client-0. Client process will keep trying to connect > to glusterd > ? ? until brick's port is available > ? ? [2018-04-09 05:08:13.748070] W [MSGID: 108001] > ? ? [afr-common.c:5391:afr_notify] 0-gv01-replicate-0: > Client-quorum is > ? ? not met > ? ? [2018-04-09 05:08:13.754493] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > ? ? Final graph: > > +------------------------------------------------------------------------------+ > ? ? ?? ?1: volume gv01-client-0 > ? ? ?? ?2:? ? ?type protocol/client > ? ? ?? ?3:? ? ?option ping-timeout 42 > ? ? ?? ?4:? ? ?option remote-host nfs01 > ? ? ?? ?5:? ? ?option remote-subvolume /bricks/0/gv01 > ? ? ?? ?6:? ? ?option transport-type socket > ? ? ?? ?7:? ? ?option transport.address-family inet > ? ? ?? ?8:? ? ?option username 916ccf06-dc1d-467f-bc3d-f00a7449618f > ? ? ?? ?9:? ? ?option password a44739e0-9587-411f-8e6a-9a6a4e46156c > ? ? ?? 10:? ? ?option event-threads 8 > ? ? ?? 11:? ? ?option transport.tcp-user-timeout 0 > ? ? ?? 12:? ? ?option transport.socket.keepalive-time 20 > ? ? ?? 13:? ? ?option transport.socket.keepalive-interval 2 > ? ? ?? 14:? ? ?option transport.socket.keepalive-count 9 > ? ? ?? 15:? ? ?option send-gids true > ? ? ?? 16: end-volume > ? ? ?? 17: > ? ? ?? 18: volume gv01-client-1 > ? ? ?? 19:? ? ?type protocol/client > ? ? ?? 20:? ? ?option ping-timeout 42 > ? ? ?? 21:? ? ?option remote-host nfs02 > ? ? ?? 22:? ? ?option remote-subvolume /bricks/0/gv01 > ? ? ?? 23:? ? ?option transport-type socket > ? ? ?? 24:? ? ?option transport.address-family inet > ? ? ?? 25:? ? ?option username 916ccf06-dc1d-467f-bc3d-f00a7449618f > ? ? ?? 26:? ? ?option password a44739e0-9587-411f-8e6a-9a6a4e46156c > ? ? ?? 27:? ? ?option event-threads 8 > ? ? ?? 28:? ? ?option transport.tcp-user-timeout 0 > ? ? ?? 29:? ? ?option transport.socket.keepalive-time 20 > ? ? ?? 30:? ? ?option transport.socket.keepalive-interval 2 > ? ? ?? 31:? ? ?option transport.socket.keepalive-count 9 > ? ? ?? 32:? ? ?option send-gids true > ? ? ?? 33: end-volume > ? ? ?? 34: > ? ? ?? 35: volume gv01-replicate-0 > ? ? ?? 36:? ? ?type cluster/replicate > ? ? ?? 37:? ? ?option afr-pending-xattr gv01-client-0,gv01-client-1 > ? ? ?? 38:? ? ?option quorum-type auto > ? ? ?? 39:? ? ?option use-compound-fops off > ? ? ?? 40:? ? ?subvolumes gv01-client-0 gv01-client-1 > ? ? ?? 41: end-volume > ? ? ?? 42: > ? ? ?? 43: volume gv01-dht > ? ? ?? 44:? ? ?type cluster/distribute > ? ? ?? 45:? ? ?option lock-migration off > ? ? ?? 46:? ? ?subvolumes gv01-replicate-0 > ? ? ?? 47: end-volume > ? ? ?? 48: > ? ? ?? 49: volume gv01-write-behind > ? ? ?? 50:? ? ?type performance/write-behind > ? ? ?? 51:? ? ?option cache-size 8MB > ? ? ?? 52:? ? ?subvolumes gv01-dht > ? ? ?? 
53: end-volume > ? ? ?? 54: > ? ? ?? 55: volume gv01-read-ahead > ? ? ?? 56:? ? ?type performance/read-ahead > ? ? ?? 57:? ? ?subvolumes gv01-write-behind > ? ? ?? 58: end-volume > ? ? ?? 59: > ? ? ?? 60: volume gv01-readdir-ahead > ? ? ?? 61:? ? ?type performance/readdir-ahead > ? ? ?? 62:? ? ?option parallel-readdir off > ? ? ?? 63:? ? ?option rda-request-size 131072 > ? ? ?? 64:? ? ?option rda-cache-limit 10MB > ? ? ?? 65:? ? ?subvolumes gv01-read-ahead > ? ? ?? 66: end-volume > ? ? ?? 67: > ? ? ?? 68: volume gv01-io-cache > ? ? ?? 69:? ? ?type performance/io-cache > ? ? ?? 70:? ? ?option cache-size 1GB > ? ? ?? 71:? ? ?subvolumes gv01-readdir-ahead > ? ? ?? 72: end-volume > ? ? ?? 73: > ? ? ?? 74: volume gv01-quick-read > ? ? ?? 75:? ? ?type performance/quick-read > ? ? ?? 76:? ? ?option cache-size 1GB > ? ? ?? 77:? ? ?subvolumes gv01-io-cache > ? ? ?? 78: end-volume > ? ? ?? 79: > ? ? ?? 80: volume gv01-open-behind > ? ? ?? 81:? ? ?type performance/open-behind > ? ? ?? 82:? ? ?subvolumes gv01-quick-read > ? ? ?? 83: end-volume > ? ? ?? 84: > ? ? ?? 85: volume gv01-md-cache > ? ? ?? 86:? ? ?type performance/md-cache > ? ? ?? 87:? ? ?subvolumes gv01-open-behind > ? ? ?? 88: end-volume > ? ? ?? 89: > ? ? ?? 90: volume gv01 > ? ? ?? 91:? ? ?type debug/io-stats > ? ? ?? 92:? ? ?option log-level INFO > ? ? ?? 93:? ? ?option latency-measurement off > ? ? ?? 94:? ? ?option count-fop-hits off > ? ? ?? 95:? ? ?subvolumes gv01-md-cache > ? ? ?? 96: end-volume > ? ? ?? 97: > ? ? ?? 98: volume meta-autoload > ? ? ?? 99:? ? ?type meta > ? ? 100:? ? ?subvolumes gv01 > ? ? 101: end-volume > ? ? 102: > > +------------------------------------------------------------------------------+ > ? ? [2018-04-09 05:08:13.922631] E > [socket.c:2374:socket_connect_finish] > ? ? 0-gv01-client-1: connection to 192.168.0.119:24007 > <http://192.168.0.119:24007> > ? ? <http://192.168.0.119:24007> failed (No route to > > ? ? host); disconnecting socket > ? ? [2018-04-09 05:08:13.922690] E [MSGID: 108006] > ? ? [afr-common.c:5164:__afr_handle_child_down_event] > 0-gv01-replicate-0: > ? ? All subvolumes are down. Going offline until atleast one of > them comes > ? ? back up. > ? ? [2018-04-09 05:08:13.926201] I [fuse-bridge.c:4205:fuse_init] > ? ? 0-glusterfs-fuse: FUSE inited with protocol versions: > glusterfs 7.24 > ? ? kernel 7.22 > ? ? [2018-04-09 05:08:13.926245] I > [fuse-bridge.c:4835:fuse_graph_sync] > ? ? 0-fuse: switched to graph 0 > ? ? [2018-04-09 05:08:13.926518] I [MSGID: 108006] > ? ? [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no > subvolumes up > ? ? [2018-04-09 05:08:13.926671] E [MSGID: 101046] > ? ? [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > ? ? [2018-04-09 05:08:13.926762] E > [fuse-bridge.c:4271:fuse_first_lookup] > ? ? 0-fuse: first lookup on root failed (Transport endpoint is not > ? ? connected) > ? ? [2018-04-09 05:08:13.927207] I [MSGID: 108006] > ? ? [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no > subvolumes up > ? ? [2018-04-09 05:08:13.927262] E [MSGID: 101046] > ? ? [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > ? ? [2018-04-09 05:08:13.927301] W > ? ? [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: > ? ? 00000000-0000-0000-0000-000000000001: failed to resolve > (Transport > ? ? endpoint is not connected) > ? ? [2018-04-09 05:08:13.927339] E > [fuse-bridge.c:900:fuse_getattr_resume] > ? ? 0-glusterfs-fuse: 2: GETATTR 1 > (00000000-0000-0000-0000-000000000001) > ? ? resolution failed > ? ? 
[2018-04-09 05:08:13.931497] I [MSGID: 108006] > ? ? [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no > subvolumes up > ? ? [2018-04-09 05:08:13.931558] E [MSGID: 101046] > ? ? [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > ? ? [2018-04-09 05:08:13.931599] W > ? ? [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: > ? ? 00000000-0000-0000-0000-000000000001: failed to resolve > (Transport > ? ? endpoint is not connected) > ? ? [2018-04-09 05:08:13.931623] E > [fuse-bridge.c:900:fuse_getattr_resume] > ? ? 0-glusterfs-fuse: 3: GETATTR 1 > (00000000-0000-0000-0000-000000000001) > ? ? resolution failed > ? ? [2018-04-09 05:08:13.937258] I > [fuse-bridge.c:5093:fuse_thread_proc] > ? ? 0-fuse: initating unmount of /n > ? ? [2018-04-09 05:08:13.938043] W > [glusterfsd.c:1393:cleanup_and_exit] > ? ? (-->/lib64/libpthread.so.0(+0x7e25) [0x7fb80b05ae25] > ? ? -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) > [0x560b52471675] > ? ? -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) > [0x560b5247149b] ) 0-: > ? ? received signum (15), shutting down > ? ? [2018-04-09 05:08:13.938086] I [fuse-bridge.c:5855:fini] > 0-fuse: > ? ? Unmounting '/n'. > ? ? [2018-04-09 05:08:13.938106] I [fuse-bridge.c:5860:fini] > 0-fuse: Closing > ? ? fuse connection to '/n'. > > ? ? ==> glusterd.log <=> ? ? [2018-04-09 05:08:15.118078] W [socket.c:3216:socket_connect] > ? ? 0-management: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > > ? ? ==> glustershd.log <=> ? ? [2018-04-09 05:08:15.282192] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > ? ? [2018-04-09 05:08:15.289508] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > > > > > > > > ? ? -- > ? ? Cheers, > ? ? Tom K. > > ------------------------------------------------------------------------------------- > > ? ? Living on earth is expensive, but it includes a free trip > around the > ? ? sun. > > ? ? _______________________________________________ > ? ? Gluster-users mailing list > Gluster-users at gluster.org <mailto:Gluster-users at gluster.org> > <mailto:Gluster-users at gluster.org > <mailto:Gluster-users at gluster.org>> > http://lists.gluster.org/mailman/listinfo/gluster-users > <http://lists.gluster.org/mailman/listinfo/gluster-users> > > > > -- > Cheers, > Tom K. > ------------------------------------------------------------------------------------- > > Living on earth is expensive, but it includes a free trip around the > sun. > >
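A side note on the two messages that keep recurring in the logs above, "failed to get the port number for remote subvolume" and "connection to ...:49152 failed (Connection refused)": they generally just mean that glusterd or the brick process on the named peer was unreachable at that moment. A quick way to confirm from either node is something like the following (a sketch, not output from this cluster; 24007 is glusterd and 49152 the first brick port):

[root at nfs01 ~]# gluster peer status
[root at nfs01 ~]# gluster volume status gv01
[root at nfs01 ~]# ss -tlnp | grep -E ':(24007|49152)'
[root at nfs01 ~]# gluster volume heal gv01 info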
TomK
2018-May-22 05:43 UTC
[Gluster-users] [SOLVED] [Nfs-ganesha-support] volume start: gv01: failed: Quorum not met. Volume operation not allowed.
Hey All,

It appears I've solved this one, and NFS mounts now work on all my clients. No issues since fixing it a few hours back.

RESOLUTION

SELinux was to blame for the trouble (auditd just recorded the denials). I noticed this in the logs on 2 of the 3 NFS servers (nfs01, nfs02, nfs03):

type=AVC msg=audit(1526965320.850:4094): avc: denied { write } for pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4094): arch=c000003e syscall=2 success=no exit=-13 a0=7f23b0003150 a1=2 a2=180 a3=2 items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4094): proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54
type=AVC msg=audit(1526965320.850:4095): avc: denied { unlink } for pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4095): arch=c000003e syscall=87 success=no exit=-13 a0=7f23b0004100 a1=7f23b0000050 a2=7f23b0004100 a3=5 items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4095): proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54

The fix was to adjust the SELinux rules using audit2allow (a sketch of the steps is included at the end of this message). All the errors below, including the ones described at the links below, were due to that.

It turns out that whenever it worked, it was hitting the only working server in the system, nfs03. Whenever it didn't work, it was hitting one of the broken servers, which is why it sometimes worked and other times didn't. It looked like HAProxy / Keepalived were involved as well, since I couldn't mount using the VIP but could using the host directly, but that wasn't the case either. I had also added a third brick, on nfs03, to the Gluster volume to see whether the backend FS was to blame, since GlusterFS recommends a minimum of 3 bricks for replication, but that had no effect.

In case anyone runs into this, I've added notes here as well:

http://microdevsys.com/wp/kernel-nfs-nfs4_discover_server_trunking-unhandled-error-512-exiting-with-error-eio-and-mount-hangs/

http://microdevsys.com/wp/nfs-reply-xid-3844308326-reply-err-20-auth-rejected-credentials-client-should-begin-new-session/

The errors thrown included:

NFS reply xid 3844308326 reply ERR 20: Auth Rejected Credentials (client should begin new session)

kernel: NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO

plus mount hangs and the kernel exception below.

--
Cheers,
Tom K.
-------------------------------------------------------------------------------------
Living on earth is expensive, but it includes a free trip around the sun.

May 21 23:53:13 psql01 kernel: CPU: 3 PID: 2273 Comm: mount.nfs Tainted: G L ------------ 3.10.0-693.21.1.el7.x86_64 #1
. . .
May 21 23:53:13 psql01 kernel: task: ffff880136335ee0 ti: ffff8801376b0000 task.ti: ffff8801376b0000
May 21 23:53:13 psql01 kernel: RIP: 0010:[<ffffffff816b6545>] [<ffffffff816b6545>] _raw_spin_unlock_irqrestore+0x15/0x20
May 21 23:53:13 psql01 kernel: RSP: 0018:ffff8801376b3a60 EFLAGS: 00000206
May 21 23:53:13 psql01 kernel: RAX: ffffffffc05ab078 RBX: ffff880036973928 RCX: dead000000000200
May 21 23:53:13 psql01 kernel: RDX: ffffffffc05ab078 RSI: 0000000000000206 RDI: 0000000000000206
May 21 23:53:13 psql01 kernel: RBP: ffff8801376b3a60 R08: ffff8801376b3ab8 R09: ffff880137de1200
May 21 23:53:13 psql01 kernel: R10: ffff880036973928 R11: 0000000000000000 R12: ffff880036973928
May 21 23:53:13 psql01 kernel: R13: ffff8801376b3a58 R14: ffff88013fd98a40 R15: ffff8801376b3a58
May 21 23:53:13 psql01 kernel: FS: 00007fab48f07880(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
May 21 23:53:13 psql01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 21 23:53:13 psql01 kernel: CR2: 00007f99793d93cc CR3: 000000013761e000 CR4: 00000000000007e0
May 21 23:53:13 psql01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 21 23:53:13 psql01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 21 23:53:13 psql01 kernel: Call Trace:
May 21 23:53:13 psql01 kernel: [<ffffffff810b4d86>] finish_wait+0x56/0x70
May 21 23:53:13 psql01 kernel: [<ffffffffc0580361>] nfs_wait_client_init_complete+0xa1/0xe0 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffff810b4fc0>] ? wake_up_atomic_t+0x30/0x30
May 21 23:53:13 psql01 kernel: [<ffffffffc0581e9b>] nfs_get_client+0x22b/0x470 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffffc05eafd8>] nfs4_set_client+0x98/0x130 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffffc05ec77e>] nfs4_create_server+0x13e/0x3b0 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffffc05e391e>] nfs4_remote_mount+0x2e/0x60 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffff81209f1e>] mount_fs+0x3e/0x1b0
May 21 23:53:13 psql01 kernel: [<ffffffff811aa685>] ? __alloc_percpu+0x15/0x20
May 21 23:53:13 psql01 kernel: [<ffffffff81226d57>] vfs_kern_mount+0x67/0x110
May 21 23:53:13 psql01 kernel: [<ffffffffc05e3846>] nfs_do_root_mount+0x86/0xc0 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffffc05e3c44>] nfs4_try_mount+0x44/0xc0 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffffc05826d7>] ? get_nfs_version+0x27/0x90 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffffc058ec9b>] nfs_fs_mount+0x4cb/0xda0 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffffc058fbe0>] ? nfs_clone_super+0x140/0x140 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffffc058daa0>] ? param_set_portnr+0x70/0x70 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffff81209f1e>] mount_fs+0x3e/0x1b0
May 21 23:53:13 psql01 kernel: [<ffffffff811aa685>] ? __alloc_percpu+0x15/0x20
May 21 23:53:13 psql01 kernel: [<ffffffff81226d57>] vfs_kern_mount+0x67/0x110
May 21 23:53:13 psql01 kernel: [<ffffffff81229263>] do_mount+0x233/0xaf0
May 21 23:53:13 psql01 kernel: [<ffffffff81229ea6>] SyS_mount+0x96/0xf0
May 21 23:53:13 psql01 kernel: [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
May 21 23:53:13 psql01 kernel: [<ffffffff816c0661>] ? system_call_after_swapgs+0xae/0x146

On 5/7/2018 10:28 PM, TomK wrote:
> This list has been deprecated. Please subscribe to the new support list
> at lists.nfs-ganesha.org.
> [...]
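For completeness, a sketch of the two remediation steps described in the resolution above, assuming the AVC denials look like the ones quoted (the SELinux module name is arbitrary, and whether nfs03 carries a full brick or an arbiter changes the add-brick form):

# Build and load a local SELinux policy module from the ganesha.nfsd denials
[root at nfs01 ~]# grep ganesha.nfsd /var/log/audit/audit.log | audit2allow -M ganesha_nfs
[root at nfs01 ~]# semodule -i ganesha_nfs.pp

# Grow the replica 2 volume to three bricks (plain replica 3 shown; "replica 3 arbiter 1" is the lighter alternative)
[root at nfs01 ~]# gluster volume add-brick gv01 replica 3 nfs03:/bricks/0/gv01
[root at nfs01 ~]# gluster volume heal gv01 full

The audit2allow/semodule step would need to be repeated on each affected NFS server (nfs01 and nfs02 here).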