TomK
2018-May-08 02:28 UTC
[Gluster-users] volume start: gv01: failed: Quorum not met. Volume operation not allowed.
On 4/11/2018 11:54 AM, Alex K wrote:

Hey Guys,

Returning to this topic: after disabling the quorum:

cluster.quorum-type: none
cluster.server-quorum-type: none

I've run into a number of gluster errors (see below).

I'm using gluster as the backend for my NFS storage. I have gluster running on two nodes, nfs01 and nfs02. It's mounted on /n on each host, and the path /n is in turn shared out by NFS Ganesha. It's a two-node setup with quorum disabled, as noted below:

[root at nfs02 ganesha]# mount|grep gv01
nfs02:/gv01 on /n type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root at nfs01 glusterfs]# mount|grep gv01
nfs01:/gv01 on /n type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

Gluster always reports as working, no matter when I type the two commands below:

[root at nfs01 glusterfs]# gluster volume info

Volume Name: gv01
Type: Replicate
Volume ID: e5ccc75e-5192-45ac-b410-a34ebd777666
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nfs01:/bricks/0/gv01
Brick2: nfs02:/bricks/0/gv01
Options Reconfigured:
cluster.server-quorum-type: none
cluster.quorum-type: none
server.event-threads: 8
client.event-threads: 8
performance.readdir-ahead: on
performance.write-behind-window-size: 8MB
performance.io-thread-count: 16
performance.cache-size: 1GB
nfs.trusted-sync: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet

[root at nfs01 glusterfs]# gluster status
unrecognized word: status (position 0)
[root at nfs01 glusterfs]# gluster volume status
Status of volume: gv01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick nfs01:/bricks/0/gv01                  49152     0          Y       1422
Brick nfs02:/bricks/0/gv01                  49152     0          Y       1422
Self-heal Daemon on localhost               N/A       N/A        Y       1248
Self-heal Daemon on nfs02.nix.my.dom        N/A       N/A        Y       1251

Task Status of Volume gv01
------------------------------------------------------------------------------
There are no active volume tasks

[root at nfs01 glusterfs]#

[root at nfs01 glusterfs]# rpm -aq|grep -Ei gluster
glusterfs-3.13.2-2.el7.x86_64
glusterfs-devel-3.13.2-2.el7.x86_64
glusterfs-fuse-3.13.2-2.el7.x86_64
glusterfs-api-devel-3.13.2-2.el7.x86_64
centos-release-gluster313-1.0-1.el7.centos.noarch
python2-gluster-3.13.2-2.el7.x86_64
glusterfs-client-xlators-3.13.2-2.el7.x86_64
glusterfs-server-3.13.2-2.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.9.x86_64
glusterfs-cli-3.13.2-2.el7.x86_64
centos-release-gluster312-1.0-1.el7.centos.noarch
python2-glusterfs-api-1.1-1.el7.noarch
glusterfs-libs-3.13.2-2.el7.x86_64
glusterfs-extra-xlators-3.13.2-2.el7.x86_64
glusterfs-api-3.13.2-2.el7.x86_64
[root at nfs01 glusterfs]#

The short of it is that everything works, and mounts on guests work, as long as I don't try to write to the NFS share from my clients. If I try to write to the share, everything comes apart like this:

-sh-4.2$ pwd
/n/my.dom/tom
-sh-4.2$ ls -altri
total 6258
11715278280495367299 -rw-------. 1 tom at my.dom tom at my.dom     231 Feb 17 20:15 .bashrc
10937819299152577443 -rw-------. 1 tom at my.dom tom at my.dom     193 Feb 17 20:15 .bash_profile
10823746994379198104 -rw-------. 1 tom at my.dom tom at my.dom      18 Feb 17 20:15 .bash_logout
10718721668898812166 drwxr-xr-x. 3 root       root          4096 Mar  5 02:46 ..
12008425472191154054 drwx------. 2 tom at my.dom tom at my.dom    4096 Mar 18 03:07 .ssh
13763048923429182948 -rw-rw-r--. 1 tom at my.dom tom at my.dom 6359568 Mar 25 22:38 opennebula-cores.tar.gz
11674701370106210511 -rw-rw-r--. 1 tom at my.dom tom at my.dom       4 Apr  9 23:25 meh.txt
 9326637590629964475 -rw-r--r--. 1 tom at my.dom tom at my.dom   24970 May  1 01:30 nfs-trace-working.dat.gz
 9337343577229627320 -rw-------. 1 tom at my.dom tom at my.dom    3734 May  1 23:38 .bash_history
11438151930727967183 drwx------. 3 tom at my.dom tom at my.dom    4096 May  1 23:58 .
 9865389421596220499 -rw-r--r--. 1 tom at my.dom tom at my.dom    4096 May  1 23:58 .meh.txt.swp
-sh-4.2$ touch test.txt
-sh-4.2$ vi test.txt
-sh-4.2$ ls -altri
ls: cannot open directory .: Permission denied
-sh-4.2$ ls -altri
ls: cannot open directory .: Permission denied
-sh-4.2$ ls -altri

This is followed by a slew of other errors in apps using the gluster volume. These errors include:

02/05/2018 23:10:52 : epoch 5aea7bd5 : nfs02.nix.my.dom : ganesha.nfsd-5891[svc_12] nfs_rpc_process_request :DISP :INFO :Could not authenticate request... rejecting with AUTH_STAT=RPCSEC_GSS_CREDPROBLEM

==> ganesha-gfapi.log <==
[2018-05-03 04:32:18.009245] I [MSGID: 114021] [client.c:2369:notify] 0-gv01-client-0: current graph is no longer active, destroying rpc_client
[2018-05-03 04:32:18.009338] I [MSGID: 114021] [client.c:2369:notify] 0-gv01-client-1: current graph is no longer active, destroying rpc_client
[2018-05-03 04:32:18.009499] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-gv01-client-0: disconnected from gv01-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-05-03 04:32:18.009557] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-gv01-client-1: disconnected from gv01-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-05-03 04:32:18.009610] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.

[2018-05-01 22:43:06.412067] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-05-01 22:43:55.554833] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket

So I'm wondering: is this due to the two-node gluster setup, as it seems to be, and what is it that I really need to do here? Should I go with the recommended 3-node setup, which would include a proper quorum, to avoid this? Or is there more to this, and it really doesn't matter that I have a 2-node gluster cluster without a quorum, with the problem being due to something else entirely?

Again, any time I check the gluster volumes, everything checks out. The results of both 'gluster volume info' and 'gluster volume status' are always as I pasted above, fully working.

I'm also using the Linux KDC, FreeIPA, with this solution as well.

--
Cheers,
Tom K.
-------------------------------------------------------------------------------------
Living on earth is expensive, but it includes a free trip around the sun.
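For reference, the quorum options above would normally have been set with the standard volume-set commands, and the volume is mounted over FUSE on each node. A minimal sketch, assuming the hostnames and volume name shown above (the exact commands aren't in this thread):

[root at nfs01 ~]# gluster volume set gv01 cluster.quorum-type none
[root at nfs01 ~]# gluster volume set gv01 cluster.server-quorum-type none
[root at nfs01 ~]# mount -t glusterfs nfs01:/gv01 /n

The ganesha-gfapi.log entries further down suggest Ganesha reaches the volume through the GLUSTER FSAL (gfapi) rather than re-exporting the FUSE mount; a minimal export block in ganesha.conf typically looks something like the following (values are illustrative, not taken from this setup):

EXPORT {
    Export_Id = 1;
    Path = "/";
    Pseudo = "/n";
    Access_Type = RW;
    Squash = No_Root_Squash;
    FSAL {
        Name = GLUSTER;
        Hostname = "nfs01";
        Volume = "gv01";
    }
}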
[root at nfs01 glusterfs]# cat /etc/glusterfs/glusterd.vol volume management type mgmt/glusterd option working-directory /var/lib/glusterd option transport-type socket,rdma option transport.socket.keepalive-time 10 option transport.socket.keepalive-interval 2 option transport.socket.read-fail-log off option ping-timeout 0 option event-threads 1 option cluster.quorum-type none option cluster.server-quorum-type none # option lock-timer 180 # option transport.address-family inet6 # option base-port 49152 # option max-port 65535 end-volume [root at nfs01 glusterfs]# [root at nfs02 glusterfs]# grep -E " E " *.log glusterd.log:[2018-04-30 06:37:51.315618] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-04-30 06:37:51.315696] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-04-30 06:40:37.994481] E [socket.c:2374:socket_connect_finish] 0-management: connection to 192.168.0.131:24007 failed (Connection refused); disconnecting socket glusterd.log:[2018-05-01 04:56:19.231954] E [socket.c:2374:socket_connect_finish] 0-management: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket glusterd.log:[2018-05-01 22:43:04.195366] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-01 22:43:04.195445] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-02 02:46:32.397585] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-02 02:46:32.397653] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-02 03:16:10.937203] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-02 03:16:10.937261] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-02 03:57:20.918315] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-02 03:57:20.918400] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-05 01:37:24.981265] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-05 01:37:24.981346] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-07 03:04:20.053473] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-07 03:04:20.053553] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 
listeners failed, continuing with succeeded transport glustershd.log:[2018-04-30 06:37:53.671466] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. glustershd.log:[2018-04-30 06:40:41.694799] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. glustershd.log:[2018-05-01 04:55:57.191783] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket glustershd.log:[2018-05-01 05:10:55.207027] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. glustershd.log:[2018-05-01 22:43:06.313941] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. glustershd.log:[2018-05-02 03:16:12.884697] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. n.log:[2018-05-01 04:56:01.191877] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket n.log:[2018-05-01 05:10:56.448375] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket n.log:[2018-05-01 22:43:06.412067] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. n.log:[2018-05-01 22:43:55.554833] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket n.log:[2018-05-02 03:16:12.919833] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. 
n.log:[2018-05-05 01:38:37.389091] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-05 01:38:37.389171] E [fuse-bridge.c:4271:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) n.log:[2018-05-05 01:38:46.974945] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-05 01:38:46.975012] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 2: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed n.log:[2018-05-05 01:38:47.010671] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-05 01:38:47.010731] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed n.log:[2018-05-07 03:05:48.552793] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-07 03:05:48.552872] E [fuse-bridge.c:4271:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) n.log:[2018-05-07 03:05:56.084586] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-07 03:05:56.084655] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 2: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed n.log:[2018-05-07 03:05:56.148767] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-07 03:05:56.148825] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed [root at nfs02 glusterfs]# ganesha-gfapi.log:[2018-04-08 03:45:25.440067] E [socket.c:2369:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 03:45:28.455560] E [socket.c:2369:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 03:45:29.145764] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 03:51:15.529380] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 03:51:29.754070] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 03:51:40.633012] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 04:36:28.005490] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 04:37:09.038708] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. 
Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 04:37:09.039432] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 04:37:09.044188] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 04:37:09.044484] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 17:17:02.093164] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 17:17:29.123148] E [socket.c:2369:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 17:17:50.135169] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 17:18:03.290346] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 17:18:14.202118] E [socket.c:2369:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 17:19:39.014330] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 17:32:21.714643] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 17:32:21.734187] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null ganesha-gfapi.log:[2018-04-08 20:35:30.005234] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 20:55:29.009144] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 20:57:52.009895] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 21:00:29.004716] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 21:01:01.205704] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. 
Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 21:01:01.209797] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null ganesha-gfapi.log:[2018-04-09 04:41:02.006926] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-10 03:20:40.011967] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-10 03:30:33.057576] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-13 02:13:01.005629] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-14 21:41:18.313290] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-15 03:01:37.005636] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-15 03:02:37.319050] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-15 03:43:02.719856] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-15 20:36:31.143742] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 00:02:38.697700] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:16:38.383945] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:25:30.904382] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:25:57.432071] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-16 05:26:00.122608] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:30:20.172115] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-17 05:07:05.006133] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. 
ganesha-gfapi.log:[2018-04-17 05:08:39.004624] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-20 04:58:55.043976] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-20 05:07:22.762457] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-20 05:09:18.710446] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-20 05:09:21.489724] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:22:16.791636] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:22:16.797525] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:22:16.797565] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-28 07:36:29.927497] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-28 07:36:31.215686] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:36:31.216287] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-30 06:37:02.005127] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-30 06:37:53.985563] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-05-01 04:55:57.191787] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-05-01 05:10:55.595474] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. 
ganesha-gfapi.log:[2018-05-01 05:10:56.620226] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-05-01 22:42:26.005472] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-01 22:43:06.423349] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-05-02 03:16:12.930652] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-05-03 02:43:03.021549] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-03 03:00:01.034676] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-03 03:59:28.006170] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-05 01:38:47.474503] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f327e4d2f0b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f327e297e7e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f327e297f9e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f327e299720] (--> /usr/lib64/glusterfs/3.13.2/xlator/protocol/client.so(fini+0x28)[0x7f32701a7c88] ))))) 0-gv01-client-0: forced unwinding frame type(GlusterFS Handshake) op(SETVOLUME(1)) called at 2018-05-05 01:38:46.968501 (xid=0x5) ganesha-gfapi.log:[2018-05-05 01:38:47.474834] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. 
ganesha-gfapi.log:[2018-05-05 01:38:47.474960] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f327e4d2f0b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f327e297e7e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f327e297f9e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f327e299720] (--> /usr/lib64/glusterfs/3.13.2/xlator/protocol/client.so(fini+0x28)[0x7f32701a7c88] ))))) 0-gv01-client-1: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2018-05-05 01:38:46.965204 (xid=0x2) ganesha-gfapi.log:[2018-05-05 01:38:50.457456] E [rpc-clnt.c:417:rpc_clnt_reconnect] 0-gv01-client-1: Error adding to timer event queue ganesha-gfapi.log:[2018-05-07 03:05:58.522295] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f45af6d6f0b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f45af49be7e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f45af49bf9e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f45af49d720] (--> /usr/lib64/glusterfs/3.13.2/xlator/protocol/client.so(fini+0x28)[0x7f45a0db6c88] ))))) 0-gv01-client-1: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2018-05-07 03:05:56.080210 (xid=0x2) ganesha-gfapi.log:[2018-05-07 03:05:59.504926] E [rpc-clnt.c:417:rpc_clnt_reconnect] 0-gv01-client-1: Error adding to timer event queue ganesha-gfapi.log:[2018-05-07 03:05:59.505274] E [rpc-clnt.c:417:rpc_clnt_reconnect] 0-gv01-client-0: Error adding to timer event queue [root at nfs02 ganesha]# [root at nfs02 ganesha]#> > > On Wed, Apr 11, 2018 at 4:35 AM, TomK <tomkcpr at mdevsys.com > <mailto:tomkcpr at mdevsys.com>> wrote: > > On 4/9/2018 2:45 AM, Alex K wrote: > Hey Alex, > > With two nodes, the setup works but both sides go down when one node > is missing.? Still I set the below two params to none and that > solved my issue: > > cluster.quorum-type: none > cluster.server-quorum-type: none > > yes this disables quorum so as to avoid the issue. Glad that this > helped. Bare in in mind though that it is easier to face split-brain > issues with quorum is disabled, that's why 3 nodes at least are > recommended. Just to note that I have also a 2 node cluster which is > running without issues for long time. > > Thank you for that. > > Cheers, > Tom > > Hi, > > You need 3 nodes at least to have quorum enabled. In 2 node > setup you need to disable quorum so as to be able to still use > the volume when one of the nodes go down. > > On Mon, Apr 9, 2018, 09:02 TomK <tomkcpr at mdevsys.com > <mailto:tomkcpr at mdevsys.com> <mailto:tomkcpr at mdevsys.com > <mailto:tomkcpr at mdevsys.com>>> wrote: > > ? ? Hey All, > > ? ? In a two node glusterfs setup, with one node down, can't > use the second > ? ? node to mount the volume.? I understand this is expected > behaviour? > ? ? Anyway to allow the secondary node to function then > replicate what > ? ? changed to the first (primary) when it's back online?? Or > should I just > ? ? go for a third node to allow for this? > > ? ? Also, how safe is it to set the following to none? > > ? ? cluster.quorum-type: auto > ? ? cluster.server-quorum-type: server > > > ? ? [root at nfs01 /]# gluster volume start gv01 > ? ? volume start: gv01: failed: Quorum not met. Volume > operation not > ? ? allowed. > ? ? [root at nfs01 /]# > > > ? ? [root at nfs01 /]# gluster volume status > ? ? Status of volume: gv01 > ? ? Gluster process? ? ? ? ? ? ? ? ? ? ? ? ? ? ?TCP Port? RDMA > Port? ? ?Online? 
Pid > > ------------------------------------------------------------------------------ > ? ? Brick nfs01:/bricks/0/gv01? ? ? ? ? ? ? ? ? N/A? ? ? ?N/A > ? ? ? N? ? ? ? ? ?N/A > ? ? Self-heal Daemon on localhost? ? ? ? ? ? ? ?N/A? ? ? ?N/A > ? ? ? Y > ? ? 25561 > > ? ? Task Status of Volume gv01 > > ------------------------------------------------------------------------------ > ? ? There are no active volume tasks > > ? ? [root at nfs01 /]# > > > ? ? [root at nfs01 /]# gluster volume info > > ? ? Volume Name: gv01 > ? ? Type: Replicate > ? ? Volume ID: e5ccc75e-5192-45ac-b410-a34ebd777666 > ? ? Status: Started > ? ? Snapshot Count: 0 > ? ? Number of Bricks: 1 x 2 = 2 > ? ? Transport-type: tcp > ? ? Bricks: > ? ? Brick1: nfs01:/bricks/0/gv01 > ? ? Brick2: nfs02:/bricks/0/gv01 > ? ? Options Reconfigured: > ? ? transport.address-family: inet > ? ? nfs.disable: on > ? ? performance.client-io-threads: off > ? ? nfs.trusted-sync: on > ? ? performance.cache-size: 1GB > ? ? performance.io-thread-count: 16 > ? ? performance.write-behind-window-size: 8MB > ? ? performance.readdir-ahead: on > ? ? client.event-threads: 8 > ? ? server.event-threads: 8 > ? ? cluster.quorum-type: auto > ? ? cluster.server-quorum-type: server > ? ? [root at nfs01 /]# > > > > > ? ? ==> n.log <=> ? ? [2018-04-09 05:08:13.704156] I [MSGID: 100030] > [glusterfsd.c:2556:main] > ? ? 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs > version > ? ? 3.13.2 (args: /usr/sbin/glusterfs --process-name fuse > ? ? --volfile-server=nfs01 --volfile-id=/gv01 /n) > ? ? [2018-04-09 05:08:13.711255] W [MSGID: 101002] > ? ? [options.c:995:xl_opt_validate] 0-glusterfs: option > 'address-family' is > ? ? deprecated, preferred is 'transport.address-family', > continuing with > ? ? correction > ? ? [2018-04-09 05:08:13.728297] W [socket.c:3216:socket_connect] > ? ? 0-glusterfs: Error disabling sockopt IPV6_V6ONLY: "Protocol not > ? ? available" > ? ? [2018-04-09 05:08:13.729025] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 1 > ? ? [2018-04-09 05:08:13.737757] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 2 > ? ? [2018-04-09 05:08:13.738114] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 3 > ? ? [2018-04-09 05:08:13.738203] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 4 > ? ? [2018-04-09 05:08:13.738324] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 5 > ? ? [2018-04-09 05:08:13.738330] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 6 > ? ? [2018-04-09 05:08:13.738655] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 7 > ? ? [2018-04-09 05:08:13.738742] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 8 > ? ? [2018-04-09 05:08:13.739460] W [MSGID: 101174] > ? ? [graph.c:363:_log_if_unknown_option] 0-gv01-readdir-ahead: > option > ? ? 'parallel-readdir' is not recognized > ? ? [2018-04-09 05:08:13.739787] I [MSGID: 114020] > [client.c:2360:notify] > ? ? 0-gv01-client-0: parent translators are ready, attempting > connect on > ? ? transport > ? ? 
[2018-04-09 05:08:13.747040] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > ? ? [2018-04-09 05:08:13.747372] I [MSGID: 114020] > [client.c:2360:notify] > ? ? 0-gv01-client-1: parent translators are ready, attempting > connect on > ? ? transport > ? ? [2018-04-09 05:08:13.747883] E [MSGID: 114058] > ? ? [client-handshake.c:1571:client_query_portmap_cbk] > 0-gv01-client-0: > ? ? failed to get the port number for remote subvolume. Please > run 'gluster > ? ? volume status' on server to see if brick process is running. > ? ? [2018-04-09 05:08:13.748026] I [MSGID: 114018] > ? ? [client.c:2285:client_rpc_notify] 0-gv01-client-0: > disconnected from > ? ? gv01-client-0. Client process will keep trying to connect > to glusterd > ? ? until brick's port is available > ? ? [2018-04-09 05:08:13.748070] W [MSGID: 108001] > ? ? [afr-common.c:5391:afr_notify] 0-gv01-replicate-0: > Client-quorum is > ? ? not met > ? ? [2018-04-09 05:08:13.754493] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > ? ? Final graph: > > +------------------------------------------------------------------------------+ > ? ? ?? ?1: volume gv01-client-0 > ? ? ?? ?2:? ? ?type protocol/client > ? ? ?? ?3:? ? ?option ping-timeout 42 > ? ? ?? ?4:? ? ?option remote-host nfs01 > ? ? ?? ?5:? ? ?option remote-subvolume /bricks/0/gv01 > ? ? ?? ?6:? ? ?option transport-type socket > ? ? ?? ?7:? ? ?option transport.address-family inet > ? ? ?? ?8:? ? ?option username 916ccf06-dc1d-467f-bc3d-f00a7449618f > ? ? ?? ?9:? ? ?option password a44739e0-9587-411f-8e6a-9a6a4e46156c > ? ? ?? 10:? ? ?option event-threads 8 > ? ? ?? 11:? ? ?option transport.tcp-user-timeout 0 > ? ? ?? 12:? ? ?option transport.socket.keepalive-time 20 > ? ? ?? 13:? ? ?option transport.socket.keepalive-interval 2 > ? ? ?? 14:? ? ?option transport.socket.keepalive-count 9 > ? ? ?? 15:? ? ?option send-gids true > ? ? ?? 16: end-volume > ? ? ?? 17: > ? ? ?? 18: volume gv01-client-1 > ? ? ?? 19:? ? ?type protocol/client > ? ? ?? 20:? ? ?option ping-timeout 42 > ? ? ?? 21:? ? ?option remote-host nfs02 > ? ? ?? 22:? ? ?option remote-subvolume /bricks/0/gv01 > ? ? ?? 23:? ? ?option transport-type socket > ? ? ?? 24:? ? ?option transport.address-family inet > ? ? ?? 25:? ? ?option username 916ccf06-dc1d-467f-bc3d-f00a7449618f > ? ? ?? 26:? ? ?option password a44739e0-9587-411f-8e6a-9a6a4e46156c > ? ? ?? 27:? ? ?option event-threads 8 > ? ? ?? 28:? ? ?option transport.tcp-user-timeout 0 > ? ? ?? 29:? ? ?option transport.socket.keepalive-time 20 > ? ? ?? 30:? ? ?option transport.socket.keepalive-interval 2 > ? ? ?? 31:? ? ?option transport.socket.keepalive-count 9 > ? ? ?? 32:? ? ?option send-gids true > ? ? ?? 33: end-volume > ? ? ?? 34: > ? ? ?? 35: volume gv01-replicate-0 > ? ? ?? 36:? ? ?type cluster/replicate > ? ? ?? 37:? ? ?option afr-pending-xattr gv01-client-0,gv01-client-1 > ? ? ?? 38:? ? ?option quorum-type auto > ? ? ?? 39:? ? ?option use-compound-fops off > ? ? ?? 40:? ? ?subvolumes gv01-client-0 gv01-client-1 > ? ? ?? 41: end-volume > ? ? ?? 42: > ? ? ?? 43: volume gv01-dht > ? ? ?? 44:? ? ?type cluster/distribute > ? ? ?? 45:? ? ?option lock-migration off > ? ? ?? 46:? ? ?subvolumes gv01-replicate-0 > ? ? ?? 47: end-volume > ? ? ?? 48: > ? ? ?? 49: volume gv01-write-behind > ? ? ?? 50:? ? ?type performance/write-behind > ? ? ?? 51:? ? ?option cache-size 8MB > ? ? ?? 52:? ? ?subvolumes gv01-dht > ? ? ?? 
53: end-volume > ? ? ?? 54: > ? ? ?? 55: volume gv01-read-ahead > ? ? ?? 56:? ? ?type performance/read-ahead > ? ? ?? 57:? ? ?subvolumes gv01-write-behind > ? ? ?? 58: end-volume > ? ? ?? 59: > ? ? ?? 60: volume gv01-readdir-ahead > ? ? ?? 61:? ? ?type performance/readdir-ahead > ? ? ?? 62:? ? ?option parallel-readdir off > ? ? ?? 63:? ? ?option rda-request-size 131072 > ? ? ?? 64:? ? ?option rda-cache-limit 10MB > ? ? ?? 65:? ? ?subvolumes gv01-read-ahead > ? ? ?? 66: end-volume > ? ? ?? 67: > ? ? ?? 68: volume gv01-io-cache > ? ? ?? 69:? ? ?type performance/io-cache > ? ? ?? 70:? ? ?option cache-size 1GB > ? ? ?? 71:? ? ?subvolumes gv01-readdir-ahead > ? ? ?? 72: end-volume > ? ? ?? 73: > ? ? ?? 74: volume gv01-quick-read > ? ? ?? 75:? ? ?type performance/quick-read > ? ? ?? 76:? ? ?option cache-size 1GB > ? ? ?? 77:? ? ?subvolumes gv01-io-cache > ? ? ?? 78: end-volume > ? ? ?? 79: > ? ? ?? 80: volume gv01-open-behind > ? ? ?? 81:? ? ?type performance/open-behind > ? ? ?? 82:? ? ?subvolumes gv01-quick-read > ? ? ?? 83: end-volume > ? ? ?? 84: > ? ? ?? 85: volume gv01-md-cache > ? ? ?? 86:? ? ?type performance/md-cache > ? ? ?? 87:? ? ?subvolumes gv01-open-behind > ? ? ?? 88: end-volume > ? ? ?? 89: > ? ? ?? 90: volume gv01 > ? ? ?? 91:? ? ?type debug/io-stats > ? ? ?? 92:? ? ?option log-level INFO > ? ? ?? 93:? ? ?option latency-measurement off > ? ? ?? 94:? ? ?option count-fop-hits off > ? ? ?? 95:? ? ?subvolumes gv01-md-cache > ? ? ?? 96: end-volume > ? ? ?? 97: > ? ? ?? 98: volume meta-autoload > ? ? ?? 99:? ? ?type meta > ? ? 100:? ? ?subvolumes gv01 > ? ? 101: end-volume > ? ? 102: > > +------------------------------------------------------------------------------+ > ? ? [2018-04-09 05:08:13.922631] E > [socket.c:2374:socket_connect_finish] > ? ? 0-gv01-client-1: connection to 192.168.0.119:24007 > <http://192.168.0.119:24007> > ? ? <http://192.168.0.119:24007> failed (No route to > > ? ? host); disconnecting socket > ? ? [2018-04-09 05:08:13.922690] E [MSGID: 108006] > ? ? [afr-common.c:5164:__afr_handle_child_down_event] > 0-gv01-replicate-0: > ? ? All subvolumes are down. Going offline until atleast one of > them comes > ? ? back up. > ? ? [2018-04-09 05:08:13.926201] I [fuse-bridge.c:4205:fuse_init] > ? ? 0-glusterfs-fuse: FUSE inited with protocol versions: > glusterfs 7.24 > ? ? kernel 7.22 > ? ? [2018-04-09 05:08:13.926245] I > [fuse-bridge.c:4835:fuse_graph_sync] > ? ? 0-fuse: switched to graph 0 > ? ? [2018-04-09 05:08:13.926518] I [MSGID: 108006] > ? ? [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no > subvolumes up > ? ? [2018-04-09 05:08:13.926671] E [MSGID: 101046] > ? ? [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > ? ? [2018-04-09 05:08:13.926762] E > [fuse-bridge.c:4271:fuse_first_lookup] > ? ? 0-fuse: first lookup on root failed (Transport endpoint is not > ? ? connected) > ? ? [2018-04-09 05:08:13.927207] I [MSGID: 108006] > ? ? [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no > subvolumes up > ? ? [2018-04-09 05:08:13.927262] E [MSGID: 101046] > ? ? [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > ? ? [2018-04-09 05:08:13.927301] W > ? ? [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: > ? ? 00000000-0000-0000-0000-000000000001: failed to resolve > (Transport > ? ? endpoint is not connected) > ? ? [2018-04-09 05:08:13.927339] E > [fuse-bridge.c:900:fuse_getattr_resume] > ? ? 0-glusterfs-fuse: 2: GETATTR 1 > (00000000-0000-0000-0000-000000000001) > ? ? resolution failed > ? ? 
[2018-04-09 05:08:13.931497] I [MSGID: 108006] > ? ? [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no > subvolumes up > ? ? [2018-04-09 05:08:13.931558] E [MSGID: 101046] > ? ? [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > ? ? [2018-04-09 05:08:13.931599] W > ? ? [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: > ? ? 00000000-0000-0000-0000-000000000001: failed to resolve > (Transport > ? ? endpoint is not connected) > ? ? [2018-04-09 05:08:13.931623] E > [fuse-bridge.c:900:fuse_getattr_resume] > ? ? 0-glusterfs-fuse: 3: GETATTR 1 > (00000000-0000-0000-0000-000000000001) > ? ? resolution failed > ? ? [2018-04-09 05:08:13.937258] I > [fuse-bridge.c:5093:fuse_thread_proc] > ? ? 0-fuse: initating unmount of /n > ? ? [2018-04-09 05:08:13.938043] W > [glusterfsd.c:1393:cleanup_and_exit] > ? ? (-->/lib64/libpthread.so.0(+0x7e25) [0x7fb80b05ae25] > ? ? -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) > [0x560b52471675] > ? ? -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) > [0x560b5247149b] ) 0-: > ? ? received signum (15), shutting down > ? ? [2018-04-09 05:08:13.938086] I [fuse-bridge.c:5855:fini] > 0-fuse: > ? ? Unmounting '/n'. > ? ? [2018-04-09 05:08:13.938106] I [fuse-bridge.c:5860:fini] > 0-fuse: Closing > ? ? fuse connection to '/n'. > > ? ? ==> glusterd.log <=> ? ? [2018-04-09 05:08:15.118078] W [socket.c:3216:socket_connect] > ? ? 0-management: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > > ? ? ==> glustershd.log <=> ? ? [2018-04-09 05:08:15.282192] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > ? ? [2018-04-09 05:08:15.289508] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > > > > > > > > ? ? -- > ? ? Cheers, > ? ? Tom K. > > ------------------------------------------------------------------------------------- > > ? ? Living on earth is expensive, but it includes a free trip > around the > ? ? sun. > > ? ? _______________________________________________ > ? ? Gluster-users mailing list > Gluster-users at gluster.org <mailto:Gluster-users at gluster.org> > <mailto:Gluster-users at gluster.org > <mailto:Gluster-users at gluster.org>> > http://lists.gluster.org/mailman/listinfo/gluster-users > <http://lists.gluster.org/mailman/listinfo/gluster-users> > > > > -- > Cheers, > Tom K. > ------------------------------------------------------------------------------------- > > Living on earth is expensive, but it includes a free trip around the > sun. > >
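A side note on the two messages that keep recurring in the logs above, "failed to get the port number for remote subvolume" and "connection to ...:49152 failed (Connection refused)": they generally just mean that glusterd or the brick process on the named peer was unreachable at that moment. A quick way to confirm from either node is something like the following (a sketch, not output from this cluster; 24007 is glusterd and 49152 the first brick port):

[root at nfs01 ~]# gluster peer status
[root at nfs01 ~]# gluster volume status gv01
[root at nfs01 ~]# ss -tlnp | grep -E ':(24007|49152)'
[root at nfs01 ~]# gluster volume heal gv01 info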
TomK
2018-May-22 05:43 UTC
[Gluster-users] [SOLVED] [Nfs-ganesha-support] volume start: gv01: failed: Quorum not met. Volume operation not allowed.
Hey All,

It appears I've solved this one, and NFS mounts now work on all my clients. No issues since fixing it a few hours back.

RESOLUTION

SELinux was to blame for the trouble (auditd just recorded the denials). I noticed this in the logs on 2 of the 3 NFS servers (nfs01, nfs02, nfs03):

type=AVC msg=audit(1526965320.850:4094): avc: denied { write } for pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4094): arch=c000003e syscall=2 success=no exit=-13 a0=7f23b0003150 a1=2 a2=180 a3=2 items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4094): proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54
type=AVC msg=audit(1526965320.850:4095): avc: denied { unlink } for pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4095): arch=c000003e syscall=87 success=no exit=-13 a0=7f23b0004100 a1=7f23b0000050 a2=7f23b0004100 a3=5 items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4095): proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54

The fix was to adjust the SELinux rules using audit2allow (a sketch of the steps is included at the end of this message). All the errors below, including the ones described at the links below, were due to that.

It turns out that whenever it worked, it was hitting the only working server in the system, nfs03. Whenever it didn't work, it was hitting one of the broken servers, which is why it sometimes worked and other times didn't. It looked like HAProxy / Keepalived were involved as well, since I couldn't mount using the VIP but could using the host directly, but that wasn't the case either. I had also added a third brick, on nfs03, to the Gluster volume to see whether the backend FS was to blame, since GlusterFS recommends a minimum of 3 bricks for replication, but that had no effect.

In case anyone runs into this, I've added notes here as well:

http://microdevsys.com/wp/kernel-nfs-nfs4_discover_server_trunking-unhandled-error-512-exiting-with-error-eio-and-mount-hangs/

http://microdevsys.com/wp/nfs-reply-xid-3844308326-reply-err-20-auth-rejected-credentials-client-should-begin-new-session/

The errors thrown included:

NFS reply xid 3844308326 reply ERR 20: Auth Rejected Credentials (client should begin new session)

kernel: NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO

plus mount hangs and the kernel exception below.

--
Cheers,
Tom K.
-------------------------------------------------------------------------------------
Living on earth is expensive, but it includes a free trip around the sun.

May 21 23:53:13 psql01 kernel: CPU: 3 PID: 2273 Comm: mount.nfs Tainted: G L ------------ 3.10.0-693.21.1.el7.x86_64 #1
. . .
May 21 23:53:13 psql01 kernel: task: ffff880136335ee0 ti: ffff8801376b0000 task.ti: ffff8801376b0000
May 21 23:53:13 psql01 kernel: RIP: 0010:[<ffffffff816b6545>] [<ffffffff816b6545>] _raw_spin_unlock_irqrestore+0x15/0x20
May 21 23:53:13 psql01 kernel: RSP: 0018:ffff8801376b3a60 EFLAGS: 00000206
May 21 23:53:13 psql01 kernel: RAX: ffffffffc05ab078 RBX: ffff880036973928 RCX: dead000000000200
May 21 23:53:13 psql01 kernel: RDX: ffffffffc05ab078 RSI: 0000000000000206 RDI: 0000000000000206
May 21 23:53:13 psql01 kernel: RBP: ffff8801376b3a60 R08: ffff8801376b3ab8 R09: ffff880137de1200
May 21 23:53:13 psql01 kernel: R10: ffff880036973928 R11: 0000000000000000 R12: ffff880036973928
May 21 23:53:13 psql01 kernel: R13: ffff8801376b3a58 R14: ffff88013fd98a40 R15: ffff8801376b3a58
May 21 23:53:13 psql01 kernel: FS: 00007fab48f07880(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
May 21 23:53:13 psql01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 21 23:53:13 psql01 kernel: CR2: 00007f99793d93cc CR3: 000000013761e000 CR4: 00000000000007e0
May 21 23:53:13 psql01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 21 23:53:13 psql01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 21 23:53:13 psql01 kernel: Call Trace:
May 21 23:53:13 psql01 kernel: [<ffffffff810b4d86>] finish_wait+0x56/0x70
May 21 23:53:13 psql01 kernel: [<ffffffffc0580361>] nfs_wait_client_init_complete+0xa1/0xe0 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffff810b4fc0>] ? wake_up_atomic_t+0x30/0x30
May 21 23:53:13 psql01 kernel: [<ffffffffc0581e9b>] nfs_get_client+0x22b/0x470 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffffc05eafd8>] nfs4_set_client+0x98/0x130 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffffc05ec77e>] nfs4_create_server+0x13e/0x3b0 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffffc05e391e>] nfs4_remote_mount+0x2e/0x60 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffff81209f1e>] mount_fs+0x3e/0x1b0
May 21 23:53:13 psql01 kernel: [<ffffffff811aa685>] ? __alloc_percpu+0x15/0x20
May 21 23:53:13 psql01 kernel: [<ffffffff81226d57>] vfs_kern_mount+0x67/0x110
May 21 23:53:13 psql01 kernel: [<ffffffffc05e3846>] nfs_do_root_mount+0x86/0xc0 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffffc05e3c44>] nfs4_try_mount+0x44/0xc0 [nfsv4]
May 21 23:53:13 psql01 kernel: [<ffffffffc05826d7>] ? get_nfs_version+0x27/0x90 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffffc058ec9b>] nfs_fs_mount+0x4cb/0xda0 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffffc058fbe0>] ? nfs_clone_super+0x140/0x140 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffffc058daa0>] ? param_set_portnr+0x70/0x70 [nfs]
May 21 23:53:13 psql01 kernel: [<ffffffff81209f1e>] mount_fs+0x3e/0x1b0
May 21 23:53:13 psql01 kernel: [<ffffffff811aa685>] ? __alloc_percpu+0x15/0x20
May 21 23:53:13 psql01 kernel: [<ffffffff81226d57>] vfs_kern_mount+0x67/0x110
May 21 23:53:13 psql01 kernel: [<ffffffff81229263>] do_mount+0x233/0xaf0
May 21 23:53:13 psql01 kernel: [<ffffffff81229ea6>] SyS_mount+0x96/0xf0
May 21 23:53:13 psql01 kernel: [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
May 21 23:53:13 psql01 kernel: [<ffffffff816c0661>] ? system_call_after_swapgs+0xae/0x146

On 5/7/2018 10:28 PM, TomK wrote:
> This list has been deprecated. Please subscribe to the new support list
> at lists.nfs-ganesha.org.
> [...]
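For completeness, a sketch of the two remediation steps described in the resolution above, assuming the AVC denials look like the ones quoted (the SELinux module name is arbitrary, and whether nfs03 carries a full brick or an arbiter changes the add-brick form):

# Build and load a local SELinux policy module from the ganesha.nfsd denials
[root at nfs01 ~]# grep ganesha.nfsd /var/log/audit/audit.log | audit2allow -M ganesha_nfs
[root at nfs01 ~]# semodule -i ganesha_nfs.pp

# Grow the replica 2 volume to three bricks (plain replica 3 shown; "replica 3 arbiter 1" is the lighter alternative)
[root at nfs01 ~]# gluster volume add-brick gv01 replica 3 nfs03:/bricks/0/gv01
[root at nfs01 ~]# gluster volume heal gv01 full

The audit2allow/semodule step would need to be repeated on each affected NFS server (nfs01 and nfs02 here).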