TomK
2018-Apr-11 01:35 UTC
[Gluster-users] volume start: gv01: failed: Quorum not met. Volume operation not allowed.
On 4/9/2018 2:45 AM, Alex K wrote:

Hey Alex,

With two nodes, the setup works but both sides go down when one node is missing. Still I set the below two params to none and that solved my issue:

cluster.quorum-type: none
cluster.server-quorum-type: none

Thank you for that.

Cheers,
Tom

> Hi,
>
> You need 3 nodes at least to have quorum enabled. In 2 node setup you
> need to disable quorum so as to be able to still use the volume when one
> of the nodes go down.
>
> On Mon, Apr 9, 2018, 09:02 TomK <tomkcpr at mdevsys.com
> <mailto:tomkcpr at mdevsys.com>> wrote:
>
> Hey All,
>
> In a two node glusterfs setup, with one node down, can't use the second
> node to mount the volume.  I understand this is expected behaviour?
> Anyway to allow the secondary node to function then replicate what
> changed to the first (primary) when it's back online?  Or should I just
> go for a third node to allow for this?
>
> Also, how safe is it to set the following to none?
>
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
>
> [root at nfs01 /]# gluster volume start gv01
> volume start: gv01: failed: Quorum not met. Volume operation not
> allowed.
> [root at nfs01 /]#
>
> [root at nfs01 /]# gluster volume status
> Status of volume: gv01
> Gluster process                             TCP Port  RDMA Port
> Online  Pid
> ------------------------------------------------------------------------------
> Brick nfs01:/bricks/0/gv01                  N/A       N/A        N
>      N/A
> Self-heal Daemon on localhost               N/A       N/A        Y
> 25561
>
> Task Status of Volume gv01
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> [root at nfs01 /]#
>
> [root at nfs01 /]# gluster volume info
>
> Volume Name: gv01
> Type: Replicate
> Volume ID: e5ccc75e-5192-45ac-b410-a34ebd777666
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: nfs01:/bricks/0/gv01
> Brick2: nfs02:/bricks/0/gv01
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
> nfs.trusted-sync: on
> performance.cache-size: 1GB
> performance.io-thread-count: 16
> performance.write-behind-window-size: 8MB
> performance.readdir-ahead: on
> client.event-threads: 8
> server.event-threads: 8
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> [root at nfs01 /]#
>
>
> ==> n.log <==
> [2018-04-09 05:08:13.704156] I [MSGID: 100030] [glusterfsd.c:2556:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version
> 3.13.2 (args: /usr/sbin/glusterfs --process-name fuse
> --volfile-server=nfs01 --volfile-id=/gv01 /n)
> [2018-04-09 05:08:13.711255] W [MSGID: 101002]
> [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is
> deprecated, preferred is 'transport.address-family', continuing with
> correction
> [2018-04-09 05:08:13.728297] W [socket.c:3216:socket_connect]
> 0-glusterfs: Error disabling sockopt IPV6_V6ONLY: "Protocol not
> available"
> [2018-04-09 05:08:13.729025] I [MSGID: 101190]
> [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2018-04-09 05:08:13.737757] I [MSGID: 101190]
> [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 2
> [2018-04-09 05:08:13.738114] I [MSGID: 101190]
> [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 3
> [2018-04-09 05:08:13.738203] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 4 > [2018-04-09 05:08:13.738324] I [MSGID: 101190] > [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 5 > [2018-04-09 05:08:13.738330] I [MSGID: 101190] > [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 6 > [2018-04-09 05:08:13.738655] I [MSGID: 101190] > [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 7 > [2018-04-09 05:08:13.738742] I [MSGID: 101190] > [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 8 > [2018-04-09 05:08:13.739460] W [MSGID: 101174] > [graph.c:363:_log_if_unknown_option] 0-gv01-readdir-ahead: option > 'parallel-readdir' is not recognized > [2018-04-09 05:08:13.739787] I [MSGID: 114020] [client.c:2360:notify] > 0-gv01-client-0: parent translators are ready, attempting connect on > transport > [2018-04-09 05:08:13.747040] W [socket.c:3216:socket_connect] > 0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: "Protocol not > available" > [2018-04-09 05:08:13.747372] I [MSGID: 114020] [client.c:2360:notify] > 0-gv01-client-1: parent translators are ready, attempting connect on > transport > [2018-04-09 05:08:13.747883] E [MSGID: 114058] > [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: > failed to get the port number for remote subvolume. Please run 'gluster > volume status' on server to see if brick process is running. > [2018-04-09 05:08:13.748026] I [MSGID: 114018] > [client.c:2285:client_rpc_notify] 0-gv01-client-0: disconnected from > gv01-client-0. Client process will keep trying to connect to glusterd > until brick's port is available > [2018-04-09 05:08:13.748070] W [MSGID: 108001] > [afr-common.c:5391:afr_notify] 0-gv01-replicate-0: Client-quorum is > not met > [2018-04-09 05:08:13.754493] W [socket.c:3216:socket_connect] > 0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: "Protocol not > available" > Final graph: > +------------------------------------------------------------------------------+ > ? ?1: volume gv01-client-0 > ? ?2:? ? ?type protocol/client > ? ?3:? ? ?option ping-timeout 42 > ? ?4:? ? ?option remote-host nfs01 > ? ?5:? ? ?option remote-subvolume /bricks/0/gv01 > ? ?6:? ? ?option transport-type socket > ? ?7:? ? ?option transport.address-family inet > ? ?8:? ? ?option username 916ccf06-dc1d-467f-bc3d-f00a7449618f > ? ?9:? ? ?option password a44739e0-9587-411f-8e6a-9a6a4e46156c > ? 10:? ? ?option event-threads 8 > ? 11:? ? ?option transport.tcp-user-timeout 0 > ? 12:? ? ?option transport.socket.keepalive-time 20 > ? 13:? ? ?option transport.socket.keepalive-interval 2 > ? 14:? ? ?option transport.socket.keepalive-count 9 > ? 15:? ? ?option send-gids true > ? 16: end-volume > ? 17: > ? 18: volume gv01-client-1 > ? 19:? ? ?type protocol/client > ? 20:? ? ?option ping-timeout 42 > ? 21:? ? ?option remote-host nfs02 > ? 22:? ? ?option remote-subvolume /bricks/0/gv01 > ? 23:? ? ?option transport-type socket > ? 24:? ? ?option transport.address-family inet > ? 25:? ? ?option username 916ccf06-dc1d-467f-bc3d-f00a7449618f > ? 26:? ? ?option password a44739e0-9587-411f-8e6a-9a6a4e46156c > ? 27:? ? ?option event-threads 8 > ? 28:? ? ?option transport.tcp-user-timeout 0 > ? 29:? ? ?option transport.socket.keepalive-time 20 > ? 30:? ? ?option transport.socket.keepalive-interval 2 > ? 31:? ? ?option transport.socket.keepalive-count 9 > ? 32:? ? ?option send-gids true > ? 33: end-volume > ? 34: > ? 
35: volume gv01-replicate-0 > ? 36:? ? ?type cluster/replicate > ? 37:? ? ?option afr-pending-xattr gv01-client-0,gv01-client-1 > ? 38:? ? ?option quorum-type auto > ? 39:? ? ?option use-compound-fops off > ? 40:? ? ?subvolumes gv01-client-0 gv01-client-1 > ? 41: end-volume > ? 42: > ? 43: volume gv01-dht > ? 44:? ? ?type cluster/distribute > ? 45:? ? ?option lock-migration off > ? 46:? ? ?subvolumes gv01-replicate-0 > ? 47: end-volume > ? 48: > ? 49: volume gv01-write-behind > ? 50:? ? ?type performance/write-behind > ? 51:? ? ?option cache-size 8MB > ? 52:? ? ?subvolumes gv01-dht > ? 53: end-volume > ? 54: > ? 55: volume gv01-read-ahead > ? 56:? ? ?type performance/read-ahead > ? 57:? ? ?subvolumes gv01-write-behind > ? 58: end-volume > ? 59: > ? 60: volume gv01-readdir-ahead > ? 61:? ? ?type performance/readdir-ahead > ? 62:? ? ?option parallel-readdir off > ? 63:? ? ?option rda-request-size 131072 > ? 64:? ? ?option rda-cache-limit 10MB > ? 65:? ? ?subvolumes gv01-read-ahead > ? 66: end-volume > ? 67: > ? 68: volume gv01-io-cache > ? 69:? ? ?type performance/io-cache > ? 70:? ? ?option cache-size 1GB > ? 71:? ? ?subvolumes gv01-readdir-ahead > ? 72: end-volume > ? 73: > ? 74: volume gv01-quick-read > ? 75:? ? ?type performance/quick-read > ? 76:? ? ?option cache-size 1GB > ? 77:? ? ?subvolumes gv01-io-cache > ? 78: end-volume > ? 79: > ? 80: volume gv01-open-behind > ? 81:? ? ?type performance/open-behind > ? 82:? ? ?subvolumes gv01-quick-read > ? 83: end-volume > ? 84: > ? 85: volume gv01-md-cache > ? 86:? ? ?type performance/md-cache > ? 87:? ? ?subvolumes gv01-open-behind > ? 88: end-volume > ? 89: > ? 90: volume gv01 > ? 91:? ? ?type debug/io-stats > ? 92:? ? ?option log-level INFO > ? 93:? ? ?option latency-measurement off > ? 94:? ? ?option count-fop-hits off > ? 95:? ? ?subvolumes gv01-md-cache > ? 96: end-volume > ? 97: > ? 98: volume meta-autoload > ? 99:? ? ?type meta > 100:? ? ?subvolumes gv01 > 101: end-volume > 102: > +------------------------------------------------------------------------------+ > [2018-04-09 05:08:13.922631] E [socket.c:2374:socket_connect_finish] > 0-gv01-client-1: connection to 192.168.0.119:24007 > <http://192.168.0.119:24007> failed (No route to > host); disconnecting socket > [2018-04-09 05:08:13.922690] E [MSGID: 108006] > [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: > All subvolumes are down. Going offline until atleast one of them comes > back up. 
> [2018-04-09 05:08:13.926201] I [fuse-bridge.c:4205:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 > kernel 7.22 > [2018-04-09 05:08:13.926245] I [fuse-bridge.c:4835:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2018-04-09 05:08:13.926518] I [MSGID: 108006] > [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no subvolumes up > [2018-04-09 05:08:13.926671] E [MSGID: 101046] > [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > [2018-04-09 05:08:13.926762] E [fuse-bridge.c:4271:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not > connected) > [2018-04-09 05:08:13.927207] I [MSGID: 108006] > [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no subvolumes up > [2018-04-09 05:08:13.927262] E [MSGID: 101046] > [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > [2018-04-09 05:08:13.927301] W > [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: > 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2018-04-09 05:08:13.927339] E [fuse-bridge.c:900:fuse_getattr_resume] > 0-glusterfs-fuse: 2: GETATTR 1 (00000000-0000-0000-0000-000000000001) > resolution failed > [2018-04-09 05:08:13.931497] I [MSGID: 108006] > [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no subvolumes up > [2018-04-09 05:08:13.931558] E [MSGID: 101046] > [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > [2018-04-09 05:08:13.931599] W > [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: > 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2018-04-09 05:08:13.931623] E [fuse-bridge.c:900:fuse_getattr_resume] > 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) > resolution failed > [2018-04-09 05:08:13.937258] I [fuse-bridge.c:5093:fuse_thread_proc] > 0-fuse: initating unmount of /n > [2018-04-09 05:08:13.938043] W [glusterfsd.c:1393:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7e25) [0x7fb80b05ae25] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560b52471675] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560b5247149b] ) 0-: > received signum (15), shutting down > [2018-04-09 05:08:13.938086] I [fuse-bridge.c:5855:fini] 0-fuse: > Unmounting '/n'. > [2018-04-09 05:08:13.938106] I [fuse-bridge.c:5860:fini] 0-fuse: Closing > fuse connection to '/n'. > > ==> glusterd.log <=> [2018-04-09 05:08:15.118078] W [socket.c:3216:socket_connect] > 0-management: Error disabling sockopt IPV6_V6ONLY: "Protocol not > available" > > ==> glustershd.log <=> [2018-04-09 05:08:15.282192] W [socket.c:3216:socket_connect] > 0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: "Protocol not > available" > [2018-04-09 05:08:15.289508] W [socket.c:3216:socket_connect] > 0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: "Protocol not > available" > > > > > > > > -- > Cheers, > Tom K. > ------------------------------------------------------------------------------------- > > Living on earth is expensive, but it includes a free trip around the > sun. > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org <mailto:Gluster-users at gluster.org> > http://lists.gluster.org/mailman/listinfo/gluster-users >-- Cheers, Tom K. ------------------------------------------------------------------------------------- Living on earth is expensive, but it includes a free trip around the sun.
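For reference, the two options discussed above are ordinary per-volume settings applied through the gluster CLI. A minimal sketch, assuming the volume name gv01 used in this thread (reverting is just a matter of setting the values back to auto and server):

# disable client-side (AFR) quorum for the volume
gluster volume set gv01 cluster.quorum-type none
# disable server-side (glusterd) quorum enforcement
gluster volume set gv01 cluster.server-quorum-type none
# confirm what the volume is actually using
gluster volume get gv01 cluster.quorum-type
gluster volume get gv01 cluster.server-quorum-type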
Alex K
2018-Apr-11 15:54 UTC
[Gluster-users] volume start: gv01: failed: Quorum not met. Volume operation not allowed.
On Wed, Apr 11, 2018 at 4:35 AM, TomK <tomkcpr at mdevsys.com> wrote:

> On 4/9/2018 2:45 AM, Alex K wrote:
> Hey Alex,
>
> With two nodes, the setup works but both sides go down when one node is
> missing. Still I set the below two params to none and that solved my issue:
>
> cluster.quorum-type: none
> cluster.server-quorum-type: none

Yes, this disables quorum so as to avoid the issue. Glad that this helped. Bear in mind though that it is easier to face split-brain issues when quorum is disabled, which is why at least 3 nodes are recommended. Just to note that I also have a 2 node cluster which has been running without issues for a long time.

> Thank you for that.
>
> Cheers,
> Tom
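Since the trade-off above is split-brain risk, two related sketches may help: checking for files in split-brain on the existing 2-brick volume, and converting it to a replica 3 arbiter volume once a third host is available. The host nfs03 and its brick path below are hypothetical; an arbiter brick stores only metadata, so it can live on a much smaller disk:

# list any files currently in split-brain
gluster volume heal gv01 info split-brain
# add an arbiter brick on a third node, turning 1x2 into 1x(2+1)
gluster volume add-brick gv01 replica 3 arbiter 1 nfs03:/bricks/0/gv01
# with three bricks in the replica set, quorum can stay at its defaults
gluster volume set gv01 cluster.quorum-type auto
gluster volume set gv01 cluster.server-quorum-type server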
TomK
2018-May-08 02:25 UTC
[Gluster-users] volume start: gv01: failed: Quorum not met. Volume operation not allowed.
On 4/11/2018 11:54 AM, Alex K wrote:

Hey Guys,

Returning to this topic, after disabling the quorum:

cluster.quorum-type: none
cluster.server-quorum-type: none

I've run into a number of gluster errors (see below).

I'm using gluster as the backend for my NFS storage.  I have gluster running on two nodes, nfs01 and nfs02.  It's mounted on /n on each host.  The path /n is in turn shared out by NFS Ganesha.  It's a two node setup with quorum disabled as noted below:

[root at nfs02 ganesha]# mount|grep gv01
nfs02:/gv01 on /n type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root at nfs01 glusterfs]# mount|grep gv01
nfs01:/gv01 on /n type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

Gluster always reports as working any time I type the below two commands:

[root at nfs01 glusterfs]# gluster volume info

Volume Name: gv01
Type: Replicate
Volume ID: e5ccc75e-5192-45ac-b410-a34ebd777666
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nfs01:/bricks/0/gv01
Brick2: nfs02:/bricks/0/gv01
Options Reconfigured:
cluster.server-quorum-type: none
cluster.quorum-type: none
server.event-threads: 8
client.event-threads: 8
performance.readdir-ahead: on
performance.write-behind-window-size: 8MB
performance.io-thread-count: 16
performance.cache-size: 1GB
nfs.trusted-sync: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet

[root at nfs01 glusterfs]# gluster status
unrecognized word: status (position 0)

[root at nfs01 glusterfs]# gluster volume status
Status of volume: gv01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick nfs01:/bricks/0/gv01                  49152     0          Y       1422
Brick nfs02:/bricks/0/gv01                  49152     0          Y       1422
Self-heal Daemon on localhost               N/A       N/A        Y       1248
Self-heal Daemon on nfs02.nix.my.dom        N/A       N/A        Y       1251

Task Status of Volume gv01
------------------------------------------------------------------------------
There are no active volume tasks

[root at nfs01 glusterfs]#

[root at nfs01 glusterfs]# rpm -aq|grep -Ei gluster
glusterfs-3.13.2-2.el7.x86_64
glusterfs-devel-3.13.2-2.el7.x86_64
glusterfs-fuse-3.13.2-2.el7.x86_64
glusterfs-api-devel-3.13.2-2.el7.x86_64
centos-release-gluster313-1.0-1.el7.centos.noarch
python2-gluster-3.13.2-2.el7.x86_64
glusterfs-client-xlators-3.13.2-2.el7.x86_64
glusterfs-server-3.13.2-2.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.9.x86_64
glusterfs-cli-3.13.2-2.el7.x86_64
centos-release-gluster312-1.0-1.el7.centos.noarch
python2-glusterfs-api-1.1-1.el7.noarch
glusterfs-libs-3.13.2-2.el7.x86_64
glusterfs-extra-xlators-3.13.2-2.el7.x86_64
glusterfs-api-3.13.2-2.el7.x86_64
[root at nfs01 glusterfs]#

The short of it is that everything works, and mounts on guests work, as long as I don't try to write to the NFS share from my clients.  If I try to write to the share, everything comes apart like this:

-sh-4.2$ pwd
/n/my.dom/tom
-sh-4.2$ ls -altri
total 6258
11715278280495367299 -rw-------. 1 tom at my.dom tom at my.dom 231 Feb 17 20:15 .bashrc
10937819299152577443 -rw-------. 1 tom at my.dom tom at my.dom 193 Feb 17 20:15 .bash_profile
10823746994379198104 -rw-------. 1 tom at my.dom tom at my.dom 18 Feb 17 20:15 .bash_logout
10718721668898812166 drwxr-xr-x. 3 root root 4096 Mar 5 02:46 ..
12008425472191154054 drwx------. 2 tom at my.dom tom at my.dom 4096 Mar 18 03:07 .ssh
13763048923429182948 -rw-rw-r--. 1 tom at my.dom tom at my.dom 6359568 Mar 25 22:38 opennebula-cores.tar.gz
11674701370106210511 -rw-rw-r--. 1 tom at my.dom tom at my.dom 4 Apr 9 23:25 meh.txt
9326637590629964475 -rw-r--r--. 1 tom at my.dom tom at my.dom 24970 May 1 01:30 nfs-trace-working.dat.gz
9337343577229627320 -rw-------. 1 tom at my.dom tom at my.dom 3734 May 1 23:38 .bash_history
11438151930727967183 drwx------. 3 tom at my.dom tom at my.dom 4096 May 1 23:58 .
9865389421596220499 -rw-r--r--. 1 tom at my.dom tom at my.dom 4096 May 1 23:58 .meh.txt.swp
-sh-4.2$ touch test.txt
-sh-4.2$ vi test.txt
-sh-4.2$ ls -altri
ls: cannot open directory .: Permission denied
-sh-4.2$ ls -altri
ls: cannot open directory .: Permission denied
-sh-4.2$ ls -altri

This is followed by a slew of other errors in apps using the gluster volume. These errors include:

02/05/2018 23:10:52 : epoch 5aea7bd5 : nfs02.nix.mds.xyz : ganesha.nfsd-5891[svc_12] nfs_rpc_process_request :DISP :INFO :Could not authenticate request... rejecting with AUTH_STAT=RPCSEC_GSS_CREDPROBLEM

==> ganesha-gfapi.log <==
[2018-05-03 04:32:18.009245] I [MSGID: 114021] [client.c:2369:notify] 0-gv01-client-0: current graph is no longer active, destroying rpc_client
[2018-05-03 04:32:18.009338] I [MSGID: 114021] [client.c:2369:notify] 0-gv01-client-1: current graph is no longer active, destroying rpc_client
[2018-05-03 04:32:18.009499] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-gv01-client-0: disconnected from gv01-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-05-03 04:32:18.009557] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-gv01-client-1: disconnected from gv01-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-05-03 04:32:18.009610] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-05-01 22:43:06.412067] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-05-01 22:43:55.554833] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket

So I'm wondering if this is due to the two node gluster, as it seems to be, and what it is that I really need to do here.  Should I go with the recommended 3 node setup to avoid this, which would include a proper quorum?  Or is there more to this, and it really doesn't matter that I have a 2 node gluster cluster without a quorum, and this is due to something else still?

Again, anytime I check the gluster volumes, everything checks out.  The results of both 'gluster volume info' and 'gluster volume status' are always as I pasted above, fully working.  I'm also using the Linux KDC Free IPA with this solution as well.

--
Cheers,
Tom K.
-------------------------------------------------------------------------------------

Living on earth is expensive, but it includes a free trip around the sun.
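The mounts above each name only one volfile server, and the exact mount options in use on nfs01/nfs02 aren't shown in the thread. A common pattern for a two-node setup is to list the peer as a backup volfile server so the FUSE client can still fetch the volfile and mount when one node is down. A sketch of an /etc/fstab entry on nfs01 (swap the hostnames on nfs02):

# /etc/fstab -- GlusterFS FUSE mount with a fallback volfile server
nfs01:/gv01  /n  glusterfs  defaults,_netdev,backup-volfile-servers=nfs02  0 0

Note that backup-volfile-servers only affects volfile retrieval at mount time; once mounted, the client talks to all bricks directly, so I/O still stalls if quorum is lost or all bricks go down.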
[root at nfs01 glusterfs]# cat /etc/glusterfs/glusterd.vol volume management type mgmt/glusterd option working-directory /var/lib/glusterd option transport-type socket,rdma option transport.socket.keepalive-time 10 option transport.socket.keepalive-interval 2 option transport.socket.read-fail-log off option ping-timeout 0 option event-threads 1 option cluster.quorum-type none option cluster.server-quorum-type none # option lock-timer 180 # option transport.address-family inet6 # option base-port 49152 # option max-port 65535 end-volume [root at nfs01 glusterfs]# [root at nfs02 glusterfs]# grep -E " E " *.log glusterd.log:[2018-04-30 06:37:51.315618] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-04-30 06:37:51.315696] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-04-30 06:40:37.994481] E [socket.c:2374:socket_connect_finish] 0-management: connection to 192.168.0.131:24007 failed (Connection refused); disconnecting socket glusterd.log:[2018-05-01 04:56:19.231954] E [socket.c:2374:socket_connect_finish] 0-management: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket glusterd.log:[2018-05-01 22:43:04.195366] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-01 22:43:04.195445] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-02 02:46:32.397585] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-02 02:46:32.397653] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-02 03:16:10.937203] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-02 03:16:10.937261] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-02 03:57:20.918315] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-02 03:57:20.918400] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-05 01:37:24.981265] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-05 01:37:24.981346] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport glusterd.log:[2018-05-07 03:04:20.053473] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory glusterd.log:[2018-05-07 03:04:20.053553] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 
listeners failed, continuing with succeeded transport glustershd.log:[2018-04-30 06:37:53.671466] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. glustershd.log:[2018-04-30 06:40:41.694799] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. glustershd.log:[2018-05-01 04:55:57.191783] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket glustershd.log:[2018-05-01 05:10:55.207027] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. glustershd.log:[2018-05-01 22:43:06.313941] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. glustershd.log:[2018-05-02 03:16:12.884697] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. n.log:[2018-05-01 04:56:01.191877] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket n.log:[2018-05-01 05:10:56.448375] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket n.log:[2018-05-01 22:43:06.412067] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. n.log:[2018-05-01 22:43:55.554833] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket n.log:[2018-05-02 03:16:12.919833] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. 
n.log:[2018-05-05 01:38:37.389091] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-05 01:38:37.389171] E [fuse-bridge.c:4271:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) n.log:[2018-05-05 01:38:46.974945] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-05 01:38:46.975012] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 2: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed n.log:[2018-05-05 01:38:47.010671] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-05 01:38:47.010731] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed n.log:[2018-05-07 03:05:48.552793] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-07 03:05:48.552872] E [fuse-bridge.c:4271:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) n.log:[2018-05-07 03:05:56.084586] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-07 03:05:56.084655] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 2: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed n.log:[2018-05-07 03:05:56.148767] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-07 03:05:56.148825] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed [root at nfs02 glusterfs]# ganesha-gfapi.log:[2018-04-08 03:45:25.440067] E [socket.c:2369:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 03:45:28.455560] E [socket.c:2369:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 03:45:29.145764] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 03:51:15.529380] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 03:51:29.754070] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 03:51:40.633012] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 04:36:28.005490] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 04:37:09.038708] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. 
Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 04:37:09.039432] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 04:37:09.044188] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 04:37:09.044484] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 17:17:02.093164] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 17:17:29.123148] E [socket.c:2369:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 17:17:50.135169] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 17:18:03.290346] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 17:18:14.202118] E [socket.c:2369:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 17:19:39.014330] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 17:32:21.714643] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 17:32:21.734187] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null ganesha-gfapi.log:[2018-04-08 20:35:30.005234] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 20:55:29.009144] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 20:57:52.009895] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 21:00:29.004716] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 21:01:01.205704] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. 
Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 21:01:01.209797] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null ganesha-gfapi.log:[2018-04-09 04:41:02.006926] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-10 03:20:40.011967] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-10 03:30:33.057576] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-13 02:13:01.005629] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-14 21:41:18.313290] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-15 03:01:37.005636] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-15 03:02:37.319050] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-15 03:43:02.719856] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-15 20:36:31.143742] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 00:02:38.697700] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:16:38.383945] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:25:30.904382] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:25:57.432071] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-16 05:26:00.122608] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:30:20.172115] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-17 05:07:05.006133] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. 
ganesha-gfapi.log:[2018-04-17 05:08:39.004624] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-20 04:58:55.043976] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-20 05:07:22.762457] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-20 05:09:18.710446] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-20 05:09:21.489724] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:22:16.791636] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:22:16.797525] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:22:16.797565] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-28 07:36:29.927497] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-28 07:36:31.215686] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:36:31.216287] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-30 06:37:02.005127] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-30 06:37:53.985563] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-05-01 04:55:57.191787] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-05-01 05:10:55.595474] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. 
ganesha-gfapi.log:[2018-05-01 05:10:56.620226] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-05-01 22:42:26.005472] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-01 22:43:06.423349] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-05-02 03:16:12.930652] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-05-03 02:43:03.021549] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-03 03:00:01.034676] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-03 03:59:28.006170] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-05 01:38:47.474503] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f327e4d2f0b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f327e297e7e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f327e297f9e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f327e299720] (--> /usr/lib64/glusterfs/3.13.2/xlator/protocol/client.so(fini+0x28)[0x7f32701a7c88] ))))) 0-gv01-client-0: forced unwinding frame type(GlusterFS Handshake) op(SETVOLUME(1)) called at 2018-05-05 01:38:46.968501 (xid=0x5) ganesha-gfapi.log:[2018-05-05 01:38:47.474834] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. 
ganesha-gfapi.log:[2018-05-05 01:38:47.474960] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f327e4d2f0b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f327e297e7e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f327e297f9e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f327e299720] (--> /usr/lib64/glusterfs/3.13.2/xlator/protocol/client.so(fini+0x28)[0x7f32701a7c88] ))))) 0-gv01-client-1: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2018-05-05 01:38:46.965204 (xid=0x2)
ganesha-gfapi.log:[2018-05-05 01:38:50.457456] E [rpc-clnt.c:417:rpc_clnt_reconnect] 0-gv01-client-1: Error adding to timer event queue
ganesha-gfapi.log:[2018-05-07 03:05:58.522295] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f45af6d6f0b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f45af49be7e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f45af49bf9e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f45af49d720] (--> /usr/lib64/glusterfs/3.13.2/xlator/protocol/client.so(fini+0x28)[0x7f45a0db6c88] ))))) 0-gv01-client-1: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2018-05-07 03:05:56.080210 (xid=0x2)
ganesha-gfapi.log:[2018-05-07 03:05:59.504926] E [rpc-clnt.c:417:rpc_clnt_reconnect] 0-gv01-client-1: Error adding to timer event queue
ganesha-gfapi.log:[2018-05-07 03:05:59.505274] E [rpc-clnt.c:417:rpc_clnt_reconnect] 0-gv01-client-0: Error adding to timer event queue
[root at nfs02 ganesha]#
53: end-volume > ? ? ?? 54: > ? ? ?? 55: volume gv01-read-ahead > ? ? ?? 56:? ? ?type performance/read-ahead > ? ? ?? 57:? ? ?subvolumes gv01-write-behind > ? ? ?? 58: end-volume > ? ? ?? 59: > ? ? ?? 60: volume gv01-readdir-ahead > ? ? ?? 61:? ? ?type performance/readdir-ahead > ? ? ?? 62:? ? ?option parallel-readdir off > ? ? ?? 63:? ? ?option rda-request-size 131072 > ? ? ?? 64:? ? ?option rda-cache-limit 10MB > ? ? ?? 65:? ? ?subvolumes gv01-read-ahead > ? ? ?? 66: end-volume > ? ? ?? 67: > ? ? ?? 68: volume gv01-io-cache > ? ? ?? 69:? ? ?type performance/io-cache > ? ? ?? 70:? ? ?option cache-size 1GB > ? ? ?? 71:? ? ?subvolumes gv01-readdir-ahead > ? ? ?? 72: end-volume > ? ? ?? 73: > ? ? ?? 74: volume gv01-quick-read > ? ? ?? 75:? ? ?type performance/quick-read > ? ? ?? 76:? ? ?option cache-size 1GB > ? ? ?? 77:? ? ?subvolumes gv01-io-cache > ? ? ?? 78: end-volume > ? ? ?? 79: > ? ? ?? 80: volume gv01-open-behind > ? ? ?? 81:? ? ?type performance/open-behind > ? ? ?? 82:? ? ?subvolumes gv01-quick-read > ? ? ?? 83: end-volume > ? ? ?? 84: > ? ? ?? 85: volume gv01-md-cache > ? ? ?? 86:? ? ?type performance/md-cache > ? ? ?? 87:? ? ?subvolumes gv01-open-behind > ? ? ?? 88: end-volume > ? ? ?? 89: > ? ? ?? 90: volume gv01 > ? ? ?? 91:? ? ?type debug/io-stats > ? ? ?? 92:? ? ?option log-level INFO > ? ? ?? 93:? ? ?option latency-measurement off > ? ? ?? 94:? ? ?option count-fop-hits off > ? ? ?? 95:? ? ?subvolumes gv01-md-cache > ? ? ?? 96: end-volume > ? ? ?? 97: > ? ? ?? 98: volume meta-autoload > ? ? ?? 99:? ? ?type meta > ? ? 100:? ? ?subvolumes gv01 > ? ? 101: end-volume > ? ? 102: > > +------------------------------------------------------------------------------+ > ? ? [2018-04-09 05:08:13.922631] E > [socket.c:2374:socket_connect_finish] > ? ? 0-gv01-client-1: connection to 192.168.0.119:24007 > <http://192.168.0.119:24007> > ? ? <http://192.168.0.119:24007> failed (No route to > > ? ? host); disconnecting socket > ? ? [2018-04-09 05:08:13.922690] E [MSGID: 108006] > ? ? [afr-common.c:5164:__afr_handle_child_down_event] > 0-gv01-replicate-0: > ? ? All subvolumes are down. Going offline until atleast one of > them comes > ? ? back up. > ? ? [2018-04-09 05:08:13.926201] I [fuse-bridge.c:4205:fuse_init] > ? ? 0-glusterfs-fuse: FUSE inited with protocol versions: > glusterfs 7.24 > ? ? kernel 7.22 > ? ? [2018-04-09 05:08:13.926245] I > [fuse-bridge.c:4835:fuse_graph_sync] > ? ? 0-fuse: switched to graph 0 > ? ? [2018-04-09 05:08:13.926518] I [MSGID: 108006] > ? ? [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no > subvolumes up > ? ? [2018-04-09 05:08:13.926671] E [MSGID: 101046] > ? ? [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > ? ? [2018-04-09 05:08:13.926762] E > [fuse-bridge.c:4271:fuse_first_lookup] > ? ? 0-fuse: first lookup on root failed (Transport endpoint is not > ? ? connected) > ? ? [2018-04-09 05:08:13.927207] I [MSGID: 108006] > ? ? [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no > subvolumes up > ? ? [2018-04-09 05:08:13.927262] E [MSGID: 101046] > ? ? [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > ? ? [2018-04-09 05:08:13.927301] W > ? ? [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: > ? ? 00000000-0000-0000-0000-000000000001: failed to resolve > (Transport > ? ? endpoint is not connected) > ? ? [2018-04-09 05:08:13.927339] E > [fuse-bridge.c:900:fuse_getattr_resume] > ? ? 0-glusterfs-fuse: 2: GETATTR 1 > (00000000-0000-0000-0000-000000000001) > ? ? resolution failed > ? ? 
[2018-04-09 05:08:13.931497] I [MSGID: 108006] > ? ? [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no > subvolumes up > ? ? [2018-04-09 05:08:13.931558] E [MSGID: 101046] > ? ? [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > ? ? [2018-04-09 05:08:13.931599] W > ? ? [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: > ? ? 00000000-0000-0000-0000-000000000001: failed to resolve > (Transport > ? ? endpoint is not connected) > ? ? [2018-04-09 05:08:13.931623] E > [fuse-bridge.c:900:fuse_getattr_resume] > ? ? 0-glusterfs-fuse: 3: GETATTR 1 > (00000000-0000-0000-0000-000000000001) > ? ? resolution failed > ? ? [2018-04-09 05:08:13.937258] I > [fuse-bridge.c:5093:fuse_thread_proc] > ? ? 0-fuse: initating unmount of /n > ? ? [2018-04-09 05:08:13.938043] W > [glusterfsd.c:1393:cleanup_and_exit] > ? ? (-->/lib64/libpthread.so.0(+0x7e25) [0x7fb80b05ae25] > ? ? -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) > [0x560b52471675] > ? ? -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) > [0x560b5247149b] ) 0-: > ? ? received signum (15), shutting down > ? ? [2018-04-09 05:08:13.938086] I [fuse-bridge.c:5855:fini] > 0-fuse: > ? ? Unmounting '/n'. > ? ? [2018-04-09 05:08:13.938106] I [fuse-bridge.c:5860:fini] > 0-fuse: Closing > ? ? fuse connection to '/n'. > > ? ? ==> glusterd.log <=> ? ? [2018-04-09 05:08:15.118078] W [socket.c:3216:socket_connect] > ? ? 0-management: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > > ? ? ==> glustershd.log <=> ? ? [2018-04-09 05:08:15.282192] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > ? ? [2018-04-09 05:08:15.289508] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > > > > > > > > ? ? -- > ? ? Cheers, > ? ? Tom K. > > ------------------------------------------------------------------------------------- > > ? ? Living on earth is expensive, but it includes a free trip > around the > ? ? sun. > > ? ? _______________________________________________ > ? ? Gluster-users mailing list > Gluster-users at gluster.org <mailto:Gluster-users at gluster.org> > <mailto:Gluster-users at gluster.org > <mailto:Gluster-users at gluster.org>> > http://lists.gluster.org/mailman/listinfo/gluster-users > <http://lists.gluster.org/mailman/listinfo/gluster-users> > > > > -- > Cheers, > Tom K. > ------------------------------------------------------------------------------------- > > Living on earth is expensive, but it includes a free trip around the > sun. > >
TomK
2018-May-08 02:28 UTC
[Gluster-users] volume start: gv01: failed: Quorum not met. Volume operation not allowed.
On 4/11/2018 11:54 AM, Alex K wrote:

Hey Guys,

Returning to this topic: after disabling the quorum:

cluster.quorum-type: none
cluster.server-quorum-type: none

I've run into a number of gluster errors (see below).

I'm using gluster as the backend for my NFS storage.  I have gluster running on two nodes, nfs01 and nfs02, and the volume is mounted on /n on each host.  The path /n is in turn shared out by NFS Ganesha.  It's a two-node setup with quorum disabled, as noted below:

[root at nfs02 ganesha]# mount|grep gv01
nfs02:/gv01 on /n type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root at nfs01 glusterfs]# mount|grep gv01
nfs01:/gv01 on /n type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

Gluster always reports as working whenever I run the two commands below:

[root at nfs01 glusterfs]# gluster volume info

Volume Name: gv01
Type: Replicate
Volume ID: e5ccc75e-5192-45ac-b410-a34ebd777666
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nfs01:/bricks/0/gv01
Brick2: nfs02:/bricks/0/gv01
Options Reconfigured:
cluster.server-quorum-type: none
cluster.quorum-type: none
server.event-threads: 8
client.event-threads: 8
performance.readdir-ahead: on
performance.write-behind-window-size: 8MB
performance.io-thread-count: 16
performance.cache-size: 1GB
nfs.trusted-sync: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet

[root at nfs01 glusterfs]# gluster status
unrecognized word: status (position 0)
[root at nfs01 glusterfs]# gluster volume status
Status of volume: gv01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick nfs01:/bricks/0/gv01                  49152     0          Y       1422
Brick nfs02:/bricks/0/gv01                  49152     0          Y       1422
Self-heal Daemon on localhost               N/A       N/A        Y       1248
Self-heal Daemon on nfs02.nix.my.dom        N/A       N/A        Y       1251

Task Status of Volume gv01
------------------------------------------------------------------------------
There are no active volume tasks

[root at nfs01 glusterfs]#

[root at nfs01 glusterfs]# rpm -aq|grep -Ei gluster
glusterfs-3.13.2-2.el7.x86_64
glusterfs-devel-3.13.2-2.el7.x86_64
glusterfs-fuse-3.13.2-2.el7.x86_64
glusterfs-api-devel-3.13.2-2.el7.x86_64
centos-release-gluster313-1.0-1.el7.centos.noarch
python2-gluster-3.13.2-2.el7.x86_64
glusterfs-client-xlators-3.13.2-2.el7.x86_64
glusterfs-server-3.13.2-2.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.9.x86_64
glusterfs-cli-3.13.2-2.el7.x86_64
centos-release-gluster312-1.0-1.el7.centos.noarch
python2-glusterfs-api-1.1-1.el7.noarch
glusterfs-libs-3.13.2-2.el7.x86_64
glusterfs-extra-xlators-3.13.2-2.el7.x86_64
glusterfs-api-3.13.2-2.el7.x86_64
[root at nfs01 glusterfs]#
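(Note: 'gluster volume info' and 'gluster volume status' only report whether the bricks and daemons are up; they say nothing about pending heals or divergence between the two replicas.  A few additional read-only checks that are usually more telling on a 3.13-era install -- a sketch only, not commands that were actually run in this thread:

    gluster peer status                           # both peers should show "Peer in Cluster (Connected)"
    gluster volume status gv01 detail             # per-brick disk and inode details
    gluster volume status gv01 clients            # which clients each brick currently sees
    gluster volume get gv01 all | grep -i quorum  # effective quorum-related settings
)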
The short of it is that everything works, and mounts on guests work, as long as I don't try to write to the NFS share from my clients.  If I try to write to the share, everything comes apart like this:

-sh-4.2$ pwd
/n/my.dom/tom
-sh-4.2$ ls -altri
total 6258
11715278280495367299 -rw-------. 1 tom at my.dom tom at my.dom     231 Feb 17 20:15 .bashrc
10937819299152577443 -rw-------. 1 tom at my.dom tom at my.dom     193 Feb 17 20:15 .bash_profile
10823746994379198104 -rw-------. 1 tom at my.dom tom at my.dom      18 Feb 17 20:15 .bash_logout
10718721668898812166 drwxr-xr-x. 3 root       root          4096 Mar  5 02:46 ..
12008425472191154054 drwx------. 2 tom at my.dom tom at my.dom    4096 Mar 18 03:07 .ssh
13763048923429182948 -rw-rw-r--. 1 tom at my.dom tom at my.dom 6359568 Mar 25 22:38 opennebula-cores.tar.gz
11674701370106210511 -rw-rw-r--. 1 tom at my.dom tom at my.dom       4 Apr  9 23:25 meh.txt
 9326637590629964475 -rw-r--r--. 1 tom at my.dom tom at my.dom   24970 May  1 01:30 nfs-trace-working.dat.gz
 9337343577229627320 -rw-------. 1 tom at my.dom tom at my.dom    3734 May  1 23:38 .bash_history
11438151930727967183 drwx------. 3 tom at my.dom tom at my.dom    4096 May  1 23:58 .
 9865389421596220499 -rw-r--r--. 1 tom at my.dom tom at my.dom    4096 May  1 23:58 .meh.txt.swp
-sh-4.2$ touch test.txt
-sh-4.2$ vi test.txt
-sh-4.2$ ls -altri
ls: cannot open directory .: Permission denied
-sh-4.2$ ls -altri
ls: cannot open directory .: Permission denied
-sh-4.2$ ls -altri

This is followed by a slew of other errors in apps using the gluster volume.  These errors include:

02/05/2018 23:10:52 : epoch 5aea7bd5 : nfs02.nix.my.dom : ganesha.nfsd-5891[svc_12] nfs_rpc_process_request :DISP :INFO :Could not authenticate request... rejecting with AUTH_STAT=RPCSEC_GSS_CREDPROBLEM

==> ganesha-gfapi.log <==
[2018-05-03 04:32:18.009245] I [MSGID: 114021] [client.c:2369:notify] 0-gv01-client-0: current graph is no longer active, destroying rpc_client
[2018-05-03 04:32:18.009338] I [MSGID: 114021] [client.c:2369:notify] 0-gv01-client-1: current graph is no longer active, destroying rpc_client
[2018-05-03 04:32:18.009499] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-gv01-client-0: disconnected from gv01-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-05-03 04:32:18.009557] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-gv01-client-1: disconnected from gv01-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-05-03 04:32:18.009610] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-05-01 22:43:06.412067] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-05-01 22:43:55.554833] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket

So I'm wondering: is this due to the two-node gluster setup, as it seems to be, and what is it that I really need to do here?  Should I go with the recommended 3-node setup to avoid this, which would include a proper quorum?  Or is there more to this, and it really doesn't matter that I have a 2-node gluster cluster without a quorum, and this is due to something else still?  Again, any time I check the gluster volumes, everything checks out.  The results of both 'gluster volume info' and 'gluster volume status' are always as I pasted above, fully working.  I'm also using FreeIPA as the Linux KDC with this solution as well.

--
Cheers,
Tom K.
-------------------------------------------------------------------------------------

Living on earth is expensive, but it includes a free trip around the sun.
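Given the "Permission denied" behaviour right after a write, with both client and server quorum disabled, the usual next step is to check whether the two replicas have diverged.  A minimal sketch of that check -- these commands were not run in the thread and are only illustrative:

    gluster volume heal gv01 info               # pending heals per brick
    gluster volume heal gv01 info split-brain   # entries the cluster considers split-brain
    gluster volume heal gv01 full               # trigger a full self-heal sweep once both bricks are up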
[root at nfs01 glusterfs]# cat /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 0
    option event-threads 1
    option cluster.quorum-type none
    option cluster.server-quorum-type none
#   option lock-timer 180
#   option transport.address-family inet6
#   option base-port 49152
#   option max-port 65535
end-volume
[root at nfs01 glusterfs]#

[root at nfs02 glusterfs]# grep -E " E " *.log
glusterd.log:[2018-04-30 06:37:51.315618] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
glusterd.log:[2018-04-30 06:37:51.315696] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
glusterd.log:[2018-04-30 06:40:37.994481] E [socket.c:2374:socket_connect_finish] 0-management: connection to 192.168.0.131:24007 failed (Connection refused); disconnecting socket
glusterd.log:[2018-05-01 04:56:19.231954] E [socket.c:2374:socket_connect_finish] 0-management: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket
glusterd.log:[2018-05-01 22:43:04.195366] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
glusterd.log:[2018-05-01 22:43:04.195445] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
glusterd.log:[2018-05-02 02:46:32.397585] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
glusterd.log:[2018-05-02 02:46:32.397653] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
glusterd.log:[2018-05-02 03:16:10.937203] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
glusterd.log:[2018-05-02 03:16:10.937261] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
glusterd.log:[2018-05-02 03:57:20.918315] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
glusterd.log:[2018-05-02 03:57:20.918400] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
glusterd.log:[2018-05-05 01:37:24.981265] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
glusterd.log:[2018-05-05 01:37:24.981346] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
glusterd.log:[2018-05-07 03:04:20.053473] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.13.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
glusterd.log:[2018-05-07 03:04:20.053553] E [MSGID: 106243] [glusterd.c:1769:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
glustershd.log:[2018-04-30 06:37:53.671466] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
glustershd.log:[2018-04-30 06:40:41.694799] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
glustershd.log:[2018-05-01 04:55:57.191783] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket
glustershd.log:[2018-05-01 05:10:55.207027] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
glustershd.log:[2018-05-01 22:43:06.313941] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
glustershd.log:[2018-05-02 03:16:12.884697] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
n.log:[2018-05-01 04:56:01.191877] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket
n.log:[2018-05-01 05:10:56.448375] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket
n.log:[2018-05-01 22:43:06.412067] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
n.log:[2018-05-01 22:43:55.554833] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket
n.log:[2018-05-02 03:16:12.919833] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
n.log:[2018-05-05 01:38:37.389091] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-05 01:38:37.389171] E [fuse-bridge.c:4271:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) n.log:[2018-05-05 01:38:46.974945] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-05 01:38:46.975012] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 2: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed n.log:[2018-05-05 01:38:47.010671] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-05 01:38:47.010731] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed n.log:[2018-05-07 03:05:48.552793] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-07 03:05:48.552872] E [fuse-bridge.c:4271:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) n.log:[2018-05-07 03:05:56.084586] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-07 03:05:56.084655] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 2: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed n.log:[2018-05-07 03:05:56.148767] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null n.log:[2018-05-07 03:05:56.148825] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed [root at nfs02 glusterfs]# ganesha-gfapi.log:[2018-04-08 03:45:25.440067] E [socket.c:2369:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 03:45:28.455560] E [socket.c:2369:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 03:45:29.145764] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 03:51:15.529380] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 03:51:29.754070] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 03:51:40.633012] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 04:36:28.005490] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 04:37:09.038708] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. 
Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 04:37:09.039432] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 04:37:09.044188] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 04:37:09.044484] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 17:17:02.093164] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 17:17:29.123148] E [socket.c:2369:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 17:17:50.135169] E [MSGID: 114058] [client-handshake.c:1565:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 17:18:03.290346] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 17:18:14.202118] E [socket.c:2369:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-08 17:19:39.014330] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 17:32:21.714643] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 17:32:21.734187] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null ganesha-gfapi.log:[2018-04-08 20:35:30.005234] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 20:55:29.009144] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 20:57:52.009895] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 21:00:29.004716] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-08 21:01:01.205704] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. 
Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-08 21:01:01.209797] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null ganesha-gfapi.log:[2018-04-09 04:41:02.006926] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-10 03:20:40.011967] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-10 03:30:33.057576] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-13 02:13:01.005629] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-14 21:41:18.313290] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-15 03:01:37.005636] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-15 03:02:37.319050] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-15 03:43:02.719856] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-15 20:36:31.143742] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 00:02:38.697700] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:16:38.383945] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:25:30.904382] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:25:57.432071] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-16 05:26:00.122608] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-16 05:30:20.172115] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-17 05:07:05.006133] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. 
ganesha-gfapi.log:[2018-04-17 05:08:39.004624] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-20 04:58:55.043976] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-20 05:07:22.762457] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-20 05:09:18.710446] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-20 05:09:21.489724] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:22:16.791636] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:22:16.797525] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:22:16.797565] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-28 07:36:29.927497] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-04-28 07:36:31.215686] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-04-28 07:36:31.216287] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-30 06:37:02.005127] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-04-30 06:37:53.985563] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-05-01 04:55:57.191787] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:24007 failed (No route to host); disconnecting socket ganesha-gfapi.log:[2018-05-01 05:10:55.595474] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. 
ganesha-gfapi.log:[2018-05-01 05:10:56.620226] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket ganesha-gfapi.log:[2018-05-01 22:42:26.005472] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-01 22:43:06.423349] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-05-02 03:16:12.930652] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. ganesha-gfapi.log:[2018-05-03 02:43:03.021549] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-03 03:00:01.034676] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-03 03:59:28.006170] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. ganesha-gfapi.log:[2018-05-05 01:38:47.474503] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f327e4d2f0b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f327e297e7e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f327e297f9e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f327e299720] (--> /usr/lib64/glusterfs/3.13.2/xlator/protocol/client.so(fini+0x28)[0x7f32701a7c88] ))))) 0-gv01-client-0: forced unwinding frame type(GlusterFS Handshake) op(SETVOLUME(1)) called at 2018-05-05 01:38:46.968501 (xid=0x5) ganesha-gfapi.log:[2018-05-05 01:38:47.474834] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. 
ganesha-gfapi.log:[2018-05-05 01:38:47.474960] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f327e4d2f0b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f327e297e7e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f327e297f9e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f327e299720] (--> /usr/lib64/glusterfs/3.13.2/xlator/protocol/client.so(fini+0x28)[0x7f32701a7c88] ))))) 0-gv01-client-1: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2018-05-05 01:38:46.965204 (xid=0x2) ganesha-gfapi.log:[2018-05-05 01:38:50.457456] E [rpc-clnt.c:417:rpc_clnt_reconnect] 0-gv01-client-1: Error adding to timer event queue ganesha-gfapi.log:[2018-05-07 03:05:58.522295] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f45af6d6f0b] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f45af49be7e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f45af49bf9e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f45af49d720] (--> /usr/lib64/glusterfs/3.13.2/xlator/protocol/client.so(fini+0x28)[0x7f45a0db6c88] ))))) 0-gv01-client-1: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2018-05-07 03:05:56.080210 (xid=0x2) ganesha-gfapi.log:[2018-05-07 03:05:59.504926] E [rpc-clnt.c:417:rpc_clnt_reconnect] 0-gv01-client-1: Error adding to timer event queue ganesha-gfapi.log:[2018-05-07 03:05:59.505274] E [rpc-clnt.c:417:rpc_clnt_reconnect] 0-gv01-client-0: Error adding to timer event queue [root at nfs02 ganesha]# [root at nfs02 ganesha]#> > > On Wed, Apr 11, 2018 at 4:35 AM, TomK <tomkcpr at mdevsys.com > <mailto:tomkcpr at mdevsys.com>> wrote: > > On 4/9/2018 2:45 AM, Alex K wrote: > Hey Alex, > > With two nodes, the setup works but both sides go down when one node > is missing.? Still I set the below two params to none and that > solved my issue: > > cluster.quorum-type: none > cluster.server-quorum-type: none > > yes this disables quorum so as to avoid the issue. Glad that this > helped. Bare in in mind though that it is easier to face split-brain > issues with quorum is disabled, that's why 3 nodes at least are > recommended. Just to note that I have also a 2 node cluster which is > running without issues for long time. > > Thank you for that. > > Cheers, > Tom > > Hi, > > You need 3 nodes at least to have quorum enabled. In 2 node > setup you need to disable quorum so as to be able to still use > the volume when one of the nodes go down. > > On Mon, Apr 9, 2018, 09:02 TomK <tomkcpr at mdevsys.com > <mailto:tomkcpr at mdevsys.com> <mailto:tomkcpr at mdevsys.com > <mailto:tomkcpr at mdevsys.com>>> wrote: > > ? ? Hey All, > > ? ? In a two node glusterfs setup, with one node down, can't > use the second > ? ? node to mount the volume.? I understand this is expected > behaviour? > ? ? Anyway to allow the secondary node to function then > replicate what > ? ? changed to the first (primary) when it's back online?? Or > should I just > ? ? go for a third node to allow for this? > > ? ? Also, how safe is it to set the following to none? > > ? ? cluster.quorum-type: auto > ? ? cluster.server-quorum-type: server > > > ? ? [root at nfs01 /]# gluster volume start gv01 > ? ? volume start: gv01: failed: Quorum not met. Volume > operation not > ? ? allowed. > ? ? [root at nfs01 /]# > > > ? ? [root at nfs01 /]# gluster volume status > ? ? Status of volume: gv01 > ? ? Gluster process? ? ? ? ? ? ? ? ? ? ? ? ? ? ?TCP Port? RDMA > Port? ? ?Online? 
Pid > > ------------------------------------------------------------------------------ > ? ? Brick nfs01:/bricks/0/gv01? ? ? ? ? ? ? ? ? N/A? ? ? ?N/A > ? ? ? N? ? ? ? ? ?N/A > ? ? Self-heal Daemon on localhost? ? ? ? ? ? ? ?N/A? ? ? ?N/A > ? ? ? Y > ? ? 25561 > > ? ? Task Status of Volume gv01 > > ------------------------------------------------------------------------------ > ? ? There are no active volume tasks > > ? ? [root at nfs01 /]# > > > ? ? [root at nfs01 /]# gluster volume info > > ? ? Volume Name: gv01 > ? ? Type: Replicate > ? ? Volume ID: e5ccc75e-5192-45ac-b410-a34ebd777666 > ? ? Status: Started > ? ? Snapshot Count: 0 > ? ? Number of Bricks: 1 x 2 = 2 > ? ? Transport-type: tcp > ? ? Bricks: > ? ? Brick1: nfs01:/bricks/0/gv01 > ? ? Brick2: nfs02:/bricks/0/gv01 > ? ? Options Reconfigured: > ? ? transport.address-family: inet > ? ? nfs.disable: on > ? ? performance.client-io-threads: off > ? ? nfs.trusted-sync: on > ? ? performance.cache-size: 1GB > ? ? performance.io-thread-count: 16 > ? ? performance.write-behind-window-size: 8MB > ? ? performance.readdir-ahead: on > ? ? client.event-threads: 8 > ? ? server.event-threads: 8 > ? ? cluster.quorum-type: auto > ? ? cluster.server-quorum-type: server > ? ? [root at nfs01 /]# > > > > > ? ? ==> n.log <=> ? ? [2018-04-09 05:08:13.704156] I [MSGID: 100030] > [glusterfsd.c:2556:main] > ? ? 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs > version > ? ? 3.13.2 (args: /usr/sbin/glusterfs --process-name fuse > ? ? --volfile-server=nfs01 --volfile-id=/gv01 /n) > ? ? [2018-04-09 05:08:13.711255] W [MSGID: 101002] > ? ? [options.c:995:xl_opt_validate] 0-glusterfs: option > 'address-family' is > ? ? deprecated, preferred is 'transport.address-family', > continuing with > ? ? correction > ? ? [2018-04-09 05:08:13.728297] W [socket.c:3216:socket_connect] > ? ? 0-glusterfs: Error disabling sockopt IPV6_V6ONLY: "Protocol not > ? ? available" > ? ? [2018-04-09 05:08:13.729025] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 1 > ? ? [2018-04-09 05:08:13.737757] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 2 > ? ? [2018-04-09 05:08:13.738114] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 3 > ? ? [2018-04-09 05:08:13.738203] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 4 > ? ? [2018-04-09 05:08:13.738324] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 5 > ? ? [2018-04-09 05:08:13.738330] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 6 > ? ? [2018-04-09 05:08:13.738655] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 7 > ? ? [2018-04-09 05:08:13.738742] I [MSGID: 101190] > ? ? [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: > Started thread > ? ? with index 8 > ? ? [2018-04-09 05:08:13.739460] W [MSGID: 101174] > ? ? [graph.c:363:_log_if_unknown_option] 0-gv01-readdir-ahead: > option > ? ? 'parallel-readdir' is not recognized > ? ? [2018-04-09 05:08:13.739787] I [MSGID: 114020] > [client.c:2360:notify] > ? ? 0-gv01-client-0: parent translators are ready, attempting > connect on > ? ? transport > ? ? 
[2018-04-09 05:08:13.747040] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > ? ? [2018-04-09 05:08:13.747372] I [MSGID: 114020] > [client.c:2360:notify] > ? ? 0-gv01-client-1: parent translators are ready, attempting > connect on > ? ? transport > ? ? [2018-04-09 05:08:13.747883] E [MSGID: 114058] > ? ? [client-handshake.c:1571:client_query_portmap_cbk] > 0-gv01-client-0: > ? ? failed to get the port number for remote subvolume. Please > run 'gluster > ? ? volume status' on server to see if brick process is running. > ? ? [2018-04-09 05:08:13.748026] I [MSGID: 114018] > ? ? [client.c:2285:client_rpc_notify] 0-gv01-client-0: > disconnected from > ? ? gv01-client-0. Client process will keep trying to connect > to glusterd > ? ? until brick's port is available > ? ? [2018-04-09 05:08:13.748070] W [MSGID: 108001] > ? ? [afr-common.c:5391:afr_notify] 0-gv01-replicate-0: > Client-quorum is > ? ? not met > ? ? [2018-04-09 05:08:13.754493] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > ? ? Final graph: > > +------------------------------------------------------------------------------+ > ? ? ?? ?1: volume gv01-client-0 > ? ? ?? ?2:? ? ?type protocol/client > ? ? ?? ?3:? ? ?option ping-timeout 42 > ? ? ?? ?4:? ? ?option remote-host nfs01 > ? ? ?? ?5:? ? ?option remote-subvolume /bricks/0/gv01 > ? ? ?? ?6:? ? ?option transport-type socket > ? ? ?? ?7:? ? ?option transport.address-family inet > ? ? ?? ?8:? ? ?option username 916ccf06-dc1d-467f-bc3d-f00a7449618f > ? ? ?? ?9:? ? ?option password a44739e0-9587-411f-8e6a-9a6a4e46156c > ? ? ?? 10:? ? ?option event-threads 8 > ? ? ?? 11:? ? ?option transport.tcp-user-timeout 0 > ? ? ?? 12:? ? ?option transport.socket.keepalive-time 20 > ? ? ?? 13:? ? ?option transport.socket.keepalive-interval 2 > ? ? ?? 14:? ? ?option transport.socket.keepalive-count 9 > ? ? ?? 15:? ? ?option send-gids true > ? ? ?? 16: end-volume > ? ? ?? 17: > ? ? ?? 18: volume gv01-client-1 > ? ? ?? 19:? ? ?type protocol/client > ? ? ?? 20:? ? ?option ping-timeout 42 > ? ? ?? 21:? ? ?option remote-host nfs02 > ? ? ?? 22:? ? ?option remote-subvolume /bricks/0/gv01 > ? ? ?? 23:? ? ?option transport-type socket > ? ? ?? 24:? ? ?option transport.address-family inet > ? ? ?? 25:? ? ?option username 916ccf06-dc1d-467f-bc3d-f00a7449618f > ? ? ?? 26:? ? ?option password a44739e0-9587-411f-8e6a-9a6a4e46156c > ? ? ?? 27:? ? ?option event-threads 8 > ? ? ?? 28:? ? ?option transport.tcp-user-timeout 0 > ? ? ?? 29:? ? ?option transport.socket.keepalive-time 20 > ? ? ?? 30:? ? ?option transport.socket.keepalive-interval 2 > ? ? ?? 31:? ? ?option transport.socket.keepalive-count 9 > ? ? ?? 32:? ? ?option send-gids true > ? ? ?? 33: end-volume > ? ? ?? 34: > ? ? ?? 35: volume gv01-replicate-0 > ? ? ?? 36:? ? ?type cluster/replicate > ? ? ?? 37:? ? ?option afr-pending-xattr gv01-client-0,gv01-client-1 > ? ? ?? 38:? ? ?option quorum-type auto > ? ? ?? 39:? ? ?option use-compound-fops off > ? ? ?? 40:? ? ?subvolumes gv01-client-0 gv01-client-1 > ? ? ?? 41: end-volume > ? ? ?? 42: > ? ? ?? 43: volume gv01-dht > ? ? ?? 44:? ? ?type cluster/distribute > ? ? ?? 45:? ? ?option lock-migration off > ? ? ?? 46:? ? ?subvolumes gv01-replicate-0 > ? ? ?? 47: end-volume > ? ? ?? 48: > ? ? ?? 49: volume gv01-write-behind > ? ? ?? 50:? ? ?type performance/write-behind > ? ? ?? 51:? ? ?option cache-size 8MB > ? ? ?? 52:? ? ?subvolumes gv01-dht > ? ? ?? 
53: end-volume > ? ? ?? 54: > ? ? ?? 55: volume gv01-read-ahead > ? ? ?? 56:? ? ?type performance/read-ahead > ? ? ?? 57:? ? ?subvolumes gv01-write-behind > ? ? ?? 58: end-volume > ? ? ?? 59: > ? ? ?? 60: volume gv01-readdir-ahead > ? ? ?? 61:? ? ?type performance/readdir-ahead > ? ? ?? 62:? ? ?option parallel-readdir off > ? ? ?? 63:? ? ?option rda-request-size 131072 > ? ? ?? 64:? ? ?option rda-cache-limit 10MB > ? ? ?? 65:? ? ?subvolumes gv01-read-ahead > ? ? ?? 66: end-volume > ? ? ?? 67: > ? ? ?? 68: volume gv01-io-cache > ? ? ?? 69:? ? ?type performance/io-cache > ? ? ?? 70:? ? ?option cache-size 1GB > ? ? ?? 71:? ? ?subvolumes gv01-readdir-ahead > ? ? ?? 72: end-volume > ? ? ?? 73: > ? ? ?? 74: volume gv01-quick-read > ? ? ?? 75:? ? ?type performance/quick-read > ? ? ?? 76:? ? ?option cache-size 1GB > ? ? ?? 77:? ? ?subvolumes gv01-io-cache > ? ? ?? 78: end-volume > ? ? ?? 79: > ? ? ?? 80: volume gv01-open-behind > ? ? ?? 81:? ? ?type performance/open-behind > ? ? ?? 82:? ? ?subvolumes gv01-quick-read > ? ? ?? 83: end-volume > ? ? ?? 84: > ? ? ?? 85: volume gv01-md-cache > ? ? ?? 86:? ? ?type performance/md-cache > ? ? ?? 87:? ? ?subvolumes gv01-open-behind > ? ? ?? 88: end-volume > ? ? ?? 89: > ? ? ?? 90: volume gv01 > ? ? ?? 91:? ? ?type debug/io-stats > ? ? ?? 92:? ? ?option log-level INFO > ? ? ?? 93:? ? ?option latency-measurement off > ? ? ?? 94:? ? ?option count-fop-hits off > ? ? ?? 95:? ? ?subvolumes gv01-md-cache > ? ? ?? 96: end-volume > ? ? ?? 97: > ? ? ?? 98: volume meta-autoload > ? ? ?? 99:? ? ?type meta > ? ? 100:? ? ?subvolumes gv01 > ? ? 101: end-volume > ? ? 102: > > +------------------------------------------------------------------------------+ > ? ? [2018-04-09 05:08:13.922631] E > [socket.c:2374:socket_connect_finish] > ? ? 0-gv01-client-1: connection to 192.168.0.119:24007 > <http://192.168.0.119:24007> > ? ? <http://192.168.0.119:24007> failed (No route to > > ? ? host); disconnecting socket > ? ? [2018-04-09 05:08:13.922690] E [MSGID: 108006] > ? ? [afr-common.c:5164:__afr_handle_child_down_event] > 0-gv01-replicate-0: > ? ? All subvolumes are down. Going offline until atleast one of > them comes > ? ? back up. > ? ? [2018-04-09 05:08:13.926201] I [fuse-bridge.c:4205:fuse_init] > ? ? 0-glusterfs-fuse: FUSE inited with protocol versions: > glusterfs 7.24 > ? ? kernel 7.22 > ? ? [2018-04-09 05:08:13.926245] I > [fuse-bridge.c:4835:fuse_graph_sync] > ? ? 0-fuse: switched to graph 0 > ? ? [2018-04-09 05:08:13.926518] I [MSGID: 108006] > ? ? [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no > subvolumes up > ? ? [2018-04-09 05:08:13.926671] E [MSGID: 101046] > ? ? [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > ? ? [2018-04-09 05:08:13.926762] E > [fuse-bridge.c:4271:fuse_first_lookup] > ? ? 0-fuse: first lookup on root failed (Transport endpoint is not > ? ? connected) > ? ? [2018-04-09 05:08:13.927207] I [MSGID: 108006] > ? ? [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no > subvolumes up > ? ? [2018-04-09 05:08:13.927262] E [MSGID: 101046] > ? ? [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > ? ? [2018-04-09 05:08:13.927301] W > ? ? [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: > ? ? 00000000-0000-0000-0000-000000000001: failed to resolve > (Transport > ? ? endpoint is not connected) > ? ? [2018-04-09 05:08:13.927339] E > [fuse-bridge.c:900:fuse_getattr_resume] > ? ? 0-glusterfs-fuse: 2: GETATTR 1 > (00000000-0000-0000-0000-000000000001) > ? ? resolution failed > ? ? 
[2018-04-09 05:08:13.931497] I [MSGID: 108006] > ? ? [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no > subvolumes up > ? ? [2018-04-09 05:08:13.931558] E [MSGID: 101046] > ? ? [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null > ? ? [2018-04-09 05:08:13.931599] W > ? ? [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: > ? ? 00000000-0000-0000-0000-000000000001: failed to resolve > (Transport > ? ? endpoint is not connected) > ? ? [2018-04-09 05:08:13.931623] E > [fuse-bridge.c:900:fuse_getattr_resume] > ? ? 0-glusterfs-fuse: 3: GETATTR 1 > (00000000-0000-0000-0000-000000000001) > ? ? resolution failed > ? ? [2018-04-09 05:08:13.937258] I > [fuse-bridge.c:5093:fuse_thread_proc] > ? ? 0-fuse: initating unmount of /n > ? ? [2018-04-09 05:08:13.938043] W > [glusterfsd.c:1393:cleanup_and_exit] > ? ? (-->/lib64/libpthread.so.0(+0x7e25) [0x7fb80b05ae25] > ? ? -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) > [0x560b52471675] > ? ? -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) > [0x560b5247149b] ) 0-: > ? ? received signum (15), shutting down > ? ? [2018-04-09 05:08:13.938086] I [fuse-bridge.c:5855:fini] > 0-fuse: > ? ? Unmounting '/n'. > ? ? [2018-04-09 05:08:13.938106] I [fuse-bridge.c:5860:fini] > 0-fuse: Closing > ? ? fuse connection to '/n'. > > ? ? ==> glusterd.log <=> ? ? [2018-04-09 05:08:15.118078] W [socket.c:3216:socket_connect] > ? ? 0-management: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > > ? ? ==> glustershd.log <=> ? ? [2018-04-09 05:08:15.282192] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > ? ? [2018-04-09 05:08:15.289508] W [socket.c:3216:socket_connect] > ? ? 0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: > "Protocol not > ? ? available" > > > > > > > > ? ? -- > ? ? Cheers, > ? ? Tom K. > > ------------------------------------------------------------------------------------- > > ? ? Living on earth is expensive, but it includes a free trip > around the > ? ? sun. > > ? ? _______________________________________________ > ? ? Gluster-users mailing list > Gluster-users at gluster.org <mailto:Gluster-users at gluster.org> > <mailto:Gluster-users at gluster.org > <mailto:Gluster-users at gluster.org>> > http://lists.gluster.org/mailman/listinfo/gluster-users > <http://lists.gluster.org/mailman/listinfo/gluster-users> > > > > -- > Cheers, > Tom K. > ------------------------------------------------------------------------------------- > > Living on earth is expensive, but it includes a free trip around the > sun. > >
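For reference, the "at least 3 nodes" recommendation quoted above does not require a third full copy of the data: a replica 2 volume such as gv01 can be converted to replica 3 with an arbiter brick, after which quorum can be re-enabled.  A rough sketch only -- the third host name and brick path below are hypothetical and not taken from this thread:

    gluster peer probe nfs03
    gluster volume add-brick gv01 replica 3 arbiter 1 nfs03:/bricks/0/gv01
    gluster volume set gv01 cluster.quorum-type auto
    gluster volume set gv01 cluster.server-quorum-type server
    gluster volume heal gv01 info    # confirm the arbiter populates and nothing stays pending

The arbiter brick stores only file names and metadata, so a small host or VM is enough to provide the tie-breaking vote that prevents the split-brain scenario described above.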