Thanks for all the information.
For pstack output, gluster-debuginfo package has to be installed. I will
check out the provided information and get back to you.
On Wed, Apr 22, 2020 at 11:54 AM <nico at furyweb.fr> wrote:
> I think all issues are linked with the same underlying problem.
>
> 1. all peers were in Connected state from every node yesterday but node 2
> is "semi-connected" at now:
> root at glusterDevVM1:/var/log/glusterfs# gluster peer status
> Number of Peers: 2
>
> Hostname: glusterDevVM3
> Uuid: 0d8a3686-9e37-4ce7-87bf-c85d1ec40974
> State: Peer in Cluster (Connected)
>
> Hostname: glusterDevVM2
> Uuid: 7f6c3023-144b-4db2-9063-d90926dbdd18
> State: Peer in Cluster (Connected)
> root at glusterDevVM2:~# gluster peer status
> Number of Peers: 2
>
> Hostname: glusterDevVM1
> Uuid: e2263e4d-a307-45d5-9cec-e1791f7a45fb
> State: Peer in Cluster (Disconnected)
>
> Hostname: glusterDevVM3
> Uuid: 0d8a3686-9e37-4ce7-87bf-c85d1ec40974
> State: Peer in Cluster (Connected)
> root at glusterDevVM3:~# gluster peer status
> Number of Peers: 2
>
> Hostname: glusterDevVM2
> Uuid: 7f6c3023-144b-4db2-9063-d90926dbdd18
> State: Peer in Cluster (Connected)
>
> Hostname: glusterDevVM1
> Uuid: e2263e4d-a307-45d5-9cec-e1791f7a45fb
> State: Peer in Cluster (Connected)
> 2, 3, 4. a simple gluster volume status show multiple errors on each node:
> root at glusterDevVM1:~# gluster volume status tmp
> Locking failed on glusterDevVM2. Please check log file for details.
> root at glusterDevVM2:~# gluster volume status tmp
> Another transaction is in progress for tmp. Please try again after some
> time.
> root at glusterDevVM3:~# gluster volume status tmp
> Error : Request timed out
>
> Logs for each node (except SSL errors):
> root at glusterDevVM1:~# egrep -v '0-socket.management'
> /var/log/glusterfs/glusterd.log
> [2020-04-22 05:38:32.278618] E [rpc-clnt.c:346:saved_frames_unwind] (-->
>
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x138)[0x7fd28d99fda8]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcd97)[0x7fd28d745d97]
(-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcebe)[0x7fd28d745ebe] (-->
>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)[0x7fd28d746e93]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xea18)[0x7fd28d747a18]
)))))
> 0-management: forced unwinding frame type(Peer mgmt) op(--(4)) called at
> 2020-04-22 05:38:32.243087 (xid=0x8d)
> [2020-04-22 05:38:32.278638] E [MSGID: 106157]
> [glusterd-rpc-ops.c:665:__glusterd_friend_update_cbk] 0-management: RPC
> Error
> [2020-04-22 05:38:32.278651] I [MSGID: 106493]
> [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management:
> Received RJT from uuid: 00000000-0000-0000-0000-000000000000
> [2020-04-22 05:38:33.256401] I [MSGID: 106498]
> [glusterd-svc-helper.c:747:__glusterd_send_svc_configure_req] 0-management:
> not connected yet
> [2020-04-22 05:38:35.279149] I [socket.c:4347:ssl_setup_connection_params]
> 0-management: SSL support on the I/O path is ENABLED
> [2020-04-22 05:38:35.279169] I [socket.c:4350:ssl_setup_connection_params]
> 0-management: SSL support for glusterd is ENABLED
> [2020-04-22 05:38:35.279178] I [socket.c:4360:ssl_setup_connection_params]
> 0-management: using certificate depth 1
> The message "I [MSGID: 106004]
> [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer
> <glusterDevVM2> (<7f6c3023-144b-4db2-9063-d90926dbdd18>), in
state <Peer in
> Cluster>, has disconnected from glusterd." repeated 3 times between
> [2020-04-22 05:38:25.232116] and [2020-04-22 05:38:35.667153]
> [2020-04-22 05:38:35.667255] W
> [glusterd-locks.c:796:glusterd_mgmt_v3_unlock]
>
(-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x22119)
> [0x7fd287da7119]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x2caae)
> [0x7fd287db1aae]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0xdf0d3)
> [0x7fd287e640d3] ) 0-management: Lock for vol <vol> not held
> [2020-04-22 05:38:35.667275] W [MSGID: 106117]
> [glusterd-handler.c:6225:__glusterd_peer_rpc_notify] 0-management: Lock not
> released for <vol>
> 2 last lines repeated for each volume
>
> root at glusterDevVM2:~# egrep -v '0-socket.management'
> /var/log/glusterfs/glusterd.log
> [2020-04-22 05:51:57.493574] E [rpc-clnt.c:346:saved_frames_unwind] (-->
>
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x138)[0x7f30411dbda8]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcd97)[0x7f3040f81d97]
(-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcebe)[0x7f3040f81ebe] (-->
>
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)[0x7f3040f82e93]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xea18)[0x7f3040f83a18]
)))))
> 0-management: forced unwinding frame type(Gluster MGMT Handshake)
> op(MGMT-VERS(1)) called at 2020-04-22 05:51:57.483579 (xid=0x563)
> [2020-04-22 05:51:57.493623] E [MSGID: 106167]
> [glusterd-handshake.c:2040:__glusterd_mgmt_hndsk_version_cbk] 0-management:
> Error through RPC layer, retry again later
> [2020-04-22 05:52:00.501474] I [socket.c:4347:ssl_setup_connection_params]
> 0-management: SSL support on the I/O path is ENABLED
> [2020-04-22 05:52:00.501542] I [socket.c:4350:ssl_setup_connection_params]
> 0-management: SSL support for glusterd is ENABLED
> [2020-04-22 05:52:00.501569] I [socket.c:4360:ssl_setup_connection_params]
> 0-management: using certificate depth 1
> [2020-04-22 05:52:00.983720] I [MSGID: 106004]
> [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer
> <glusterDevVM1> (<e2263e4d-a307-45d5-9cec-e1791f7a45fb>), in
state <Peer in
> Cluster>, has disconnected from glusterd.
> [2020-04-22 05:52:00.983886] W
> [glusterd-locks.c:796:glusterd_mgmt_v3_unlock]
>
(-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x22119)
> [0x7f303b5e3119]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x2caae)
> [0x7f303b5edaae]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0xdf0d3)
> [0x7f303b6a00d3] ) 0-management: Lock for vol <vol> not held
> [2020-04-22 05:52:00.983909] W [MSGID: 106117]
> [glusterd-handler.c:6225:__glusterd_peer_rpc_notify] 0-management: Lock not
> released for <vol>
> 2 last lines repeated for each volume
>
> root at glusterDevVM3:~# egrep -v '0-socket.management'
> /var/log/glusterfs/glusterd.log
> [2020-04-22 05:38:33.229959] I [MSGID: 106499]
> [glusterd-handler.c:4264:__glusterd_handle_status_volume] 0-management:
> Received status volume req for volume tmp
> [2020-04-22 05:41:33.230170] I
> [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: unlock
> timer is cancelled for volume_type tmp_vol
> [2020-04-22 05:48:34.908289] E [rpc-clnt.c:183:call_bail] 0-management:
> bailing out frame type(glusterd mgmt v3), op(--(1)), xid = 0x108, unique
> 918, sent = 2020-04-22 05:38:33.230268, timeout = 600 for 10.5.1.7:24007
> [2020-04-22 05:48:34.908339] E [MSGID: 106115]
> [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Locking
> failed on glusterDevVM1. Please check log file for details.
> [2020-04-22 05:48:40.288539] E [rpc-clnt.c:183:call_bail] 0-management:
> bailing out frame type(glusterd mgmt v3), op(--(1)), xid = 0x27, unique
> 917, sent = 2020-04-22 05:38:33.230258, timeout = 600 for 10.5.1.8:24007
> [2020-04-22 05:48:40.288568] E [MSGID: 106115]
> [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Locking
> failed on glusterDevVM2. Please check log file for details.
> [2020-04-22 05:48:40.288631] E [MSGID: 106150]
> [glusterd-syncop.c:1918:gd_sync_task_begin] 0-management: Locking Peers
> Failed.
>
> I'm not familiar with pstack, when running on node 3 (arbiter) I get
only
> these few lines :
> root at glusterDevVM3:~# pstack 13700
>
> 13700: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
> (No symbols found)
> 0x7fafd747a6cd: ????
>
> Which debian stretch package should I install ?
>
> To be more explicit, I stopped glusterd on all 3 nodes then restart
> sequentially with this order : node1, node3 (arbiter) then node2.
> Log files can be dl at
> https://www.dropbox.com/s/rcgcw7jrud2wkj1/glusterd-logs.tar.bz2?dl=0
>
> Thanks for your help.
>
> ------------------------------
> *De: *"Sanju Rakonde" <srakonde at redhat.com>
> *?: *nico at furyweb.fr
> *Cc: *"gluster-users" <gluster-users at gluster.org>
> *Envoy?: *Mercredi 22 Avril 2020 07:23:52
> *Objet: *Re: [Gluster-users] never ending logging
>
> Hi,
> The email is talking about many issues. Let me ask a few questions to get
> a whole picture.
> 1. are the peers are in the connected state now? or they still in the
> rejected state?
> 2. What led you to see "locking failed" messages? We would like
to if
> there is a reproducer and fix the issue if any.
> 3. Another transaction in progress message appears when there is already a
> operation going on. Are you seeing this when there is no such transaction
> going on?
> 4. When did you hit the timedouts? Did you tried to look at the pstack
> output of glusterd process? If so, please share the pstack output.
>
> On Tue, Apr 21, 2020 at 7:08 PM <nico at furyweb.fr> wrote:
>
>> Hi all.
>>
>> We're using 3 nodes Gluster 7.3 (2 + 1 arbiter), yesterday node 2
was
>> rejected from cluster and I applied following steps to fix :
>>
https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Resolving%20Peer%20Rejected/
>> I also saw
>>
https://docs.gluster.org/en/latest/Troubleshooting/troubleshooting-glusterd/
>> but solution isn't compatible as cluster.max-op-version doesn't
exist and
>> all op-version are the same on all 3 nodes.
>>
>> After renewing SSL certs and several restart all volumes came back
online
>> but glusterd log file on all 3 nodes is filled with nothing else than
>> following 3 lines :
>>
>> [2020-04-21 13:05:19.478913] I
>> [socket.c:4347:ssl_setup_connection_params] 0-socket.management: SSL
>> support on the I/O path is ENABLED
>> [2020-04-21 13:05:19.478972] I
>> [socket.c:4350:ssl_setup_connection_params] 0-socket.management: SSL
>> support for glusterd is ENABLED
>> [2020-04-21 13:05:19.478986] I
>> [socket.c:4360:ssl_setup_connection_params] 0-socket.management: using
>> certificate depth 1
>>
>> Moreover, I have "Locking failed", "Another transaction
is in progress"
>> and "Error : Request timed out" on gluster volume status
volxxx command.
>> All SSL certs on clients have also been renewed and all volumes were
>> remounted. All 3 nodes were alternatively restarted (glusterd) and
rebooted.
>>
>> The cluster is not in production environment but there's about ~250
>> clients for ~75 volumes, I don't know how to troubleshoot and fix
this
>> problem, if anyone has an idea.
>> ________
>>
>>
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://bluejeans.com/441850968
>>
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>
> --
> Thanks,
> Sanju
>
>
--
Thanks,
Sanju
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.gluster.org/pipermail/gluster-users/attachments/20200422/ca920d3d/attachment.html>