Is it possible that versions 9.1 and 9.6 can't talk to each other? My
understanding was that they should be able to.
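
For reference, each node's installed version and the op-version the
cluster has negotiated can be compared with the following (a sketch
only; these assume the CLI responds at all, which right now it may
not):

gluster --version
gluster volume get all cluster.op-version
gluster volume get all cluster.max-op-version

cluster.op-version should be the same on both nodes, while
cluster.max-op-version shows the highest version the binaries
installed on that node can support.
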
On Fri, 24 Feb 2023 at 10:36, David Cunningham <dcunningham at voisonics.com> wrote:
> We've tried to remove "sg" from the cluster so we can re-install the
> GlusterFS node on it, but the following command, run on "br", also
> gives a timeout error:
>
> gluster volume remove-brick gvol0 replica 1 sg:/nodirectwritedata/gluster/gvol0 force
>
> How can we tell "br" to just remove "sg" without trying to contact it?
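>
> If that keeps timing out, one thing we're considering (a sketch only,
> not something we've run yet) is dropping "sg" from the peer list with:
>
> gluster peer detach sg force
>
> Our understanding is that "force" lets the detach proceed even if the
> peer is unreachable, though it may refuse while "sg" still hosts a
> brick of gvol0.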
>
>
> On Fri, 24 Feb 2023 at 10:31, David Cunningham <dcunningham at voisonics.com> wrote:
>
>> Hello,
>>
>> We have a cluster with two nodes, "sg" and "br", which were running
>> GlusterFS 9.1, installed via the Ubuntu package manager. We updated the
>> Ubuntu packages on "sg" to version 9.6, and now have big problems. The
>> "br" node is still on version 9.1.
>>
>> Running "gluster volume status" on either host gives "Error : Request
>> timed out". On "sg", not all of the processes are running compared to
>> "br", as shown below. Restarting the services on "sg" doesn't help.
>> Can anyone advise how we should proceed? This is a production system.
>>
>> root at sg:~# ps -ef | grep gluster
>> root 15196 1 0 22:37 ? 00:00:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>> root 15426 1 0 22:39 ? 00:00:00 /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
>> root 15457 15426 0 22:39 ? 00:00:00 /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
>> root 19341 13695 0 23:24 pts/1 00:00:00 grep --color=auto gluster
>>
>> root at br:~# ps -ef | grep gluster
>> root 2052 1 0 2022 ? 00:00:00 /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
>> root 2062 1 3 2022 ? 10-11:57:16 /usr/sbin/glusterfs --fuse-mountopts=noatime --process-name fuse --volfile-server=br --volfile-server=sg --volfile-id=/gvol0 --fuse-mountopts=noatime /mnt/glusterfs
>> root 2379 2052 0 2022 ? 00:00:00 /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
>> root 5884 1 5 2022 ? 18-16:08:53 /usr/sbin/glusterfsd -s br --volfile-id gvol0.br.nodirectwritedata-gluster-gvol0 -p /var/run/gluster/vols/gvol0/br-nodirectwritedata-gluster-gvol0.pid -S /var/run/gluster/61df1d4e1c65300e.socket --brick-name /nodirectwritedata/gluster/gvol0 -l /var/log/glusterfs/bricks/nodirectwritedata-gluster-gvol0.log --xlator-option *-posix.glusterd-uuid=11e528b0-8c69-4b5d-82ed-c41dd25536d6 --process-name brick --brick-port 49152 --xlator-option gvol0-server.listen-port=49152
>> root 10463 18747 0 23:24 pts/1 00:00:00 grep --color=auto gluster
>> root 27744 1 0 2022 ? 03:55:10 /usr/sbin/glusterfsd -s br --volfile-id gvol0.br.nodirectwritedata-gluster-gvol0 -p /var/run/gluster/vols/gvol0/br-nodirectwritedata-gluster-gvol0.pid -S /var/run/gluster/61df1d4e1c65300e.socket --brick-name /nodirectwritedata/gluster/gvol0 -l /var/log/glusterfs/bricks/nodirectwritedata-gluster-gvol0.log --xlator-option *-posix.glusterd-uuid=11e528b0-8c69-4b5d-82ed-c41dd25536d6 --process-name brick --brick-port 49153 --xlator-option gvol0-server.listen-port=49153
>> root 48227 1 0 Feb17 ? 00:00:26 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
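>>
>> Comparing the two listings: "sg" has glusterd and glustereventsd but
>> no glusterfsd brick process, while "br" appears to have two glusterfsd
>> processes for the same brick (ports 49152 and 49153). If the
>> management daemons were healthy, our understanding is that a missing
>> brick process can usually be respawned with the command below, though
>> right now any volume command just times out:
>>
>> gluster volume start gvol0 force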
>>
>> On "sg" in glusterd.log we're seeing:
>>
>> [2023-02-23 20:26:57.619318 +0000] E [rpc-clnt.c:181:call_bail] 0-management: bailing out frame type(glusterd mgmt v3), op(--(6)), xid 0x11, unique = 27, sent = 2023-02-23 20:16:50.596447 +0000, timeout = 600 for 10.20.20.11:24007
>> [2023-02-23 20:26:57.619425 +0000] E [MSGID: 106115] [glusterd-mgmt.c:122:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on br. Please check log file for details.
>> [2023-02-23 20:26:57.619545 +0000] E [MSGID: 106151] [glusterd-syncop.c:1655:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
>> [2023-02-23 20:26:57.619693 +0000] W [glusterd-locks.c:817:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xe19b9) [0x7fadf47fa9b9] -->/usr/lib/x86_64-linux-gnu/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xe0e20) [0x7fadf47f9e20] -->/usr/lib/x86_64-linux-gnu/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xe7904) [0x7fadf4800904] ) 0-management: Lock owner mismatch. Lock for vol gvol0 held by 11e528b0-8c69-4b5d-82ed-c41dd25536d6
>> [2023-02-23 20:26:57.619780 +0000] E [MSGID: 106117] [glusterd-syncop.c:1679:gd_unlock_op_phase] 0-management: Unable to release lock for gvol0
>> [2023-02-23 20:26:57.619939 +0000] I [socket.c:3811:socket_submit_outgoing_msg] 0-socket.management: not connected (priv->connected = -1)
>> [2023-02-23 20:26:57.619969 +0000] E [rpcsvc.c:1567:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
>> [2023-02-23 20:26:57.619995 +0000] E [MSGID: 106430] [glusterd-utils.c:678:glusterd_submit_reply] 0-glusterd: Reply submission failed
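>>
>> The "Lock owner mismatch" entries suggest a stale mgmt_v3 lock on
>> gvol0 held by 11e528b0-8c69-4b5d-82ed-c41dd25536d6, which matches the
>> glusterd UUID in the brick processes on "br". From what we've read, a
>> stale management lock is normally cleared by restarting glusterd on
>> the nodes involved (this should not touch the running bricks, but
>> please correct us if that's wrong):
>>
>> systemctl restart glusterd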
>>
>> And in the brick log:
>>
>> [2023-02-23 20:22:56.717721 +0000] I
[addr.c:54:compare_addr_and_update]
>> 0-/nodirectwritedata/gluster/gvol0: allowed = "*", received
addr >> "10.20.20.11"
>> [2023-02-23 20:22:56.717817 +0000] I [login.c:110:gf_auth]
0-auth/login:
>> allowed user names: a26c7de4-1236-4e0a-944a-cb82de7f7f0e
>> [2023-02-23 20:22:56.717840 +0000] I [MSGID: 115029]
>> [server-handshake.c:561:server_setvolume] 0-gvol0-server: accepted
client
>> from
>>
CTX_ID:46b23c19-5114-4a20-9306-9ea6faf02d51-GRAPH_ID:0-PID:35568-HOST:br.m5voip.com-PC_NAME:gvol0-client-0-RECON_NO:-0
>> (version: 9.1) with subvol /nodirectwritedata/gluster/gvol0
>> [2023-02-23 20:22:56.741545 +0000] W [socket.c:766:__socket_rwv]
>> 0-tcp.gvol0-server: readv on 10.20.20.11:49144 failed (No data
available)
>> [2023-02-23 20:22:56.741599 +0000] I [MSGID: 115036]
>> [server.c:500:server_rpc_notify] 0-gvol0-server: disconnecting
connection
>>
[{client-uid=CTX_ID:46b23c19-5114-4a20-9306-9ea6faf02d51-GRAPH_ID:0-PID:35568-HOST:br.m5voip.com-PC_NAME:gvol0-client-0-RECON_NO:-0}]
>>
>> [2023-02-23 20:22:56.741866 +0000] I [MSGID: 101055]
>> [client_t.c:397:gf_client_unref] 0-gvol0-server: Shutting down
connection
>>
CTX_ID:46b23c19-5114-4a20-9306-9ea6faf02d51-GRAPH_ID:0-PID:35568-HOST:br.m5voip.com-PC_NAME:gvol0-client-0-RECON_NO:-0
>>
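>> If that accept-then-disconnect pattern repeats, its frequency can be
>> checked with a plain grep of the brick log, e.g.:
>>
>> grep -cE 'accepted client|disconnecting connection' /var/log/glusterfs/bricks/nodirectwritedata-gluster-gvol0.log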
>>
>> Thanks for your help,
>>
--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782