Strahil Nikolov
2024-Jan-18 21:58 UTC
[Gluster-users] Upgrade 10.4 -> 11.1 making problems
Are you able to set the logs to debug level? It might provide a clue what is going on.

Best Regards,
Strahil Nikolov

On Thu, Jan 18, 2024 at 13:08, Diego Zuccato <diego.zuccato at unibo.it> wrote:

That's the same kind of errors I keep seeing on my 2 clusters,
regenerated some months ago. Seems a pseudo-split-brain that should be
impossible on a replica 3 cluster but keeps happening.
Sadly going to ditch Gluster ASAP.

Diego

On 18/01/2024 07:11, Hu Bert wrote:
> Good morning,
> heal still not running. Pending heals now sum up to 60K per brick.
> Heal was starting instantly e.g. after server reboot with version
> 10.4, but doesn't with version 11. What could be wrong?
>
> I only see these errors on one of the "good" servers in glustershd.log:
>
> [2024-01-18 06:08:57.328480 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0:
> remote operation failed.
> [{path=<gfid:cb39a1e4-2a4c-4727-861d-3ed9ef00681b>},
> {gfid=cb39a1e4-2a4c-4727-861d-3ed9ef00681b},
> {errno=2}, {error=No such file or directory}]
> [2024-01-18 06:08:57.594051 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1:
> remote operation failed.
> [{path=<gfid:3e9b178c-ae1f-4d85-ae47-fc539d94dd11>},
> {gfid=3e9b178c-ae1f-4d85-ae47-fc539d94dd11},
> {errno=2}, {error=No such file or directory}]
>
> About 7K today. Any ideas? Someone?
>
>
> Best regards,
> Hubert
>
> On Wed, 17 Jan 2024 at 11:24, Hu Bert <revirii at googlemail.com> wrote:
>>
>> ok, finally managed to get all servers, volumes etc running, but took
>> a couple of restarts, cksum checks etc.
>>
>> One problem: a volume doesn't heal automatically or doesn't heal at all.
>>
>> gluster volume status
>> Status of volume: workdata
>> Gluster process                            TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick glusterpub1:/gluster/md3/workdata    58832     0          Y       3436
>> Brick glusterpub2:/gluster/md3/workdata    59315     0          Y       1526
>> Brick glusterpub3:/gluster/md3/workdata    56917     0          Y       1952
>> Brick glusterpub1:/gluster/md4/workdata    59688     0          Y       3755
>> Brick glusterpub2:/gluster/md4/workdata    60271     0          Y       2271
>> Brick glusterpub3:/gluster/md4/workdata    49461     0          Y       2399
>> Brick glusterpub1:/gluster/md5/workdata    54651     0          Y       4208
>> Brick glusterpub2:/gluster/md5/workdata    49685     0          Y       2751
>> Brick glusterpub3:/gluster/md5/workdata    59202     0          Y       2803
>> Brick glusterpub1:/gluster/md6/workdata    55829     0          Y       4583
>> Brick glusterpub2:/gluster/md6/workdata    50455     0          Y       3296
>> Brick glusterpub3:/gluster/md6/workdata    50262     0          Y       3237
>> Brick glusterpub1:/gluster/md7/workdata    52238     0          Y       5014
>> Brick glusterpub2:/gluster/md7/workdata    52474     0          Y       3673
>> Brick glusterpub3:/gluster/md7/workdata    57966     0          Y       3653
>> Self-heal Daemon on localhost              N/A       N/A        Y       4141
>> Self-heal Daemon on glusterpub1            N/A       N/A        Y       5570
>> Self-heal Daemon on glusterpub2            N/A       N/A        Y       4139
>>
>> "gluster volume heal workdata info" lists a lot of files per brick.
>> "gluster volume heal workdata statistics heal-count" shows thousands
>> of files per brick.
>> "gluster volume heal workdata enable" has no effect.
>>
>> gluster volume heal workdata full
>> Launching heal operation to perform full self heal on volume workdata
>> has been successful
>> Use heal info commands to check status.
>>
>> -> not doing anything at all. And nothing happening on the 2 "good"
>> servers in e.g. glustershd.log. Heal was working as expected on
>> version 10.4, but here... silence. Someone has an idea?
>>
>>
>> Best regards,
>> Hubert
>>
>> On Tue, 16 Jan 2024 at 13:44, Gilberto Ferreira
>> <gilberto.nunes32 at gmail.com> wrote:
>>>
>>> Ah! Indeed! You need to perform an upgrade on the clients as well.
>>>
>>> On Tue, 16 Jan 2024 at 03:12, Hu Bert <revirii at googlemail.com> wrote:
>>>>
>>>> morning to those still reading :-)
>>>>
>>>> i found this: https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
>>>>
>>>> there's a paragraph about "peer rejected" with the same error message,
>>>> telling me: "Update the cluster.op-version" - i had only updated the
>>>> server nodes, but not the clients. So upgrading the cluster.op-version
>>>> wasn't possible at this time. So... upgrading the clients to version
>>>> 11.1 and then the op-version should solve the problem?
>>>>
>>>>
>>>> Thx,
>>>> Hubert
>>>>
>>>> On Mon, 15 Jan 2024 at 09:16, Hu Bert <revirii at googlemail.com> wrote:
>>>>>
>>>>> Hi,
>>>>> just upgraded some gluster servers from version 10.4 to version 11.1.
>>>>> Debian bullseye & bookworm. When only installing the packages: good,
>>>>> servers, volumes etc. work as expected.
>>>>>
>>>>> But one needs to test if the systems work after a daemon and/or server
>>>>> restart. Well, did a reboot, and after that the rebooted/restarted
>>>>> system is "out". Log message from a working node:
>>>>>
>>>>> [2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163]
>>>>> [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
>>>>> 0-management: using the op-version 100000
>>>>> [2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490]
>>>>> [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
>>>>> 0-glusterd: Received probe from uuid:
>>>>> b71401c3-512a-47cb-ac18-473c4ba7776e
>>>>> [2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010]
>>>>> [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
>>>>> Version of Cksums sourceimages differ. local cksum = 2204642525,
>>>>> remote cksum = 1931483801 on peer gluster190
>>>>> [2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493]
>>>>> [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
>>>>> Responded to gluster190 (0), ret: 0, op_ret: -1
>>>>> [2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493]
>>>>> [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:
>>>>> Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:
>>>>> gluster190, port: 0
>>>>>
>>>>> peer status from the rebooted node:
>>>>>
>>>>> root at gluster190 ~ # gluster peer status
>>>>> Number of Peers: 2
>>>>>
>>>>> Hostname: gluster189
>>>>> Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
>>>>> State: Peer Rejected (Connected)
>>>>>
>>>>> Hostname: gluster188
>>>>> Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
>>>>> State: Peer Rejected (Connected)
>>>>>
>>>>> So the rebooted gluster190 is not accepted anymore, and thus does not
>>>>> appear in "gluster volume status". I then followed this guide:
>>>>>
>>>>> https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
>>>>>
>>>>> Remove everything under /var/lib/glusterd/ (except glusterd.info) and
>>>>> restart the glusterd service etc. Data gets copied from the other nodes,
>>>>> 'gluster peer status' is ok again - but the volume info is missing,
>>>>> /var/lib/glusterd/vols is empty. When syncing this dir from another
>>>>> node, the volume is available again, heals start etc.
>>>>>
>>>>> Well, and just to be sure that everything's working as it should,
>>>>> rebooted that node again - the rebooted node is kicked out again, and
>>>>> you have to go through bringing it back again.
>>>>>
>>>>> Sorry, but did I miss anything? Has someone experienced similar
>>>>> problems? I'll probably downgrade to 10.4 again, that version was
>>>>> working...
>>>>>
>>>>>
>>>>> Thx,
>>>>> Hubert

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
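For reference, when "heal ... full" stays silent like this, the usual first checks are whether the self-heal daemon is enabled and actually running for the volume. A minimal sketch using the volume name from this thread; these are standard gluster CLI commands, nothing specific to 11.1:

# is the self-heal daemon enabled for the volume?
gluster volume get workdata cluster.self-heal-daemon

# compact per-brick view of pending and split-brain entries
gluster volume heal workdata info summary

# "start ... force" respawns brick and self-heal daemon processes that are not running
gluster volume start workdata force
gluster volume heal workdata full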
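The "Update the cluster.op-version" step referenced above is normally a short affair once every server and client runs 11.1. A sketch; the target value should be read from cluster.max-op-version rather than guessed:

# what the cluster currently runs at, and the maximum the installed binaries support
gluster volume get all cluster.op-version
gluster volume get all cluster.max-op-version

# only after ALL servers and clients have been upgraded:
gluster volume set all cluster.op-version <value reported by cluster.max-op-version>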
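And the "Resolving Peer Rejected" procedure Hubert describes boils down to roughly the following. A sketch assuming systemd and gluster190 as the rejected node; the final rsync reflects the manual vols sync he mentions and is not part of the documented procedure:

# on the rejected node
systemctl stop glusterd
cd /var/lib/glusterd
# keep glusterd.info, wipe the rest of the local glusterd state
find . -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
systemctl start glusterd

# re-probe a healthy peer so the configuration is fetched again, then restart once more
gluster peer probe gluster189
systemctl restart glusterd

# if /var/lib/glusterd/vols stays empty, sync it from a good node and restart again
rsync -a gluster189:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
systemctl restart glusterd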
I don't want to hijack the thread. And in my case setting logs to debug
would fill my /var partitions in no time. Maybe the OP can.

Diego

On 18/01/2024 22:58, Strahil Nikolov wrote:
> Are you able to set the logs to debug level?
> It might provide a clue what is going on.
>
> Best Regards,
> Strahil Nikolov
>
> On Thu, Jan 18, 2024 at 13:08, Diego Zuccato
> <diego.zuccato at unibo.it> wrote:
> That's the same kind of errors I keep seeing on my 2 clusters,
> regenerated some months ago. Seems a pseudo-split-brain that should be
> impossible on a replica 3 cluster but keeps happening.
> Sadly going to ditch Gluster ASAP.
>
> Diego
> [...]

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
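If the concern is debug logs filling /var, the growth can at least be capped with an ordinary logrotate policy on top of what Gluster already ships (a glusterfs-logrotate file is visible in Hubert's /etc/glusterfs listing below). A sketch; the drop-in file name and the size/rotation values are illustrative, not a shipped config:

# /etc/logrotate.d/glusterfs-debug (hypothetical drop-in)
/var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log {
    # rotate as soon as a log exceeds 100 MB, keep four old copies
    size 100M
    rotate 4
    compress
    missingok
    notifempty
    # truncate in place so the daemons keep writing to the same file
    copytruncate
    sharedscripts
}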
Hi Strahil,
hm, don't get me wrong, it may sound a bit stupid, but... where do I set
the log level? Using Debian...

https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level

ls /etc/glusterfs/
eventsconfig.json         glusterfs-georep-logrotate  gluster-rsyslog-5.8.conf  group-db-workload
group-gluster-block       group-nl-cache              group-virt.example        logger.conf.example
glusterd.vol              glusterfs-logrotate         gluster-rsyslog-7.2.conf  group-distributed-virt
group-metadata-cache      group-samba                 gsyncd.conf               thin-arbiter.vol

checked /etc/glusterfs/logger.conf.example:

# To enable enhanced logging capabilities,
#
# 1. rename this file to /etc/glusterfs/logger.conf
#
# 2. rename /etc/rsyslog.d/gluster.conf.example to
#    /etc/rsyslog.d/gluster.conf
#
# This change requires restart of all gluster services/volumes and
# rsyslog.

Tried (to test) /etc/glusterfs/logger.conf with "LOG_LEVEL='WARNING'" and
restarted glusterd on that node, but this doesn't work, the log level
stays on INFO.

/etc/rsyslog.d/gluster.conf.example does not exist. Probably
/etc/rsyslog.conf on Debian. But first it would be better to know where
to set the log level for glusterd.

Depending on how much the DEBUG log level talks ;-) I could assign up to
100G to /var.

Thx & best regards,
Hubert

On Thu, 18 Jan 2024 at 22:58, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
>
> Are you able to set the logs to debug level?
> It might provide a clue what is going on.
>
> Best Regards,
> Strahil Nikolov
> [...]
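To Hubert's actual question: the per-volume log levels are plain volume options, and glusterd itself takes a log level on its command line. A sketch; the systemd drop-in path and ExecStart line are illustrative for Debian and should be checked against the shipped unit, and the client option is usually the one that matters for glustershd.log since the self-heal daemon is a client-side process:

# brick-side and client-side (incl. self-heal daemon) log level for the volume,
# set back to INFO once enough has been captured
gluster volume set workdata diagnostics.brick-log-level DEBUG
gluster volume set workdata diagnostics.client-log-level DEBUG

# glusterd itself accepts -L/--log-level; on a systemd system a drop-in such as
# /etc/systemd/system/glusterd.service.d/loglevel.conf could pass it:
#   [Service]
#   ExecStart=
#   ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level DEBUG
systemctl daemon-reload
systemctl restart glusterd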