thr3ads.net - Gluster users - [Gluster-users] Upgrade 10.4 -> 11.1 making problems [Jan 2024]

Gilberto Ferreira

2024-Jan-19 15:23 UTC

[Gluster-users] Upgrade 10.4 -> 11.1 making problems

gluster volume set testvol diagnostics.brick-log-level WARNING
gluster volume set testvol diagnostics.brick-sys-log-level WARNING
gluster volume set testvol diagnostics.client-log-level ERROR
gluster --log-level=ERROR volume status
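
To confirm that the three volume options actually took effect, they can be read back with "gluster volume get". A minimal sketch, assuming the volume is called testvol as in the commands above; the helper name show_log_levels is made up for illustration and is not part of Gluster:

```shell
# Hypothetical helper (not part of Gluster): print the current value of the
# three diagnostics options set above for one volume.
show_log_levels() {
    vol="$1"
    for opt in diagnostics.brick-log-level \
               diagnostics.brick-sys-log-level \
               diagnostics.client-log-level; do
        gluster volume get "$vol" "$opt"
    done
}

# Usage: show_log_levels testvol
```

Once debugging is done, "gluster volume reset testvol <option>" should put a single option back to its default.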

---
Gilberto Nunes Ferreira






On Fri, 19 Jan 2024 at 05:49, Hu Bert <revirii at googlemail.com> wrote:
> Hi Strahil,
> hm, don't get me wrong, it may sound a bit stupid, but... where do I
> set the log level? Using Debian...
>
> https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level
>
> ls /etc/glusterfs/
> eventsconfig.json  glusterfs-georep-logrotate
> gluster-rsyslog-5.8.conf  group-db-workload       group-gluster-block
>  group-nl-cache  group-virt.example  logger.conf.example
> glusterd.vol       glusterfs-logrotate
> gluster-rsyslog-7.2.conf  group-distributed-virt  group-metadata-cache
>  group-samba     gsyncd.conf         thin-arbiter.vol
>
> checked: /etc/glusterfs/logger.conf.example
>
> # To enable enhanced logging capabilities,
> #
> # 1. rename this file to /etc/glusterfs/logger.conf
> #
> # 2. rename /etc/rsyslog.d/gluster.conf.example to
> #    /etc/rsyslog.d/gluster.conf
> #
> # This change requires restart of all gluster services/volumes and
> # rsyslog.
>
> tried (to test): /etc/glusterfs/logger.conf with "LOG_LEVEL='WARNING'"
>
> restarted glusterd on that node, but this doesn't work, the log level stays
> on INFO. /etc/rsyslog.d/gluster.conf.example does not exist. Probably
> /etc/rsyslog.conf on Debian. But first it would be better to know
> where to set the log level for glusterd.
>
> Depending on how much the DEBUG log-level talks ;-) i could assign up
> to 100G to /var
>
>
> Thx & best regards,
> Hubert
>
>
> On Thu, 18 Jan 2024 at 22:58, Strahil Nikolov
> <hunter86_bg at yahoo.com> wrote:
> >
> > Are you able to set the logs to debug level?
> > It might provide a clue about what is going on.
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Thu, Jan 18, 2024 at 13:08, Diego Zuccato
> > <diego.zuccato at unibo.it> wrote:
> > That's the same kind of error I keep seeing on my 2 clusters,
> > regenerated some months ago. It seems to be a pseudo-split-brain that
> > should be impossible on a replica 3 cluster but keeps happening.
> > Sadly going to ditch Gluster ASAP.
> >
> > Diego
> >
> > On 18/01/2024 07:11, Hu Bert wrote:
> > > Good morning,
> > > heal still not running. Pending heals now sum up to 60K per brick.
> > > Heal was starting instantly e.g. after server reboot with version
> > > 10.4, but doesn't with version 11. What could be wrong?
> > >
> > > I only see these errors on one of the "good" servers in glustershd.log:
> > >
> > > [2024-01-18 06:08:57.328480 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0: remote operation failed. [{path=<gfid:cb39a1e4-2a4c-4727-861d-3ed9ef00681b>}, {gfid=cb39a1e4-2a4c-4727-861d-3ed9ef00681b}, {errno=2}, {error=No such file or directory}]
> > > [2024-01-18 06:08:57.594051 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1: remote operation failed. [{path=<gfid:3e9b178c-ae1f-4d85-ae47-fc539d94dd11>}, {gfid=3e9b178c-ae1f-4d85-ae47-fc539d94dd11}, {errno=2}, {error=No such file or directory}]
> > >
> > > About 7K today. Any ideas? Someone?
> > >
> > >
> > > Best regards,
> > > Hubert
> > >
> > > On Wed, 17 Jan 2024 at 11:24, Hu Bert <revirii at googlemail.com> wrote:
> > >>
> > >> ok, finally managed to get all servers, volumes etc. running, but it took
> > >> a couple of restarts, cksum checks etc.
> > >>
> > >> One problem: a volume doesn't heal automatically or doesn't heal at all.
> > >>
> > >> gluster volume status
> > >> Status of volume: workdata
> > >> Gluster process                            TCP Port  RDMA Port  Online  Pid
> > >> ------------------------------------------------------------------------------
> > >> Brick glusterpub1:/gluster/md3/workdata    58832     0          Y       3436
> > >> Brick glusterpub2:/gluster/md3/workdata    59315     0          Y       1526
> > >> Brick glusterpub3:/gluster/md3/workdata    56917     0          Y       1952
> > >> Brick glusterpub1:/gluster/md4/workdata    59688     0          Y       3755
> > >> Brick glusterpub2:/gluster/md4/workdata    60271     0          Y       2271
> > >> Brick glusterpub3:/gluster/md4/workdata    49461     0          Y       2399
> > >> Brick glusterpub1:/gluster/md5/workdata    54651     0          Y       4208
> > >> Brick glusterpub2:/gluster/md5/workdata    49685     0          Y       2751
> > >> Brick glusterpub3:/gluster/md5/workdata    59202     0          Y       2803
> > >> Brick glusterpub1:/gluster/md6/workdata    55829     0          Y       4583
> > >> Brick glusterpub2:/gluster/md6/workdata    50455     0          Y       3296
> > >> Brick glusterpub3:/gluster/md6/workdata    50262     0          Y       3237
> > >> Brick glusterpub1:/gluster/md7/workdata    52238     0          Y       5014
> > >> Brick glusterpub2:/gluster/md7/workdata    52474     0          Y       3673
> > >> Brick glusterpub3:/gluster/md7/workdata    57966     0          Y       3653
> > >> Self-heal Daemon on localhost              N/A       N/A        Y       4141
> > >> Self-heal Daemon on glusterpub1            N/A       N/A        Y       5570
> > >> Self-heal Daemon on glusterpub2            N/A       N/A        Y       4139
> > >>
> > >> "gluster volume heal workdata info" lists a lot of files per brick.
> > >> "gluster volume heal workdata statistics heal-count" shows thousands
> > >> of files per brick.
> > >> "gluster volume heal workdata enable" has no effect.
> > >>
> > >> gluster volume heal workdata full
> > >> Launching heal operation to perform full self heal on volume workdata
> > >> has been successful
> > >> Use heal info commands to check status.
> > >>
> > >> -> not doing anything at all. And nothing happening on the 2 "good"
> > >> servers in e.g. glustershd.log. Heal was working as expected on
> > >> version 10.4, but here... silence. Someone has an idea?
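
One way to tell whether the self-heal daemon is making any progress is to total the pending entries reported by heal-count a few minutes apart and compare the two numbers. A rough sketch, assuming the command prints one "Number of entries: N" line per brick; the function name pending_heals is hypothetical:

```shell
# Hypothetical helper: sum the per-brick "Number of entries: N" lines printed
# by "gluster volume heal <vol> statistics heal-count".
# Assumption: the output contains one such line per brick.
pending_heals() {
    gluster volume heal "$1" statistics heal-count \
        | awk -F': ' '/^Number of entries:/ { sum += $2 } END { print sum + 0 }'
}

# Usage: take two samples with a pause in between.
#   pending_heals workdata; sleep 300; pending_heals workdata
```

If the second number is not lower than the first, the heal is probably not progressing.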
> > >>
> > >>
> > >> Best regards,
> > >> Hubert
> > >>
> > >> On Tue, 16 Jan 2024 at 13:44, Gilberto Ferreira
> > >> <gilberto.nunes32 at gmail.com> wrote:
> > >>>
> > >>> Ah! Indeed! You need to upgrade the clients as well.
> > >>>
> > >>> On Tue, 16 Jan 2024 at 03:12, Hu Bert <revirii at googlemail.com> wrote:
> > >>>>
> > >>>> morning to those still reading :-)
> > >>>>
> > >>>> i found this:
> > >>>> https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
> > >>>>
> > >>>> there's a paragraph about "peer rejected" with the same error message,
> > >>>> telling me: "Update the cluster.op-version" - i had only updated the
> > >>>> server nodes, but not the clients. So upgrading the cluster.op-version
> > >>>> wasn't possible at this time. So... upgrading the clients to version
> > >>>> 11.1 and then the op-version should solve the problem?
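
Once every server and client runs 11.1, the current op-version can be compared with the highest one the cluster supports before raising it. A sketch, assuming "gluster volume get all <key>" prints the option name and its value in two columns; check_op_version is a made-up name, and it only prints the command it would suggest rather than running it:

```shell
# Hypothetical helper: compare cluster.op-version with cluster.max-op-version
# and print (not run) the command that would raise it.
# Assumption: "gluster volume get all <key>" prints "<key> <value>" rows.
check_op_version() {
    current="$(gluster volume get all cluster.op-version \
        | awk '$1 == "cluster.op-version" { print $2 }')"
    max="$(gluster volume get all cluster.max-op-version \
        | awk '$1 == "cluster.max-op-version" { print $2 }')"
    echo "current op-version: $current, max supported: $max"
    # Only suggest a bump when both values were found and differ.
    if [ -n "$current" ] && [ -n "$max" ] && [ "$current" -lt "$max" ]; then
        echo "suggested: gluster volume set all cluster.op-version $max"
    fi
}
```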
> > >>>>
> > >>>>
> > >>>> Thx,
> > >>>> Hubert
> > >>>>
> > >>>> On Mon, 15 Jan 2024 at 09:16, Hu Bert <revirii at googlemail.com> wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>> just upgraded some gluster servers from version 10.4 to version 11.1.
> > >>>>> Debian bullseye & bookworm. When only installing the packages: good,
> > >>>>> servers, volumes etc. work as expected.
> > >>>>>
> > >>>>> But one needs to test if the systems work after a daemon and/or server
> > >>>>> restart. Well, did a reboot, and after that the rebooted/restarted
> > >>>>> system is "out". Log message from working node:
> > >>>>>
> > >>>>> [2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163] [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 100000
> > >>>>> [2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490] [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e
> > >>>>> [2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010] [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management: Version of Cksums sourceimages differ. local cksum = 2204642525, remote cksum = 1931483801 on peer gluster190
> > >>>>> [2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493] [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to gluster190 (0), ret: 0, op_ret: -1
> > >>>>> [2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host: gluster190, port: 0
> > >>>>>
> > >>>>> peer status from rebooted node:
> > >>>>>
> > >>>>> root@gluster190 ~ # gluster peer status
> > >>>>> Number of Peers: 2
> > >>>>>
> > >>>>> Hostname: gluster189
> > >>>>> Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
> > >>>>> State: Peer Rejected (Connected)
> > >>>>>
> > >>>>> Hostname: gluster188
> > >>>>> Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
> > >>>>> State: Peer Rejected (Connected)
> > >>>>>
> > >>>>> So the rebooted gluster190 is not accepted anymore. And thus does not
> > >>>>> appear in "gluster volume status". I then followed this guide:
> > >>>>>
> > >>>>> https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
> > >>>>>
> > >>>>> Remove everything under /var/lib/glusterd/ (except glusterd.info) and
> > >>>>> restart glusterd service etc. Data get copied from other nodes,
> > >>>>> 'gluster peer status' is ok again - but the volume info is missing,
> > >>>>> /var/lib/glusterd/vols is empty. When syncing this dir from another
> > >>>>> node, the volume then is available again, heals start etc.
> > >>>>>
> > >>>>> Well, and just to be sure that everything's working as it should,
> > >>>>> rebooted that node again - the rebooted node is kicked out again, and
> > >>>>> you have to start over to bring it back.
> > >>>>>
> > >>>>> Sry, but did i miss anything? Has someone experienced similar
> > >>>>> problems? I'll probably downgrade to 10.4 again, that version was
> > >>>>> working...
> > >>>>>
> > >>>>>
> > >>>>> Thx,
> > >>>>> Hubert
> > >>>> ________
> > >>>>
> > >>>>
> > >>>>
> > >>>> Community Meeting Calendar:
> > >>>>
> > >>>> Schedule -
> > >>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > >>>> Bridge: https://meet.google.com/cpu-eiue-hvk
> > >>>> Gluster-users mailing list
> > >>>> Gluster-users at gluster.org
> > >>>> https://lists.gluster.org/mailman/listinfo/gluster-users
> >
> > --
> > Diego Zuccato
> > DIFA - Dip. di Fisica e Astronomia
> > Servizi Informatici
> > Alma Mater Studiorum - Università di Bologna
> > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> > tel.: +39 051 20 95786
> >

Hu Bert

2024-Jan-20 07:44 UTC

[Gluster-users] Upgrade 10.4 -> 11.1 making problems

Good morning,

thx Gilberto, did the first three (set to WARNING), but the last one
doesn't work. Anyway, after setting these three, some new messages
appear:

[2024-01-20 07:23:58.561106 +0000] W [MSGID: 114061] [client-common.c:796:client_pre_lk_v2] 0-workdata-client-11: remote_fd is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, {errno=77}, {error=File descriptor in bad state}]
[2024-01-20 07:23:58.561177 +0000] E [MSGID: 108028] [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-3: Failed getlk for faf59566-10f5-4ddd-8b0c-a87bc6a334fb [File descriptor in bad state]
[2024-01-20 07:23:58.562151 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-11: remote operation failed. [{path=<gfid:faf59566-10f5-4ddd-8b0c-a87bc6a334fb>}, {gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, {errno=2}, {error=No such file or directory}]
[2024-01-20 07:23:58.562296 +0000] W [MSGID: 114061] [client-common.c:530:client_pre_flush_v2] 0-workdata-client-11: remote_fd is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, {errno=77}, {error=File descriptor in bad state}]
[2024-01-20 07:23:58.860552 +0000] W [MSGID: 114061] [client-common.c:796:client_pre_lk_v2] 0-workdata-client-8: remote_fd is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, {errno=77}, {error=File descriptor in bad state}]
[2024-01-20 07:23:58.860608 +0000] E [MSGID: 108028] [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-2: Failed getlk for 60465723-5dc0-4ebe-aced-9f2c12e52642 [File descriptor in bad state]
[2024-01-20 07:23:58.861520 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-8: remote operation failed. [{path=<gfid:60465723-5dc0-4ebe-aced-9f2c12e52642>}, {gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, {errno=2}, {error=No such file or directory}]
[2024-01-20 07:23:58.861640 +0000] W [MSGID: 114061] [client-common.c:530:client_pre_flush_v2] 0-workdata-client-8: remote_fd is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, {errno=77}, {error=File descriptor in bad state}]

Not many log entries appear, only a few. Has someone seen error
messages like these? Setting diagnostics.brick-sys-log-level to DEBUG
shows way more log entries, uploaded it to:
https://file.io/spLhlcbMCzr8 - not sure if that helps.
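
To see which message IDs dominate a log such as glustershd.log, the entries can be counted per severity and MSGID. A small sketch, assuming each entry sits on a single line in the log file itself and starts with "[timestamp] <severity> [MSGID: <id>]"; the function name summarize_gluster_log is made up:

```shell
# Hypothetical helper: count log entries per severity letter and MSGID.
# Assumption: one entry per line, beginning "[timestamp] X [MSGID: N]".
summarize_gluster_log() {
    # Reduce each matching line to "<severity> <msgid>", then count duplicates
    # and list the most frequent combinations first.
    sed -n -E 's/^\[[^]]*\] ([A-Z]) \[MSGID: ([0-9]+)\].*/\1 \2/p' "$1" \
        | sort | uniq -c | sort -rn
}

# Usage (path is the usual default location, adjust if yours differs):
#   summarize_gluster_log /var/log/glusterfs/glustershd.log
```

Each output line holds a count, the severity letter (W, E, I, ...) and the MSGID, which makes it easier to search for the most frequent message first.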


Thx,
Hubert

