thr3ads.net - Gluster users - [Gluster-users] Upgrade 10.4 -> 11.1 making problems [Jan 2024]

Hu Bert

2024-Jan-25 07:42 UTC

[Gluster-users] Upgrade 10.4 -> 11.1 making problems

Good morning,

I hope I got it right... I used:
https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3.1/html/administration_guide/ch27s02

mount -t glusterfs -o aux-gfid-mount glusterpub1:/workdata /mnt/workdata

gfid 1:
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
getfattr: Removing leading '/' from absolute path names
# file: mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
trusted.glusterfs.pathinfo="(<DISTRIBUTE:workdata-dht>
(<REPLICATE:workdata-replicate-3>
<POSIX(/gluster/md6/workdata):glusterpub1:/gluster/md6/workdata/images/133/283/13328349/128x128s.jpg>
<POSIX(/gluster/md6/workdata):glusterpub2:/gluster/md6/workdata/images/133/283/13328349/128x128s.jpg>))"

gfid 2:
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642
getfattr: Removing leading '/' from absolute path names
# file: mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642
trusted.glusterfs.pathinfo="(<DISTRIBUTE:workdata-dht>
(<REPLICATE:workdata-replicate-2>
<POSIX(/gluster/md5/workdata):glusterpub2:/gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642>
<POSIX(/gluster/md5/workdata):glusterpub1:/gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642>))"

glusterpub1 + glusterpub2 are the good ones, glusterpub3 is the
misbehaving (not healing) one.
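
The pathinfo values above name only the bricks that hold a copy, so the host names can be pulled out of them and compared with the expected replica set. A minimal sketch in POSIX shell; the function name pathinfo_hosts is made up for illustration, and it assumes every entry has the form <POSIX(brick):host:/path> as in the output above.

```shell
# Hypothetical helper: print each host named in a trusted.glusterfs.pathinfo
# value, one per line. Assumes entries look like <POSIX(brick):host:/path>.
pathinfo_hosts() {
    printf '%s\n' "$1" |
        tr '<' '\n' |
        sed -n 's/^POSIX([^)]*):\([^:]*\):.*/\1/p' |
        sort -u
}

# Sample value copied from the gfid 1 output above (joined onto one line).
pathinfo='(<DISTRIBUTE:workdata-dht> (<REPLICATE:workdata-replicate-3> <POSIX(/gluster/md6/workdata):glusterpub1:/gluster/md6/workdata/images/133/283/13328349/128x128s.jpg> <POSIX(/gluster/md6/workdata):glusterpub2:/gluster/md6/workdata/images/133/283/13328349/128x128s.jpg>))'
pathinfo_hosts "$pathinfo"
```

For this sample the function prints glusterpub1 and glusterpub2; glusterpub3 is absent, which matches the missing copy on that server.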

The file with gfid 1 is available under
/gluster/md6/workdata/images/133/283/13328349/ on glusterpub1+2
bricks, but missing on glusterpub3 brick.

gfid 2:
/gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
is present on glusterpub1+2, but not on glusterpub3.
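
The .glusterfs path for gfid 2 follows a fixed pattern: the first two characters of the GFID, then the next two, then the full GFID. A small sketch that builds this path for any brick, so the same check can be repeated on each server; gfid_backend_path is a hypothetical helper name, and the layout is inferred from the paths shown above.

```shell
# Hypothetical helper: build the .glusterfs path of a GFID on a brick.
# Layout assumed from the paths above:
#   <brick>/.glusterfs/<chars 1-2>/<chars 3-4>/<gfid>
gfid_backend_path() {
    brick=$1
    gfid=$2
    first=$(printf '%s' "$gfid" | cut -c1-2)
    second=$(printf '%s' "$gfid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$first" "$second" "$gfid"
}

gfid_backend_path /gluster/md5/workdata 60465723-5dc0-4ebe-aced-9f2c12e52642
```

Running ls -l on the printed path on glusterpub1, glusterpub2 and glusterpub3 shows on which bricks the entry exists.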


Thx,
Hubert

On Wed, 24 Jan 2024 at 17:36, Strahil Nikolov
<hunter86_bg at yahoo.com> wrote:
>
> Hi,
>
> Can you find and check the files with gfids:
> 60465723-5dc0-4ebe-aced-9f2c12e52642
> faf59566-10f5-4ddd-8b0c-a87bc6a334fb
>
> Use the 'getfattr -d -e hex -m.' command from
> https://docs.gluster.org/en/main/Troubleshooting/resolving-splitbrain/#analysis-of-the-output
>
> Best Regards,
> Strahil Nikolov
>
> On Sat, Jan 20, 2024 at 9:44, Hu Bert
> <revirii at googlemail.com> wrote:
> Good morning,
>
> Thanks Gilberto, I ran the first three commands (set to WARNING), but the
> last one doesn't work. Anyway, after setting these three, some new messages
> appear:
>
> [2024-01-20 07:23:58.561106 +0000] W [MSGID: 114061]
> [client-common.c:796:client_pre_lk_v2] 0-workdata-client-11: remote_fd
> is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb},
> {errno=77}, {error=File descriptor in bad state}]
> [2024-01-20 07:23:58.561177 +0000] E [MSGID: 108028]
> [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-3:
> Failed getlk for faf59566-10f5-4ddd-8b0c-a87bc6a334fb [File descriptor
> in bad state]
> [2024-01-20 07:23:58.562151 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-11:
> remote operation failed.
> [{path=<gfid:faf59566-10f5-4ddd-8b0c-a87bc6a334fb>},
> {gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, {errno=2},
> {error=No such file or directory}]
> [2024-01-20 07:23:58.562296 +0000] W [MSGID: 114061]
> [client-common.c:530:client_pre_flush_v2] 0-workdata-client-11:
> remote_fd is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb},
> {errno=77}, {error=File descriptor in bad state}]
> [2024-01-20 07:23:58.860552 +0000] W [MSGID: 114061]
> [client-common.c:796:client_pre_lk_v2] 0-workdata-client-8: remote_fd
> is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642},
> {errno=77}, {error=File descriptor in bad state}]
> [2024-01-20 07:23:58.860608 +0000] E [MSGID: 108028]
> [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-2:
> Failed getlk for 60465723-5dc0-4ebe-aced-9f2c12e52642 [File descriptor
> in bad state]
> [2024-01-20 07:23:58.861520 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-8:
> remote operation failed.
> [{path=<gfid:60465723-5dc0-4ebe-aced-9f2c12e52642>},
> {gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, {errno=2},
> {error=No such file or directory}]
> [2024-01-20 07:23:58.861640 +0000] W [MSGID: 114061]
> [client-common.c:530:client_pre_flush_v2] 0-workdata-client-8:
> remote_fd is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642},
> {errno=77}, {error=File descriptor in bad state}]
>
> Only a few log entries appear. Has anyone seen error
> messages like these? Setting diagnostics.brick-sys-log-level to DEBUG
> produces far more log entries; I uploaded them to:
> https://file.io/spLhlcbMCzr8 - not sure if that helps.
>
>
> Thx,
> Hubert
>
> On Fri, 19 Jan 2024 at 16:24, Gilberto Ferreira
> <gilberto.nunes32 at gmail.com> wrote:
>
> >
> > gluster volume set testvol diagnostics.brick-log-level WARNING
> > gluster volume set testvol diagnostics.brick-sys-log-level WARNING
> > gluster volume set testvol diagnostics.client-log-level ERROR
> > gluster --log-level=ERROR volume status
> >
> > ---
> > Gilberto Nunes Ferreira
> >
> > On Fri, 19 Jan 2024 at 05:49, Hu Bert <revirii at googlemail.com> wrote:
> >>
> >> Hi Strahil,
> >> hm, don't get me wrong, it may sound a bit stupid, but... where do I
> >> set the log level? I'm using Debian...
> >>
> >> https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level
> >>
> >> ls /etc/glusterfs/
> >> eventsconfig.json  glusterfs-georep-logrotate
> >> gluster-rsyslog-5.8.conf  group-db-workload  group-gluster-block
> >> group-nl-cache  group-virt.example  logger.conf.example
> >> glusterd.vol  glusterfs-logrotate
> >> gluster-rsyslog-7.2.conf  group-distributed-virt  group-metadata-cache
> >> group-samba  gsyncd.conf  thin-arbiter.vol
> >>
> >> checked: /etc/glusterfs/logger.conf.example
> >>
> >> # To enable enhanced logging capabilities,
> >> #
> >> # 1. rename this file to /etc/glusterfs/logger.conf
> >> #
> >> # 2. rename /etc/rsyslog.d/gluster.conf.example to
> >> #    /etc/rsyslog.d/gluster.conf
> >> #
> >> # This change requires restart of all gluster services/volumes and
> >> # rsyslog.
> >>
> >> As a test, I created /etc/glusterfs/logger.conf with "LOG_LEVEL='WARNING'"
> >> and restarted glusterd on that node, but this doesn't work: the log level
> >> stays at INFO. /etc/rsyslog.d/gluster.conf.example does not exist; on
> >> Debian it is probably /etc/rsyslog.conf. But first it would be better to
> >> know where to set the log level for glusterd.
> >>
> >> Depending on how much the DEBUG log level talks ;-) I could assign up
> >> to 100G to /var
> >>
> >>
> >> Thx & best regards,
> >> Hubert
> >>
> >>
> >> On Thu, 18 Jan 2024 at 22:58, Strahil Nikolov
> >> <hunter86_bg at yahoo.com> wrote:
> >> >
> >> > Are you able to set the logs to debug level?
> >> > It might provide a clue as to what is going on.
> >> >
> >> > Best Regards,
> >> > Strahil Nikolov
> >> >
> >> > On Thu, Jan 18, 2024 at 13:08, Diego Zuccato
> >> > <diego.zuccato at unibo.it> wrote:
> >> > That's the same kind of error I keep seeing on my 2 clusters,
> >> > regenerated some months ago. It seems to be a pseudo-split-brain that
> >> > should be impossible on a replica 3 cluster but keeps happening.
> >> > Sadly, I'm going to ditch Gluster ASAP.
> >> >
> >> > Diego
> >> >
> >> > On 18/01/2024 07:11, Hu Bert wrote:
> >> > > Good morning,
> >> > > The heal is still not running. Pending heals now add up to 60K per
> >> > > brick. With version 10.4, healing started instantly, e.g. after a
> >> > > server reboot, but it doesn't with version 11. What could be wrong?
> >> > >
> >> > > I only see these errors on one of the "good" servers in glustershd.log:
> >> > >
> >> > > [2024-01-18 06:08:57.328480 +0000] W [MSGID: 114031]
> >> > > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0:
> >> > > remote operation failed.
> >> > > [{path=<gfid:cb39a1e4-2a4c-4727-861d-3ed9ef00681b>},
> >> > > {gfid=cb39a1e4-2a4c-4727-861d-3ed9ef00681b}, {errno=2},
> >> > > {error=No such file or directory}]
> >> > > [2024-01-18 06:08:57.594051 +0000] W [MSGID: 114031]
> >> > > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1:
> >> > > remote operation failed.
> >> > > [{path=<gfid:3e9b178c-ae1f-4d85-ae47-fc539d94dd11>},
> >> > > {gfid=3e9b178c-ae1f-4d85-ae47-fc539d94dd11}, {errno=2},
> >> > > {error=No such file or directory}]
> >> > >
> >> > > About 7K today. Any ideas? Someone?
> >> > >
> >> > >
> >> > > Best regards,
> >> > > Hubert
> >> > >
> >> > > On Wed, 17 Jan 2024 at 11:24, Hu Bert <revirii at googlemail.com> wrote:
> >> > >>
> >> > >> OK, I finally managed to get all servers, volumes etc. running, but
> >> > >> it took a couple of restarts, cksum checks etc.
> >> > >>
> >> > >> One problem: a volume doesn't heal automatically or doesn't heal at all.
> >> > >>
> >> > >> gluster volume status
> >> > >> Status of volume: workdata
> >> > >> Gluster process                            TCP Port  RDMA Port  Online  Pid
> >> > >> ------------------------------------------------------------------------------
> >> > >> Brick glusterpub1:/gluster/md3/workdata    58832     0          Y       3436
> >> > >> Brick glusterpub2:/gluster/md3/workdata    59315     0          Y       1526
> >> > >> Brick glusterpub3:/gluster/md3/workdata    56917     0          Y       1952
> >> > >> Brick glusterpub1:/gluster/md4/workdata    59688     0          Y       3755
> >> > >> Brick glusterpub2:/gluster/md4/workdata    60271     0          Y       2271
> >> > >> Brick glusterpub3:/gluster/md4/workdata    49461     0          Y       2399
> >> > >> Brick glusterpub1:/gluster/md5/workdata    54651     0          Y       4208
> >> > >> Brick glusterpub2:/gluster/md5/workdata    49685     0          Y       2751
> >> > >> Brick glusterpub3:/gluster/md5/workdata    59202     0          Y       2803
> >> > >> Brick glusterpub1:/gluster/md6/workdata    55829     0          Y       4583
> >> > >> Brick glusterpub2:/gluster/md6/workdata    50455     0          Y       3296
> >> > >> Brick glusterpub3:/gluster/md6/workdata    50262     0          Y       3237
> >> > >> Brick glusterpub1:/gluster/md7/workdata    52238     0          Y       5014
> >> > >> Brick glusterpub2:/gluster/md7/workdata    52474     0          Y       3673
> >> > >> Brick glusterpub3:/gluster/md7/workdata    57966     0          Y       3653
> >> > >> Self-heal Daemon on localhost              N/A       N/A        Y       4141
> >> > >> Self-heal Daemon on glusterpub1            N/A       N/A        Y       5570
> >> > >> Self-heal Daemon on glusterpub2            N/A       N/A        Y       4139
> >> > >>
> >> > >> "gluster volume heal workdata info" lists a lot of files per brick.
> >> > >> "gluster volume heal workdata statistics heal-count" shows thousands
> >> > >> of files per brick.
> >> > >> "gluster volume heal workdata enable" has no effect.
> >> > >>
> >> > >> gluster volume heal workdata full
> >> > >> Launching heal operation to perform full self heal on volume workdata
> >> > >> has been successful
> >> > >> Use heal info commands to check status.
> >> > >>
> >> > >> -> This does nothing at all, and nothing happens on the 2 "good"
> >> > >> servers in e.g. glustershd.log. Heal was working as expected on
> >> > >> version 10.4, but here... silence. Does anyone have an idea?
> >> > >>
> >> > >>
> >> > >> Best regards,
> >> > >> Hubert
> >> > >>
> >> > >> On Tue, 16 Jan 2024 at 13:44, Gilberto Ferreira
> >> > >> <gilberto.nunes32 at gmail.com> wrote:
> >> > >>>
> >> > >>> Ah! Indeed! You need to upgrade the clients as well.
> >> > >>>
> >> > >>> On Tue, 16 Jan 2024 at 03:12, Hu Bert <revirii at googlemail.com> wrote:
> >> > >>>>
> >> > >>>> morning to those still reading :-)
> >> > >>>>
> >> > >>>> I found this:
> >> > >>>> https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
> >> > >>>>
> >> > >>>> There's a paragraph about "peer rejected" with the same error message,
> >> > >>>> telling me: "Update the cluster.op-version". I had only updated the
> >> > >>>> server nodes, but not the clients, so raising the cluster.op-version
> >> > >>>> wasn't possible at that time. So... upgrading the clients to version
> >> > >>>> 11.1 and then raising the op-version should solve the problem?
> >> > >>>>
> >> > >>>>
> >> > >>>> Thx,
> >> > >>>> Hubert
> >> > >>>>
> >> > >>>> On Mon, 15 Jan 2024 at 09:16, Hu Bert <revirii at googlemail.com> wrote:
> >> > >>>>>
> >> > >>>>> Hi,
> >> > >>>>> I just upgraded some gluster servers from version 10.4 to version
> >> > >>>>> 11.1 (Debian bullseye & bookworm). With only the packages
> >> > >>>>> installed, all is good: servers, volumes etc. work as expected.
> >> > >>>>>
> >> > >>>>> But one needs to test whether the systems work after a daemon
> >> > >>>>> and/or server restart. Well, I did a reboot, and after that the
> >> > >>>>> rebooted/restarted system is "out". Log messages from a working node:
> >> > >>>>>
> >> > >>>>> [2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163]
> >> > >>>>> [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
> >> > >>>>> 0-management: using the op-version 100000
> >> > >>>>> [2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490]
> >> > >>>>> [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
> >> > >>>>> 0-glusterd: Received probe from uuid:
> >> > >>>>> b71401c3-512a-47cb-ac18-473c4ba7776e
> >> > >>>>> [2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010]
> >> > >>>>> [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
> >> > >>>>> Version of Cksums sourceimages differ. local cksum = 2204642525,
> >> > >>>>> remote cksum = 1931483801 on peer gluster190
> >> > >>>>> [2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493]
> >> > >>>>> [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
> >> > >>>>> Responded to gluster190 (0), ret: 0, op_ret: -1
> >> > >>>>> [2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493]
> >> > >>>>> [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:
> >> > >>>>> Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:
> >> > >>>>> gluster190, port: 0
> >> > >>>>>
> >> > >>>>> peer status from rebooted node:
> >> > >>>>>
> >> > >>>>> root at gluster190 ~ # gluster peer status
> >> > >>>>> Number of Peers: 2
> >> > >>>>>
> >> > >>>>> Hostname: gluster189
> >> > >>>>> Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
> >> > >>>>> State: Peer Rejected (Connected)
> >> > >>>>>
> >> > >>>>> Hostname: gluster188
> >> > >>>>> Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
> >> > >>>>> State: Peer Rejected (Connected)
> >> > >>>>>
> >> > >>>>> So the rebooted gluster190 is no longer accepted, and thus does
> >> > >>>>> not appear in "gluster volume status". I then followed this guide:
> >> > >>>>>
> >> > >>>>> https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
> >> > >>>>>
> >> > >>>>> Remove everything under /var/lib/glusterd/ (except glusterd.info)
> >> > >>>>> and restart the glusterd service etc. Data gets copied from the
> >> > >>>>> other nodes, 'gluster peer status' is OK again - but the volume
> >> > >>>>> info is missing, /var/lib/glusterd/vols is empty. After syncing
> >> > >>>>> this dir from another node, the volume is available again, heals
> >> > >>>>> start etc.
> >> > >>>>>
> >> > >>>>> Well, and just to be sure that everything's working as it should,
> >> > >>>>> I rebooted that node again - the rebooted node is kicked out
> >> > >>>>> again, and you have to start over bringing it back.
> >> > >>>>>
> >> > >>>>> Sorry, but did I miss anything? Has anyone experienced similar
> >> > >>>>> problems? I'll probably downgrade to 10.4 again, that version was
> >> > >>>>> working...
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> Thx,
> >> > >>>>> Hubert
> >> > >>>> ________
> >> > >>>>
> >> > >>>> Community Meeting Calendar:
> >> > >>>>
> >> > >>>> Schedule -
> >> > >>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> >> > >>>> Bridge: https://meet.google.com/cpu-eiue-hvk
> >> > >>>> Gluster-users mailing list
> >> > >>>> Gluster-users at gluster.org
> >> > >>>> https://lists.gluster.org/mailman/listinfo/gluster-users
> >> >
> >> > --
> >> > Diego Zuccato
> >> > DIFA - Dip. di Fisica e Astronomia
> >> > Servizi Informatici
> >> > Alma Mater Studiorum - Università di Bologna
> >> > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> >> > tel.: +39 051 20 95786
> >> >

Strahil Nikolov

2024-Jan-27 05:13 UTC

[Gluster-users] Upgrade 10.4 -> 11.1 making problems

You don't need to mount it. You can run getfattr directly on the brick,
like this:
# getfattr -d -e hex -m. /path/to/brick/.glusterfs/00/46/00462be8-3e61-4931-8bda-dae1645c639e
# file: 00/46/00462be8-3e61-4931-8bda-dae1645c639e
trusted.gfid=0x00462be83e6149318bdadae1645c639e
trusted.gfid2path.05fcbdafdeea18ab=0x30326333373930632d386637622d346436652d393464362d3936393132313930643131312f66696c656c6f636b696e672e7079
trusted.glusterfs.mdata=0x010000000000000000000000006170340c0000000025b6a745000000006170340c0000000020efb577000000006170340c0000000020d42b07
trusted.glusterfs.shard.block-size=0x0000000004000000
trusted.glusterfs.shard.file-size=0x00000000000000cd000000000000000000000000000000010000000000000000
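
With -e hex, string-valued attributes such as trusted.gfid2path.* are hard to read. Below is a small sketch in POSIX shell that turns such a value back into text; hex_to_text is a hypothetical helper name, and reading the result as "parent directory GFID/file name" is my interpretation of the decoded value, not something stated in this thread.

```shell
# Hypothetical helper: decode a hex string (optionally prefixed with 0x)
# into the text it encodes, one byte per pair of hex digits.
hex_to_text() {
    hex=${1#0x}
    # Refuse odd-length input; it cannot be split into whole bytes.
    [ $(( ${#hex} % 2 )) -eq 0 ] || return 1
    while [ -n "$hex" ]; do
        rest=${hex#??}           # everything after the first two digits
        pair=${hex%"$rest"}      # the first two digits
        # Convert the pair to octal and let printf %b emit that byte.
        printf '%b' "\\0$(printf '%03o' "0x$pair")"
        hex=$rest
    done
}

# Value of trusted.gfid2path.05fcbdafdeea18ab from the output above.
hex_to_text 0x30326333373930632d386637622d346436652d393464362d3936393132313930643131312f66696c656c6f636b696e672e7079
echo
```

For the value above this prints 02c3790c-8f7b-4d6e-94d6-96912190d111/filelocking.py.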


Best Regards,
Strahil Nikolov



? ?????????, 25 ?????? 2024 ?. ? 09:42:46 ?. ???????+2, Hu Bert <revirii at
googlemail.com> ??????:





Good morning,

hope i got it right... using:
https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3.1/html/administration_guide/ch27s02

mount -t glusterfs -o aux-gfid-mount glusterpub1:/workdata /mnt/workdata

gfid 1:
getfattr -n trusted.glusterfs.pathinfo -e text
/mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
getfattr: Removing leading '/' from absolute path names
# file: mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
trusted.glusterfs.pathinfo="(<DISTRIBUTE:workdata-dht>
(<REPLICATE:workdata-replicate-3>
<POSIX(/gluster/md6/workdata):glusterpub1:/gluster/md6/workdata/images/133/283/13328349/128x128s.jpg>
<POSIX(/gluster/md6/workdata):glusterpub2:/gl
uster/md6/workdata/images/133/283/13328349/128x128s.jpg>))"

gfid 2:
getfattr -n trusted.glusterfs.pathinfo -e text
/mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642
getfattr: Removing leading '/' from absolute path names
# file: mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642
trusted.glusterfs.pathinfo="(<DISTRIBUTE:workdata-dht>
(<REPLICATE:workdata-replicate-2>
<POSIX(/gluster/md5/workdata):glusterpub2:/gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642>
<POSIX(/gluster/md5/workdata
):glusterpub1:/gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642>))"

glusterpub1 + glusterpub2 are the good ones, glusterpub3 is the
misbehaving (not healing) one.

The file with gfid 1 is available under
/gluster/md6/workdata/images/133/283/13328349/ on glusterpub1+2
bricks, but missing on glusterpub3 brick.

gfid 2:
/gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
is present on glusterpub1+2, but not on glusterpub3.


Thx,
Hubert

Am Mi., 24. Jan. 2024 um 17:36 Uhr schrieb Strahil Nikolov
<hunter86_bg at yahoo.com>:
>
> Hi,
>
> Can you find and check the files with gfids:
> 60465723-5dc0-4ebe-aced-9f2c12e52642
> faf59566-10f5-4ddd-8b0c-a87bc6a334fb
>
> Use 'getfattr -d -e hex -m. ' command from
https://docs.gluster.org/en/main/Troubleshooting/resolving-splitbrain/#analysis-of-the-output
.
>
> Best Regards,
> Strahil Nikolov
>
> On Sat, Jan 20, 2024 at 9:44, Hu Bert
> <revirii at googlemail.com> wrote:
> Good morning,
>
> thx Gilberto, did the first three (set to WARNING), but the last one
> doesn't work. Anyway, with setting these three some new messages
> appear:
>
> [2024-01-20 07:23:58.561106 +0000] W [MSGID: 114061]
> [client-common.c:796:client_pre_lk_v2] 0-workdata-client-11: remote_fd
> is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb},
> {errno=77}, {error=File descriptor in bad state}]
> [2024-01-20 07:23:58.561177 +0000] E [MSGID: 108028]
> [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-3:
> Failed getlk for faf59566-10f5-4ddd-8b0c-a87bc6a334fb [File descriptor
> in bad state]
> [2024-01-20 07:23:58.562151 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-11:
> remote operation failed.
> [{path=<gfid:faf59566-10f5-4ddd-8b0c-a87bc6a334fb>},
> {gfid=faf59566-10f5-4ddd-8b0c-a87b
> c6a334fb}, {errno=2}, {error=No such file or directory}]
> [2024-01-20 07:23:58.562296 +0000] W [MSGID: 114061]
> [client-common.c:530:client_pre_flush_v2] 0-workdata-client-11:
> remote_fd is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb},
> {errno=77}, {error=File descriptor in bad state}]
> [2024-01-20 07:23:58.860552 +0000] W [MSGID: 114061]
> [client-common.c:796:client_pre_lk_v2] 0-workdata-client-8: remote_fd
> is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642},
> {errno=77}, {error=File descriptor in bad state}]
> [2024-01-20 07:23:58.860608 +0000] E [MSGID: 108028]
> [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-2:
> Failed getlk for 60465723-5dc0-4ebe-aced-9f2c12e52642 [File descriptor
> in bad state]
> [2024-01-20 07:23:58.861520 +0000] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-8:
> remote operation failed.
> [{path=<gfid:60465723-5dc0-4ebe-aced-9f2c12e52642>},
> {gfid=60465723-5dc0-4ebe-aced-9f2c1
> 2e52642}, {errno=2}, {error=No such file or directory}]
> [2024-01-20 07:23:58.861640 +0000] W [MSGID: 114061]
> [client-common.c:530:client_pre_flush_v2] 0-workdata-client-8:
> remote_fd is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642},
> {errno=77}, {error=File descriptor in bad state}]
>
> Not many log entries appear, only a few. Has someone seen error
> messages like these? Setting diagnostics.brick-sys-log-level to DEBUG
> shows way more log entries, uploaded it to:
> https://file.io/spLhlcbMCzr8 - not sure if that helps.
>
>
> Thx,
> Hubert
>
> Am Fr., 19. Jan. 2024 um 16:24 Uhr schrieb Gilberto Ferreira
> <gilberto.nunes32 at gmail.com>:
>
> >
> > gluster volume set testvol diagnostics.brick-log-level WARNING
> > gluster volume set testvol diagnostics.brick-sys-log-level WARNING
> > gluster volume set testvol diagnostics.client-log-level ERROR
> > gluster --log-level=ERROR volume status
> >
> > ---
> > Gilberto Nunes Ferreira
> >
> >
> >
> >
> >
> >
> > Em sex., 19 de jan. de 2024 ?s 05:49, Hu Bert <revirii at
googlemail.com> escreveu:
> >>
> >> Hi Strahil,
> >> hm, don't get me wrong, it may sound a bit stupid, but...
where do i
> >> set the log level? Using debian...
> >>
> >>
https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level
> >>
> >> ls /etc/glusterfs/
> >> eventsconfig.json? glusterfs-georep-logrotate
> >> gluster-rsyslog-5.8.conf? group-db-workload? ? ?
group-gluster-block
> >>? group-nl-cache? group-virt.example? logger.conf.example
> >> glusterd.vol? ? ? glusterfs-logrotate
> >> gluster-rsyslog-7.2.conf? group-distributed-virt?
group-metadata-cache
> >>? group-samba? ? gsyncd.conf? ? ? ? thin-arbiter.vol
> >>
> >> checked: /etc/glusterfs/logger.conf.example
> >>
> >> # To enable enhanced logging capabilities,
> >> #
> >> # 1. rename this file to /etc/glusterfs/logger.conf
> >> #
> >> # 2. rename /etc/rsyslog.d/gluster.conf.example to
> >> #? ? /etc/rsyslog.d/gluster.conf
> >> #
> >> # This change requires restart of all gluster services/volumes and
> >> # rsyslog.
> >>
> >> tried (to test): /etc/glusterfs/logger.conf with "
LOG_LEVEL='WARNING' "
> >>
> >> restart glusterd on that node, but this doesn't work,
log-level stays
> >> on INFO. /etc/rsyslog.d/gluster.conf.example does not exist.
Probably
> >> /etc/rsyslog.conf on debian. But first it would be better to know
> >> where to set the log-level for glusterd.
> >>
> >> Depending on how much the DEBUG log-level talks ;-) i could assign
up
> >> to 100G to /var
> >>
> >>
> >> Thx & best regards,
> >> Hubert
> >>
> >>
> >> Am Do., 18. Jan. 2024 um 22:58 Uhr schrieb Strahil Nikolov
> >> <hunter86_bg at yahoo.com>:
> >> >
> >> > Are you able to set the logs to debug level ?
> >> > It might provide a clue what it is going on.
> >> >
> >> > Best Regards,
> >> > Strahil Nikolov
> >> >
> >> > On Thu, Jan 18, 2024 at 13:08, Diego Zuccato
> >> > <diego.zuccato at unibo.it> wrote:
> >> > That's the same kind of errors I keep seeing on my 2
clusters,
> >> > regenerated some months ago. Seems a pseudo-split-brain that
should be
> >> > impossible on a replica 3 cluster but keeps happening.
> >> > Sadly going to ditch Gluster ASAP.
> >> >
> >> > Diego
> >> >
> >> > Il 18/01/2024 07:11, Hu Bert ha scritto:
> >> > > Good morning,
> >> > > heal still not running. Pending heals now sum up to 60K
per brick.
> >> > > Heal was starting instantly e.g. after server reboot
with version
> >> > > 10.4, but doesn't with version 11. What could be
wrong?
> >> > >
> >> > > I only see these errors on one of the "good"
servers in glustershd.log:
> >> > >
> >> > > [2024-01-18 06:08:57.328480 +0000] W [MSGID: 114031]
> >> > > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk]
0-workdata-client-0:
> >> > > remote operation failed.
> >> > >
[{path=<gfid:cb39a1e4-2a4c-4727-861d-3ed9ef00681b>},
> >> > > {gfid=cb39a1e4-2a4c-4727-861d-3ed9e
> >> > > f00681b}, {errno=2}, {error=No such file or directory}]
> >> > > [2024-01-18 06:08:57.594051 +0000] W [MSGID: 114031]
> >> > > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk]
0-workdata-client-1:
> >> > > remote operation failed.
> >> > >
[{path=<gfid:3e9b178c-ae1f-4d85-ae47-fc539d94dd11>},
> >> > > {gfid=3e9b178c-ae1f-4d85-ae47-fc539
> >> > > d94dd11}, {errno=2}, {error=No such file or directory}]
> >> > >
> >> > > About 7K today. Any ideas? Someone?
> >> > >
> >> > >
> >> > > Best regards,
> >> > > Hubert
> >> > >
> >> > > Am Mi., 17. Jan. 2024 um 11:24 Uhr schrieb Hu Bert
<revirii at googlemail.com>:
> >> > >>
> >> > >> ok, finally managed to get all servers, volumes etc
runnung, but took
> >> > >> a couple of restarts, cksum checks etc.
> >> > >>
> >> > >> One problem: a volume doesn't heal automatically
or doesn't heal at all.
> >> > >>
> >> > >> gluster volume status
> >> > >> Status of volume: workdata
> >> > >> Gluster process? ? ? ? ? ? ? ? ? ? ? ? ? ? TCP Port?
RDMA Port? Online? Pid
> >> > >>
------------------------------------------------------------------------------
> >> > >> Brick glusterpub1:/gluster/md3/workdata? ? 58832? ?
0? ? ? ? ? Y? ? ? 3436
> >> > >> Brick glusterpub2:/gluster/md3/workdata? ? 59315? ?
0? ? ? ? ? Y? ? ? 1526
> >> > >> Brick glusterpub3:/gluster/md3/workdata? ? 56917? ?
0? ? ? ? ? Y? ? ? 1952
> >> > >> Brick glusterpub1:/gluster/md4/workdata? ? 59688? ?
0? ? ? ? ? Y? ? ? 3755
> >> > >> Brick glusterpub2:/gluster/md4/workdata? ? 60271? ?
0? ? ? ? ? Y? ? ? 2271
> >> > >> Brick glusterpub3:/gluster/md4/workdata? ? 49461? ?
0? ? ? ? ? Y? ? ? 2399
> >> > >> Brick glusterpub1:/gluster/md5/workdata? ? 54651? ?
0? ? ? ? ? Y? ? ? 4208
> >> > >> Brick glusterpub2:/gluster/md5/workdata? ? 49685? ?
0? ? ? ? ? Y? ? ? 2751
> >> > >> Brick glusterpub3:/gluster/md5/workdata? ? 59202? ?
0? ? ? ? ? Y? ? ? 2803
> >> > >> Brick glusterpub1:/gluster/md6/workdata? ? 55829? ?
0? ? ? ? ? Y? ? ? 4583
> >> > >> Brick glusterpub2:/gluster/md6/workdata? ? 50455? ?
0? ? ? ? ? Y? ? ? 3296
> >> > >> Brick glusterpub3:/gluster/md6/workdata? ? 50262? ?
0? ? ? ? ? Y? ? ? 3237
> >> > >> Brick glusterpub1:/gluster/md7/workdata? ? 52238? ?
0? ? ? ? ? Y? ? ? 5014
> >> > >> Brick glusterpub2:/gluster/md7/workdata? ? 52474? ?
0? ? ? ? ? Y? ? ? 3673
> >> > >> Brick glusterpub3:/gluster/md7/workdata? ? 57966? ?
0? ? ? ? ? Y? ? ? 3653
> >> > >> Self-heal Daemon on localhost? ? ? ? ? ? ? N/A? ? ?
N/A? ? ? ? Y? ? ? 4141
> >> > >> Self-heal Daemon on glusterpub1? ? ? ? ? ? N/A? ? ?
N/A? ? ? ? Y? ? ? 5570
> >> > >> Self-heal Daemon on glusterpub2? ? ? ? ? ? N/A? ? ?
N/A? ? ? ? Y? ? ? 4139
> >> > >>
> >> > >> "gluster volume heal workdata info" lists
a lot of files per brick.
> >> > >> "gluster volume heal workdata statistics
heal-count" shows thousands
> >> > >> of files per brick.
> >> > >> "gluster volume heal workdata enable" has
no effect.
> >> > >>
> >> > >> gluster volume heal workdata full
> >> > >> Launching heal operation to perform full self heal
on volume workdata
> >> > >> has been successful
> >> > >> Use heal info commands to check status.
> >> > >>
> >> > >> -> not doing anything at all. And nothing
happening on the 2 "good"
> >> > >> servers in e.g. glustershd.log. Heal was working as
expected on
> >> > >> version 10.4, but here... silence. Someone has an
idea?
> >> > >>
> >> > >>
> >> > >> Best regards,
> >> > >> Hubert
> >> > >>
> >> > >> Am Di., 16. Jan. 2024 um 13:44 Uhr schrieb Gilberto
Ferreira
> >> > >> <gilberto.nunes32 at gmail.com>:
> >> > >>>
> >> > >>> Ah! Indeed! You need to perform an upgrade in
the clients as well.
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> Em ter., 16 de jan. de 2024 ?s 03:12, Hu Bert
<revirii at googlemail.com> escreveu:
> >> > >>>>
> >> > >>>> morning to those still reading :-)
> >> > >>>>
> >> > >>>> i found this:
https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
> >> > >>>>
> >> > >>>> there's a paragraph about "peer
rejected" with the same error message,
> >> > >>>> telling me: "Update the
cluster.op-version" - i had only updated the
> >> > >>>> server nodes, but not the clients. So
upgrading the cluster.op-version
> >> > >>>> wasn't possible at this time. So...
upgrading the clients to version
> >> > >>>> 11.1 and then the op-version should solve
the problem?
> >> > >>>>
> >> > >>>>
> >> > >>>> Thx,
> >> > >>>> Hubert
> >> > >>>>
> >> > >>>> Am Mo., 15. Jan. 2024 um 09:16 Uhr schrieb
Hu Bert <revirii at googlemail.com>:
> >> > >>>>>
> >> > >>>>> Hi,
> >> > >>>>> just upgraded some gluster servers from version 10.4 to version 11.1.
> >> > >>>>> Debian bullseye & bookworm. When only installing the packages: good,
> >> > >>>>> servers, volumes etc. work as expected.
> >> > >>>>>
> >> > >>>>> But one needs to test if the systems work after a daemon and/or server
> >> > >>>>> restart. Well, did a reboot, and after that the rebooted/restarted
> >> > >>>>> system is "out". Log message from working node:
> >> > >>>>>
> >> > >>>>> [2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163]
> >> > >>>>> [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
> >> > >>>>> 0-management: using the op-version 100000
> >> > >>>>> [2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490]
> >> > >>>>> [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
> >> > >>>>> 0-glusterd: Received probe from uuid:
> >> > >>>>> b71401c3-512a-47cb-ac18-473c4ba7776e
> >> > >>>>> [2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010]
> >> > >>>>> [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
> >> > >>>>> Version of Cksums sourceimages differ. local cksum = 2204642525,
> >> > >>>>> remote cksum = 1931483801 on peer gluster190
> >> > >>>>> [2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493]
> >> > >>>>> [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
> >> > >>>>> Responded to gluster190 (0), ret: 0, op_ret: -1
> >> > >>>>> [2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493]
> >> > >>>>> [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:
> >> > >>>>> Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:
> >> > >>>>> gluster190, port: 0
> >> > >>>>>
> >> > >>>>> peer status from rebooted node:
> >> > >>>>>
> >> > >>>>> root@gluster190 ~ # gluster peer status
> >> > >>>>> Number of Peers: 2
> >> > >>>>>
> >> > >>>>> Hostname: gluster189
> >> > >>>>> Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
> >> > >>>>> State: Peer Rejected (Connected)
> >> > >>>>>
> >> > >>>>> Hostname: gluster188
> >> > >>>>> Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
> >> > >>>>> State: Peer Rejected (Connected)
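
For scripted checks after a reboot, the rejected peers can be pulled out of that output. A small sketch, assuming the `Hostname:` / `State:` layout shown above; `rejected_peers` is a hypothetical helper name:

```shell
#!/bin/sh
# Read `gluster peer status` output on stdin and print the hostname of
# every peer whose state starts with "Peer Rejected".
rejected_peers() {
    awk '
        /^Hostname:/             { host = $2 }
        /^State: Peer Rejected/  { print host }
    '
}

# Usage on a live node (commented out; needs the gluster CLI and root):
# gluster peer status | rejected_peers
```

Fed the status output above, this prints gluster189 and gluster188, one per line.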
> >> > >>>>>
> >> > >>>>> So the rebooted gluster190 is not accepted anymore. And thus does not
> >> > >>>>> appear in "gluster volume status". I then followed this guide:
> >> > >>>>>
> >> > >>>>> https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
> >> > >>>>>
> >> > >>>>> Remove everything under /var/lib/glusterd/ (except glusterd.info) and
> >> > >>>>> restart glusterd service etc. Data get copied from other nodes,
> >> > >>>>> 'gluster peer status' is ok again - but the volume info is missing,
> >> > >>>>> /var/lib/glusterd/vols is empty. When syncing this dir from another
> >> > >>>>> node, the volume then is available again, heals start etc.
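
The cleanup step from that guide can be sketched as a small function. This is destructive, so treat it as an illustration only: `clean_glusterd_state` is a hypothetical helper, the directory has to be passed explicitly, and the surrounding service commands (assuming a systemd unit named `glusterd` and a placeholder `<good-node>`) are left commented out:

```shell
#!/bin/sh
# Delete everything directly under the given directory except
# glusterd.info. Refuses to run without an existing directory argument.
clean_glusterd_state() {
    dir=$1
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    find "$dir" -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
}

# Order of operations on the rejected node (commented out; run as root):
# systemctl stop glusterd
# clean_glusterd_state /var/lib/glusterd
# systemctl start glusterd
# gluster peer probe <good-node>
# systemctl restart glusterd
# gluster peer status
```

As described above, /var/lib/glusterd/vols stayed empty afterwards in this case and had to be synced from another node, so taking a backup copy of the directory before cleaning seems prudent.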
> >> > >>>>>
> >> > >>>>> Well, and just to be sure that everything's working as it should,
> >> > >>>>> rebooted that node again - the rebooted node is kicked out again, and
> >> > >>>>> you have to restart bringing it back again.
> >> > >>>>>
> >> > >>>>> Sry, but did i miss anything? Has someone experienced similar
> >> > >>>>> problems? I'll probably downgrade to 10.4 again, that version was
> >> > >>>>> working...
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> Thx,
> >> > >>>>> Hubert
> >> > >>>> ________
> >> > >>>>
> >> > >>>> Community Meeting Calendar:
> >> > >>>>
> >> > >>>> Schedule -
> >> > >>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> >> > >>>> Bridge: https://meet.google.com/cpu-eiue-hvk
> >> > >>>> Gluster-users mailing list
> >> > >>>> Gluster-users at gluster.org
> >> > >>>> https://lists.gluster.org/mailman/listinfo/gluster-users
> >> >
> >> > --
> >> > Diego Zuccato
> >> > DIFA - Dip. di Fisica e Astronomia
> >> > Servizi Informatici
> >> > Alma Mater Studiorum - Università di Bologna
> >> > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> >> > tel.: +39 051 20 95786
> >> >
