Hi again Xavi,

I did some more testing on my virt machines with the same setup:
Number of Bricks: 1 x (2 + 1) = 3

If I do it the same way and upgrade the arbiter first, I get the same behaviour: the bricks do not start and the other nodes do not "see" the upgraded node.
If I upgrade one of the other nodes (non-arbiter) and restart glusterd on both the arbiter and that node, the arbiter starts the bricks and connects with the other upgraded node as expected.
If I upgrade the last node (non-arbiter) it will fail to start the bricks, same behaviour as the arbiter at first.
If I then copy /var/lib/glusterd/vols/<myvol> from the upgraded (non-arbiter) node to the node that does not start the bricks, replace its /var/lib/glusterd/vols/<myvol> with the copied directory and restart glusterd, it works nicely after that.
Everything then works the way it should.

So the question is whether the arbiter is treated in some other way compared to the other nodes?
Is some type of config happening at the start of glusterd that makes the node fail?

Do I dare to continue to upgrade my real cluster in the way described above?

Thanks!

Regards
Marcus

On Mon, Feb 20, 2023 at 01:42:47PM +0100, Marcus Pedersén wrote:
> I made a recursive diff on the upgraded arbiter.
>
> /var/lib/glusterd/vols/gds-common is the upgraded arbiter
> /home/marcus/gds-common is one of the other nodes still on gluster 10
>
> diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-030:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-030:-urd-gds-gds-common
> 5c5
> < listen-port=60419
> ---
> > listen-port=0
> 11c11
> < brick-fsid=14764358630653534655
> ---
> > brick-fsid=0
> diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-031:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-031:-urd-gds-gds-common
> 5c5
> < listen-port=0
> ---
> > listen-port=60891
> 11c11
> < brick-fsid=0
> ---
> > brick-fsid=1088380223149770683
> diff -r /var/lib/glusterd/vols/gds-common/cksum /home/marcus/gds-common/cksum
> 1c1
> < info=3948700922
> ---
> > info=458813151
> diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
> 3c3
> < option shared-brick-count 1
> ---
> > option shared-brick-count 0
> diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
> 3c3
> < option shared-brick-count 0
> ---
> > option shared-brick-count 1
> diff -r /var/lib/glusterd/vols/gds-common/info /home/marcus/gds-common/info
> 23a24
> > nfs.disable=on
>
> I set up 3 virt machines and configured them with gluster 10 (arbiter 1).
> After that I upgraded to 11; the first 2 nodes were fine but on the third node I got the same behaviour: the brick never started.
>
> Thanks for the help!
>
> Regards
> Marcus
>
> On Mon, Feb 20, 2023 at 12:30:37PM +0100, Xavi Hernandez wrote:
> > Hi Marcus,
> >
> > On Mon, Feb 20, 2023 at 8:50 AM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
> > Hi Xavi,
> > I stopped glusterd, ran killall glusterd glusterfs glusterfsd and started glusterd again.
> >
> > The only log that is not empty is glusterd.log, I attach the log from the restart time.
> > The brick log, glustershd.log and glfsheal-gds-common.log are empty.
> >
> > These are the errors in the log:
> > [2023-02-20 07:23:46.235263 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > [2023-02-20 07:23:47.359917 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-031
> > [2023-02-20 07:23:47.438052 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-032
> >
> > Geo-replication is not set up, so I guess it is nothing strange that there is an error regarding georep.
> > The checksum error seems natural to be there as the other nodes are still on version 10.
> >
> > No. The configurations should be identical.
> >
> > Can you try to compare volume definitions in /var/lib/glusterd/vols/gds-common between the upgraded server and one of the old ones?
> >
> > Regards,
> >
> > Xavi
> >
> > My previous experience with upgrades is that the local bricks start and gluster is up and running, with no connection to the other nodes until they are upgraded as well.
> >
> > gluster peer status gives the output:
> > Number of Peers: 2
> >
> > Hostname: urd-gds-032
> > Uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439
> > State: Peer Rejected (Connected)
> >
> > Hostname: urd-gds-031
> > Uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf
> > State: Peer Rejected (Connected)
> >
> > I suppose and guess that this is because the arbiter is version 11 and the other 2 nodes are version 10.
> >
> > Please let me know if I can provide any other information to try to solve this issue.
> >
> > Many thanks!
> > Marcus
> >
> > On Mon, Feb 20, 2023 at 07:29:20AM +0100, Xavi Hernandez wrote:
> > > Hi Marcus,
> > >
> > > these errors shouldn't prevent the bricks from starting. Isn't there any other error or warning?
> > >
> > > Regards,
> > >
> > > Xavi
> > >
> > > On Fri, Feb 17, 2023 at 3:06 PM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
> > > Hi all,
> > > I started an upgrade to gluster 11.0 from 10.3 on one of my clusters.
> > > OS: Debian bullseye
> > >
> > > Volume Name: gds-common
> > > Type: Replicate
> > > Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
> > > Status: Started
> > > Snapshot Count: 0
> > > Number of Bricks: 1 x (2 + 1) = 3
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: urd-gds-031:/urd-gds/gds-common
> > > Brick2: urd-gds-032:/urd-gds/gds-common
> > > Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
> > > Options Reconfigured:
> > > cluster.granular-entry-heal: on
> > > storage.fips-mode-rchecksum: on
> > > transport.address-family: inet
> > > nfs.disable: on
> > > performance.client-io-threads: off
> > >
> > > I started with the arbiter node, stopped all of gluster, upgraded to 11.0 and all went fine.
> > > After the upgrade I was able to see the other nodes and all nodes were connected.
> > > After a reboot on the arbiter nothing works the way it should.
> > > Both brick1 and brick2 have a connection but there is no connection with the arbiter.
> > > On the arbiter glusterd has started and is listening on port 24007; the problem seems to be glusterfsd, it never starts!
> > >
> > > If I run: gluster volume status
> > >
> > > Status of volume: gds-common
> > > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > ------------------------------------------------------------------------------
> > > Brick urd-gds-030:/urd-gds/gds-common       N/A       N/A        N       N/A
> > > Self-heal Daemon on localhost               N/A       N/A        N       N/A
> > >
> > > Task Status of Volume gds-common
> > > ------------------------------------------------------------------------------
> > > There are no active volume tasks
> > >
> > > In glusterd.log I find the following errors (arbiter node):
> > > [2023-02-17 12:30:40.519585 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > > [2023-02-17 12:30:40.678031 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > >
> > > In brick/urd-gds-gds-common.log I find the following error:
> > > [2023-02-17 12:30:43.550753 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > >
> > > I enclose both logfiles.
> > >
> > > How do I resolve this issue?
> > >
> > > Many thanks in advance!
> > >
> > > Marcus
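[Editor's note: a minimal, untested sketch of the copy-and-restart workaround Marcus describes at the top of this message, assuming the volume is gds-common and that urd-gds-031 is the already-upgraded node whose configuration is known to be good. The backup step, the use of scp and the hostnames are assumptions for illustration, not details confirmed in the thread.]

    # run on the node whose bricks do not start
    systemctl stop glusterd
    killall glusterd glusterfs glusterfsd
    # keep a backup of the local volume definition before overwriting it (assumption, not from the thread)
    mv /var/lib/glusterd/vols/gds-common /var/lib/glusterd/vols/gds-common.bak
    # copy the volume definition from the healthy upgraded node (hostname assumed)
    scp -r root@urd-gds-031:/var/lib/glusterd/vols/gds-common /var/lib/glusterd/vols/
    systemctl start glusterd
    # the local brick should now show a TCP port and Online = Y
    gluster volume status gds-common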
Hi again,

There is something going on when gluster starts.
I fired up my virt machines this morning to do some more testing and one of the nodes did not come online in the cluster.
Looking at that node I found that only glusterd and glusterfs had started.
After:
systemctl stop glusterd
killall glusterd glusterfs glusterfsd
systemctl start glusterd
gluster started correctly: glusterd, glusterfs and glusterfsd all started and the node was online in the cluster.

I just wanted to let you know in case this might help.

Regards
Marcus

--
Marcus Pedersén, System administrator
Interbull Centre, Department of Animal Breeding & Genetics, SLU
Box 7023, SE-750 07 Uppsala, Sweden
Visiting address: Room 55614, Ulls väg 26, Ultuna, Uppsala, Sweden
Tel: +46-(0)18-67 1962
ISO 9001 Bureau Veritas No SE004561-1
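[Editor's note: a small check-and-restart sequence matching what Marcus describes above; the pgrep calls are an added way to verify which daemons actually came up after boot and are not taken from the thread.]

    # see which gluster daemons are running after boot (glusterfsd = brick processes)
    pgrep -ax glusterd
    pgrep -ax glusterfs
    pgrep -ax glusterfsd
    # the clean restart that brought the node back online in Marcus's test
    systemctl stop glusterd
    killall glusterd glusterfs glusterfsd
    systemctl start glusterd
    # confirm the local brick is online again
    gluster volume status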
Hi Marcus,

On Mon, Feb 20, 2023 at 2:53 PM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
> So the question is whether the arbiter is treated in some other way
> compared to the other nodes?

It seems so, but at this point I'm not sure what could be the difference.

> Is some type of config happening at the start of glusterd that
> makes the node fail?

Gluster requires that all glusterd share the same configuration. In this case it seems that the "info" file in the volume definition has different contents on the servers. One of the servers has the value "nfs.disable=on" but the others do not. This can be the difference that causes the checksum error.

You can try to copy the "info" file from one node to the one that doesn't start and try restarting glusterd.

> Do I dare to continue to upgrade my real cluster in the way described above?
>
> Thanks!
>
> Regards
> Marcus
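[Editor's note: a rough illustration of Xavi's suggestion above. The commands compare the "info" file between the failing node and a node whose configuration is good, copy it over and restart glusterd; the hostnames, the copy direction and the use of ssh/scp are assumptions for the example, not part of the thread.]

    # run on the node whose bricks do not start; urd-gds-031 is assumed to hold the good configuration
    diff /var/lib/glusterd/vols/gds-common/info <(ssh urd-gds-031 cat /var/lib/glusterd/vols/gds-common/info)
    # the info= value in this file is what the peers compare in the "Version of Cksums" error
    cat /var/lib/glusterd/vols/gds-common/cksum
    # replace the local info file with the known-good one and restart glusterd, as suggested above
    scp urd-gds-031:/var/lib/glusterd/vols/gds-common/info /var/lib/glusterd/vols/gds-common/info
    systemctl restart glusterd
    # peers should no longer show "Peer Rejected"
    gluster peer status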