Hi Xavi,
Copying the info file worked well: the gluster 11 arbiter
is now up and running, and all the nodes are communicating
the way they should.
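
For anyone hitting the same problem, here is a minimal sketch of how the two "info" files can be compared before copying one over the other. The helper name info_diff is made up for illustration, and I am assuming the file is plain key=value lines, as the diff further down in this thread suggests. Sorting only makes the comparison independent of line order; it says nothing about how glusterd computes its checksum.

```sh
#!/bin/sh
# Hypothetical helper: print the lines that appear in only one of two
# glusterd "info" files. "<" marks lines only in the first file,
# ">" marks lines only in the second. Prints nothing if they match.
info_diff() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    # diff prints hunk headers too; keep only the differing lines.
    # "|| true" keeps the function quiet when there are no differences.
    diff "$a_sorted" "$b_sorted" | grep '^[<>]' || true
    rm -f "$a_sorted" "$b_sorted"
}

# Example call, using the paths from the diff quoted below:
# info_diff /var/lib/glusterd/vols/gds-common/info /home/marcus/gds-common/info
```

If the only output is a line such as "> nfs.disable=on", the files differ just in that option, which matches what Xavi pointed out.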
Just another note on something I discovered on my virtual machines.
All three nodes have been upgraded to 11.0 and are working.
If I run:
gluster volume get all cluster.op-version
I get:
Option                  Value
------                  -----
cluster.op-version      100000
Which is correct as I have not updated the op-version,
but if I run:
gluster volume get all cluster.max-op-version
I get:
Option                  Value
------                  -----
cluster.max-op-version  100000
I expected max-op-version to be 110000.
Isn't it supposed to be 110000?
And after the upgrade, shouldn't the op-version
be raised to 110000 as well?
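
To make the comparison concrete, here is a small sketch of how the two values can be pulled out of the command output and compared. The function names op_version_value and op_version_verdict are made up for illustration, and the parsing assumes the two-column table format shown above. It does not answer whether 110000 is the right maximum for 11.0; it only compares what the cluster reports.

```sh
#!/bin/sh
# Hypothetical helper: read "gluster volume get all <option>" output on
# stdin and print the value from the row whose first column is <option>.
op_version_value() {
    awk -v opt="$1" '$1 == opt { print $2 }'
}

# Hypothetical helper: say whether the current op-version is below the
# maximum the cluster reports.
op_version_verdict() {
    if [ "$1" -lt "$2" ]; then
        echo "bump possible: $1 -> $2"
    else
        echo "already at max: $1"
    fi
}

# Example usage on a node (not run here):
# current=$(gluster volume get all cluster.op-version | op_version_value cluster.op-version)
# max=$(gluster volume get all cluster.max-op-version | op_version_value cluster.max-op-version)
# op_version_verdict "$current" "$max"
```

As far as I know, the op-version itself is raised with "gluster volume set all cluster.op-version <value>", and only once every node runs the new version.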
Many thanks for all your help!
Regards
Marcus
On Tue, Feb 21, 2023 at 09:29:28AM +0100, Xavi Hernandez wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
> Hi Marcus,
>
> On Mon, Feb 20, 2023 at 2:53 PM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
> Hi again Xavi,
>
> I did some more testing on my virtual machines
> with the same setup:
> Number of Bricks: 1 x (2 + 1) = 3
> If I do it the same way and upgrade the arbiter first,
> I get the same behaviour: the bricks do not start
> and the other nodes do not "see" the upgraded node.
> If I upgrade one of the other (non-arbiter) nodes and restart
> glusterd on both the arbiter and that node, the arbiter starts
> its bricks and connects with the other upgraded node as expected.
> If I upgrade the last (non-arbiter) node, it fails to start
> its bricks, the same behaviour as the arbiter at first.
> If I then copy /var/lib/glusterd/vols/<myvol> from the
> upgraded (non-arbiter) node to the node that does not start its bricks,
> replace /var/lib/glusterd/vols/<myvol> with the copied directory
> and restart glusterd, it works nicely after that.
> Everything then works the way it should.
>
> So the question is if the arbiter is treated in some other way
> compared to the other nodes?
>
> It seems so, but at this point I'm not sure what the difference could be.
>
>
> Is some kind of configuration happening when glusterd starts that
> makes the node fail?
>
> Gluster requires that all glusterd instances share the same configuration. In this
> case it seems that the "info" file in the volume definition has
> different contents on the servers. One of the servers has the value
> "nfs.disable=on" but the others do not. This may be the difference
> that causes the checksum error.
>
> You can try copying the "info" file from one node to the one that
> doesn't start, and then restart glusterd.
>
>
> Do I dare continue upgrading my real cluster in the way described above?
>
> Thanks!
>
> Regards
> Marcus
>
>
>
> On Mon, Feb 20, 2023 at 01:42:47PM +0100, Marcus Pedersén wrote:
> > I made a recursive diff on the upgraded arbiter.
> >
> > /var/lib/glusterd/vols/gds-common is the upgraded arbiter
> > /home/marcus/gds-common is one of the other nodes still on gluster 10
> >
> > diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-030:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-030:-urd-gds-gds-common
> > 5c5
> > < listen-port=60419
> > ---
> > > listen-port=0
> > 11c11
> > < brick-fsid=14764358630653534655
> > ---
> > > brick-fsid=0
> > diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-031:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-031:-urd-gds-gds-common
> > 5c5
> > < listen-port=0
> > ---
> > > listen-port=60891
> > 11c11
> > < brick-fsid=0
> > ---
> > > brick-fsid=1088380223149770683
> > diff -r /var/lib/glusterd/vols/gds-common/cksum /home/marcus/gds-common/cksum
> > 1c1
> > < info=3948700922
> > ---
> > > info=458813151
> > diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
> > 3c3
> > < option shared-brick-count 1
> > ---
> > > option shared-brick-count 0
> > diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
> > 3c3
> > < option shared-brick-count 0
> > ---
> > > option shared-brick-count 1
> > diff -r /var/lib/glusterd/vols/gds-common/info /home/marcus/gds-common/info
> > 23a24
> > > nfs.disable=on
> >
> >
> > I set up 3 virtual machines and configured them with gluster 10 (arbiter 1).
> > After that I upgraded to 11; the first 2 nodes were fine, but on the third
> > node I got the same behaviour: the brick never started.
> >
> > Thanks for the help!
> >
> > Regards
> > Marcus
> >
> >
> > On Mon, Feb 20, 2023 at 12:30:37PM +0100, Xavi Hernandez wrote:
> > >
> > >
> > > Hi Marcus,
> > >
> > > On Mon, Feb 20, 2023 at 8:50 AM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
> > > Hi Xavi,
> > > I stopped glusterd, ran killall glusterd glusterfs glusterfsd
> > > and started glusterd again.
> > >
> > > The only log that is not empty is glusterd.log; I attach the log
> > > from the restart time. The brick log, glustershd.log and
> > > glfsheal-gds-common.log are empty.
> > >
> > > These are the errors in the log:
> > > [2023-02-20 07:23:46.235263 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > > [2023-02-20 07:23:47.359917 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-031
> > > [2023-02-20 07:23:47.438052 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-032
> > >
> > > Geo-replication is not set up, so I guess an error regarding georep
> > > is nothing strange.
> > > The checksum error seems natural, as the other nodes
> > > are still on version 10.
> > >
> > > No. The configurations should be identical.
> > >
> > > Can you try to compare the volume definitions in
> > > /var/lib/glusterd/vols/gds-common between the upgraded server and one of the old
> > > ones?
> > >
> > > Regards,
> > >
> > > Xavi
> > >
> > >
> > > My previous experience with upgrades is that the local bricks start and
> > > gluster is up and running, with no connection to the other nodes
> > > until they are upgraded as well.
> > >
> > >
> > > gluster peer status, gives the output:
> > > Number of Peers: 2
> > >
> > > Hostname: urd-gds-032
> > > Uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439
> > > State: Peer Rejected (Connected)
> > >
> > > Hostname: urd-gds-031
> > > Uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf
> > > State: Peer Rejected (Connected)
> > >
> > > I suppose this is because the arbiter is on version 11
> > > and the other 2 nodes are on version 10.
> > >
> > > Please let me know if I can provide any other information
> > > to try to solve this issue.
> > >
> > > Many thanks!
> > > Marcus
> > >
> > >
> > > On Mon, Feb 20, 2023 at 07:29:20AM +0100, Xavi Hernandez wrote:
> > > >
> > > >
> > > > Hi Marcus,
> > > >
> > > > These errors shouldn't prevent the bricks from starting.
> > > > Isn't there any other error or warning?
> > > >
> > > > Regards,
> > > >
> > > > Xavi
> > > >
> > > > On Fri, Feb 17, 2023 at 3:06 PM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
> > > > Hi all,
> > > > I started an upgrade to gluster 11.0 from 10.3 on one of my clusters.
> > > > OS: Debian bullseye
> > > >
> > > > Volume Name: gds-common
> > > > Type: Replicate
> > > > Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
> > > > Status: Started
> > > > Snapshot Count: 0
> > > > Number of Bricks: 1 x (2 + 1) = 3
> > > > Transport-type: tcp
> > > > Bricks:
> > > > Brick1: urd-gds-031:/urd-gds/gds-common
> > > > Brick2: urd-gds-032:/urd-gds/gds-common
> > > > Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
> > > > Options Reconfigured:
> > > > cluster.granular-entry-heal: on
> > > > storage.fips-mode-rchecksum: on
> > > > transport.address-family: inet
> > > > nfs.disable: on
> > > > performance.client-io-threads: off
> > > >
> > > > I started with the arbiter node, stopped all of gluster,
> > > > upgraded to 11.0 and all went fine.
> > > > After the upgrade I was able to see the other nodes and
> > > > all nodes were connected.
> > > > After a reboot of the arbiter nothing works the way it should.
> > > > Both brick1 and brick2 are connected to each other but have no connection
> > > > with the arbiter.
> > > > On the arbiter glusterd has started and is listening on port 24007;
> > > > the problem seems to be glusterfsd, it never starts!
> > > >
> > > > If I run: gluster volume status
> > > >
> > > > Status of volume: gds-common
> > > > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > > ------------------------------------------------------------------------------
> > > > Brick urd-gds-030:/urd-gds/gds-common       N/A       N/A        N       N/A
> > > > Self-heal Daemon on localhost               N/A       N/A        N       N/A
> > > >
> > > > Task Status of Volume gds-common
> > > > ------------------------------------------------------------------------------
> > > > There are no active volume tasks
> > > >
> > > >
> > > > In glusterd.log I find the following errors (arbiter node):
> > > > [2023-02-17 12:30:40.519585 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > > > [2023-02-17 12:30:40.678031 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > > >
> > > > In brick/urd-gds-gds-common.log I find the following error:
> > > > [2023-02-17 12:30:43.550753 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > > >
> > > > I enclose both logfiles.
> > > >
> > > > How do I resolve this issue??
> > > >
> > > > Many thanks in advance!!
> > > >
> > > > Marcus
> > > > ---
> > > > E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
> > > > ________
> > > >
> > > >
> > > >
> > > > Community Meeting Calendar:
> > > >
> > > > Schedule -
> > > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > > > Bridge: https://meet.google.com/cpu-eiue-hvk
> > > > Gluster-users mailing list
> > > > Gluster-users at gluster.org
> > > > https://lists.gluster.org/mailman/listinfo/gluster-users
>
--
**************************************************
* Marcus Pedersén *
* System administrator *
**************************************************
* Interbull Centre *
* ================ *
* Department of Animal Breeding & Genetics - SLU *
* Box 7023, SE-750 07 *
* Uppsala, Sweden *
**************************************************
* Visiting address: *
* Room 55614, Ulls väg 26, Ultuna *
* Uppsala *
* Sweden *
* *
* Tel: +46-(0)18-67 1962 *
* *
**************************************************
* ISO 9001 Bureau Veritas No SE004561-1 *
**************************************************