Hi again Xavi,

I did some more testing on my virt machines with the same setup:
Number of Bricks: 1 x (2 + 1) = 3

If I do it the same way and upgrade the arbiter first, I get the same behaviour: the bricks do not start and the other nodes do not "see" the upgraded node.
If I upgrade one of the other nodes (non-arbiter) and restart glusterd on both the arbiter and that node, the arbiter starts the bricks and connects with the other upgraded node as expected.
If I upgrade the last node (non-arbiter) it will fail to start the bricks, same behaviour as the arbiter at first.
If I then copy /var/lib/glusterd/vols/<myvol> from the upgraded (non-arbiter) node to the node that does not start the bricks, replace its /var/lib/glusterd/vols/<myvol> with the copied directory and restart glusterd, it works nicely after that.
Everything then works the way it should.

So the question is whether the arbiter is treated in some other way compared to the other nodes?
Is some type of config happening at the start of glusterd that makes the node fail?

Do I dare to continue to upgrade my real cluster in the way described above?

Thanks!

Regards
Marcus

On Mon, Feb 20, 2023 at 01:42:47PM +0100, Marcus Pedersén wrote:
> I made a recursive diff on the upgraded arbiter.
>
> /var/lib/glusterd/vols/gds-common is the upgraded arbiter
> /home/marcus/gds-common is one of the other nodes still on gluster 10
>
> diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-030:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-030:-urd-gds-gds-common
> 5c5
> < listen-port=60419
> ---
> > listen-port=0
> 11c11
> < brick-fsid=14764358630653534655
> ---
> > brick-fsid=0
> diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-031:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-031:-urd-gds-gds-common
> 5c5
> < listen-port=0
> ---
> > listen-port=60891
> 11c11
> < brick-fsid=0
> ---
> > brick-fsid=1088380223149770683
> diff -r /var/lib/glusterd/vols/gds-common/cksum /home/marcus/gds-common/cksum
> 1c1
> < info=3948700922
> ---
> > info=458813151
> diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
> 3c3
> < option shared-brick-count 1
> ---
> > option shared-brick-count 0
> diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
> 3c3
> < option shared-brick-count 0
> ---
> > option shared-brick-count 1
> diff -r /var/lib/glusterd/vols/gds-common/info /home/marcus/gds-common/info
> 23a24
> > nfs.disable=on
>
> I set up 3 virt machines and configured them with gluster 10 (arbiter 1).
> After that I upgraded to 11; the first 2 nodes were fine but on the third node I got the same behaviour: the brick never started.
>
> Thanks for the help!
>
> Regards
> Marcus
>
> On Mon, Feb 20, 2023 at 12:30:37PM +0100, Xavi Hernandez wrote:
> > Hi Marcus,
> >
> > On Mon, Feb 20, 2023 at 8:50 AM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
> > Hi Xavi,
> > I stopped glusterd, ran killall glusterd glusterfs glusterfsd and started glusterd again.
> >
> > The only log that is not empty is glusterd.log, I attach the log from the restart time.
> > The brick log, glustershd.log and glfsheal-gds-common.log are empty.
> >
> > These are the errors in the log:
> > [2023-02-20 07:23:46.235263 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > [2023-02-20 07:23:47.359917 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-031
> > [2023-02-20 07:23:47.438052 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-032
> >
> > Geo-replication is not set up, so I guess it is nothing strange that there is an error regarding georep.
> > The checksum error seems natural to be there as the other nodes are still on version 10.
> >
> > No. The configurations should be identical.
> >
> > Can you try to compare volume definitions in /var/lib/glusterd/vols/gds-common between the upgraded server and one of the old ones?
> >
> > Regards,
> >
> > Xavi
> >
> > My previous experience with upgrades is that the local bricks start and gluster is up and running, with no connection to the other nodes until they are upgraded as well.
> >
> > gluster peer status gives the output:
> > Number of Peers: 2
> >
> > Hostname: urd-gds-032
> > Uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439
> > State: Peer Rejected (Connected)
> >
> > Hostname: urd-gds-031
> > Uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf
> > State: Peer Rejected (Connected)
> >
> > I suppose and guess that this is because the arbiter is version 11 and the other 2 nodes are version 10.
> >
> > Please let me know if I can provide any other information to try to solve this issue.
> >
> > Many thanks!
> > Marcus
> >
> > On Mon, Feb 20, 2023 at 07:29:20AM +0100, Xavi Hernandez wrote:
> > > Hi Marcus,
> > >
> > > these errors shouldn't prevent the bricks from starting. Isn't there any other error or warning?
> > >
> > > Regards,
> > >
> > > Xavi
> > >
> > > On Fri, Feb 17, 2023 at 3:06 PM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
> > > Hi all,
> > > I started an upgrade to gluster 11.0 from 10.3 on one of my clusters.
> > > OS: Debian bullseye
> > >
> > > Volume Name: gds-common
> > > Type: Replicate
> > > Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
> > > Status: Started
> > > Snapshot Count: 0
> > > Number of Bricks: 1 x (2 + 1) = 3
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: urd-gds-031:/urd-gds/gds-common
> > > Brick2: urd-gds-032:/urd-gds/gds-common
> > > Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
> > > Options Reconfigured:
> > > cluster.granular-entry-heal: on
> > > storage.fips-mode-rchecksum: on
> > > transport.address-family: inet
> > > nfs.disable: on
> > > performance.client-io-threads: off
> > >
> > > I started with the arbiter node, stopped all of gluster, upgraded to 11.0 and all went fine.
> > > After the upgrade I was able to see the other nodes and all nodes were connected.
> > > After a reboot on the arbiter nothing works the way it should.
> > > Both brick1 and brick2 have a connection but there is no connection with the arbiter.
> > > On the arbiter glusterd has started and is listening on port 24007; the problem seems to be glusterfsd, it never starts!
> > >
> > > If I run: gluster volume status
> > >
> > > Status of volume: gds-common
> > > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > ------------------------------------------------------------------------------
> > > Brick urd-gds-030:/urd-gds/gds-common       N/A       N/A        N       N/A
> > > Self-heal Daemon on localhost               N/A       N/A        N       N/A
> > >
> > > Task Status of Volume gds-common
> > > ------------------------------------------------------------------------------
> > > There are no active volume tasks
> > >
> > > In glusterd.log I find the following errors (arbiter node):
> > > [2023-02-17 12:30:40.519585 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > > [2023-02-17 12:30:40.678031 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > >
> > > In brick/urd-gds-gds-common.log I find the following error:
> > > [2023-02-17 12:30:43.550753 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > >
> > > I enclose both logfiles.
> > >
> > > How do I resolve this issue?
> > >
> > > Many thanks in advance!
> > >
> > > Marcus
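[Editor's note: a minimal, untested sketch of the copy-and-restart workaround Marcus describes at the top of this message, assuming the volume is gds-common and that urd-gds-031 is the already-upgraded node whose configuration is known to be good. The backup step, the use of scp and the hostnames are assumptions for illustration, not details confirmed in the thread.]

    # run on the node whose bricks do not start
    systemctl stop glusterd
    killall glusterd glusterfs glusterfsd
    # keep a backup of the local volume definition before overwriting it (assumption, not from the thread)
    mv /var/lib/glusterd/vols/gds-common /var/lib/glusterd/vols/gds-common.bak
    # copy the volume definition from the healthy upgraded node (hostname assumed)
    scp -r root@urd-gds-031:/var/lib/glusterd/vols/gds-common /var/lib/glusterd/vols/
    systemctl start glusterd
    # the local brick should now show a TCP port and Online = Y
    gluster volume status gds-common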
Hi again,

There is something going on when gluster starts.
I fired up my virt machines this morning to do some more testing and one of the nodes did not come online in the cluster.
Looking at that node I found that only glusterd and glusterfs had started.
After:
systemctl stop glusterd
killall glusterd glusterfs glusterfsd
systemctl start glusterd
gluster started correctly: glusterd, glusterfs and glusterfsd all started and the node was online in the cluster.

I just wanted to let you know in case this might help.

Regards
Marcus

--
Marcus Pedersén, System administrator
Interbull Centre, Department of Animal Breeding & Genetics, SLU
Box 7023, SE-750 07 Uppsala, Sweden
Visiting address: Room 55614, Ulls väg 26, Ultuna, Uppsala, Sweden
Tel: +46-(0)18-67 1962
ISO 9001 Bureau Veritas No SE004561-1
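[Editor's note: a small check-and-restart sequence matching what Marcus describes above; the pgrep calls are an added way to verify which daemons actually came up after boot and are not taken from the thread.]

    # see which gluster daemons are running after boot (glusterfsd = brick processes)
    pgrep -ax glusterd
    pgrep -ax glusterfs
    pgrep -ax glusterfsd
    # the clean restart that brought the node back online in Marcus's test
    systemctl stop glusterd
    killall glusterd glusterfs glusterfsd
    systemctl start glusterd
    # confirm the local brick is online again
    gluster volume status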
Hi Marcus,

On Mon, Feb 20, 2023 at 2:53 PM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
> So the question is whether the arbiter is treated in some other way
> compared to the other nodes?

It seems so, but at this point I'm not sure what could be the difference.

> Is some type of config happening at the start of glusterd that
> makes the node fail?

Gluster requires that all glusterd share the same configuration. In this case it seems that the "info" file in the volume definition has different contents on the servers. One of the servers has the value "nfs.disable=on" but the others do not. This can be the difference that causes the checksum error.

You can try to copy the "info" file from one node to the one that doesn't start and try restarting glusterd.

> Do I dare to continue to upgrade my real cluster in the way described above?
>
> Thanks!
>
> Regards
> Marcus
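[Editor's note: a rough illustration of Xavi's suggestion above. The commands compare the "info" file between the failing node and a node whose configuration is good, copy it over and restart glusterd; the hostnames, the copy direction and the use of ssh/scp are assumptions for the example, not part of the thread.]

    # run on the node whose bricks do not start; urd-gds-031 is assumed to hold the good configuration
    diff /var/lib/glusterd/vols/gds-common/info <(ssh urd-gds-031 cat /var/lib/glusterd/vols/gds-common/info)
    # the info= value in this file is what the peers compare in the "Version of Cksums" error
    cat /var/lib/glusterd/vols/gds-common/cksum
    # replace the local info file with the known-good one and restart glusterd, as suggested above
    scp urd-gds-031:/var/lib/glusterd/vols/gds-common/info /var/lib/glusterd/vols/gds-common/info
    systemctl restart glusterd
    # peers should no longer show "Peer Rejected"
    gluster peer status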