lejeczek
2020-Jul-01 14:46 UTC
[Gluster-users] volume process does not start - glusterfs is happy with it?
On 30/06/2020 11:31, Barak Sason Rofman wrote:
> Greetings,
>
> I'm not sure if that's directly related to your problem, but on a
> general level, AFAIK, replica-2 vols are not recommended due to the
> possibility of split-brain:
> https://docs.gluster.org/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/
>
> It's recommended to use either replica-3 or an arbiter.
>
> Regards,
>
> On Tue, Jun 30, 2020 at 1:14 PM lejeczek <peljasz at yahoo.co.uk> wrote:
>
>     Hi everybody.
>
>     I have two peers in the cluster and a 2-replica volume which seems
>     okay, if it were not for one weird bit - when a peer reboots, then
>     on that peer after the reboot I see:
>
>     $ gluster volume status USERs
>     Status of volume: USERs
>     Gluster process                             TCP Port  RDMA Port  Online  Pid
>     ------------------------------------------------------------------------------
>     Brick swir.direct:/00.STORAGE/2/0-GLUSTER-U
>     SERs                                        N/A       N/A        N       N/A
>     Brick dzien.direct:/00.STORAGE/2/0-GLUSTER-
>     USERs                                       49152     0          Y       57338
>     Self-heal Daemon on localhost               N/A       N/A        Y       4302
>     Self-heal Daemon on dzien.direct            N/A       N/A        Y       57359
>
>     Task Status of Volume USERs
>     ------------------------------------------------------------------------------
>     There are no active volume tasks
>
>     I do not suppose that is expected.
>     On the rebooted node I see:
>
>     $ systemctl status -l glusterd
>     ● glusterd.service - GlusterFS, a clustered file-system server
>        Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
>       Drop-In: /etc/systemd/system/glusterd.service.d
>                └─override.conf
>        Active: active (running) since Mon 2020-06-29 21:37:36 BST; 13h ago
>          Docs: man:glusterd(8)
>       Process: 4071 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status>
>      Main PID: 4086 (glusterd)
>         Tasks: 20 (limit: 101792)
>        Memory: 28.9M
>        CGroup: /system.slice/glusterd.service
>                ├─4086 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>                └─4302 /usr/sbin/glusterfs -s localhost --volfile-id shd/USERs -p /var/run/gluster/shd/USERs/USERs-shd.pid -l /var/log/g>
>
>     Jun 29 21:37:36 swir.private.pawel systemd[1]: Starting GlusterFS, a clustered file-system server...
>     Jun 29 21:37:36 swir.private.pawel systemd[1]: Started GlusterFS, a clustered file-system server.
>
>     And I do not see any other apparent problems or errors.
>     On that node I manually run:
>
>     $ systemctl restart glusterd.service
>
>     and...
>
>     $ gluster volume status USERs
>     Status of volume: USERs
>     Gluster process                             TCP Port  RDMA Port  Online  Pid
>     ------------------------------------------------------------------------------
>     Brick swir.direct:/00.STORAGE/2/0-GLUSTER-U
>     SERs                                        49152     0          Y       103225
>     Brick dzien.direct:/00.STORAGE/2/0-GLUSTER-
>     USERs                                       49152     0          Y       57338
>     Self-heal Daemon on localhost               N/A       N/A        Y       103270
>     Self-heal Daemon on dzien.direct            N/A       N/A        Y       57359
>
>     Is that not a puzzle? I'm on glusterfs-7.6-1.el8.x86_64.
>     I hope somebody can share some thoughts.
>     many thanks, L.

That cannot be it!? If the root cause of this problem were the 2-replica
volume, then it would be a massive cock-up - 2-replica volumes should then
be banned and forbidden.

I hope someone can suggest a way to troubleshoot it.

ps. we all, I presume, know the problems of 2-replica volumes.

many thanks, L.
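pps. in case it helps anyone diagnose, these are the only bits I know to
check so far - note the brick-log name below is guessed from my brick
path (AFAIK gluster names brick logs after the brick path with slashes
turned into dashes), so adjust it to your layout:

$ less /var/log/glusterfs/bricks/00.STORAGE-2-0-GLUSTER-USERs.log
$ gluster volume start USERs force    # should respawn only the missing brick process
$ gluster volume status USERs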
Felix Kölzow
2020-Jul-01 17:57 UTC
[Gluster-users] volume process does not start - glusterfs is happy with it?
Hey,

what about the device mapper? Was everything mounted properly during the
reboot? This happens to me when the LVM device mapper hits a timeout
while mounting the brick itself during the reboot process.

Regards,

Felix
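PS: if it is indeed the brick mount racing glusterd at boot, one possible
guard (just a sketch, untested on your setup) is a systemd drop-in so
glusterd only starts once the brick filesystem is mounted. I am assuming
here that /00.STORAGE/2 is the mount point under your bricks - adjust it
to your layout; you already have a drop-in directory per your status
output:

# /etc/systemd/system/glusterd.service.d/wait-for-bricks.conf
[Unit]
RequiresMountsFor=/00.STORAGE/2

then reload and restart:

$ systemctl daemon-reload
$ systemctl restart glusterd.service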