Artem Russakovskii
2019-Mar-20 04:21 UTC
[Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP
Can I roll back performance.write-behind: off and lru-limit=0 then? I'm
waiting for the debug packages to be available for OpenSUSE, then I can
help Amar with another debug session.

In the meantime, have you had time to set up 1x4 replicate testing? I was
told you were only testing 1x3, and it's the 4th brick that may be causing
the crash, which is consistent with only 1 of 4 bricks constantly crashing
this whole time. The other 3 have been rock solid. I'm hoping you could
find the issue without a debug session this way.

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
<https://plus.google.com/+ArtemRussakovskii> | @ArtemR
<http://twitter.com/ArtemR>

On Tue, Mar 19, 2019 at 8:27 PM Nithya Balachandran <nbalacha at redhat.com> wrote:

> Hi Artem,
>
> I think you are running into a different crash. The ones reported which
> were prevented by turning off write-behind are now fixed.
> We will need to look into the one you are seeing to see why it is
> happening.
>
> Regards,
> Nithya
>
> On Tue, 19 Mar 2019 at 20:25, Artem Russakovskii <archon810 at gmail.com>
> wrote:
>
>> The flood is indeed fixed for us on 5.5. However, the crashes are not.
>>
>> Sincerely,
>> Artem
>>
>> On Mon, Mar 18, 2019 at 5:41 AM Hu Bert <revirii at googlemail.com> wrote:
>>
>>> Hi Amar,
>>>
>>> if you refer to this bug:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test
>>> setup i haven't seen those entries, while copying & deleting a few GBs
>>> of data. For a final statement we have to wait until i updated our
>>> live gluster servers - could take place on tuesday or wednesday.
>>>
>>> Maybe other users can do an update to 5.4 as well and report back here.
>>>
>>> Hubert
>>>
>>> On Mon, 18 Mar 2019 at 11:36, Amar Tumballi Suryanarayan
>>> <atumball at redhat.com> wrote:
>>>
>>>> Hi Hu Bert,
>>>>
>>>> Appreciate the feedback. Also are the other boiling issues related to
>>>> logs fixed now?
>>>>
>>>> -Amar
>>>>
>>>> On Mon, Mar 18, 2019 at 3:54 PM Hu Bert <revirii at googlemail.com>
>>>> wrote:
>>>>
>>>>> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2
>>>>> volumes done. In 'gluster peer status' the peers stay connected during
>>>>> the upgrade, no 'peer rejected' messages. No cksum mismatches in the
>>>>> logs. Looks good :-)
>>>>>
>>>>> On Mon, 18 Mar 2019 at 09:54, Hu Bert <revirii at googlemail.com>
>>>>> wrote:
>>>>>
>>>>>> Good morning :-)
>>>>>>
>>>>>> for debian the packages are there:
>>>>>> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
>>>>>>
>>>>>> I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there
>>>>>> are some errors etc. and report back.
>>>>>>
>>>>>> btw: no release notes for 5.4 and 5.5 so far?
>>>>>> https://docs.gluster.org/en/latest/release-notes/ ?
>>>>>>
>>>>>> On Fri, 15 Mar 2019 at 14:28, Shyam Ranganathan
>>>>>> <srangana at redhat.com> wrote:
>>>>>>
>>>>>>> We created a 5.5 release tag, and it is under packaging now. It should
>>>>>>> be packaged and ready for testing early next week and should be released
>>>>>>> close to mid-week next week.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shyam
>>>>>>>
>>>>>>> On 3/13/19 12:34 PM, Artem Russakovskii wrote:
>>>>>>>
>>>>>>>> Wednesday now with no update :-/
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Artem
>>>>>>>>
>>>>>>>> On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii
>>>>>>>> <archon810 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Amar,
>>>>>>>>>
>>>>>>>>> Any updates on this? I'm still not seeing it in OpenSUSE build
>>>>>>>>> repos. Maybe later today?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>> Artem
>>>>>>>>>
>>>>>>>>> On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan
>>>>>>>>> <atumball at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> We are talking days. Not weeks. Considering already it is
>>>>>>>>>> Thursday here. 1 more day for tagging, and packaging. May be ok
>>>>>>>>>> to expect it on Monday.
>>>>>>>>>>
>>>>>>>>>> -Amar
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii
>>>>>>>>>> <archon810 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Is the next release going to be an imminent hotfix, i.e.
>>>>>>>>>>> something like today/tomorrow, or are we talking weeks?
>>>>>>>>>>>
>>>>>>>>>>> Sincerely,
>>>>>>>>>>> Artem
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii
>>>>>>>>>>> <archon810 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Ended up downgrading to 5.3 just in case. Peer status
>>>>>>>>>>>> and volume status are OK now.
>>>>>>>>>>>>
>>>>>>>>>>>> zypper install --oldpackage glusterfs-5.3-lp150.100.1
>>>>>>>>>>>> Loading repository data...
>>>>>>>>>>>> Reading installed packages...
>>>>>>>>>>>> Resolving package dependencies...
>>>>>>>>>>>>
>>>>>>>>>>>> Problem: glusterfs-5.3-lp150.100.1.x86_64 requires
>>>>>>>>>>>> libgfapi0 = 5.3, but this requirement cannot be provided
>>>>>>>>>>>>   not installable providers:
>>>>>>>>>>>>   libgfapi0-5.3-lp150.100.1.x86_64[glusterfs]
>>>>>>>>>>>>  Solution 1: Following actions will be done:
>>>>>>>>>>>>   downgrade of libgfapi0-5.4-lp150.100.1.x86_64 to
>>>>>>>>>>>>   libgfapi0-5.3-lp150.100.1.x86_64
>>>>>>>>>>>>   downgrade of libgfchangelog0-5.4-lp150.100.1.x86_64 to
>>>>>>>>>>>>   libgfchangelog0-5.3-lp150.100.1.x86_64
>>>>>>>>>>>>   downgrade of libgfrpc0-5.4-lp150.100.1.x86_64 to
>>>>>>>>>>>>   libgfrpc0-5.3-lp150.100.1.x86_64
>>>>>>>>>>>>   downgrade of libgfxdr0-5.4-lp150.100.1.x86_64 to
>>>>>>>>>>>>   libgfxdr0-5.3-lp150.100.1.x86_64
>>>>>>>>>>>>   downgrade of libglusterfs0-5.4-lp150.100.1.x86_64 to
>>>>>>>>>>>>   libglusterfs0-5.3-lp150.100.1.x86_64
>>>>>>>>>>>>  Solution 2: do not install glusterfs-5.3-lp150.100.1.x86_64
>>>>>>>>>>>>  Solution 3: break glusterfs-5.3-lp150.100.1.x86_64 by
>>>>>>>>>>>>  ignoring some of its dependencies
>>>>>>>>>>>>
>>>>>>>>>>>> Choose from above solutions by number or cancel
>>>>>>>>>>>> [1/2/3/c] (c): 1
>>>>>>>>>>>> Resolving dependencies...
>>>>>>>>>>>> Resolving package dependencies...
>>>>>>>>>>>>
>>>>>>>>>>>> The following 6 packages are going to be downgraded:
>>>>>>>>>>>>   glusterfs libgfapi0 libgfchangelog0 libgfrpc0
>>>>>>>>>>>>   libgfxdr0 libglusterfs0
>>>>>>>>>>>>
>>>>>>>>>>>> 6 packages to downgrade.
>>>>>>>>>>>>
>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>> Artem
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii
>>>>>>>>>>>> <archon810 at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Noticed the same when upgrading from 5.3 to 5.4, as
>>>>>>>>>>>>> mentioned.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm confused though. Is actual replication affected,
>>>>>>>>>>>>> because the 5.4 server and the 3x 5.3 servers still
>>>>>>>>>>>>> show heal info as all 4 connected, and the files
>>>>>>>>>>>>> seem to be replicating correctly as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So what's actually affected - just the status
>>>>>>>>>>>>> command, or leaving 5.4 on one of the nodes is doing
>>>>>>>>>>>>> some damage to the underlying fs? Is it fixable by
>>>>>>>>>>>>> tweaking transport.socket.ssl-enabled? Does
>>>>>>>>>>>>> upgrading all servers to 5.4 resolve it, or should
>>>>>>>>>>>>> we revert back to 5.3?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sincerely,
>>>>>>>>>>>>> Artem
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 5, 2019 at 2:02 AM Hu Bert
>>>>>>>>>>>>> <revirii at googlemail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> fyi: did a downgrade 5.4 -> 5.3 and it worked.
>>>>>>>>>>>>>> all replicas are up and running. Awaiting updated v5.4.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thx :-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 5 Mar 2019 at 09:26, Hari Gowtham
>>>>>>>>>>>>>> <hgowtham at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There are plans to revert the patch causing this error and
>>>>>>>>>>>>>>> rebuild 5.4. This should happen faster. The rebuilt 5.4
>>>>>>>>>>>>>>> should be void of this upgrade issue.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the meantime, you can use 5.3 for this cluster.
>>>>>>>>>>>>>>> Downgrading to 5.3 will work if it was just one node that
>>>>>>>>>>>>>>> was upgraded to 5.4 and the other nodes are still on 5.3.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Mar 5, 2019 at 1:07 PM Hu Bert
>>>>>>>>>>>>>>> <revirii at googlemail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Hari,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thx for the hint. Do you know when this will be fixed?
>>>>>>>>>>>>>>>> Is a downgrade 5.4 -> 5.3 a possibility to fix this?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hubert
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, 5 Mar 2019 at 08:32, Hari Gowtham
>>>>>>>>>>>>>>>> <hgowtham at redhat.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is a known issue we are working on.
>>>>>>>>>>>>>>>>> As the checksum differs between the updated and non
>>>>>>>>>>>>>>>>> updated node, the peers are getting rejected.
>>>>>>>>>>>>>>>>> The bricks aren't coming up because of the same issue.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> More about the issue:
>>>>>>>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1685120
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Mar 5, 2019 at 12:56 PM Hu Bert
>>>>>>>>>>>>>>>>> <revirii at googlemail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Interestingly: gluster volume status misses gluster1,
>>>>>>>>>>>>>>>>>> while heal statistics show gluster1:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> gluster volume status workdata
>>>>>>>>>>>>>>>>>> Status of volume: workdata
>>>>>>>>>>>>>>>>>> Gluster process                       TCP Port  RDMA Port  Online  Pid
>>>>>>>>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>> Brick gluster2:/gluster/md4/workdata  49153     0          Y       1723
>>>>>>>>>>>>>>>>>> Brick gluster3:/gluster/md4/workdata  49153     0          Y       2068
>>>>>>>>>>>>>>>>>> Self-heal Daemon on localhost         N/A       N/A        Y       1732
>>>>>>>>>>>>>>>>>> Self-heal Daemon on gluster3          N/A       N/A        Y       2077
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> vs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> gluster volume heal workdata statistics heal-count
>>>>>>>>>>>>>>>>>> Gathering count of entries to be healed on volume
>>>>>>>>>>>>>>>>>> workdata has been successful
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Brick gluster1:/gluster/md4/workdata
>>>>>>>>>>>>>>>>>> Number of entries: 0
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Brick gluster2:/gluster/md4/workdata
>>>>>>>>>>>>>>>>>> Number of entries: 10745
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Brick gluster3:/gluster/md4/workdata
>>>>>>>>>>>>>>>>>> Number of entries: 10744
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, 5 Mar 2019 at 08:18, Hu Bert
>>>>>>>>>>>>>>>>>> <revirii at googlemail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Milind,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> well, there are such entries, but those haven't been a
>>>>>>>>>>>>>>>>>>> problem during install and the last kernel
>>>>>>>>>>>>>>>>>>> update+reboot. The entries look like:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> PUBLIC_IP gluster2.alpserver.de gluster2
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 192.168.0.50 gluster1
>>>>>>>>>>>>>>>>>>> 192.168.0.51 gluster2
>>>>>>>>>>>>>>>>>>> 192.168.0.52 gluster3
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 'ping gluster2' resolves to LAN IP; I removed the last
>>>>>>>>>>>>>>>>>>> entry in the 1st line, did a reboot ... no, didn't
>>>>>>>>>>>>>>>>>>> help. From /var/log/glusterfs/glusterd.log on
>>>>>>>>>>>>>>>>>>> gluster 2:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [2019-03-05 07:04:36.188128] E [MSGID: 106010]
>>>>>>>>>>>>>>>>>>> [glusterd-utils.c:3483:glusterd_compare_friend_volume]
>>>>>>>>>>>>>>>>>>> 0-management: Version of Cksums persistent differ.
>>>>>>>>>>>>>>>>>>> local cksum = 3950307018, remote cksum = 455409345 on
>>>>>>>>>>>>>>>>>>> peer gluster1
>>>>>>>>>>>>>>>>>>> [2019-03-05 07:04:36.188314] I [MSGID: 106493]
>>>>>>>>>>>>>>>>>>> [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp]
>>>>>>>>>>>>>>>>>>> 0-glusterd: Responded to gluster1 (0), ret: 0,
>>>>>>>>>>>>>>>>>>> op_ret: -1
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Interestingly there are no entries in the brick logs of
>>>>>>>>>>>>>>>>>>> the rejected server. Well, not surprising as no brick
>>>>>>>>>>>>>>>>>>> process is running. The server gluster1 is still in
>>>>>>>>>>>>>>>>>>> rejected state.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 'gluster volume start workdata force' starts the brick
>>>>>>>>>>>>>>>>>>> process on gluster1, and some heals are happening on
>>>>>>>>>>>>>>>>>>> gluster2+3, but via 'gluster volume status workdata'
>>>>>>>>>>>>>>>>>>> the volumes still aren't complete.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> gluster1:
>>>>>>>>>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>>> Brick gluster1:/gluster/md4/workdata  49152     0          Y       2523
>>>>>>>>>>>>>>>>>>> Self-heal Daemon on localhost         N/A       N/A        Y       2549
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> gluster2:
>>>>>>>>>>>>>>>>>>> Gluster process                       TCP Port  RDMA Port  Online  Pid
>>>>>>>>>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>>> Brick gluster2:/gluster/md4/workdata  49153     0          Y       1723
>>>>>>>>>>>>>>>>>>> Brick gluster3:/gluster/md4/workdata  49153     0          Y       2068
>>>>>>>>>>>>>>>>>>> Self-heal Daemon on localhost         N/A       N/A        Y       1732
>>>>>>>>>>>>>>>>>>> Self-heal Daemon on gluster3          N/A       N/A        Y       2077
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hubert
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, 5 Mar 2019 at 07:58, Milind Changire
>>>>>>>>>>>>>>>>>>> <mchangir at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> There are probably DNS entries or /etc/hosts entries
>>>>>>>>>>>>>>>>>>>> with the public IP addresses that the host names
>>>>>>>>>>>>>>>>>>>> (gluster1, gluster2, gluster3) are getting resolved to.
>>>>>>>>>>>>>>>>>>>> /etc/resolv.conf would tell which is the default domain
>>>>>>>>>>>>>>>>>>>> searched for the node names and the DNS servers which
>>>>>>>>>>>>>>>>>>>> respond to the queries.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Mar 5, 2019 at 12:14 PM Hu Bert
>>>>>>>>>>>>>>>>>>>> <revirii at googlemail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Good morning,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> i have a replicate 3 setup with 2 volumes, running on
>>>>>>>>>>>>>>>>>>>>> version 5.3 on debian stretch. This morning i upgraded
>>>>>>>>>>>>>>>>>>>>> one server to version 5.4 and rebooted the machine;
>>>>>>>>>>>>>>>>>>>>> after the restart i noticed that:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - no brick process is running
>>>>>>>>>>>>>>>>>>>>> - gluster volume status only shows the server itself:
>>>>>>>>>>>>>>>>>>>>> gluster volume status workdata
>>>>>>>>>>>>>>>>>>>>> Status of volume: workdata
>>>>>>>>>>>>>>>>>>>>> Gluster process                       TCP Port  RDMA Port  Online  Pid
>>>>>>>>>>>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>>> Brick gluster1:/gluster/md4/workdata  N/A       N/A        N       N/A
>>>>>>>>>>>>>>>>>>>>> NFS Server on localhost               N/A       N/A        N       N/A
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - gluster peer status on the server
>>>>>>>>>>>>>>>>>>>>> gluster peer status
>>>>>>>>>>>>>>>>>>>>> Number of Peers: 2
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hostname: gluster3
>>>>>>>>>>>>>>>>>>>>> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
>>>>>>>>>>>>>>>>>>>>> State: Peer Rejected (Connected)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hostname: gluster2
>>>>>>>>>>>>>>>>>>>>> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27
>>>>>>>>>>>>>>>>>>>>> State: Peer Rejected (Connected)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - gluster peer status on the other 2 servers:
>>>>>>>>>>>>>>>>>>>>> gluster peer status
>>>>>>>>>>>>>>>>>>>>> Number of Peers: 2
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hostname: gluster1
>>>>>>>>>>>>>>>>>>>>> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef
>>>>>>>>>>>>>>>>>>>>> State: Peer Rejected (Connected)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hostname: gluster3
>>>>>>>>>>>>>>>>>>>>> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
>>>>>>>>>>>>>>>>>>>>> State: Peer in Cluster (Connected)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I noticed that, in the brick logs, i see that the
>>>>>>>>>>>>>>>>>>>>> public IP is used instead of the LAN IP. brick logs
>>>>>>>>>>>>>>>>>>>>> from one of the volumes:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> rejected node: https://pastebin.com/qkpj10Sd
>>>>>>>>>>>>>>>>>>>>> connected nodes: https://pastebin.com/8SxVVYFV
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Why is the public IP suddenly used instead of the LAN
>>>>>>>>>>>>>>>>>>>>> IP? Killing all gluster processes and rebooting
>>>>>>>>>>>>>>>>>>>>> (again) didn't help.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thx,
>>>>>>>>>>>>>>>>>>>>> Hubert
>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Milind
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Hari Gowtham.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Hari Gowtham.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Amar Tumballi (amarts)
>>>>
>>>> --
>>>> Amar Tumballi (amarts)
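
The 'Peer Rejected' states quoted in the thread above can be picked out of `gluster peer status` output mechanically. What follows is only a minimal sketch, not part of any Gluster tooling: the helper names `parse_peer_status` and `rejected_peers` are made up for illustration, and the sample text is copied from Hubert's report rather than read from a live cluster.

```python
# Sample output copied from the thread (peer status as seen on a 5.3 node).
SAMPLE = """\
Number of Peers: 2

Hostname: gluster1
Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef
State: Peer Rejected (Connected)

Hostname: gluster3
Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
State: Peer in Cluster (Connected)
"""


def parse_peer_status(text):
    """Turn 'gluster peer status' text into a list of dicts (hypothetical helper)."""
    peers = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or anything without a 'key: value' shape
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            # Each 'Hostname:' line starts a new peer record.
            current = {"hostname": value}
            peers.append(current)
        elif current is not None and key in ("Uuid", "State"):
            current[key.lower()] = value
    return peers


def rejected_peers(peers):
    """Return hostnames whose state starts with 'Peer Rejected'."""
    return [p["hostname"] for p in peers
            if p.get("state", "").startswith("Peer Rejected")]


if __name__ == "__main__":
    print(rejected_peers(parse_peer_status(SAMPLE)))
```

On a real node the text would come from running the command itself; the parser assumes the plain `Hostname:` / `Uuid:` / `State:` layout shown in the thread and ignores everything else.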
Amar Tumballi Suryanarayan
2019-Mar-20 05:45 UTC
[Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP
On Wed, Mar 20, 2019 at 9:52 AM Artem Russakovskii <archon810 at gmail.com> wrote:> Can I roll back performance.write-behind: off and lru-limit=0 then? I'm > waiting for the debug packages to be available for OpenSUSE, then I can > help Amar with another debug session. > >Yes, the write-behind issue is now fixed. You can enable write-behind. Also remove lru-limit=0, so you can also utilize the benefit of garbage collection introduced in 5.4 Lets get to fixing the problem once the debuginfo packages are available.> In the meantime, have you had time to set up 1x4 replicate testing? I was > told you were only testing 1x3, and it's the 4th brick that may be causing > the crash, which is consistent with this whole time only 1 of 4 bricks > constantly crashing. The other 3 have been rock solid. I'm hoping you could > find the issue without a debug session this way. > >That is my gut feeling still. Added a basic test case with 4 bricks, https://review.gluster.org/#/c/glusterfs/+/22328/. But I think this particular issue is happening only on certain pattern of access for 1x4 setup. Lets get to the root of it once we have debuginfo packages for Suse builds. -Amar Sincerely,> Artem > > -- > Founder, Android Police <http://www.androidpolice.com>, APK Mirror > <http://www.apkmirror.com/>, Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR > <http://twitter.com/ArtemR> > > > On Tue, Mar 19, 2019 at 8:27 PM Nithya Balachandran <nbalacha at redhat.com> > wrote: > > > Hi Artem, > > > > I think you are running into a different crash. The ones reported which > > were prevented by turning off write-behind are now fixed. > > We will need to look into the one you are seeing to see why it is > > happening. > > > > Regards, > > Nithya > > > > > > On Tue, 19 Mar 2019 at 20:25, Artem Russakovskii <archon810 at gmail.com> > > wrote: > > > >> The flood is indeed fixed for us on 5.5. However, the crashes are not. 
> >> > >> Sincerely, > >> Artem > >> > >> -- > >> Founder, Android Police <http://www.androidpolice.com>, APK Mirror > >> <http://www.apkmirror.com/>, Illogical Robot LLC > >> beerpla.net | +ArtemRussakovskii > >> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR > >> <http://twitter.com/ArtemR> > >> > >> > >> On Mon, Mar 18, 2019 at 5:41 AM Hu Bert <revirii at googlemail.com> wrote: > >> > >>> Hi Amar, > >>> > >>> if you refer to this bug: > >>> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test > >>> setup i haven't seen those entries, while copying & deleting a few GBs > >>> of data. For a final statement we have to wait until i updated our > >>> live gluster servers - could take place on tuesday or wednesday. > >>> > >>> Maybe other users can do an update to 5.4 as well and report back here. > >>> > >>> > >>> Hubert > >>> > >>> > >>> > >>> Am Mo., 18. M?rz 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan > >>> <atumball at redhat.com>: > >>> > > >>> > Hi Hu Bert, > >>> > > >>> > Appreciate the feedback. Also are the other boiling issues related to > >>> logs fixed now? > >>> > > >>> > -Amar > >>> > > >>> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert <revirii at googlemail.com> > >>> wrote: > >>> >> > >>> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 > >>> >> volumes done. In 'gluster peer status' the peers stay connected > during > >>> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the > >>> >> logs. Looks good :-) > >>> >> > >>> >> Am Mo., 18. M?rz 2019 um 09:54 Uhr schrieb Hu Bert < > >>> revirii at googlemail.com>: > >>> >> > > >>> >> > Good morning :-) > >>> >> > > >>> >> > for debian the packages are there: > >>> >> > > >>> > https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ > >>> >> > > >>> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if > >>> there > >>> >> > are some errors etc. and report back. 
> >>> >> > > >>> >> > btw: no release notes for 5.4 and 5.5 so far? > >>> >> > https://docs.gluster.org/en/latest/release-notes/ ? > >>> >> > > >>> >> > Am Fr., 15. M?rz 2019 um 14:28 Uhr schrieb Shyam Ranganathan > >>> >> > <srangana at redhat.com>: > >>> >> > > > >>> >> > > We created a 5.5 release tag, and it is under packaging now. It > >>> should > >>> >> > > be packaged and ready for testing early next week and should be > >>> released > >>> >> > > close to mid-week next week. > >>> >> > > > >>> >> > > Thanks, > >>> >> > > Shyam > >>> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: > >>> >> > > > Wednesday now with no update :-/ > >>> >> > > > > >>> >> > > > Sincerely, > >>> >> > > > Artem > >>> >> > > > > >>> >> > > > -- > >>> >> > > > Founder, Android Police <http://www.androidpolice.com>, APK > >>> Mirror > >>> >> > > > <http://www.apkmirror.com/>, Illogical Robot LLC > >>> >> > > > beerpla.net <http://beerpla.net/> | +ArtemRussakovskii > >>> >> > > > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR > >>> >> > > > <http://twitter.com/ArtemR> > >>> >> > > > > >>> >> > > > > >>> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii < > >>> archon810 at gmail.com > >>> >> > > > <mailto:archon810 at gmail.com>> wrote: > >>> >> > > > > >>> >> > > > Hi Amar, > >>> >> > > > > >>> >> > > > Any updates on this? I'm still not seeing it in OpenSUSE > >>> build > >>> >> > > > repos. Maybe later today? > >>> >> > > > > >>> >> > > > Thanks. 
> >>> >> > > > > >>> >> > > > Sincerely, > >>> >> > > > Artem > >>> >> > > > > >>> >> > > > -- > >>> >> > > > Founder, Android Police <http://www.androidpolice.com>, > >>> APK Mirror > >>> >> > > > <http://www.apkmirror.com/>, Illogical Robot LLC > >>> >> > > > beerpla.net <http://beerpla.net/> | +ArtemRussakovskii > >>> >> > > > <https://plus.google.com/+ArtemRussakovskii> | @ArtemR > >>> >> > > > <http://twitter.com/ArtemR> > >>> >> > > > > >>> >> > > > > >>> >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan > >>> >> > > > <atumball at redhat.com <mailto:atumball at redhat.com>> wrote: > >>> >> > > > > >>> >> > > > We are talking days. Not weeks. Considering already it > >>> is > >>> >> > > > Thursday here. 1 more day for tagging, and packaging. > >>> May be ok > >>> >> > > > to expect it on Monday. > >>> >> > > > > >>> >> > > > -Amar > >>> >> > > > > >>> >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii > >>> >> > > > <archon810 at gmail.com <mailto:archon810 at gmail.com>> > >>> wrote: > >>> >> > > > > >>> >> > > > Is the next release going to be an imminent > hotfix, > >>> i.e. > >>> >> > > > something like today/tomorrow, or are we talking > >>> weeks? > >>> >> > > > > >>> >> > > > Sincerely, > >>> >> > > > Artem > >>> >> > > > > >>> >> > > > -- > >>> >> > > > Founder, Android Police < > >>> http://www.androidpolice.com>, APK > >>> >> > > > Mirror <http://www.apkmirror.com/>, Illogical > >>> Robot LLC > >>> >> > > > beerpla.net <http://beerpla.net/> | > >>> +ArtemRussakovskii > >>> >> > > > <https://plus.google.com/+ArtemRussakovskii> | > >>> @ArtemR > >>> >> > > > <http://twitter.com/ArtemR> > >>> >> > > > > >>> >> > > > > >>> >> > > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii > >>> >> > > > <archon810 at gmail.com <mailto:archon810 at gmail.com > >> > >>> wrote: > >>> >> > > > > >>> >> > > > Ended up downgrading to 5.3 just in case. Peer > >>> status > >>> >> > > > and volume status are OK now. 
> >>> >> > > > > >>> >> > > > zypper install --oldpackage > >>> glusterfs-5.3-lp150.100.1 > >>> >> > > > Loading repository data... > >>> >> > > > Reading installed packages... > >>> >> > > > Resolving package dependencies... > >>> >> > > > > >>> >> > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 > >>> requires > >>> >> > > > libgfapi0 = 5.3, but this requirement cannot > be > >>> provided > >>> >> > > > not installable providers: > >>> >> > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] > >>> >> > > > Solution 1: Following actions will be done: > >>> >> > > > downgrade of > libgfapi0-5.4-lp150.100.1.x86_64 > >>> to > >>> >> > > > libgfapi0-5.3-lp150.100.1.x86_64 > >>> >> > > > downgrade of > >>> libgfchangelog0-5.4-lp150.100.1.x86_64 to > >>> >> > > > libgfchangelog0-5.3-lp150.100.1.x86_64 > >>> >> > > > downgrade of > libgfrpc0-5.4-lp150.100.1.x86_64 > >>> to > >>> >> > > > libgfrpc0-5.3-lp150.100.1.x86_64 > >>> >> > > > downgrade of > libgfxdr0-5.4-lp150.100.1.x86_64 > >>> to > >>> >> > > > libgfxdr0-5.3-lp150.100.1.x86_64 > >>> >> > > > downgrade of > >>> libglusterfs0-5.4-lp150.100.1.x86_64 to > >>> >> > > > libglusterfs0-5.3-lp150.100.1.x86_64 > >>> >> > > > Solution 2: do not install > >>> glusterfs-5.3-lp150.100.1.x86_64 > >>> >> > > > Solution 3: break > >>> glusterfs-5.3-lp150.100.1.x86_64 by > >>> >> > > > ignoring some of its dependencies > >>> >> > > > > >>> >> > > > Choose from above solutions by number or > cancel > >>> >> > > > [1/2/3/c] (c): 1 > >>> >> > > > Resolving dependencies... > >>> >> > > > Resolving package dependencies... > >>> >> > > > > >>> >> > > > The following 6 packages are going to be > >>> downgraded: > >>> >> > > > glusterfs libgfapi0 libgfchangelog0 > libgfrpc0 > >>> >> > > > libgfxdr0 libglusterfs0 > >>> >> > > > > >>> >> > > > 6 packages to downgrade. 
> >>> >> > > >
> >>> >> > > > Sincerely,
> >>> >> > > > Artem
> >>> >> > > >
> >>> >> > > > On Tue, Mar 5, 2019 at 10:57 AM Artem Russakovskii <archon810 at gmail.com> wrote:
> >>> >> > > >
> >>> >> > > > Noticed the same when upgrading from 5.3 to 5.4, as mentioned.
> >>> >> > > >
> >>> >> > > > I'm confused though. Is actual replication affected? The 5.4 server and the 3x 5.3 servers still show heal info as all 4 connected, and the files seem to be replicating correctly as well.
> >>> >> > > >
> >>> >> > > > So what's actually affected - just the status command, or is leaving 5.4 on one of the nodes doing some damage to the underlying fs? Is it fixable by tweaking transport.socket.ssl-enabled? Does upgrading all servers to 5.4 resolve it, or should we revert to 5.3?
> >>> >> > > >
> >>> >> > > > Sincerely,
> >>> >> > > > Artem
> >>> >> > > >
> >>> >> > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert <revirii at googlemail.com> wrote:
> >>> >> > > >
> >>> >> > > > fyi: did a downgrade 5.4 -> 5.3 and it worked. All replicas are up and running. Awaiting updated v5.4.
> >>> >> > > >
> >>> >> > > > thx :-)
> >>> >> > > >
> >>> >> > > > On Tue, Mar 5, 2019 at 09:26, Hari Gowtham <hgowtham at redhat.com> wrote:
> >>> >> > > > >
> >>> >> > > > > There are plans to revert the patch causing this error and rebuild 5.4.
> >>> >> > > > > This should happen faster. The rebuilt 5.4 should be free of this upgrade issue.
> >>> >> > > > >
> >>> >> > > > > In the meantime, you can use 5.3 for this cluster.
> >>> >> > > > > Downgrading to 5.3 will work if it was just one node that was upgraded to 5.4
> >>> >> > > > > and the other nodes are still on 5.3

--
Amar Tumballi (amarts)