Mahdi Adnan
2017-May-26  11:16 UTC
[Gluster-users] Rebalance + VM corruption - current status and request for feedback
Hi,

Attached are the logs for both the rebalance and the mount.

--
Respectfully
Mahdi A. Mahdi

________________________________
From: Krutika Dhananjay <kdhananj at redhat.com>
Sent: Friday, May 26, 2017 1:12:28 PM
To: Mahdi Adnan
Cc: gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier
Subject: Re: Rebalance + VM corruption - current status and request for feedback

Could you provide the rebalance and mount logs?

-Krutika

On Fri, May 26, 2017 at 3:17 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

Good morning,

So I have tested the new Gluster 3.10.2, and after starting the rebalance two VMs were paused due to a storage error and a third one was not responding.

After the rebalance completed I started the VMs, but they did not boot and threw an XFS wrong inode error on the screen.

My setup:

4 nodes running CentOS 7.3 with Gluster 3.10.2
4 bricks in distributed replica with group set to virt.

I added the volume to oVirt and created three VMs, then ran a loop to create a 5GB file inside the VMs.
Added 4 new bricks to the existing nodes.
Started the rebalance with "force" to bypass the warning message.
VMs started to fail after rebalancing.

--
Respectfully
Mahdi A. Mahdi

________________________________
From: Krutika Dhananjay <kdhananj at redhat.com>
Sent: Wednesday, May 17, 2017 6:59:20 AM
To: gluster-user
Cc: Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier; Mahdi Adnan
Subject: Rebalance + VM corruption - current status and request for feedback

Hi,

In the past couple of weeks, we've sent the following fixes concerning VM corruption upon doing rebalance - https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:bug-1440051

These fixes are very much part of the latest 3.10.2 release.

Satheesaran within Red Hat also verified that they work and he's not seeing corruption issues anymore.

I'd like to hear feedback from the users themselves on these fixes (on your test environments to begin with) before even changing the status of the bug to CLOSED.

Although 3.10.2 has a patch that prevents rebalance sub-commands from being executed on sharded volumes, you can override the check by using the 'force' option.

For example,

    # gluster volume rebalance myvol start force

Very much looking forward to hearing from you all.

Thanks,
Krutika

-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs.tar.gz
Type: application/gzip
Size: 83023 bytes
Desc: logs.tar.gz
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20170526/47ed5606/attachment.bin>
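The write loop used inside the VMs is mentioned in the report above but not shown in the thread. For anyone repeating the test, here is a minimal Python sketch of one way to generate the load and check the data afterwards. It is not the script Mahdi used: the function names `write_test_file` and `file_digest`, the 1 MiB chunk size, and the idea of comparing SHA-256 digests before and after the rebalance are all assumptions made for illustration.

```python
import hashlib
import os

CHUNK = 1024 * 1024  # 1 MiB per write (arbitrary choice for this sketch)


def write_test_file(path, total_bytes, chunk=CHUNK):
    """Write total_bytes of random data to path in a loop.

    Returns the SHA-256 hex digest of the data that was written, so it
    can be compared with a digest of the file read back later.
    """
    digest = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))
            f.write(block)
            digest.update(block)
            remaining -= len(block)
        # Push the data out of the page cache to the underlying storage.
        f.flush()
        os.fsync(f.fileno())
    return digest.hexdigest()


def file_digest(path, chunk=CHUNK):
    """Read path back in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()
```

Inside a test VM one could call `write_test_file("test.bin", 5 * 1024**3)` before starting the rebalance, keep the returned digest, and compare it with `file_digest("test.bin")` once the rebalance has finished. A mismatch would point to corruption of the file contents; a VM that pauses or fails to boot, as in the report above, would of course show up before this check gets a chance to run.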
Krutika Dhananjay
2017-May-29  06:20 UTC
[Gluster-users] Rebalance + VM corruption - current status and request for feedback
Hi,

I took a look at your logs.
It very much seems like an issue that is caused by a mismatch in glusterfs client and server packages.
So your client (mount) seems to be still running 3.7.20, as confirmed by the occurrence of the following log message:

[2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2 --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol /rhev/data-center/mnt/glusterSD/s1:_testvol)
[2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2 --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol /rhev/data-center/mnt/glusterSD/s1:_testvol)
[2017-05-26 08:58:48.923452] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2 --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol /rhev/data-center/mnt/glusterSD/s1:_testvol)

whereas the servers have rightly been upgraded to 3.10.2, as seen in rebalance log:

[2017-05-26 09:36:36.075940] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.2 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/testvol --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=5 --xlator-option *dht.node-uuid=7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b --xlator-option *dht.commit-hash=3376396580 --socket-file /var/run/gluster/gluster-rebalance-801faefa-a583-46b4-8eef-e0ec160da9ea.sock --pid-file /var/lib/glusterd/vols/testvol/rebalance/7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b.pid -l /var/log/glusterfs/testvol-rebalance.log)

Could you upgrade all packages to 3.10.2 and try again?

-Krutika