Gandalf Corvotempesta
2017-Jun-04  12:00 UTC
[Gluster-users] Rebalance + VM corruption - current status and request for feedback
Great news.
Is this planned to be published in the next release?

On 29 May 2017 at 3:27 PM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:
> Thanks for that update. Very happy to hear it ran fine without any issues. :)
>
> Yeah, so you can ignore those 'No such file or directory' errors. They
> represent a transient state where DHT in the client process is yet to
> figure out the new location of the file.
>
> -Krutika
>
> On Mon, May 29, 2017 at 6:51 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
>> Hello,
>>
>> Yes, I forgot to upgrade the client as well.
>>
>> I did the upgrade and created a new volume with the same options as before,
>> with one VM running and doing lots of I/O. I started the rebalance with
>> force, and after it completed I rebooted the VM, and it started normally
>> without issues.
>>
>> I repeated the process and did another rebalance while the VM was running,
>> and everything went fine.
>>
>> But the client logs are throwing lots of warning messages:
>>
>> [2017-05-29 13:14:59.416382] W [MSGID: 114031]
>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2:
>> remote operation failed. Path:
>> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>> [2017-05-29 13:14:59.416427] W [MSGID: 114031]
>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3:
>> remote operation failed. Path:
>> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>> [2017-05-29 13:14:59.808251] W [MSGID: 114031]
>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2:
>> remote operation failed. Path:
>> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>> [2017-05-29 13:14:59.808287] W [MSGID: 114031]
>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3:
>> remote operation failed. Path:
>> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>>
>> Although the process went smoothly, I will run another extensive test
>> tomorrow just to be sure.
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> ------------------------------
>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>> *Sent:* Monday, May 29, 2017 9:20:29 AM
>> *To:* Mahdi Adnan
>> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier
>> *Subject:* Re: Rebalance + VM corruption - current status and request for feedback
>>
>> Hi,
>>
>> I took a look at your logs.
>> It very much seems like an issue that is caused by a mismatch in
>> glusterfs client and server packages.
>> So your client (mount) seems to be still running 3.7.20, as confirmed by
>> the occurrence of the following log message:
>>
>> [2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
>> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
>> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
>> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>> [2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterfsd.c:2338:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
>> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
>> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
>> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>> [2017-05-26 08:58:48.923452] I [MSGID: 100030] [glusterfsd.c:2338:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
>> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
>> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
>> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>>
>> whereas the servers have rightly been upgraded to 3.10.2, as seen in
>> rebalance log:
>>
>> [2017-05-26 09:36:36.075940] I [MSGID: 100030] [glusterfsd.c:2475:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.2
>> (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/testvol
>> --xlator-option *dht.use-readdirp=yes --xlator-option
>> *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes
>> --xlator-option *replicate*.data-self-heal=off --xlator-option
>> *replicate*.metadata-self-heal=off --xlator-option
>> *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on
>> --xlator-option *dht.rebalance-cmd=5 --xlator-option
>> *dht.node-uuid=7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b --xlator-option
>> *dht.commit-hash=3376396580 --socket-file
>> /var/run/gluster/gluster-rebalance-801faefa-a583-46b4-8eef-e0ec160da9ea.sock
>> --pid-file /var/lib/glusterd/vols/testvol/rebalance/7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b.pid
>> -l /var/log/glusterfs/testvol-rebalance.log)
>>
>> Could you upgrade all packages to 3.10.2 and try again?
>>
>> -Krutika
>>
>> On Fri, May 26, 2017 at 4:46 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>>> Hi,
>>>
>>> Attached are the logs for both the rebalance and the mount.
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> ------------------------------
>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>> *Sent:* Friday, May 26, 2017 1:12:28 PM
>>> *To:* Mahdi Adnan
>>> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier
>>> *Subject:* Re: Rebalance + VM corruption - current status and request for feedback
>>>
>>> Could you provide the rebalance and mount logs?
>>>
>>> -Krutika
>>>
>>> On Fri, May 26, 2017 at 3:17 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>
>>>> Good morning,
>>>>
>>>> So I have tested the new Gluster 3.10.2, and after starting the rebalance
>>>> two VMs were paused due to a storage error and a third one was not responding.
>>>>
>>>> After the rebalance completed I started the VMs and they did not boot, and
>>>> threw an XFS wrong inode error on the screen.
>>>>
>>>> My setup:
>>>>
>>>> 4 nodes running CentOS 7.3 with Gluster 3.10.2
>>>>
>>>> 4 bricks in distributed replica with group set to virt.
>>>>
>>>> I added the volume to oVirt and created three VMs, and ran a loop to
>>>> create a 5GB file inside the VMs.
>>>>
>>>> Added 4 new bricks to the existing nodes.
>>>>
>>>> Started the rebalance with force, to bypass the warning message.
>>>>
>>>> VMs started to fail after rebalancing.
>>>>
>>>> --
>>>>
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> ------------------------------
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Wednesday, May 17, 2017 6:59:20 AM
>>>> *To:* gluster-user
>>>> *Cc:* Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier; Mahdi Adnan
>>>> *Subject:* Rebalance + VM corruption - current status and request for feedback
>>>>
>>>> Hi,
>>>>
>>>> In the past couple of weeks, we've sent the following fixes concerning
>>>> VM corruption upon doing rebalance -
>>>> https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:bug-1440051
>>>>
>>>> These fixes are very much part of the latest 3.10.2 release.
>>>>
>>>> Satheesaran within Red Hat also verified that they work and he's not
>>>> seeing corruption issues anymore.
>>>>
>>>> I'd like to hear feedback from the users themselves on these fixes (on
>>>> your test environments to begin with) before even changing the status of
>>>> the bug to CLOSED.
>>>>
>>>> Although 3.10.2 has a patch that prevents rebalance sub-commands from
>>>> being executed on sharded volumes, you can override the check by using the
>>>> 'force' option.
>>>>
>>>> For example,
>>>>
>>>> # gluster volume rebalance myvol start force
>>>>
>>>> Very much looking forward to hearing from you all.
>>>>
>>>> Thanks,
>>>> Krutika
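The diagnosis quoted above rests on the "Started running ... version X" startup banner that glusterfs writes to the mount log and to the rebalance log. As a rough sketch of how that comparison could be automated, the snippet below pulls the version out of such banners and flags a mismatch. The function names and the regular expression are illustrative assumptions, not part of Gluster, and the sample lines are condensed from the excerpts in this thread (the argument list is abbreviated, and each log entry is assumed to sit on a single line in the real file).

```python
import re

# Assumed pattern for the startup banner (MSGID 100030 in the quoted logs).
STARTUP_RE = re.compile(r"Started running \S+ version (\d+(?:\.\d+)+)")


def versions_in_log(lines):
    """Return the set of glusterfs versions announced by startup banners."""
    found = set()
    for line in lines:
        match = STARTUP_RE.search(line)
        if match:
            found.add(match.group(1))
    return found


def versions_mismatch(client_lines, server_lines):
    """True when the two logs do not announce the same set of versions."""
    return versions_in_log(client_lines) != versions_in_log(server_lines)


# Sample entries condensed from this thread; "(args: ...)" is abbreviated here.
mount_log = [
    "[2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main] "
    "0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: ...)",
]
rebalance_log = [
    "[2017-05-26 09:36:36.075940] I [MSGID: 100030] [glusterfsd.c:2475:main] "
    "0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.2 (args: ...)",
]

if versions_mismatch(mount_log, rebalance_log):
    print("client:", sorted(versions_in_log(mount_log)))
    print("server:", sorted(versions_in_log(rebalance_log)))
```

In practice the two lists would come from reading the mount log on the client and the rebalance log on a server; a non-empty difference is a hint to upgrade the lagging side before testing rebalance again.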
Krutika Dhananjay
2017-Jun-05  04:49 UTC
[Gluster-users] Rebalance + VM corruption - current status and request for feedback
The fixes are already available in 3.10.2, 3.8.12 and 3.11.0

-Krutika

On Sun, Jun 4, 2017 at 5:30 PM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:
> Great news.
> Is this planned to be published in next release?
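For anyone scripting a pre-flight check before a rebalance, the release list in the reply above can be turned into a small lookup. This is only a sketch: the table holds exactly the three releases named in this thread, the helper names are made up for illustration, and it assumes that later patch releases within a listed series keep the fixes. A result of False for a series not named here (for example 3.7.x) means "not confirmed by this thread", not "known to be broken".

```python
# Releases named in this thread as carrying the rebalance fixes,
# keyed by (major, minor) series.
FIXED_IN = {
    (3, 8): (3, 8, 12),
    (3, 10): (3, 10, 2),
    (3, 11): (3, 11, 0),
}


def parse_version(text):
    """Turn a dotted version string such as '3.10.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_rebalance_fixes(version_text):
    """True if the version is at or past the fixed release of its series.

    Assumption: later patch releases in a listed series retain the fixes.
    Series that the thread does not mention return False (unconfirmed).
    """
    version = parse_version(version_text)
    minimum = FIXED_IN.get(version[:2])
    return minimum is not None and version >= minimum


for candidate in ("3.7.20", "3.8.11", "3.8.12", "3.10.2"):
    print(candidate, has_rebalance_fixes(candidate))
```

The check would need to pass for both the client and the server packages, since the failure reported earlier in this thread came from a client left on 3.7.20.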
Gandalf Corvotempesta
2017-Jun-05  11:36 UTC
[Gluster-users] Rebalance + VM corruption - current status and request for feedback
Great, thanks!

On 5 Jun 2017 at 6:49 AM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:
> The fixes are already available in 3.10.2, 3.8.12 and 3.11.0
>
> -Krutika