Krutika Dhananjay
2017-May-29 13:27 UTC
[Gluster-users] Rebalance + VM corruption - current status and request for feedback
Thanks for that update. Very happy to hear it ran fine without any issues. :)

Yeah so you can ignore those 'No such file or directory' errors. They represent a transient state where DHT in the client process is yet to figure out the new location of the file.

-Krutika

On Mon, May 29, 2017 at 6:51 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

> Hello,
>
> Yes, I forgot to upgrade the client as well.
>
> I did the upgrade and created a new volume, same options as before, with one VM running and doing lots of IOs. I started the rebalance with force and after it completed the process I rebooted the VM, and it did start normally without issues.
>
> I repeated the process and did another rebalance while the VM was running and everything went fine.
>
> But the logs in the client are throwing lots of warning messages:
>
> [2017-05-29 13:14:59.416382] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2: remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
> [2017-05-29 13:14:59.416427] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3: remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
> [2017-05-29 13:14:59.808251] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2: remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
> [2017-05-29 13:14:59.808287] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3: remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>
> Although the process went smoothly, I will run another extensive test tomorrow just to be sure.
>
> --
> Respectfully
> *Mahdi A. Mahdi*
>
> ------------------------------
> *From:* Krutika Dhananjay <kdhananj at redhat.com>
> *Sent:* Monday, May 29, 2017 9:20:29 AM
> *To:* Mahdi Adnan
> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier
> *Subject:* Re: Rebalance + VM corruption - current status and request for feedback
>
> Hi,
>
> I took a look at your logs.
> It very much seems like an issue that is caused by a mismatch in glusterfs client and server packages.
> So your client (mount) seems to be still running 3.7.20, as confirmed by the occurrence of the following log message:
>
> [2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2 --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol /rhev/data-center/mnt/glusterSD/s1:_testvol)
> [2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2 --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol /rhev/data-center/mnt/glusterSD/s1:_testvol)
> [2017-05-26 08:58:48.923452] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2 --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol /rhev/data-center/mnt/glusterSD/s1:_testvol)
>
> whereas the servers have rightly been upgraded to 3.10.2, as seen in rebalance log:
>
> [2017-05-26 09:36:36.075940] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.2 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/testvol --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=5 --xlator-option *dht.node-uuid=7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b --xlator-option *dht.commit-hash=3376396580 --socket-file /var/run/gluster/gluster-rebalance-801faefa-a583-46b4-8eef-e0ec160da9ea.sock --pid-file /var/lib/glusterd/vols/testvol/rebalance/7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b.pid -l /var/log/glusterfs/testvol-rebalance.log)
>
> Could you upgrade all packages to 3.10.2 and try again?
>
> -Krutika
>
> On Fri, May 26, 2017 at 4:46 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
>> Hi,
>>
>> Attached are the logs for both the rebalance and the mount.
>>
>> --
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> ------------------------------
>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>> *Sent:* Friday, May 26, 2017 1:12:28 PM
>> *To:* Mahdi Adnan
>> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier
>> *Subject:* Re: Rebalance + VM corruption - current status and request for feedback
>>
>> Could you provide the rebalance and mount logs?
>>
>> -Krutika
>>
>> On Fri, May 26, 2017 at 3:17 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>>> Good morning,
>>>
>>> So I have tested the new Gluster 3.10.2, and after starting rebalance two VMs were paused due to a storage error and a third one was not responding.
>>>
>>> After the rebalance completed I started the VMs and they did not boot, and threw an XFS wrong inode error on the screen.
>>>
>>> My setup:
>>>
>>> 4 nodes running CentOS 7.3 with Gluster 3.10.2
>>>
>>> 4 bricks in distributed replica with group set to virt.
>>>
>>> I added the volume to oVirt and created three VMs, and ran a loop to create a 5GB file inside the VMs.
>>>
>>> Added 4 new bricks to the existing nodes.
>>>
>>> Started rebalance "with force to bypass the warning message"
>>>
>>> VMs started to fail after rebalancing.
>>>
>>> --
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> ------------------------------
>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>> *Sent:* Wednesday, May 17, 2017 6:59:20 AM
>>> *To:* gluster-user
>>> *Cc:* Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier; Mahdi Adnan
>>> *Subject:* Rebalance + VM corruption - current status and request for feedback
>>>
>>> Hi,
>>>
>>> In the past couple of weeks, we've sent the following fixes concerning VM corruption upon doing rebalance - https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:bug-1440051
>>>
>>> These fixes are very much part of the latest 3.10.2 release.
>>>
>>> Satheesaran within Red Hat also verified that they work and he's not seeing corruption issues anymore.
>>>
>>> I'd like to hear feedback from the users themselves on these fixes (on your test environments to begin with) before even changing the status of the bug to CLOSED.
>>>
>>> Although 3.10.2 has a patch that prevents rebalance sub-commands from being executed on sharded volumes, you can override the check by using the 'force' option.
>>>
>>> For example,
>>>
>>> # gluster volume rebalance myvol start force
>>>
>>> Very much looking forward to hearing from you all.
>>>
>>> Thanks,
>>> Krutika
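The client/server mismatch diagnosed in this thread was spotted by comparing the "Started running ... version" banner in the mount log with the one in the rebalance log. As a rough illustration only, that comparison could be scripted along the following lines. This is a sketch, not a Gluster tool: the function names are hypothetical, and the regular expression assumes the banner looks like the log lines quoted in this thread.

```python
import re
from typing import Optional

# Assumed banner format, taken from the log lines quoted in this thread:
# "... Started running /usr/sbin/glusterfs version 3.7.20 (args: ..."
VERSION_RE = re.compile(r"Started running\s+\S+\s+version\s+(\d+(?:\.\d+)+)")


def last_started_version(log_text: str) -> Optional[str]:
    """Return the version from the most recent startup banner, or None."""
    matches = VERSION_RE.findall(log_text)
    return matches[-1] if matches else None


def versions_match(client_log: str, server_log: str) -> bool:
    """True only if both logs contain a banner and the versions are equal."""
    client = last_started_version(client_log)
    server = last_started_version(server_log)
    return client is not None and client == server


if __name__ == "__main__":
    # Shortened copies of the banners quoted above (args truncated here).
    mount_log = (
        "[2017-05-26 08:58:23.647458] I [MSGID: 100030] "
        "[glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running "
        "/usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs)"
    )
    rebalance_log = (
        "[2017-05-26 09:36:36.075940] I [MSGID: 100030] "
        "[glusterfsd.c:2475:main] 0-/usr/sbin/glusterfs: Started running "
        "/usr/sbin/glusterfs version 3.10.2 (args: /usr/sbin/glusterfs)"
    )
    print("client:", last_started_version(mount_log))
    print("server:", last_started_version(rebalance_log))
    print("match:", versions_match(mount_log, rebalance_log))
```

Taking the last banner rather than the first is a deliberate choice in this sketch: a log file may span several restarts, and only the most recent start reflects the package version currently running.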
Gandalf Corvotempesta
2017-Jun-04 12:00 UTC
[Gluster-users] Rebalance + VM corruption - current status and request for feedback
Great news. Is this planned to be published in the next release?

On 29 May 2017 at 3:27 PM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:

> Thanks for that update. Very happy to hear it ran fine without any issues. :)
>
> Yeah so you can ignore those 'No such file or directory' errors. They represent a transient state where DHT in the client process is yet to figure out the new location of the file.
>
> -Krutika