Gandalf Corvotempesta
2017-Jun-04 12:00 UTC
[Gluster-users] Rebalance + VM corruption - current status and request for feedback
Great news. Is this planned to be published in the next release?

On 29 May 2017 at 3:27 PM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:

> Thanks for that update. Very happy to hear it ran fine without any issues. :)
>
> Yes, you can ignore those 'No such file or directory' errors. They represent a transient state where DHT in the client process has yet to figure out the new location of the file.
>
> -Krutika
>
> On Mon, May 29, 2017 at 6:51 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
>> Hello,
>>
>> Yes, I forgot to upgrade the client as well.
>>
>> I did the upgrade and created a new volume with the same options as before, with one VM running and doing lots of I/O. I started the rebalance with force, and after the process completed I rebooted the VM; it started normally without issues.
>>
>> I repeated the process and did another rebalance while the VM was running, and everything went fine.
>>
>> But the client logs are throwing lots of warning messages:
>>
>> [2017-05-29 13:14:59.416382] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2: remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>> [2017-05-29 13:14:59.416427] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3: remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>> [2017-05-29 13:14:59.808251] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2: remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>> [2017-05-29 13:14:59.808287] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3: remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>>
>> Although the process went smoothly, I will run another extensive test tomorrow just to be sure.
>>
>> --
>> Respectfully
>> Mahdi A. Mahdi
>>
>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> Sent: Monday, May 29, 2017 9:20:29 AM
>> To: Mahdi Adnan
>> Cc: gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier
>> Subject: Re: Rebalance + VM corruption - current status and request for feedback
>>
>> Hi,
>>
>> I took a look at your logs.
>> It very much seems like an issue caused by a mismatch between the glusterfs client and server packages.
>> Your client (mount) seems to be still running 3.7.20, as confirmed by the following log messages:
>>
>> [2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2 --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol /rhev/data-center/mnt/glusterSD/s1:_testvol)
>> [2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2 --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol /rhev/data-center/mnt/glusterSD/s1:_testvol)
>> [2017-05-26 08:58:48.923452] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20 (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2 --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol /rhev/data-center/mnt/glusterSD/s1:_testvol)
>>
>> whereas the servers have rightly been upgraded to 3.10.2, as seen in the rebalance log:
>>
>> [2017-05-26 09:36:36.075940] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.2 (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/testvol --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *dht.rebalance-cmd=5 --xlator-option *dht.node-uuid=7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b --xlator-option *dht.commit-hash=3376396580 --socket-file /var/run/gluster/gluster-rebalance-801faefa-a583-46b4-8eef-e0ec160da9ea.sock --pid-file /var/lib/glusterd/vols/testvol/rebalance/7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b.pid -l /var/log/glusterfs/testvol-rebalance.log)
>>
>> Could you upgrade all packages to 3.10.2 and try again?
>>
>> -Krutika
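[Editor's note: Krutika's diagnosis above comes down to comparing the version reported by the FUSE client with the version running on the servers, and then upgrading the client. A minimal sketch of how one might check and fix this on the CentOS 7 hosts described in the thread; the yum package glob and the manual remount are assumptions, and on an oVirt-managed mount the storage domain would normally be reactivated rather than remounted by hand:]

    # on the client host doing the FUSE mount
    glusterfs --version            # prints the client version, e.g. 3.7.20
    rpm -qa 'glusterfs*'           # lists every installed glusterfs package

    # on each server node
    gluster --version              # prints the server-side version, e.g. 3.10.2

    # bring the client packages up to the same release, then remount so the
    # new client binary is actually used for the volume
    yum update 'glusterfs*'
    umount /rhev/data-center/mnt/glusterSD/s1:_testvol
    mount -t glusterfs s1:/testvol /rhev/data-center/mnt/glusterSD/s1:_testvol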
>> On Fri, May 26, 2017 at 4:46 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>>> Hi,
>>>
>>> Attached are the logs for both the rebalance and the mount.
>>>
>>> --
>>> Respectfully
>>> Mahdi A. Mahdi
>>>
>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>> Sent: Friday, May 26, 2017 1:12:28 PM
>>> To: Mahdi Adnan
>>> Cc: gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier
>>> Subject: Re: Rebalance + VM corruption - current status and request for feedback
>>>
>>> Could you provide the rebalance and mount logs?
>>>
>>> -Krutika
>>>
>>> On Fri, May 26, 2017 at 3:17 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>
>>>> Good morning,
>>>>
>>>> I have tested the new Gluster 3.10.2, and after starting the rebalance two VMs were paused due to a storage error and a third one stopped responding.
>>>> After the rebalance completed I started the VMs; they did not boot and threw an XFS wrong-inode error on the screen.
>>>>
>>>> My setup:
>>>>
>>>> 4 nodes running CentOS 7.3 with Gluster 3.10.2
>>>> 4 bricks in a distributed replica with the 'virt' option group applied
>>>>
>>>> I added the volume to oVirt and created three VMs, then ran a loop creating a 5GB file inside each VM.
>>>> I added 4 new bricks to the existing nodes.
>>>> I started the rebalance with force, to bypass the warning message.
>>>> The VMs started to fail after rebalancing.
>>>>
>>>> --
>>>> Respectfully
>>>> Mahdi A. Mahdi
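[Editor's note: for readers who want to reproduce Mahdi's steps above, here is a rough sketch of the gluster CLI sequence they correspond to. The volume name (myvol), host names, brick paths and the dd loop are hypothetical placeholders, and replica 2 is assumed for the 4-brick/4-node layout since the thread does not state the replica count:]

    # apply the virt option group (tuning profile for VM image workloads)
    gluster volume set myvol group virt

    # inside each test VM: generate I/O by writing a 5GB file in a loop
    while true; do dd if=/dev/urandom of=/root/bigfile bs=1M count=5120; done

    # grow the distributed-replicate volume by four bricks, one per node;
    # bricks are paired into new replica sets in the order they are listed
    gluster volume add-brick myvol node1:/bricks/b2/myvol node2:/bricks/b2/myvol node3:/bricks/b2/myvol node4:/bricks/b2/myvol

    # 3.10.2 blocks plain rebalance on sharded volumes; 'force' bypasses the warning
    gluster volume rebalance myvol start force

    # follow progress until every node reports 'completed'
    gluster volume rebalance myvol status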
>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>> Sent: Wednesday, May 17, 2017 6:59:20 AM
>>>> To: gluster-user
>>>> Cc: Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier; Mahdi Adnan
>>>> Subject: Rebalance + VM corruption - current status and request for feedback
>>>>
>>>> Hi,
>>>>
>>>> In the past couple of weeks, we've sent the following fixes concerning VM corruption upon doing rebalance - https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:bug-1440051
>>>>
>>>> These fixes are very much part of the latest 3.10.2 release.
>>>>
>>>> Satheesaran within Red Hat also verified that they work, and he's not seeing corruption issues anymore.
>>>>
>>>> I'd like to hear feedback from the users themselves on these fixes (on your test environments to begin with) before even changing the status of the bug to CLOSED.
>>>>
>>>> Although 3.10.2 has a patch that prevents rebalance sub-commands from being executed on sharded volumes, you can override the check by using the 'force' option.
>>>>
>>>> For example:
>>>>
>>>> # gluster volume rebalance myvol start force
>>>>
>>>> Very much looking forward to hearing from you all.
>>>>
>>>> Thanks,
>>>> Krutika
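[Editor's note: as a small follow-up to the force example in the announcement above, a hedged sketch of how one might confirm that the forced rebalance has finished and that the replicas are healthy before restarting VMs; 'myvol' is the same placeholder volume name:]

    # per-node rebalance progress; wait until every node reports 'completed'
    gluster volume rebalance myvol status

    # make sure no files are still pending self-heal on the replica sets
    gluster volume heal myvol info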
Krutika Dhananjay
2017-Jun-05 04:49 UTC
[Gluster-users] Rebalance + VM corruption - current status and request for feedback
The fixes are already available in 3.10.2, 3.8.12 and 3.11.0.

-Krutika

On Sun, Jun 4, 2017 at 5:30 PM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:

> Great news.
> Is this planned to be published in the next release?
Gandalf Corvotempesta
2017-Jun-05 11:36 UTC
[Gluster-users] Rebalance + VM corruption - current status and request for feedback
Great, thanks!

On 5 Jun 2017 at 6:49 AM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:

> The fixes are already available in 3.10.2, 3.8.12 and 3.11.0.
>
> -Krutika