thr3ads.net - Gluster users - [Gluster-users] Rebalance + VM corruption - current status and request for feedback [May 2017]

Krutika Dhananjay

2017-May-29 06:20 UTC

[Gluster-users] Rebalance + VM corruption - current status and request for feedback

Hi,

I took a look at your logs.
This very much looks like an issue caused by a mismatch between the
glusterfs client and server packages.
Your client (mount) still seems to be running 3.7.20, as confirmed by
the following log messages:

[2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
(args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
--volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
/rhev/data-center/mnt/glusterSD/s1:_testvol)
[2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterfsd.c:2338:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
(args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
--volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
/rhev/data-center/mnt/glusterSD/s1:_testvol)
[2017-05-26 08:58:48.923452] I [MSGID: 100030] [glusterfsd.c:2338:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
(args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
--volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
/rhev/data-center/mnt/glusterSD/s1:_testvol)

whereas the servers have correctly been upgraded to 3.10.2, as seen in
the rebalance log:

[2017-05-26 09:36:36.075940] I [MSGID: 100030] [glusterfsd.c:2475:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.2
(args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/testvol
--xlator-option *dht.use-readdirp=yes --xlator-option
*dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes
--xlator-option *replicate*.data-self-heal=off --xlator-option
*replicate*.metadata-self-heal=off --xlator-option
*replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on
--xlator-option *dht.rebalance-cmd=5 --xlator-option
*dht.node-uuid=7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b --xlator-option
*dht.commit-hash=3376396580 --socket-file
/var/run/gluster/gluster-rebalance-801faefa-a583-46b4-8eef-e0ec160da9ea.sock
--pid-file
/var/lib/glusterd/vols/testvol/rebalance/7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b.pid
-l /var/log/glusterfs/testvol-rebalance.log)


Could you upgrade all packages to 3.10.2 and try again?

-Krutika
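
For anyone wanting to repeat this check on their own logs, here is a minimal sketch of the comparison described above. It assumes only the "Started running ... version X" startup banner (MSGID 100030) shown in the excerpts. The function names `startup_versions` and `versions_match` are hypothetical helpers, not part of any Gluster tooling.

```python
import re

# Startup banner as it appears in the excerpts above:
#   "Started running /usr/sbin/glusterfs version 3.7.20 (args: ...)"
STARTUP_RE = re.compile(r"Started running \S+ version (\d+(?:\.\d+)+)")


def startup_versions(log_text):
    """Return the set of glusterfs versions announced in startup banners."""
    # Long log lines may be wrapped (as in this archive), so collapse all
    # whitespace runs into single spaces before matching.
    flat = " ".join(log_text.split())
    return set(STARTUP_RE.findall(flat))


def versions_match(mount_log, rebalance_log):
    """True only if both logs announce exactly one version and it is the same."""
    client = startup_versions(mount_log)
    server = startup_versions(rebalance_log)
    return len(client) == 1 and client == server
```

Fed the two excerpts above, `startup_versions` yields {'3.7.20'} for the mount log and {'3.10.2'} for the rebalance log, so `versions_match` returns False. Comparing installed package versions on each host is the more direct check; this only reads what the running processes logged at startup.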


On Fri, May 26, 2017 at 4:46 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
wrote:
> Hi,
>
>
> Attached are the logs for both the rebalance and the mount.
>
>
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> ------------------------------
> *From:* Krutika Dhananjay <kdhananj at redhat.com>
> *Sent:* Friday, May 26, 2017 1:12:28 PM
> *To:* Mahdi Adnan
> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin
> Lemonnier
> *Subject:* Re: Rebalance + VM corruption - current status and request for
> feedback
>
> Could you provide the rebalance and mount logs?
>
> -Krutika
>
> On Fri, May 26, 2017 at 3:17 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
> wrote:
>
>> Good morning,
>>
>>
>> I have tested the new Gluster 3.10.2. After starting the rebalance,
>> two VMs were paused due to a storage error and a third one was not
>> responding.
>>
>> After the rebalance completed I started the VMs, but they did not boot
>> and threw an XFS wrong inode error on the screen.
>>
>>
>> My setup:
>>
>> 4 nodes running CentOS7.3 with Gluster 3.10.2
>>
>> 4 bricks in distributed replica with group set to virt.
>>
>> I added the volume to oVirt and created three VMs, then ran a loop to
>> create a 5GB file inside the VMs.
>>
>> Added 4 new bricks to the existing nodes.
>>
>> Started the rebalance with force, to bypass the warning message.
>>
>> VMs started to fail after rebalancing.
>>
>>
>>
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> ------------------------------
>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>> *Sent:* Wednesday, May 17, 2017 6:59:20 AM
>> *To:* gluster-user
>> *Cc:* Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier; Mahdi
>> Adnan
>> *Subject:* Rebalance + VM corruption - current status and request for
>> feedback
>>
>> Hi,
>>
>> In the past couple of weeks, we've sent the following fixes concerning
>> VM corruption upon doing rebalance -
>> https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:bug-1440051
>>
>> These fixes are very much part of the latest 3.10.2 release.
>>
>> Satheesaran within Red Hat also verified that they work and he's not
>> seeing corruption issues anymore.
>>
>> I'd like to hear feedback from the users themselves on these fixes (on
>> your test environments to begin with) before even changing the status
>> of the bug to CLOSED.
>>
>> Although 3.10.2 has a patch that prevents rebalance sub-commands from
>> being executed on sharded volumes, you can override the check by using
>> the 'force' option.
>>
>> For example,
>>
>> # gluster volume rebalance myvol start force
>>
>> Very much looking forward to hearing from you all.
>>
>> Thanks,
>> Krutika
>>
>

Mahdi Adnan

2017-May-29 13:21 UTC

Hello,


Yes, I forgot to upgrade the client as well.

I did the upgrade and created a new volume with the same options as before, with
one VM running and doing lots of I/O. I started the rebalance with force, and
after it completed I rebooted the VM; it started normally without issues.

I repeated the process and did another rebalance while the VM was running, and
everything went fine.

But the client logs are throwing lots of warning messages:


[2017-05-29 13:14:59.416382] W [MSGID: 114031]
[client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2: remote
operation failed. Path:
/50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
(93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
[2017-05-29 13:14:59.416427] W [MSGID: 114031]
[client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3: remote
operation failed. Path:
/50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
(93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
[2017-05-29 13:14:59.808251] W [MSGID: 114031]
[client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2: remote
operation failed. Path:
/50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
(93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
[2017-05-29 13:14:59.808287] W [MSGID: 114031]
[client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3: remote
operation failed. Path:
/50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
(93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]


Although the process went smoothly, I will run another extensive test tomorrow
just to be sure.

--

Respectfully
Mahdi A. Mahdi


Krutika Dhananjay

2017-May-29 13:27 UTC

Thanks for that update. Very happy to hear it ran fine without any issues.
:)

Yeah so you can ignore those 'No such file or directory' errors. They
represent a transient state where DHT in the client process is yet to
figure out the new location of the file.

-Krutika
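
As a rough triage aid for warnings like the ones reported above, the sketch below counts MSGID 114031 lookup failures per client translator and error string, so it is easy to see whether anything other than 'No such file or directory' shows up. It assumes the raw client log uses the same layout as the four warnings quoted in this thread. `summarize_warnings` and `only_missing_file` are hypothetical helper names, not Gluster utilities.

```python
import re
from collections import Counter

# Layout taken from the warnings quoted in this thread:
#   W [MSGID: 114031] [<source location>] <client>: remote operation failed.
#   Path: <path> (<gfid>) [<error string>]
WARN_RE = re.compile(
    r"W \[MSGID: 114031\] \[[^\]]*\] (?P<client>\S+): "
    r"remote operation failed\. Path: (?P<path>\S+) "
    r"\((?P<gfid>[0-9a-f-]+)\) \[(?P<error>[^\]]+)\]"
)


def summarize_warnings(log_text):
    """Count lookup-failure warnings per (client, error string)."""
    # Collapse whitespace so entries wrapped over several lines still match.
    flat = " ".join(log_text.split())
    return Counter((m["client"], m["error"]) for m in WARN_RE.finditer(flat))


def only_missing_file(summary):
    """True if at least one warning was seen and all were 'No such file or directory'."""
    return bool(summary) and all(
        error == "No such file or directory" for _, error in summary
    )
```

On the four warnings quoted earlier this gives a count of 2 each for 2-gfs_vol2-client-2 and 2-gfs_vol2-client-3, all with 'No such file or directory'. That matches the transient state described above, but it is only a pattern check on the log, not proof that the data is intact.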



Krutika Dhananjay

2017-Jun-06 06:17 UTC

Hi Mahdi,

Did you get a chance to verify this fix again?
If this fix works for you, is it OK if we move this bug to CLOSED state and
revert the rebalance-cli warning patch?

-Krutika

