thr3ads.net - Gluster users - [Gluster-users] Gluster 3.8.10 rebalance VMs corruption [Apr 2017]


Gandalf Corvotempesta

2017-Apr-27 11:00 UTC

[Gluster-users] Gluster 3.8.10 rebalance VMs corruption

I think we are talking about a different bug.

On 27 Apr 2017 12:58 PM, "Pranith Kumar Karampuri" <pkarampu at redhat.com> wrote:
> I am not a DHT developer, so some of what I say could be a little wrong.
> But this is what I gather.
> I think they found 2 classes of bugs in DHT:
> 1) Graceful fop failover while a rebalance is in progress is missing for
> some fops, which leads to VM pauses.
>
> I see that https://review.gluster.org/17085 got merged on the 24th on master
> for this. I see patches are posted for 3.8.x for this one.
>
> 2) I think some work still needs to be done for dht_[f]xattrop. I
> believe this is the next step that is underway.
>
>
> On Thu, Apr 27, 2017 at 12:13 PM, Gandalf Corvotempesta <
> gandalf.corvotempesta at gmail.com> wrote:
>
>> Any updates on this critical bug?
>>
>> On 18 Apr 2017 8:24 PM, "Gandalf Corvotempesta" <
>> gandalf.corvotempesta at gmail.com> wrote:
>>
>>> Any update?
>>> In addition, if this is a different bug but the "workflow" is the same
>>> as the previous one, how is it possible that fixing the previous bug
>>> triggered this new one?
>>>
>>> Is it possible to have some details?
>>>
>>> 2017-04-04 16:11 GMT+02:00 Krutika Dhananjay <kdhananj at redhat.com>:
>>> > Nope. This is a different bug.
>>> >
>>> > -Krutika
>>> >
>>> > On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta
>>> > <gandalf.corvotempesta at gmail.com> wrote:
>>> >>
>>> >> This is good news.
>>> >> Is this related to the previously fixed bug?
>>> >>
>>> >> On 3 Apr 2017 10:22 AM, "Krutika Dhananjay" <kdhananj at redhat.com>
>>> >> wrote:
>>> >>>
>>> >>> So Raghavendra has an RCA for this issue.
>>> >>>
>>> >>> Copy-pasting his comment here:
>>> >>>
>>> >>> <RCA>
>>> >>>
>>> >>> Following is a rough algorithm of shard_writev:
>>> >>>
>>> >>> 1. Based on the offset, calculate the shards touched by the current
>>> >>> write.
>>> >>> 2. Look for inodes corresponding to these shard files in itable.
>>> >>> 3. If one or more inodes are missing from itable, issue mknod for
>>> >>> corresponding shard files and ignore EEXIST in cbk.
>>> >>> 4. Resume writes on respective shards.
>>> >>>
>>> >>> Now, imagine a write which falls to an existing "shard_file". For the
>>> >>> sake of discussion let's consider a distribute of three subvols: s1,
>>> >>> s2, s3.
>>> >>>
>>> >>> 1. "shard_file" hashes to subvolume s2 and is present on s2.
>>> >>> 2. Add a subvolume s4 and initiate a fix-layout. The layout of ".shard"
>>> >>> is fixed to include s4 and hash ranges are changed.
>>> >>> 3. A write that touches "shard_file" is issued.
>>> >>> 4. The inode for "shard_file" is not present in itable after a graph
>>> >>> switch and features/shard issues an mknod.
>>> >>> 5. With the new layout of .shard, let's say "shard_file" hashes to s3
>>> >>> and mknod (shard_file) on s3 succeeds. But the shard_file is already
>>> >>> present on s2.
>>> >>>
>>> >>> So, we have two files on two different subvols of dht representing the
>>> >>> same shard, and this will lead to corruption.
>>> >>>
>>> >>> </RCA>
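
A rough sketch of the scenario in the RCA above, not taken from the thread. It is a toy model only: the modulo-based `hashed_subvol`, the `Cluster` class and the `shard_file.N` names are hypothetical stand-ins (real DHT assigns hash ranges per directory layout), and the only value borrowed from the thread is the 256MB shard block size. It shows how an mknod that checks EEXIST only on the newly hashed subvolume can leave two copies of one shard.

```python
import zlib

SHARD_BLOCK_SIZE = 256 * 1024 * 1024  # features.shard-block-size: 256MB


def shards_touched(offset, length, block_size=SHARD_BLOCK_SIZE):
    """Step 1 of the RCA: block indices covered by a write at `offset`."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def hashed_subvol(name, layout):
    """Toy stand-in for DHT hashing (assumption: simple modulo, not hash ranges)."""
    return layout[zlib.crc32(name.encode()) % len(layout)]


class Cluster:
    """Hypothetical model of a distribute volume: subvolume -> set of file names."""

    def __init__(self, subvols):
        self.layout = list(subvols)
        self.files = {s: set() for s in subvols}

    def add_subvol_and_fix_layout(self, subvol):
        # After fix-layout, names may hash to a different subvolume.
        self.layout.append(subvol)
        self.files[subvol] = set()

    def mknod(self, name):
        # EEXIST is only detected on the subvolume the name hashes to *now*.
        target = hashed_subvol(name, self.layout)
        if name in self.files[target]:
            return target, "EEXIST"
        self.files[target].add(name)
        return target, "created"

    def copies(self, name):
        return sorted(s for s, names in self.files.items() if name in names)


def demo():
    cluster = Cluster(["s1", "s2", "s3"])
    new_layout = cluster.layout + ["s4"]
    # Pick a shard name whose hashed subvolume changes once s4 joins the layout.
    name = next(
        f"shard_file.{i}"
        for i in range(1, 1000)
        if hashed_subvol(f"shard_file.{i}", cluster.layout)
        != hashed_subvol(f"shard_file.{i}", new_layout)
    )
    cluster.mknod(name)                      # shard created under the old layout
    cluster.add_subvol_and_fix_layout("s4")  # add-brick + fix-layout
    _, result = cluster.mknod(name)          # write after graph switch, inode not in itable
    return name, result, cluster.copies(name)


if __name__ == "__main__":
    name, result, copies = demo()
    print(name, result, copies)
```

In this model the second mknod reports "created" rather than "EEXIST", and `copies` lists two subvolumes for the same shard name, which is the duplicate-shard condition the RCA describes.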
>>> >>>
>>> >>> Raghavendra will be sending out a patch in DHT to fix this issue.
>>> >>>
>>> >>> -Krutika
>>> >>>
>>> >>>
>>> >>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri
>>> >>> <pkarampu at redhat.com> wrote:
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> Hi,
>>> >>>>>
>>> >>>>>
>>> >>>>> Do you guys have any update regarding this issue?
>>> >>>>
>>> >>>> I do not actively work on this issue, so I do not have an accurate
>>> >>>> update, but this is what I heard from Krutika and Raghavendra (who works
>>> >>>> on DHT): Krutika debugged initially and found that the issue seems more
>>> >>>> likely to be in DHT. Satheesaran, who helped us recreate this issue in
>>> >>>> the lab, found that just fix-layout without rebalance also caused the
>>> >>>> corruption 1 out of 3 times.
>>> >>>> Raghavendra came up with a possible RCA for why this can happen.
>>> >>>> Raghavendra (CCed) would be the right person to provide an accurate
>>> >>>> update.
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>>
>>> >>>>> Respectfully
>>> >>>>> Mahdi A. Mahdi
>>> >>>>>
>>> >>>>> ________________________________
>>> >>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>> >>>>> Sent: Tuesday, March 21, 2017 3:02:55 PM
>>> >>>>> To: Mahdi Adnan
>>> >>>>> Cc: Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai;
>>> >>>>> gluster-users at gluster.org List
>>> >>>>>
>>> >>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>> >>>>>
>>> >>>>> Hi,
>>> >>>>>
>>> >>>>> So it looks like Satheesaran managed to recreate this issue. We will be
>>> >>>>> seeking his help in debugging this. It will be easier that way.
>>> >>>>>
>>> >>>>> -Krutika
>>> >>>>>
>>> >>>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>> >>>>> wrote:
>>> >>>>>>
>>> >>>>>> Hello and thank you for your email.
>>> >>>>>> Actually no, I didn't check the GFIDs of the VMs.
>>> >>>>>> If this will help, I can set up a new test cluster and get all the data
>>> >>>>>> you need.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> From: Nithya Balachandran
>>> >>>>>> Sent: Monday, March 20, 20:57
>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>> >>>>>> To: Krutika Dhananjay
>>> >>>>>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai,
>>> >>>>>> gluster-users at gluster.org List
>>> >>>>>>
>>> >>>>>> Hi,
>>> >>>>>>
>>> >>>>>> Do you know the GFIDs of the VM images which were corrupted?
>>> >>>>>>
>>> >>>>>> Regards,
>>> >>>>>>
>>> >>>>>> Nithya
>>> >>>>>>
>>> >>>>>> On 20 March 2017 at 20:37, Krutika Dhananjay <kdhananj at redhat.com>
>>> >>>>>> wrote:
>>> >>>>>>
>>> >>>>>> I looked at the logs.
>>> >>>>>>
>>> >>>>>> From the time the new graph (since the add-brick command you shared
>>> >>>>>> where bricks 41 through 44 are added) is switched to (line 3011 onwards
>>> >>>>>> in nfs-gfapi.log), I see the following kinds of errors:
>>> >>>>>>
>>> >>>>>> 1. Lookups to a bunch of files failed with ENOENT on both replicas
>>> >>>>>> which protocol/client converts to ESTALE. I am guessing these entries
>>> >>>>>> got migrated to other subvolumes leading to 'No such file or directory'
>>> >>>>>> errors.
>>> >>>>>>
>>> >>>>>> DHT and thereafter shard get the same error code and log the
>>> >>>>>> following:
>>> >>>>>>
>>> >>>>>> [2017-03-17 14:04:26.353444] E [MSGID: 109040]
>>> >>>>>> [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht:
>>> >>>>>> <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file
>>> >>>>>> on vmware2-dht [Stale file handle]
>>> >>>>>> [2017-03-17 14:04:26.353528] E [MSGID: 133014]
>>> >>>>>> [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed:
>>> >>>>>> a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
>>> >>>>>>
>>> >>>>>> which is fine.
>>> >>>>>>
>>> >>>>>> 2. The other kind are from AFR logging of possible split-brain which I
>>> >>>>>> suppose are harmless too.
>>> >>>>>> [2017-03-17 14:23:36.968883] W [MSGID: 108008]
>>> >>>>>> [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable
>>> >>>>>> subvolume -1 found with event generation 2 for gfid
>>> >>>>>> 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
>>> >>>>>>
>>> >>>>>> Since you are saying the bug is hit only on VMs that are undergoing IO
>>> >>>>>> while rebalance is running (as opposed to those that remained powered
>>> >>>>>> off), rebalance + IO could be causing some issues.
>>> >>>>>>
>>> >>>>>> CC'ing DHT devs
>>> >>>>>>
>>> >>>>>> Raghavendra/Nithya/Susant,
>>> >>>>>>
>>> >>>>>> Could you take a look?
>>> >>>>>>
>>> >>>>>> -Krutika
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>> >>>>>> wrote:
>>> >>>>>>
>>> >>>>>> Thank you for your email mate.
>>> >>>>>>
>>> >>>>>> Yes, I'm aware of this, but to save costs I chose replica 2; this
>>> >>>>>> cluster is all flash.
>>> >>>>>>
>>> >>>>>> In version 3.7.x I had issues with ping timeout: if one host went
>>> >>>>>> down for a few seconds, the whole cluster hung and became unavailable.
>>> >>>>>> To avoid this I adjusted the ping timeout to 5 seconds.
>>> >>>>>>
>>> >>>>>> As for choosing Ganesha over gfapi, VMWare does not support Gluster
>>> >>>>>> (FUSE or gfapi), so I'm stuck with NFS for this volume.
>>> >>>>>>
>>> >>>>>> The other volume is mounted using gfapi in an oVirt cluster.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>>
>>> >>>>>> Respectfully
>>> >>>>>> Mahdi A. Mahdi
>>> >>>>>>
>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>> >>>>>> Sent: Sunday, March 19, 2017 2:01:49 PM
>>> >>>>>>
>>> >>>>>> To: Mahdi Adnan
>>> >>>>>> Cc: gluster-users at gluster.org
>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> While I'm still going through the logs, just wanted to point out a
>>> >>>>>> couple of things:
>>> >>>>>>
>>> >>>>>> 1. It is recommended that you use 3-way replication (replica count 3)
>>> >>>>>> for the VM store use case.
>>> >>>>>>
>>> >>>>>> 2. network.ping-timeout at 5 seconds is way too low. Please change it
>>> >>>>>> to 30.
>>> >>>>>>
>>> >>>>>> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>>> >>>>>>
>>> >>>>>> Will get back with anything else I might find or more questions if I
>>> >>>>>> have any.
>>> >>>>>>
>>> >>>>>> -Krutika
>>> >>>>>>
>>> >>>>>> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>> >>>>>> wrote:
>>> >>>>>>
>>> >>>>>> Thanks mate,
>>> >>>>>>
>>> >>>>>> Kindly, check the attachment.
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>>
>>> >>>>>> Respectfully
>>> >>>>>> Mahdi A. Mahdi
>>> >>>>>>
>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>> >>>>>> Sent: Sunday, March 19, 2017 10:00:22 AM
>>> >>>>>>
>>> >>>>>> To: Mahdi Adnan
>>> >>>>>> Cc: gluster-users at gluster.org
>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> In that case could you share the ganesha-gfapi logs?
>>> >>>>>>
>>> >>>>>> -Krutika
>>> >>>>>>
>>> >>>>>> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan
>>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>>> >>>>>>
>>> >>>>>> I have two volumes: one is mounted using libgfapi for the oVirt mount,
>>> >>>>>> the other is exported via NFS-Ganesha for VMWare, which is the one I'm
>>> >>>>>> testing now.
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>>
>>> >>>>>> Respectfully
>>> >>>>>> Mahdi A. Mahdi
>>> >>>>>>
>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>> >>>>>> Sent: Sunday, March 19, 2017 8:02:19 AM
>>> >>>>>>
>>> >>>>>> To: Mahdi Adnan
>>> >>>>>> Cc: gluster-users at gluster.org
>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan
>>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>>> >>>>>>
>>> >>>>>> Kindly check the attached new log file. I don't know if it's helpful
>>> >>>>>> or not, but I couldn't find the log with the name you just described.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it
>>> >>>>>> NFS?
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> -Krutika
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>>
>>> >>>>>> Respectfully
>>> >>>>>> Mahdi A. Mahdi
>>> >>>>>>
>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>> >>>>>> Sent: Saturday, March 18, 2017 6:10:40 PM
>>> >>>>>>
>>> >>>>>> To: Mahdi Adnan
>>> >>>>>> Cc: gluster-users at gluster.org
>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the
>>> >>>>>> fuse mount logs? They should be right under the /var/log/glusterfs/
>>> >>>>>> directory, named after the mount point name, only hyphenated.
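
The naming rule described above can be sketched as follows. This is a minimal illustration, not from the thread: the helper name `fuse_mount_log_path` and the mount point `/mnt/vmstore` are hypothetical, and the `.log` suffix is assumed from the brick log name quoted in the message.

```python
from pathlib import PurePosixPath


def fuse_mount_log_path(mount_point, log_dir="/var/log/glusterfs"):
    """Guess the FUSE client log path: mount point with '/' replaced by '-'."""
    # Drop the leading root component so "/mnt/vmstore" becomes "mnt-vmstore".
    parts = [p for p in PurePosixPath(mount_point).parts if p != "/"]
    return f"{log_dir}/{'-'.join(parts)}.log"


if __name__ == "__main__":
    # Hypothetical mount point, used only for illustration.
    print(fuse_mount_log_path("/mnt/vmstore"))
```

The same rule applied to a brick path such as /mnt/disk11/vmware2 gives mnt-disk11-vmware2.log, which is consistent with that file being a brick log rather than a mount log.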
>>> >>>>>>
>>> >>>>>> -Krutika
>>> >>>>>>
>>> >>>>>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>> >>>>>> wrote:
>>> >>>>>>
>>> >>>>>> Hello Krutika,
>>> >>>>>>
>>> >>>>>> Kindly, check the attached logs.
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>>
>>> >>>>>> Respectfully
>>> >>>>>> Mahdi A. Mahdi
>>> >>>>>>
>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>> >>>>>> Sent: Saturday, March 18, 2017 3:29:03 PM
>>> >>>>>> To: Mahdi Adnan
>>> >>>>>> Cc: gluster-users at gluster.org
>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Hi Mahdi,
>>> >>>>>>
>>> >>>>>> Could you attach mount, brick and rebalance logs?
>>> >>>>>>
>>> >>>>>> -Krutika
>>> >>>>>>
>>> >>>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan
>>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>>> >>>>>>
>>> >>>>>> Hi,
>>> >>>>>>
>>> >>>>>> I upgraded to Gluster 3.8.10 today and ran the add-brick
>>> >>>>>> procedure on a volume that contains a few VMs.
>>> >>>>>>
>>> >>>>>> After the rebalance completed, I rebooted the VMs; some of them
>>> >>>>>> ran just fine, and others just crashed.
>>> >>>>>>
>>> >>>>>> Windows boots to recovery mode, and Linux throws XFS errors and does
>>> >>>>>> not boot.
>>> >>>>>>
>>> >>>>>> I ran the test again and it happened just like the first time, but I
>>> >>>>>> noticed that only VMs doing disk IO are affected by this bug.
>>> >>>>>>
>>> >>>>>> The VMs that were powered off started fine, and even the md5 of the
>>> >>>>>> disk file did not change after the rebalance.
>>> >>>>>>
>>> >>>>>> Can anyone else confirm this?
>>> >>>>>>
>>> >>>>>> Volume info:
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Volume Name: vmware2
>>> >>>>>> Type: Distributed-Replicate
>>> >>>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>> >>>>>> Status: Started
>>> >>>>>> Snapshot Count: 0
>>> >>>>>> Number of Bricks: 22 x 2 = 44
>>> >>>>>> Transport-type: tcp
>>> >>>>>> Bricks:
>>> >>>>>> Brick1: gluster01:/mnt/disk1/vmware2
>>> >>>>>> Brick2: gluster03:/mnt/disk1/vmware2
>>> >>>>>> Brick3: gluster02:/mnt/disk1/vmware2
>>> >>>>>> Brick4: gluster04:/mnt/disk1/vmware2
>>> >>>>>> Brick5: gluster01:/mnt/disk2/vmware2
>>> >>>>>> Brick6: gluster03:/mnt/disk2/vmware2
>>> >>>>>> Brick7: gluster02:/mnt/disk2/vmware2
>>> >>>>>> Brick8: gluster04:/mnt/disk2/vmware2
>>> >>>>>> Brick9: gluster01:/mnt/disk3/vmware2
>>> >>>>>> Brick10: gluster03:/mnt/disk3/vmware2
>>> >>>>>> Brick11: gluster02:/mnt/disk3/vmware2
>>> >>>>>> Brick12: gluster04:/mnt/disk3/vmware2
>>> >>>>>> Brick13: gluster01:/mnt/disk4/vmware2
>>> >>>>>> Brick14: gluster03:/mnt/disk4/vmware2
>>> >>>>>> Brick15: gluster02:/mnt/disk4/vmware2
>>> >>>>>> Brick16: gluster04:/mnt/disk4/vmware2
>>> >>>>>> Brick17: gluster01:/mnt/disk5/vmware2
>>> >>>>>> Brick18: gluster03:/mnt/disk5/vmware2
>>> >>>>>> Brick19: gluster02:/mnt/disk5/vmware2
>>> >>>>>> Brick20: gluster04:/mnt/disk5/vmware2
>>> >>>>>> Brick21: gluster01:/mnt/disk6/vmware2
>>> >>>>>> Brick22: gluster03:/mnt/disk6/vmware2
>>> >>>>>> Brick23: gluster02:/mnt/disk6/vmware2
>>> >>>>>> Brick24: gluster04:/mnt/disk6/vmware2
>>> >>>>>> Brick25: gluster01:/mnt/disk7/vmware2
>>> >>>>>> Brick26: gluster03:/mnt/disk7/vmware2
>>> >>>>>> Brick27: gluster02:/mnt/disk7/vmware2
>>> >>>>>> Brick28: gluster04:/mnt/disk7/vmware2
>>> >>>>>> Brick29: gluster01:/mnt/disk8/vmware2
>>> >>>>>> Brick30: gluster03:/mnt/disk8/vmware2
>>> >>>>>> Brick31: gluster02:/mnt/disk8/vmware2
>>> >>>>>> Brick32: gluster04:/mnt/disk8/vmware2
>>> >>>>>> Brick33: gluster01:/mnt/disk9/vmware2
>>> >>>>>> Brick34: gluster03:/mnt/disk9/vmware2
>>> >>>>>> Brick35: gluster02:/mnt/disk9/vmware2
>>> >>>>>> Brick36: gluster04:/mnt/disk9/vmware2
>>> >>>>>> Brick37: gluster01:/mnt/disk10/vmware2
>>> >>>>>> Brick38: gluster03:/mnt/disk10/vmware2
>>> >>>>>> Brick39: gluster02:/mnt/disk10/vmware2
>>> >>>>>> Brick40: gluster04:/mnt/disk10/vmware2
>>> >>>>>> Brick41: gluster01:/mnt/disk11/vmware2
>>> >>>>>> Brick42: gluster03:/mnt/disk11/vmware2
>>> >>>>>> Brick43: gluster02:/mnt/disk11/vmware2
>>> >>>>>> Brick44: gluster04:/mnt/disk11/vmware2
>>> >>>>>>
>>> >>>>>> Options Reconfigured:
>>> >>>>>> cluster.server-quorum-type: server
>>> >>>>>> nfs.disable: on
>>> >>>>>> performance.readdir-ahead: on
>>> >>>>>> transport.address-family: inet
>>> >>>>>> performance.quick-read: off
>>> >>>>>> performance.read-ahead: off
>>> >>>>>> performance.io-cache: off
>>> >>>>>> performance.stat-prefetch: off
>>> >>>>>> cluster.eager-lock: enable
>>> >>>>>> network.remote-dio: enable
>>> >>>>>> features.shard: on
>>> >>>>>> cluster.data-self-heal-algorithm: full
>>> >>>>>> features.cache-invalidation: on
>>> >>>>>> ganesha.enable: on
>>> >>>>>> features.shard-block-size: 256MB
>>> >>>>>> client.event-threads: 2
>>> >>>>>> server.event-threads: 2
>>> >>>>>> cluster.favorite-child-policy: size
>>> >>>>>> storage.build-pgfid: off
>>> >>>>>> network.ping-timeout: 5
>>> >>>>>> cluster.enable-shared-storage: enable
>>> >>>>>> nfs-ganesha: enable
>>> >>>>>> cluster.server-quorum-ratio: 51%
>>> >>>>>>
>>> >>>>>> Adding bricks:
>>> >>>>>>
>>> >>>>>> gluster volume add-brick vmware2 replica 2
>>> >>>>>> gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2
>>> >>>>>> gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>>> >>>>>>
>>> >>>>>> Starting fix-layout:
>>> >>>>>>
>>> >>>>>> gluster volume rebalance vmware2 fix-layout start
>>> >>>>>>
>>> >>>>>> Starting rebalance:
>>> >>>>>>
>>> >>>>>> gluster volume rebalance vmware2 start
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>>
>>> >>>>>> Respectfully
>>> >>>>>> Mahdi A. Mahdi
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> _______________________________________________
>>> >>>>>> Gluster-users mailing list
>>> >>>>>> Gluster-users at gluster.org
>>> >>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Pranith
>>> >>>
>>> >>>
>>> >>>
>>> >
>>> >
>>>
>>
>
>
> --
> Pranith

Pranith Kumar Karampuri

2017-Apr-27 11:03 UTC

[Gluster-users] Gluster 3.8.10 rebalance VMs corruption

I am very positive about the two things I told you. These are the latest
things that happened for VM corruption with rebalance.

On Thu, Apr 27, 2017 at 4:30 PM, Gandalf Corvotempesta <
gandalf.corvotempesta at gmail.com> wrote:
> I think we are talking about a different bug.
>
> Il 27 apr 2017 12:58 PM, "Pranith Kumar Karampuri" <pkarampu
at redhat.com>
> ha scritto:
>
>> I am not a DHT developer, so some of what I say could be a little
wrong.
>> But this is what I gather.
>> I think they found 2 classes of bugs in dht
>> 1) Graceful fop failover when rebalance is in progress is missing for
>> some fops, that lead to VM pause.
>>
>> I see that https://review.gluster.org/17085 got merged on 24th on
master
>> for this. I see patches are posted for 3.8.x for this one.
>>
>> 2) I think there is some work needs to be done for dht_[f]xattrop. I
>> believe this is the next step that is underway.
>>
>>
>> On Thu, Apr 27, 2017 at 12:13 PM, Gandalf Corvotempesta <
>> gandalf.corvotempesta at gmail.com> wrote:
>>
>>> Updates on this critical bug ?
>>>
>>> Il 18 apr 2017 8:24 PM, "Gandalf Corvotempesta" <
>>> gandalf.corvotempesta at gmail.com> ha scritto:
>>>
>>>> Any update ?
>>>> In addition, if this is a different bug but the
"workflow" is the same
>>>> as the previous one, how is possible that fixing the previous
bug
>>>> triggered this new one ?
>>>>
>>>> Is possible to have some details ?
>>>>
>>>> 2017-04-04 16:11 GMT+02:00 Krutika Dhananjay <kdhananj at
redhat.com>:
>>>> > Nope. This is a different bug.
>>>> >
>>>> > -Krutika
>>>> >
>>>> > On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta
>>>> > <gandalf.corvotempesta at gmail.com> wrote:
>>>> >>
>>>> >> This is a good news
>>>> >> Is this related to the previously fixed bug?
>>>> >>
>>>> >> Il 3 apr 2017 10:22 AM, "Krutika Dhananjay"
<kdhananj at redhat.com> ha
>>>> >> scritto:
>>>> >>>
>>>> >>> So Raghavendra has an RCA for this issue.
>>>> >>>
>>>> >>> Copy-pasting his comment here:
>>>> >>>
>>>> >>> <RCA>
>>>> >>>
>>>> >>> Following is a rough algorithm of shard_writev:
>>>> >>>
>>>> >>> 1. Based on the offset, calculate the shards
touched by current
>>>> write.
>>>> >>> 2. Look for inodes corresponding to these shard
files in itable.
>>>> >>> 3. If one or more inodes are missing from itable,
issue mknod for
>>>> >>> corresponding shard files and ignore EEXIST in
cbk.
>>>> >>> 4. resume writes on respective shards.
>>>> >>>
>>>> >>> Now, imagine a write which falls to an existing
"shard_file". For
>>>> the
>>>> >>> sake of discussion lets consider a distribute of
three subvols -
>>>> s1, s2, s3
>>>> >>>
>>>> >>> 1. "shard_file" hashes to subvolume s2
and is present on s2
>>>> >>> 2. add a subvolume s4 and initiate a fix layout.
The layout of
>>>> ".shard"
>>>> >>> is fixed to include s4 and hash ranges are
changed.
>>>> >>> 3. write that touches "shard_file" is
issued.
>>>> >>> 4. The inode for "shard_file" is not
present in itable after a graph
>>>> >>> switch and features/shard issues an mknod.
>>>> >>> 5. With new layout of .shard, lets say
"shard_file" hashes to s3 and
>>>> >>> mknod (shard_file) on s3 succeeds. But, the
shard_file is already
>>>> present on
>>>> >>> s2.
>>>> >>>
>>>> >>> So, we have two files on two different subvols of
dht representing
>>>> same
>>>> >>> shard and this will lead to corruption.
>>>> >>>
>>>> >>> </RCA>
>>>> >>>
>>>> >>> Raghavendra will be sending out a patch in DHT to
fix this issue.
>>>> >>>
>>>> >>> -Krutika
>>>> >>>
>>>> >>>
>>>> >>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar
Karampuri
>>>> >>> <pkarampu at redhat.com> wrote:
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan
<
>>>> mahdi.adnan at outlook.com>
>>>> >>>> wrote:
>>>> >>>>>
>>>> >>>>> Hi,
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> Do you guys have any update regarding this
issue ?
>>>> >>>>
>>>> >>>> I do not actively work on this issue so I do
not have an accurate
>>>> >>>> update, but from what I heard from Krutika and
Raghavendra(works
>>>> on DHT) is:
>>>> >>>> Krutika debugged initially and found that the
issue seems more
>>>> likely to be
>>>> >>>> in DHT, Satheesaran who helped us recreate
this issue in lab found
>>>> that just
>>>> >>>> fix-layout without rebalance also caused the
corruption 1 out of 3
>>>> times.
>>>> >>>> Raghavendra came up with a possible RCA for
why this can happen.
>>>> >>>> Raghavendra(CCed) would be the right person to
provide accurate
>>>> update.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> --
>>>> >>>>>
>>>> >>>>> Respectfully
>>>> >>>>> Mahdi A. Mahdi
>>>> >>>>>
>>>> >>>>> ________________________________
>>>> >>>>> From: Krutika Dhananjay <kdhananj at
redhat.com>
>>>> >>>>> Sent: Tuesday, March 21, 2017 3:02:55 PM
>>>> >>>>> To: Mahdi Adnan
>>>> >>>>> Cc: Nithya Balachandran; Gowdappa,
Raghavendra; Susant Palai;
>>>> >>>>> gluster-users at gluster.org List
>>>> >>>>>
>>>> >>>>> Subject: Re: [Gluster-users] Gluster
3.8.10 rebalance VMs
>>>> corruption
>>>> >>>>>
>>>> >>>>> Hi,
>>>> >>>>>
>>>> >>>>> So it looks like Satheesaran managed to
recreate this issue. We
>>>> will be
>>>> >>>>> seeking his help in debugging this. It
will be easier that way.
>>>> >>>>>
>>>> >>>>> -Krutika
>>>> >>>>>
>>>> >>>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi
Adnan <
>>>> mahdi.adnan at outlook.com>
>>>> >>>>> wrote:
>>>> >>>>>>
>>>> >>>>>> Hello and thank you for your email.
>>>> >>>>>> Actually no, i didn't check the
gfid of the vms.
>>>> >>>>>> If this will help, i can setup a new
test cluster and get all
>>>> the data
>>>> >>>>>> you need.
>>>> >>>>>>
>>>> >>>>>> Get Outlook for Android
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> From: Nithya Balachandran
>>>> >>>>>> Sent: Monday, March 20, 20:57
>>>> >>>>>> Subject: Re: [Gluster-users] Gluster
3.8.10 rebalance VMs
>>>> corruption
>>>> >>>>>> To: Krutika Dhananjay
>>>> >>>>>> Cc: Mahdi Adnan, Gowdappa,
Raghavendra, Susant Palai,
>>>> >>>>>> gluster-users at gluster.org List
>>>> >>>>>>
>>>> >>>>>> Hi,
>>>> >>>>>>
>>>> >>>>>> Do you know the GFIDs of the VM images
which were corrupted?
>>>> >>>>>>
>>>> >>>>>> Regards,
>>>> >>>>>>
>>>> >>>>>> Nithya
>>>> >>>>>>
>>>> >>>>>> On 20 March 2017 at 20:37, Krutika
Dhananjay <
>>>> kdhananj at redhat.com>
>>>> >>>>>> wrote:
>>>> >>>>>>
>>>> >>>>>> I looked at the logs.
>>>> >>>>>>
>>>> >>>>>> From the time the new graph (since the
add-brick command you
>>>> shared
>>>> >>>>>> where bricks 41 through 44 are added)
is switched to (line 3011
>>>> onwards in
>>>> >>>>>> nfs-gfapi.log), I see the following
kinds of errors:
>>>> >>>>>>
>>>> >>>>>> 1. Lookups to a bunch of files failed
with ENOENT on both
>>>> replicas
>>>> >>>>>> which protocol/client converts to
ESTALE. I am guessing these
>>>> entries got
>>>> >>>>>> migrated to
>>>> >>>>>>
>>>> >>>>>> other subvolumes leading to 'No
such file or directory' errors.
>>>> >>>>>>
>>>> >>>>>> DHT and thereafter shard get the same
error code and log the
>>>> >>>>>> following:
>>>> >>>>>>
>>>> >>>>>>  0 [2017-03-17 14:04:26.353444] E
[MSGID: 109040]
>>>> >>>>>>
[dht-helper.c:1198:dht_migration_complete_check_task]
>>>> 17-vmware2-dht:
>>>> >>>>>>
<gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed     to
>>>> lookup the file
>>>> >>>>>> on vmware2-dht [Stale file handle]
>>>> >>>>>>   1 [2017-03-17 14:04:26.353528] E
[MSGID: 133014]
>>>> >>>>>> [shard.c:1253:shard_common_stat_cbk]
17-vmware2-shard: stat
>>>> failed:
>>>> >>>>>> a68ce411-e381-46a3-93cd-d2af6a7c3532
[Stale file handle]
>>>> >>>>>>
>>>> >>>>>> which is fine.
>>>> >>>>>>
>>>> >>>>>> 2. The other kind are from AFR logging
of possible split-brain
>>>> which I
>>>> >>>>>> suppose are harmless too.
>>>> >>>>>> [2017-03-17 14:23:36.968883] W [MSGID:
108008]
>>>> >>>>>> [afr-read-txn.c:228:afr_read_txn]
17-vmware2-replicate-13:
>>>> Unreadable
>>>> >>>>>> subvolume -1 found with event
generation 2 for gfid
>>>> >>>>>> 74d49288-8452-40d4-893e-ff4672557ff9.
(Possible split-brain)
>>>> >>>>>>
>>>> >>>>>> Since you are saying the bug is hit
only on VMs that are
>>>> undergoing IO
>>>> >>>>>> while rebalance is running (as opposed
to those that remained
>>>> powered off),
>>>> >>>>>>
>>>> >>>>>> rebalance + IO could be causing some
issues.
>>>> >>>>>>
>>>> >>>>>> CC'ing DHT devs
>>>> >>>>>>
>>>> >>>>>> Raghavendra/Nithya/Susant,
>>>> >>>>>>
>>>> >>>>>> Could you take a look?
>>>> >>>>>>
>>>> >>>>>> -Krutika
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi
Adnan <
>>>> mahdi.adnan at outlook.com>
>>>> >>>>>> wrote:
>>>> >>>>>>
>>>> >>>>>> Thank you for your email mate.
>>>> >>>>>>
>>>> >>>>>> Yes, im aware of this but, to save
costs i chose replica 2, this
>>>> >>>>>> cluster is all flash.
>>>> >>>>>>
>>>> >>>>>> In version 3.7.x i had issues with
ping timeout, if one hosts
>>>> went
>>>> >>>>>> down for few seconds the whole cluster
hangs and become
>>>> unavailable, to
>>>> >>>>>> avoid this i adjusted the ping timeout
to 5 seconds.
>>>> >>>>>>
>>>> >>>>>> As for choosing Ganesha over gfapi,
VMWare does not support
>>>> Gluster
>>>> >>>>>> (FUSE or gfapi) im stuck with NFS for
this volume.
>>>> >>>>>>
>>>> >>>>>> The other volume is mounted using
gfapi in oVirt cluster.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> --
>>>> >>>>>>
>>>> >>>>>> Respectfully
>>>> >>>>>> Mahdi A. Mahdi
>>>> >>>>>>
>>>> >>>>>> From: Krutika Dhananjay <kdhananj
at redhat.com>
>>>> >>>>>> Sent: Sunday, March 19, 2017 2:01:49
PM
>>>> >>>>>>
>>>> >>>>>> To: Mahdi Adnan
>>>> >>>>>> Cc: gluster-users at gluster.org
>>>> >>>>>> Subject: Re: [Gluster-users] Gluster
3.8.10 rebalance VMs
>>>> corruption
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> While I'm still going through the
logs, just wanted to point out
>>>> a
>>>> >>>>>> couple of things:
>>>> >>>>>>
>>>> >>>>>> 1. It is recommended that you use
3-way replication (replica
>>>> count 3)
>>>> >>>>>> for VM store use case
>>>> >>>>>>
>>>> >>>>>> 2. network.ping-timeout at 5 seconds
is way too low. Please
>>>> change it
>>>> >>>>>> to 30.
>>>> >>>>>>
>>>> >>>>>> Is there any specific reason for using
NFS-Ganesha over
>>>> gfapi/FUSE?
>>>> >>>>>>
>>>> >>>>>> Will get back with anything else I
might find or more questions
>>>> if I
>>>> >>>>>> have any.
>>>> >>>>>>
>>>> >>>>>> -Krutika
>>>> >>>>>>
>>>> >>>>>> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi
Adnan <
>>>> mahdi.adnan at outlook.com>
>>>> >>>>>> wrote:
>>>> >>>>>>
>>>> >>>>>> Thanks mate,
>>>> >>>>>>
>>>> >>>>>> Kindly, check the attachment.
>>>> >>>>>>
>>>> >>>>>> --
>>>> >>>>>>
>>>> >>>>>> Respectfully
>>>> >>>>>> Mahdi A. Mahdi
>>>> >>>>>>
>>>> >>>>>> From: Krutika Dhananjay <kdhananj
at redhat.com>
>>>> >>>>>> Sent: Sunday, March 19, 2017 10:00:22
AM
>>>> >>>>>>
>>>> >>>>>> To: Mahdi Adnan
>>>> >>>>>> Cc: gluster-users at gluster.org
>>>> >>>>>> Subject: Re: [Gluster-users] Gluster
3.8.10 rebalance VMs
>>>> corruption
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> In that case could you share the
ganesha-gfapi logs?
>>>> >>>>>>
>>>> >>>>>> -Krutika
>>>> >>>>>>
>>>> >>>>>> On Sun, Mar 19, 2017 at 12:13 PM,
Mahdi Adnan
>>>> >>>>>> <mahdi.adnan at outlook.com>
wrote:
>>>> >>>>>>
>>>> >>>>>> I have two volumes, one is mounted
using libgfapi for ovirt
>>>> mount, the
>>>> >>>>>> other one is exported via NFS-Ganesha
for VMWare which is the
>>>> one im testing
>>>> >>>>>> now.
>>>> >>>>>>
>>>> >>>>>> --
>>>> >>>>>>
>>>> >>>>>> Respectfully
>>>> >>>>>> Mahdi A. Mahdi
>>>> >>>>>>
>>>> >>>>>> From: Krutika Dhananjay <kdhananj
at redhat.com>
>>>> >>>>>> Sent: Sunday, March 19, 2017 8:02:19
AM
>>>> >>>>>>
>>>> >>>>>> To: Mahdi Adnan
>>>> >>>>>> Cc: gluster-users at gluster.org
>>>> >>>>>> Subject: Re: [Gluster-users] Gluster
3.8.10 rebalance VMs
>>>> corruption
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan
>>>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>>>> >>>>>>
>>>> >>>>>> Kindly check the attached new log file. I don't know if it's helpful
>>>> >>>>>> or not, but I couldn't find a log with the name you just described.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it
>>>> >>>>>> NFS?
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> -Krutika
>>>> >>>>>>
>>>> >>>>>> --
>>>> >>>>>>
>>>> >>>>>> Respectfully
>>>> >>>>>> Mahdi A. Mahdi
>>>> >>>>>>
>>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>> >>>>>> Sent: Saturday, March 18, 2017 6:10:40 PM
>>>> >>>>>> To: Mahdi Adnan
>>>> >>>>>> Cc: gluster-users at gluster.org
>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the
>>>> >>>>>> FUSE mount logs? They should be right under the /var/log/glusterfs/
>>>> >>>>>> directory, named after the mount point name, only hyphenated.
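The naming rule described above can be sketched as a small shell helper. The function name `mount_log_path` and the example mount point `/mnt/vmware2` are illustrative assumptions, not taken from the thread; the sketch only strips the leading and trailing slashes from the mount point and turns the remaining slashes into hyphens.

```shell
# Derive the expected FUSE mount log path from a mount point.
# mount_log_path is a hypothetical helper name used only for illustration.
mount_log_path() {
    # Strip leading and trailing slashes, then turn the remaining slashes into hyphens.
    name=$(printf '%s' "$1" | sed -e 's#^/*##' -e 's#/*$##' | tr '/' '-')
    printf '/var/log/glusterfs/%s.log\n' "$name"
}

# Example with an assumed mount point:
mount_log_path /mnt/vmware2
```

For a volume mounted at `/mnt/vmware2`, this points at `/var/log/glusterfs/mnt-vmware2.log`.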
>>>> >>>>>>
>>>> >>>>>> -Krutika
>>>> >>>>>>
>>>> >>>>>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan
>>>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>>>> >>>>>>
>>>> >>>>>> Hello Krutika,
>>>> >>>>>>
>>>> >>>>>> Kindly, check the attached logs.
>>>> >>>>>>
>>>> >>>>>> --
>>>> >>>>>>
>>>> >>>>>> Respectfully
>>>> >>>>>> Mahdi A. Mahdi
>>>> >>>>>>
>>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>> >>>>>> Sent: Saturday, March 18, 2017 3:29:03 PM
>>>> >>>>>> To: Mahdi Adnan
>>>> >>>>>> Cc: gluster-users at gluster.org
>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Hi Mahdi,
>>>> >>>>>>
>>>> >>>>>> Could you attach the mount, brick and rebalance logs?
>>>> >>>>>>
>>>> >>>>>> -Krutika
>>>> >>>>>>
>>>> >>>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan
>>>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>>>> >>>>>>
>>>> >>>>>> Hi,
>>>> >>>>>>
>>>> >>>>>> I upgraded to Gluster 3.8.10 today and ran the add-brick procedure
>>>> >>>>>> on a volume that contains a few VMs.
>>>> >>>>>>
>>>> >>>>>> After the rebalance completed, I rebooted the VMs. Some of them ran
>>>> >>>>>> just fine, and others crashed.
>>>> >>>>>>
>>>> >>>>>> Windows boots to recovery mode, and Linux throws XFS errors and does
>>>> >>>>>> not boot.
>>>> >>>>>>
>>>> >>>>>> I ran the test again and it happened just as the first time, but I
>>>> >>>>>> noticed that only VMs doing disk I/O are affected by this bug.
>>>> >>>>>>
>>>> >>>>>> The VMs that were powered off started fine, and even the md5 of the
>>>> >>>>>> disk file did not change after the rebalance.
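The before/after checksum comparison mentioned above could look roughly like the sketch below. The helper name `file_md5` and the image path are placeholders (the thread does not give the actual paths), and it assumes `md5sum` from coreutils is available on the client.

```shell
# Print only the MD5 digest of a file (helper name is hypothetical).
file_md5() {
    md5sum "$1" | awk '{print $1}'
}

# Usage sketch: record the digest before the rebalance and compare afterwards.
# The image path below is a placeholder, not one from the thread.
# before=$(file_md5 /path/to/powered-off-vm.img)
# ... run the rebalance and wait for it to complete ...
# after=$(file_md5 /path/to/powered-off-vm.img)
# [ "$before" = "$after" ] && echo "unchanged" || echo "changed"
```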
>>>> >>>>>>
>>>> >>>>>> Can anyone else confirm this?
>>>> >>>>>>
>>>> >>>>>> Volume info:
>>>> >>>>>>
>>>> >>>>>> Volume Name: vmware2
>>>> >>>>>> Type: Distributed-Replicate
>>>> >>>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>>> >>>>>> Status: Started
>>>> >>>>>> Snapshot Count: 0
>>>> >>>>>> Number of Bricks: 22 x 2 = 44
>>>> >>>>>> Transport-type: tcp
>>>> >>>>>> Bricks:
>>>> >>>>>> Brick1: gluster01:/mnt/disk1/vmware2
>>>> >>>>>> Brick2: gluster03:/mnt/disk1/vmware2
>>>> >>>>>> Brick3: gluster02:/mnt/disk1/vmware2
>>>> >>>>>> Brick4: gluster04:/mnt/disk1/vmware2
>>>> >>>>>> Brick5: gluster01:/mnt/disk2/vmware2
>>>> >>>>>> Brick6: gluster03:/mnt/disk2/vmware2
>>>> >>>>>> Brick7: gluster02:/mnt/disk2/vmware2
>>>> >>>>>> Brick8: gluster04:/mnt/disk2/vmware2
>>>> >>>>>> Brick9: gluster01:/mnt/disk3/vmware2
>>>> >>>>>> Brick10: gluster03:/mnt/disk3/vmware2
>>>> >>>>>> Brick11: gluster02:/mnt/disk3/vmware2
>>>> >>>>>> Brick12: gluster04:/mnt/disk3/vmware2
>>>> >>>>>> Brick13: gluster01:/mnt/disk4/vmware2
>>>> >>>>>> Brick14: gluster03:/mnt/disk4/vmware2
>>>> >>>>>> Brick15: gluster02:/mnt/disk4/vmware2
>>>> >>>>>> Brick16: gluster04:/mnt/disk4/vmware2
>>>> >>>>>> Brick17: gluster01:/mnt/disk5/vmware2
>>>> >>>>>> Brick18: gluster03:/mnt/disk5/vmware2
>>>> >>>>>> Brick19: gluster02:/mnt/disk5/vmware2
>>>> >>>>>> Brick20: gluster04:/mnt/disk5/vmware2
>>>> >>>>>> Brick21: gluster01:/mnt/disk6/vmware2
>>>> >>>>>> Brick22: gluster03:/mnt/disk6/vmware2
>>>> >>>>>> Brick23: gluster02:/mnt/disk6/vmware2
>>>> >>>>>> Brick24: gluster04:/mnt/disk6/vmware2
>>>> >>>>>> Brick25: gluster01:/mnt/disk7/vmware2
>>>> >>>>>> Brick26: gluster03:/mnt/disk7/vmware2
>>>> >>>>>> Brick27: gluster02:/mnt/disk7/vmware2
>>>> >>>>>> Brick28: gluster04:/mnt/disk7/vmware2
>>>> >>>>>> Brick29: gluster01:/mnt/disk8/vmware2
>>>> >>>>>> Brick30: gluster03:/mnt/disk8/vmware2
>>>> >>>>>> Brick31: gluster02:/mnt/disk8/vmware2
>>>> >>>>>> Brick32: gluster04:/mnt/disk8/vmware2
>>>> >>>>>> Brick33: gluster01:/mnt/disk9/vmware2
>>>> >>>>>> Brick34: gluster03:/mnt/disk9/vmware2
>>>> >>>>>> Brick35: gluster02:/mnt/disk9/vmware2
>>>> >>>>>> Brick36: gluster04:/mnt/disk9/vmware2
>>>> >>>>>> Brick37: gluster01:/mnt/disk10/vmware2
>>>> >>>>>> Brick38: gluster03:/mnt/disk10/vmware2
>>>> >>>>>> Brick39: gluster02:/mnt/disk10/vmware2
>>>> >>>>>> Brick40: gluster04:/mnt/disk10/vmware2
>>>> >>>>>> Brick41: gluster01:/mnt/disk11/vmware2
>>>> >>>>>> Brick42: gluster03:/mnt/disk11/vmware2
>>>> >>>>>> Brick43: gluster02:/mnt/disk11/vmware2
>>>> >>>>>> Brick44: gluster04:/mnt/disk11/vmware2
>>>> >>>>>> Options Reconfigured:
>>>> >>>>>> cluster.server-quorum-type: server
>>>> >>>>>> nfs.disable: on
>>>> >>>>>> performance.readdir-ahead: on
>>>> >>>>>> transport.address-family: inet
>>>> >>>>>> performance.quick-read: off
>>>> >>>>>> performance.read-ahead: off
>>>> >>>>>> performance.io-cache: off
>>>> >>>>>> performance.stat-prefetch: off
>>>> >>>>>> cluster.eager-lock: enable
>>>> >>>>>> network.remote-dio: enable
>>>> >>>>>> features.shard: on
>>>> >>>>>> cluster.data-self-heal-algorithm: full
>>>> >>>>>> features.cache-invalidation: on
>>>> >>>>>> ganesha.enable: on
>>>> >>>>>> features.shard-block-size: 256MB
>>>> >>>>>> client.event-threads: 2
>>>> >>>>>> server.event-threads: 2
>>>> >>>>>> cluster.favorite-child-policy: size
>>>> >>>>>> storage.build-pgfid: off
>>>> >>>>>> network.ping-timeout: 5
>>>> >>>>>> cluster.enable-shared-storage: enable
>>>> >>>>>> nfs-ganesha: enable
>>>> >>>>>> cluster.server-quorum-ratio: 51%
>>>> >>>>>>
>>>> >>>>>> Adding bricks:
>>>> >>>>>>
>>>> >>>>>> gluster volume add-brick vmware2 replica 2
>>>> >>>>>> gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2
>>>> >>>>>> gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>>>> >>>>>>
>>>> >>>>>> Starting fix-layout:
>>>> >>>>>>
>>>> >>>>>> gluster volume rebalance vmware2 fix-layout start
>>>> >>>>>>
>>>> >>>>>> Starting rebalance:
>>>> >>>>>>
>>>> >>>>>> gluster volume rebalance vmware2 start
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> --
>>>> >>>>>>
>>>> >>>>>> Respectfully
>>>> >>>>>> Mahdi A. Mahdi
>>>> >>>>>>
>>>> >>>>>> _______________________________________________
>>>> >>>>>> Gluster-users mailing list
>>>> >>>>>> Gluster-users at gluster.org
>>>> >>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>> >>>>>>
>>>> >>>> --
>>>> >>>> Pranith
>> --
>> Pranith
>>
>

-- 
Pranith
