Pranith Kumar Karampuri
2017-Apr-27 10:58 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
I am not a DHT developer, so some of what I say could be a little wrong, but
this is what I gather. I think they found two classes of bugs in DHT:

1) Graceful fop failover when rebalance is in progress is missing for some
fops, which leads to VM pauses. I see that https://review.gluster.org/17085
got merged on the 24th on master for this, and I see patches are posted for
3.8.x for this one.

2) I think there is some work that needs to be done for dht_[f]xattrop. I
believe this is the next step that is underway.

On Thu, Apr 27, 2017 at 12:13 PM, Gandalf Corvotempesta
<gandalf.corvotempesta at gmail.com> wrote:

> Updates on this critical bug ?
>
> On 18 Apr 2017 at 8:24 PM, "Gandalf Corvotempesta"
> <gandalf.corvotempesta at gmail.com> wrote:
>
>> Any update ?
>> In addition, if this is a different bug but the "workflow" is the same
>> as the previous one, how is it possible that fixing the previous bug
>> triggered this new one ?
>>
>> Is it possible to have some details ?
>>
>> 2017-04-04 16:11 GMT+02:00 Krutika Dhananjay <kdhananj at redhat.com>:
>> > Nope. This is a different bug.
>> >
>> > -Krutika
>> >
>> > On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta
>> > <gandalf.corvotempesta at gmail.com> wrote:
>> >>
>> >> This is good news.
>> >> Is this related to the previously fixed bug?
>> >>
>> >> On 3 Apr 2017 at 10:22 AM, "Krutika Dhananjay" <kdhananj at redhat.com>
>> >> wrote:
>> >>>
>> >>> So Raghavendra has an RCA for this issue.
>> >>>
>> >>> Copy-pasting his comment here:
>> >>>
>> >>> <RCA>
>> >>>
>> >>> Following is a rough algorithm of shard_writev:
>> >>>
>> >>> 1. Based on the offset, calculate the shards touched by the current write.
>> >>> 2. Look for inodes corresponding to these shard files in the itable.
>> >>> 3. If one or more inodes are missing from the itable, issue mknod for the
>> >>>    corresponding shard files and ignore EEXIST in the cbk.
>> >>> 4. Resume writes on the respective shards.
>> >>>
>> >>> Now, imagine a write which falls on an existing "shard_file". For the
>> >>> sake of discussion let's consider a distribute of three subvols - s1, s2, s3.
>> >>>
>> >>> 1. "shard_file" hashes to subvolume s2 and is present on s2.
>> >>> 2. Add a subvolume s4 and initiate a fix-layout. The layout of ".shard"
>> >>>    is fixed to include s4 and the hash ranges are changed.
>> >>> 3. A write that touches "shard_file" is issued.
>> >>> 4. The inode for "shard_file" is not present in the itable after a graph
>> >>>    switch and features/shard issues an mknod.
>> >>> 5. With the new layout of .shard, let's say "shard_file" hashes to s3 and
>> >>>    mknod (shard_file) on s3 succeeds. But the shard_file is already
>> >>>    present on s2.
>> >>>
>> >>> So, we have two files on two different subvols of DHT representing the
>> >>> same shard, and this will lead to corruption.
>> >>>
>> >>> </RCA>
>> >>>
>> >>> Raghavendra will be sending out a patch in DHT to fix this issue.
>> >>>
>> >>> -Krutika
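To make the RCA above concrete, here is a small illustrative sketch in Python. It is not GlusterFS code: toy_hash, pick_subvol and the shard name are simplified stand-ins, and the only detail taken from the thread is the 256MB shard-block-size shown in the volume info further down. It sketches step 1 of shard_writev (mapping a write's offset to shard indices) and why adding a subvolume and fixing the layout can make an existing shard's name hash to a different subvolume, which is the condition that produces the duplicate shard described above.

    import hashlib

    SHARD_SIZE = 256 * 1024 * 1024  # features.shard-block-size reported for this volume

    def shards_touched(offset, length):
        # Step 1 of shard_writev in the RCA: which shard indices does this write cover?
        first = offset // SHARD_SIZE
        last = (offset + length - 1) // SHARD_SIZE
        return list(range(first, last + 1))

    def toy_hash(name):
        # Stand-in for DHT's real name hash; only the "name -> number" idea matters here.
        return int(hashlib.md5(name.encode()).hexdigest()[:8], 16)

    def pick_subvol(name, subvols):
        # Split the 32-bit hash space evenly among subvols, as a stand-in for the
        # layout ranges assigned to the subvolumes of .shard.
        width = 2 ** 32 // len(subvols)
        return subvols[min(toy_hash(name) // width, len(subvols) - 1)]

    shard_name = "example-gfid.7"  # hypothetical entry name under .shard
    print(shards_touched(offset=7 * SHARD_SIZE, length=1048576))  # -> [7]
    print(pick_subvol(shard_name, ["s1", "s2", "s3"]))        # layout before fix-layout
    print(pick_subvol(shard_name, ["s1", "s2", "s3", "s4"]))  # layout after adding s4

If the last two calls return different subvolumes, an mknod issued after the layout change lands somewhere other than where the shard already exists, giving the two copies of the same shard that the RCA describes.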
>> >>>
>> >>>
>> >>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri
>> >>> <pkarampu at redhat.com> wrote:
>> >>>>
>> >>>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan
>> >>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> Do you guys have any update regarding this issue ?
>> >>>>
>> >>>> I do not actively work on this issue, so I do not have an accurate
>> >>>> update, but what I heard from Krutika and Raghavendra (who works on
>> >>>> DHT) is: Krutika debugged initially and found that the issue seems more
>> >>>> likely to be in DHT. Satheesaran, who helped us recreate this issue in
>> >>>> the lab, found that just fix-layout without rebalance also caused the
>> >>>> corruption 1 out of 3 times. Raghavendra came up with a possible RCA for
>> >>>> why this can happen. Raghavendra (CCed) would be the right person to
>> >>>> provide an accurate update.
>> >>>>>
>> >>>>> --
>> >>>>> Respectfully
>> >>>>> Mahdi A. Mahdi
>> >>>>>
>> >>>>> ________________________________
>> >>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>> Sent: Tuesday, March 21, 2017 3:02:55 PM
>> >>>>> To: Mahdi Adnan
>> >>>>> Cc: Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai;
>> >>>>> gluster-users at gluster.org List
>> >>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> So it looks like Satheesaran managed to recreate this issue. We will be
>> >>>>> seeking his help in debugging this. It will be easier that way.
>> >>>>>
>> >>>>> -Krutika
>> >>>>>
>> >>>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan
>> >>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> Hello and thank you for your email.
>> >>>>>> Actually no, I didn't check the GFIDs of the VMs.
>> >>>>>> If this will help, I can set up a new test cluster and get all the
>> >>>>>> data you need.
>> >>>>>>
>> >>>>>> Get Outlook for Android
>> >>>>>>
>> >>>>>> From: Nithya Balachandran
>> >>>>>> Sent: Monday, March 20, 20:57
>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>> To: Krutika Dhananjay
>> >>>>>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai,
>> >>>>>> gluster-users at gluster.org List
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> Do you know the GFIDs of the VM images which were corrupted?
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Nithya
>> >>>>>>
>> >>>>>> On 20 March 2017 at 20:37, Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>> I looked at the logs.
>> >>>>>>
>> >>>>>> From the time the new graph (since the add-brick command you shared,
>> >>>>>> where bricks 41 through 44 are added) is switched to (line 3011 onwards
>> >>>>>> in nfs-gfapi.log), I see the following kinds of errors:
>> >>>>>>
>> >>>>>> 1. Lookups to a bunch of files failed with ENOENT on both replicas,
>> >>>>>> which protocol/client converts to ESTALE. I am guessing these entries
>> >>>>>> got migrated to other subvolumes, leading to 'No such file or directory'
>> >>>>>> errors.
>> >>>>>>
>> >>>>>> DHT and thereafter shard get the same error code and log the following:
>> >>>>>>
>> >>>>>> [2017-03-17 14:04:26.353444] E [MSGID: 109040]
>> >>>>>> [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht:
>> >>>>>> <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file
>> >>>>>> on vmware2-dht [Stale file handle]
>> >>>>>> [2017-03-17 14:04:26.353528] E [MSGID: 133014]
>> >>>>>> [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed:
>> >>>>>> a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
>> >>>>>>
>> >>>>>> which is fine.
>> >>>>>> 2. The other kind are from AFR logging of possible split-brain, which
>> >>>>>> I suppose are harmless too:
>> >>>>>>
>> >>>>>> [2017-03-17 14:23:36.968883] W [MSGID: 108008]
>> >>>>>> [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable
>> >>>>>> subvolume -1 found with event generation 2 for gfid
>> >>>>>> 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
>> >>>>>>
>> >>>>>> Since you are saying the bug is hit only on VMs that are undergoing IO
>> >>>>>> while rebalance is running (as opposed to those that remained powered
>> >>>>>> off), rebalance + IO could be causing some issues.
>> >>>>>>
>> >>>>>> CC'ing DHT devs.
>> >>>>>>
>> >>>>>> Raghavendra/Nithya/Susant,
>> >>>>>>
>> >>>>>> Could you take a look?
>> >>>>>>
>> >>>>>> -Krutika
>> >>>>>>
>> >>>>>> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan
>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> Thank you for your email mate.
>> >>>>>>
>> >>>>>> Yes, I'm aware of this, but to save costs I chose replica 2; this
>> >>>>>> cluster is all flash.
>> >>>>>>
>> >>>>>> In version 3.7.x I had issues with ping timeout: if one host went down
>> >>>>>> for a few seconds, the whole cluster hung and became unavailable, so to
>> >>>>>> avoid this I adjusted the ping timeout to 5 seconds.
>> >>>>>>
>> >>>>>> As for choosing Ganesha over gfapi, VMware does not support Gluster
>> >>>>>> (FUSE or gfapi), so I'm stuck with NFS for this volume.
>> >>>>>>
>> >>>>>> The other volume is mounted using gfapi in an oVirt cluster.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Respectfully
>> >>>>>> Mahdi A. Mahdi
>> >>>>>>
>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>>> Sent: Sunday, March 19, 2017 2:01:49 PM
>> >>>>>> To: Mahdi Adnan
>> >>>>>> Cc: gluster-users at gluster.org
>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>>
>> >>>>>> While I'm still going through the logs, I just wanted to point out a
>> >>>>>> couple of things:
>> >>>>>>
>> >>>>>> 1. It is recommended that you use 3-way replication (replica count 3)
>> >>>>>> for the VM store use case.
>> >>>>>>
>> >>>>>> 2. network.ping-timeout at 5 seconds is way too low. Please change it
>> >>>>>> to 30.
>> >>>>>>
>> >>>>>> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>> >>>>>>
>> >>>>>> Will get back with anything else I might find or more questions if I
>> >>>>>> have any.
>> >>>>>>
>> >>>>>> -Krutika
>> >>>>>>
>> >>>>>> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan
>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> Thanks mate,
>> >>>>>>
>> >>>>>> Kindly, check the attachment.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Respectfully
>> >>>>>> Mahdi A. Mahdi
>> >>>>>>
>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>>> Sent: Sunday, March 19, 2017 10:00:22 AM
>> >>>>>> To: Mahdi Adnan
>> >>>>>> Cc: gluster-users at gluster.org
>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>>
>> >>>>>> In that case could you share the ganesha-gfapi logs?
>> >>>>>>
>> >>>>>> -Krutika
>> >>>>>>
>> >>>>>> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan
>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> I have two volumes: one is mounted using libgfapi for the oVirt mount;
>> >>>>>> the other one is exported via NFS-Ganesha for VMware, which is the one
>> >>>>>> I'm testing now.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Respectfully
>> >>>>>> Mahdi A. Mahdi
>> >>>>>>
>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>>> Sent: Sunday, March 19, 2017 8:02:19 AM
>> >>>>>> To: Mahdi Adnan
>> >>>>>> Cc: gluster-users at gluster.org
>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>>
>> >>>>>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan
>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> Kindly, check the attached new log file. I don't know if it's helpful
>> >>>>>> or not, but I couldn't find the log with the name you just described.
>> >>>>>>
>> >>>>>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it
>> >>>>>> NFS?
>> >>>>>>
>> >>>>>> -Krutika
>> >>>>>>
>> >>>>>> --
>> >>>>>> Respectfully
>> >>>>>> Mahdi A. Mahdi
>> >>>>>>
>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>>> Sent: Saturday, March 18, 2017 6:10:40 PM
>> >>>>>> To: Mahdi Adnan
>> >>>>>> Cc: gluster-users at gluster.org
>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>>
>> >>>>>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the
>> >>>>>> fuse mount logs? It should be right under the /var/log/glusterfs/
>> >>>>>> directory, named after the mount point name, only hyphenated.
>> >>>>>>
>> >>>>>> -Krutika
>> >>>>>>
>> >>>>>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan
>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> Hello Krutika,
>> >>>>>>
>> >>>>>> Kindly, check the attached logs.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Respectfully
>> >>>>>> Mahdi A. Mahdi
>> >>>>>>
>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> >>>>>> Sent: Saturday, March 18, 2017 3:29:03 PM
>> >>>>>> To: Mahdi Adnan
>> >>>>>> Cc: gluster-users at gluster.org
>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> >>>>>>
>> >>>>>> Hi Mahdi,
>> >>>>>>
>> >>>>>> Could you attach mount, brick and rebalance logs?
>> >>>>>>
>> >>>>>> -Krutika
>> >>>>>>
>> >>>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan
>> >>>>>> <mahdi.adnan at outlook.com> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I have upgraded to Gluster 3.8.10 today and ran the add-brick
>> >>>>>> procedure on a volume containing a few VMs.
>> >>>>>>
>> >>>>>> After the completion of rebalance, I rebooted the VMs; some of them
>> >>>>>> ran just fine, and others just crashed.
>> >>>>>>
>> >>>>>> Windows boots to recovery mode, and Linux throws xfs errors and does
>> >>>>>> not boot.
>> >>>>>>
>> >>>>>> I ran the test again and it happened just as the first time, but I
>> >>>>>> have noticed that only VMs doing disk IO are affected by this bug.
>> >>>>>>
>> >>>>>> The VMs in powered-off mode started fine, and even the md5 of the disk
>> >>>>>> file did not change after the rebalance.
>> >>>>>>
>> >>>>>> Can anyone else confirm this ?
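As an aside on Nithya's earlier question about the GFIDs of the corrupted images: assuming direct access to a brick, a file's GFID can usually be read from its trusted.gfid extended attribute on the brick, along the lines of

    getfattr -n trusted.gfid -e hex /mnt/disk1/vmware2/<path-to-image>

where the path after the brick mount point is only illustrative.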
>> >>>>>>
>> >>>>>> Volume info:
>> >>>>>>
>> >>>>>> Volume Name: vmware2
>> >>>>>> Type: Distributed-Replicate
>> >>>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>> >>>>>> Status: Started
>> >>>>>> Snapshot Count: 0
>> >>>>>> Number of Bricks: 22 x 2 = 44
>> >>>>>> Transport-type: tcp
>> >>>>>> Bricks:
>> >>>>>> Brick1: gluster01:/mnt/disk1/vmware2
>> >>>>>> Brick2: gluster03:/mnt/disk1/vmware2
>> >>>>>> Brick3: gluster02:/mnt/disk1/vmware2
>> >>>>>> Brick4: gluster04:/mnt/disk1/vmware2
>> >>>>>> Brick5: gluster01:/mnt/disk2/vmware2
>> >>>>>> Brick6: gluster03:/mnt/disk2/vmware2
>> >>>>>> Brick7: gluster02:/mnt/disk2/vmware2
>> >>>>>> Brick8: gluster04:/mnt/disk2/vmware2
>> >>>>>> Brick9: gluster01:/mnt/disk3/vmware2
>> >>>>>> Brick10: gluster03:/mnt/disk3/vmware2
>> >>>>>> Brick11: gluster02:/mnt/disk3/vmware2
>> >>>>>> Brick12: gluster04:/mnt/disk3/vmware2
>> >>>>>> Brick13: gluster01:/mnt/disk4/vmware2
>> >>>>>> Brick14: gluster03:/mnt/disk4/vmware2
>> >>>>>> Brick15: gluster02:/mnt/disk4/vmware2
>> >>>>>> Brick16: gluster04:/mnt/disk4/vmware2
>> >>>>>> Brick17: gluster01:/mnt/disk5/vmware2
>> >>>>>> Brick18: gluster03:/mnt/disk5/vmware2
>> >>>>>> Brick19: gluster02:/mnt/disk5/vmware2
>> >>>>>> Brick20: gluster04:/mnt/disk5/vmware2
>> >>>>>> Brick21: gluster01:/mnt/disk6/vmware2
>> >>>>>> Brick22: gluster03:/mnt/disk6/vmware2
>> >>>>>> Brick23: gluster02:/mnt/disk6/vmware2
>> >>>>>> Brick24: gluster04:/mnt/disk6/vmware2
>> >>>>>> Brick25: gluster01:/mnt/disk7/vmware2
>> >>>>>> Brick26: gluster03:/mnt/disk7/vmware2
>> >>>>>> Brick27: gluster02:/mnt/disk7/vmware2
>> >>>>>> Brick28: gluster04:/mnt/disk7/vmware2
>> >>>>>> Brick29: gluster01:/mnt/disk8/vmware2
>> >>>>>> Brick30: gluster03:/mnt/disk8/vmware2
>> >>>>>> Brick31: gluster02:/mnt/disk8/vmware2
>> >>>>>> Brick32: gluster04:/mnt/disk8/vmware2
>> >>>>>> Brick33: gluster01:/mnt/disk9/vmware2
>> >>>>>> Brick34: gluster03:/mnt/disk9/vmware2
>> >>>>>> Brick35: gluster02:/mnt/disk9/vmware2
>> >>>>>> Brick36: gluster04:/mnt/disk9/vmware2
>> >>>>>> Brick37: gluster01:/mnt/disk10/vmware2
>> >>>>>> Brick38: gluster03:/mnt/disk10/vmware2
>> >>>>>> Brick39: gluster02:/mnt/disk10/vmware2
>> >>>>>> Brick40: gluster04:/mnt/disk10/vmware2
>> >>>>>> Brick41: gluster01:/mnt/disk11/vmware2
>> >>>>>> Brick42: gluster03:/mnt/disk11/vmware2
>> >>>>>> Brick43: gluster02:/mnt/disk11/vmware2
>> >>>>>> Brick44: gluster04:/mnt/disk11/vmware2
>> >>>>>> Options Reconfigured:
>> >>>>>> cluster.server-quorum-type: server
>> >>>>>> nfs.disable: on
>> >>>>>> performance.readdir-ahead: on
>> >>>>>> transport.address-family: inet
>> >>>>>> performance.quick-read: off
>> >>>>>> performance.read-ahead: off
>> >>>>>> performance.io-cache: off
>> >>>>>> performance.stat-prefetch: off
>> >>>>>> cluster.eager-lock: enable
>> >>>>>> network.remote-dio: enable
>> >>>>>> features.shard: on
>> >>>>>> cluster.data-self-heal-algorithm: full
>> >>>>>> features.cache-invalidation: on
>> >>>>>> ganesha.enable: on
>> >>>>>> features.shard-block-size: 256MB
>> >>>>>> client.event-threads: 2
>> >>>>>> server.event-threads: 2
>> >>>>>> cluster.favorite-child-policy: size
>> >>>>>> storage.build-pgfid: off
>> >>>>>> network.ping-timeout: 5
>> >>>>>> cluster.enable-shared-storage: enable
>> >>>>>> nfs-ganesha: enable
>> >>>>>> cluster.server-quorum-ratio: 51%
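Note that network.ping-timeout is still 5 in the options above, while Krutika recommended raising it to 30 earlier in the thread; assuming the same volume name, that change would presumably be applied with

    gluster volume set vmware2 network.ping-timeout 30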
>> >>>>>>
>> >>>>>> Adding bricks:
>> >>>>>>
>> >>>>>> gluster volume add-brick vmware2 replica 2
>> >>>>>> gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2
>> >>>>>> gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>> >>>>>>
>> >>>>>> Starting fix-layout:
>> >>>>>>
>> >>>>>> gluster volume rebalance vmware2 fix-layout start
>> >>>>>>
>> >>>>>> Starting rebalance:
>> >>>>>>
>> >>>>>> gluster volume rebalance vmware2 start
>> >>>>>>
>> >>>>>> --
>> >>>>>> Respectfully
>> >>>>>> Mahdi A. Mahdi
>> >>>>>>
>> >>>>>> _______________________________________________
>> >>>>>> Gluster-users mailing list
>> >>>>>> Gluster-users at gluster.org
>> >>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> Gluster-users mailing list
>> >>>>> Gluster-users at gluster.org
>> >>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>> >>>>
>> >>>> --
>> >>>> Pranith
>> >>>
>> >>> _______________________________________________
>> >>> Gluster-users mailing list
>> >>> Gluster-users at gluster.org
>> >>> http://lists.gluster.org/mailman/listinfo/gluster-users

-- 
Pranith
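For anyone retracing the add-brick / fix-layout / rebalance sequence shown above, the progress and any failures of the rebalance can be monitored while it runs with the standard status command, for example

    gluster volume rebalance vmware2 status

alongside the rebalance log that was requested earlier in the thread.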
Gandalf Corvotempesta
2017-Apr-27 11:00 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
I think we are talking about a different bug.

On 27 Apr 2017 at 12:58 PM, "Pranith Kumar Karampuri" <pkarampu at redhat.com>
wrote:

> I am not a DHT developer, so some of what I say could be a little wrong,
> but this is what I gather. I think they found two classes of bugs in DHT:
>
> 1) Graceful fop failover when rebalance is in progress is missing for some
> fops, which leads to VM pauses. I see that https://review.gluster.org/17085
> got merged on the 24th on master for this, and I see patches are posted for
> 3.8.x for this one.
>
> 2) I think there is some work that needs to be done for dht_[f]xattrop. I
> believe this is the next step that is underway.