Serkan Çoban
2017-Apr-27 11:21 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
I think this is the fix Gandalf is asking for:
https://github.com/gluster/glusterfs/commit/6e3054b42f9aef1e35b493fbb002ec47e1ba27ce

On Thu, Apr 27, 2017 at 2:03 PM, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
> I am very positive about the two things I told you. These are the latest things that happened for VM corruption with rebalance.
>
> On Thu, Apr 27, 2017 at 4:30 PM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:
>>
>> I think we are talking about a different bug.
>>
>> On 27 Apr 2017 at 12:58 PM, "Pranith Kumar Karampuri" <pkarampu at redhat.com> wrote:
>>>
>>> I am not a DHT developer, so some of what I say could be a little wrong, but this is what I gather.
>>> I think they found 2 classes of bugs in DHT:
>>> 1) Graceful fop failover when rebalance is in progress is missing for some fops, which leads to VM pauses.
>>>
>>> I see that https://review.gluster.org/17085 got merged on the 24th on master for this. I see patches are posted for 3.8.x for this one.
>>>
>>> 2) I think there is some work that needs to be done for dht_[f]xattrop. I believe this is the next step that is underway.
>>>
>>> On Thu, Apr 27, 2017 at 12:13 PM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:
>>>>
>>>> Updates on this critical bug?
>>>>
>>>> On 18 Apr 2017 at 8:24 PM, "Gandalf Corvotempesta" <gandalf.corvotempesta at gmail.com> wrote:
>>>>>
>>>>> Any update?
>>>>> In addition, if this is a different bug but the "workflow" is the same as the previous one, how is it possible that fixing the previous bug triggered this new one?
>>>>>
>>>>> Is it possible to have some details?
>>>>>
>>>>> 2017-04-04 16:11 GMT+02:00 Krutika Dhananjay <kdhananj at redhat.com>:
>>>>> > Nope. This is a different bug.
>>>>> >
>>>>> > -Krutika
>>>>> >
>>>>> > On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:
>>>>> >>
>>>>> >> This is good news.
>>>>> >> Is this related to the previously fixed bug?
>>>>> >>
>>>>> >> On 3 Apr 2017 at 10:22 AM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:
>>>>> >>>
>>>>> >>> So Raghavendra has an RCA for this issue.
>>>>> >>>
>>>>> >>> Copy-pasting his comment here:
>>>>> >>>
>>>>> >>> <RCA>
>>>>> >>>
>>>>> >>> The following is a rough algorithm of shard_writev:
>>>>> >>>
>>>>> >>> 1. Based on the offset, calculate the shards touched by the current write.
>>>>> >>> 2. Look for inodes corresponding to these shard files in the itable.
>>>>> >>> 3. If one or more inodes are missing from the itable, issue mknod for the corresponding shard files and ignore EEXIST in cbk.
>>>>> >>> 4. Resume writes on the respective shards.
>>>>> >>>
>>>>> >>> Now, imagine a write which falls on an existing "shard_file". For the sake of discussion, let's consider a distribute of three subvols - s1, s2, s3:
>>>>> >>>
>>>>> >>> 1. "shard_file" hashes to subvolume s2 and is present on s2.
>>>>> >>> 2. Add a subvolume s4 and initiate a fix-layout. The layout of ".shard" is fixed to include s4 and hash ranges are changed.
>>>>> >>> 3. A write that touches "shard_file" is issued.
>>>>> >>> 4. The inode for "shard_file" is not present in the itable after a graph switch, and features/shard issues an mknod.
>>>>> >>> 5. With the new layout of .shard, let's say "shard_file" hashes to s3 and mknod (shard_file) on s3 succeeds. But the shard_file is already present on s2.
>>>>> >>>
>>>>> >>> So we have two files on two different subvols of dht representing the same shard, and this will lead to corruption.
>>>>> >>>
>>>>> >>> </RCA>
>>>>> >>>
>>>>> >>> Raghavendra will be sending out a patch in DHT to fix this issue.
>>>>> >>>
>>>>> >>> -Krutika
>>>>> >>>
>>>>> >>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
>>>>> >>>>
>>>>> >>>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>
>>>>> >>>>> Hi,
>>>>> >>>>>
>>>>> >>>>> Do you guys have any update regarding this issue?
>>>>> >>>>
>>>>> >>>> I do not actively work on this issue, so I do not have an accurate update, but what I heard from Krutika and Raghavendra (who works on DHT) is: Krutika debugged initially and found that the issue seems more likely to be in DHT. Satheesaran, who helped us recreate this issue in the lab, found that just fix-layout without rebalance also caused the corruption 1 out of 3 times. Raghavendra came up with a possible RCA for why this can happen. Raghavendra (CCed) would be the right person to provide an accurate update.
>>>>> >>>>>
>>>>> >>>>> --
>>>>> >>>>> Respectfully
>>>>> >>>>> Mahdi A. Mahdi
>>>>> >>>>>
>>>>> >>>>> ________________________________
>>>>> >>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> >>>>> Sent: Tuesday, March 21, 2017 3:02:55 PM
>>>>> >>>>> To: Mahdi Adnan
>>>>> >>>>> Cc: Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai; gluster-users at gluster.org List
>>>>> >>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>
>>>>> >>>>> Hi,
>>>>> >>>>>
>>>>> >>>>> So it looks like Satheesaran managed to recreate this issue. We will be seeking his help in debugging this. It will be easier that way.
>>>>> >>>>>
>>>>> >>>>> -Krutika
>>>>> >>>>>
>>>>> >>>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Hello, and thank you for your email.
>>>>> >>>>>> Actually no, I didn't check the gfid of the VMs.
>>>>> >>>>>> If this will help, I can set up a new test cluster and get all the data you need.
>>>>> >>>>>>
>>>>> >>>>>> Get Outlook for Android
>>>>> >>>>>>
>>>>> >>>>>> From: Nithya Balachandran
>>>>> >>>>>> Sent: Monday, March 20, 20:57
>>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>> To: Krutika Dhananjay
>>>>> >>>>>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai, gluster-users at gluster.org List
>>>>> >>>>>>
>>>>> >>>>>> Hi,
>>>>> >>>>>>
>>>>> >>>>>> Do you know the GFIDs of the VM images which were corrupted?
>>>>> >>>>>>
>>>>> >>>>>> Regards,
>>>>> >>>>>> Nithya
>>>>> >>>>>>
>>>>> >>>>>> On 20 March 2017 at 20:37, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> I looked at the logs.
>>>>> >>>>>>
>>>>> >>>>>> From the time the new graph (since the add-brick command you shared, where bricks 41 through 44 are added) is switched to (line 3011 onwards in nfs-gfapi.log), I see the following kinds of errors:
>>>>> >>>>>>
>>>>> >>>>>> 1. Lookups to a bunch of files failed with ENOENT on both replicas, which protocol/client converts to ESTALE. I am guessing these entries got migrated to other subvolumes, leading to 'No such file or directory' errors.
>>>>> >>>>>>
>>>>> >>>>>> DHT and thereafter shard get the same error code and log the following:
>>>>> >>>>>>
>>>>> >>>>>> 0 [2017-03-17 14:04:26.353444] E [MSGID: 109040] [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht: <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file on vmware2-dht [Stale file handle]
>>>>> >>>>>> 1 [2017-03-17 14:04:26.353528] E [MSGID: 133014] [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed: a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
>>>>> >>>>>>
>>>>> >>>>>> which is fine.
>>>>> >>>>>>
>>>>> >>>>>> 2. The other kind are from AFR logging of possible split-brain, which I suppose are harmless too:
>>>>> >>>>>>
>>>>> >>>>>> [2017-03-17 14:23:36.968883] W [MSGID: 108008] [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable subvolume -1 found with event generation 2 for gfid 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
>>>>> >>>>>>
>>>>> >>>>>> Since you are saying the bug is hit only on VMs that are undergoing IO while rebalance is running (as opposed to those that remained powered off), rebalance + IO could be causing some issues.
>>>>> >>>>>>
>>>>> >>>>>> CC'ing DHT devs.
>>>>> >>>>>>
>>>>> >>>>>> Raghavendra/Nithya/Susant,
>>>>> >>>>>>
>>>>> >>>>>> Could you take a look?
>>>>> >>>>>>
>>>>> >>>>>> -Krutika
>>>>> >>>>>>
>>>>> >>>>>> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Thank you for your email, mate.
>>>>> >>>>>>
>>>>> >>>>>> Yes, I'm aware of this, but to save costs I chose replica 2; this cluster is all flash.
>>>>> >>>>>>
>>>>> >>>>>> In version 3.7.x I had issues with ping timeout: if one host went down for a few seconds, the whole cluster hung and became unavailable. To avoid this I adjusted the ping timeout to 5 seconds.
>>>>> >>>>>>
>>>>> >>>>>> As for choosing Ganesha over gfapi, VMware does not support Gluster (FUSE or gfapi), so I'm stuck with NFS for this volume.
>>>>> >>>>>>
>>>>> >>>>>> The other volume is mounted using gfapi in an oVirt cluster.
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Respectfully
>>>>> >>>>>> Mahdi A. Mahdi
>>>>> >>>>>>
>>>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> >>>>>> Sent: Sunday, March 19, 2017 2:01:49 PM
>>>>> >>>>>> To: Mahdi Adnan
>>>>> >>>>>> Cc: gluster-users at gluster.org
>>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>>
>>>>> >>>>>> While I'm still going through the logs, I just wanted to point out a couple of things:
>>>>> >>>>>>
>>>>> >>>>>> 1. It is recommended that you use 3-way replication (replica count 3) for the VM store use case.
>>>>> >>>>>>
>>>>> >>>>>> 2. network.ping-timeout at 5 seconds is way too low. Please change it to 30.
>>>>> >>>>>>
>>>>> >>>>>> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>>>>> >>>>>>
>>>>> >>>>>> Will get back with anything else I might find, or more questions if I have any.
>>>>> >>>>>>
>>>>> >>>>>> -Krutika
>>>>> >>>>>>
>>>>> >>>>>> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Thanks mate,
>>>>> >>>>>>
>>>>> >>>>>> Kindly, check the attachment.
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Respectfully
>>>>> >>>>>> Mahdi A. Mahdi
>>>>> >>>>>>
>>>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> >>>>>> Sent: Sunday, March 19, 2017 10:00:22 AM
>>>>> >>>>>> To: Mahdi Adnan
>>>>> >>>>>> Cc: gluster-users at gluster.org
>>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>>
>>>>> >>>>>> In that case could you share the ganesha-gfapi logs?
>>>>> >>>>>>
>>>>> >>>>>> -Krutika
>>>>> >>>>>>
>>>>> >>>>>> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> I have two volumes: one is mounted using libgfapi for the oVirt mount, and the other one is exported via NFS-Ganesha for VMware, which is the one I'm testing now.
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Respectfully
>>>>> >>>>>> Mahdi A. Mahdi
>>>>> >>>>>>
>>>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> >>>>>> Sent: Sunday, March 19, 2017 8:02:19 AM
>>>>> >>>>>> To: Mahdi Adnan
>>>>> >>>>>> Cc: gluster-users at gluster.org
>>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>>
>>>>> >>>>>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Kindly, check the attached new log file; I don't know if it's helpful or not, but I couldn't find the log with the name you just described.
>>>>> >>>>>>
>>>>> >>>>>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it NFS?
>>>>> >>>>>>
>>>>> >>>>>> -Krutika
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Respectfully
>>>>> >>>>>> Mahdi A. Mahdi
>>>>> >>>>>>
>>>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> >>>>>> Sent: Saturday, March 18, 2017 6:10:40 PM
>>>>> >>>>>> To: Mahdi Adnan
>>>>> >>>>>> Cc: gluster-users at gluster.org
>>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>>
>>>>> >>>>>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the fuse mount logs? They should be right under the /var/log/glusterfs/ directory, named after the mount point name, only hyphenated.
>>>>> >>>>>>
>>>>> >>>>>> -Krutika
>>>>> >>>>>>
>>>>> >>>>>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Hello Krutika,
>>>>> >>>>>>
>>>>> >>>>>> Kindly, check the attached logs.
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Respectfully
>>>>> >>>>>> Mahdi A. Mahdi
>>>>> >>>>>>
>>>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> >>>>>> Sent: Saturday, March 18, 2017 3:29:03 PM
>>>>> >>>>>> To: Mahdi Adnan
>>>>> >>>>>> Cc: gluster-users at gluster.org
>>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>>
>>>>> >>>>>> Hi Mahdi,
>>>>> >>>>>>
>>>>> >>>>>> Could you attach mount, brick and rebalance logs?
>>>>> >>>>>>
>>>>> >>>>>> -Krutika
>>>>> >>>>>>
>>>>> >>>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Hi,
>>>>> >>>>>>
>>>>> >>>>>> I have upgraded to Gluster 3.8.10 today and ran the add-brick procedure on a volume containing a few VMs.
>>>>> >>>>>> After the completion of rebalance, I rebooted the VMs; some of them ran just fine, and others just crashed.
>>>>> >>>>>> Windows boots to recovery mode, and Linux throws xfs errors and does not boot.
>>>>> >>>>>> I ran the test again and it happened just as the first time, but I have noticed that only VMs doing disk IO are affected by this bug.
>>>>> >>>>>> The VMs in powered-off mode started fine, and even the md5 of the disk file did not change after the rebalance.
>>>>> >>>>>>
>>>>> >>>>>> Can anyone else confirm this?
>>>>> >>>>>>
>>>>> >>>>>> Volume info:
>>>>> >>>>>>
>>>>> >>>>>> Volume Name: vmware2
>>>>> >>>>>> Type: Distributed-Replicate
>>>>> >>>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>>>> >>>>>> Status: Started
>>>>> >>>>>> Snapshot Count: 0
>>>>> >>>>>> Number of Bricks: 22 x 2 = 44
>>>>> >>>>>> Transport-type: tcp
>>>>> >>>>>> Bricks:
>>>>> >>>>>> Brick1: gluster01:/mnt/disk1/vmware2
>>>>> >>>>>> Brick2: gluster03:/mnt/disk1/vmware2
>>>>> >>>>>> Brick3: gluster02:/mnt/disk1/vmware2
>>>>> >>>>>> Brick4: gluster04:/mnt/disk1/vmware2
>>>>> >>>>>> Brick5: gluster01:/mnt/disk2/vmware2
>>>>> >>>>>> Brick6: gluster03:/mnt/disk2/vmware2
>>>>> >>>>>> Brick7: gluster02:/mnt/disk2/vmware2
>>>>> >>>>>> Brick8: gluster04:/mnt/disk2/vmware2
>>>>> >>>>>> Brick9: gluster01:/mnt/disk3/vmware2
>>>>> >>>>>> Brick10: gluster03:/mnt/disk3/vmware2
>>>>> >>>>>> Brick11: gluster02:/mnt/disk3/vmware2
>>>>> >>>>>> Brick12: gluster04:/mnt/disk3/vmware2
>>>>> >>>>>> Brick13: gluster01:/mnt/disk4/vmware2
>>>>> >>>>>> Brick14: gluster03:/mnt/disk4/vmware2
>>>>> >>>>>> Brick15: gluster02:/mnt/disk4/vmware2
>>>>> >>>>>> Brick16: gluster04:/mnt/disk4/vmware2
>>>>> >>>>>> Brick17: gluster01:/mnt/disk5/vmware2
>>>>> >>>>>> Brick18: gluster03:/mnt/disk5/vmware2
>>>>> >>>>>> Brick19: gluster02:/mnt/disk5/vmware2
>>>>> >>>>>> Brick20: gluster04:/mnt/disk5/vmware2
>>>>> >>>>>> Brick21: gluster01:/mnt/disk6/vmware2
>>>>> >>>>>> Brick22: gluster03:/mnt/disk6/vmware2
>>>>> >>>>>> Brick23: gluster02:/mnt/disk6/vmware2
>>>>> >>>>>> Brick24: gluster04:/mnt/disk6/vmware2
>>>>> >>>>>> Brick25: gluster01:/mnt/disk7/vmware2
>>>>> >>>>>> Brick26: gluster03:/mnt/disk7/vmware2
>>>>> >>>>>> Brick27: gluster02:/mnt/disk7/vmware2
>>>>> >>>>>> Brick28: gluster04:/mnt/disk7/vmware2
>>>>> >>>>>> Brick29: gluster01:/mnt/disk8/vmware2
>>>>> >>>>>> Brick30: gluster03:/mnt/disk8/vmware2
>>>>> >>>>>> Brick31: gluster02:/mnt/disk8/vmware2
>>>>> >>>>>> Brick32: gluster04:/mnt/disk8/vmware2
>>>>> >>>>>> Brick33: gluster01:/mnt/disk9/vmware2
>>>>> >>>>>> Brick34: gluster03:/mnt/disk9/vmware2
>>>>> >>>>>> Brick35: gluster02:/mnt/disk9/vmware2
>>>>> >>>>>> Brick36: gluster04:/mnt/disk9/vmware2
>>>>> >>>>>> Brick37: gluster01:/mnt/disk10/vmware2
>>>>> >>>>>> Brick38: gluster03:/mnt/disk10/vmware2
>>>>> >>>>>> Brick39: gluster02:/mnt/disk10/vmware2
>>>>> >>>>>> Brick40: gluster04:/mnt/disk10/vmware2
>>>>> >>>>>> Brick41: gluster01:/mnt/disk11/vmware2
>>>>> >>>>>> Brick42: gluster03:/mnt/disk11/vmware2
>>>>> >>>>>> Brick43: gluster02:/mnt/disk11/vmware2
>>>>> >>>>>> Brick44: gluster04:/mnt/disk11/vmware2
>>>>> >>>>>> Options Reconfigured:
>>>>> >>>>>> cluster.server-quorum-type: server
>>>>> >>>>>> nfs.disable: on
>>>>> >>>>>> performance.readdir-ahead: on
>>>>> >>>>>> transport.address-family: inet
>>>>> >>>>>> performance.quick-read: off
>>>>> >>>>>> performance.read-ahead: off
>>>>> >>>>>> performance.io-cache: off
>>>>> >>>>>> performance.stat-prefetch: off
>>>>> >>>>>> cluster.eager-lock: enable
>>>>> >>>>>> network.remote-dio: enable
>>>>> >>>>>> features.shard: on
>>>>> >>>>>> cluster.data-self-heal-algorithm: full
>>>>> >>>>>> features.cache-invalidation: on
>>>>> >>>>>> ganesha.enable: on
>>>>> >>>>>> features.shard-block-size: 256MB
>>>>> >>>>>> client.event-threads: 2
>>>>> >>>>>> server.event-threads: 2
>>>>> >>>>>> cluster.favorite-child-policy: size
>>>>> >>>>>> storage.build-pgfid: off
>>>>> >>>>>> network.ping-timeout: 5
>>>>> >>>>>> cluster.enable-shared-storage: enable
>>>>> >>>>>> nfs-ganesha: enable
>>>>> >>>>>> cluster.server-quorum-ratio: 51%
>>>>> >>>>>>
>>>>> >>>>>> Adding bricks:
>>>>> >>>>>> gluster volume add-brick vmware2 replica 2 gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2 gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>>>>> >>>>>>
>>>>> >>>>>> Starting fix-layout:
>>>>> >>>>>> gluster volume rebalance vmware2 fix-layout start
>>>>> >>>>>>
>>>>> >>>>>> Starting rebalance:
>>>>> >>>>>> gluster volume rebalance vmware2 start
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Respectfully
>>>>> >>>>>> Mahdi A. Mahdi
>>>>> >>>>
>>>>> >>>> --
>>>>> >>>> Pranith
>>>
>>> --
>>> Pranith
>
> --
> Pranith
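To make the RCA quoted above more concrete, the following is a minimal, self-contained sketch in plain Python. It is not Gluster code (the ToyCluster class, the md5-based hashing, and the method names are invented stand-ins), but it walks the same sequence Raghavendra describes: the shard's cached inode is lost at a graph switch, add-brick plus fix-layout changes the hash ranges of ".shard", and the next write re-issues mknod under the new layout.

# Illustrative sketch only -- NOT Gluster source code. All names here
# (ToyCluster, hashed_subvol, write_to_shard) are invented to mimic the
# RCA quoted above: a shard whose inode has fallen out of the client's
# itable is re-created with mknod under a ".shard" layout that was
# re-fixed after add-brick, so the same shard can end up on two subvolumes.

import hashlib


def hashed_subvol(name, subvols):
    """Toy stand-in for DHT's hash-to-subvolume mapping."""
    digest = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return subvols[digest % len(subvols)]


class ToyCluster:
    def __init__(self, subvols):
        self.subvols = list(subvols)                   # current ".shard" layout
        self.files = {s: set() for s in self.subvols}  # shard files per subvolume
        self.itable = {}                               # client-side inode cache

    def write_to_shard(self, shard):
        """Roughly shard_writev: on an itable miss, mknod and ignore EEXIST."""
        if shard not in self.itable:
            target = hashed_subvol(shard, self.subvols)
            self.files[target].add(shard)  # EEXIST on *this* subvol would be ignored
            self.itable[shard] = target
        return self.itable[shard]

    def add_brick_and_fix_layout(self, new_subvol):
        """fix-layout: hash ranges change, but nothing is migrated here."""
        self.subvols.append(new_subvol)
        self.files[new_subvol] = set()


cluster = ToyCluster(["s1", "s2", "s3"])

old_home = cluster.write_to_shard("shard_file")  # shard created on its hashed subvol
cluster.itable.clear()                           # graph switch: cached inode is lost
cluster.add_brick_and_fix_layout("s4")           # add-brick + fix-layout rehashes ".shard"
new_home = cluster.write_to_shard("shard_file")  # itable miss -> mknod under the new layout

copies = [s for s, files in cluster.files.items() if "shard_file" in files]
print(f"hashed to {old_home} before fix-layout, {new_home} after; copies on {copies}")
# If the new layout maps the shard to a different subvolume, 'copies' lists
# two subvols, i.e. the duplicate that the RCA says leads to corruption.

Whether the second mknod actually lands on a different subvolume depends on how the new hash ranges fall for that particular shard name, which is consistent with the report above that fix-layout alone reproduced the corruption only about one time in three.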
Pranith Kumar Karampuri
2017-Apr-27 11:31 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
But even after that fix, it is still leading to pause. And these are the two updates on what the developers are doing, as per my understanding. So that workflow is not stable yet, IMO.

On Thu, Apr 27, 2017 at 4:51 PM, Serkan Çoban <cobanserkan at gmail.com> wrote:
> I think this is the fix Gandalf is asking for:
> https://github.com/gluster/glusterfs/commit/6e3054b42f9aef1e35b493fbb002ec47e1ba27ce
--
Pranith
Gandalf Corvotempesta
2017-Apr-27 11:45 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
2017-04-27 13:21 GMT+02:00 Serkan Çoban <cobanserkan at gmail.com>:
> I think this is the fix Gandalf is asking for:
> https://github.com/gluster/glusterfs/commit/6e3054b42f9aef1e35b493fbb002ec47e1ba27ce

Yes, I'm talking about this.