Gandalf Corvotempesta
2017-Apr-18 18:24 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
Any update? In addition, if this is a different bug but the "workflow" is
the same as the previous one, how is it possible that fixing the previous
bug triggered this new one? Is it possible to have some details?

2017-04-04 16:11 GMT+02:00 Krutika Dhananjay <kdhananj at redhat.com>:
> Nope. This is a different bug.
>
> -Krutika
>
> On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta
> <gandalf.corvotempesta at gmail.com> wrote:
>>
>> This is good news.
>> Is this related to the previously fixed bug?
>>
>> On 3 Apr 2017 10:22 AM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:
>>>
>>> So Raghavendra has an RCA for this issue.
>>>
>>> Copy-pasting his comment here:
>>>
>>> <RCA>
>>>
>>> The following is a rough algorithm of shard_writev:
>>>
>>> 1. Based on the offset, calculate the shards touched by the current write.
>>> 2. Look for the inodes corresponding to these shard files in the itable.
>>> 3. If one or more inodes are missing from the itable, issue mknod for the
>>>    corresponding shard files and ignore EEXIST in the cbk.
>>> 4. Resume writes on the respective shards.
>>>
>>> Now, imagine a write which falls on an existing "shard_file". For the
>>> sake of discussion, let's consider a distribute of three subvols - s1, s2, s3.
>>>
>>> 1. "shard_file" hashes to subvolume s2 and is present on s2.
>>> 2. Add a subvolume s4 and initiate a fix-layout. The layout of ".shard"
>>>    is fixed to include s4 and the hash ranges are changed.
>>> 3. A write that touches "shard_file" is issued.
>>> 4. The inode for "shard_file" is not present in the itable after a graph
>>>    switch, and features/shard issues an mknod.
>>> 5. With the new layout of .shard, let's say "shard_file" hashes to s3 and
>>>    mknod (shard_file) on s3 succeeds. But the shard_file is already present
>>>    on s2.
>>>
>>> So we have two files on two different subvols of dht representing the same
>>> shard, and this will lead to corruption.
>>>
>>> </RCA>
>>>
>>> Raghavendra will be sending out a patch in DHT to fix this issue.
>>>
>>> -Krutika
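
The RCA above boils down to a lookup/create race: shard issues its mknod against whatever layout is current, while the existing shard file still sits where the old layout placed it. The toy Python model below is not Gluster source; the hash function, the Subvol objects and the itable dict are simplified stand-ins chosen only to walk through the five steps and end with two diverging copies of the same shard.

# Toy model of the race in the RCA: after fix-layout changes the hash
# ranges, shard's mknod can create a second copy of an already-existing
# shard on a different DHT subvolume.

class Subvol:
    def __init__(self, name):
        self.name = name
        self.files = {}                  # file name -> contents

def hashed_subvol(name, layout):
    # Stand-in hash: real DHT hashes the name into per-directory ranges.
    return layout[len(name) % len(layout)]

def shard_write(name, data, layout, itable):
    # Rough analogue of shard_writev steps 2-4: if the shard's inode is
    # not cached, mknod it wherever the *current* layout points (EEXIST
    # elsewhere is ignored), then write there.
    if name not in itable:
        target = hashed_subvol(name, layout)
        target.files.setdefault(name, "")          # mknod
        itable[name] = target
    itable[name].files[name] = data

s1, s2, s3, s4 = Subvol("s1"), Subvol("s2"), Subvol("s3"), Subvol("s4")

old_layout = [s1, s2, s3]
# Under the old layout "shard_file" hashes to s2 and is created there.
hashed_subvol("shard_file", old_layout).files["shard_file"] = "old data"

# add-brick s4 + fix-layout: hash ranges change; the itable is empty after
# the graph switch, so the next write has no cached inode for the shard.
new_layout = [s1, s2, s3, s4]
itable = {}

shard_write("shard_file", "new data", new_layout, itable)   # lands on s3

for sv in new_layout:
    if "shard_file" in sv.files:
        print(sv.name, repr(sv.files["shard_file"]))
# Prints two diverging copies (s2: 'old data', s3: 'new data'); reads that
# resolve to the stale copy return corrupted VM data.
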
>>>
>>>
>>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri
>>> <pkarampu at redhat.com> wrote:
>>>>
>>>>
>>>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Do you guys have any update regarding this issue?
>>>>
>>>> I do not actively work on this issue, so I do not have an accurate
>>>> update, but what I heard from Krutika and Raghavendra (who works on DHT) is:
>>>> Krutika debugged initially and found that the issue seems more likely to be
>>>> in DHT; Satheesaran, who helped us recreate this issue in the lab, found that
>>>> just fix-layout without rebalance also caused the corruption 1 out of 3 times.
>>>> Raghavendra came up with a possible RCA for why this can happen.
>>>> Raghavendra (CCed) would be the right person to provide an accurate update.
>>>>>
>>>>> --
>>>>>
>>>>> Respectfully
>>>>> Mahdi A. Mahdi
>>>>>
>>>>> ________________________________
>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> Sent: Tuesday, March 21, 2017 3:02:55 PM
>>>>> To: Mahdi Adnan
>>>>> Cc: Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai;
>>>>> gluster-users at gluster.org List
>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>
>>>>> Hi,
>>>>>
>>>>> So it looks like Satheesaran managed to recreate this issue. We will be
>>>>> seeking his help in debugging this. It will be easier that way.
>>>>>
>>>>> -Krutika
>>>>>
>>>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>>>> wrote:
>>>>>>
>>>>>> Hello, and thank you for your email.
>>>>>> Actually no, I didn't check the gfid of the VMs.
>>>>>> If this will help, I can set up a new test cluster and get all the data
>>>>>> you need.
>>>>>>
>>>>>> Get Outlook for Android
>>>>>>
>>>>>>
>>>>>> From: Nithya Balachandran
>>>>>> Sent: Monday, March 20, 20:57
>>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>> To: Krutika Dhananjay
>>>>>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai,
>>>>>> gluster-users at gluster.org List
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Do you know the GFIDs of the VM images which were corrupted?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Nithya
>>>>>>
>>>>>> On 20 March 2017 at 20:37, Krutika Dhananjay <kdhananj at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>> I looked at the logs.
>>>>>>
>>>>>> From the time the new graph (since the add-brick command you shared,
>>>>>> where bricks 41 through 44 are added) is switched to (line 3011 onwards in
>>>>>> nfs-gfapi.log), I see the following kinds of errors:
>>>>>>
>>>>>> 1. Lookups to a bunch of files failed with ENOENT on both replicas,
>>>>>> which protocol/client converts to ESTALE. I am guessing these entries got
>>>>>> migrated to other subvolumes, leading to 'No such file or directory' errors.
>>>>>>
>>>>>> DHT and thereafter shard get the same error code and log the following:
>>>>>>
>>>>>> 0 [2017-03-17 14:04:26.353444] E [MSGID: 109040]
>>>>>> [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht:
>>>>>> <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file
>>>>>> on vmware2-dht [Stale file handle]
>>>>>> 1 [2017-03-17 14:04:26.353528] E [MSGID: 133014]
>>>>>> [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed:
>>>>>> a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
>>>>>>
>>>>>> which is fine.
>>>>>>
>>>>>> 2. The other kind are from AFR logging of possible split-brain, which I
>>>>>> suppose are harmless too.
>>>>>> [2017-03-17 14:23:36.968883] W [MSGID: 108008]
>>>>>> [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable
>>>>>> subvolume -1 found with event generation 2 for gfid
>>>>>> 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
>>>>>>
>>>>>> Since you are saying the bug is hit only on VMs that are undergoing IO
>>>>>> while rebalance is running (as opposed to those that remained powered off),
>>>>>> rebalance + IO could be causing some issues.
>>>>>>
>>>>>> CC'ing DHT devs.
>>>>>>
>>>>>> Raghavendra/Nithya/Susant,
>>>>>>
>>>>>> Could you take a look?
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>>>>> wrote:
>>>>>>
>>>>>> Thank you for your email, mate.
>>>>>>
>>>>>> Yes, I'm aware of this, but to save costs I chose replica 2; this
>>>>>> cluster is all flash.
>>>>>>
>>>>>> In version 3.7.x I had issues with ping timeout: if one host went down
>>>>>> for a few seconds the whole cluster hung and became unavailable, so to
>>>>>> avoid this I adjusted the ping timeout to 5 seconds.
>>>>>>
>>>>>> As for choosing Ganesha over gfapi, VMware does not support Gluster
>>>>>> (FUSE or gfapi), so I'm stuck with NFS for this volume.
>>>>>>
>>>>>> The other volume is mounted using gfapi in an oVirt cluster.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Respectfully
>>>>>> Mahdi A. Mahdi
>>>>>>
>>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>>> Sent: Sunday, March 19, 2017 2:01:49 PM
>>>>>> To: Mahdi Adnan
>>>>>> Cc: gluster-users at gluster.org
>>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>>
>>>>>> While I'm still going through the logs, I just wanted to point out a
>>>>>> couple of things:
>>>>>>
>>>>>> 1. It is recommended that you use 3-way replication (replica count 3)
>>>>>> for the VM store use case.
>>>>>>
>>>>>> 2. network.ping-timeout at 5 seconds is way too low. Please change it
>>>>>> to 30.
>>>>>>
>>>>>> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>>>>>>
>>>>>> Will get back with anything else I might find, or more questions if I
>>>>>> have any.
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>>>>> wrote:
>>>>>>
>>>>>> Thanks, mate.
>>>>>>
>>>>>> Kindly check the attachment.
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Respectfully
>>>>>> Mahdi A. Mahdi
>>>>>>
>>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>>> Sent: Sunday, March 19, 2017 10:00:22 AM
>>>>>> To: Mahdi Adnan
>>>>>> Cc: gluster-users at gluster.org
>>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>>
>>>>>> In that case could you share the ganesha-gfapi logs?
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan
>>>>>> <mahdi.adnan at outlook.com> wrote:
>>>>>>
>>>>>> I have two volumes: one is mounted using libgfapi for the oVirt mount,
>>>>>> and the other is exported via NFS-Ganesha for VMware, which is the one
>>>>>> I'm testing now.
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Respectfully
>>>>>> Mahdi A. Mahdi
>>>>>>
>>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>>> Sent: Sunday, March 19, 2017 8:02:19 AM
>>>>>> To: Mahdi Adnan
>>>>>> Cc: gluster-users at gluster.org
>>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>>
>>>>>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan
>>>>>> <mahdi.adnan at outlook.com> wrote:
>>>>>>
>>>>>> Kindly check the attached new log file. I don't know if it's helpful
>>>>>> or not, but I couldn't find the log with the name you just described.
>>>>>>
>>>>>>
>>>>>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it
>>>>>> NFS?
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Respectfully
>>>>>> Mahdi A. Mahdi
>>>>>>
>>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>>> Sent: Saturday, March 18, 2017 3:29:03 PM
>>>>>> To: Mahdi Adnan
>>>>>> Cc: gluster-users at gluster.org
>>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>>
>>>>>> Hi Mahdi,
>>>>>>
>>>>>> Could you attach mount, brick and rebalance logs?
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan
>>>>>> <mahdi.adnan at outlook.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I upgraded to Gluster 3.8.10 today and ran the add-brick procedure on
>>>>>> a volume containing a few VMs.
>>>>>>
>>>>>> After the rebalance completed, I rebooted the VMs; some of them ran
>>>>>> just fine, and others just crashed.
>>>>>>
>>>>>> Windows boots to recovery mode, and Linux throws xfs errors and does
>>>>>> not boot.
>>>>>>
>>>>>> I ran the test again and the same thing happened, but I noticed that
>>>>>> only VMs doing disk IO are affected by this bug.
>>>>>>
>>>>>> The VMs that were powered off started fine, and even the md5 of their
>>>>>> disk files did not change after the rebalance.
>>>>>>
>>>>>> Can anyone else confirm this?
>>>>>>
>>>>>> Volume info:
>>>>>>
>>>>>> Volume Name: vmware2
>>>>>> Type: Distributed-Replicate
>>>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 22 x 2 = 44
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: gluster01:/mnt/disk1/vmware2
>>>>>> Brick2: gluster03:/mnt/disk1/vmware2
>>>>>> Brick3: gluster02:/mnt/disk1/vmware2
>>>>>> Brick4: gluster04:/mnt/disk1/vmware2
>>>>>> Brick5: gluster01:/mnt/disk2/vmware2
>>>>>> Brick6: gluster03:/mnt/disk2/vmware2
>>>>>> Brick7: gluster02:/mnt/disk2/vmware2
>>>>>> Brick8: gluster04:/mnt/disk2/vmware2
>>>>>> Brick9: gluster01:/mnt/disk3/vmware2
>>>>>> Brick10: gluster03:/mnt/disk3/vmware2
>>>>>> Brick11: gluster02:/mnt/disk3/vmware2
>>>>>> Brick12: gluster04:/mnt/disk3/vmware2
>>>>>> Brick13: gluster01:/mnt/disk4/vmware2
>>>>>> Brick14: gluster03:/mnt/disk4/vmware2
>>>>>> Brick15: gluster02:/mnt/disk4/vmware2
>>>>>> Brick16: gluster04:/mnt/disk4/vmware2
>>>>>> Brick17: gluster01:/mnt/disk5/vmware2
>>>>>> Brick18: gluster03:/mnt/disk5/vmware2
>>>>>> Brick19: gluster02:/mnt/disk5/vmware2
>>>>>> Brick20: gluster04:/mnt/disk5/vmware2
>>>>>> Brick21: gluster01:/mnt/disk6/vmware2
>>>>>> Brick22: gluster03:/mnt/disk6/vmware2
>>>>>> Brick23: gluster02:/mnt/disk6/vmware2
>>>>>> Brick24: gluster04:/mnt/disk6/vmware2
>>>>>> Brick25: gluster01:/mnt/disk7/vmware2
>>>>>> Brick26: gluster03:/mnt/disk7/vmware2
>>>>>> Brick27: gluster02:/mnt/disk7/vmware2
>>>>>> Brick28: gluster04:/mnt/disk7/vmware2
>>>>>> Brick29: gluster01:/mnt/disk8/vmware2
>>>>>> Brick30: gluster03:/mnt/disk8/vmware2
>>>>>> Brick31: gluster02:/mnt/disk8/vmware2
>>>>>> Brick32: gluster04:/mnt/disk8/vmware2
>>>>>> Brick33: gluster01:/mnt/disk9/vmware2
>>>>>> Brick34: gluster03:/mnt/disk9/vmware2
>>>>>> Brick35: gluster02:/mnt/disk9/vmware2
>>>>>> Brick36: gluster04:/mnt/disk9/vmware2
>>>>>> Brick37: gluster01:/mnt/disk10/vmware2
>>>>>> Brick38: gluster03:/mnt/disk10/vmware2
>>>>>> Brick39: gluster02:/mnt/disk10/vmware2
>>>>>> Brick40: gluster04:/mnt/disk10/vmware2
>>>>>> Brick41: gluster01:/mnt/disk11/vmware2
>>>>>> Brick42: gluster03:/mnt/disk11/vmware2
>>>>>> Brick43: gluster02:/mnt/disk11/vmware2
>>>>>> Brick44: gluster04:/mnt/disk11/vmware2
>>>>>> Options Reconfigured:
>>>>>> cluster.server-quorum-type: server
>>>>>> nfs.disable: on
>>>>>> performance.readdir-ahead: on
>>>>>> transport.address-family: inet
>>>>>> performance.quick-read: off
>>>>>> performance.read-ahead: off
>>>>>> performance.io-cache: off
>>>>>> performance.stat-prefetch: off
>>>>>> cluster.eager-lock: enable
>>>>>> network.remote-dio: enable
>>>>>> features.shard: on
>>>>>> cluster.data-self-heal-algorithm: full
>>>>>> features.cache-invalidation: on
>>>>>> ganesha.enable: on
>>>>>> features.shard-block-size: 256MB
>>>>>> client.event-threads: 2
>>>>>> server.event-threads: 2
>>>>>> cluster.favorite-child-policy: size
>>>>>> storage.build-pgfid: off
>>>>>> network.ping-timeout: 5
>>>>>> cluster.enable-shared-storage: enable
>>>>>> nfs-ganesha: enable
>>>>>> cluster.server-quorum-ratio: 51%
>>>>>>
>>>>>> Adding bricks:
>>>>>>
>>>>>> gluster volume add-brick vmware2 replica 2
>>>>>> gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2
>>>>>> gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>>>>>>
>>>>>> Starting fix-layout:
>>>>>>
>>>>>> gluster volume rebalance vmware2 fix-layout start
>>>>>>
>>>>>> Starting rebalance:
>>>>>>
>>>>>> gluster volume rebalance vmware2 start
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Respectfully
>>>>>> Mahdi A. Mahdi
>>>>
>>>> --
>>>> Pranith
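
One rough way to look for the symptom the RCA describes, the same shard name present under .shard on more than one distribute subvolume, is sketched below in Python. The brick paths are placeholders, the grouping assumes consecutive bricks from the volume info above form one replica pair, and DHT link-to files created during an in-progress migration can legitimately appear on a second subvolume, so any hit is a lead rather than proof of corruption.

# Read-only sketch: list shard files that show up on more than one
# distribute subvolume. Paths below are placeholders; in practice you
# would gather the .shard listings from each brick host first.
import os
from collections import defaultdict

# For a replica-2 volume, consecutive bricks in `gluster volume info`
# output form one replica pair, and each pair is one distribute subvolume.
replica_sets = {
    "subvol-0": ["/bricks/gluster01-disk1/vmware2", "/bricks/gluster03-disk1/vmware2"],
    "subvol-1": ["/bricks/gluster02-disk1/vmware2", "/bricks/gluster04-disk1/vmware2"],
    # ... one entry per distribute subvolume
}

shard_locations = defaultdict(set)   # shard file name -> subvolumes holding it

for subvol, bricks in replica_sets.items():
    for brick in bricks:
        shard_dir = os.path.join(brick, ".shard")
        if not os.path.isdir(shard_dir):
            continue
        for name in os.listdir(shard_dir):
            shard_locations[name].add(subvol)

# Shards present on more than one distribute subvolume are suspect;
# remember that rebalance link-to files can cause false positives.
for name, subvols in sorted(shard_locations.items()):
    if len(subvols) > 1:
        print("possible duplicate shard:", name, "on", sorted(subvols))
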
Gandalf Corvotempesta
2017-Apr-27 06:43 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
Updates on this critical bug?

On 18 Apr 2017 8:24 PM, "Gandalf Corvotempesta"
<gandalf.corvotempesta at gmail.com> wrote:
> Any update?
> In addition, if this is a different bug but the "workflow" is the same
> as the previous one, how is it possible that fixing the previous bug
> triggered this new one?
>
> Is it possible to have some details?
>
> 2017-04-04 16:11 GMT+02:00 Krutika Dhananjay <kdhananj at redhat.com>:
> > Nope. This is a different bug.
> >
> > -Krutika
> [...]
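
The thread above checks for corruption by comparing md5 checksums of powered-off VM disk images before and after the rebalance. A minimal sketch of that kind of check follows; the usage and paths are placeholders, and the idea is simply to run it from the client mount against the same file list before and after, then diff the two outputs.

# Print an md5 checksum per VM disk image so before/after runs can be diffed.
import hashlib
import sys

def md5sum(path, chunk=8 * 1024 * 1024):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

if __name__ == "__main__":
    # Example: python3 md5check.py /mnt/vmware2/vm1-flat.vmdk ... > before.txt
    for path in sys.argv[1:]:
        print(md5sum(path), path)
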