Gandalf Corvotempesta
2017-Apr-03 11:33 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
This is good news. Is this related to the previously fixed bug?

On 3 Apr 2017 10:22 AM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:

> So Raghavendra has an RCA for this issue.
>
> Copy-pasting his comment here:
>
> <RCA>
>
> Following is a rough algorithm of shard_writev:
>
> 1. Based on the offset, calculate the shards touched by the current write.
> 2. Look for inodes corresponding to these shard files in the itable.
> 3. If one or more inodes are missing from the itable, issue mknod for the corresponding shard files and ignore EEXIST in the cbk.
> 4. Resume writes on the respective shards.
>
> Now, imagine a write which falls on an existing "shard_file". For the sake of discussion, let's consider a distribute of three subvols - s1, s2, s3:
>
> 1. "shard_file" hashes to subvolume s2 and is present on s2.
> 2. Add a subvolume s4 and initiate a fix-layout. The layout of ".shard" is fixed to include s4 and the hash ranges are changed.
> 3. A write that touches "shard_file" is issued.
> 4. The inode for "shard_file" is not present in the itable after a graph switch and features/shard issues an mknod.
> 5. With the new layout of .shard, let's say "shard_file" hashes to s3 and mknod(shard_file) on s3 succeeds. But the shard_file is already present on s2.
>
> So we have two files on two different subvols of dht representing the same shard, and this will lead to corruption.
>
> </RCA>
>
> Raghavendra will be sending out a patch in DHT to fix this issue.
>
> -Krutika
>
> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
>
>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>>> Hi,
>>>
>>> Do you guys have any update regarding this issue?
>>
>> I do not actively work on this issue so I do not have an accurate update, but what I heard from Krutika and Raghavendra (who works on DHT) is: Krutika debugged it initially and found that the issue is more likely to be in DHT; Satheesaran, who helped us recreate this issue in the lab, found that just fix-layout without rebalance also caused the corruption 1 out of 3 times. Raghavendra came up with a possible RCA for why this can happen. Raghavendra (CCed) would be the right person to provide an accurate update.
>>
>>> --
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> ------------------------------
>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>> *Sent:* Tuesday, March 21, 2017 3:02:55 PM
>>> *To:* Mahdi Adnan
>>> *Cc:* Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai; gluster-users at gluster.org List
>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>
>>> Hi,
>>>
>>> So it looks like Satheesaran managed to recreate this issue. We will be seeking his help in debugging this. It will be easier that way.
>>>
>>> -Krutika
>>>
>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>
>>>> Hello and thank you for your email.
>>>> Actually no, I didn't check the GFIDs of the VMs.
>>>> If this will help, I can set up a new test cluster and get all the data you need.
>>>>
>>>> Get Outlook for Android <https://aka.ms/ghei36>
>>>>
>>>> From: Nithya Balachandran
>>>> Sent: Monday, March 20, 20:57
>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>> To: Krutika Dhananjay
>>>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai, gluster-users at gluster.org List
>>>>
>>>> Hi,
>>>>
>>>> Do you know the GFIDs of the VM images which were corrupted?
>>>>
>>>> Regards,
>>>> Nithya
>>>>
>>>> On 20 March 2017 at 20:37, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>>>
>>>> I looked at the logs.
>>>>
>>>> From the time the new graph (since the add-brick command you shared where bricks 41 through 44 are added) is switched to (line 3011 onwards in nfs-gfapi.log), I see the following kinds of errors:
>>>>
>>>> 1. Lookups to a bunch of files failed with ENOENT on both replicas, which protocol/client converts to ESTALE. I am guessing these entries got migrated to other subvolumes, leading to 'No such file or directory' errors.
>>>>
>>>> DHT and thereafter shard get the same error code and log the following:
>>>>
>>>> [2017-03-17 14:04:26.353444] E [MSGID: 109040] [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht: <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file on vmware2-dht [Stale file handle]
>>>>
>>>> [2017-03-17 14:04:26.353528] E [MSGID: 133014] [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed: a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
>>>>
>>>> which is fine.
>>>>
>>>> 2. The other kind are from AFR logging of possible split-brain, which I suppose are harmless too:
>>>>
>>>> [2017-03-17 14:23:36.968883] W [MSGID: 108008] [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable subvolume -1 found with event generation 2 for gfid 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
>>>>
>>>> Since you are saying the bug is hit only on VMs that are undergoing IO while rebalance is running (as opposed to those that remained powered off), rebalance + IO could be causing some issues.
>>>>
>>>> CC'ing DHT devs.
>>>>
>>>> Raghavendra/Nithya/Susant,
>>>>
>>>> Could you take a look?
>>>>
>>>> -Krutika
>>>>
>>>> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> Thank you for your email mate.
>>>>
>>>> Yes, I'm aware of this, but to save costs I chose replica 2; this cluster is all flash.
>>>>
>>>> In version 3.7.x I had issues with ping timeout: if one host went down for a few seconds the whole cluster would hang and become unavailable, so to avoid this I adjusted the ping timeout to 5 seconds.
>>>>
>>>> As for choosing Ganesha over gfapi, VMware does not support Gluster (FUSE or gfapi), so I'm stuck with NFS for this volume.
>>>>
>>>> The other volume is mounted using gfapi in an oVirt cluster.
>>>>
>>>> --
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Sunday, March 19, 2017 2:01:49 PM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> While I'm still going through the logs, just wanted to point out a couple of things:
>>>>
>>>> 1. It is recommended that you use 3-way replication (replica count 3) for the VM store use case.
>>>>
>>>> 2. network.ping-timeout at 5 seconds is way too low. Please change it to 30.
>>>>
>>>> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>>>>
>>>> Will get back with anything else I might find or more questions if I have any.
>>>>
>>>> -Krutika
>>>>
>>>> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> Thanks mate,
>>>>
>>>> Kindly, check the attachment.
>>>>
>>>> --
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Sunday, March 19, 2017 10:00:22 AM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> In that case could you share the ganesha-gfapi logs?
>>>>
>>>> -Krutika
>>>>
>>>> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> I have two volumes: one is mounted using libgfapi for the oVirt mount, the other one is exported via NFS-Ganesha for VMware, which is the one I'm testing now.
>>>>
>>>> --
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Sunday, March 19, 2017 8:02:19 AM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> Kindly, check the attached new log file; I don't know if it's helpful or not, but I couldn't find the log with the name you just described.
>>>>
>>>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it NFS?
>>>>
>>>> -Krutika
>>>>
>>>> --
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Saturday, March 18, 2017 6:10:40 PM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the FUSE mount logs? It should be right under the /var/log/glusterfs/ directory, named after the mount point name, only hyphenated.
>>>>
>>>> -Krutika
>>>>
>>>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> Hello Krutika,
>>>>
>>>> Kindly, check the attached logs.
>>>>
>>>> --
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Saturday, March 18, 2017 3:29:03 PM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> Hi Mahdi,
>>>>
>>>> Could you attach mount, brick and rebalance logs?
>>>>
>>>> -Krutika
>>>>
>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have upgraded to Gluster 3.8.10 today and ran the add-brick procedure on a volume containing a few VMs.
>>>>
>>>> After the completion of the rebalance, I rebooted the VMs; some of them ran just fine, and others just crashed.
>>>>
>>>> Windows boots to recovery mode and Linux throws XFS errors and does not boot.
>>>>
>>>> I ran the test again and it happened just as the first time, but I have noticed that only VMs doing disk IO are affected by this bug.
>>>>
>>>> The VMs in the powered-off state started fine and even the md5 of the disk file did not change after the rebalance.
>>>>
>>>> Can anyone else confirm this?
>>>>
>>>> Volume info:
>>>>
>>>> Volume Name: vmware2
>>>> Type: Distributed-Replicate
>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 22 x 2 = 44
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gluster01:/mnt/disk1/vmware2
>>>> Brick2: gluster03:/mnt/disk1/vmware2
>>>> Brick3: gluster02:/mnt/disk1/vmware2
>>>> Brick4: gluster04:/mnt/disk1/vmware2
>>>> Brick5: gluster01:/mnt/disk2/vmware2
>>>> Brick6: gluster03:/mnt/disk2/vmware2
>>>> Brick7: gluster02:/mnt/disk2/vmware2
>>>> Brick8: gluster04:/mnt/disk2/vmware2
>>>> Brick9: gluster01:/mnt/disk3/vmware2
>>>> Brick10: gluster03:/mnt/disk3/vmware2
>>>> Brick11: gluster02:/mnt/disk3/vmware2
>>>> Brick12: gluster04:/mnt/disk3/vmware2
>>>> Brick13: gluster01:/mnt/disk4/vmware2
>>>> Brick14: gluster03:/mnt/disk4/vmware2
>>>> Brick15: gluster02:/mnt/disk4/vmware2
>>>> Brick16: gluster04:/mnt/disk4/vmware2
>>>> Brick17: gluster01:/mnt/disk5/vmware2
>>>> Brick18: gluster03:/mnt/disk5/vmware2
>>>> Brick19: gluster02:/mnt/disk5/vmware2
>>>> Brick20: gluster04:/mnt/disk5/vmware2
>>>> Brick21: gluster01:/mnt/disk6/vmware2
>>>> Brick22: gluster03:/mnt/disk6/vmware2
>>>> Brick23: gluster02:/mnt/disk6/vmware2
>>>> Brick24: gluster04:/mnt/disk6/vmware2
>>>> Brick25: gluster01:/mnt/disk7/vmware2
>>>> Brick26: gluster03:/mnt/disk7/vmware2
>>>> Brick27: gluster02:/mnt/disk7/vmware2
>>>> Brick28: gluster04:/mnt/disk7/vmware2
>>>> Brick29: gluster01:/mnt/disk8/vmware2
>>>> Brick30: gluster03:/mnt/disk8/vmware2
>>>> Brick31: gluster02:/mnt/disk8/vmware2
>>>> Brick32: gluster04:/mnt/disk8/vmware2
>>>> Brick33: gluster01:/mnt/disk9/vmware2
>>>> Brick34: gluster03:/mnt/disk9/vmware2
>>>> Brick35: gluster02:/mnt/disk9/vmware2
>>>> Brick36: gluster04:/mnt/disk9/vmware2
>>>> Brick37: gluster01:/mnt/disk10/vmware2
>>>> Brick38: gluster03:/mnt/disk10/vmware2
>>>> Brick39: gluster02:/mnt/disk10/vmware2
>>>> Brick40: gluster04:/mnt/disk10/vmware2
>>>> Brick41: gluster01:/mnt/disk11/vmware2
>>>> Brick42: gluster03:/mnt/disk11/vmware2
>>>> Brick43: gluster02:/mnt/disk11/vmware2
>>>> Brick44: gluster04:/mnt/disk11/vmware2
>>>> Options Reconfigured:
>>>> cluster.server-quorum-type: server
>>>> nfs.disable: on
>>>> performance.readdir-ahead: on
>>>> transport.address-family: inet
>>>> performance.quick-read: off
>>>> performance.read-ahead: off
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> cluster.eager-lock: enable
>>>> network.remote-dio: enable
>>>> features.shard: on
>>>> cluster.data-self-heal-algorithm: full
>>>> features.cache-invalidation: on
>>>> ganesha.enable: on
>>>> features.shard-block-size: 256MB
>>>> client.event-threads: 2
>>>> server.event-threads: 2
>>>> cluster.favorite-child-policy: size
>>>> storage.build-pgfid: off
>>>> network.ping-timeout: 5
>>>> cluster.enable-shared-storage: enable
>>>> nfs-ganesha: enable
>>>> cluster.server-quorum-ratio: 51%
>>>>
>>>> Adding bricks:
>>>>
>>>> gluster volume add-brick vmware2 replica 2 gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2 gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>>>>
>>>> Starting fix-layout:
>>>>
>>>> gluster volume rebalance vmware2 fix-layout start
>>>>
>>>> Starting rebalance:
>>>>
>>>> gluster volume rebalance vmware2 start
>>>>
>>>> --
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>> Pranith
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
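For readers trying to follow the RCA quoted above, here is a minimal, self-contained Python sketch of the sequence of events it describes. It is not GlusterFS source code; the subvolume names follow the RCA's own example, and everything else (the mknod helper, the placeholder shard name) is invented purely for illustration.

# Toy model of the race in the RCA above (not GlusterFS code).
subvols = {"s1": set(), "s2": set(), "s3": set(), "s4": set()}

shard = ".shard/<gfid>.7"   # placeholder shard name, purely illustrative

def mknod(subvol, path):
    # EEXIST is only observable on the one subvolume the create is sent to.
    if path in subvols[subvol]:
        return "EEXIST"
    subvols[subvol].add(path)
    return "OK"

# RCA step 1: under the old layout of ".shard", the shard hashes to s2
# and already exists there.
mknod("s2", shard)

# RCA step 2: add-brick s4 + fix-layout changes the hash ranges of ".shard",
# so the same name now hashes to s3.
new_hashed_subvol = "s3"

# RCA steps 3-5: after a graph switch the shard's inode is missing from the
# itable, so features/shard issues mknod; DHT sends it to s3, where it
# succeeds because the existing copy on s2 is never consulted.
print(mknod(new_hashed_subvol, shard))     # prints "OK", not "EEXIST"

copies = [name for name, files in subvols.items() if shard in files]
print("shard now present on:", copies)     # ['s2', 's3'] -> writes can diverge

The only point of the sketch is that the EEXIST check in step 3 of shard_writev is made on whichever subvolume the current layout hashes the shard to, so the pre-existing copy on s2 is never noticed once the layout changes.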
Krutika Dhananjay
2017-Apr-04 14:11 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
Nope. This is a different bug.

-Krutika

On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:

> This is good news.
> Is this related to the previously fixed bug?
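As an aside for anyone checking whether an existing volume was hit: the RCA quoted earlier in the thread implies the symptom is the same shard file existing under .shard on more than one DHT subvolume, i.e. on bricks belonging to different replica pairs. Below is a rough Python sketch of that check; it is only an assumption about how one might look for the symptom, not an official tool, and the replica-set names and example listings are invented.

from collections import defaultdict

def duplicate_shards(listings):
    # listings: {replica_set_name: iterable of shard file names found
    #            under .shard on that replica set's bricks}
    seen = defaultdict(set)
    for subvol, names in listings.items():
        for name in names:
            seen[name].add(subvol)
    # A shard name appearing on more than one replica set is suspect.
    return {name: sorted(sets) for name, sets in seen.items() if len(sets) > 1}

# Example input; in practice the listings would be gathered on each server,
# e.g. from a listing of /mnt/diskN/<volume>/.shard on one brick of every
# replica pair (paths here are invented).
listings = {
    "replicate-0":  {"aaaa-1111.1", "aaaa-1111.2"},
    "replicate-13": {"aaaa-1111.2", "bbbb-2222.7"},   # .2 also here -> suspect
}
print(duplicate_shards(listings))   # {'aaaa-1111.2': ['replicate-0', 'replicate-13']}

Anything flagged this way would still need manual inspection of the two copies; the sketch only narrows down candidates.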