Gandalf Corvotempesta
2017-Apr-18 18:24 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
Any update? In addition, if this is a different bug but the "workflow" is
the same as the previous one, how is it possible that fixing the previous
bug triggered this new one? Is it possible to have some details?

2017-04-04 16:11 GMT+02:00 Krutika Dhananjay <kdhananj at redhat.com>:
> Nope. This is a different bug.
>
> -Krutika
>
> On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta
> <gandalf.corvotempesta at gmail.com> wrote:
>>
>> This is good news.
>> Is this related to the previously fixed bug?
>>
>> On 3 Apr 2017 10:22 AM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:
>>>
>>> So Raghavendra has an RCA for this issue.
>>>
>>> Copy-pasting his comment here:
>>>
>>> <RCA>
>>>
>>> The following is a rough algorithm of shard_writev:
>>>
>>> 1. Based on the offset, calculate the shards touched by the current write.
>>> 2. Look for the inodes corresponding to these shard files in the itable.
>>> 3. If one or more inodes are missing from the itable, issue mknod for the
>>>    corresponding shard files and ignore EEXIST in the cbk.
>>> 4. Resume writes on the respective shards.
>>>
>>> Now, imagine a write which falls on an existing "shard_file". For the
>>> sake of discussion, let's consider a distribute of three subvols - s1, s2, s3.
>>>
>>> 1. "shard_file" hashes to subvolume s2 and is present on s2.
>>> 2. Add a subvolume s4 and initiate a fix-layout. The layout of ".shard"
>>>    is fixed to include s4 and the hash ranges are changed.
>>> 3. A write that touches "shard_file" is issued.
>>> 4. The inode for "shard_file" is not present in the itable after a graph
>>>    switch, and features/shard issues an mknod.
>>> 5. With the new layout of .shard, let's say "shard_file" hashes to s3 and
>>>    mknod (shard_file) on s3 succeeds. But the shard_file is already present
>>>    on s2.
>>>
>>> So we have two files on two different subvols of dht representing the same
>>> shard, and this will lead to corruption.
>>>
>>> </RCA>
>>>
>>> Raghavendra will be sending out a patch in DHT to fix this issue.
>>>
>>> -Krutika
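
The RCA above boils down to a lookup/create race: shard issues its mknod against whatever layout is current, while the existing shard file still sits where the old layout placed it. The toy Python model below is not Gluster source; the hash function, the Subvol objects and the itable dict are simplified stand-ins chosen only to walk through the five steps and end with two diverging copies of the same shard.

# Toy model of the race in the RCA: after fix-layout changes the hash
# ranges, shard's mknod can create a second copy of an already-existing
# shard on a different DHT subvolume.

class Subvol:
    def __init__(self, name):
        self.name = name
        self.files = {}                  # file name -> contents

def hashed_subvol(name, layout):
    # Stand-in hash: real DHT hashes the name into per-directory ranges.
    return layout[len(name) % len(layout)]

def shard_write(name, data, layout, itable):
    # Rough analogue of shard_writev steps 2-4: if the shard's inode is
    # not cached, mknod it wherever the *current* layout points (EEXIST
    # elsewhere is ignored), then write there.
    if name not in itable:
        target = hashed_subvol(name, layout)
        target.files.setdefault(name, "")          # mknod
        itable[name] = target
    itable[name].files[name] = data

s1, s2, s3, s4 = Subvol("s1"), Subvol("s2"), Subvol("s3"), Subvol("s4")

old_layout = [s1, s2, s3]
# Under the old layout "shard_file" hashes to s2 and is created there.
hashed_subvol("shard_file", old_layout).files["shard_file"] = "old data"

# add-brick s4 + fix-layout: hash ranges change; the itable is empty after
# the graph switch, so the next write has no cached inode for the shard.
new_layout = [s1, s2, s3, s4]
itable = {}

shard_write("shard_file", "new data", new_layout, itable)   # lands on s3

for sv in new_layout:
    if "shard_file" in sv.files:
        print(sv.name, repr(sv.files["shard_file"]))
# Prints two diverging copies (s2: 'old data', s3: 'new data'); reads that
# resolve to the stale copy return corrupted VM data.
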
>>>
>>>
>>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri
>>> <pkarampu at redhat.com> wrote:
>>>>
>>>>
>>>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Do you guys have any update regarding this issue?
>>>>
>>>> I do not actively work on this issue, so I do not have an accurate
>>>> update, but what I heard from Krutika and Raghavendra (who works on DHT) is:
>>>> Krutika debugged initially and found that the issue seems more likely to be
>>>> in DHT; Satheesaran, who helped us recreate this issue in the lab, found that
>>>> just fix-layout without rebalance also caused the corruption 1 out of 3 times.
>>>> Raghavendra came up with a possible RCA for why this can happen.
>>>> Raghavendra (CCed) would be the right person to provide an accurate update.
>>>>>
>>>>> --
>>>>>
>>>>> Respectfully
>>>>> Mahdi A. Mahdi
>>>>>
>>>>> ________________________________
>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> Sent: Tuesday, March 21, 2017 3:02:55 PM
>>>>> To: Mahdi Adnan
>>>>> Cc: Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai;
>>>>> gluster-users at gluster.org List
>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>
>>>>> Hi,
>>>>>
>>>>> So it looks like Satheesaran managed to recreate this issue. We will be
>>>>> seeking his help in debugging this. It will be easier that way.
>>>>>
>>>>> -Krutika
>>>>>
>>>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>>>> wrote:
>>>>>>
>>>>>> Hello, and thank you for your email.
>>>>>> Actually no, I didn't check the gfid of the VMs.
>>>>>> If this will help, I can set up a new test cluster and get all the data
>>>>>> you need.
>>>>>>
>>>>>> Get Outlook for Android
>>>>>>
>>>>>>
>>>>>> From: Nithya Balachandran
>>>>>> Sent: Monday, March 20, 20:57
>>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>> To: Krutika Dhananjay
>>>>>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai,
>>>>>> gluster-users at gluster.org List
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Do you know the GFIDs of the VM images which were corrupted?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Nithya
>>>>>>
>>>>>> On 20 March 2017 at 20:37, Krutika Dhananjay <kdhananj at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>> I looked at the logs.
>>>>>>
>>>>>> From the time the new graph (since the add-brick command you shared,
>>>>>> where bricks 41 through 44 are added) is switched to (line 3011 onwards in
>>>>>> nfs-gfapi.log), I see the following kinds of errors:
>>>>>>
>>>>>> 1. Lookups to a bunch of files failed with ENOENT on both replicas,
>>>>>> which protocol/client converts to ESTALE. I am guessing these entries got
>>>>>> migrated to other subvolumes, leading to 'No such file or directory' errors.
>>>>>>
>>>>>> DHT and thereafter shard get the same error code and log the following:
>>>>>>
>>>>>> 0 [2017-03-17 14:04:26.353444] E [MSGID: 109040]
>>>>>> [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht:
>>>>>> <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file
>>>>>> on vmware2-dht [Stale file handle]
>>>>>> 1 [2017-03-17 14:04:26.353528] E [MSGID: 133014]
>>>>>> [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed:
>>>>>> a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
>>>>>>
>>>>>> which is fine.
>>>>>>
>>>>>> 2. The other kind are from AFR logging of possible split-brain, which I
>>>>>> suppose are harmless too.
>>>>>> [2017-03-17 14:23:36.968883] W [MSGID: 108008]
>>>>>> [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable
>>>>>> subvolume -1 found with event generation 2 for gfid
>>>>>> 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
>>>>>>
>>>>>> Since you are saying the bug is hit only on VMs that are undergoing IO
>>>>>> while rebalance is running (as opposed to those that remained powered off),
>>>>>> rebalance + IO could be causing some issues.
>>>>>>
>>>>>> CC'ing DHT devs.
>>>>>>
>>>>>> Raghavendra/Nithya/Susant,
>>>>>>
>>>>>> Could you take a look?
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>>>>> wrote:
>>>>>>
>>>>>> Thank you for your email, mate.
>>>>>>
>>>>>> Yes, I'm aware of this, but to save costs I chose replica 2; this
>>>>>> cluster is all flash.
>>>>>>
>>>>>> In version 3.7.x I had issues with ping timeout: if one host went down
>>>>>> for a few seconds the whole cluster hung and became unavailable, so to
>>>>>> avoid this I adjusted the ping timeout to 5 seconds.
>>>>>>
>>>>>> As for choosing Ganesha over gfapi, VMware does not support Gluster
>>>>>> (FUSE or gfapi), so I'm stuck with NFS for this volume.
>>>>>>
>>>>>> The other volume is mounted using gfapi in an oVirt cluster.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Respectfully
>>>>>> Mahdi A. Mahdi
>>>>>>
>>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>>> Sent: Sunday, March 19, 2017 2:01:49 PM
>>>>>> To: Mahdi Adnan
>>>>>> Cc: gluster-users at gluster.org
>>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>>
>>>>>> While I'm still going through the logs, I just wanted to point out a
>>>>>> couple of things:
>>>>>>
>>>>>> 1. It is recommended that you use 3-way replication (replica count 3)
>>>>>> for the VM store use case.
>>>>>>
>>>>>> 2. network.ping-timeout at 5 seconds is way too low. Please change it
>>>>>> to 30.
>>>>>>
>>>>>> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>>>>>>
>>>>>> Will get back with anything else I might find, or more questions if I
>>>>>> have any.
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>>>>> wrote:
>>>>>>
>>>>>> Thanks, mate.
>>>>>>
>>>>>> Kindly check the attachment.
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Respectfully
>>>>>> Mahdi A. Mahdi
>>>>>>
>>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>>> Sent: Sunday, March 19, 2017 10:00:22 AM
>>>>>> To: Mahdi Adnan
>>>>>> Cc: gluster-users at gluster.org
>>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>>
>>>>>> In that case could you share the ganesha-gfapi logs?
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan
>>>>>> <mahdi.adnan at outlook.com> wrote:
>>>>>>
>>>>>> I have two volumes: one is mounted using libgfapi for the oVirt mount,
>>>>>> and the other is exported via NFS-Ganesha for VMware, which is the one
>>>>>> I'm testing now.
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Respectfully
>>>>>> Mahdi A. Mahdi
>>>>>>
>>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>>> Sent: Sunday, March 19, 2017 8:02:19 AM
>>>>>> To: Mahdi Adnan
>>>>>> Cc: gluster-users at gluster.org
>>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>>
>>>>>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan
>>>>>> <mahdi.adnan at outlook.com> wrote:
>>>>>>
>>>>>> Kindly check the attached new log file. I don't know if it's helpful
>>>>>> or not, but I couldn't find the log with the name you just described.
>>>>>>
>>>>>>
>>>>>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it
>>>>>> NFS?
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Respectfully
>>>>>> Mahdi A. Mahdi
>>>>>>
>>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>>> Sent: Saturday, March 18, 2017 3:29:03 PM
>>>>>> To: Mahdi Adnan
>>>>>> Cc: gluster-users at gluster.org
>>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>>
>>>>>> Hi Mahdi,
>>>>>>
>>>>>> Could you attach mount, brick and rebalance logs?
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan
>>>>>> <mahdi.adnan at outlook.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I upgraded to Gluster 3.8.10 today and ran the add-brick procedure on
>>>>>> a volume containing a few VMs.
>>>>>>
>>>>>> After the rebalance completed, I rebooted the VMs; some of them ran
>>>>>> just fine, and others just crashed.
>>>>>>
>>>>>> Windows boots to recovery mode, and Linux throws xfs errors and does
>>>>>> not boot.
>>>>>>
>>>>>> I ran the test again and the same thing happened, but I noticed that
>>>>>> only VMs doing disk IO are affected by this bug.
>>>>>>
>>>>>> The VMs that were powered off started fine, and even the md5 of their
>>>>>> disk files did not change after the rebalance.
>>>>>>
>>>>>> Can anyone else confirm this?
>>>>>>
>>>>>> Volume info:
>>>>>>
>>>>>> Volume Name: vmware2
>>>>>> Type: Distributed-Replicate
>>>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 22 x 2 = 44
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: gluster01:/mnt/disk1/vmware2
>>>>>> Brick2: gluster03:/mnt/disk1/vmware2
>>>>>> Brick3: gluster02:/mnt/disk1/vmware2
>>>>>> Brick4: gluster04:/mnt/disk1/vmware2
>>>>>> Brick5: gluster01:/mnt/disk2/vmware2
>>>>>> Brick6: gluster03:/mnt/disk2/vmware2
>>>>>> Brick7: gluster02:/mnt/disk2/vmware2
>>>>>> Brick8: gluster04:/mnt/disk2/vmware2
>>>>>> Brick9: gluster01:/mnt/disk3/vmware2
>>>>>> Brick10: gluster03:/mnt/disk3/vmware2
>>>>>> Brick11: gluster02:/mnt/disk3/vmware2
>>>>>> Brick12: gluster04:/mnt/disk3/vmware2
>>>>>> Brick13: gluster01:/mnt/disk4/vmware2
>>>>>> Brick14: gluster03:/mnt/disk4/vmware2
>>>>>> Brick15: gluster02:/mnt/disk4/vmware2
>>>>>> Brick16: gluster04:/mnt/disk4/vmware2
>>>>>> Brick17: gluster01:/mnt/disk5/vmware2
>>>>>> Brick18: gluster03:/mnt/disk5/vmware2
>>>>>> Brick19: gluster02:/mnt/disk5/vmware2
>>>>>> Brick20: gluster04:/mnt/disk5/vmware2
>>>>>> Brick21: gluster01:/mnt/disk6/vmware2
>>>>>> Brick22: gluster03:/mnt/disk6/vmware2
>>>>>> Brick23: gluster02:/mnt/disk6/vmware2
>>>>>> Brick24: gluster04:/mnt/disk6/vmware2
>>>>>> Brick25: gluster01:/mnt/disk7/vmware2
>>>>>> Brick26: gluster03:/mnt/disk7/vmware2
>>>>>> Brick27: gluster02:/mnt/disk7/vmware2
>>>>>> Brick28: gluster04:/mnt/disk7/vmware2
>>>>>> Brick29: gluster01:/mnt/disk8/vmware2
>>>>>> Brick30: gluster03:/mnt/disk8/vmware2
>>>>>> Brick31: gluster02:/mnt/disk8/vmware2
>>>>>> Brick32: gluster04:/mnt/disk8/vmware2
>>>>>> Brick33: gluster01:/mnt/disk9/vmware2
>>>>>> Brick34: gluster03:/mnt/disk9/vmware2
>>>>>> Brick35: gluster02:/mnt/disk9/vmware2
>>>>>> Brick36: gluster04:/mnt/disk9/vmware2
>>>>>> Brick37: gluster01:/mnt/disk10/vmware2
>>>>>> Brick38: gluster03:/mnt/disk10/vmware2
>>>>>> Brick39: gluster02:/mnt/disk10/vmware2
>>>>>> Brick40: gluster04:/mnt/disk10/vmware2
>>>>>> Brick41: gluster01:/mnt/disk11/vmware2
>>>>>> Brick42: gluster03:/mnt/disk11/vmware2
>>>>>> Brick43: gluster02:/mnt/disk11/vmware2
>>>>>> Brick44: gluster04:/mnt/disk11/vmware2
>>>>>> Options Reconfigured:
>>>>>> cluster.server-quorum-type: server
>>>>>> nfs.disable: on
>>>>>> performance.readdir-ahead: on
>>>>>> transport.address-family: inet
>>>>>> performance.quick-read: off
>>>>>> performance.read-ahead: off
>>>>>> performance.io-cache: off
>>>>>> performance.stat-prefetch: off
>>>>>> cluster.eager-lock: enable
>>>>>> network.remote-dio: enable
>>>>>> features.shard: on
>>>>>> cluster.data-self-heal-algorithm: full
>>>>>> features.cache-invalidation: on
>>>>>> ganesha.enable: on
>>>>>> features.shard-block-size: 256MB
>>>>>> client.event-threads: 2
>>>>>> server.event-threads: 2
>>>>>> cluster.favorite-child-policy: size
>>>>>> storage.build-pgfid: off
>>>>>> network.ping-timeout: 5
>>>>>> cluster.enable-shared-storage: enable
>>>>>> nfs-ganesha: enable
>>>>>> cluster.server-quorum-ratio: 51%
>>>>>>
>>>>>> Adding bricks:
>>>>>>
>>>>>> gluster volume add-brick vmware2 replica 2
>>>>>> gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2
>>>>>> gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>>>>>>
>>>>>> Starting fix-layout:
>>>>>>
>>>>>> gluster volume rebalance vmware2 fix-layout start
>>>>>>
>>>>>> Starting rebalance:
>>>>>>
>>>>>> gluster volume rebalance vmware2 start
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Respectfully
>>>>>> Mahdi A. Mahdi
>>>>
>>>> --
>>>> Pranith
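
One rough way to look for the symptom the RCA describes, the same shard name present under .shard on more than one distribute subvolume, is sketched below in Python. The brick paths are placeholders, the grouping assumes consecutive bricks from the volume info above form one replica pair, and DHT link-to files created during an in-progress migration can legitimately appear on a second subvolume, so any hit is a lead rather than proof of corruption.

# Read-only sketch: list shard files that show up on more than one
# distribute subvolume. Paths below are placeholders; in practice you
# would gather the .shard listings from each brick host first.
import os
from collections import defaultdict

# For a replica-2 volume, consecutive bricks in `gluster volume info`
# output form one replica pair, and each pair is one distribute subvolume.
replica_sets = {
    "subvol-0": ["/bricks/gluster01-disk1/vmware2", "/bricks/gluster03-disk1/vmware2"],
    "subvol-1": ["/bricks/gluster02-disk1/vmware2", "/bricks/gluster04-disk1/vmware2"],
    # ... one entry per distribute subvolume
}

shard_locations = defaultdict(set)   # shard file name -> subvolumes holding it

for subvol, bricks in replica_sets.items():
    for brick in bricks:
        shard_dir = os.path.join(brick, ".shard")
        if not os.path.isdir(shard_dir):
            continue
        for name in os.listdir(shard_dir):
            shard_locations[name].add(subvol)

# Shards present on more than one distribute subvolume are suspect;
# remember that rebalance link-to files can cause false positives.
for name, subvols in sorted(shard_locations.items()):
    if len(subvols) > 1:
        print("possible duplicate shard:", name, "on", sorted(subvols))
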
Gandalf Corvotempesta
2017-Apr-27 06:43 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
Updates on this critical bug?

On 18 Apr 2017 8:24 PM, "Gandalf Corvotempesta"
<gandalf.corvotempesta at gmail.com> wrote:
> Any update?
> In addition, if this is a different bug but the "workflow" is the same
> as the previous one, how is it possible that fixing the previous bug
> triggered this new one?
>
> Is it possible to have some details?
>
> 2017-04-04 16:11 GMT+02:00 Krutika Dhananjay <kdhananj at redhat.com>:
> > Nope. This is a different bug.
> >
> > -Krutika
> [...]
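
The thread above checks for corruption by comparing md5 checksums of powered-off VM disk images before and after the rebalance. A minimal sketch of that kind of check follows; the usage and paths are placeholders, and the idea is simply to run it from the client mount against the same file list before and after, then diff the two outputs.

# Print an md5 checksum per VM disk image so before/after runs can be diffed.
import hashlib
import sys

def md5sum(path, chunk=8 * 1024 * 1024):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

if __name__ == "__main__":
    # Example: python3 md5check.py /mnt/vmware2/vm1-flat.vmdk ... > before.txt
    for path in sys.argv[1:]:
        print(md5sum(path), path)
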