Gandalf Corvotempesta
2017-Apr-03 11:33 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
This is good news. Is this related to the previously fixed bug?

On 3 Apr 2017 10:22 AM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:

> So Raghavendra has an RCA for this issue.
>
> Copy-pasting his comment here:
>
> <RCA>
>
> Following is a rough algorithm of shard_writev:
>
> 1. Based on the offset, calculate the shards touched by the current write.
> 2. Look for inodes corresponding to these shard files in the itable.
> 3. If one or more inodes are missing from the itable, issue mknod for the corresponding shard files and ignore EEXIST in the cbk.
> 4. Resume writes on the respective shards.
>
> Now, imagine a write which falls on an existing "shard_file". For the sake of discussion, let's consider a distribute of three subvols - s1, s2, s3:
>
> 1. "shard_file" hashes to subvolume s2 and is present on s2.
> 2. Add a subvolume s4 and initiate a fix-layout. The layout of ".shard" is fixed to include s4 and the hash ranges are changed.
> 3. A write that touches "shard_file" is issued.
> 4. The inode for "shard_file" is not present in the itable after a graph switch and features/shard issues an mknod.
> 5. With the new layout of .shard, let's say "shard_file" hashes to s3 and mknod(shard_file) on s3 succeeds. But the shard_file is already present on s2.
>
> So we have two files on two different subvols of dht representing the same shard, and this will lead to corruption.
>
> </RCA>
>
> Raghavendra will be sending out a patch in DHT to fix this issue.
>
> -Krutika
>
> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
>
>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>>> Hi,
>>>
>>> Do you guys have any update regarding this issue?
>>
>> I do not actively work on this issue so I do not have an accurate update, but what I heard from Krutika and Raghavendra (who works on DHT) is: Krutika debugged it initially and found that the issue is more likely to be in DHT; Satheesaran, who helped us recreate this issue in the lab, found that just fix-layout without rebalance also caused the corruption 1 out of 3 times. Raghavendra came up with a possible RCA for why this can happen. Raghavendra (CCed) would be the right person to provide an accurate update.
>>
>>> --
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> ------------------------------
>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>> *Sent:* Tuesday, March 21, 2017 3:02:55 PM
>>> *To:* Mahdi Adnan
>>> *Cc:* Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai; gluster-users at gluster.org List
>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>
>>> Hi,
>>>
>>> So it looks like Satheesaran managed to recreate this issue. We will be seeking his help in debugging this. It will be easier that way.
>>>
>>> -Krutika
>>>
>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>
>>>> Hello and thank you for your email.
>>>> Actually no, I didn't check the GFIDs of the VMs.
>>>> If this will help, I can set up a new test cluster and get all the data you need.
>>>>
>>>> Get Outlook for Android <https://aka.ms/ghei36>
>>>>
>>>> From: Nithya Balachandran
>>>> Sent: Monday, March 20, 20:57
>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>> To: Krutika Dhananjay
>>>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai, gluster-users at gluster.org List
>>>>
>>>> Hi,
>>>>
>>>> Do you know the GFIDs of the VM images which were corrupted?
>>>>
>>>> Regards,
>>>> Nithya
>>>>
>>>> On 20 March 2017 at 20:37, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>>>
>>>> I looked at the logs.
>>>>
>>>> From the time the new graph (since the add-brick command you shared where bricks 41 through 44 are added) is switched to (line 3011 onwards in nfs-gfapi.log), I see the following kinds of errors:
>>>>
>>>> 1. Lookups to a bunch of files failed with ENOENT on both replicas, which protocol/client converts to ESTALE. I am guessing these entries got migrated to other subvolumes, leading to 'No such file or directory' errors.
>>>>
>>>> DHT and thereafter shard get the same error code and log the following:
>>>>
>>>> [2017-03-17 14:04:26.353444] E [MSGID: 109040] [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht: <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file on vmware2-dht [Stale file handle]
>>>>
>>>> [2017-03-17 14:04:26.353528] E [MSGID: 133014] [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed: a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
>>>>
>>>> which is fine.
>>>>
>>>> 2. The other kind are from AFR logging of possible split-brain, which I suppose are harmless too:
>>>>
>>>> [2017-03-17 14:23:36.968883] W [MSGID: 108008] [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable subvolume -1 found with event generation 2 for gfid 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
>>>>
>>>> Since you are saying the bug is hit only on VMs that are undergoing IO while rebalance is running (as opposed to those that remained powered off), rebalance + IO could be causing some issues.
>>>>
>>>> CC'ing DHT devs.
>>>>
>>>> Raghavendra/Nithya/Susant,
>>>>
>>>> Could you take a look?
>>>>
>>>> -Krutika
>>>>
>>>> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> Thank you for your email mate.
>>>>
>>>> Yes, I'm aware of this, but to save costs I chose replica 2; this cluster is all flash.
>>>>
>>>> In version 3.7.x I had issues with ping timeout: if one host went down for a few seconds the whole cluster would hang and become unavailable, so to avoid this I adjusted the ping timeout to 5 seconds.
>>>>
>>>> As for choosing Ganesha over gfapi, VMware does not support Gluster (FUSE or gfapi), so I'm stuck with NFS for this volume.
>>>>
>>>> The other volume is mounted using gfapi in an oVirt cluster.
>>>>
>>>> --
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Sunday, March 19, 2017 2:01:49 PM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> While I'm still going through the logs, just wanted to point out a couple of things:
>>>>
>>>> 1. It is recommended that you use 3-way replication (replica count 3) for the VM store use case.
>>>>
>>>> 2. network.ping-timeout at 5 seconds is way too low. Please change it to 30.
>>>>
>>>> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>>>>
>>>> Will get back with anything else I might find or more questions if I have any.
>>>>
>>>> -Krutika
>>>>
>>>> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> Thanks mate,
>>>>
>>>> Kindly, check the attachment.
>>>>
>>>> --
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Sunday, March 19, 2017 10:00:22 AM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> In that case could you share the ganesha-gfapi logs?
>>>>
>>>> -Krutika
>>>>
>>>> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> I have two volumes: one is mounted using libgfapi for the oVirt mount, the other one is exported via NFS-Ganesha for VMware, which is the one I'm testing now.
>>>>
>>>> --
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Sunday, March 19, 2017 8:02:19 AM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> Kindly, check the attached new log file; I don't know if it's helpful or not, but I couldn't find the log with the name you just described.
>>>>
>>>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it NFS?
>>>>
>>>> -Krutika
>>>>
>>>> --
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Saturday, March 18, 2017 6:10:40 PM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the FUSE mount logs? It should be right under the /var/log/glusterfs/ directory, named after the mount point name, only hyphenated.
>>>>
>>>> -Krutika
>>>>
>>>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> Hello Krutika,
>>>>
>>>> Kindly, check the attached logs.
>>>>
>>>> --
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Saturday, March 18, 2017 3:29:03 PM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> Hi Mahdi,
>>>>
>>>> Could you attach mount, brick and rebalance logs?
>>>>
>>>> -Krutika
>>>>
>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have upgraded to Gluster 3.8.10 today and ran the add-brick procedure on a volume containing a few VMs.
>>>>
>>>> After the completion of the rebalance, I rebooted the VMs; some of them ran just fine, and others just crashed.
>>>>
>>>> Windows boots to recovery mode and Linux throws XFS errors and does not boot.
>>>>
>>>> I ran the test again and it happened just as the first time, but I have noticed that only VMs doing disk IO are affected by this bug.
>>>>
>>>> The VMs in the powered-off state started fine and even the md5 of the disk file did not change after the rebalance.
>>>>
>>>> Can anyone else confirm this?
>>>>
>>>> Volume info:
>>>>
>>>> Volume Name: vmware2
>>>> Type: Distributed-Replicate
>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 22 x 2 = 44
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gluster01:/mnt/disk1/vmware2
>>>> Brick2: gluster03:/mnt/disk1/vmware2
>>>> Brick3: gluster02:/mnt/disk1/vmware2
>>>> Brick4: gluster04:/mnt/disk1/vmware2
>>>> Brick5: gluster01:/mnt/disk2/vmware2
>>>> Brick6: gluster03:/mnt/disk2/vmware2
>>>> Brick7: gluster02:/mnt/disk2/vmware2
>>>> Brick8: gluster04:/mnt/disk2/vmware2
>>>> Brick9: gluster01:/mnt/disk3/vmware2
>>>> Brick10: gluster03:/mnt/disk3/vmware2
>>>> Brick11: gluster02:/mnt/disk3/vmware2
>>>> Brick12: gluster04:/mnt/disk3/vmware2
>>>> Brick13: gluster01:/mnt/disk4/vmware2
>>>> Brick14: gluster03:/mnt/disk4/vmware2
>>>> Brick15: gluster02:/mnt/disk4/vmware2
>>>> Brick16: gluster04:/mnt/disk4/vmware2
>>>> Brick17: gluster01:/mnt/disk5/vmware2
>>>> Brick18: gluster03:/mnt/disk5/vmware2
>>>> Brick19: gluster02:/mnt/disk5/vmware2
>>>> Brick20: gluster04:/mnt/disk5/vmware2
>>>> Brick21: gluster01:/mnt/disk6/vmware2
>>>> Brick22: gluster03:/mnt/disk6/vmware2
>>>> Brick23: gluster02:/mnt/disk6/vmware2
>>>> Brick24: gluster04:/mnt/disk6/vmware2
>>>> Brick25: gluster01:/mnt/disk7/vmware2
>>>> Brick26: gluster03:/mnt/disk7/vmware2
>>>> Brick27: gluster02:/mnt/disk7/vmware2
>>>> Brick28: gluster04:/mnt/disk7/vmware2
>>>> Brick29: gluster01:/mnt/disk8/vmware2
>>>> Brick30: gluster03:/mnt/disk8/vmware2
>>>> Brick31: gluster02:/mnt/disk8/vmware2
>>>> Brick32: gluster04:/mnt/disk8/vmware2
>>>> Brick33: gluster01:/mnt/disk9/vmware2
>>>> Brick34: gluster03:/mnt/disk9/vmware2
>>>> Brick35: gluster02:/mnt/disk9/vmware2
>>>> Brick36: gluster04:/mnt/disk9/vmware2
>>>> Brick37: gluster01:/mnt/disk10/vmware2
>>>> Brick38: gluster03:/mnt/disk10/vmware2
>>>> Brick39: gluster02:/mnt/disk10/vmware2
>>>> Brick40: gluster04:/mnt/disk10/vmware2
>>>> Brick41: gluster01:/mnt/disk11/vmware2
>>>> Brick42: gluster03:/mnt/disk11/vmware2
>>>> Brick43: gluster02:/mnt/disk11/vmware2
>>>> Brick44: gluster04:/mnt/disk11/vmware2
>>>> Options Reconfigured:
>>>> cluster.server-quorum-type: server
>>>> nfs.disable: on
>>>> performance.readdir-ahead: on
>>>> transport.address-family: inet
>>>> performance.quick-read: off
>>>> performance.read-ahead: off
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> cluster.eager-lock: enable
>>>> network.remote-dio: enable
>>>> features.shard: on
>>>> cluster.data-self-heal-algorithm: full
>>>> features.cache-invalidation: on
>>>> ganesha.enable: on
>>>> features.shard-block-size: 256MB
>>>> client.event-threads: 2
>>>> server.event-threads: 2
>>>> cluster.favorite-child-policy: size
>>>> storage.build-pgfid: off
>>>> network.ping-timeout: 5
>>>> cluster.enable-shared-storage: enable
>>>> nfs-ganesha: enable
>>>> cluster.server-quorum-ratio: 51%
>>>>
>>>> Adding bricks:
>>>>
>>>> gluster volume add-brick vmware2 replica 2 gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2 gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>>>>
>>>> Starting fix-layout:
>>>>
>>>> gluster volume rebalance vmware2 fix-layout start
>>>>
>>>> Starting rebalance:
>>>>
>>>> gluster volume rebalance vmware2 start
>>>>
>>>> --
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>> Pranith
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
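For readers trying to follow the RCA quoted above, here is a minimal, self-contained Python sketch of the sequence of events it describes. It is not GlusterFS source code; the subvolume names follow the RCA's own example, and everything else (the mknod helper, the placeholder shard name) is invented purely for illustration.

# Toy model of the race in the RCA above (not GlusterFS code).
subvols = {"s1": set(), "s2": set(), "s3": set(), "s4": set()}

shard = ".shard/<gfid>.7"   # placeholder shard name, purely illustrative

def mknod(subvol, path):
    # EEXIST is only observable on the one subvolume the create is sent to.
    if path in subvols[subvol]:
        return "EEXIST"
    subvols[subvol].add(path)
    return "OK"

# RCA step 1: under the old layout of ".shard", the shard hashes to s2
# and already exists there.
mknod("s2", shard)

# RCA step 2: add-brick s4 + fix-layout changes the hash ranges of ".shard",
# so the same name now hashes to s3.
new_hashed_subvol = "s3"

# RCA steps 3-5: after a graph switch the shard's inode is missing from the
# itable, so features/shard issues mknod; DHT sends it to s3, where it
# succeeds because the existing copy on s2 is never consulted.
print(mknod(new_hashed_subvol, shard))     # prints "OK", not "EEXIST"

copies = [name for name, files in subvols.items() if shard in files]
print("shard now present on:", copies)     # ['s2', 's3'] -> writes can diverge

The only point of the sketch is that the EEXIST check in step 3 of shard_writev is made on whichever subvolume the current layout hashes the shard to, so the pre-existing copy on s2 is never noticed once the layout changes.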
Krutika Dhananjay
2017-Apr-04 14:11 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
Nope. This is a different bug.

-Krutika

On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:

> This is good news.
> Is this related to the previously fixed bug?
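As an aside for anyone checking whether an existing volume was hit: the RCA quoted earlier in the thread implies the symptom is the same shard file existing under .shard on more than one DHT subvolume, i.e. on bricks belonging to different replica pairs. Below is a rough Python sketch of that check; it is only an assumption about how one might look for the symptom, not an official tool, and the replica-set names and example listings are invented.

from collections import defaultdict

def duplicate_shards(listings):
    # listings: {replica_set_name: iterable of shard file names found
    #            under .shard on that replica set's bricks}
    seen = defaultdict(set)
    for subvol, names in listings.items():
        for name in names:
            seen[name].add(subvol)
    # A shard name appearing on more than one replica set is suspect.
    return {name: sorted(sets) for name, sets in seen.items() if len(sets) > 1}

# Example input; in practice the listings would be gathered on each server,
# e.g. from a listing of /mnt/diskN/<volume>/.shard on one brick of every
# replica pair (paths here are invented).
listings = {
    "replicate-0":  {"aaaa-1111.1", "aaaa-1111.2"},
    "replicate-13": {"aaaa-1111.2", "bbbb-2222.7"},   # .2 also here -> suspect
}
print(duplicate_shards(listings))   # {'aaaa-1111.2': ['replicate-0', 'replicate-13']}

Anything flagged this way would still need manual inspection of the two copies; the sketch only narrows down candidates.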