Serkan Çoban
2017-Apr-27 11:21 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
I think this is the fix Gandalf is asking for:
https://github.com/gluster/glusterfs/commit/6e3054b42f9aef1e35b493fbb002ec47e1ba27ce

On Thu, Apr 27, 2017 at 2:03 PM, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
> I am very positive about the two things I told you. These are the latest things that happened for VM corruption with rebalance.
>
> On Thu, Apr 27, 2017 at 4:30 PM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:
>>
>> I think we are talking about a different bug.
>>
>> On 27 Apr 2017 at 12:58 PM, "Pranith Kumar Karampuri" <pkarampu at redhat.com> wrote:
>>>
>>> I am not a DHT developer, so some of what I say could be a little wrong, but this is what I gather.
>>> I think they found 2 classes of bugs in DHT:
>>> 1) Graceful fop failover when rebalance is in progress is missing for some fops, which leads to VM pauses.
>>>
>>> I see that https://review.gluster.org/17085 got merged on the 24th on master for this. I see patches are posted for 3.8.x for this one.
>>>
>>> 2) I think there is some work that needs to be done for dht_[f]xattrop. I believe this is the next step that is underway.
>>>
>>> On Thu, Apr 27, 2017 at 12:13 PM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:
>>>>
>>>> Updates on this critical bug?
>>>>
>>>> On 18 Apr 2017 at 8:24 PM, "Gandalf Corvotempesta" <gandalf.corvotempesta at gmail.com> wrote:
>>>>>
>>>>> Any update?
>>>>> In addition, if this is a different bug but the "workflow" is the same as the previous one, how is it possible that fixing the previous bug triggered this new one?
>>>>>
>>>>> Is it possible to have some details?
>>>>>
>>>>> 2017-04-04 16:11 GMT+02:00 Krutika Dhananjay <kdhananj at redhat.com>:
>>>>> > Nope. This is a different bug.
>>>>> >
>>>>> > -Krutika
>>>>> >
>>>>> > On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:
>>>>> >>
>>>>> >> This is good news.
>>>>> >> Is this related to the previously fixed bug?
>>>>> >>
>>>>> >> On 3 Apr 2017 at 10:22 AM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:
>>>>> >>>
>>>>> >>> So Raghavendra has an RCA for this issue.
>>>>> >>>
>>>>> >>> Copy-pasting his comment here:
>>>>> >>>
>>>>> >>> <RCA>
>>>>> >>>
>>>>> >>> The following is a rough algorithm of shard_writev:
>>>>> >>>
>>>>> >>> 1. Based on the offset, calculate the shards touched by the current write.
>>>>> >>> 2. Look for inodes corresponding to these shard files in the itable.
>>>>> >>> 3. If one or more inodes are missing from the itable, issue mknod for the corresponding shard files and ignore EEXIST in cbk.
>>>>> >>> 4. Resume writes on the respective shards.
>>>>> >>>
>>>>> >>> Now, imagine a write which falls on an existing "shard_file". For the sake of discussion, let's consider a distribute of three subvols - s1, s2, s3:
>>>>> >>>
>>>>> >>> 1. "shard_file" hashes to subvolume s2 and is present on s2.
>>>>> >>> 2. Add a subvolume s4 and initiate a fix-layout. The layout of ".shard" is fixed to include s4 and hash ranges are changed.
>>>>> >>> 3. A write that touches "shard_file" is issued.
>>>>> >>> 4. The inode for "shard_file" is not present in the itable after a graph switch, and features/shard issues an mknod.
>>>>> >>> 5. With the new layout of .shard, let's say "shard_file" hashes to s3 and mknod (shard_file) on s3 succeeds. But the shard_file is already present on s2.
>>>>> >>>
>>>>> >>> So we have two files on two different subvols of dht representing the same shard, and this will lead to corruption.
>>>>> >>>
>>>>> >>> </RCA>
>>>>> >>>
>>>>> >>> Raghavendra will be sending out a patch in DHT to fix this issue.
>>>>> >>>
>>>>> >>> -Krutika
>>>>> >>>
>>>>> >>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
>>>>> >>>>
>>>>> >>>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>
>>>>> >>>>> Hi,
>>>>> >>>>>
>>>>> >>>>> Do you guys have any update regarding this issue?
>>>>> >>>>
>>>>> >>>> I do not actively work on this issue, so I do not have an accurate update, but what I heard from Krutika and Raghavendra (who works on DHT) is: Krutika debugged initially and found that the issue seems more likely to be in DHT. Satheesaran, who helped us recreate this issue in the lab, found that just fix-layout without rebalance also caused the corruption 1 out of 3 times. Raghavendra came up with a possible RCA for why this can happen. Raghavendra (CCed) would be the right person to provide an accurate update.
>>>>> >>>>>
>>>>> >>>>> --
>>>>> >>>>> Respectfully
>>>>> >>>>> Mahdi A. Mahdi
>>>>> >>>>>
>>>>> >>>>> ________________________________
>>>>> >>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> >>>>> Sent: Tuesday, March 21, 2017 3:02:55 PM
>>>>> >>>>> To: Mahdi Adnan
>>>>> >>>>> Cc: Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai; gluster-users at gluster.org List
>>>>> >>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>
>>>>> >>>>> Hi,
>>>>> >>>>>
>>>>> >>>>> So it looks like Satheesaran managed to recreate this issue. We will be seeking his help in debugging this. It will be easier that way.
>>>>> >>>>>
>>>>> >>>>> -Krutika
>>>>> >>>>>
>>>>> >>>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Hello, and thank you for your email.
>>>>> >>>>>> Actually no, I didn't check the gfid of the VMs.
>>>>> >>>>>> If this will help, I can set up a new test cluster and get all the data you need.
>>>>> >>>>>>
>>>>> >>>>>> Get Outlook for Android
>>>>> >>>>>>
>>>>> >>>>>> From: Nithya Balachandran
>>>>> >>>>>> Sent: Monday, March 20, 20:57
>>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>> To: Krutika Dhananjay
>>>>> >>>>>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai, gluster-users at gluster.org List
>>>>> >>>>>>
>>>>> >>>>>> Hi,
>>>>> >>>>>>
>>>>> >>>>>> Do you know the GFIDs of the VM images which were corrupted?
>>>>> >>>>>>
>>>>> >>>>>> Regards,
>>>>> >>>>>> Nithya
>>>>> >>>>>>
>>>>> >>>>>> On 20 March 2017 at 20:37, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> I looked at the logs.
>>>>> >>>>>>
>>>>> >>>>>> From the time the new graph (since the add-brick command you shared, where bricks 41 through 44 are added) is switched to (line 3011 onwards in nfs-gfapi.log), I see the following kinds of errors:
>>>>> >>>>>>
>>>>> >>>>>> 1. Lookups to a bunch of files failed with ENOENT on both replicas, which protocol/client converts to ESTALE. I am guessing these entries got migrated to other subvolumes, leading to 'No such file or directory' errors.
>>>>> >>>>>>
>>>>> >>>>>> DHT and thereafter shard get the same error code and log the following:
>>>>> >>>>>>
>>>>> >>>>>> 0 [2017-03-17 14:04:26.353444] E [MSGID: 109040] [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht: <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file on vmware2-dht [Stale file handle]
>>>>> >>>>>> 1 [2017-03-17 14:04:26.353528] E [MSGID: 133014] [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed: a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
>>>>> >>>>>>
>>>>> >>>>>> which is fine.
>>>>> >>>>>>
>>>>> >>>>>> 2. The other kind are from AFR logging of possible split-brain, which I suppose are harmless too:
>>>>> >>>>>>
>>>>> >>>>>> [2017-03-17 14:23:36.968883] W [MSGID: 108008] [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable subvolume -1 found with event generation 2 for gfid 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
>>>>> >>>>>>
>>>>> >>>>>> Since you are saying the bug is hit only on VMs that are undergoing IO while rebalance is running (as opposed to those that remained powered off), rebalance + IO could be causing some issues.
>>>>> >>>>>>
>>>>> >>>>>> CC'ing DHT devs.
>>>>> >>>>>>
>>>>> >>>>>> Raghavendra/Nithya/Susant,
>>>>> >>>>>>
>>>>> >>>>>> Could you take a look?
>>>>> >>>>>>
>>>>> >>>>>> -Krutika
>>>>> >>>>>>
>>>>> >>>>>> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Thank you for your email, mate.
>>>>> >>>>>>
>>>>> >>>>>> Yes, I'm aware of this, but to save costs I chose replica 2; this cluster is all flash.
>>>>> >>>>>>
>>>>> >>>>>> In version 3.7.x I had issues with ping timeout: if one host went down for a few seconds, the whole cluster hung and became unavailable. To avoid this I adjusted the ping timeout to 5 seconds.
>>>>> >>>>>>
>>>>> >>>>>> As for choosing Ganesha over gfapi, VMware does not support Gluster (FUSE or gfapi), so I'm stuck with NFS for this volume.
>>>>> >>>>>>
>>>>> >>>>>> The other volume is mounted using gfapi in an oVirt cluster.
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Respectfully
>>>>> >>>>>> Mahdi A. Mahdi
>>>>> >>>>>>
>>>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> >>>>>> Sent: Sunday, March 19, 2017 2:01:49 PM
>>>>> >>>>>> To: Mahdi Adnan
>>>>> >>>>>> Cc: gluster-users at gluster.org
>>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>>
>>>>> >>>>>> While I'm still going through the logs, I just wanted to point out a couple of things:
>>>>> >>>>>>
>>>>> >>>>>> 1. It is recommended that you use 3-way replication (replica count 3) for the VM store use case.
>>>>> >>>>>>
>>>>> >>>>>> 2. network.ping-timeout at 5 seconds is way too low. Please change it to 30.
>>>>> >>>>>>
>>>>> >>>>>> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>>>>> >>>>>>
>>>>> >>>>>> Will get back with anything else I might find, or more questions if I have any.
>>>>> >>>>>>
>>>>> >>>>>> -Krutika
>>>>> >>>>>>
>>>>> >>>>>> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Thanks mate,
>>>>> >>>>>>
>>>>> >>>>>> Kindly, check the attachment.
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Respectfully
>>>>> >>>>>> Mahdi A. Mahdi
>>>>> >>>>>>
>>>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> >>>>>> Sent: Sunday, March 19, 2017 10:00:22 AM
>>>>> >>>>>> To: Mahdi Adnan
>>>>> >>>>>> Cc: gluster-users at gluster.org
>>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>>
>>>>> >>>>>> In that case could you share the ganesha-gfapi logs?
>>>>> >>>>>>
>>>>> >>>>>> -Krutika
>>>>> >>>>>>
>>>>> >>>>>> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> I have two volumes: one is mounted using libgfapi for the oVirt mount, and the other one is exported via NFS-Ganesha for VMware, which is the one I'm testing now.
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Respectfully
>>>>> >>>>>> Mahdi A. Mahdi
>>>>> >>>>>>
>>>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> >>>>>> Sent: Sunday, March 19, 2017 8:02:19 AM
>>>>> >>>>>> To: Mahdi Adnan
>>>>> >>>>>> Cc: gluster-users at gluster.org
>>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>>
>>>>> >>>>>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Kindly, check the attached new log file; I don't know if it's helpful or not, but I couldn't find the log with the name you just described.
>>>>> >>>>>>
>>>>> >>>>>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it NFS?
>>>>> >>>>>>
>>>>> >>>>>> -Krutika
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Respectfully
>>>>> >>>>>> Mahdi A. Mahdi
>>>>> >>>>>>
>>>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> >>>>>> Sent: Saturday, March 18, 2017 6:10:40 PM
>>>>> >>>>>> To: Mahdi Adnan
>>>>> >>>>>> Cc: gluster-users at gluster.org
>>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>>
>>>>> >>>>>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the fuse mount logs? They should be right under the /var/log/glusterfs/ directory, named after the mount point name, only hyphenated.
>>>>> >>>>>>
>>>>> >>>>>> -Krutika
>>>>> >>>>>>
>>>>> >>>>>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Hello Krutika,
>>>>> >>>>>>
>>>>> >>>>>> Kindly, check the attached logs.
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Respectfully
>>>>> >>>>>> Mahdi A. Mahdi
>>>>> >>>>>>
>>>>> >>>>>> From: Krutika Dhananjay <kdhananj at redhat.com>
>>>>> >>>>>> Sent: Saturday, March 18, 2017 3:29:03 PM
>>>>> >>>>>> To: Mahdi Adnan
>>>>> >>>>>> Cc: gluster-users at gluster.org
>>>>> >>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> >>>>>>
>>>>> >>>>>> Hi Mahdi,
>>>>> >>>>>>
>>>>> >>>>>> Could you attach mount, brick and rebalance logs?
>>>>> >>>>>>
>>>>> >>>>>> -Krutika
>>>>> >>>>>>
>>>>> >>>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Hi,
>>>>> >>>>>>
>>>>> >>>>>> I have upgraded to Gluster 3.8.10 today and ran the add-brick procedure on a volume containing a few VMs.
>>>>> >>>>>> After the completion of rebalance, I rebooted the VMs; some of them ran just fine, and others just crashed.
>>>>> >>>>>> Windows boots to recovery mode, and Linux throws xfs errors and does not boot.
>>>>> >>>>>> I ran the test again and it happened just as the first time, but I have noticed that only VMs doing disk IO are affected by this bug.
>>>>> >>>>>> The VMs in powered-off mode started fine, and even the md5 of the disk file did not change after the rebalance.
>>>>> >>>>>>
>>>>> >>>>>> Can anyone else confirm this?
>>>>> >>>>>>
>>>>> >>>>>> Volume info:
>>>>> >>>>>>
>>>>> >>>>>> Volume Name: vmware2
>>>>> >>>>>> Type: Distributed-Replicate
>>>>> >>>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>>>> >>>>>> Status: Started
>>>>> >>>>>> Snapshot Count: 0
>>>>> >>>>>> Number of Bricks: 22 x 2 = 44
>>>>> >>>>>> Transport-type: tcp
>>>>> >>>>>> Bricks:
>>>>> >>>>>> Brick1: gluster01:/mnt/disk1/vmware2
>>>>> >>>>>> Brick2: gluster03:/mnt/disk1/vmware2
>>>>> >>>>>> Brick3: gluster02:/mnt/disk1/vmware2
>>>>> >>>>>> Brick4: gluster04:/mnt/disk1/vmware2
>>>>> >>>>>> Brick5: gluster01:/mnt/disk2/vmware2
>>>>> >>>>>> Brick6: gluster03:/mnt/disk2/vmware2
>>>>> >>>>>> Brick7: gluster02:/mnt/disk2/vmware2
>>>>> >>>>>> Brick8: gluster04:/mnt/disk2/vmware2
>>>>> >>>>>> Brick9: gluster01:/mnt/disk3/vmware2
>>>>> >>>>>> Brick10: gluster03:/mnt/disk3/vmware2
>>>>> >>>>>> Brick11: gluster02:/mnt/disk3/vmware2
>>>>> >>>>>> Brick12: gluster04:/mnt/disk3/vmware2
>>>>> >>>>>> Brick13: gluster01:/mnt/disk4/vmware2
>>>>> >>>>>> Brick14: gluster03:/mnt/disk4/vmware2
>>>>> >>>>>> Brick15: gluster02:/mnt/disk4/vmware2
>>>>> >>>>>> Brick16: gluster04:/mnt/disk4/vmware2
>>>>> >>>>>> Brick17: gluster01:/mnt/disk5/vmware2
>>>>> >>>>>> Brick18: gluster03:/mnt/disk5/vmware2
>>>>> >>>>>> Brick19: gluster02:/mnt/disk5/vmware2
>>>>> >>>>>> Brick20: gluster04:/mnt/disk5/vmware2
>>>>> >>>>>> Brick21: gluster01:/mnt/disk6/vmware2
>>>>> >>>>>> Brick22: gluster03:/mnt/disk6/vmware2
>>>>> >>>>>> Brick23: gluster02:/mnt/disk6/vmware2
>>>>> >>>>>> Brick24: gluster04:/mnt/disk6/vmware2
>>>>> >>>>>> Brick25: gluster01:/mnt/disk7/vmware2
>>>>> >>>>>> Brick26: gluster03:/mnt/disk7/vmware2
>>>>> >>>>>> Brick27: gluster02:/mnt/disk7/vmware2
>>>>> >>>>>> Brick28: gluster04:/mnt/disk7/vmware2
>>>>> >>>>>> Brick29: gluster01:/mnt/disk8/vmware2
>>>>> >>>>>> Brick30: gluster03:/mnt/disk8/vmware2
>>>>> >>>>>> Brick31: gluster02:/mnt/disk8/vmware2
>>>>> >>>>>> Brick32: gluster04:/mnt/disk8/vmware2
>>>>> >>>>>> Brick33: gluster01:/mnt/disk9/vmware2
>>>>> >>>>>> Brick34: gluster03:/mnt/disk9/vmware2
>>>>> >>>>>> Brick35: gluster02:/mnt/disk9/vmware2
>>>>> >>>>>> Brick36: gluster04:/mnt/disk9/vmware2
>>>>> >>>>>> Brick37: gluster01:/mnt/disk10/vmware2
>>>>> >>>>>> Brick38: gluster03:/mnt/disk10/vmware2
>>>>> >>>>>> Brick39: gluster02:/mnt/disk10/vmware2
>>>>> >>>>>> Brick40: gluster04:/mnt/disk10/vmware2
>>>>> >>>>>> Brick41: gluster01:/mnt/disk11/vmware2
>>>>> >>>>>> Brick42: gluster03:/mnt/disk11/vmware2
>>>>> >>>>>> Brick43: gluster02:/mnt/disk11/vmware2
>>>>> >>>>>> Brick44: gluster04:/mnt/disk11/vmware2
>>>>> >>>>>> Options Reconfigured:
>>>>> >>>>>> cluster.server-quorum-type: server
>>>>> >>>>>> nfs.disable: on
>>>>> >>>>>> performance.readdir-ahead: on
>>>>> >>>>>> transport.address-family: inet
>>>>> >>>>>> performance.quick-read: off
>>>>> >>>>>> performance.read-ahead: off
>>>>> >>>>>> performance.io-cache: off
>>>>> >>>>>> performance.stat-prefetch: off
>>>>> >>>>>> cluster.eager-lock: enable
>>>>> >>>>>> network.remote-dio: enable
>>>>> >>>>>> features.shard: on
>>>>> >>>>>> cluster.data-self-heal-algorithm: full
>>>>> >>>>>> features.cache-invalidation: on
>>>>> >>>>>> ganesha.enable: on
>>>>> >>>>>> features.shard-block-size: 256MB
>>>>> >>>>>> client.event-threads: 2
>>>>> >>>>>> server.event-threads: 2
>>>>> >>>>>> cluster.favorite-child-policy: size
>>>>> >>>>>> storage.build-pgfid: off
>>>>> >>>>>> network.ping-timeout: 5
>>>>> >>>>>> cluster.enable-shared-storage: enable
>>>>> >>>>>> nfs-ganesha: enable
>>>>> >>>>>> cluster.server-quorum-ratio: 51%
>>>>> >>>>>>
>>>>> >>>>>> Adding bricks:
>>>>> >>>>>> gluster volume add-brick vmware2 replica 2 gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2 gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>>>>> >>>>>>
>>>>> >>>>>> Starting fix-layout:
>>>>> >>>>>> gluster volume rebalance vmware2 fix-layout start
>>>>> >>>>>>
>>>>> >>>>>> Starting rebalance:
>>>>> >>>>>> gluster volume rebalance vmware2 start
>>>>> >>>>>>
>>>>> >>>>>> --
>>>>> >>>>>> Respectfully
>>>>> >>>>>> Mahdi A. Mahdi
>>>>> >>>>
>>>>> >>>> --
>>>>> >>>> Pranith
>>>
>>> --
>>> Pranith
>
> --
> Pranith
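To make the RCA quoted above more concrete, the following is a minimal, self-contained sketch in plain Python. It is not Gluster code (the ToyCluster class, the md5-based hashing, and the method names are invented stand-ins), but it walks the same sequence Raghavendra describes: the shard's cached inode is lost at a graph switch, add-brick plus fix-layout changes the hash ranges of ".shard", and the next write re-issues mknod under the new layout.

# Illustrative sketch only -- NOT Gluster source code. All names here
# (ToyCluster, hashed_subvol, write_to_shard) are invented to mimic the
# RCA quoted above: a shard whose inode has fallen out of the client's
# itable is re-created with mknod under a ".shard" layout that was
# re-fixed after add-brick, so the same shard can end up on two subvolumes.

import hashlib


def hashed_subvol(name, subvols):
    """Toy stand-in for DHT's hash-to-subvolume mapping."""
    digest = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return subvols[digest % len(subvols)]


class ToyCluster:
    def __init__(self, subvols):
        self.subvols = list(subvols)                   # current ".shard" layout
        self.files = {s: set() for s in self.subvols}  # shard files per subvolume
        self.itable = {}                               # client-side inode cache

    def write_to_shard(self, shard):
        """Roughly shard_writev: on an itable miss, mknod and ignore EEXIST."""
        if shard not in self.itable:
            target = hashed_subvol(shard, self.subvols)
            self.files[target].add(shard)  # EEXIST on *this* subvol would be ignored
            self.itable[shard] = target
        return self.itable[shard]

    def add_brick_and_fix_layout(self, new_subvol):
        """fix-layout: hash ranges change, but nothing is migrated here."""
        self.subvols.append(new_subvol)
        self.files[new_subvol] = set()


cluster = ToyCluster(["s1", "s2", "s3"])

old_home = cluster.write_to_shard("shard_file")  # shard created on its hashed subvol
cluster.itable.clear()                           # graph switch: cached inode is lost
cluster.add_brick_and_fix_layout("s4")           # add-brick + fix-layout rehashes ".shard"
new_home = cluster.write_to_shard("shard_file")  # itable miss -> mknod under the new layout

copies = [s for s, files in cluster.files.items() if "shard_file" in files]
print(f"hashed to {old_home} before fix-layout, {new_home} after; copies on {copies}")
# If the new layout maps the shard to a different subvolume, 'copies' lists
# two subvols, i.e. the duplicate that the RCA says leads to corruption.

Whether the second mknod actually lands on a different subvolume depends on how the new hash ranges fall for that particular shard name, which is consistent with the report above that fix-layout alone reproduced the corruption only about one time in three.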
Pranith Kumar Karampuri
2017-Apr-27 11:31 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
But even after that fix, it is still leading to pause. And these are the two updates on what the developers are doing, as per my understanding. So that workflow is not stable yet, IMO.

On Thu, Apr 27, 2017 at 4:51 PM, Serkan Çoban <cobanserkan at gmail.com> wrote:
> I think this is the fix Gandalf is asking for:
> https://github.com/gluster/glusterfs/commit/6e3054b42f9aef1e35b493fbb002ec47e1ba27ce
--
Pranith
Gandalf Corvotempesta
2017-Apr-27 11:45 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
2017-04-27 13:21 GMT+02:00 Serkan Çoban <cobanserkan at gmail.com>:
> I think this is the fix Gandalf is asking for:
> https://github.com/gluster/glusterfs/commit/6e3054b42f9aef1e35b493fbb002ec47e1ba27ce

Yes, I'm talking about this.