Pranith Kumar Karampuri
2017-Mar-28 18:19 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
> Hi,
>
> Do you guys have any update regarding this issue ?

I do not actively work on this issue, so I do not have an accurate update. From what I heard from Krutika and Raghavendra (who works on DHT): Krutika debugged it initially and found that the issue seems more likely to be in DHT; Satheesaran, who helped us recreate the issue in the lab, found that just fix-layout without rebalance also caused the corruption 1 out of 3 times. Raghavendra came up with a possible RCA for why this can happen. Raghavendra (CCed) would be the right person to provide an accurate update.

> --
> Respectfully
> Mahdi A. Mahdi
>
> From: Krutika Dhananjay <kdhananj at redhat.com>
> Sent: Tuesday, March 21, 2017 3:02:55 PM
> To: Mahdi Adnan
> Cc: Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai; gluster-users at gluster.org List
> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
> Hi,
>
> So it looks like Satheesaran managed to recreate this issue. We will be seeking his help in debugging this. It will be easier that way.
>
> -Krutika
>
> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
>> Hello, and thank you for your email.
>> Actually no, I didn't check the GFIDs of the VMs.
>> If this will help, I can set up a new test cluster and get all the data you need.
>>
>> Get Outlook for Android <https://aka.ms/ghei36>
>>
>> From: Nithya Balachandran
>> Sent: Monday, March 20, 20:57
>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>> To: Krutika Dhananjay
>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai, gluster-users at gluster.org List
>>
>> Hi,
>>
>> Do you know the GFIDs of the VM images which were corrupted?
>>
>> Regards,
>> Nithya
>>
>> On 20 March 2017 at 20:37, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>
>> I looked at the logs.
>>
>> From the time the new graph (since the add-brick command you shared, where bricks 41 through 44 are added) is switched to (line 3011 onwards in nfs-gfapi.log), I see the following kinds of errors:
>>
>> 1. Lookups to a bunch of files failed with ENOENT on both replicas, which protocol/client converts to ESTALE. I am guessing these entries got migrated to other subvolumes, leading to 'No such file or directory' errors.
>>
>> DHT and thereafter shard get the same error code and log the following:
>>
>> [2017-03-17 14:04:26.353444] E [MSGID: 109040] [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht: <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file on vmware2-dht [Stale file handle]
>>
>> [2017-03-17 14:04:26.353528] E [MSGID: 133014] [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed: a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
>>
>> which is fine.
>>
>> 2. The other kind are from AFR logging possible split-brain, which I suppose is harmless too:
>>
>> [2017-03-17 14:23:36.968883] W [MSGID: 108008] [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable subvolume -1 found with event generation 2 for gfid 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
>>
>> Since you are saying the bug is hit only on VMs that are undergoing IO while rebalance is running (as opposed to those that remained powered off), rebalance + IO could be causing some issues.
>>
>> CC'ing DHT devs.
>>
>> Raghavendra/Nithya/Susant,
>>
>> Could you take a look?
>>
>> -Krutika
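For anyone checking their own setup against the errors above, the three message IDs can be pulled out of the gfapi client log with a plain grep. The directory below is an assumption; use whatever path holds the nfs-gfapi.log referenced above.

  # Count DHT migration-check (109040), shard stat (133014) and AFR
  # possible-split-brain (108008) messages in the gfapi client log.
  grep -cE 'MSGID: (109040|133014|108008)' /var/log/glusterfs/nfs-gfapi.log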
>> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>> Thank you for your email, mate.
>>
>> Yes, I'm aware of this, but to save costs I chose replica 2; this cluster is all flash.
>>
>> In version 3.7.x I had issues with ping timeout: if one host went down for a few seconds, the whole cluster would hang and become unavailable. To avoid this I adjusted the ping timeout to 5 seconds.
>>
>> As for choosing Ganesha over gfapi, VMware does not support Gluster (FUSE or gfapi), so I'm stuck with NFS for this volume.
>>
>> The other volume is mounted using gfapi in an oVirt cluster.
>>
>> --
>> Respectfully
>> Mahdi A. Mahdi
>>
>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> Sent: Sunday, March 19, 2017 2:01:49 PM
>> To: Mahdi Adnan
>> Cc: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>> While I'm still going through the logs, just wanted to point out a couple of things:
>>
>> 1. It is recommended that you use 3-way replication (replica count 3) for the VM store use case.
>>
>> 2. network.ping-timeout at 5 seconds is way too low. Please change it to 30.
>>
>> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>>
>> Will get back with anything else I might find, or more questions if I have any.
>>
>> -Krutika
>>
>> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>> Thanks mate,
>>
>> Kindly, check the attachment.
>>
>> --
>> Respectfully
>> Mahdi A. Mahdi
>>
>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> Sent: Sunday, March 19, 2017 10:00:22 AM
>> To: Mahdi Adnan
>> Cc: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>> In that case could you share the ganesha-gfapi logs?
>>
>> -Krutika
>>
>> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>> I have two volumes: one is mounted using libgfapi for the oVirt mount, the other one is exported via NFS-Ganesha for VMware, which is the one I'm testing now.
>>
>> --
>> Respectfully
>> Mahdi A. Mahdi
>>
>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> Sent: Sunday, March 19, 2017 8:02:19 AM
>> To: Mahdi Adnan
>> Cc: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>> Kindly, check the attached new log file; I don't know if it's helpful or not, but I couldn't find the log with the name you just described.
>>
>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it NFS?
>>
>> -Krutika
>>
>> --
>> Respectfully
>> Mahdi A. Mahdi
>>
>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> Sent: Saturday, March 18, 2017 6:10:40 PM
>> To: Mahdi Adnan
>> Cc: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the fuse mount logs? They should be right under the /var/log/glusterfs/ directory, named after the mount point name, only hyphenated.
>>
>> -Krutika
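To illustrate the naming convention Krutika describes, a FUSE mount's client log takes its name from the mount path with the slashes replaced by hyphens. The mount point below is a hypothetical example, not one from this thread.

  # A volume FUSE-mounted at /mnt/vmstore (hypothetical path) writes its
  # client log to /var/log/glusterfs/mnt-vmstore.log on that client.
  mount -t glusterfs gluster01:/vmware2 /mnt/vmstore
  less /var/log/glusterfs/mnt-vmstore.log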
>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>> Hello Krutika,
>>
>> Kindly, check the attached logs.
>>
>> --
>> Respectfully
>> Mahdi A. Mahdi
>>
>> From: Krutika Dhananjay <kdhananj at redhat.com>
>> Sent: Saturday, March 18, 2017 3:29:03 PM
>> To: Mahdi Adnan
>> Cc: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>> Hi Mahdi,
>>
>> Could you attach the mount, brick and rebalance logs?
>>
>> -Krutika
>>
>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>>
>> Hi,
>>
>> I upgraded to Gluster 3.8.10 today and ran the add-brick procedure on a volume containing a few VMs.
>>
>> After the rebalance completed, I rebooted the VMs; some ran just fine, and others just crashed.
>>
>> Windows boots to recovery mode, and Linux throws XFS errors and does not boot.
>>
>> I ran the test again and it went just like the first time, but I noticed that only VMs doing disk IO are affected by this bug.
>>
>> The VMs that were powered off started fine, and even the md5 of the disk file did not change after the rebalance.
>>
>> Can anyone else confirm this?
>>
>> Volume info:
>>
>> Volume Name: vmware2
>> Type: Distributed-Replicate
>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 22 x 2 = 44
>> Transport-type: tcp
>> Bricks:
>> Brick1: gluster01:/mnt/disk1/vmware2
>> Brick2: gluster03:/mnt/disk1/vmware2
>> Brick3: gluster02:/mnt/disk1/vmware2
>> Brick4: gluster04:/mnt/disk1/vmware2
>> Brick5: gluster01:/mnt/disk2/vmware2
>> Brick6: gluster03:/mnt/disk2/vmware2
>> Brick7: gluster02:/mnt/disk2/vmware2
>> Brick8: gluster04:/mnt/disk2/vmware2
>> Brick9: gluster01:/mnt/disk3/vmware2
>> Brick10: gluster03:/mnt/disk3/vmware2
>> Brick11: gluster02:/mnt/disk3/vmware2
>> Brick12: gluster04:/mnt/disk3/vmware2
>> Brick13: gluster01:/mnt/disk4/vmware2
>> Brick14: gluster03:/mnt/disk4/vmware2
>> Brick15: gluster02:/mnt/disk4/vmware2
>> Brick16: gluster04:/mnt/disk4/vmware2
>> Brick17: gluster01:/mnt/disk5/vmware2
>> Brick18: gluster03:/mnt/disk5/vmware2
>> Brick19: gluster02:/mnt/disk5/vmware2
>> Brick20: gluster04:/mnt/disk5/vmware2
>> Brick21: gluster01:/mnt/disk6/vmware2
>> Brick22: gluster03:/mnt/disk6/vmware2
>> Brick23: gluster02:/mnt/disk6/vmware2
>> Brick24: gluster04:/mnt/disk6/vmware2
>> Brick25: gluster01:/mnt/disk7/vmware2
>> Brick26: gluster03:/mnt/disk7/vmware2
>> Brick27: gluster02:/mnt/disk7/vmware2
>> Brick28: gluster04:/mnt/disk7/vmware2
>> Brick29: gluster01:/mnt/disk8/vmware2
>> Brick30: gluster03:/mnt/disk8/vmware2
>> Brick31: gluster02:/mnt/disk8/vmware2
>> Brick32: gluster04:/mnt/disk8/vmware2
>> Brick33: gluster01:/mnt/disk9/vmware2
>> Brick34: gluster03:/mnt/disk9/vmware2
>> Brick35: gluster02:/mnt/disk9/vmware2
>> Brick36: gluster04:/mnt/disk9/vmware2
>> Brick37: gluster01:/mnt/disk10/vmware2
>> Brick38: gluster03:/mnt/disk10/vmware2
>> Brick39: gluster02:/mnt/disk10/vmware2
>> Brick40: gluster04:/mnt/disk10/vmware2
>> Brick41: gluster01:/mnt/disk11/vmware2
>> Brick42: gluster03:/mnt/disk11/vmware2
>> Brick43: gluster02:/mnt/disk11/vmware2
>> Brick44: gluster04:/mnt/disk11/vmware2
>> Options Reconfigured:
>> cluster.server-quorum-type: server
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> features.shard: on
>> cluster.data-self-heal-algorithm: full
>> features.cache-invalidation: on
>> ganesha.enable: on
>> features.shard-block-size: 256MB
>> client.event-threads: 2
>> server.event-threads: 2
>> cluster.favorite-child-policy: size
>> storage.build-pgfid: off
>> network.ping-timeout: 5
>> cluster.enable-shared-storage: enable
>> nfs-ganesha: enable
>> cluster.server-quorum-ratio: 51%
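Note that network.ping-timeout is still 5 here and the volume is replica 2, while Krutika's recommendations earlier in the thread are 30 seconds and 3-way replication. A minimal sketch of how those changes would be applied with the standard gluster CLI follows; it is an illustration, not a step from this thread.

  # Raise the ping timeout from 5 to the recommended 30 seconds.
  gluster volume set vmware2 network.ping-timeout 30

  # Moving to 3-way replication is done with an add-brick call of the form
  #   gluster volume add-brick vmware2 replica 3 <one new brick per pair>...
  # which, for this 22 x 2 layout, needs 22 new bricks in a single command.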
>> Adding bricks:
>>
>> gluster volume add-brick vmware2 replica 2 gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2 gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>>
>> Starting fix-layout:
>>
>> gluster volume rebalance vmware2 fix-layout start
>>
>> Starting rebalance:
>>
>> gluster volume rebalance vmware2 start
>>
>> --
>> Respectfully
>> Mahdi A. Mahdi

--
Pranith
Gandalf Corvotempesta
2017-Mar-29 07:02 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
Are rebalance and fix-layout needed when adding new bricks? Is there any workaround for extending a cluster without losing data?

On 28 Mar 2017 at 8:19 PM, "Pranith Kumar Karampuri" <pkarampu at redhat.com> wrote:

> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
>> Hi,
>>
>> Do you guys have any update regarding this issue ?
>
> I do not actively work on this issue, so I do not have an accurate update. From what I heard from Krutika and Raghavendra (who works on DHT): Krutika debugged it initially and found that the issue seems more likely to be in DHT; Satheesaran, who helped us recreate the issue in the lab, found that just fix-layout without rebalance also caused the corruption 1 out of 3 times. Raghavendra came up with a possible RCA for why this can happen. Raghavendra (CCed) would be the right person to provide an accurate update.
>
> [...]
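For reference, the sequence that preceded the corruption reported earlier in this thread was an add-brick followed by fix-layout and then a full rebalance; progress can be checked with the rebalance status subcommand. The commands below reuse the volume and brick names from that report.

  gluster volume add-brick vmware2 replica 2 gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2 gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
  gluster volume rebalance vmware2 fix-layout start
  gluster volume rebalance vmware2 start

  # Watch migration progress until the rebalance reports completed.
  gluster volume rebalance vmware2 status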