Krutika Dhananjay
2017-Mar-21 12:02 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
Hi,

So it looks like Satheesaran managed to recreate this issue. We will be seeking his help in debugging this. It will be easier that way.

-Krutika

On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

> Hello and thank you for your email.
> Actually no, I didn't check the GFIDs of the VMs.
> If this will help, I can set up a new test cluster and get all the data you need.
>
> From: Nithya Balachandran
> Sent: Monday, March 20, 20:57
> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
> To: Krutika Dhananjay
> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai, gluster-users at gluster.org List
>
> Hi,
>
> Do you know the GFIDs of the VM images which were corrupted?
>
> Regards,
> Nithya
>
> On 20 March 2017 at 20:37, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>
> I looked at the logs.
>
> From the time the new graph is switched to (after the add-brick command you shared, where bricks 41 through 44 are added; line 3011 onwards in nfs-gfapi.log), I see the following kinds of errors:
>
> 1. Lookups to a bunch of files failed with ENOENT on both replicas, which protocol/client converts to ESTALE. I am guessing these entries got migrated to other subvolumes, leading to 'No such file or directory' errors.
>
> DHT and thereafter shard get the same error code and log the following:
>
> [2017-03-17 14:04:26.353444] E [MSGID: 109040] [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht: <gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file on vmware2-dht [Stale file handle]
>
> [2017-03-17 14:04:26.353528] E [MSGID: 133014] [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed: a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]
>
> which is fine.
>
> 2. The other kind are from AFR logging of possible split-brain, which I suppose are harmless too.
>
> [2017-03-17 14:23:36.968883] W [MSGID: 108008] [afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable subvolume -1 found with event generation 2 for gfid 74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
>
> Since you are saying the bug is hit only on VMs that are undergoing IO while rebalance is running (as opposed to those that remained powered off), rebalance + IO could be causing some issues.
>
> CC'ing DHT devs
>
> Raghavendra/Nithya/Susant,
>
> Could you take a look?
>
> -Krutika
>
> On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
> Thank you for your email mate.
>
> Yes, I'm aware of this, but to save costs I chose replica 2; this cluster is all flash.
>
> In version 3.7.x I had issues with ping timeout: if one host went down for a few seconds, the whole cluster hung and became unavailable. To avoid this I adjusted the ping timeout to 5 seconds.
>
> As for choosing Ganesha over gfapi, VMWare does not support Gluster (FUSE or gfapi), so I'm stuck with NFS for this volume.
>
> The other volume is mounted using gfapi in the oVirt cluster.
>
> --
> Respectfully
> Mahdi A. Mahdi
>
> From: Krutika Dhananjay <kdhananj at redhat.com>
> Sent: Sunday, March 19, 2017 2:01:49 PM
> To: Mahdi Adnan
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
> While I'm still going through the logs, just wanted to point out a couple of things:
>
> 1. It is recommended that you use 3-way replication (replica count 3) for the VM store use case.
>
> 2. network.ping-timeout at 5 seconds is way too low. Please change it to 30.
>
> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>
> Will get back with anything else I might find, or more questions if I have any.
>
> -Krutika
>
> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
> Thanks mate,
>
> Kindly, check the attachment.
>
> --
> Respectfully
> Mahdi A. Mahdi
>
> From: Krutika Dhananjay <kdhananj at redhat.com>
> Sent: Sunday, March 19, 2017 10:00:22 AM
> To: Mahdi Adnan
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
> In that case could you share the ganesha-gfapi logs?
>
> -Krutika
>
> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
> I have two volumes: one is mounted using libgfapi for the oVirt mount, the other one is exported via NFS-Ganesha for VMWare, which is the one I'm testing now.
>
> --
> Respectfully
> Mahdi A. Mahdi
>
> From: Krutika Dhananjay <kdhananj at redhat.com>
> Sent: Sunday, March 19, 2017 8:02:19 AM
> To: Mahdi Adnan
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
> Kindly, check the attached new log file. I don't know if it's helpful or not, but I couldn't find the log with the name you just described.
>
> No. Are you using FUSE or libgfapi for accessing the volume? Or is it NFS?
>
> -Krutika
>
> --
> Respectfully
> Mahdi A. Mahdi
>
> From: Krutika Dhananjay <kdhananj at redhat.com>
> Sent: Saturday, March 18, 2017 6:10:40 PM
> To: Mahdi Adnan
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
> mnt-disk11-vmware2.log seems like a brick log. Could you attach the fuse mount logs? They should be right under the /var/log/glusterfs/ directory, named after the mount point name, only hyphenated.
>
> -Krutika
>
> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
> Hello Krutika,
>
> Kindly, check the attached logs.
>
> --
> Respectfully
> Mahdi A. Mahdi
>
> From: Krutika Dhananjay <kdhananj at redhat.com>
> Sent: Saturday, March 18, 2017 3:29:03 PM
> To: Mahdi Adnan
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
> Hi Mahdi,
>
> Could you attach mount, brick and rebalance logs?
>
> -Krutika
>
> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
>
> Hi,
>
> I upgraded to Gluster 3.8.10 today and ran the add-brick procedure on a volume that contains a few VMs.
>
> After the rebalance completed, I rebooted the VMs; some of them ran just fine, and others just crashed.
>
> Windows boots into recovery mode, and Linux throws XFS errors and does not boot.
>
> I ran the test again and it happened just like the first time, but I noticed that only VMs doing disk IO are affected by this bug.
>
> The VMs in powered-off mode started fine, and even the md5 of the disk file did not change after the rebalance.
>
> Can anyone else confirm this?
>
> Volume info:
>
> Volume Name: vmware2
> Type: Distributed-Replicate
> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 22 x 2 = 44
> Transport-type: tcp
> Bricks:
> Brick1: gluster01:/mnt/disk1/vmware2
> Brick2: gluster03:/mnt/disk1/vmware2
> Brick3: gluster02:/mnt/disk1/vmware2
> Brick4: gluster04:/mnt/disk1/vmware2
> Brick5: gluster01:/mnt/disk2/vmware2
> Brick6: gluster03:/mnt/disk2/vmware2
> Brick7: gluster02:/mnt/disk2/vmware2
> Brick8: gluster04:/mnt/disk2/vmware2
> Brick9: gluster01:/mnt/disk3/vmware2
> Brick10: gluster03:/mnt/disk3/vmware2
> Brick11: gluster02:/mnt/disk3/vmware2
> Brick12: gluster04:/mnt/disk3/vmware2
> Brick13: gluster01:/mnt/disk4/vmware2
> Brick14: gluster03:/mnt/disk4/vmware2
> Brick15: gluster02:/mnt/disk4/vmware2
> Brick16: gluster04:/mnt/disk4/vmware2
> Brick17: gluster01:/mnt/disk5/vmware2
> Brick18: gluster03:/mnt/disk5/vmware2
> Brick19: gluster02:/mnt/disk5/vmware2
> Brick20: gluster04:/mnt/disk5/vmware2
> Brick21: gluster01:/mnt/disk6/vmware2
> Brick22: gluster03:/mnt/disk6/vmware2
> Brick23: gluster02:/mnt/disk6/vmware2
> Brick24: gluster04:/mnt/disk6/vmware2
> Brick25: gluster01:/mnt/disk7/vmware2
> Brick26: gluster03:/mnt/disk7/vmware2
> Brick27: gluster02:/mnt/disk7/vmware2
> Brick28: gluster04:/mnt/disk7/vmware2
> Brick29: gluster01:/mnt/disk8/vmware2
> Brick30: gluster03:/mnt/disk8/vmware2
> Brick31: gluster02:/mnt/disk8/vmware2
> Brick32: gluster04:/mnt/disk8/vmware2
> Brick33: gluster01:/mnt/disk9/vmware2
> Brick34: gluster03:/mnt/disk9/vmware2
> Brick35: gluster02:/mnt/disk9/vmware2
> Brick36: gluster04:/mnt/disk9/vmware2
> Brick37: gluster01:/mnt/disk10/vmware2
> Brick38: gluster03:/mnt/disk10/vmware2
> Brick39: gluster02:/mnt/disk10/vmware2
> Brick40: gluster04:/mnt/disk10/vmware2
> Brick41: gluster01:/mnt/disk11/vmware2
> Brick42: gluster03:/mnt/disk11/vmware2
> Brick43: gluster02:/mnt/disk11/vmware2
> Brick44: gluster04:/mnt/disk11/vmware2
> Options Reconfigured:
> cluster.server-quorum-type: server
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: enable
> features.shard: on
> cluster.data-self-heal-algorithm: full
> features.cache-invalidation: on
> ganesha.enable: on
> features.shard-block-size: 256MB
> client.event-threads: 2
> server.event-threads: 2
> cluster.favorite-child-policy: size
> storage.build-pgfid: off
> network.ping-timeout: 5
> cluster.enable-shared-storage: enable
> nfs-ganesha: enable
> cluster.server-quorum-ratio: 51%
>
> Adding bricks:
>
> gluster volume add-brick vmware2 replica 2 gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2 gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>
> Starting fix layout:
>
> gluster volume rebalance vmware2 fix-layout start
>
> Starting rebalance:
>
> gluster volume rebalance vmware2 start
>
> --
> Respectfully
> Mahdi A. Mahdi
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
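Editorial note: Nithya's question about the GFIDs of the corrupted images can be partly answered from the client log itself, since the DHT, shard and AFR messages quoted above all embed a GFID. The following is a minimal sketch, not something posted in the thread. The function names are hypothetical, the log path in the usage comment is a placeholder, and the pattern assumes the log line layout shown in the excerpts above (timestamp in brackets, then a one-letter level). It relies on `grep -o`, which GNU and BSD grep provide.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Gluster): pull GFIDs out of Gluster log
# lines read from stdin.

# Print every distinct GFID-shaped UUID found in the input, one per line.
extract_gfids() {
    grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | sort -u
}

# Same, but only for error-level lines, i.e. lines of the form
# "[YYYY-MM-DD HH:MM:SS.usec] E ...", as in the excerpts quoted above.
extract_error_gfids() {
    grep -E '^\[[0-9-]+ [0-9:.]+\] E ' | extract_gfids
}

# Usage (the path is a placeholder, adjust to where your client log lives):
#   extract_error_gfids < /path/to/ganesha-gfapi.log
```

A GFID appearing in an error line only says that some operation on that file failed. As Krutika notes, the ESTALE lookups may be harmless, so the output is a list of candidates to check, not a list of corrupted images.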
Hi,

Do you guys have any update regarding this issue?

--
Respectfully
Mahdi A. Mahdi