Hi,

I have upgraded to Gluster 3.8.10 today and ran the add-brick procedure on a volume containing a few VMs.
After the rebalance completed, I rebooted the VMs; some of them ran just fine, while others crashed.
Windows boots into recovery mode, and Linux throws XFS errors and does not boot.
I ran the test again and it happened just like the first time, but I noticed that only VMs doing disk I/O are affected by this bug.
The VMs that were powered off started fine, and even the md5 of the disk file did not change after the rebalance.

Can anyone else confirm this?


Volume info:

Volume Name: vmware2
Type: Distributed-Replicate
Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
Status: Started
Snapshot Count: 0
Number of Bricks: 22 x 2 = 44
Transport-type: tcp
Bricks:
Brick1: gluster01:/mnt/disk1/vmware2
Brick2: gluster03:/mnt/disk1/vmware2
Brick3: gluster02:/mnt/disk1/vmware2
Brick4: gluster04:/mnt/disk1/vmware2
Brick5: gluster01:/mnt/disk2/vmware2
Brick6: gluster03:/mnt/disk2/vmware2
Brick7: gluster02:/mnt/disk2/vmware2
Brick8: gluster04:/mnt/disk2/vmware2
Brick9: gluster01:/mnt/disk3/vmware2
Brick10: gluster03:/mnt/disk3/vmware2
Brick11: gluster02:/mnt/disk3/vmware2
Brick12: gluster04:/mnt/disk3/vmware2
Brick13: gluster01:/mnt/disk4/vmware2
Brick14: gluster03:/mnt/disk4/vmware2
Brick15: gluster02:/mnt/disk4/vmware2
Brick16: gluster04:/mnt/disk4/vmware2
Brick17: gluster01:/mnt/disk5/vmware2
Brick18: gluster03:/mnt/disk5/vmware2
Brick19: gluster02:/mnt/disk5/vmware2
Brick20: gluster04:/mnt/disk5/vmware2
Brick21: gluster01:/mnt/disk6/vmware2
Brick22: gluster03:/mnt/disk6/vmware2
Brick23: gluster02:/mnt/disk6/vmware2
Brick24: gluster04:/mnt/disk6/vmware2
Brick25: gluster01:/mnt/disk7/vmware2
Brick26: gluster03:/mnt/disk7/vmware2
Brick27: gluster02:/mnt/disk7/vmware2
Brick28: gluster04:/mnt/disk7/vmware2
Brick29: gluster01:/mnt/disk8/vmware2
Brick30: gluster03:/mnt/disk8/vmware2
Brick31: gluster02:/mnt/disk8/vmware2
Brick32: gluster04:/mnt/disk8/vmware2
Brick33: gluster01:/mnt/disk9/vmware2
Brick34: gluster03:/mnt/disk9/vmware2
Brick35: gluster02:/mnt/disk9/vmware2
Brick36: gluster04:/mnt/disk9/vmware2
Brick37: gluster01:/mnt/disk10/vmware2
Brick38: gluster03:/mnt/disk10/vmware2
Brick39: gluster02:/mnt/disk10/vmware2
Brick40: gluster04:/mnt/disk10/vmware2
Brick41: gluster01:/mnt/disk11/vmware2
Brick42: gluster03:/mnt/disk11/vmware2
Brick43: gluster02:/mnt/disk11/vmware2
Brick44: gluster04:/mnt/disk11/vmware2
Options Reconfigured:
cluster.server-quorum-type: server
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
features.shard: on
cluster.data-self-heal-algorithm: full
features.cache-invalidation: on
ganesha.enable: on
features.shard-block-size: 256MB
client.event-threads: 2
server.event-threads: 2
cluster.favorite-child-policy: size
storage.build-pgfid: off
network.ping-timeout: 5
cluster.enable-shared-storage: enable
nfs-ganesha: enable
cluster.server-quorum-ratio: 51%


Adding bricks:
gluster volume add-brick vmware2 replica 2 gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2 gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2

Starting fix-layout:
gluster volume rebalance vmware2 fix-layout start

Starting rebalance:
gluster volume rebalance vmware2 start


--
Respectfully
Mahdi A. Mahdi
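P.S. In case anyone wants to reproduce this, the rebalance state and a checksum of a disk image can be checked with something like the following; the datastore path below is only an example, not my real one:

# run from any node in the trusted pool
gluster volume rebalance vmware2 status

# checksum a powered-off VM's disk image from the mount (example path)
md5sum /mnt/vmware2/someVM/someVM-flat.vmdk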
Kevin Lemonnier
2017-Mar-17 20:11 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
Hi,

Yes, that's an old-ish bug. They fixed it recently, but I guess the fix hasn't made its way into a public release yet. You'll have to restore everything from backups, I think; been there :/

On Fri, Mar 17, 2017 at 06:44:14PM +0000, Mahdi Adnan wrote:
> Hi,
>
> I have upgraded to Gluster 3.8.10 today and ran the add-brick procedure on
> a volume containing a few VMs.
> After the rebalance completed, I rebooted the VMs; some of them ran just
> fine, while others crashed.
> [...]

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
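P.S. If you want to confirm exactly which release each node is actually running before deciding anything, the version strings are enough:

# on each gluster node
gluster --version
glusterfsd --version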
Krutika Dhananjay
2017-Mar-18 12:29 UTC
[Gluster-users] Gluster 3.8.10 rebalance VMs corruption
Hi Mahdi,

Could you attach the mount, brick and rebalance logs?

-Krutika

On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:
> Hi,
>
> I have upgraded to Gluster 3.8.10 today and ran the add-brick procedure on
> a volume containing a few VMs.
> After the rebalance completed, I rebooted the VMs; some of them ran just
> fine, while others crashed.
> [...]
>
> --
>
> Respectfully
> Mahdi A. Mahdi
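P.S. On a default install the logs should all be under /var/log/glusterfs on each node: the brick logs in the bricks/ subdirectory and the rebalance log usually named <volname>-rebalance.log. Something like this on each server should gather everything (adjust the path if your logs live elsewhere):

# run on each of gluster01 .. gluster04
tar czf /tmp/$(hostname)-glusterfs-logs.tar.gz /var/log/glusterfs

Since the volume is exported through NFS-Ganesha, the gfapi/client log from the Ganesha nodes would serve as the "mount" log here.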