Hi,

We have a volume of 4 servers with 8x2 bricks (Distributed-Replicate) hosting VMs for ESXi. I tried expanding the volume with 8 more bricks, and after rebalancing the volume, the VMs got corrupted.

The Gluster version is 3.8.9 and the volume is using the default parameters of the "virt" group plus sharding.

I created a new volume without sharding and got the same issue after the rebalance.

I checked the reported bugs and the mailing list, and I noticed it's a known bug in Gluster. Does it affect all Gluster versions? Is there any workaround or a volume setup that is not affected by this issue?

Thank you.

-- 
Respectfully
Mahdi A. Mahdi
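For reference, the expansion sequence that preceded the corruption looked roughly like this (a sketch; the volume name "vmstore" and the brick paths are made up, the actual layout was 8x2):

    # volume has the "virt" profile applied plus sharding enabled
    gluster volume set vmstore group virt
    gluster volume set vmstore features.shard on

    # expansion: add 8 more bricks (4 replica pairs, listed pair by pair),
    # then rebalance so existing data spreads onto the new bricks
    gluster volume add-brick vmstore replica 2 \
        server1:/bricks/b9  server2:/bricks/b9 \
        server3:/bricks/b9  server4:/bricks/b9 \
        server1:/bricks/b10 server2:/bricks/b10 \
        server3:/bricks/b10 server4:/bricks/b10
    gluster volume rebalance vmstore start
    gluster volume rebalance vmstore status   # corruption showed up after this completed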
> We have a volume of 4 servers with 8x2 bricks (Distributed-Replicate) hosting VMs for ESXi. I tried expanding the volume with 8 more bricks, and after rebalancing the volume, the VMs got corrupted.
> [...]
> Does it affect all Gluster versions? Is there any workaround or a volume setup that is not affected by this issue?

Sure sounds like what corrupted everything for me a few months ago :). I had to spend the whole night re-creating the VMs from backups, and explaining the data loss and downtime to the clients wasn't easy.

Unfortunately, I believe they never managed to reproduce the issue, so I don't think it was ever fixed, no. We are using 3.7.13, so downgrading won't help you, and I don't know of any workaround.

We decided to just not expand volumes: when one is full, we create a new one instead of adding bricks to the existing one. Not ideal, but not a big deal, at least yet. Since VMs are easy enough to live-migrate from one volume to another, it seemed like the easiest solution.

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
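Our workaround looks roughly like this (a sketch; "vmstore2" and the brick paths are hypothetical names):

    # instead of add-brick + rebalance, stand up a fresh volume
    gluster volume create vmstore2 replica 2 \
        server1:/bricks/v2b1 server2:/bricks/v2b1 \
        server3:/bricks/v2b2 server4:/bricks/v2b2
    gluster volume set vmstore2 group virt
    gluster volume start vmstore2

    # then storage-migrate the VMs onto the new volume from the
    # hypervisor side (e.g. Storage vMotion on ESXi), so no rebalance
    # ever touches the running disk images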
Hi,

We fixed this (thanks to Satheesaran for recreating the issue, and to Raghavendra G and Pranith for the RCA) as recently as last week. The bug was in the DHT-shard interaction.

The patches are https://review.gluster.org/#/c/16709/ followed by https://review.gluster.org/#/c/14419, to be applied in that order.

Do you mind giving these a try before the fix makes it into the next .x releases of 3.8, 3.9 and 3.10? I could make a src tarball with these patches applied if you like.

-Krutika

On Sat, Feb 25, 2017 at 8:56 PM, Mahdi Adnan <mahdi.adnan at outlook.com> wrote:

> Hi,
>
> We have a volume of 4 servers with 8x2 bricks (Distributed-Replicate)
> hosting VMs for ESXi. I tried expanding the volume with 8 more bricks,
> and after rebalancing the volume, the VMs got corrupted.
>
> The Gluster version is 3.8.9 and the volume is using the default
> parameters of the "virt" group plus sharding.
>
> I created a new volume without sharding and got the same issue after
> the rebalance.
>
> I checked the reported bugs and the mailing list, and I noticed it's a
> known bug in Gluster. Does it affect all Gluster versions? Is there any
> workaround or a volume setup that is not affected by this issue?
>
> Thank you.
>
> --
> Respectfully
> Mahdi A. Mahdi
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
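If you'd rather build from source yourself, something along these lines should work (a sketch; the refs/changes paths follow Gerrit's usual layout, and the trailing patchset number "N" is a placeholder you'd take from each review page):

    # start from the 3.8 source tree
    git clone https://github.com/gluster/glusterfs.git
    cd glusterfs
    git checkout release-3.8

    # fetch and apply the two reviews, in this order
    git fetch https://review.gluster.org/glusterfs refs/changes/09/16709/N && git cherry-pick FETCH_HEAD
    git fetch https://review.gluster.org/glusterfs refs/changes/19/14419/N && git cherry-pick FETCH_HEAD

    # then build as usual
    ./autogen.sh && ./configure && make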