Pranith Kumar Karampuri
2016-Nov-14 11:20 UTC
[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks
On Mon, Nov 14, 2016 at 4:38 PM, Gandalf Corvotempesta
<gandalf.corvotempesta at gmail.com> wrote:

> 2016-11-14 11:50 GMT+01:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> > To make gluster stable for VM images we had to add all these new features
> > and then fix all the bugs Lindsay/Kevin reported. We just fixed a
> > corruption issue that can happen with replace-brick which will be
> > available in 3.9.0 and 3.8.6. The only 2 other known issues that can lead
> > to corruptions are add-brick and the bug you filed Gandalf. Krutika just
> > 5 minutes back saw something that could possibly lead to the corruption
> > for the add-brick bug. Is that really the root cause? We are not sure
> > yet, we need more time. Without Lindsay/Kevin/David Gossage's support
> > this workload would have been in much worse condition. These bugs are not
> > easy to re-create thus not easy to fix. At least that has been Krutika's
> > experience.
>
> Ok, but these changes should be placed in a "test" version and not
> marked as stable.
> I don't see any development release, only stable releases here.
> Do you want all features? Try the "beta/rc/unstable/alpha/dev" version.
> Do you want the stable version without known bugs but slow on VM
> workloads? Use the "-stable" version.
>
> If you release as stable, users tend to upgrade their cluster and use
> the newer feature (that you are marking as stable).
> What if I upgrade a production cluster to a stable version and try an
> add-brick that leads to data corruption?
> I have to restore terabytes worth of data? Gluster is made for
> scale-out, what if my cluster was made of 500TB of VMs?
> Try to restore 500TB from a backup...
>
> This is unacceptable. add-brick/replace-brick should be common "daily"
> operations. You should heavily check these for regressions or bugs.

This is a very good point. Adding other maintainers.

> > One more takeaway is to get the documentation right. Lack of
> > documentation led Alex to try the worst possible combo for storing VMs
> > on gluster. So we as community failed in some way there as well.
> >
> > Krutika will be sending out VM use case related documentation after
> > 28th of this month. If you have any other feedback, do let us know.
>
> Yes, lack of updated docs or a reference architecture is a big issue.

-- 
Pranith
Niels de Vos
2016-Nov-14 14:54 UTC
[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks
On Mon, Nov 14, 2016 at 04:50:44PM +0530, Pranith Kumar Karampuri wrote:
> On Mon, Nov 14, 2016 at 4:38 PM, Gandalf Corvotempesta
> <gandalf.corvotempesta at gmail.com> wrote:
>
> > 2016-11-14 11:50 GMT+01:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> > > To make gluster stable for VM images we had to add all these new
> > > features and then fix all the bugs Lindsay/Kevin reported. We just
> > > fixed a corruption issue that can happen with replace-brick which will
> > > be available in 3.9.0 and 3.8.6. The only 2 other known issues that can
> > > lead to corruptions are add-brick and the bug you filed Gandalf.
> > > Krutika just 5 minutes back saw something that could possibly lead to
> > > the corruption for the add-brick bug. Is that really the root cause?
> > > We are not sure yet, we need more time. Without Lindsay/Kevin/David
> > > Gossage's support this workload would have been in much worse
> > > condition. These bugs are not easy to re-create thus not easy to fix.
> > > At least that has been Krutika's experience.
> >
> > Ok, but these changes should be placed in a "test" version and not
> > marked as stable.
> > I don't see any development release, only stable releases here.
> > Do you want all features? Try the "beta/rc/unstable/alpha/dev" version.
> > Do you want the stable version without known bugs but slow on VM
> > workloads? Use the "-stable" version.
> >
> > If you release as stable, users tend to upgrade their cluster and use
> > the newer feature (that you are marking as stable).
> > What if I upgrade a production cluster to a stable version and try an
> > add-brick that leads to data corruption?
> > I have to restore terabytes worth of data? Gluster is made for
> > scale-out, what if my cluster was made of 500TB of VMs?
> > Try to restore 500TB from a backup...
> >
> > This is unacceptable. add-brick/replace-brick should be common "daily"
> > operations. You should heavily check these for regressions or bugs.
>
> This is a very good point. Adding other maintainers.

Obviously this is unacceptable for versions that have sharding as a
functional (not experimental) feature. All supported features are
expected to function without major problems (like corruption) for all
standard Gluster operations. Add-brick/replace-brick are surely such
Gluster operations.

Of course it is possible that this does not always happen, and our tests
did not catch the problem. In that case, we really need to have a bug
report with all the details, and preferably a script that can be used to
reproduce and detect the failure.

FWIW sharding has several open bugs (like any other component), but it
is not immediately clear to me if the problem reported in this email is
in Bugzilla yet. These are the bugs that are expected to get fixed in
upcoming minor releases:

https://bugzilla.redhat.com/buglist.cgi?component=sharding&f1=bug_status&f2=version&o1=notequals&o2=notequals&product=GlusterFS&query_format=advanced&v1=CLOSED&v2=mainline

HTH,
Niels
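
The thread does not include the kind of reproduce-and-detect script
requested above. As a rough sketch of the detection half only, one could
checksum every image on the client mount before and after the brick
operation and compare the results. The helper names below are
hypothetical and the code is plain Python, not part of any Gluster
tooling. It also assumes the images are not being written to between the
two snapshots, since otherwise a differing checksum proves nothing.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each regular file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before, after):
    """List paths whose checksum changed or that are missing afterwards."""
    return sorted(
        name for name, checksum in before.items()
        if after.get(name) != checksum
    )
```

A run would call snapshot() on the mount point with the VMs shut down,
perform the add-brick (and any rebalance) by hand, call snapshot() again
and pass both results to changed_files(). Any path it returns is a
candidate for the bug report.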