Vijay Bellur
2016-Nov-14 16:01 UTC
[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks
On Mon, Nov 14, 2016 at 10:38 AM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:
> 2016-11-14 15:54 GMT+01:00 Niels de Vos <ndevos at redhat.com>:
>> Obviously this is unacceptable for versions that have sharding as a
>> functional (not experimental) feature. All supported features are
>> expected to function without major problems (like corruption) for all
>> standard Gluster operations. Add-brick/replace-brick are surely such
>> Gluster operations.
>
> Is sharding an experimental feature even in 3.8?
> Because in the 3.8 announcement, it's declared stable:
> http://blog.gluster.org/2016/06/glusterfs-3-8-released/
> "Sharding is now stable for VM image storage."

Sharding was an experimental feature in 3.7. Based on the feedback that we received in testing, we called it out as stable in 3.8. The add-brick related issue is something that none of us encountered in testing, and we will determine how we can avoid missing such problems in the future.

>> FWIW sharding has several open bugs (like any other component), but it
>> is not immediately clear to me if the problem reported in this email is
>> in Bugzilla yet. These are the bugs that are expected to get fixed in
>> upcoming minor releases:
>> https://bugzilla.redhat.com/buglist.cgi?component=sharding&f1=bug_status&f2=version&o1=notequals&o2=notequals&product=GlusterFS&query_format=advanced&v1=CLOSED&v2=mainline
>
> My issue with sharding was reported in Bugzilla on 2016-07-12.
> Four months for, IMHO, a critical bug.
>
> If you disable sharding on a sharded volume with existing sharded data,
> you corrupt every existing file.

Accessing sharded data after disabling sharding is something that we did not visualize as a valid use case at any point in time. Also, you could access the contents by enabling sharding again. Given these factors I think this particular problem has not been prioritized by us.
As with many other projects, we are at a stage today where the number of users and testers far outweighs the number of developers contributing code. In this state it becomes hard to prioritize problems from a long todo list for developers. If valuable community members like you feel strongly about a bug or feature that needs the attention of developers, please call such issues out on the mailing list. We will be more than happy to help.

Having explained the developer perspective, I do apologize for any inconvenience you might have encountered from this particular bug.

Thanks!
Vijay
Gandalf Corvotempesta
2016-Nov-14 16:29 UTC
[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks
2016-11-14 17:01 GMT+01:00 Vijay Bellur <vbellur at redhat.com>:
> Accessing sharded data after disabling sharding is something that we
> did not visualize as a valid use case at any point in time. Also, you
> could access the contents by enabling sharding again. Given these
> factors I think this particular problem has not been prioritized by
> us.

That's not true. If you have VMs running on a sharded volume and you disable sharding while the VMs are still running, everything crashes and can lead to data loss: the VMs become unable to find their filesystems, qemu corrupts the images, and so on.

If I write to a file that was sharded (for example, a log file) and sharding is then disabled, the application will keep writing to the existing file (the one that was the first shard). If you re-enable sharding, you lose some data.

Example: a 128MB file, with the shard size set to 64MB. You have two chunks: shard1 + shard2.

Now you write to the file: AAAA BBBB CCCC DDDD. AAAA+BBBB are placed on shard1, CCCC+DDDD are placed on shard2.

If you disable sharding and write some extra data, EEEE, then EEEE is appended after BBBB in shard1 (growing it beyond 64MB) and not placed on shard3.

If you re-enable sharding, EEEE is lost, as gluster would expect it in shard3, and I think gluster will read only the first 64MB of shard1. If gluster read the whole file, you'd get something like this: AAAA BBBB EEEE CCCC DDDD.

In a text file this is bad; in a VM image, it means data loss/corruption that is almost impossible to fix.

> As with many other projects, we are at a stage today where the number
> of users and testers far outweighs the number of developers
> contributing code. In this state it becomes hard to prioritize
> problems from a long todo list for developers. If valuable community
> members like you feel strongly about a bug or feature that needs the
> attention of developers, please call such issues out on the mailing
> list.
> We will be more than happy to help.

That's why I've asked for fewer features and more stability. If you have to prioritize, please start with all the bugs that could lead to data corruption or similar.