Gandalf Corvotempesta
2016-Nov-12 10:58 UTC
[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks
On 12 Nov 2016 10:21, "Kevin Lemonnier" <lemonnierk at ulrar.net> wrote:
> We've had a lot of problems in the past, but at least for us 3.7.12 (and 3.7.15)
> seems to be working pretty well as long as you don't add bricks. We started doing
> multiple little clusters and abandoned the idea of one big cluster, had no
> issues since :)

Well, adding bricks could be useful... :)

Having to create multiple clusters is not a solution and is much more expensive.
And if you corrupt data from a single cluster you still have issues.

I think it would be better to add fewer features and focus more on stability.
In software-defined storage, stability and consistency are the most important things.

I'm also subscribed to the MooseFS and LizardFS mailing lists and I don't recall a single data corruption/data loss event.

In Gluster, after some days of testing, I've found a huge data corruption issue that is still unfixed on Bugzilla: if you change the shard size on a populated cluster, you break all existing data.
Try to do this on a cluster with working VMs and see what happens... a single CLI command breaks everything, and it is still unfixed.
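For readers following along, the setting being described here is Gluster's shard block size volume option. A minimal sketch of the commands involved, assuming a volume named "myvol" and purely illustrative sizes (the thread does not state the exact values used):

    # Enabling sharding and choosing a shard size on a fresh volume:
    gluster volume set myvol features.shard on
    gluster volume set myvol features.shard-block-size 64MB

    # The "single CLI command" in question: re-setting the shard block size
    # after the volume is already populated with sharded VM images, which is
    # what this report describes as corrupting the existing data.
    gluster volume set myvol features.shard-block-size 512MB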
Kevin Lemonnier
2016-Nov-12 11:52 UTC
[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks
> Having to create multiple clusters is not a solution and is much more
> expensive.
> And if you corrupt data from a single cluster you still have issues.

Sure, but thinking about it later we realised that it might be for the better. I believe when sharding is enabled the shards are dispersed across all the replica sets, so losing a replica set will kill all your VMs. Imagine a 16x3 volume, for example: losing 2 bricks could bring the whole thing down if they happen to be in the same replica set. (I might be wrong about the way Gluster disperses shards; it's my understanding only, I never had the chance to test it.)
With multiple small clusters we have the same disk space in the end but not that problem. It's a bit more annoying to manage, but for now that's all right.

> I'm also subscribed to the MooseFS and LizardFS mailing lists and I don't
> recall a single data corruption/data loss event.

Never used those; might it just be because there are fewer users? I really have no idea, maybe you are right.

> If you change the shard size on a populated cluster, you break all
> existing data.

Not really shocked there. Guess the CLI should warn you when you try re-setting the option though, that would be nice.

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
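Kevin's reading of how shards spread over replica sets matches how distribute-replicate volumes are laid out: bricks form replica sets in the order they are passed to volume create, and a sharded file is stored as many separate shard files that the distribute layer hashes across those sets. A scaled-down sketch of the layout he describes, under that understanding, with placeholder hostnames and brick paths (2x3 instead of 16x3):

    # Six bricks with replica 3 give a 2x3 volume: (h1,h2,h3) and (h4,h5,h6)
    # are the two replica sets. With sharding on, the shards of one VM image
    # are spread across both sets, so losing either whole set (e.g. 2 of its
    # 3 bricks going down) affects every sharded file, not just some of them.
    gluster volume create bigvol replica 3 \
        h1:/data/brick1 h2:/data/brick1 h3:/data/brick1 \
        h4:/data/brick1 h5:/data/brick1 h6:/data/brick1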
Pranith Kumar Karampuri
2016-Nov-14 10:50 UTC
[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks
On Sat, Nov 12, 2016 at 4:28 PM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:

> On 12 Nov 2016 10:21, "Kevin Lemonnier" <lemonnierk at ulrar.net> wrote:
> > We've had a lot of problems in the past, but at least for us 3.7.12 (and 3.7.15)
> > seems to be working pretty well as long as you don't add bricks. We started doing
> > multiple little clusters and abandoned the idea of one big cluster, had no
> > issues since :)
>
> Well, adding bricks could be useful... :)
>
> Having to create multiple clusters is not a solution and is much more expensive.
> And if you corrupt data from a single cluster you still have issues.
>
> I think it would be better to add fewer features and focus more on stability.

First of all, thanks to all the folks who contributed to this thread. We value your feedback.

In gluster-users and the oVirt community we saw people trying Gluster and complaining about heal times and split-brains. So we had to fix bugs in quorum for 3-way replication; then we started working on features like sharding for better heal times and arbiter volumes for cost benefits. To make Gluster stable for VM images we had to add all these new features and then fix all the bugs Lindsay/Kevin reported. We just fixed a corruption issue that can happen with replace-brick, which will be available in 3.9.0 and 3.8.6. The only two other known issues that can lead to corruption are add-brick and the bug you filed, Gandalf. Just five minutes ago Krutika saw something that could possibly lead to the corruption in the add-brick bug. Is that really the root cause? We are not sure yet; we need more time. Without Lindsay's/Kevin's/David Gossage's support this workload would have been in much worse condition. These bugs are not easy to re-create and thus not easy to fix. At least that has been Krutika's experience.

My take-away from this mail thread is that it is important to educate users about why we are adding new features. People are coming to the conclusion that only bug fixing corresponds to stabilization, and that features do not. That is a wrong perception. Without the work that went into adding all those new features above, most probably you wouldn't have given Gluster another chance, because it used to be unusable for VM workloads before these features.

One more take-away is to get the documentation right. Lack of documentation led Alex to try the worst possible combination for storing VMs on Gluster, so we as a community failed in some way there as well. Krutika will be sending out VM use-case related documentation after the 28th of this month.

If you have any other feedback, do let us know.

> In software-defined storage, stability and consistency are the most
> important things.
>
> I'm also subscribed to the MooseFS and LizardFS mailing lists and I don't
> recall a single data corruption/data loss event.
>
> In Gluster, after some days of testing, I've found a huge data corruption
> issue that is still unfixed on Bugzilla.
> If you change the shard size on a populated cluster, you break all
> existing data.
> Try to do this on a cluster with working VMs and see what happens...
> a single CLI command breaks everything, and it is still unfixed.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

--
Pranith
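The arbiter volumes Pranith mentions are the replica-3 variant in which the third brick of each set stores only file names and metadata, giving replica-3 quorum behaviour at close to replica-2 disk cost. A minimal sketch of creating one, with placeholder hostnames and paths not taken from the thread:

    # replica 3 arbiter 1: h1 and h2 hold full copies of the data,
    # h3's brick holds only metadata and acts as the tie-breaker.
    gluster volume create vmvol replica 3 arbiter 1 \
        h1:/data/brick1 h2:/data/brick1 h3:/data/arbiter1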
Krutika Dhananjay
2016-Nov-14 10:59 UTC
[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks
Which data corruption issue is this? Could you point me to the bug report on Bugzilla?

-Krutika

On Sat, Nov 12, 2016 at 4:28 PM, Gandalf Corvotempesta <gandalf.corvotempesta at gmail.com> wrote:

> On 12 Nov 2016 10:21, "Kevin Lemonnier" <lemonnierk at ulrar.net> wrote:
> > We've had a lot of problems in the past, but at least for us 3.7.12 (and 3.7.15)
> > seems to be working pretty well as long as you don't add bricks. We started doing
> > multiple little clusters and abandoned the idea of one big cluster, had no
> > issues since :)
>
> Well, adding bricks could be useful... :)
>
> Having to create multiple clusters is not a solution and is much more expensive.
> And if you corrupt data from a single cluster you still have issues.
>
> I think it would be better to add fewer features and focus more on stability.
> In software-defined storage, stability and consistency are the most important things.
>
> I'm also subscribed to the MooseFS and LizardFS mailing lists and I don't
> recall a single data corruption/data loss event.
>
> In Gluster, after some days of testing, I've found a huge data corruption
> issue that is still unfixed on Bugzilla.
> If you change the shard size on a populated cluster, you break all
> existing data.
> Try to do this on a cluster with working VMs and see what happens...
> a single CLI command breaks everything, and it is still unfixed.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users