thr3ads.net - Gluster users - [Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks [Nov 2016]


Niels de Vos

2016-Nov-14 14:54 UTC

[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks

On Mon, Nov 14, 2016 at 04:50:44PM +0530, Pranith Kumar Karampuri wrote:
> On Mon, Nov 14, 2016 at 4:38 PM, Gandalf Corvotempesta
> <gandalf.corvotempesta at gmail.com> wrote:
>
> > 2016-11-14 11:50 GMT+01:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> > > To make gluster stable for VM images we had to add all these new features
> > > and then fix all the bugs Lindsay/Kevin reported. We just fixed a corruption
> > > issue that can happen with replace-brick which will be available in 3.9.0
> > > and 3.8.6. The only 2 other known issues that can lead to corruptions are
> > > add-brick and the bug you filed Gandalf. Krutika just 5 minutes back saw
> > > something that could possibly lead to the corruption for the add-brick bug.
> > > Is that really the Root cause? We are not sure yet, we need more time.
> > > Without Lindsay/Kevin/David Gossage's support this workload would have been
> > > in much worse condition. These bugs are not easy to re-create thus not easy
> > > to fix. At least that has been Krutika's experience.
> >
> > Ok, but these changes should be placed in a "test" version and not
> > marked as stable.
> > I don't see any development release, only stable releases here.
> > Do you want all features? Try the "beta/rc/unstable/alpha/dev" version.
> > Do you want the stable version without known bugs but slow on VM
> > workloads? Use the "-stable" version.
> >
> > If you release as stable, users tend to upgrade their cluster and use
> > the newer feature (that you are marking as stable).
> > What if I upgrade a production cluster to a stable version and try an
> > add-brick that leads to data corruption?
> > I have to restore terabytes worth of data? Gluster is made for
> > scale-out, what if my cluster was made with 500TB of VMs?
> > Try to restore 500TB from a backup....................
> >
> > This is unacceptable. add-brick/replace-brick should be common "daily"
> > operations. You should heavily check these for regressions or bugs.
> >
>
> This is a very good point. Adding other maintainers.
Obviously this is unacceptable for versions that have sharding as a
functional (not experimental) feature. All supported features are
expected to function without major problems (like corruption) for all
standard Gluster operations. Add-brick/replace-brick are surely such
Gluster operations.

Of course it is possible that this does not always happen, and our tests
did not catch the problem. In that case, we really need to have a bug
report with all the details, and preferably a script that can be used to
reproduce and detect the failure.
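
As a rough illustration of the "detect" half of such a script, here is a minimal sketch. It assumes the volume is mounted somewhere on a client and that the image files are not being written to while the brick operation runs (otherwise the checksums change legitimately). The function names `snapshot` and `changed_files` are made up for this example and are not part of any Gluster tooling.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Return {relative path: SHA-256 hex digest} for every regular file under root."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large VM images need not fit in memory.
                for chunk in iter(lambda: fh.read(1024 * 1024), b""):
                    h.update(chunk)
            digests[str(path.relative_to(root))] = h.hexdigest()
    return digests


def changed_files(before, after):
    """Return sorted paths whose digest differs or that exist in only one snapshot."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

The idea would be to call `snapshot()` on the mount point before the add-brick/rebalance, call it again afterwards, and treat any non-empty result from `changed_files()` as a sign of corruption. Reproducing the problem under live VM I/O would need a different check, for example checksums of known data taken inside the guest.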

FWIW sharding has several open bugs (like any other component), but it
is not immediately clear to me if the problem reported in this email is
in Bugzilla yet. These are the bugs that are expected to get fixed in
upcoming minor releases:
 
https://bugzilla.redhat.com/buglist.cgi?component=sharding&f1=bug_status&f2=version&o1=notequals&o2=notequals&product=GlusterFS&query_format=advanced&v1=CLOSED&v2=mainline
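
For anyone wondering what that search actually selects, the query string can be decoded with a few lines of Python. This is only a sketch for reading the URL above; `describe_query` is a made-up helper name, and it relies on Bugzilla's advanced-search convention of numbering custom filters as `fN` (field), `oN` (operator) and `vN` (value).

```python
from urllib.parse import parse_qs, urlsplit

URL = ("https://bugzilla.redhat.com/buglist.cgi?component=sharding"
       "&f1=bug_status&f2=version&o1=notequals&o2=notequals"
       "&product=GlusterFS&query_format=advanced&v1=CLOSED&v2=mainline")


def describe_query(url):
    """Return the search as a list of (field, operator, value) tuples."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    filters = [("product", "equals", params["product"]),
               ("component", "equals", params["component"])]
    n = 1
    # Walk the numbered custom filters f1/o1/v1, f2/o2/v2, ...
    while f"f{n}" in params:
        filters.append((params[f"f{n}"], params[f"o{n}"], params[f"v{n}"]))
        n += 1
    return filters


for field, op, value in describe_query(URL):
    print(field, op, value)
```

In other words: bugs in the GlusterFS product, sharding component, that are not CLOSED and are not filed against the mainline version.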

HTH,
Niels

Gandalf Corvotempesta

2016-Nov-14 15:38 UTC

[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks

2016-11-14 15:54 GMT+01:00 Niels de Vos <ndevos at redhat.com>:
> Obviously this is unacceptable for versions that have sharding as a
> functional (not experimental) feature. All supported features are
> expected to function without major problems (like corruption) for all
> standard Gluster operations. Add-brick/replace-brick are surely such
> Gluster operations.

Is sharding an experimental feature even in 3.8?
Because in the 3.8 announcement, it's declared stable:
http://blog.gluster.org/2016/06/glusterfs-3-8-released/
"Sharding is now stable for VM image storage."

> FWIW sharding has several open bugs (like any other component), but it
> is not immediately clear to me if the problem reported in this email is
> in Bugzilla yet. These are the bugs that are expected to get fixed in
> upcoming minor releases:
>
> https://bugzilla.redhat.com/buglist.cgi?component=sharding&f1=bug_status&f2=version&o1=notequals&o2=notequals&product=GlusterFS&query_format=advanced&v1=CLOSED&v2=mainline

My issue with sharding was reported in Bugzilla on 2016-07-12.
That is 4 months for what is, IMHO, a critical bug.

If you disable sharding on a sharded volume with existing sharded data,
you corrupt every existing file.

Krutika Dhananjay

2016-Nov-14 15:39 UTC

[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks

On Mon, Nov 14, 2016 at 8:24 PM, Niels de Vos <ndevos at redhat.com> wrote:
> On Mon, Nov 14, 2016 at 04:50:44PM +0530, Pranith Kumar Karampuri wrote:
> > On Mon, Nov 14, 2016 at 4:38 PM, Gandalf Corvotempesta
> > <gandalf.corvotempesta at gmail.com> wrote:
> >
> > > 2016-11-14 11:50 GMT+01:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> > > > To make gluster stable for VM images we had to add all these new features
> > > > and then fix all the bugs Lindsay/Kevin reported. We just fixed a corruption
> > > > issue that can happen with replace-brick which will be available in 3.9.0
> > > > and 3.8.6. The only 2 other known issues that can lead to corruptions are
> > > > add-brick and the bug you filed Gandalf. Krutika just 5 minutes back saw
> > > > something that could possibly lead to the corruption for the add-brick bug.
> > > > Is that really the Root cause? We are not sure yet, we need more time.
> > > > Without Lindsay/Kevin/David Gossage's support this workload would have been
> > > > in much worse condition. These bugs are not easy to re-create thus not easy
> > > > to fix. At least that has been Krutika's experience.
> > >
> > > Ok, but these changes should be placed in a "test" version and not
> > > marked as stable.
> > > I don't see any development release, only stable releases here.
> > > Do you want all features? Try the "beta/rc/unstable/alpha/dev" version.
> > > Do you want the stable version without known bugs but slow on VM
> > > workloads? Use the "-stable" version.
> > >
> > > If you release as stable, users tend to upgrade their cluster and use
> > > the newer feature (that you are marking as stable).
> > > What if I upgrade a production cluster to a stable version and try an
> > > add-brick that leads to data corruption?
> > > I have to restore terabytes worth of data? Gluster is made for
> > > scale-out, what if my cluster was made with 500TB of VMs?
> > > Try to restore 500TB from a backup....................
> > >
> > > This is unacceptable. add-brick/replace-brick should be common "daily"
> > > operations. You should heavily check these for regressions or bugs.
> > >
> >
> > This is a very good point. Adding other maintainers.
>
I think Pranith's intention here was to bring the point about development
releases vs. stable releases to the other maintainers' attention, although
his inline comment may have been a bit out of place. (I was part of the
discussion that took place in the office today before this reply of his,
hence taking the liberty to clarify.)

-Krutika

> Obviously this is unacceptable for versions that have sharding as a
> functional (not experimental) feature. All supported features are
> expected to function without major problems (like corruption) for all
> standard Gluster operations. Add-brick/replace-brick are surely such
> Gluster operations.
>
> Of course it is possible that this does not always happen, and our tests
> did not catch the problem. In that case, we really need to have a bug
> report with all the details, and preferably a script that can be used to
> reproduce and detect the failure.
>
> FWIW sharding has several open bugs (like any other component), but it
> is not immediately clear to me if the problem reported in this email is
> in Bugzilla yet. These are the bugs that are expected to get fixed in
> upcoming minor releases:
> https://bugzilla.redhat.com/buglist.cgi?component=sharding&f1=bug_status&f2=version&o1=notequals&o2=notequals&product=GlusterFS&query_format=advanced&v1=CLOSED&v2=mainline
>
> HTH,
> Niels
>

David Gossage

2016-Nov-14 16:01 UTC

[Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks

On Mon, Nov 14, 2016 at 8:54 AM, Niels de Vos <ndevos at redhat.com> wrote:
> On Mon, Nov 14, 2016 at 04:50:44PM +0530, Pranith Kumar Karampuri wrote:
> > On Mon, Nov 14, 2016 at 4:38 PM, Gandalf Corvotempesta
> > <gandalf.corvotempesta at gmail.com> wrote:
> >
> > > 2016-11-14 11:50 GMT+01:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> > > > To make gluster stable for VM images we had to add all these new features
> > > > and then fix all the bugs Lindsay/Kevin reported. We just fixed a corruption
> > > > issue that can happen with replace-brick which will be available in 3.9.0
> > > > and 3.8.6. The only 2 other known issues that can lead to corruptions are
> > > > add-brick and the bug you filed Gandalf. Krutika just 5 minutes back saw
> > > > something that could possibly lead to the corruption for the add-brick bug.
> > > > Is that really the Root cause? We are not sure yet, we need more time.
> > > > Without Lindsay/Kevin/David Gossage's support this workload would have been
> > > > in much worse condition. These bugs are not easy to re-create thus not easy
> > > > to fix. At least that has been Krutika's experience.
> > >
> > > Ok, but these changes should be placed in a "test" version and not
> > > marked as stable.
> > > I don't see any development release, only stable releases here.
> > > Do you want all features? Try the "beta/rc/unstable/alpha/dev" version.
> > > Do you want the stable version without known bugs but slow on VM
> > > workloads? Use the "-stable" version.
> > >
> > > If you release as stable, users tend to upgrade their cluster and use
> > > the newer feature (that you are marking as stable).
> > > What if I upgrade a production cluster to a stable version and try an
> > > add-brick that leads to data corruption?
> > > I have to restore terabytes worth of data? Gluster is made for
> > > scale-out, what if my cluster was made with 500TB of VMs?
> > > Try to restore 500TB from a backup....................
> > >
> > > This is unacceptable. add-brick/replace-brick should be common "daily"
> > > operations. You should heavily check these for regressions or bugs.
> > >
> >
> > This is a very good point. Adding other maintainers.
>
> Obviously this is unacceptable for versions that have sharding as a
> functional (not experimental) feature. All supported features are
> expected to function without major problems (like corruption) for all
> standard Gluster operations. Add-brick/replace-brick are surely such
> Gluster operations.
>
> Of course it is possible that this does not always happen, and our tests
> did not catch the problem. In that case, we really need to have a bug
> report with all the details, and preferably a script that can be used to
> reproduce and detect the failure.
>
I believe this bug relates to the particular issue raised in this email
chain:

https://bugzilla.redhat.com/show_bug.cgi?id=1387878

Kevin found the bug, and Lindsay filed the report after she was able to
recreate it.

>
> FWIW sharding has several open bugs (like any other component), but it
> is not immediately clear to me if the problem reported in this email is
> in Bugzilla yet. These are the bugs that are expected to get fixed in
> upcoming minor releases:
> https://bugzilla.redhat.com/buglist.cgi?component=sharding&f1=bug_status&f2=version&o1=notequals&o2=notequals&product=GlusterFS&query_format=advanced&v1=CLOSED&v2=mainline
>
> HTH,
> Niels
>
