David Gossage
2016-Oct-14 15:37 UTC
[Gluster-users] [URGENT] Add-bricks to a volume corrupted the files
Sorry to resurrect an old email, but did any resolution occur for this, or was a cause found? I just see this as a potential task I may need to run through some day, and if there are pitfalls to watch for it would be good to know.

*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284

On Tue, Sep 6, 2016 at 5:38 AM, Kevin Lemonnier <lemonnierk at ulrar.net> wrote:
> Hi,
>
> Here is the info:
>
> Volume Name: VMs
> Type: Replicate
> Volume ID: c5272382-d0c8-4aa4-aced-dd25a064e45c
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: ips4adm.name:/mnt/storage/VMs
> Brick2: ips5adm.name:/mnt/storage/VMs
> Brick3: ips6adm.name:/mnt/storage/VMs
> Options Reconfigured:
> performance.readdir-ahead: on
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> features.shard: on
> features.shard-block-size: 64MB
> cluster.data-self-heal-algorithm: full
> network.ping-timeout: 15
>
> For the logs, I'm sending those over to you in private.
>
> On Tue, Sep 06, 2016 at 09:48:07AM +0530, Krutika Dhananjay wrote:
> > Could you please attach the glusterfs client and brick logs?
> > Also provide the output of `gluster volume info`.
> > -Krutika
> >
> > On Tue, Sep 6, 2016 at 4:29 AM, Kevin Lemonnier <lemonnierk at ulrar.net> wrote:
> >
> > > - What was the original (and current) geometry? (status and info)
> >
> > It was a 1x3 that I was trying to bump to 2x3.
> >
> > > - What parameters did you use when adding the bricks?
> >
> > Just a simple add-brick node1:/path node2:/path node3:/path,
> > then a fix-layout when everything started going wrong.
> >
> > I was able to salvage some VMs by stopping them then starting them again,
> > but most won't start for various reasons (disk corrupted, grub not found ...).
> > For those we are deleting the disks then importing them from backups;
> > that's a huge loss, but everything has been down for so long, no choice ..
> >
> > > On 6/09/2016 8:00 AM, Kevin Lemonnier wrote:
> > >
> > > I tried a fix-layout, and since that didn't work I removed the bricks
> > > (start, then commit when it showed completed). Not better: the volume
> > > is now running on the 3 original bricks (replica 3) but the VMs are
> > > still corrupted. I have 880 MB of shards on the bricks I removed for
> > > some reason; those shards do exist (and are bigger) on the "live"
> > > volume. I don't understand why, now that I have removed the new
> > > bricks, everything isn't working like before ..
> > >
> > > On Mon, Sep 05, 2016 at 11:06:16PM +0200, Kevin Lemonnier wrote:
> > >
> > > Hi,
> > >
> > > I just added 3 bricks to a volume and all the VMs are doing I/O
> > > errors now. I rebooted a VM to see, and it can't start again; am I
> > > missing something? Is the rebalance required to make everything run?
> > >
> > > That's urgent, thanks.
> > >
> > > --
> > > Kevin Lemonnier
> > > PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> >
> > --
> > Lindsay Mathieson
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
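For reference, the 1x3-to-2x3 expansion Kevin describes corresponds to commands along these lines. The new hostnames are illustrative (the thread does not name the added nodes), and this is the sequence that went wrong for him, shown for illustration, not as a recommendation:

```shell
# Add one more replica set of three bricks to the replica-3 volume "VMs",
# taking it from 1x3 to 2x3 (new hostnames are hypothetical).
gluster volume add-brick VMs replica 3 \
    ips7adm.name:/mnt/storage/VMs \
    ips8adm.name:/mnt/storage/VMs \
    ips9adm.name:/mnt/storage/VMs

# Recompute the directory layout so new files can be placed on the new
# bricks; this is the "fix-layout" step Kevin ran afterwards.
gluster volume rebalance VMs fix-layout start
gluster volume rebalance VMs status
```

These commands require a live Gluster cluster and are shown only to make the thread's sequence of events concrete.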
Krutika Dhananjay
2016-Oct-17 05:02 UTC
[Gluster-users] [URGENT] Add-bricks to a volume corrupted the files
Hi,

No. I did run add-brick on a volume with the same configuration as Kevin's, while IO was running, except that I wasn't running a VM workload. I compared the file checksums against the original source files from which they were copied, and they matched.

@Kevin, I see that network.ping-timeout on your setup is 15 seconds, and that's too low. Could you reconfigure that to 30 seconds?

-Krutika

On Fri, Oct 14, 2016 at 9:07 PM, David Gossage <dgossage at carouselchecks.com> wrote:
> Sorry to resurrect an old email but did any resolution occur for this or a
> cause found? I just see this as a potential task I may need to also run
> through some day and if there are pitfalls to watch for would be good to
> know.
>
> *David Gossage*
> *Carousel Checks Inc. | System Administrator*
> *Office* 708.613.2284
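Krutika's suggestion amounts to a one-line reconfiguration of the volume from the thread (the `volume get` verification assumes a GlusterFS release that supports it, 3.7 or later):

```shell
# Raise the ping timeout on the "VMs" volume back to the 30-second default.
gluster volume set VMs network.ping-timeout 30

# Confirm the new value took effect.
gluster volume get VMs network.ping-timeout
```

These commands require a running Gluster cluster; they are included only to make the suggested change concrete.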
Kevin Lemonnier
2016-Oct-17 06:43 UTC
[Gluster-users] [URGENT] Add-bricks to a volume corrupted the files
On Fri, Oct 14, 2016 at 10:37:03AM -0500, David Gossage wrote:
> Sorry to resurrect an old email but did any resolution occur for this or a
> cause found? I just see this as a potential task I may need to also run
> through some day and if there are pitfalls to watch for would be good to
> know.

Unfortunately no. I ended up restoring almost all the VMs from backups, then we created two small clusters instead of a big one, and I guess we'll keep creating 3-brick clusters when needed for now.

Maybe just make sure you are running > 3.7.12, and if possible test it on a non-production environment first. Still, hard to replicate the same load for tests ..

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
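One part of such a pre-production test is easy to script: the checksum comparison Krutika describes earlier in the thread, i.e. hashing the original files and comparing them against the copies on the volume after an add-brick. A minimal sketch (the demo uses throwaway directories; on a real setup the second directory would be the volume's FUSE mount):

```shell
#!/bin/sh
# Compare SHA-256 checksums of files in a source directory against the
# same-named files in a destination directory, reporting OK/FAIL per file
# and exiting nonzero if any file differs.
set -e

verify_checksums() {
    src=$1; dst=$2; status=0
    for f in "$src"/*; do
        name=$(basename "$f")
        if [ "$(sha256sum "$f" | cut -d' ' -f1)" = \
             "$(sha256sum "$dst/$name" | cut -d' ' -f1)" ]; then
            echo "OK   $name"
        else
            echo "FAIL $name"
            status=1
        fi
    done
    return $status
}

# Demo with temporary directories standing in for the originals and the
# Gluster mount (both paths are placeholders for a real run).
src=$(mktemp -d); dst=$(mktemp -d)
echo "vm disk contents" > "$src/vm1.img"
cp "$src/vm1.img" "$dst/vm1.img"
verify_checksums "$src" "$dst"
rm -rf "$src" "$dst"
```

As Kevin notes, matching checksums under a synthetic copy workload does not prove safety under a live VM workload, but a mismatch is an immediate red flag.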
Gandalf Corvotempesta
2016-Oct-17 07:20 UTC
[Gluster-users] [URGENT] Add-bricks to a volume corrupted the files
On 14 Oct 2016 at 17:37, "David Gossage" <dgossage at carouselchecks.com> wrote:
> Sorry to resurrect an old email but did any resolution occur for this or a
> cause found? I just see this as a potential task I may need to also run
> through some day and if there are pitfalls to watch for would be good to
> know.

I think that the issue described in these emails must be addressed in some way. It's really bad that adding bricks to a cluster leads to data corruption, as adding bricks is a standard administration task.

I hope that the issue will be identified and fixed ASAP.