David Gossage
2016-Oct-14 15:37 UTC
[Gluster-users] [URGENT] Add-bricks to a volume corrupted the files
Sorry to resurrect an old email, but did any resolution occur for this, or was a cause found? I just see this as a potential task I may need to run through some day, and if there are pitfalls to watch for it would be good to know.

*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284

On Tue, Sep 6, 2016 at 5:38 AM, Kevin Lemonnier <lemonnierk at ulrar.net> wrote:
> Hi,
>
> Here is the info:
>
> Volume Name: VMs
> Type: Replicate
> Volume ID: c5272382-d0c8-4aa4-aced-dd25a064e45c
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: ips4adm.name:/mnt/storage/VMs
> Brick2: ips5adm.name:/mnt/storage/VMs
> Brick3: ips6adm.name:/mnt/storage/VMs
> Options Reconfigured:
> performance.readdir-ahead: on
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> features.shard: on
> features.shard-block-size: 64MB
> cluster.data-self-heal-algorithm: full
> network.ping-timeout: 15
>
> For the logs, I'm sending those over to you in private.
>
> On Tue, Sep 06, 2016 at 09:48:07AM +0530, Krutika Dhananjay wrote:
> > Could you please attach the glusterfs client and brick logs?
> > Also provide the output of `gluster volume info`.
> > -Krutika
> >
> > On Tue, Sep 6, 2016 at 4:29 AM, Kevin Lemonnier <lemonnierk at ulrar.net> wrote:
> >
> > > - What was the original (and current) geometry? (status and info)
> >
> > It was a 1x3 that I was trying to bump to 2x3.
> >
> > > - What parameters did you use when adding the bricks?
> >
> > Just a simple add-brick node1:/path node2:/path node3:/path,
> > then a fix-layout when everything started going wrong.
> >
> > I was able to salvage some VMs by stopping them then starting them again,
> > but most won't start for various reasons (disk corrupted, grub not found ...).
> > For those we are deleting the disks then importing them from backups;
> > that's a huge loss, but everything has been down for so long, no choice ..
> >
> > > On 6/09/2016 8:00 AM, Kevin Lemonnier wrote:
> > >
> > > I tried a fix-layout, and since that didn't work I removed the bricks
> > > (start, then commit when it showed completed). Not better: the volume
> > > is now running on the 3 original bricks (replica 3) but the VMs are
> > > still corrupted. I have 880 MB of shards on the bricks I removed for
> > > some reason; those shards do exist (and are bigger) on the "live"
> > > volume. I don't understand why, now that I have removed the new
> > > bricks, everything isn't working like before ..
> > >
> > > On Mon, Sep 05, 2016 at 11:06:16PM +0200, Kevin Lemonnier wrote:
> > >
> > > Hi,
> > >
> > > I just added 3 bricks to a volume and all the VMs are doing I/O
> > > errors now. I rebooted a VM to see, and it can't start again; am I
> > > missing something? Is the rebalance required to make everything run?
> > >
> > > That's urgent, thanks.
> > >
> > > --
> > > Kevin Lemonnier
> > > PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> >
> > --
> > Lindsay Mathieson
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
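For reference, the 1x3-to-2x3 expansion Kevin describes corresponds to commands along these lines. The new hostnames are illustrative (the thread does not name the added nodes), and this is the sequence that went wrong for him, shown for illustration, not as a recommendation:

```shell
# Add one more replica set of three bricks to the replica-3 volume "VMs",
# taking it from 1x3 to 2x3 (new hostnames are hypothetical).
gluster volume add-brick VMs replica 3 \
    ips7adm.name:/mnt/storage/VMs \
    ips8adm.name:/mnt/storage/VMs \
    ips9adm.name:/mnt/storage/VMs

# Recompute the directory layout so new files can be placed on the new
# bricks; this is the "fix-layout" step Kevin ran afterwards.
gluster volume rebalance VMs fix-layout start
gluster volume rebalance VMs status
```

These commands require a live Gluster cluster and are shown only to make the thread's sequence of events concrete.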
Krutika Dhananjay
2016-Oct-17 05:02 UTC
[Gluster-users] [URGENT] Add-bricks to a volume corrupted the files
Hi,

No. I did run add-brick on a volume with the same configuration as Kevin's, while IO was running, except that I wasn't running a VM workload. I compared the file checksums against the original source files from which they were copied, and they matched.

@Kevin, I see that network.ping-timeout on your setup is 15 seconds, and that's too low. Could you reconfigure that to 30 seconds?

-Krutika

On Fri, Oct 14, 2016 at 9:07 PM, David Gossage <dgossage at carouselchecks.com> wrote:
> Sorry to resurrect an old email but did any resolution occur for this or a
> cause found? I just see this as a potential task I may need to also run
> through some day and if there are pitfalls to watch for would be good to
> know.
>
> *David Gossage*
> *Carousel Checks Inc. | System Administrator*
> *Office* 708.613.2284
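Krutika's suggestion amounts to a one-line reconfiguration of the volume from the thread (the `volume get` verification assumes a GlusterFS release that supports it, 3.7 or later):

```shell
# Raise the ping timeout on the "VMs" volume back to the 30-second default.
gluster volume set VMs network.ping-timeout 30

# Confirm the new value took effect.
gluster volume get VMs network.ping-timeout
```

These commands require a running Gluster cluster; they are included only to make the suggested change concrete.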
Kevin Lemonnier
2016-Oct-17 06:43 UTC
[Gluster-users] [URGENT] Add-bricks to a volume corrupted the files
On Fri, Oct 14, 2016 at 10:37:03AM -0500, David Gossage wrote:
> Sorry to resurrect an old email but did any resolution occur for this or a
> cause found? I just see this as a potential task I may need to also run
> through some day and if there are pitfalls to watch for would be good to
> know.

Unfortunately no. I ended up restoring almost all the VMs from backups, then we created two small clusters instead of a big one, and I guess we'll keep creating 3-brick clusters when needed for now.

Maybe just make sure you are running > 3.7.12, and if possible test it on a non-production environment first. Still, hard to replicate the same load for tests ..

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
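One part of such a pre-production test is easy to script: the checksum comparison Krutika describes earlier in the thread, i.e. hashing the original files and comparing them against the copies on the volume after an add-brick. A minimal sketch (the demo uses throwaway directories; on a real setup the second directory would be the volume's FUSE mount):

```shell
#!/bin/sh
# Compare SHA-256 checksums of files in a source directory against the
# same-named files in a destination directory, reporting OK/FAIL per file
# and exiting nonzero if any file differs.
set -e

verify_checksums() {
    src=$1; dst=$2; status=0
    for f in "$src"/*; do
        name=$(basename "$f")
        if [ "$(sha256sum "$f" | cut -d' ' -f1)" = \
             "$(sha256sum "$dst/$name" | cut -d' ' -f1)" ]; then
            echo "OK   $name"
        else
            echo "FAIL $name"
            status=1
        fi
    done
    return $status
}

# Demo with temporary directories standing in for the originals and the
# Gluster mount (both paths are placeholders for a real run).
src=$(mktemp -d); dst=$(mktemp -d)
echo "vm disk contents" > "$src/vm1.img"
cp "$src/vm1.img" "$dst/vm1.img"
verify_checksums "$src" "$dst"
rm -rf "$src" "$dst"
```

As Kevin notes, matching checksums under a synthetic copy workload does not prove safety under a live VM workload, but a mismatch is an immediate red flag.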
Gandalf Corvotempesta
2016-Oct-17 07:20 UTC
[Gluster-users] [URGENT] Add-bricks to a volume corrupted the files
On 14 Oct 2016 at 17:37, "David Gossage" <dgossage at carouselchecks.com> wrote:
> Sorry to resurrect an old email but did any resolution occur for this or a
> cause found? I just see this as a potential task I may need to also run
> through some day and if there are pitfalls to watch for would be good to
> know.

I think that the issue described in these emails must be addressed in some way. It's really bad that adding bricks to a cluster leads to data corruption, as adding bricks is a standard administration task.

I hope that the issue will be identified and fixed ASAP.