The I/O errors are happening after the heal, not during it.
As described, I just rebooted a node, waited for the heal to finish,
rebooted another, waited for the heal to finish, then rebooted the third.
From that point on, the VM shows a lot of I/O errors whenever I use the
disk heavily (importing big MySQL dumps). The VM "screen" on the console
tab of Proxmox just spams I/O errors from then on, which it didn't do
before rebooting the gluster nodes. I tried powering off the VM and
forcing full heals, but I didn't find a way to fix the problem short of
deleting the VM disk and restoring it from a backup.
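For clarity, by "force full heals" I mean the standard CLI calls, something
like this (<volname> is just a placeholder, not the real volume name):

    gluster volume heal <volname> full    # trigger a full self-heal
    gluster volume heal <volname> info    # list entries still needing heal

The heal info output should drain to empty once a heal is actually done.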
I have 3 other servers on 3.7.6 where that problem isn't happening, so it
might be a 3.7.11 bug, but since the RAID card failed recently on one of
the nodes I'm not really sure some other piece of hardware isn't at fault.
Unfortunately I don't have the hardware to test that.
The only way to be sure would be to upgrade the 3.7.6 nodes to 3.7.11 and
repeat the same tests, but those nodes are in production, and the VM
freezes during heals last month already caused huge problems for our
clients. We really can't afford any other problems there, so testing on
them isn't an option.
To sum up: I have 3 nodes on 3.7.6 with no corruption but huge freezes
during heals, and 3 other nodes on 3.7.11 with no freezes during heals but
with corruption. qemu-img doesn't see the corruption; it only shows on the
VM's screen and seems mostly harmless, but sometimes the VM does switch to
read-only mode, saying it had too many I/O errors.
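(For reference, the check I mean is just the basic one against the disk
image, something like this, assuming a qcow2 image and with the path being
a placeholder:

    qemu-img check /path/to/vm-disk.qcow2

and it reports no errors.)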
Would the bitrot detection daemon detect a hardware problem? I did enable
it, but it didn't detect anything, although I don't know how to force a
check, and I have no idea if it ran a scrub since the corruption happened.
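For what it's worth, the commands I'm thinking of are these (volume name is
a placeholder again):

    gluster volume bitrot <volname> enable          # what I used to enable it
    gluster volume bitrot <volname> scrub status    # should report scrubber state, if I read the CLI right

but I don't see anything there that forces an immediate scrub on 3.7.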
On Thu, May 19, 2016 at 04:04:49PM -0400, Alastair Neil wrote:
> I am slightly confused: you say you have image file corruption, but then
> you say that qemu-img check reports no corruption. If what you mean is
> that you see I/O errors during a heal, this is likely to be due to I/O
> starvation, which is a well known issue.
> There is work happening to improve this in version 3.8:
> https://bugzilla.redhat.com/show_bug.cgi?id=1269461
>
> On 19 May 2016 at 09:58, Kevin Lemonnier <lemonnierk at ulrar.net> wrote:
>
>     That's a different problem then, I have corruption without removing
>     or adding bricks, as mentioned. Might be two separate issues.
>
> On Thu, May 19, 2016 at 11:25:34PM +1000, Lindsay Mathieson wrote:
> > On 19/05/2016 12:17 AM, Lindsay Mathieson wrote:
> >
> >     One thought - since the VMs are active while the brick is
> >     removed/re-added, could it be the shards that are written while the
> >     brick is added that are the reverse healing shards?
> >
> > I tested by:
> >
> > - removing brick 3
> >
> > - erasing brick 3
> >
> > - closing down all VMs
> >
> > - adding new brick 3
> >
> > - waiting until the heal count reached its max and started decreasing
> >
> >     There were no reverse heals.
> >
> > - Started the VMs back up. No real issues there, though one showed I/O
> >   errors, presumably due to shards being locked as they were healed.
> >
> > - VMs started ok, no reverse heals were noted and eventually Brick 3
> >   was fully healed. The VMs do not appear to be corrupted.
> >
> > So it would appear the problem is adding a brick while the volume is
> > being written to.
> >
> > Cheers,
> >
> > --
> > Lindsay Mathieson
>
--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111