thr3ads.net - Gluster users - [Gluster-users] dealing with gluster outages due to disk timeouts [Dec 2016]

Christian Rice

2016-Nov-24 06:12 UTC

[Gluster-users] dealing with gluster outages due to disk timeouts

This is a long-standing problem for me, and I'm wondering how to insulate myself
from it. Pardon the long-windedness in advance.

I use gluster internationally as regional repositories of files, and it's pretty
constantly being rsync'd to (i.e., written to solely by rsync, optimized with
--inplace or similar).

These regional repositories are also being read from, each to the tune of
10-50MB/s. Each gluster pool is anywhere between 4 and 16 servers, each with one
brick of RAID6, all pools in a distributed-only config. I'm not currently using
distributed-replicated, but even that configuration is not immune to my problem.

So, here's the problem:

If one disk on one gluster brick experiences timeouts, all the gluster clients
block. This is likely because the rate at which the disks are being exercised
by rsyncs (writes and stats) plus reads (client file access) causes an
overwhelming backlog of gluster ops. Perhaps something is bottlenecked and
locking up, but either way the volume is fairly useless to me in that state.
Running a "df" hangs completely.

This has been an issue for me for years. My usual procedure is to manually fail
the disk that's experiencing timeouts, if it hasn't been ejected already by the
raid controller, and remove the load from the gluster file system. It only takes
a fraction of a minute before the gluster volume recovers and I can add the load
back. Rebuilding parity to the brick's raid is not the problem; it's the moments
before the disk ultimately fails that cause the backlog of requests that really
causes problems.

I'm looking for advice as to how to insulate myself from this problem better.
My RAID cards don't support modifying disk timeouts to be incredibly short. I
can see disk timeout messages from the raid card, and write an omprog function
to fail the disk, but that's kinda brutal. Maybe I could get a different raid
card that supports shorter timeouts or fast disk failures, but if anyone has
experience with, say, md raid1 not having this problem, or something similar, it
might be worth the expense to go that route.
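
A minimal sketch of the omprog idea mentioned above, not a tested production
script. rsyslog's omprog module feeds matching log messages, one per line, to a
program's stdin. The log pattern, the threshold of 3, and the fail_disk()
placeholder are all assumptions for illustration: the real message format and
the command that fails a disk depend on the RAID controller and its vendor CLI.

```python
import re
import sys

# Hypothetical pattern: adjust it to whatever your RAID driver actually logs.
TIMEOUT_RE = re.compile(r"timeout.*\b(?:disk|drive|pd)\s*(\d+)", re.IGNORECASE)

THRESHOLD = 3  # assumed number of timeout messages to tolerate before acting


def process(lines, threshold=THRESHOLD):
    """Yield a disk ID once that disk has logged `threshold` timeout messages."""
    counts = {}
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if not match:
            continue
        disk = match.group(1)
        counts[disk] = counts.get(disk, 0) + 1
        if counts[disk] == threshold:
            yield disk


def fail_disk(disk):
    # Placeholder only: this is where the vendor-specific "mark disk failed"
    # command would be invoked. Here it just reports what it would do.
    print(f"would fail disk {disk}", file=sys.stderr)


def main(stream=sys.stdin):
    # omprog writes one log message per line to this program's stdin.
    for disk in process(stream):
        fail_disk(disk)
```

To run it under omprog, the script would end with a call to main(). Counting
several timeouts before acting makes it a little less brutal than failing a
disk on the first message.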

If my memory is correct, gluster still has this problem with a
distributed-replicated configuration, because writes need to succeed on both
leafs before an operation is considered complete, so a timeout on one node is
still detrimental.

Insight, experience designing around this, tunables I haven't considered: I'll
take anything. I really like gluster and I'll keep using it, but this is its
Achilles' heel for me. Is there a magic bullet? Or do I just need to fail
faster?

Дмитрий Глушенок

2016-Dec-02 20:59 UTC

[Gluster-users] dealing with gluster outages due to disk timeouts

Hi,

I always thought that hardware RAID is a requirement for SDS, as it hides all
the dirty work with raw disks from software that just cannot deal with all kinds
of hardware faults. If a disk starts experiencing long delays, then after about
7 seconds the RAID controller marks the disk as failed (this is what TLER/ERC is
for). If your RAID card behaves differently, you can try to decrease the OS
timeouts (the disk driver will then send an I/O error to the filesystem). But in
this case the complete brick will go offline, and you will definitely need a
replicated setup.

--
Dmitry Glushenok
Jet Infosystems
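
As an illustration of the "decrease OS timeouts" suggestion, here is a small
sketch that reads and sets the Linux SCSI command timeout exposed in sysfs at
/sys/block/<device>/device/timeout (in seconds). The helper names and the
device name "sda" are assumptions for the example. Behind a hardware RAID
controller the OS usually sees only the logical volume, so whether a shorter
timeout helps depends on the controller. As noted in the reply, an I/O error
reaching the filesystem can take the whole brick offline.

```python
from pathlib import Path

SYSFS_BLOCK = Path("/sys/block")  # standard Linux sysfs location


def timeout_path(device, root=SYSFS_BLOCK):
    """Return the path of the SCSI command timeout attribute for e.g. 'sda'."""
    return Path(root) / device / "device" / "timeout"


def get_timeout(device, root=SYSFS_BLOCK):
    """Read the current command timeout in seconds."""
    return int(timeout_path(device, root).read_text().strip())


def set_timeout(device, seconds, root=SYSFS_BLOCK):
    """Write a new timeout; needs root and does not persist across reboots."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    timeout_path(device, root).write_text(f"{seconds}\n")
```

Usage would look like set_timeout("sda", 10) followed by get_timeout("sda").
The root parameter exists only so the helpers can be tried against a scratch
directory before touching a live system.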
