Hi folks,

I'm running a simple gluster setup with a single volume replicated across two servers, as follows:

Volume Name: gv0
Type: Replicate
Volume ID: dd4996c0-04e6-4f9b-a04e-73279c4f112b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: sst0:/var/glusterfs
Brick2: sst2:/var/glusterfs
Options Reconfigured:
cluster.self-heal-daemon: enable
performance.readdir-ahead: on
nfs.disable: on
transport.address-family: inet

This volume is used to store data in high-load production, and recently I faced two major problems that made the whole idea of using gluster quite questionable, so I would like to ask the gluster developers and/or call on community wisdom in the hope that I might be missing something. The problem is that when one of the replica servers hung, it caused the whole glusterfs volume to hang. Could you please drop me a hint: is this expected behaviour, or are there any tweaks or server/volume settings that could be changed to avoid this? Any help would be much appreciated.

--
Best Regards,

Seva Gluschenko
CTO @ http://webkontrol.ru
Hi,

With only two nodes, it's recommended to set cluster.server-quorum-type=server and cluster.server-quorum-ratio=51% (i.e. more than 50%). A sketch of the corresponding commands follows below the quoted message.

On Mon, Jul 31, 2017 at 4:12 AM, Seva Gluschenko <gvs at webkontrol.ru> wrote:

> Hi folks,
>
> I'm running a simple gluster setup with a single volume replicated across
> two servers, as follows:
>
> Volume Name: gv0
> Type: Replicate
> [...]
>
> The problem is that when one of the replica servers hung, it caused the
> whole glusterfs volume to hang. Could you please drop me a hint: is this
> expected behaviour, or are there any tweaks or server/volume settings
> that could be changed to avoid this? Any help would be much appreciated.
>
> --
> Best Regards,
>
> Seva Gluschenko
> CTO @ http://webkontrol.ru
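For reference, a rough sketch of the commands the above would involve, assuming the gv0 volume from the quoted post. Note that server-quorum-type is a per-volume option, while the quorum ratio is a cluster-wide setting applied to "all":

  # enforce server-side quorum for the volume: glusterd takes down the
  # bricks on a node that falls out of quorum rather than risk split-brain
  gluster volume set gv0 cluster.server-quorum-type server

  # require strictly more than half of the servers in the pool to be up
  gluster volume set all cluster.server-quorum-ratio 51%

Keep in mind that with only two servers this means the bricks go down whenever either node is unreachable, so it trades availability for consistency; the arbiter suggestion later in the thread avoids that trade-off.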
On 7/31/2017 1:12 AM, Seva Gluschenko wrote:

> Hi folks,
>
> I'm running a simple gluster setup with a single volume replicated across
> two servers, as follows:
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: dd4996c0-04e6-4f9b-a04e-73279c4f112b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
>
> The problem is that when one of the replica servers hung, it caused the
> whole glusterfs volume to hang.

Yes, you lost quorum and the system doesn't want you to get a split-brain.

> Could you please drop me a hint: is this expected behaviour, or are
> there any tweaks or server/volume settings that could be changed to
> avoid this? Any help would be much appreciated.

Add a third replica node (or just an arbiter node if you aren't that ambitious or want to save on the kit); a sketch of the commands is included at the end of this message.

That way, when you lose a node, the cluster will pause for 40 seconds or so while it figures things out and then continue on. When the missing node returns, the self-heal will kick in and you will be back to 100%.

Your other alternative is to turn off quorum, but that risks split-brain. Depending upon your data, that may or may not be a serious issue.

-wk
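For what it's worth, a rough sketch of how the arbiter route could look, assuming a hypothetical third host called sst1 with a brick path /var/glusterfs-arbiter (both are placeholders, not from the original post) and a gluster 3.x release recent enough to support adding an arbiter to an existing replica 2 volume:

  # from an existing node, add the new host to the trusted pool
  gluster peer probe sst1

  # convert the replica-2 volume to replica 3 with one arbiter brick
  gluster volume add-brick gv0 replica 3 arbiter 1 sst1:/var/glusterfs-arbiter

  # check heal status while the arbiter brick is populated with metadata
  gluster volume heal gv0 info

The quorum knob mentioned as the alternative would presumably be the client-side cluster.quorum-type option (e.g. gluster volume set gv0 cluster.quorum-type none), but as noted above, turning it off risks split-brain.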
Thank you very much indeed, I'll try and add an arbiter node.

--
Best Regards,

Seva Gluschenko
CTO @ http://webkontrol.ru
+7 916 172 6 170

August 1, 2017 12:29 AM, "WK" wrote:

> Add a third replica node (or just an arbiter node if you aren't that
> ambitious or want to save on the kit). That way, when you lose a node,
> the cluster will pause for 40 seconds or so while it figures things out
> and then continue on. When the missing node returns, the self-heal will
> kick in and you will be back to 100%.
> [...]