thr3ads.net - Gluster users - [Gluster-users] How to prevent Brick terminated by socket temporarily unavailable [May 2019]

Jeff Bischoff

2019-May-16 20:50 UTC

[Gluster-users] How to prevent Brick terminated by socket temporarily unavailable

I frequently run into a problem where some temporary condition causes bricks
to be shut down. The health-check feature is what shuts them down, and according
to
https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/brick-failure-detection/
the brick then stays down and is not restarted (by design).
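
For context on the "stays down" behaviour: a brick process that has exited can usually be brought back without stopping the volume. The sketch below assumes the standard `gluster` CLI is on the PATH; the function name `restart_down_bricks` is hypothetical, and whether a forced start is appropriate depends on why the brick went down.

```shell
#!/bin/sh
# Hypothetical helper: print brick status for a volume, then ask glusterd
# to start any brick processes that are not currently running.
restart_down_bricks() {
    volname="$1"
    if [ -z "$volname" ]; then
        echo "usage: restart_down_bricks <VOLNAME>" >&2
        return 2
    fi

    # Show which bricks are online (the "Online" column reads Y or N).
    gluster volume status "$volname" || return 1

    # On an already-started volume, "start ... force" is the usual way to
    # respawn brick processes that are down; check this against your version.
    gluster volume start "$volname" force
}
```

Usage would be `restart_down_bricks heketidbstorage`, run on a node in the trusted pool. This only restarts the process; it does not address whatever made the health check fail.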

 

What I don't understand is:
1. What is causing this "Resource temporarily unavailable" in the first
   place? From searching the web, it sounds like a socket timeout. Have you guys
   seen this before?
2. If this is truly a temporary failure, why do we shut down the brick
   indefinitely?
 

Should I try any of the following:
1. Increase 'network.ping-timeout' or 'client.grace-timeout'
2. Disable the health check feature by setting:
   # gluster volume set <VOLNAME> storage.health-check-interval 0
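
Before changing either option, it may help to record the current values so the change can be reverted. A minimal sketch, assuming a gluster release that supports `gluster volume get`; the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers around the options mentioned above.

# Print the current health-check interval and ping timeout for a volume.
show_timeouts() {
    gluster volume get "$1" storage.health-check-interval
    gluster volume get "$1" network.ping-timeout
}

# Disable the brick health checker (interval 0), as in option 2 above.
# Note: with the checker off, a genuinely failed disk is no longer detected
# by this mechanism.
disable_health_check() {
    gluster volume set "$1" storage.health-check-interval 0
}
```

For example, `show_timeouts heketidbstorage` followed by `disable_health_check heketidbstorage`. One observation, offered as an inference from the log rather than a confirmed diagnosis: the failing call is an `aio_write()` on the brick's local `health_check` file, so the network-level timeouts may not be the relevant knob.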

 

The brick log looks like this at the time it is shut down:

------------------

[2019-05-08 13:48:33.642605] W [MSGID: 113075] [posix-helpers.c:1895:posix_fs_health_check] 0-heketidbstorage-posix: aio_write() on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick/.glusterfs/health_check returned [Resource temporarily unavailable]

[2019-05-08 13:48:33.749246] M [MSGID: 113075] [posix-helpers.c:1962:posix_health_check_thread_proc] 0-heketidbstorage-posix: health-check failed, going down

[2019-05-08 13:48:34.000428] M [MSGID: 113075] [posix-helpers.c:1981:posix_health_check_thread_proc] 0-heketidbstorage-posix: still alive! -> SIGTERM

[2019-05-08 13:49:04.597061] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f16fdd94dd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x556e53da2d65] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x556e53da2b8b] ) 0-: received signum (15), shutting down

------------------
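
To see how often this happens, the brick log can be searched for the health-check messages shown above. A minimal sketch; the function names are hypothetical, the log path is whatever your installation uses, and it assumes each log entry sits on a single line in the actual file (the wrapping above comes from the email):

```shell
#!/bin/sh
# Hypothetical helpers for scanning a brick log file.

# Print every entry emitted by the posix health-check functions.
health_check_events() {
    grep -E 'posix_fs_health_check|posix_health_check_thread_proc' "$1"
}

# Count how many times the health checker decided to take the brick down.
count_health_check_failures() {
    grep -c 'health-check failed, going down' "$1"
}
```

Running `health_check_events <brick-log-file>` lists the timestamps, which can then be compared against kernel or storage logs from the same moments.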

 

The GlusterD log shows this shortly after:

 

------------------
[2019-05-08 13:49:04.673536] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick on port 49152
[2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available)
------------------

 

Any guidance would be greatly appreciated!

 

Best,

 

Jeff Bischoff

 

 
