Tiemen Ruiten
2015-Jun-08 09:35 UTC
[Gluster-users] 2 node replica 2 cluster - volume on one node stopped responding
Hello,

We are running an oVirt cluster on top of a 2 node replica 2 Gluster volume. Yesterday we suddenly noticed VMs were not responding and quickly found out the Gluster volume had issues. These errors were filling up the etc-glusterfs-glusterd.log file:

[2015-06-07 08:36:26.498012] W [rpcsvc.c:270:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 10.100.3.41:1022
[2015-06-07 08:36:26.498073] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully

A restart of glusterfsd and glusterd resolved the issue, but triggered a lot of self-heals.

We are running glusterfs 3.7.0 on ZFS.

I have attached etc-glusterfs-glusterd.log, the brick log file and the glustershd.log. I would be grateful if anyone could shed any light on what happened here and if there's anything we can do to prevent it.

--
Tiemen Ruiten
Systems Engineer
R&D Media

Attachments:
- etc-glusterfs-glusterd.vol.log-20150608 (application/octet-stream, 157133 bytes): <http://www.gluster.org/pipermail/gluster-users/attachments/20150608/2c2dbb0e/attachment-0001.obj>
- export-gluster01-brick.log (text/x-log, 86478 bytes): <http://www.gluster.org/pipermail/gluster-users/attachments/20150608/2c2dbb0e/attachment-0002.bin>
- glustershd.log (text/x-log, 56836 bytes): <http://www.gluster.org/pipermail/gluster-users/attachments/20150608/2c2dbb0e/attachment-0003.bin>
Tiemen Ruiten
2015-Jun-08 09:40 UTC
[Gluster-users] 2 node replica 2 cluster - volume on one node stopped responding
Some extra points:

- 10.100.3.41 is one of the oVirt hosts.
- I only needed to restart glusterfsd & glusterd on one of the gluster nodes (also the one where I pulled the logs from) to get everything in working order.
- It's a separate gluster volume, not managed from oVirt engine.

On 8 June 2015 at 11:35, Tiemen Ruiten <t.ruiten at rdmedia.com> wrote:

> Hello,
>
> We are running an oVirt cluster on top of a 2 node replica 2 Gluster
> volume. Yesterday we suddenly noticed VMs were not responding and quickly
> found out the Gluster volume had issues. These errors were filling up the
> etc-glusterfs-glusterd.log file:
>
> [2015-06-07 08:36:26.498012] W [rpcsvc.c:270:rpcsvc_program_actor]
> 0-rpc-service: RPC program not available (req 1298437 330) for
> 10.100.3.41:1022
> [2015-06-07 08:36:26.498073] E
> [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to
> complete successfully
>
> A restart of glusterfsd and glusterd resolved the issue, but triggered a
> lot of self-heals.
>
> We are running glusterfs 3.7.0 on ZFS.
>
> I have attached etc-glusterfs-glusterd.log, the brick log file and the
> glustershd.log. I would be grateful if anyone could shed any light on what
> happened here and if there's anything we can do to prevent it.
>
> --
> Tiemen Ruiten
> Systems Engineer
> R&D Media

--
Tiemen Ruiten
Systems Engineer
R&D Media
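
For anyone who wants to check which clients were triggering the "RPC program not available" warnings, here is a rough Python sketch that tallies them per client address and program/version pair. The regular expression is modelled only on the single warning line quoted in this thread, so it is an assumption that other lines in the log follow the same layout. The function name count_rpc_warnings and the sample data are made up for illustration and are not part of any Gluster tooling.

```python
import re
from collections import Counter

# Pattern modelled on the one warning line quoted in this thread.
# Assumption: other warnings in the log share this exact layout.
WARNING_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] W \[rpcsvc\.c:\d+:rpcsvc_program_actor\] "
    r"\S+: RPC program not available "
    r"\(req (?P<prog>\d+) (?P<ver>\d+)\) "
    r"for (?P<host>[\d.]+):(?P<port>\d+)"
)


def count_rpc_warnings(lines):
    """Count matching warnings per (client host, program, version)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match:
            key = (
                match.group("host"),
                int(match.group("prog")),
                int(match.group("ver")),
            )
            counts[key] += 1
    return counts


# The two log lines from the original report, joined onto single lines.
sample = [
    "[2015-06-07 08:36:26.498012] W [rpcsvc.c:270:rpcsvc_program_actor] "
    "0-rpc-service: RPC program not available (req 1298437 330) for "
    "10.100.3.41:1022",
    "[2015-06-07 08:36:26.498073] E "
    "[rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor "
    "failed to complete successfully",
]

for (host, prog, ver), n in count_rpc_warnings(sample).items():
    print(f"{host} prog={prog} ver={ver}: {n} warning(s)")
```

To run it against a real log, pass an open file object instead of the sample list, for example count_rpc_warnings(open(path)) with path pointing at the glusterd log. Only the W lines are counted; the accompanying E lines carry no client address and are skipped.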