Tiemen Ruiten
2015-Jun-08 09:40 UTC
[Gluster-users] 2 node replica 2 cluster - volume on one node stopped responding
Some extra points:

- 10.100.3.41 is one of the oVirt hosts.
- I only needed to restart glusterfsd and glusterd on one of the Gluster nodes (the same one I pulled the logs from) to get everything working again.
- It's a separate Gluster volume, not managed from the oVirt engine.

On 8 June 2015 at 11:35, Tiemen Ruiten <t.ruiten at rdmedia.com> wrote:
> Hello,
>
> We are running an oVirt cluster on top of a 2 node replica 2 Gluster
> volume. Yesterday we suddenly noticed VMs were not responding and quickly
> found out the Gluster volume had issues. These errors were filling up the
> etc-glusterfs-glusterd.log file:
>
> [2015-06-07 08:36:26.498012] W [rpcsvc.c:270:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 10.100.3.41:1022
> [2015-06-07 08:36:26.498073] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
>
> A restart of glusterfsd and glusterd resolved the issue, but triggered a
> lot of self-heals.
>
> We are running glusterfs 3.7.0 on ZFS.
>
> I have attached etc-glusterfs-glusterd.log, the brick log file and the
> glustershd.log. I would be grateful if anyone could shed any light on what
> happened here and if there's anything we can do to prevent it.
>
> --
> Tiemen Ruiten
> Systems Engineer
> R&D Media
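When a log like etc-glusterfs-glusterd.log is filling up, it can help to see which client the "RPC program not available" warnings come from (here 10.100.3.41, one of the oVirt hosts). Below is a minimal Python sketch, assuming every entry follows the layout of the two lines quoted above (`[timestamp] LEVEL [file.c:line:function] domain: message`). The function name `summarize` is hypothetical and not part of any Gluster tooling. Wrapped or differently formatted lines are simply skipped.

```python
import re
from collections import Counter

# Assumed layout, taken from the quoted glusterd log lines:
# [timestamp] LEVEL [file.c:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<domain>\S+):\s+(?P<msg>.*)$"
)
# Client address at the end of "... RPC program not available (...) for IP:PORT"
CLIENT = re.compile(r"for (?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)\s*$")


def summarize(lines):
    """Count entries per log level, and 'RPC program not available' warnings per client IP."""
    levels = Counter()
    clients = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # not in the assumed layout (e.g. a wrapped continuation line)
        levels[m["level"]] += 1
        if "RPC program not available" in m["msg"]:
            c = CLIENT.search(m["msg"])
            if c is not None:
                clients[c["ip"]] += 1
    return levels, clients


if __name__ == "__main__":
    sample = [
        "[2015-06-07 08:36:26.498012] W [rpcsvc.c:270:rpcsvc_program_actor] "
        "0-rpc-service: RPC program not available (req 1298437 330) for 10.100.3.41:1022",
        "[2015-06-07 08:36:26.498073] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] "
        "0-rpcsvc: rpc actor failed to complete successfully",
    ]
    levels, clients = summarize(sample)
    print(dict(levels), dict(clients))
```

To run it against a real log, pass an open file object, e.g. `summarize(open(path))`, where `path` is wherever glusterd writes its log on your nodes.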
Tiemen Ruiten
2015-Jun-09 09:30 UTC
Can anyone help? We just ran into exactly the same problem again. I also just noticed that the clients (the oVirt hosts/VDSM) are running GlusterFS 3.7.1, while the Gluster nodes run 3.7.0. Could this version mismatch be an issue?
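A client/server mismatch like this one (3.7.1 on the oVirt hosts, 3.7.0 on the Gluster nodes) is easy to overlook. Below is a minimal Python sketch for flagging it. It assumes you have already collected the first line of `glusterfs --version` from each host, and that this line starts with `glusterfs <version>`. The host names and helper functions are hypothetical. The sketch only detects a difference; it does not establish whether the mismatch caused the failure.

```python
import re

# Assumption: the first line of `glusterfs --version` starts with "glusterfs X.Y.Z"
VERSION = re.compile(r"^glusterfs\s+(\d+(?:\.\d+)*)")


def parse_version(first_line):
    """Turn 'glusterfs 3.7.1 ...' into a comparable tuple such as (3, 7, 1)."""
    m = VERSION.match(first_line.strip())
    if m is None:
        raise ValueError(f"unrecognised version line: {first_line!r}")
    return tuple(int(part) for part in m.group(1).split("."))


def find_mismatches(servers, clients):
    """Return {client_host: version} for clients whose version differs from the servers'."""
    server_versions = {parse_version(line) for line in servers.values()}
    if len(server_versions) != 1:
        # The servers should agree with each other before comparing clients against them
        raise ValueError(f"servers disagree: {sorted(server_versions)}")
    (server_version,) = server_versions
    mismatched = {}
    for host, line in clients.items():
        version = parse_version(line)
        if version != server_version:
            mismatched[host] = version
    return mismatched


if __name__ == "__main__":
    # Hypothetical host names; versions as reported in this thread
    servers = {"gluster1": "glusterfs 3.7.0", "gluster2": "glusterfs 3.7.0"}
    clients = {"ovirt-host1": "glusterfs 3.7.1", "ovirt-host2": "glusterfs 3.7.1"}
    print(find_mismatches(servers, clients))
```

Versions are compared as integer tuples rather than strings so that, for example, 3.7.10 sorts after 3.7.9.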