thr3ads.net - Gluster users - [Gluster-users] 2 node replica 2 cluster - volume on one node stopped responding [Jun 2015]

Tiemen Ruiten

2015-Jun-08 09:40 UTC

[Gluster-users] 2 node replica 2 cluster - volume on one node stopped responding

Some extra points:

- 10.100.3.41 is one of the oVirt hosts.

- I only needed to restart glusterfsd & glusterd in one of the gluster
nodes (also the one where I pulled the logs from) to get everything in
working order.

- it's a separate gluster volume, not managed from oVirt engine.

On 8 June 2015 at 11:35, Tiemen Ruiten <t.ruiten at rdmedia.com> wrote:
> Hello,
>
> We are running an oVirt cluster on top of a 2 node replica 2 Gluster
> volume. Yesterday we suddenly noticed VMs were not responding and quickly
> found out the Gluster volume had issues. These errors were filling up the
> etc-glusterfs-glusterd.log file:
>
> [2015-06-07 08:36:26.498012] W [rpcsvc.c:270:rpcsvc_program_actor]
>> 0-rpc-service: RPC program not available (req 1298437 330) for
>> 10.100.3.41:1022
>> [2015-06-07 08:36:26.498073] E
>> [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to
>> complete successfully
>
>
> A restart of glusterfsd and glusterd resolved the issue, but triggered a
> lot of self-heals.
>
> We are running glusterfs 3.7.0 on ZFS.
>
> I have attached etc-glusterfs-glusterd.log, the brick log file and the
> glustershd.log. I would be grateful if anyone could shed any light on what
> happened here and if there's anything we can do to prevent it.
>
> --
> Tiemen Ruiten
> Systems Engineer
> R&D Media
>


-- 
Tiemen Ruiten
Systems Engineer
R&D Media
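
Editor's note: to see which clients trigger the warning quoted above, and how often, the glusterd log can be tallied with a short script. The following is a minimal Python sketch, not part of the original post. It assumes each log entry sits on a single line in the file (the mail client wrapped the quoted entries), and it assumes the two numbers after "req" are an RPC program number and version; the function name and the sample list are hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the single warning quoted in this thread. Treating the two
# numbers after "req" as program number and version is an assumption.
WARNING_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+W\s+\[rpcsvc\.c:\d+:rpcsvc_program_actor\]\s+"
    r".*RPC program not available \(req (?P<program>\d+) (?P<version>\d+)\) "
    r"for (?P<host>[\d.]+):(?P<port>\d+)"
)


def count_unavailable_programs(lines):
    """Count 'RPC program not available' warnings per (client, program, version)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            key = (match["host"], int(match["program"]), int(match["version"]))
            counts[key] += 1
    return counts


# The two entries quoted in the post, rejoined into single lines.
sample = [
    "[2015-06-07 08:36:26.498012] W [rpcsvc.c:270:rpcsvc_program_actor] "
    "0-rpc-service: RPC program not available (req 1298437 330) for 10.100.3.41:1022",
    "[2015-06-07 08:36:26.498073] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] "
    "0-rpcsvc: rpc actor failed to complete successfully",
]

for key, n in count_unavailable_programs(sample).items():
    print(key, n)
```

For a real log, pass an open file object instead of the sample list. Only the warning lines are counted; the accompanying "rpc actor failed" error lines carry no client address and are skipped.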

Tiemen Ruiten

2015-Jun-09 09:30 UTC

[Gluster-users] 2 node replica 2 cluster - volume on one node stopped responding

Anyone who could help? We just ran into this exact same problem again. I
just noticed we are running GlusterFS 3.7.1 on the clients (oVirt
hosts/VDSM). Could this be an issue?

On 8 June 2015 at 11:40, Tiemen Ruiten <t.ruiten at rdmedia.com> wrote:
> Some extra points:
>
> - 10.100.3.41 is one of the oVirt hosts.
>
> - I only needed to restart glusterfsd & glusterd in one of the gluster
> nodes (also the one where I pulled the logs from) to get everything in
> working order.
>
> - it's a separate gluster volume, not managed from oVirt engine.
>
> On 8 June 2015 at 11:35, Tiemen Ruiten <t.ruiten at rdmedia.com> wrote:
>
>> Hello,
>>
>> We are running an oVirt cluster on top of a 2 node replica 2 Gluster
>> volume. Yesterday we suddenly noticed VMs were not responding and quickly
>> found out the Gluster volume had issues. These errors were filling up the
>> etc-glusterfs-glusterd.log file:
>>
>> [2015-06-07 08:36:26.498012] W [rpcsvc.c:270:rpcsvc_program_actor]
>>> 0-rpc-service: RPC program not available (req 1298437 330) for
>>> 10.100.3.41:1022
>>> [2015-06-07 08:36:26.498073] E
>>> [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to
>>> complete successfully
>>
>>
>> A restart of glusterfsd and glusterd resolved the issue, but triggered a
>> lot of self-heals.
>>
>> We are running glusterfs 3.7.0 on ZFS.
>>
>> I have attached etc-glusterfs-glusterd.log, the brick log file and the
>> glustershd.log. I would be grateful if anyone could shed any light on what
>> happened here and if there's anything we can do to prevent it.
>>
>> --
>> Tiemen Ruiten
>> Systems Engineer
>> R&D Media
>>
>
>
>
> --
> Tiemen Ruiten
> Systems Engineer
> R&D Media
>


-- 
Tiemen Ruiten
Systems Engineer
R&D Media
