Ravishankar N
2018-Nov-03 00:31 UTC
[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Mabi,

If bug 1637953 is what you are experiencing, then you need to follow the workarounds mentioned in https://lists.gluster.org/pipermail/gluster-users/2018-October/035178.html. Can you see if this works?

-Ravi

On 11/02/2018 11:40 PM, mabi wrote:
> I tried again to manually run a heal using the "gluster volume heal" command because still no files have been healed, and I noticed the following warning in the glusterd.log file:
>
> [2018-11-02 18:04:19.454702] I [MSGID: 106533] [glusterd-volume-ops.c:938:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume myvol-private
> [2018-11-02 18:04:19.457311] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustershd: error returned while attempting to connect to host:(null), port:0
>
> It looks like glustershd can't connect to "host:(null)". Could that be the reason why there is no healing taking place? If so, why do I see "host:(null)" here, and what needs fixing?
>
> This seems to have happened since I upgraded from 3.12.14 to 4.1.5.
>
> I would really appreciate some help here; I suspect an issue with GlusterFS 4.1.5.
>
> Thank you in advance for any feedback.
>
> ------- Original Message -------
> On Wednesday, October 31, 2018 11:13 AM, mabi <mabi at protonmail.ch> wrote:
>
>> Hello,
>>
>> I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and currently have a volume with around 27174 files which are not being healed. The "volume heal info" command shows the same 27k files under the first node and the second node, but there is nothing under the 3rd node (arbiter).
>>
>> I already tried running a "volume heal" but none of the files got healed.
>>
>> In the glfsheal log file for that particular volume the only error I see is a few of these entries:
>>
>> [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. timeout = 1800 for 127.0.1.1:49152
>>
>> and then a few of these warnings:
>>
>> [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a) [0x7f2a6dff434a] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) [0x7f2a798e8a84] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument]
>>
>> the glustershd.log file shows the following:
>>
>> [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout = 1800 for 127.0.1.1:49152
>> [2018-10-31 10:10:52.502502] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-myvol-private-client-0: remote operation failed [Transport endpoint is not connected]
>>
>> any idea what could be wrong here?
>>
>> Regards,
>> Mabi
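For readers following the thread: the usual way to check whether the self-heal daemon is up and to re-trigger healing is via the gluster CLI. A minimal sketch follows; the volume name "myvol-private" is taken from the logs above, everything else is a generic example rather than the exact workaround from the linked thread:

    # verify the volume's processes, including the Self-heal Daemon, are online on all nodes
    gluster volume status myvol-private
    # ask the self-heal daemon to crawl and heal pending entries
    gluster volume heal myvol-private
    # show how many entries are still pending per brick
    gluster volume heal myvol-private info summary
    # glustershd logs its progress and connection errors here (default log location)
    tail -f /var/log/glusterfs/glustershd.log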
mabi
2018-Nov-03 06:33 UTC
[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Ravi,

I actually restarted glustershd yesterday evening by unmounting my volume on the clients, stopping and starting the volume on the cluster, and re-mounting it on the clients, and that cleared around ~1500 files from the "volume heal info" output. So I am now down to around ~25k files left to heal.

While restarting the volume I saw the following log entry in the brick log file:

[2018-11-02 17:51:07.078738] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-myvol-private-server: releasing lock on da4f31fb-ac53-4d78-a633-f0046ac3ebcc held by {client=0x7fd48400c160, pid=-6 lk-owner=b0d405e0167f0000}

What also bothers me is that if I run a manual "volume heal" nothing happens except the following log entry in the glusterd log:

[2018-11-03 06:32:16.033214] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustershd: error returned while attempting to connect to host:(null), port:0

That does not seem normal... what do you think?

------- Original Message -------
On Saturday, November 3, 2018 1:31 AM, Ravishankar N <ravishankar at redhat.com> wrote:

> Mabi,
>
> If bug 1637953 is what you are experiencing, then you need to follow the workarounds mentioned in https://lists.gluster.org/pipermail/gluster-users/2018-October/035178.html. Can you see if this works?
>
> -Ravi
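For reference, a minimal sketch of the restart sequence described above, assuming a client mount point of /mnt/myvol-private and node1 as the mount server (both are assumptions, not taken from the thread):

    # on each client: unmount the volume
    umount /mnt/myvol-private
    # on one of the cluster nodes: stop and start the volume (stop asks for confirmation)
    gluster volume stop myvol-private
    gluster volume start myvol-private
    # on each client: mount the volume again
    mount -t glusterfs node1:/myvol-private /mnt/myvol-private

"gluster volume start myvol-private force" on a running volume is often suggested as a lighter-weight way to respawn the self-heal daemon without unmounting clients, though whether that is sufficient here depends on the bug being hit.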
mabi
2018-Nov-03 10:43 UTC
[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Ravi (or anyone else who can help),

I now have even more files pending heal. Here is the output of "volume heal info summary":

Brick node1:/data/myvol-private/brick
Status: Connected
Total Number of entries: 49845
Number of entries in heal pending: 49845
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick node2:/data/myvol-private/brick
Status: Connected
Total Number of entries: 26644
Number of entries in heal pending: 26644
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick node3:/srv/glusterfs/myvol-private/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Should I try setting the "cluster.data-self-heal" parameter of that volume to "off" as mentioned in the bug? And by doing that, are my files pending heal in danger of being lost? Also, is it dangerous to leave "cluster.data-self-heal" off?

------- Original Message -------
On Saturday, November 3, 2018 1:31 AM, Ravishankar N <ravishankar at redhat.com> wrote:

> Mabi,
>
> If bug 1637953 is what you are experiencing, then you need to follow the workarounds mentioned in https://lists.gluster.org/pipermail/gluster-users/2018-October/035178.html. Can you see if this works?
>
> -Ravi
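For reference, a minimal sketch of how the "cluster.data-self-heal" setting asked about above could be applied and later reverted (the volume name is from the thread; whether this is the right workaround for bug 1637953 is for the bug report and linked mail to confirm):

    # check the current value of the option
    gluster volume get myvol-private cluster.data-self-heal
    # workaround mentioned in the bug/thread: disable data self-heal
    gluster volume set myvol-private cluster.data-self-heal off
    # once running a fixed release, restore the option to its default
    gluster volume reset myvol-private cluster.data-self-heal

Changing this option does not remove any data from the bricks; entries stay listed in "heal info" until they are actually healed.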