mabi
2018-Oct-31 10:13 UTC
[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Hello, I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and currently have a volume with around 27174 files which are not being healed. The "volume heal info" command shows the same 27k files under the first node and the second node but there is nothing under the 3rd node (arbiter). I already tried running a "volume heal" but none of the files got healed. In the glfsheal log file for that particular volume the only error I see is a few of these entries: [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. timeout = 1800 for 127.0.1.1:49152 and then a few of these warnings: [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a) [0x7f2a6dff434a] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) [0x7f2a798e8a84] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument] the glustershd.log file shows the following: [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout = 1800 for 127.0.1.1:49152 [2018-10-31 10:10:52.502502] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-myvol-private-client-0: remote operation failed [Transport endpoint is not connected] any idea what could be wrong here? Regards, Mabi
mabi
2018-Nov-01 07:24 UTC
[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
Is it possible the problem I encounter described below is caused by the following bug, introduced in 4.1.5: Bug 1637953 - data-self-heal in arbiter volume results in stale locks. https://bugzilla.redhat.com/show_bug.cgi?id=1637953 Regards, M. ??????? Original Message ??????? On Wednesday, October 31, 2018 11:13 AM, mabi <mabi at protonmail.ch> wrote:> Hello, > > I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and currently have a volume with around 27174 files which are not being healed. The "volume heal info" command shows the same 27k files under the first node and the second node but there is nothing under the 3rd node (arbiter). > > I already tried running a "volume heal" but none of the files got healed. > > In the glfsheal log file for that particular volume the only error I see is a few of these entries: > > [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. timeout = 1800 for 127.0.1.1:49152 > > and then a few of these warnings: > > [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a) [0x7f2a6dff434a] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) [0x7f2a798e8a84] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument] > > the glustershd.log file shows the following: > > [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout = 1800 for 127.0.1.1:49152 > [2018-10-31 10:10:52.502502] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-myvol-private-client-0: remote operation failed [Transport endpoint is not connected] > > any idea what could be wrong here? > > Regards, > Mabi
mabi
2018-Nov-02 18:10 UTC
[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
I tried again to manually run a heal by using the "gluster volume heal" command because still not files have been healed and noticed the following warning in the glusterd.log file: [2018-11-02 18:04:19.454702] I [MSGID: 106533] [glusterd-volume-ops.c:938:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume myvol-private [2018-11-02 18:04:19.457311] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustershd: error returned while attempting to connect to host:(null), port:0 It looks like the glustershd can't connect to "host:(null)", could that be the reason why there is no healing taking place? if yes why do I see here "host:(null)"? and what needs fixing? This seeem to have happened since I upgraded from 3.12.14 to 4.1.5. I really would appreciate some help here, I suspect being an issue with GlusterFS 4.1.5. Thank you in advance for any feedback. ??????? Original Message ??????? On Wednesday, October 31, 2018 11:13 AM, mabi <mabi at protonmail.ch> wrote:> Hello, > > I have a GlusterFS 4.1.5 cluster with 3 nodes (including 1 arbiter) and currently have a volume with around 27174 files which are not being healed. The "volume heal info" command shows the same 27k files under the first node and the second node but there is nothing under the 3rd node (arbiter). > > I already tried running a "volume heal" but none of the files got healed. > > In the glfsheal log file for that particular volume the only error I see is a few of these entries: > > [2018-10-31 10:06:41.524300] E [rpc-clnt.c:184:call_bail] 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x108b sent = 2018-10-31 09:36:41.314203. timeout = 1800 for 127.0.1.1:49152 > > and then a few of these warnings: > > [2018-10-31 10:08:12.161498] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.5/xlator/cluster/replicate.so(+0x6734a) [0x7f2a6dff434a] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5da84) [0x7f2a798e8a84] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58) [0x7f2a798a37f8] ) 0-dict: dict is NULL [Invalid argument] > > the glustershd.log file shows the following: > > [2018-10-31 10:10:52.502453] E [rpc-clnt.c:184:call_bail] 0-myvol-private-client-0: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0xaa398 sent = 2018-10-31 09:40:50.927816. timeout = 1800 for 127.0.1.1:49152 > [2018-10-31 10:10:52.502502] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-myvol-private-client-0: remote operation failed [Transport endpoint is not connected] > > any idea what could be wrong here? > > Regards, > Mabi