I replaced the brick in a node in my 3x2 dist+repl volume (RHS 3). I'm seeing that the heal process, which should essentially be a dump from the working replica to the newly added one, is taking exceptionally long: it has moved ~100 GB over a day on a 1 Gigabit network. The CPU usage on both nodes of the replica has been pretty high.

I also think that Nagios is making it worse. The heal is slow enough as it is, and Nagios keeps triggering heal info, which I think never completes. I also see my logs filling up. These are some of the log contents which I got by running tail on them:

cli.log:

[2015-08-06 19:52:20.926000] T [socket.c:2759:socket_connect] (-->/lib64/libpthread.so.0() [0x3ec1407a51] (-->/usr/lib64/libglusterfs.so.0(gf_timer_proc+0x120) [0x7fb84c0f6980] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0xd9) [0x7fb84bc96249]))) 0-glusterfs: connect () called on transport already connected
[2015-08-06 19:52:21.926068] T [rpc-clnt.c:418:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
[2015-08-06 19:52:21.926091] T [socket.c:2767:socket_connect] 0-glusterfs: connecting 0xa198b0, state=0 gen=0 sock=-1
[2015-08-06 19:52:21.926114] W [dict.c:1060:data_to_str] (-->/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(+0x6bea) [0x7fb844f82bea] (-->/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7fb844f873bd] (-->/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7fb844f87270]))) 0-dict: data is NULL
[2015-08-06 19:52:21.926125] W [dict.c:1060:data_to_str] (-->/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(+0x6bea) [0x7fb844f82bea] (-->/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) [0x7fb844f873bd] (-->/usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7fb844f8727b]))) 0-dict: data is NULL
[2015-08-06 19:52:21.926129] E [name.c:140:client_fill_address_family] 0-glusterfs: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2015-08-06 19:52:21.926179] T [cli-quotad-client.c:100:cli_quotad_notify] 0-glusterfs: got RPC_CLNT_DISCONNECT

The brick log is full of these messages, which occurred while creating symlinks:

[2015-08-06 19:54:22.494254] I [server-rpc-fops.c:693:server_removexattr_cbk] 0-gluster-server: 2206495: REMOVEXATTR file path (fadccb1e-ea0c-416a-94ec-ec88fafec2a5) of key security.ima ==> (No data available)
[2015-08-06 19:54:22.514814] E [marker.c:2574:marker_removexattr_cbk] 0-gluster-marker: No data available

$ sestatus
SELinux status: disabled

$ glusterfs --version
glusterfs 3.6.0.53 built on Mar 18 2015 08:12:38

Does anyone know what's going on?

PS: I am using RHS because our school's satellite has the repos. Contacting RHN over this would likely be complicated, and I would prefer solving this on my own.
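PPS: For reference, heal progress can be checked from the CLI with something like the commands below; <volname> is a placeholder for the actual volume name, and whether "statistics heal-count" is available on this 3.6 build is an assumption on my part:

$ gluster volume heal <volname> info                   # full listing of pending entries; this is what Nagios keeps triggering
$ gluster volume heal <volname> statistics heal-count  # lighter-weight count of entries still pending heal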
On 08/07/2015 01:33 AM, Prasun Gera wrote:
> I replaced the brick in a node in my 3x2 dist+repl volume (RHS 3).
> I'm seeing that the heal process, which should essentially be a dump
> from the working replica to the newly added one, is taking
> exceptionally long: it has moved ~100 GB over a day on a 1 Gigabit
> network. The CPU usage on both nodes of the replica has been pretty
> high.

Does setting `cluster.data-self-heal-algorithm` to full make a difference in the CPU usage?

> I also think that Nagios is making it worse. The heal is slow enough
> as it is, and Nagios keeps triggering heal info, which I think never
> completes. I also see my logs filling up. These are some of the log
> contents which I got by running tail on them:
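To be concrete, the setting suggested above can be changed on a live volume with something like the following; <volname> is a placeholder for the actual volume name:

$ gluster volume set <volname> cluster.data-self-heal-algorithm full

The default "diff" algorithm computes checksums on both bricks so that only changed blocks get transferred, which is CPU-intensive; "full" copies whole files instead, trading network bandwidth for CPU. Since a freshly replaced brick starts out empty, diff has little to save there anyway.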