Ning Adam
2018-Sep-14 07:45 UTC
[Gluster-users] gluster volume heal info hang on 3.10.6 & 3.12.11
(My last mail seemed to be dropped by the list, reposting in plain-text format)

Hello:

I have recently done a rolling upgrade of some of my 3.7.6 gluster (AFR + DHT) deployments to 3.10.6 and hit two problems afterwards:
1) occasional IO errors
2) "gluster volume heal vol0 info" hangs after a decent amount of files have been written to the cluster (this varies from hours to days)

All 7 of my upgraded 3.10 clusters reproduce these problems, and I then tried upgrading them to 3.12.11, but nothing improved. I have submitted a bug at https://bugzilla.redhat.com/show_bug.cgi?id=1618932 , but so far no one seems to have looked into it.

The heal hang is the most annoying problem because it makes it impossible for me to monitor my clusters. So I tried to look into the code and analyze the DEBUG & TRACE logs, and found the following:

1) I have "network.ping-timeout: 120" in the volume options. If I change network.ping-timeout to 0, "gluster volume heal vol0 info" will eventually finish after 10~20 minutes (on an 8x3 brick setup), and it says there are no files to heal. It takes even longer on a larger cluster (36x3 brick setup). It does not matter whether I stop all my IO or restart all glusterd & glusterfsd processes; heal info still hangs. "gluster volume status" is all normal, and all my brick servers are in a low-latency LAN environment.

2) One of the clusters having this problem is a testing setup, so I can do some debugging there. I looked into heal/src/glfs-heal.c, compiled the code and added some logging; the initialization call "ret = glfs_init (fs)" hangs. I traced it to "ret = glfs_chdir (fs, "/");" and then dht_lookup(path='/'); it never gets to dht_readdirp_cbk().
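For reference, the sequence I am poking at can be exercised outside glfsheal with a minimal gfapi client along the lines below. The volume name, volfile server host, log path and file name (glfs-init-test.c) are placeholders from my test cluster, and this is only a sketch of the call sequence, not a copy of how glfsheal sets up its volfile source; glfs_init()/glfs_chdir("/") is where glfsheal hangs for me. Build it with something like "gcc glfs-init-test.c -lgfapi".

/* Minimal gfapi client following roughly the same initialization path
 * as glfsheal: create the handle, fetch the volfile, glfs_init(), then
 * glfs_chdir("/"), which triggers the lookup on "/".
 * "vol0" and "storage0-cluster0" are placeholders; adjust as needed. */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int main (void)
{
        glfs_t *fs = glfs_new ("vol0");
        if (!fs) {
                fprintf (stderr, "glfs_new failed\n");
                return 1;
        }

        /* fetch the volfile from a glusterd over TCP (port 24007) */
        glfs_set_volfile_server (fs, "tcp", "storage0-cluster0", 24007);
        /* log to a separate file; higher level = more verbose */
        glfs_set_logging (fs, "/tmp/glfs-init-test.log", 7);

        fprintf (stderr, "calling glfs_init()...\n");
        if (glfs_init (fs) != 0) {
                fprintf (stderr, "glfs_init failed\n");
                return 1;
        }

        fprintf (stderr, "calling glfs_chdir(\"/\")...\n");
        if (glfs_chdir (fs, "/") != 0)
                fprintf (stderr, "glfs_chdir failed\n");
        else
                fprintf (stderr, "lookup on / completed\n");

        glfs_fini (fs);
        return 0;
}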
I stopped all client IO operations on my gluster volume and then ran "glfsheal vol0" again. Looking into /var/log/glusterfs/glfsheal-vol0.log, there is massive looping of reconnect attempts on all bricks, e.g.:

[2018-09-14 05:56:09.395869] T [rpc-clnt.c:407:rpc_clnt_reconnect] 0-vol0-client-12: attempting reconnect
[2018-09-14 05:56:09.396160] T [socket.c:3125:socket_connect] (--> /lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f10d4e79778] (--> /usr/lib/glusterfs//rpc-transport/socket.so(+0xd442)[0x7f10c647c442] (--> /lib/libgfrpc.so.0(rpc_transport_connect+0x93)[0x7f10d437de43] (--> /lib/libgfrpc.so.0(rpc_clnt_reconnect+0x136)[0x7f10d438121b] (--> /lib/libglusterfs.so.0(+0x3e4e9)[0x7f10d4e894e9] ))))) 0-vol0-client-12: connect () called on transport already connected
[2018-09-14 05:56:09.396194] T [rpc-clnt.c:407:rpc_clnt_reconnect] 0-vol0-client-10: attempting reconnect
[2018-09-14 05:56:09.396424] T [socket.c:3125:socket_connect] (--> /lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f10d4e79778] (--> /usr/lib/glusterfs//rpc-transport/socket.so(+0xd442)[0x7f10c647c442] (--> /lib/libgfrpc.so.0(rpc_transport_connect+0x93)[0x7f10d437de43] (--> /lib/libgfrpc.so.0(rpc_clnt_reconnect+0x136)[0x7f10d438121b] (--> /lib/libglusterfs.so.0(+0x3e4e9)[0x7f10d4e894e9] ))))) 0-vol0-client-10: connect () called on transport already connected

It is also weird that 0-vol0-client-5 shows a huge RPC ping latency (29 seconds should not be possible, because a simple ICMP ping to that server takes less than 0.1 seconds):

root@storage0-cluster0:~# grep ping /var/log/glusterfs/glfsheal-vol0.log | grep laten
[2018-09-14 05:13:16.008866] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-0: Ping latency is 0ms
[2018-09-14 05:13:16.016669] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-1: Ping latency is 0ms
[2018-09-14 05:13:16.021769] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-2: Ping latency is 0ms
[2018-09-14 05:13:16.033761] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-3: Ping latency is 4ms
[2018-09-14 05:13:16.101962] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-4: Ping latency is 62ms
[2018-09-14 05:13:46.105502] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-5: Ping latency is 29996ms
[2018-09-14 05:20:13.895285] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-0: Ping latency is 0ms
[2018-09-14 05:20:13.901476] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-1: Ping latency is 0ms
[2018-09-14 05:20:13.908373] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-2: Ping latency is 0ms
[2018-09-14 05:20:13.918782] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-3: Ping latency is 4ms
[2018-09-14 05:20:13.987162] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-4: Ping latency is 62ms
[2018-09-14 05:20:43.992528] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-5: Ping latency is 29998ms
[2018-09-14 05:25:11.983259] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-0: Ping latency is 0ms
[2018-09-14 05:25:11.991199] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-1: Ping latency is 0ms
[2018-09-14 05:25:11.996933] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-2: Ping latency is 0ms
[2018-09-14 05:25:12.007867] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-3: Ping latency is 4ms
[2018-09-14 05:25:12.076057] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-4: Ping latency is 62ms

I checked vol0-client-5; there is nothing special compared to the other bricks. CPU is not high and the disk has no problem.
There are no warnings or errors in /var/log/gluster/brick/data-brk01-src.log either. After I turned on TRACE-level logging on the brick, the output was too much to analyze and showed no apparent difference from the other bricks.

I hope someone has some idea about this; even pointing me in a direction to look would help.

Thanks very much
Ning Adam
2018-Sep-14 15:44 UTC
[Gluster-users] gluster volume heal info hang on 3.10.6 & 3.12.11
I have made some progress.

Using gluster 3.12.1, "glfsheal vol0" is fine. It tells me some files need to be healed:

Brick storage0-cluster0:/data/brk02/src
<gfid:88dfb19a-92dd-4abd-bb15-4931e6e47cd9>
<gfid:88dfb19a-92dd-4abd-bb15-4931e6e47cd9>/0001
Status: Connected
Number of entries: 2

The hanging problem was introduced in this commit:

commit 94faf8c4a1f5a020593ca65f44d68e1ba2f632eb
Author: karthik-us <ksubrahm at redhat.com>
Date:   Wed Sep 20 18:30:21 2017 +0530

    cluster/afr: Sending subvol up/down events when subvol comes up or goes down

    > BUG: 1493539
    (cherry picked from commit 3bbb4fe4b33dc3a3ed068ed2284077f2a4d8265a)
    Change-Id: I6580351b245d5f868e9ddc6a4eb4dd6afa3bb6ec
    BUG: 1492066
    Signed-off-by: karthik-us <ksubrahm at redhat.com>

But when I use the HEAD of the release-3.12 branch and do "git revert 94faf8c4", "glfsheal vol0" still hangs, so this is not a simple problem.

I found the bug is related to "client.event-threads". Because I previously had "client.event-threads: 6", there were ping latency log entries only for vol0-client-0 through vol0-client-5, ending with

[rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-5: Ping latency is 29998ms

and no ping latency was logged for the other bricks.

After I ran "gluster volume set vol0 client.event-threads 24" and installed 3.12.12 (since I am debugging on the 8x3 cluster), "glfsheal vol0" finished in 30 seconds (that time is still acceptable), and there was:

[2018-09-14 21:10:17.817764] D [rpc-clnt-ping.c:211:rpc_clnt_ping_cbk] 0-vol0-client-23: Ping latency is 29876ms

So this leads to the conclusion that a connection to a brick might be holding a lock in a client event thread.

My gluster volume info:

Options Reconfigured:
cluster.locking-scheme: full
performance.open-behind: on
cluster.lookup-unhashed: auto
transport.address-family: inet
cluster.lookup-optimize: on
diagnostics.brick-log-level: TRACE
performance.readdir-ahead: True
performance.md-cache-timeout: 0
cluster.server-quorum-type: none
performance.client-io-threads: True
performance.read-ahead: False
features.cache-invalidation: False
network.ping-timeout: 120
cluster.quorum-type: auto
nfs.disable: True
performance.io-cache: False
server.event-threads: 4
cluster.rebal-throttle: lazy
transport.keepalive: True
performance.write-behind: True
cluster.self-heal-daemon: enable
client.event-threads: 6
cluster.eager-lock: on
diagnostics.client-log-level: INFO
features.trash: off
features.selinux: off

cc ksubrahm at redhat.com, in case you have any clue about this problem.
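One more thought: for a standalone gfapi test client (like the sketch in my first mail), it should be possible to raise the client event threads for just that process via glfs_set_xlator_option() instead of changing the volume-wide option. The "*-client-*" pattern and the "event-threads" key below are my assumption of how client.event-threads maps onto the per-brick client xlators, so treat this as an untested sketch:

/* Untested sketch: request more event threads on the protocol/client
 * xlators of this one gfapi process, assuming "client.event-threads"
 * corresponds to the "event-threads" option of the xlators named like
 * "vol0-client-<N>" and that the xlator argument is matched as a pattern.
 * Must be set before glfs_init(). */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int main (void)
{
        glfs_t *fs = glfs_new ("vol0");          /* placeholder volume name */
        if (!fs)
                return 1;

        glfs_set_volfile_server (fs, "tcp", "storage0-cluster0", 24007);

        /* ask for 24 event threads on every matching client xlator */
        glfs_set_xlator_option (fs, "*-client-*", "event-threads", "24");

        if (glfs_init (fs) != 0) {
                fprintf (stderr, "glfs_init failed\n");
                return 1;
        }

        /* ... run the test, e.g. glfs_chdir (fs, "/") ... */

        glfs_fini (fs);
        return 0;
}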