Patrick-
Specifically re:> Thanks again for your advice, I've left it for a while but
unfortunately it's still just as slow and causing more problems for our
operations now. I will need to try and take some steps to at least bring
performance back to normal while continuing to investigate the issue longer
term. I can definitely see one node with heavier CPU than the other, almost
double, which I am OK with, but I think the heal process is going to take
forever, trying to check the "gluster volume heal info" shows
thousands and thousands of files which may need healing. I have no idea how many
in total; the command is still running after hours, so I am not sure what has
gone so wrong to cause this.
> ...
> I have no idea how long the healing is going to take on this cluster, we
have around 560TB of data on here, but I don't think I can wait that long to
try and restore performance to normal.
You're in a bind, I know, but it's just going to take some time to recover. You
have a lot of data, and even at the best speeds your disks and networks can
muster, it's going to take a while. Until your cluster is fully healed, anything
else you try may not have the full effect it would on a fully operational
cluster. Your predecessor may have made things worse by not having proper posix
attributes on the ZFS file system. You may have made things worse by killing
brick processes in your distributed-replicated setup, creating an additional
need for healing and possibly compounding the overall performance issues. I'm
not trying to blame you or make you feel bad, but I do want to point out that
there's a problem here, and there is unlikely to be a silver bullet that will
resolve the issue instantly. You're going to have to give it time to get back
into a "normal" condition, which seems to be what your setup was configured
and tested for in the first place.
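If the full "gluster volume heal info" never finishes, a quicker way to gauge the backlog might be the per-brick heal counts, which just print totals instead of listing every file; something like this (gvAA01 assumed as the volume name), run a few times over an hour or so to see whether the numbers are actually dropping:
# gluster volume heal gvAA01 statistics heal-count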
Those things said, rather than trying to move things from this cluster to
different storage, what about having your VMs mount different storage in the
first place and move the write load off of this cluster while it recovers?
Looking at the profile you posted for Strahil, your bricks are spending a lot of
time doing LOOKUPs, and some are slower than others by a significant margin. If
you haven't already, check the zfs pools on those and make sure they don't have
any failed disks that might be slowing them down. Consider whether you can speed
them up with a ZIL or SLOG if they are spinning disks (although your previous
server descriptions sound like you don't need a SLOG, a ZIL may help if they are HDDs).
Just saw your additional comments that one server is faster than the other;
it's possible that it's got the actual data and the other one is doing heals
every time it gets accessed, or it's just got fuller and slower volumes. It may
make sense to try forcing all your VM mounts to the faster server for a while,
even if it's the one with higher load (serving will get preference over healing),
but don't push the shd-max-threads too high; they can squash performance. Given
it's a dispersed volume, make sure you've got disperse.shd-max-threads at 4 or
8, and raise disperse.shd-wait-qlength to 4096 or so.
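For reference, those would be set roughly like this (gvAA01 assumed as the volume name; the disperse.* options only take effect on a dispersed volume, while cluster.shd-max-threads is the replicate equivalent you already have set):
# gluster volume set gvAA01 disperse.shd-max-threads 4
# gluster volume set gvAA01 disperse.shd-wait-qlength 4096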
You're getting into things best tested with everything working, but desperate
times call for accelerated testing, right?
You could experiment with different values of performance.io-thread-count;
try 48. But if your CPU load is already near max, you're getting everything you
can out of your CPU already, so don't spend too much time on it.
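Something like this, if you do want to try it (gvAA01 assumed; a volume reset puts the option back to its default if it doesn't help):
# gluster volume set gvAA01 performance.io-thread-count 48
# gluster volume reset gvAA01 performance.io-thread-count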
Check out
https://github.com/gluster/glusterfs/blob/release-3.11/extras/group-nl-cache
and try applying these to your gluster volume. Without knowing more about your
workload, these may help if you're doing a lot of directory listing and file
lookups or tests for the (non)existence of a file from your VMs. If those help,
search the mailing list for info on the mount option 'negative_cache=1' and a
thread titled '[Gluster-users] Gluster native mount is really slow compared
to nfs'; it may have some client side mount options that could give you further
benefits.
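If that group file isn't already packaged with your glusterd, you can set the options it contains by hand; a sketch of what I believe the release-3.11 file amounts to (gvAA01 assumed, double-check the file itself before applying):
# gluster volume set gvAA01 features.cache-invalidation on
# gluster volume set gvAA01 features.cache-invalidation-timeout 600
# gluster volume set gvAA01 performance.nl-cache on
# gluster volume set gvAA01 performance.nl-cache-timeout 600
# gluster volume set gvAA01 network.inode-lru-limit 200000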
Have a look at
https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#tuning-options;
cluster.data-self-heal-algorithm full may help things heal faster for you.
performance.flush-behind & related may improve write response to the
clients, but use caution unless you have UPSs & battery-backed raids, etc. If
you have stats on network traffic on/between your two "real" node servers, you
can use that as a proxy value for healing performance.
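As commands, roughly (gvAA01 assumed, and keep the power-protection caveat in mind for flush-behind):
# gluster volume set gvAA01 cluster.data-self-heal-algorithm full
# gluster volume set gvAA01 performance.flush-behind on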
I looked up the performance.stat-prefetch bug for you, it was fixed back in 3.8,
so it should be safe to enable on your 3.12.x system even with servers at .15
& .14.
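That would be something like (gvAA01 assumed):
# gluster volume set gvAA01 performance.stat-prefetch on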
You'll probably have to wait for devs to get anything else out of those logs,
but make sure your servers can all see each other (gluster peer status,
everything should be "Peer in Cluster (Connected)" on all servers), and all 3
see all the bricks in the "gluster vol status". Maybe check for split-brain
files on those you keep seeing in the logs?
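For the record, the quick checks would be along the lines of:
# gluster peer status
# gluster volume status gvAA01
# gluster volume heal gvAA01 info split-brain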
Good luck, have patience, and remember (& remind others) that things are not
in their normal state at this moment, and look for things outside of the gluster
server cluster that might help
(https://joejulian.name/post/optimizing-web-performance-with-glusterfs/) you get
through the healing as well.
-Darrell
> On Apr 21, 2019, at 4:41 AM, Patrick Rennie <patrickmrennie at
gmail.com> wrote:
>
> Another small update from me, I have been keeping an eye on the
glustershd.log file to see what is going on and I keep seeing the same file
names come up in there every 10 minutes, but not a lot of other activity. Logs
below.
> How can I be sure my heal is progressing through the files which actually
need to be healed? I thought it would show up in these logs.
> I also increased the "cluster.shd-max-threads" from 4 to 8 to try
and speed things up too.
>
> Any ideas here?
>
> Thanks,
>
> - Patrick
>
> On 01-B
> -------
> [2019-04-21 09:12:54.575689] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6:
performing metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904
> [2019-04-21 09:12:54.733601] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed
metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904. sources=[0] 2
sinks=1
> [2019-04-21 09:13:12.028509] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
> [2019-04-21 09:13:12.047470] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
>
> [2019-04-21 09:23:13.044377] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
> [2019-04-21 09:23:13.051479] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
>
> [2019-04-21 09:33:07.400369] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed
data selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2 sinks=1
> [2019-04-21 09:33:11.825449] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6:
performing metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa
> [2019-04-21 09:33:14.029837] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
> [2019-04-21 09:33:14.037436] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
> [2019-04-21 09:33:23.913882] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed
metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2
sinks=1
> [2019-04-21 09:33:43.874201] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6:
performing metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1
> [2019-04-21 09:34:02.273898] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed
metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1. sources=[0] 2
sinks=1
> [2019-04-21 09:35:12.282045] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed
data selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. sources=[0] 2 sinks=1
> [2019-04-21 09:35:15.146252] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6:
performing metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885
> [2019-04-21 09:35:15.254538] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed
metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. sources=[0] 2
sinks=1
> [2019-04-21 09:35:22.900803] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed
data selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2 sinks=1
> [2019-04-21 09:35:27.150963] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6:
performing metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45
> [2019-04-21 09:35:29.186295] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed
metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2
sinks=1
> [2019-04-21 09:35:35.967451] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed
data selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2 sinks=1
> [2019-04-21 09:35:40.733444] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6:
performing metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9
> [2019-04-21 09:35:58.707593] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed
metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2
sinks=1
> [2019-04-21 09:36:25.554260] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed
data selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. sources=[0] 2 sinks=1
> [2019-04-21 09:36:26.031422] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6:
performing metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d
> [2019-04-21 09:36:26.083982] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed
metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. sources=[0] 2
sinks=1
>
> On 02-B
> -------
> [2019-04-21 09:03:15.815250] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
> [2019-04-21 09:03:15.863153] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
> [2019-04-21 09:03:15.867432] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
> [2019-04-21 09:03:15.875134] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
> [2019-04-21 09:03:39.020198] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
> [2019-04-21 09:03:39.027345] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
>
> [2019-04-21 09:13:18.524874] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
> [2019-04-21 09:13:20.070172] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
> [2019-04-21 09:13:20.074977] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
> [2019-04-21 09:13:20.080827] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
> [2019-04-21 09:13:40.015763] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
> [2019-04-21 09:13:40.021805] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
>
> [2019-04-21 09:23:21.991032] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
> [2019-04-21 09:23:22.054565] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
> [2019-04-21 09:23:22.059225] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
> [2019-04-21 09:23:22.066266] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
> [2019-04-21 09:23:41.129962] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
> [2019-04-21 09:23:41.135919] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
>
> [2019-04-21 09:33:24.015223] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
> [2019-04-21 09:33:24.069686] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
> [2019-04-21 09:33:24.074341] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
> [2019-04-21 09:33:24.080065] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
> [2019-04-21 09:33:42.099515] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
> [2019-04-21 09:33:42.107481] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
>
>
> On Sun, Apr 21, 2019 at 3:55 PM Patrick Rennie <patrickmrennie at gmail.com> wrote:
> Just another small update, I'm continuing to watch my brick logs and I
just saw these errors come up in the recent events too. I am going to continue
to post any errors I see in the hope of finding the right one to try and fix..
> This is from the logs on brick1, seems to be occurring on both nodes on
brick1, although at different times. I'm not sure what this means, can
anyone shed any light?
> I guess I am looking for some kind of specific error which may indicate
something is broken or stuck and locking up and causing the extreme latency
I'm seeing in the cluster.
>
> [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS 3.3,
ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a)
[0x7f3b3e93158a]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45)
[0x7f3b3e4c5d45]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
[0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS 3.3,
ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
[0x7f3b3e9318fa]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
[0x7f3b3e4c5f35]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
[0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS 3.3,
ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
[0x7f3b3e9318fa]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
[0x7f3b3e4c5f35]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
[0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS 3.3,
ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
[0x7f3b3e9318fa]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
[0x7f3b3e4c5f35]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
[0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS 3.3,
ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
[0x7f3b3e9318fa]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
[0x7f3b3e4c5f35]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
[0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS 3.3,
ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
[0x7f3b3e9318fa]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
[0x7f3b3e4c5f35]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
[0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS 3.3,
ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
[0x7f3b3e9318fa]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
[0x7f3b3e4c5f35]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
[0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS 3.3,
ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
[0x7f3b3e9318fa]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
[0x7f3b3e4c5f35]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
[0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS 3.3,
ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
[0x7f3b3e9318fa]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
[0x7f3b3e4c5f35]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
[0x7f3b3e4b72cd] ) 0-: Reply submission failed
>
> Thanks again,
>
> -Patrick
>
> On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie <patrickmrennie at gmail.com> wrote:
> Hi Darrell,
>
> Thanks again for your advice, I've left it for a while but
unfortunately it's still just as slow and causing more problems for our
operations now. I will need to try and take some steps to at least bring
performance back to normal while continuing to investigate the issue longer
term. I can definitely see one node with heavier CPU than the other, almost
double, which I am OK with, but I think the heal process is going to take
forever, trying to check the "gluster volume heal info" shows
thousands and thousands of files which may need healing. I have no idea how many
in total; the command is still running after hours, so I am not sure what has
gone so wrong to cause this.
>
> I've checked cluster.op-version and cluster.max-op-version and it looks
like I'm on the latest version there.
>
> I have no idea how long the healing is going to take on this cluster, we
have around 560TB of data on here, but I don't think I can wait that long to
try and restore performance to normal.
>
> Can anyone think of anything else I can try in the meantime to work out
what's causing the extreme latency?
>
> I've been going through the cluster client logs of some of our VMs and
on some of our FTP servers I found this in the cluster mount log, but I am not
seeing it on any of our other servers, just our FTP servers.
>
> [2019-04-21 07:16:19.925388] E [MSGID: 101046]
[dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null
> [2019-04-21 07:19:43.413834] W [MSGID: 114031]
[client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote
operation failed [No such file or directory]
> [2019-04-21 07:19:43.414153] W [MSGID: 114031]
[client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote
operation failed [No such file or directory]
> [2019-04-21 07:23:33.154717] E [MSGID: 101046]
[dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null
> [2019-04-21 07:33:24.943913] E [MSGID: 101046]
[dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null
>
> Any ideas what this could mean? I am basically just grasping at straws
here.
>
> I am going to hold off on the version upgrade until I know there are no
files which need healing, which could be a while, from some reading I've
done there shouldn't be any issues with this as both are on v3.12.x
>
> I've freed up a small amount of space, but I still need to work on
this further.
>
> I've read of a command "find .glusterfs -type f -links -2 -exec rm
{} \;" which could be run on each brick and it would potentially clean up
any files which were deleted straight from the bricks, but not via the client, I
have a feeling this could help me free up about 5-10TB per brick from what
I've been told about the history of this cluster. Can anyone confirm if this
is actually safe to run?
>
> At this stage, I'm open to any suggestions as to how to proceed, thanks
again for any advice.
>
> Cheers,
>
> - Patrick
>
> On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic <budic at onholyground.com> wrote:
> Patrick,
>
> Sounds like progress. Be aware that gluster is expected to max out the CPUs
on at least one of your servers while healing. This is normal and won't
adversely affect overall performance (any more than having bricks in need of
healing, at any rate) unless you're overdoing it. shd threads <= 4 should not
do that on your hardware. Other tunings may have also increased overall
performance, so you may see higher CPU than previously anyway. I'd recommend
upping those thread counts and letting it heal as fast as possible, especially
if these are dedicated Gluster storage servers (i.e. not also running VMs, etc).
You should see "normal" CPU use once heals are completed. I see ~15-30% overall
normally, 95-98% while healing (x my 20 cores). It's also likely to be different
between your servers: in a pure replica, one tends to max and one tends to be a
little higher; in a distributed-replica, I'd expect more than one to run harder
while healing.
>
> Keep the differences between doing an ls on a brick and doing an ls on a
gluster mount in mind. When you do an ls on a gluster volume, it isn't just doing
an ls on one brick, it's effectively doing it on ALL of your bricks, and they all
have to return data before the ls succeeds. In a distributed volume, it's
figuring out where on each volume things live and getting the stat() from each
to assemble the whole thing. And if things are in need of healing, it will take
even longer to decide which version is current and use it (shd triggers a heal
anytime it encounters this). Any of these things being slow slows down the
overall response.
>
> At this point, I'd get some sleep too, and let your cluster heal while you
do. I'd really want it fully healed before I did any updates anyway, so let it
use CPU and get itself sorted out. Expect it to do a round of healing after you
upgrade each machine too; this is normal, so don't let the CPU spike surprise
you, it's just catching up from the downtime incurred by the update and/or
reboot if you did one.
>
> That reminds me, check your gluster cluster.op-version and
cluster.max-op-version (gluster vol get all all | grep op-version). If
op-version isn't at the max-op-version, set it to it so you're taking advantage
of the latest features available to your version.
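>
> In command form, something like this (the exact value to set is whatever max-op-version reports):
> # gluster volume get all cluster.op-version
> # gluster volume get all cluster.max-op-version
> # gluster volume set all cluster.op-version <value reported by max-op-version>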
>
> -Darrell
>
>> On Apr 20, 2019, at 11:54 AM, Patrick Rennie <patrickmrennie at gmail.com> wrote:
>>
>> Hi Darrell,
>>
>> Thanks again for your advice, I've applied the acltype=posixacl on
my zpools and I think that has reduced some of the noise from my brick logs.
>> I also bumped up some of the thread counts you suggested but my CPU
load skyrocketed, so I dropped it back down to something slightly lower, but
still higher than it was before, and will see how that goes for a while.
>>
>> Although low space is a definite issue, if I run an ls anywhere on my
bricks directly it's instant, <1 second, and still takes several minutes
via gluster, so there is still a problem in my gluster configuration somewhere.
We don't have any snapshots, but I am trying to work out if any data on
there is safe to delete, or if there is any way I can safely find and delete
data which has been removed directly from the bricks in the past. I also have
lz4 compression already enabled on each zpool which does help a bit, we get
between 1.05 and 1.08x compression on this data.
>> I've tried to go through each client and checked its cluster
mount logs and also my brick logs, looking for errors; so far nothing is
jumping out at me, but there are some warnings and errors here and there, I am
trying to work out what they mean.
>>
>> It's already 1 am here and unfortunately, I'm still awake
working on this issue, but I think that I will have to leave the version
upgrades until tomorrow.
>>
>> Thanks again for your advice so far. If anyone has any ideas on where I
can look for errors other than brick logs or the cluster mount logs to help
resolve this issue, it would be much appreciated.
>>
>> Cheers,
>>
>> - Patrick
>>
>> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic <budic at onholyground.com> wrote:
>> See inline:
>>
>>> On Apr 20, 2019, at 10:09 AM, Patrick Rennie <patrickmrennie at gmail.com> wrote:
>>>
>>> Hi Darrell,
>>>
>>> Thanks for your reply, this issue seems to be getting worse over
the last few days, really has me tearing my hair out. I will do as you have
suggested and get started on upgrading from 3.12.14 to 3.12.15.
>>> I've checked the zfs properties and all bricks have
"xattr=sa" set, but none of them has "acltype=posixacl" set,
currently the acltype property shows "off", if I make these changes
will it apply retroactively to the existing data? I'm unfamiliar with what
this will change so I may need to look into that before I proceed.
>>
>> It is safe to apply that now, any new set/get calls will then use it if
new posixacls exist, and use older if not. ZFS is good that way. It should clear
up your posix_acl and posix errors over time.
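>>
>> A minimal sketch of those zfs changes, assuming dataset names like pool/brick1 (substitute your real brick datasets):
>> # zfs set xattr=sa pool/brick1
>> # zfs set acltype=posixacl pool/brick1
>> # zfs get xattr,acltype pool/brick1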
>>
>>> I understand performance is going to slow down as the bricks get
full, I am currently trying to free space and migrate data to some newer
storage, I have fresh several hundred TB storage I just setup recently but with
these performance issues it's really slow. I also believe there is
significant data which has been deleted directly from the bricks in the past, so
if I can reclaim this space in a safe manner then I will have at least around
10-15% free space.
>>
>> Full ZFS volumes will have a much larger impact on performance than
you'd think; I'd prioritize this. If you have been taking zfs snapshots,
consider deleting them to get the overall volume free space back up. And just to
be sure it's been said, delete from within the mounted volumes, don't delete
directly from the bricks (gluster will just try and heal it later, compounding
your issues). Does not apply to deleting other data from the ZFS volume if it's
not part of the brick directory, of course.
>>
>>> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so
generally they have plenty of resources available, currently only using around
330/512GB of memory.
>>>
>>> I will look into what your suggested settings will change, and then
will probably go ahead with your recommendations; for our specs as stated above,
what would you suggest for performance.io-thread-count?
>>
>> I run single 2630v4s on my servers, which have a smaller storage
footprint than yours. I'd go with 32 for performance.io-thread-count. I'd try 4
for the shd thread settings on that gear. Your memory use sounds fine, so no
worries there.
>>
>>> Our workload is nothing too extreme, we have a few VMs which write
backup data to this storage nightly for our clients, our VMs don't live on
this cluster, but just write to it.
>>
>> If they are writing compressible data, you'll get immediate benefit by
setting compression=lz4 on your ZFS volumes. It won't help any old data, of
course, but it will compress new data going forward. This is another one that's
safe to enable on the fly.
>>
>>> I've been going through all of the logs I can; below are some
slightly sanitized errors I've come across, but I'm not sure what to
make of them. The main error I am seeing is the first one below, across several
of my bricks, but possibly only for specific folders on the cluster, I'm not
100% about that yet though.
>>>
>>> [2019-04-20 05:56:59.512649] E [MSGID: 113001]
[posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on
/brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not
supported]
>>> [2019-04-20 05:59:06.084333] E [MSGID: 113001]
[posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on
/brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not
supported]
>>> [2019-04-20 05:59:43.289030] E [MSGID: 113001]
[posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on
/brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not
supported]
>>> [2019-04-20 05:59:50.582257] E [MSGID: 113001]
[posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on
/brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not
supported]
>>> [2019-04-20 06:01:42.501701] E [MSGID: 113001]
[posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on
/brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not
supported]
>>> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr]
0-gvAA01-posix: Extended attributes not supported (try remounting brick with
'user_xattr' flag)
>>>
>>>
>>> [2019-04-20 13:12:36.131856] E [MSGID: 113002]
[posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for
/xxxxxxxxxxxxxxxxxxxx [Invalid argument]
>>> [2019-04-20 13:12:36.131959] E [MSGID: 113002]
[posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for
/brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available]
>>> [2019-04-20 13:12:36.132016] E [MSGID: 115050]
[server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP
/xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud
Backup_clone1.vbm_62906_tmp), client:
00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator:
gvAA01-posix [No data available]
>>> [2019-04-20 13:12:38.093719] E [MSGID: 115050]
[server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP
/xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud
Backup_clone1.vbm_62906_tmp), client:
00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator:
gvAA01-posix [No data available]
>>> [2019-04-20 13:12:38.093660] E [MSGID: 113002]
[posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for
/xxxxxxxxxxxxxxxxxxxx [Invalid argument]
>>> [2019-04-20 13:12:38.093696] E [MSGID: 113002]
[posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for
/brick2/xxxxxxxxxxxxxxxxxxxx [No data available]
>>>
>>
>> posixacls should clear those up, as mentioned.
>>
>>>
>>> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock]
0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, by
980fdbbd367f0000 on 0x7fc4f0161440
>>> [2019-04-20 14:25:59.654668] E [MSGID: 115053]
[server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: INODELK
/xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), client:
cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4,
error-xlator: gvAA01-locks [Invalid argument]
>>>
>>>
>>> [2019-04-20 13:35:07.495495] E
[rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message
(XID: 0x247c644, Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to
rpc-transport (tcp.gvAA01-server)
>>> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a)
[0x7ff4ae6f796a]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8)
[0x7ff4ae2a96e8]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d)
[0x7ff4ae28528d] ) 0-: Reply submission failed
>>>
>>
>> Fix the posix acls and see if these clear up over time as well; I'm
unclear on what the overall effect of running without the posix acls will be to
total gluster health. Your biggest problem sounds like you need to free up space
on the volumes and get the overall volume health back up to par and see if that
doesn't resolve the symptoms you're seeing.
>>
>>
>>>
>>> Thank you again for your assistance. It is greatly appreciated.
>>>
>>> - Patrick
>>>
>>>
>>>
>>> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic <budic at onholyground.com> wrote:
>>> Patrick,
>>>
>>> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15.
You also mention ZFS, and that error you show makes me think you need to check
to be sure you have "xattr=sa" and "acltype=posixacl" set on your ZFS volumes.
>>>
>>> You also observed your bricks are crossing the 95% full line; ZFS
performance will degrade significantly the closer you get to full. In my
experience, this starts somewhere between 10% and 5% free space remaining, so
you're in that realm.
>>>
>>> How's your free memory on the servers doing? Do you have your zfs
arc cache limited to something less than all the RAM? It shares pretty well, but
I've encountered situations where other things won't try and take ram back
properly if they think it's in use, so ZFS never gets the opportunity to give it
up.
>>>
>>> Since your volume is a disperse-replica, you might try tuning
disperse.shd-max-threads (default is 1); I'd try it at 2, 4, or even more if the
CPUs are beefy enough. And setting server.event-threads to 4 and
client.event-threads to 8 has proven helpful in many cases. After you get
upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I
don't know if it matters, but I'd also recommend resetting
performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or also
setting performance.io-thread-count to 32 if
those have beefy CPUs.
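>>>
>>> As concrete commands, something along these lines (gvAA01 assumed, values per the suggestions above; stat-prefetch only after the 3.12.15 upgrade, and if the volume turns out to be plain replicate, cluster.shd-max-threads is the equivalent of the disperse option):
>>> # gluster volume set gvAA01 disperse.shd-max-threads 2
>>> # gluster volume set gvAA01 server.event-threads 4
>>> # gluster volume set gvAA01 client.event-threads 8
>>> # gluster volume set gvAA01 performance.least-prio-threads 1
>>> # gluster volume set gvAA01 performance.io-thread-count 32
>>> # gluster volume set gvAA01 performance.stat-prefetch on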
>>>
>>> Beyond those general ideas, more info about your hardware (CPU and
RAM) and workload (VMs, direct storage for web servers or enders, etc) may net
you some more ideas. Then you're going to have to do more digging into brick
logs looking for errors and/or warnings to see what's going on.
>>>
>>> -Darrell
>>>
>>>
>>>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie <patrickmrennie at gmail.com> wrote:
>>>>
>>>> Hello Gluster Users,
>>>>
>>>> I am hoping someone can help me with resolving an ongoing issue
I've been having, I'm new to mailing lists so forgive me if I have
gotten anything wrong. We have noticed our performance deteriorating over the
last few weeks, easily measured by trying to do an ls on one of our top-level
folders, and timing it, which usually would take 2-5 seconds, and now takes up
to 20 minutes, which obviously renders our cluster basically unusable. This has
been intermittent in the past but is now almost constant and I am not sure how
to work out the exact cause. We have noticed some errors in the brick logs, and
have noticed that if we kill the right brick process, performance instantly
returns back to normal, this is not always the same brick, but it indicates to
me something in the brick processes or background tasks may be causing extreme
latency. Due to this ability to fix it by killing the right brick process off, I
think it's a specific file, or folder, or operation which may be hanging and
causing the increased latency, but I am not sure how to work it out. One last
thing to add is that our bricks are getting quite full (~95% full), we are
trying to migrate data off to new storage but that is going slowly, not helped
by this issue. I am currently trying to run a full heal as there appear to be
many files needing healing, and I have all brick processes running so they have
an opportunity to heal, but this means performance is very poor. It currently
takes over 15-20 minutes to do an ls of one of our top-level folders, which just
contains 60-80 other folders, this should take 2-5 seconds. This is all being
checked by FUSE mount locally on the storage node itself, but it is the same for
other clients and VMs accessing the cluster. Initially, it seemed our NFS mounts
were not affected and operated at normal speed, but testing over the last day
has shown that our NFS clients are also extremely slow, so it doesn't seem
specific to FUSE as I first thought it might be.
>>>>
>>>> I am not sure how to proceed from here, I am fairly new to
gluster having inherited this setup from my predecessor and trying to keep it
going. I have included some info below to try and help with diagnosis, please
let me know if any further info would be helpful. I would really appreciate any
advice on what I could try to work out the cause. Thank you in advance for
reading this, and any suggestions you might be able to offer.
>>>>
>>>> - Patrick
>>>>
>>>> This is an example of the main error I see in our brick logs,
there have been others, I can post them when I see them again too:
>>>> [2019-04-20 04:54:43.055680] E [MSGID: 113001]
[posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on
/brick1/<filename> library: system.posix_acl_default [Operation not
supported]
>>>> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr]
0-gvAA01-posix: Extended attributes not supported (try remounting brick with
'user_xattr' flag)
>>>>
>>>> Our setup consists of 2 storage nodes and an arbiter node. I
have noticed our nodes are on slightly different versions, I'm not sure if
this could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2
pools - total capacity is around 560TB.
>>>> We have bonded 10gbps NICS on each node, and I have tested
bandwidth with iperf and found that it's what would be expected from this
config.
>>>> Individual brick performance seems ok, I've tested several
bricks using dd and can write a 10GB files at 1.7GB/s.
>>>>
>>>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000
>>>> 10000+0 records in
>>>> 10000+0 records out
>>>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s
>>>>
>>>> Node 1:
>>>> # glusterfs --version
>>>> glusterfs 3.12.15
>>>>
>>>> Node 2:
>>>> # glusterfs --version
>>>> glusterfs 3.12.14
>>>>
>>>> Arbiter:
>>>> # glusterfs --version
>>>> glusterfs 3.12.14
>>>>
>>>> Here is our gluster volume status:
>>>>
>>>> # gluster volume status
>>>> Status of volume: gvAA01
>>>> Gluster process                            TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick 01-B:/brick1/gvAA01/brick            49152     0          Y       7219
>>>> Brick 02-B:/brick1/gvAA01/brick            49152     0          Y       21845
>>>> Brick 00-A:/arbiterAA01/gvAA01/brick1      49152     0          Y       6931
>>>> Brick 01-B:/brick2/gvAA01/brick            49153     0          Y       7239
>>>> Brick 02-B:/brick2/gvAA01/brick            49153     0          Y       9916
>>>> Brick 00-A:/arbiterAA01/gvAA01/brick2      49153     0          Y       6939
>>>> Brick 01-B:/brick3/gvAA01/brick            49154     0          Y       7235
>>>> Brick 02-B:/brick3/gvAA01/brick            49154     0          Y       21858
>>>> Brick 00-A:/arbiterAA01/gvAA01/brick3      49154     0          Y       6947
>>>> Brick 01-B:/brick4/gvAA01/brick            49155     0          Y       31840
>>>> Brick 02-B:/brick4/gvAA01/brick            49155     0          Y       9933
>>>> Brick 00-A:/arbiterAA01/gvAA01/brick4      49155     0          Y       6956
>>>> Brick 01-B:/brick5/gvAA01/brick            49156     0          Y       7233
>>>> Brick 02-B:/brick5/gvAA01/brick            49156     0          Y       9942
>>>> Brick 00-A:/arbiterAA01/gvAA01/brick5      49156     0          Y       6964
>>>> Brick 01-B:/brick6/gvAA01/brick            49157     0          Y       7234
>>>> Brick 02-B:/brick6/gvAA01/brick            49157     0          Y       9952
>>>> Brick 00-A:/arbiterAA01/gvAA01/brick6      49157     0          Y       6974
>>>> Brick 01-B:/brick7/gvAA01/brick            49158     0          Y       7248
>>>> Brick 02-B:/brick7/gvAA01/brick            49158     0          Y       9960
>>>> Brick 00-A:/arbiterAA01/gvAA01/brick7      49158     0          Y       6984
>>>> Brick 01-B:/brick8/gvAA01/brick            49159     0          Y       7253
>>>> Brick 02-B:/brick8/gvAA01/brick            49159     0          Y       9970
>>>> Brick 00-A:/arbiterAA01/gvAA01/brick8      49159     0          Y       6993
>>>> Brick 01-B:/brick9/gvAA01/brick            49160     0          Y       7245
>>>> Brick 02-B:/brick9/gvAA01/brick            49160     0          Y       9984
>>>> Brick 00-A:/arbiterAA01/gvAA01/brick9      49160     0          Y       7001
>>>> NFS Server on localhost                    2049      0          Y       17276
>>>> Self-heal Daemon on localhost              N/A       N/A        Y       25245
>>>> NFS Server on 02-B                         2049      0          Y       9089
>>>> Self-heal Daemon on 02-B                   N/A       N/A        Y       17838
>>>> NFS Server on 00-a                         2049      0          Y       15660
>>>> Self-heal Daemon on 00-a                   N/A       N/A        Y       16218
>>>>
>>>> Task Status of Volume gvAA01
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>> And gluster volume info:
>>>>
>>>> # gluster volume info
>>>>
>>>> Volume Name: gvAA01
>>>> Type: Distributed-Replicate
>>>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 9 x (2 + 1) = 27
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: 01-B:/brick1/gvAA01/brick
>>>> Brick2: 02-B:/brick1/gvAA01/brick
>>>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter)
>>>> Brick4: 01-B:/brick2/gvAA01/brick
>>>> Brick5: 02-B:/brick2/gvAA01/brick
>>>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter)
>>>> Brick7: 01-B:/brick3/gvAA01/brick
>>>> Brick8: 02-B:/brick3/gvAA01/brick
>>>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter)
>>>> Brick10: 01-B:/brick4/gvAA01/brick
>>>> Brick11: 02-B:/brick4/gvAA01/brick
>>>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter)
>>>> Brick13: 01-B:/brick5/gvAA01/brick
>>>> Brick14: 02-B:/brick5/gvAA01/brick
>>>> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter)
>>>> Brick16: 01-B:/brick6/gvAA01/brick
>>>> Brick17: 02-B:/brick6/gvAA01/brick
>>>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter)
>>>> Brick19: 01-B:/brick7/gvAA01/brick
>>>> Brick20: 02-B:/brick7/gvAA01/brick
>>>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter)
>>>> Brick22: 01-B:/brick8/gvAA01/brick
>>>> Brick23: 02-B:/brick8/gvAA01/brick
>>>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter)
>>>> Brick25: 01-B:/brick9/gvAA01/brick
>>>> Brick26: 02-B:/brick9/gvAA01/brick
>>>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter)
>>>> Options Reconfigured:
>>>> cluster.shd-max-threads: 4
>>>> performance.least-prio-threads: 16
>>>> cluster.readdir-optimize: on
>>>> performance.quick-read: off
>>>> performance.stat-prefetch: off
>>>> cluster.data-self-heal: on
>>>> cluster.lookup-unhashed: auto
>>>> cluster.lookup-optimize: on
>>>> cluster.favorite-child-policy: mtime
>>>> server.allow-insecure: on
>>>> transport.address-family: inet
>>>> client.bind-insecure: on
>>>> cluster.entry-self-heal: off
>>>> cluster.metadata-self-heal: off
>>>> performance.md-cache-timeout: 600
>>>> cluster.self-heal-daemon: enable
>>>> performance.readdir-ahead: on
>>>> diagnostics.brick-log-level: INFO
>>>> nfs.disable: off
>>>>
>>>> Thank you for any assistance.
>>>>
>>>> - Patrick
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users