Another small update from me: I have been keeping an eye on the
glustershd.log file to see what is going on, and I keep seeing the same file
names come up in there every 10 minutes, but not much other activity.
Logs below.
How can I be sure my heal is progressing through the files which actually
need to be healed? I thought it would show up in these logs.
I have also increased "cluster.shd-max-threads" from 4 to 8 to try and speed
things up.
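For reference, I made that change with the usual volume set command, and I
have been trying to watch the per-brick pending counts with the heal
statistics option (I'm assuming this is a reasonable way to track progress):

# gluster volume set gvAA01 cluster.shd-max-threads 8
# gluster volume heal gvAA01 statistics heal-count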
Any ideas here?
Thanks,
- Patrick
On 01-B
-------
[2019-04-21 09:12:54.575689] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-gvAA01-replicate-6: performing metadata selfheal on
5354c112-2e58-451d-a6f7-6bfcc1c9d904
[2019-04-21 09:12:54.733601] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6:
Completed metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904.
sources=[0] 2 sinks=1
[2019-04-21 09:13:12.028509] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:13:12.047470] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:23:13.044377] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:23:13.051479] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:33:07.400369] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6:
Completed data selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa.
sources=[0] 2 sinks=1
[2019-04-21 09:33:11.825449] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-gvAA01-replicate-6: performing metadata selfheal on
2fd9899f-192b-49cb-ae9c-df35d3f004fa
[2019-04-21 09:33:14.029837] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:33:14.037436] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:33:23.913882] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6:
Completed metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa.
sources=[0] 2 sinks=1
[2019-04-21 09:33:43.874201] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-gvAA01-replicate-6: performing metadata selfheal on
c25b80fd-f7df-4c6d-92bd-db930e89a0b1
[2019-04-21 09:34:02.273898] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6:
Completed metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1.
sources=[0] 2 sinks=1
[2019-04-21 09:35:12.282045] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6:
Completed data selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885.
sources=[0] 2 sinks=1
[2019-04-21 09:35:15.146252] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-gvAA01-replicate-6: performing metadata selfheal on
94027f22-a7d7-4827-be0d-09cf5ddda885
[2019-04-21 09:35:15.254538] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6:
Completed metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885.
sources=[0] 2 sinks=1
[2019-04-21 09:35:22.900803] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6:
Completed data selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45.
sources=[0] 2 sinks=1
[2019-04-21 09:35:27.150963] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-gvAA01-replicate-6: performing metadata selfheal on
84c93069-cfd8-441b-a6e8-958bed535b45
[2019-04-21 09:35:29.186295] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6:
Completed metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45.
sources=[0] 2 sinks=1
[2019-04-21 09:35:35.967451] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6:
Completed data selfheal on e747c32e-4353-4173-9024-855c69cdf9b9.
sources=[0] 2 sinks=1
[2019-04-21 09:35:40.733444] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-gvAA01-replicate-6: performing metadata selfheal on
e747c32e-4353-4173-9024-855c69cdf9b9
[2019-04-21 09:35:58.707593] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6:
Completed metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9.
sources=[0] 2 sinks=1
[2019-04-21 09:36:25.554260] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6:
Completed data selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d.
sources=[0] 2 sinks=1
[2019-04-21 09:36:26.031422] I [MSGID: 108026]
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
0-gvAA01-replicate-6: performing metadata selfheal on
4758d581-9de0-403b-af8b-bfd3d71d020d
[2019-04-21 09:36:26.083982] I [MSGID: 108026]
[afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6:
Completed metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d.
sources=[0] 2 sinks=1
On 02-B
-------
[2019-04-21 09:03:15.815250] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
[2019-04-21 09:03:15.863153] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file
8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:03:15.867432] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
[2019-04-21 09:03:15.875134] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file
65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:03:39.020198] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:03:39.027345] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:13:18.524874] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
[2019-04-21 09:13:20.070172] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file
8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:13:20.074977] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
[2019-04-21 09:13:20.080827] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file
65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:13:40.015763] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:13:40.021805] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:23:21.991032] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
[2019-04-21 09:23:22.054565] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file
8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:23:22.059225] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
[2019-04-21 09:23:22.066266] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file
65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:23:41.129962] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:23:41.135919] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
[2019-04-21 09:33:24.015223] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01
[2019-04-21 09:33:24.069686] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file
8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:33:24.074341] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4:
performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f
[2019-04-21 09:33:24.080065] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4:
expunging file
65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-14
[2019-04-21 09:33:42.099515] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5:
performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe
[2019-04-21 09:33:42.107481] W [MSGID: 108015]
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5:
expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp
(00000000-0000-0000-0000-000000000000) on gvAA01-client-17
On Sun, Apr 21, 2019 at 3:55 PM Patrick Rennie <patrickmrennie at gmail.com> wrote:
> Just another small update, I'm continuing to watch my brick logs and I
> just saw these errors come up in the recent events too. I am going to
> continue to post any errors I see in the hope of finding the right one to
> try and fix.
> This is from the logs on brick1; it seems to be occurring on both nodes on
> brick1, although at different times. I'm not sure what this means; can
> anyone shed any light?
> I guess I am looking for some kind of specific error which may indicate
> something is broken, stuck, or locking up and causing the extreme latency
> I'm seeing in the cluster.
>
> [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic]
> 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS
> 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply]
>
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a)
> [0x7f3b3e93158a]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45)
> [0x7f3b3e4c5d45]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
> [0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic]
> 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS
> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply]
>
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
> [0x7f3b3e9318fa]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
> [0x7f3b3e4c5f35]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
> [0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic]
> 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS
> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply]
>
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
> [0x7f3b3e9318fa]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
> [0x7f3b3e4c5f35]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
> [0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic]
> 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS
> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply]
>
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
> [0x7f3b3e9318fa]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
> [0x7f3b3e4c5f35]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
> [0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic]
> 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS
> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply]
>
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
> [0x7f3b3e9318fa]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
> [0x7f3b3e4c5f35]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
> [0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic]
> 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS
> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply]
>
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
> [0x7f3b3e9318fa]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
> [0x7f3b3e4c5f35]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
> [0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic]
> 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS
> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply]
>
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
> [0x7f3b3e9318fa]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
> [0x7f3b3e4c5f35]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
> [0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic]
> 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS
> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply]
>
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
> [0x7f3b3e9318fa]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
> [0x7f3b3e4c5f35]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
> [0x7f3b3e4b72cd] ) 0-: Reply submission failed
> [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic]
> 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS
> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server)
> [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply]
>
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa)
> [0x7f3b3e9318fa]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35)
> [0x7f3b3e4c5f35]
>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd)
> [0x7f3b3e4b72cd] ) 0-: Reply submission failed
>
> Thanks again,
>
> -Patrick
>
> On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie <patrickmrennie at gmail.com> wrote:
>
>> Hi Darrell,
>>
>> Thanks again for your advice. I've left it for a while, but unfortunately
>> it's still just as slow and is causing more problems for our operations now.
>> I will need to take some steps to at least bring performance back to normal
>> while continuing to investigate the issue longer term. I can definitely see
>> one node with heavier CPU than the other, almost double, which I am OK with,
>> but I think the heal process is going to take forever. Checking "gluster
>> volume heal info" shows thousands and thousands of files which may need
>> healing; I have no idea how many in total, as the command is still running
>> after hours, so I am not sure what has gone so wrong to cause this.
>>
>> I've checked cluster.op-version and cluster.max-op-version and it looks
>> like I'm on the latest version there.
>>
>> I have no idea how long the healing is going to take on this cluster; we
>> have around 560TB of data on here, but I don't think I can wait that long
>> to try and restore performance to normal.
>>
>> Can anyone think of anything else I can try in the meantime to work out
>> what's causing the extreme latency?
>>
>> I've been going through the cluster client logs of some of our VMs, and on
>> some of our FTP servers I found this in the cluster mount log, but I am not
>> seeing it on any of our other servers, just our FTP servers.
>>
>> [2019-04-21 07:16:19.925388] E [MSGID: 101046]
>> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null
>> [2019-04-21 07:19:43.413834] W [MSGID: 114031]
>> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19:
remote
>> operation failed [No such file or directory]
>> [2019-04-21 07:19:43.414153] W [MSGID: 114031]
>> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20:
remote
>> operation failed [No such file or directory]
>> [2019-04-21 07:23:33.154717] E [MSGID: 101046]
>> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null
>> [2019-04-21 07:33:24.943913] E [MSGID: 101046]
>> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null
>>
>> Any ideas what this could mean? I am basically just grasping at straws
>> here.
>>
>> I am going to hold off on the version upgrade until I know there are no
>> files which need healing, which could be a while. From some reading I've
>> done, there shouldn't be any issues with this as both are on v3.12.x.
>>
>> I've freed up a small amount of space, but I still need to work on this
>> further.
>>
>> I've read of a command, "find .glusterfs -type f -links -2 -exec rm {} \;",
>> which could be run on each brick and would potentially clean up any files
>> which were deleted straight from the bricks but not via the client. I have a
>> feeling this could help me free up about 5-10TB per brick from what I've
>> been told about the history of this cluster. Can anyone confirm if this is
>> actually safe to run?
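>> To be safe, my plan was to do a dry run first and just list what it would
>> match before deleting anything, something along these lines (run from the
>> root of each brick; the path shown is just an example):
>>
>> # cd /brick1/gvAA01/brick
>> # find .glusterfs -type f -links -2 -print | head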
>>
>> At this stage, I'm open to any suggestions as to how to proceed. Thanks
>> again for any advice.
>>
>> Cheers,
>>
>> - Patrick
>>
>> On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic <budic at onholyground.com> wrote:
>>
>>> Patrick,
>>>
>>> Sounds like progress. Be aware that gluster is expected to max out the
>>> CPUs on at least one of your servers while healing. This is normal and
>>> won't adversely affect overall performance (any more than having bricks in
>>> need of healing, at any rate) unless you're overdoing it. shd threads <= 4
>>> should not do that on your hardware. Other tunings may have also increased
>>> overall performance, so you may see higher CPU than previously anyway. I'd
>>> recommend upping those thread counts and letting it heal as fast as
>>> possible, especially if these are dedicated Gluster storage servers (i.e.
>>> not also running VMs, etc). You should see "normal" CPU use once heals are
>>> completed. I see ~15-30% overall normally, 95-98% while healing (x my 20
>>> cores). It's also likely to be different between your servers: in a pure
>>> replica, one tends to max and one tends to be a little higher, and in a
>>> distributed-replica, I'd expect more than one to run harder while healing.
>>>
>>> Keep the differences between doing an ls on a brick and doing an ls on a
>>> gluster mount in mind. When you do an ls on a gluster volume, it isn't just
>>> doing an ls on one brick; it's effectively doing it on ALL of your bricks,
>>> and they all have to return data before the ls succeeds. In a distributed
>>> volume, it's figuring out where on each volume things live and getting the
>>> stat() from each to assemble the whole thing. And if things are in need of
>>> healing, it will take even longer to decide which version is current and
>>> use it (shd triggers a heal anytime it encounters this). Any of these
>>> things being slow slows down the overall response.
>>>
>>> At this point, I'd get some sleep too, and let your cluster heal while
>>> you do. I'd really want it fully healed before I did any updates anyway, so
>>> let it use CPU and get itself sorted out. Expect it to do a round of
>>> healing after you upgrade each machine too; this is normal, so don't let the
>>> CPU spike surprise you. It's just catching up from the downtime incurred by
>>> the update and/or reboot if you did one.
>>>
>>> That reminds me, check your gluster cluster.op-version and
>>> cluster.max-op-version (gluster vol get all all | grep op-version). If
>>> op-version isn't at the max-op-version, set it to it so you're taking
>>> advantage of the latest features available to your version.
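>>> If it isn't already there, something like this should bump it (substitute
>>> the number the get command reports for max-op-version):
>>>
>>> # gluster volume set all cluster.op-version <max-op-version>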
>>>
>>> -Darrell
>>>
>>> On Apr 20, 2019, at 11:54 AM, Patrick Rennie <patrickmrennie at gmail.com> wrote:
>>>
>>> Hi Darrell,
>>>
>>> Thanks again for your advice. I've applied acltype=posixacl on my
>>> zpools and I think that has reduced some of the noise from my brick logs.
>>> I also bumped up some of the thread counts you suggested, but my CPU load
>>> skyrocketed, so I dropped it back down to something slightly lower, but
>>> still higher than it was before, and will see how that goes for a while.
>>>
>>> Although low space is a definite issue, if I run an ls anywhere on my
>>> bricks directly it's instant, <1 second, and it still takes several minutes
>>> via gluster, so there is still a problem in my gluster configuration
>>> somewhere. We don't have any snapshots, but I am trying to work out if any
>>> data on there is safe to delete, or if there is any way I can safely find
>>> and delete data which has been removed directly from the bricks in the
>>> past. I also have lz4 compression already enabled on each zpool, which does
>>> help a bit; we get between 1.05 and 1.08x compression on this data.
>>> I've tried to go through each client and checked its cluster mount logs
>>> and also my brick logs, looking for errors. So far nothing is jumping out
>>> at me, but there are some warnings and errors here and there, and I am
>>> trying to work out what they mean.
>>>
>>> It's already 1 am here and unfortunately I'm still awake working on
>>> this issue, but I think that I will have to leave the version upgrades
>>> until tomorrow.
>>>
>>> Thanks again for your advice so far. If anyone has any ideas on where I
>>> can look for errors other than the brick logs or the cluster mount logs to
>>> help resolve this issue, it would be much appreciated.
>>>
>>> Cheers,
>>>
>>> - Patrick
>>>
>>> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic <budic at onholyground.com> wrote:
>>>
>>>> See inline:
>>>>
>>>> On Apr 20, 2019, at 10:09 AM, Patrick Rennie <patrickmrennie at gmail.com> wrote:
>>>>
>>>> Hi Darrell,
>>>>
>>>> Thanks for your reply. This issue seems to be getting worse over the
>>>> last few days and really has me tearing my hair out. I will do as you have
>>>> suggested and get started on upgrading from 3.12.14 to 3.12.15.
>>>> I've checked the zfs properties and all bricks have "xattr=sa" set, but
>>>> none of them has "acltype=posixacl" set; currently the acltype property
>>>> shows "off". If I make these changes, will they apply retroactively to the
>>>> existing data? I'm unfamiliar with what this will change, so I may need to
>>>> look into that before I proceed.
>>>>
>>>>
>>>> It is safe to apply that now; any new set/get calls will then use it if
>>>> new posixacls exist, and use the older ones if not. ZFS is good that way.
>>>> It should clear up your posix_acl and posix errors over time.
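>>>> For example, per pool or dataset (the names here are placeholders, adjust
>>>> them to match your bricks):
>>>>
>>>> # zfs set xattr=sa <pool>/<dataset>
>>>> # zfs set acltype=posixacl <pool>/<dataset>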
>>>>
>>>> I understand performance is going to slow down as the bricks get full.
>>>> I am currently trying to free space and migrate data to some newer storage;
>>>> I have several hundred TB of fresh storage I just set up recently, but with
>>>> these performance issues it's really slow. I also believe there is
>>>> significant data which has been deleted directly from the bricks in the
>>>> past, so if I can reclaim this space in a safe manner then I will have at
>>>> least around 10-15% free space.
>>>>
>>>>
>>>> Full ZFS volumes will have a much larger impact on performance than
>>>> you'd think; I'd prioritize this. If you have been taking zfs snapshots,
>>>> consider deleting them to get the overall volume free space back up. And
>>>> just to be sure it's been said, delete from within the mounted volumes,
>>>> don't delete directly from the bricks (gluster will just try and heal it
>>>> later, compounding your issues). This does not apply to deleting other data
>>>> from the ZFS volume if it's not part of the brick directory, of course.
>>>>
>>>> These servers have dual 8-core Xeons (E5-2620 v4) and 512GB of RAM, so
>>>> generally they have plenty of resources available; we are currently only
>>>> using around 330/512GB of memory.
>>>>
>>>> I will look into what your suggested settings will change, and then
>>>> will probably go ahead with your recommendations. For our specs as stated
>>>> above, what would you suggest for performance.io-thread-count?
>>>>
>>>>
>>>> I run single 2630v4s on my servers, which have a smaller storage
>>>> footprint than yours. I'd go with 32 for performance.io-thread-count.
>>>> I'd try 4 for the shd thread settings on that gear. Your memory use sounds
>>>> fine, so no worries there.
>>>>
>>>> Our workload is nothing too extreme; we have a few VMs which write
>>>> backup data to this storage nightly for our clients. Our VMs don't live on
>>>> this cluster, but just write to it.
>>>>
>>>>
>>>> If they are writing compressible data, you'll get an immediate benefit by
>>>> setting compression=lz4 on your ZFS volumes. It won't help any old data, of
>>>> course, but it will compress new data going forward. This is another one
>>>> that's safe to enable on the fly.
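>>>> For example (the pool name is a placeholder, and child datasets will
>>>> inherit the property):
>>>>
>>>> # zfs set compression=lz4 <pool>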
>>>>
>>>> I've been going through all of the logs I can; below are some slightly
>>>> sanitized errors I've come across, but I'm not sure what to make of them.
>>>> The main error I am seeing is the first one below, across several of my
>>>> bricks, but possibly only for specific folders on the cluster; I'm not 100%
>>>> sure about that yet though.
>>>>
>>>> [2019-04-20 05:56:59.512649] E [MSGID: 113001]
>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed
on
>>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default
[Operation not
>>>> supported]
>>>> [2019-04-20 05:59:06.084333] E [MSGID: 113001]
>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed
on
>>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default
[Operation not
>>>> supported]
>>>> [2019-04-20 05:59:43.289030] E [MSGID: 113001]
>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed
on
>>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default
[Operation not
>>>> supported]
>>>> [2019-04-20 05:59:50.582257] E [MSGID: 113001]
>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed
on
>>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default
[Operation not
>>>> supported]
>>>> [2019-04-20 06:01:42.501701] E [MSGID: 113001]
>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed
on
>>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default
[Operation not
>>>> supported]
>>>> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr]
>>>> 0-gvAA01-posix: Extended attributes not supported (try
remounting brick
>>>> with 'user_xattr' flag)
>>>>
>>>>
>>>> [2019-04-20 13:12:36.131856] E [MSGID: 113002]
>>>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is
null for
>>>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument]
>>>> [2019-04-20 13:12:36.131959] E [MSGID: 113002]
>>>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is
null for
>>>> /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available]
>>>> [2019-04-20 13:12:36.132016] E [MSGID: 115050]
>>>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server:
24274759: LOOKUP
>>>> /xxxxxxxxxxxxxxxxxxxx
(a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud
>>>> Backup_clone1.vbm_62906_tmp), client:
>>>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3,
error-xlator:
>>>> gvAA01-posix [No data available]
>>>> [2019-04-20 13:12:38.093719] E [MSGID: 115050]
>>>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server:
24276491: LOOKUP
>>>> /xxxxxxxxxxxxxxxxxxxx
(a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud
>>>> Backup_clone1.vbm_62906_tmp), client:
>>>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3,
error-xlator:
>>>> gvAA01-posix [No data available]
>>>> [2019-04-20 13:12:38.093660] E [MSGID: 113002]
>>>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is
null for
>>>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument]
>>>> [2019-04-20 13:12:38.093696] E [MSGID: 113002]
>>>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is
null for
>>>> /brick2/xxxxxxxxxxxxxxxxxxxx [No data available]
>>>>
>>>>
>>>> posixacls should clear those up, as mentioned.
>>>>
>>>>
>>>> [2019-04-20 14:25:59.654576] E
[inodelk.c:404:__inode_unlock_lock]
>>>> 0-gvAA01-locks: Matching lock not found for unlock
0-9223372036854775807,
>>>> by 980fdbbd367f0000 on 0x7fc4f0161440
>>>> [2019-04-20 14:25:59.654668] E [MSGID: 115053]
>>>> [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server:
6092928:
>>>> INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$
(25b14631-a179-4274-8243-6e272d4f2ad8),
>>>> client:
>>>>
cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4,
>>>> error-xlator: gvAA01-locks [Invalid argument]
>>>>
>>>>
>>>> [2019-04-20 13:35:07.495495] E
[rpcsvc.c:1364:rpcsvc_submit_generic]
>>>> 0-rpc-service: failed to submit message (XID: 0x247c644,
Program: GlusterFS
>>>> 3.3, ProgVers: 330, Proc: 27) to rpc-transport
(tcp.gvAA01-server)
>>>> [2019-04-20 13:35:07.495619] E
[server.c:195:server_submit_reply]
>>>>
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a)
>>>> [0x7ff4ae6f796a]
>>>>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8)
>>>> [0x7ff4ae2a96e8]
>>>>
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d)
>>>> [0x7ff4ae28528d] ) 0-: Reply submission failed
>>>>
>>>>
>>>> Fix the posix acls and see if these clear up over time as well; I'm
>>>> unclear on what the overall effect of running without the posix acls will
>>>> be on total gluster health. Your biggest problem sounds like you need to
>>>> free up space on the volumes and get the overall volume health back up to
>>>> par, and see if that doesn't resolve the symptoms you're seeing.
>>>>
>>>>
>>>>
>>>> Thank you again for your assistance. It is greatly appreciated.
>>>>
>>>> - Patrick
>>>>
>>>>
>>>>
>>>> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic <budic at onholyground.com> wrote:
>>>>
>>>>> Patrick,
>>>>>
>>>>> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You
>>>>> also mention ZFS, and that error you show makes me think you need to check
>>>>> that you have "xattr=sa" and "acltype=posixacl" set on your ZFS volumes.
>>>>>
>>>>> You also observed your bricks are crossing the 95% full line; ZFS
>>>>> performance will degrade significantly the closer you get to full. In my
>>>>> experience, this starts somewhere between 10% and 5% free space remaining,
>>>>> so you're in that realm.
>>>>>
>>>>> How's your free memory on the servers doing? Do you have your zfs arc
>>>>> cache limited to something less than all the RAM? It shares pretty well,
>>>>> but I've encountered situations where other things won't try to take RAM
>>>>> back properly if they think it's in use, so ZFS never gets the opportunity
>>>>> to give it up.
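>>>>> If you do want to cap it, the usual knob on ZFS on Linux is zfs_arc_max
>>>>> (in bytes); for example, capping the ARC at 256GB could be done at runtime
>>>>> and persisted in /etc/modprobe.d/zfs.conf (the numbers here are just an
>>>>> illustration for a 512GB box, pick what suits your workload):
>>>>>
>>>>> # echo 274877906944 > /sys/module/zfs/parameters/zfs_arc_max
>>>>> # echo "options zfs zfs_arc_max=274877906944" >> /etc/modprobe.d/zfs.conf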
>>>>>
>>>>> Since your volume is a disperse-replica, you might try tuning
>>>>> disperse.shd-max-threads; the default is 1, and I'd try it at 2, 4, or even
>>>>> more if the CPUs are beefy enough. Setting server.event-threads to 4 and
>>>>> client.event-threads to 8 has proven helpful in many cases. After you get
>>>>> upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I
>>>>> don't know if it matters, but I'd also recommend resetting
>>>>> performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or
>>>>> also setting performance.io-thread-count to 32 if those have beefy CPUs.
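>>>>> For reference, these would all be plain volume set calls, something like
>>>>> this (using your volume name from the status output, and treating the
>>>>> values as starting points rather than gospel):
>>>>>
>>>>> # gluster volume set gvAA01 server.event-threads 4
>>>>> # gluster volume set gvAA01 client.event-threads 8
>>>>> # gluster volume set gvAA01 performance.io-thread-count 32
>>>>> # gluster volume set gvAA01 performance.least-prio-threads 1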
>>>>>
>>>>> Beyond those general ideas, more info about your hardware (CPU and
>>>>> RAM) and workload (VMs, direct storage for web servers or enders, etc) may
>>>>> net you some more ideas. Then you're going to have to do more digging into
>>>>> the brick logs looking for errors and/or warnings to see what's going on.
>>>>>
>>>>> -Darrell
>>>>>
>>>>>
>>>>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie <patrickmrennie at gmail.com> wrote:
>>>>>
>>>>> Hello Gluster Users,
>>>>>
>>>>> I am hoping someone can help me with resolving an ongoing issue I've
>>>>> been having; I'm new to mailing lists, so forgive me if I have gotten
>>>>> anything wrong. We have noticed our performance deteriorating over the last
>>>>> few weeks, easily measured by doing an ls on one of our top-level folders
>>>>> and timing it: it usually would take 2-5 seconds and now takes up to 20
>>>>> minutes, which obviously renders our cluster basically unusable. This has
>>>>> been intermittent in the past but is now almost constant, and I am not sure
>>>>> how to work out the exact cause. We have noticed some errors in the brick
>>>>> logs, and have noticed that if we kill the right brick process, performance
>>>>> instantly returns to normal. This is not always the same brick, but it
>>>>> indicates to me that something in the brick processes or background tasks
>>>>> may be causing the extreme latency. Because killing the right brick process
>>>>> fixes it, I think it's a specific file, folder, or operation which may be
>>>>> hanging and causing the increased latency, but I am not sure how to work it
>>>>> out. One last thing to add is that our bricks are getting quite full (~95%
>>>>> full); we are trying to migrate data off to new storage, but that is going
>>>>> slowly, not helped by this issue. I am currently trying to run a full heal
>>>>> as there appear to be many files needing healing, and I have all brick
>>>>> processes running so they have an opportunity to heal, but this means
>>>>> performance is very poor. It currently takes over 15-20 minutes to do an ls
>>>>> of one of our top-level folders, which just contains 60-80 other folders;
>>>>> this should take 2-5 seconds. This is all being checked via a FUSE mount
>>>>> locally on the storage node itself, but it is the same for other clients
>>>>> and VMs accessing the cluster. Initially, it seemed our NFS mounts were not
>>>>> affected and operated at normal speed, but testing over the last day has
>>>>> shown that our NFS clients are also extremely slow, so it doesn't seem
>>>>> specific to FUSE as I first thought it might be.
>>>>>
>>>>> I am not sure how to proceed from here; I am fairly new to gluster, having
>>>>> inherited this setup from my predecessor, and I am trying to keep it going.
>>>>> I have included some info below to try and help with diagnosis; please let
>>>>> me know if any further info would be helpful. I would really appreciate any
>>>>> advice on what I could try to work out the cause. Thank you in advance for
>>>>> reading this, and for any suggestions you might be able to offer.
>>>>>
>>>>> - Patrick
>>>>>
>>>>> This is an example of the main error I see in our brick logs; there
>>>>> have been others, and I can post them when I see them again too:
>>>>> [2019-04-20 04:54:43.055680] E [MSGID: 113001]
>>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr
failed on
>>>>> /brick1/<filename> library: system.posix_acl_default
[Operation not
>>>>> supported]
>>>>> [2019-04-20 05:01:29.476313] W
[posix.c:4929:posix_getxattr]
>>>>> 0-gvAA01-posix: Extended attributes not supported (try
remounting brick
>>>>> with 'user_xattr' flag)
>>>>>
>>>>> Our setup consists of 2 storage nodes and an arbiter node. I have
>>>>> noticed our nodes are on slightly different versions; I'm not sure if this
>>>>> could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2
>>>>> pools, with a total capacity of around 560TB.
>>>>> We have bonded 10gbps NICs on each node, and I have tested bandwidth
>>>>> with iperf and found that it's what would be expected from this config.
>>>>> Individual brick performance seems OK; I've tested several bricks using dd
>>>>> and can write a 10GB file at 1.7GB/s.
>>>>>
>>>>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000
>>>>> 10000+0 records in
>>>>> 10000+0 records out
>>>>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s
>>>>>
>>>>> Node 1:
>>>>> # glusterfs --version
>>>>> glusterfs 3.12.15
>>>>>
>>>>> Node 2:
>>>>> # glusterfs --version
>>>>> glusterfs 3.12.14
>>>>>
>>>>> Arbiter:
>>>>> # glusterfs --version
>>>>> glusterfs 3.12.14
>>>>>
>>>>> Here is our gluster volume status:
>>>>>
>>>>> # gluster volume status
>>>>> Status of volume: gvAA01
>>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick 01-B:/brick1/gvAA01/brick             49152     0          Y       7219
>>>>> Brick 02-B:/brick1/gvAA01/brick             49152     0          Y       21845
>>>>> Brick 00-A:/arbiterAA01/gvAA01/brick1       49152     0          Y       6931
>>>>> Brick 01-B:/brick2/gvAA01/brick             49153     0          Y       7239
>>>>> Brick 02-B:/brick2/gvAA01/brick             49153     0          Y       9916
>>>>> Brick 00-A:/arbiterAA01/gvAA01/brick2       49153     0          Y       6939
>>>>> Brick 01-B:/brick3/gvAA01/brick             49154     0          Y       7235
>>>>> Brick 02-B:/brick3/gvAA01/brick             49154     0          Y       21858
>>>>> Brick 00-A:/arbiterAA01/gvAA01/brick3       49154     0          Y       6947
>>>>> Brick 01-B:/brick4/gvAA01/brick             49155     0          Y       31840
>>>>> Brick 02-B:/brick4/gvAA01/brick             49155     0          Y       9933
>>>>> Brick 00-A:/arbiterAA01/gvAA01/brick4       49155     0          Y       6956
>>>>> Brick 01-B:/brick5/gvAA01/brick             49156     0          Y       7233
>>>>> Brick 02-B:/brick5/gvAA01/brick             49156     0          Y       9942
>>>>> Brick 00-A:/arbiterAA01/gvAA01/brick5       49156     0          Y       6964
>>>>> Brick 01-B:/brick6/gvAA01/brick             49157     0          Y       7234
>>>>> Brick 02-B:/brick6/gvAA01/brick             49157     0          Y       9952
>>>>> Brick 00-A:/arbiterAA01/gvAA01/brick6       49157     0          Y       6974
>>>>> Brick 01-B:/brick7/gvAA01/brick             49158     0          Y       7248
>>>>> Brick 02-B:/brick7/gvAA01/brick             49158     0          Y       9960
>>>>> Brick 00-A:/arbiterAA01/gvAA01/brick7       49158     0          Y       6984
>>>>> Brick 01-B:/brick8/gvAA01/brick             49159     0          Y       7253
>>>>> Brick 02-B:/brick8/gvAA01/brick             49159     0          Y       9970
>>>>> Brick 00-A:/arbiterAA01/gvAA01/brick8       49159     0          Y       6993
>>>>> Brick 01-B:/brick9/gvAA01/brick             49160     0          Y       7245
>>>>> Brick 02-B:/brick9/gvAA01/brick             49160     0          Y       9984
>>>>> Brick 00-A:/arbiterAA01/gvAA01/brick9       49160     0          Y       7001
>>>>> NFS Server on localhost                     2049      0          Y       17276
>>>>> Self-heal Daemon on localhost               N/A       N/A        Y       25245
>>>>> NFS Server on 02-B                          2049      0          Y       9089
>>>>> Self-heal Daemon on 02-B                    N/A       N/A        Y       17838
>>>>> NFS Server on 00-a                          2049      0          Y       15660
>>>>> Self-heal Daemon on 00-a                    N/A       N/A        Y       16218
>>>>>
>>>>> Task Status of Volume gvAA01
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>>
>>>>> And gluster volume info:
>>>>>
>>>>> # gluster volume info
>>>>>
>>>>> Volume Name: gvAA01
>>>>> Type: Distributed-Replicate
>>>>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 9 x (2 + 1) = 27
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: 01-B:/brick1/gvAA01/brick
>>>>> Brick2: 02-B:/brick1/gvAA01/brick
>>>>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter)
>>>>> Brick4: 01-B:/brick2/gvAA01/brick
>>>>> Brick5: 02-B:/brick2/gvAA01/brick
>>>>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter)
>>>>> Brick7: 01-B:/brick3/gvAA01/brick
>>>>> Brick8: 02-B:/brick3/gvAA01/brick
>>>>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter)
>>>>> Brick10: 01-B:/brick4/gvAA01/brick
>>>>> Brick11: 02-B:/brick4/gvAA01/brick
>>>>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter)
>>>>> Brick13: 01-B:/brick5/gvAA01/brick
>>>>> Brick14: 02-B:/brick5/gvAA01/brick
>>>>> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter)
>>>>> Brick16: 01-B:/brick6/gvAA01/brick
>>>>> Brick17: 02-B:/brick6/gvAA01/brick
>>>>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter)
>>>>> Brick19: 01-B:/brick7/gvAA01/brick
>>>>> Brick20: 02-B:/brick7/gvAA01/brick
>>>>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter)
>>>>> Brick22: 01-B:/brick8/gvAA01/brick
>>>>> Brick23: 02-B:/brick8/gvAA01/brick
>>>>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter)
>>>>> Brick25: 01-B:/brick9/gvAA01/brick
>>>>> Brick26: 02-B:/brick9/gvAA01/brick
>>>>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter)
>>>>> Options Reconfigured:
>>>>> cluster.shd-max-threads: 4
>>>>> performance.least-prio-threads: 16
>>>>> cluster.readdir-optimize: on
>>>>> performance.quick-read: off
>>>>> performance.stat-prefetch: off
>>>>> cluster.data-self-heal: on
>>>>> cluster.lookup-unhashed: auto
>>>>> cluster.lookup-optimize: on
>>>>> cluster.favorite-child-policy: mtime
>>>>> server.allow-insecure: on
>>>>> transport.address-family: inet
>>>>> client.bind-insecure: on
>>>>> cluster.entry-self-heal: off
>>>>> cluster.metadata-self-heal: off
>>>>> performance.md-cache-timeout: 600
>>>>> cluster.self-heal-daemon: enable
>>>>> performance.readdir-ahead: on
>>>>> diagnostics.brick-log-level: INFO
>>>>> nfs.disable: off
>>>>>
>>>>> Thank you for any assistance.
>>>>>
>>>>> - Patrick
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>>
>>>>>
>>>>
>>>