thr3ads.net - Gluster users - [Gluster-users] How to diagnose volume rebalance failure? [Dec 2015]


Susant Palai

2015-Dec-15 07:21 UTC

[Gluster-users] How to diagnose volume rebalance failure?

Hi PuYun,
  We need to figure out some mechanism to get the huge log files. Until then,
here is something I think could be affecting the performance.

Rebalance normally starts at the medium throttle level [performance-wise], which
in your case means two migration threads, and those can hog your 2 cores. If you
run rebalance again, run it in lazy mode. Here is the command:

"gluster v set <VOLUME-NAME> rebal-throttle lazy". This should
spawn just one thread for migration.
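
The throttle change described above could be wrapped in a small shell helper. This is only a sketch: the function name rebalance_lazy is hypothetical, and the option is written with the full key cluster.rebal-throttle on the assumption that it names the same setting as the short form quoted above.

```shell
# Hypothetical helper: set the rebalance throttle to lazy, then start a rebalance.
# Assumes the gluster CLI is installed and this runs as root on a server node.
rebalance_lazy() {
    vol="$1"
    if [ -z "$vol" ]; then
        echo "usage: rebalance_lazy VOLUME-NAME" >&2
        return 1
    fi
    # Assumption: cluster.rebal-throttle is the full key for "rebal-throttle".
    gluster volume set "$vol" cluster.rebal-throttle lazy || return 1
    gluster volume rebalance "$vol" start
}
```

For the volume in this thread the call would be "rebalance_lazy FastVol". The setting stays on the volume until it is changed again.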

For logs: Can you grep for errors in the rebalance log file and upload them?
<until we figure out a method to get the full logs>
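
One way to cut a multi-gigabyte log down to something that can be uploaded is to keep only the non-informational entries. The sketch below assumes every entry starts with "[timestamp] LEVEL ", as in the excerpts in this thread, and that W, E and C are the levels of interest; the function name rebalance_problems is hypothetical.

```shell
# Hypothetical helper: print only warning, error and critical lines from a
# GlusterFS log file given as the first argument.
# Assumes each entry begins with "[YYYY-MM-DD HH:MM:SS.micros] LEVEL ".
rebalance_problems() {
    grep -E '^\[[0-9-]+ [0-9:.]+\] [WEC] ' "$1"
}
```

A possible call is "rebalance_problems FastVol-rebalance.log > rebalance-problems.txt", run from the directory that holds the log. grep exits with status 1 when nothing matches, which here simply means the log has no W, E or C entries.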

Thanks,
Susant

----- Original Message -----
From: "PuYun" <cloudor at 126.com>
To: "gluster-users" <gluster-users at gluster.org>
Sent: Tuesday, 15 December, 2015 5:51:00 AM
Subject: Re: [Gluster-users] How to diagnose volume rebalance failure?



Hi, 


Another odd piece of information that may be useful: the failed task had
actually been running for hours, but the status command reported only 3621 secs.


============== shell ============== 
[root at d001 glusterfs]# gluster volume rebalance FastVol status 
     Node  Rebalanced-files    size  scanned  failures  skipped  status  run time in secs
---------  ----------------  ------  -------  --------  -------  ------  ----------------
localhost                 0  0Bytes   952767         0        0  failed           3621.00
volume rebalance: FastVol: success: 

================================ 
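
To pick failed nodes out of that status output without reading the table by eye, a small awk filter is enough. This is a sketch that assumes the status word is the second-to-last column, as in the output above; the function name failed_nodes is hypothetical.

```shell
# Hypothetical helper: read "gluster volume rebalance VOL status" output on
# stdin and print the node name and run time for every node marked failed.
# Assumes the status word is the second-to-last whitespace-separated column.
failed_nodes() {
    awk 'NF >= 3 && $(NF-1) == "failed" { print $1, $NF }'
}
```

It would be used as "gluster volume rebalance FastVol status | failed_nodes".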


As you can see, I started the rebalance task only once. 
======== cmd_history.log-20151215 ====== 
[2015-12-14 12:50:41.443937] : volume start FastVol : SUCCESS 
[2015-12-14 12:55:01.367519] : volume rebalance FastVol start : SUCCESS 
[2015-12-14 13:55:22.132199] : volume rebalance FastVol status : SUCCESS 
[2015-12-14 23:04:01.780885] : volume rebalance FastVol status : SUCCESS 
[2015-12-14 23:35:56.708077] : volume rebalance FastVol status : SUCCESS 

================================= 


Because the task failed at [ 2015-12-14 21:46:54.179xx ], something might have
gone wrong 3621 secs earlier, that is, at [ 2015-12-14 20:46:33.179xx ]. I
checked the logs around that time and found nothing special.
========== FastVol-rebalance.log ======== 
[2015-12-14 20:46:33.166748] I [dht-rebalance.c:1010:dht_migrate_file]
0-FastVol-dht:
/for_ybest_fsdir/user/Weixin.oClDcjjJ/rH/wV/mNv6sX94lypFWdvM/userPoint:
attempting to move from FastVol-client-0 to FastVol-client-1
[2015-12-14 20:46:33.171009] I [MSGID: 109022]
[dht-rebalance.c:1290:dht_migrate_file] 0-FastVol-dht: completed migration of
/for_ybest_fsdir/user/Weixin.oClDcjjJ/t2/n1/VSXZlm65KjfhbgoM/flag_finished from
subvolume FastVol-client-0 to FastVol-client-1
[2015-12-14 20:46:33.174851] I [dht-rebalance.c:1010:dht_migrate_file]
0-FastVol-dht:
/for_ybest_fsdir/user/Weixin.oClDcjjJ/rH/wV/mNv6sX94lypFWdvM/portrait_origin.jpg:
attempting to move from FastVol-client-0 to FastVol-client-1
[2015-12-14 20:46:33.181448] I [MSGID: 109022]
[dht-rebalance.c:1290:dht_migrate_file] 0-FastVol-dht: completed migration of
/for_ybest_fsdir/user/Weixin.oClDcjjJ/rH/wV/mNv6sX94lypFWdvM/userPoint from
subvolume FastVol-client-0 to FastVol-client-1
[2015-12-14 20:46:33.184996] I [dht-rebalance.c:1010:dht_migrate_file]
0-FastVol-dht:
/for_ybest_fsdir/user/Weixin.oClDcjjJ/rH/wV/mNv6sX94lypFWdvM/portrait_small.jpg:
attempting to move from FastVol-client-0 to FastVol-client-1
[2015-12-14 20:46:33.191681] I [MSGID: 109022]
[dht-rebalance.c:1290:dht_migrate_file] 0-FastVol-dht: completed migration of
/for_ybest_fsdir/user/Weixin.oClDcjjJ/rH/wV/mNv6sX94lypFWdvM/portrait_origin.jpg
from subvolume FastVol-client-0 to FastVol-client-1
[2015-12-14 20:46:33.195396] I [dht-rebalance.c:1010:dht_migrate_file]
0-FastVol-dht:
/for_ybest_fsdir/user/Weixin.oClDcjjJ/rH/wV/mNv6sX94lypFWdvM/portrait_big_square.jpg:
attempting to move from FastVol-client-0 to FastVol-client-1

================================== 


Also, there are no log entries around [ 2015-12-14 20:46:33.179xx ] in
mnt-b1-brick.log, mnt-c1-brick.log or etc-glusterfs-glusterd.vol.log.
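
The timestamp arithmetic above can be reproduced in the shell. This sketch assumes GNU date (the -d option; BSD and macOS date use different flags) and treats the timestamps from cmd_history.log and the brick logs as being in the same time zone. The helper name to_epoch is hypothetical.

```shell
# Convert a "YYYY-MM-DD HH:MM:SS" timestamp to seconds since the epoch.
# Assumes GNU date; the input is interpreted as UTC.
to_epoch() {
    date -u -d "$1" +%s
}

fail_epoch=$(to_epoch "2015-12-14 21:46:54")    # disconnect seen in the brick logs
start_epoch=$(to_epoch "2015-12-14 12:55:01")   # "rebalance start" in cmd_history.log

# The moment 3621 seconds before the failure.
date -u -d "@$((fail_epoch - 3621))" '+%Y-%m-%d %H:%M:%S'   # prints 2015-12-14 20:46:33

# Seconds between the rebalance start and the failure.
echo "$((fail_epoch - start_epoch))"                        # prints 31913
```

The second number, 31913 seconds (about 8 hours 52 minutes), is the gap between the start command and the failure, against the 3621 seconds shown by the status command.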




PuYun 





From: PuYun 
Date: 2015-12-15 07:30 
To: gluster-users 
Subject: Re: [Gluster-users] How to diagnose volume rebalance failure? 


Hi, 


Failed again. I can see disconnections in logs, but no more details. 


=========== mnt-b1-brick.log =========== 
[2015-12-14 21:46:54.179662] I [MSGID: 115036] [server.c:552:server_rpc_notify]
0-FastVol-server: disconnecting connection from
d001-1799-2015/12/14-12:54:56:347561-FastVol-client-1-0-0
[2015-12-14 21:46:54.181764] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on /
[2015-12-14 21:46:54.181815] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir
[2015-12-14 21:46:54.181856] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user
[2015-12-14 21:46:54.181918] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user/ji/ay/up/a19640529/linkwrap/129836/0.jpg
[2015-12-14 21:46:54.181961] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user/ji/ay/an
[2015-12-14 21:46:54.182003] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user/ji/ay/up/a19640529/linkwrap/129836/icon_loading_white22c04a.gif
[2015-12-14 21:46:54.182036] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user/ji
[2015-12-14 21:46:54.182076] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user/ji/ay
[2015-12-14 21:46:54.182110] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user/ji/ay/an/ling00
[2015-12-14 21:46:54.182203] I [MSGID: 101055] [client_t.c:419:gf_client_unref]
0-FastVol-server: Shutting down connection
d001-1799-2015/12/14-12:54:56:347561-FastVol-client-1-0-0

====================================== 


============== mnt-c1-brick.log ============ 
[2015-12-14 21:46:54.179597] I [MSGID: 115036] [server.c:552:server_rpc_notify]
0-FastVol-server: disconnecting connection from
d001-1799-2015/12/14-12:54:56:347561-FastVol-client-0-0-0
[2015-12-14 21:46:54.180428] W [inodelk.c:404:pl_inodelk_log_cleanup]
0-FastVol-server: releasing lock on 5e300cdb-7298-44c0-90eb-5b50018daed6 held by
{client=0x7effc810cce0, pid=-3 lk-owner=fdffffff}
[2015-12-14 21:46:54.180454] W [inodelk.c:404:pl_inodelk_log_cleanup]
0-FastVol-server: releasing lock on 3c9a1cd5-84c8-4967-98d5-e75a402b1f74 held by
{client=0x7effc810cce0, pid=-3 lk-owner=fdffffff}
[2015-12-14 21:46:54.180483] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on /
[2015-12-14 21:46:54.180525] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir
[2015-12-14 21:46:54.180570] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user
[2015-12-14 21:46:54.180604] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user/ji/ay/up/a19640529/linkwrap/129836/0.jpg
[2015-12-14 21:46:54.180634] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user/ji
[2015-12-14 21:46:54.180678] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user/ji/ay
[2015-12-14 21:46:54.180725] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user/ji/ay/an/ling00
[2015-12-14 21:46:54.180779] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user/ji/ay/up/a19640529/linkwrap/129836/icon_loading_white22c04a.gif
[2015-12-14 21:46:54.180820] I [MSGID: 115013]
[server-helpers.c:294:do_fd_cleanup] 0-FastVol-server: fd cleanup on
/for_ybest_fsdir/user/ji/ay/an
[2015-12-14 21:46:54.180859] I [MSGID: 101055] [client_t.c:419:gf_client_unref]
0-FastVol-server: Shutting down connection
d001-1799-2015/12/14-12:54:56:347561-FastVol-client-0-0-0

====================================== 




============== etc-glusterfs-glusterd.vol.log ========== 
[2015-12-14 21:46:54.179819] W [socket.c:588:__socket_rwv] 0-management: readv
on /var/run/gluster/gluster-rebalance-dbee250a-e3fe-4448-b905-b76c5ba80b25.sock
failed (No data available)
[2015-12-14 21:46:54.209586] I [MSGID: 106007]
[glusterd-rebalance.c:162:__glusterd_defrag_notify] 0-management: Rebalance
process for volume FastVol has disconnected.
[2015-12-14 21:46:54.209627] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy]
0-management: size=588 max=1 total=1
[2015-12-14 21:46:54.209640] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy]
0-management: size=124 max=1 total=1

============================================= 




================== FastVol-rebalance.log ============ 
... 
[2015-12-14 21:46:53.423719] I [MSGID: 109022]
[dht-rebalance.c:1290:dht_migrate_file] 0-FastVol-dht: completed migration of
/for_ybest_fsdir/user/ji/ay/up/a19640529/linkwrap/129836/07.jpg from subvolume
FastVol-client-0 to FastVol-client-1
[2015-12-14 21:46:53.423976] I [MSGID: 109022]
[dht-rebalance.c:1290:dht_migrate_file] 0-FastVol-dht: completed migration of
/for_ybest_fsdir/user/ji/ay/up/a19640529/linkwrap/126724/1d0ca0de913c4e50f85f2b29694e4e64.html
from subvolume FastVol-client-0 to FastVol-client-1
[2015-12-14 21:46:53.436268] I [dht-rebalance.c:1010:dht_migrate_file]
0-FastVol-dht: /for_ybest_fsdir/user/ji/ay/up/a19640529/linkwrap/129836/0.jpg:
attempting to move from FastVol-client-0 to FastVol-client-1
[2015-12-14 21:46:53.436597] I [dht-rebalance.c:1010:dht_migrate_file]
0-FastVol-dht:
/for_ybest_fsdir/user/ji/ay/up/a19640529/linkwrap/129836/icon_loading_white22c04a.gif:
attempting to move from FastVol-client-0 to FastVol-client-1

<EOF> 
============================================== 





PuYun 





From: PuYun 
Date: 2015-12-14 21:51 
To: gluster-users 
Subject: Re: [Gluster-users] How to diagnose volume rebalance failure? 


Hi, 


Thank you for your reply. I don't know how to send you the rebalance log
file, which is huge (about 2GB).


However, I might have found the reason why the task failed. My gluster
server has only 2 CPU cores and carries 2 SSD bricks. When the rebalance task
began, the top 3 processes used 70%~80%, 30%~40% and 30%~40% CPU; all others
used less than 1%. But after a while, both CPU cores were fully used and I
could not even log in until the rebalance task failed.


It seems 2 bricks require at least 4 CPU cores. I have now upgraded the virtual
server to 8 CPU cores and started the rebalance task again. Everything is going
well for now.
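
If adding cores is not an option, the throttle level could instead be chosen from the core count. The sketch below is a hypothetical policy, not a GlusterFS rule: the 2-core threshold is taken from the experience in this thread, the function name pick_throttle is made up, and "normal" is assumed to be the name of the default throttle level.

```shell
# Hypothetical policy: suggest the lazy throttle on small machines.
# The core count is taken from the first argument, or from nproc if omitted.
pick_throttle() {
    cores="${1:-$(nproc)}"
    if [ "$cores" -le 2 ]; then
        echo lazy      # one migration at a time on 1 or 2 cores
    else
        echo normal    # assumed name of the default throttle level
    fi
}
```

Running "pick_throttle" with no argument on the server prints a suggestion for that machine.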


I will report again when the current task completes or fails. 





PuYun 





From: Nithya Balachandran 
Date: 2015-12-14 18:57 
To: PuYun 
CC: gluster-users 
Subject: Re: [Gluster-users] How to diagnose volume rebalance failure? 

Hi, 

Can you send us the rebalance log? 

Regards, 
Nithya 

----- Original Message -----
> From: "PuYun" <cloudor at 126.com> 
> To: "gluster-users" <gluster-users at gluster.org> 
> Sent: Monday, December 14, 2015 11:33:40 AM 
> Subject: Re: [Gluster-users] How to diagnose volume rebalance failure? 
> 
> Here is the tail of the failed rebalance log, any clue? 
> 
> [2015-12-13 21:30:31.527493] I [dht-rebalance.c:2340:gf_defrag_process_dir]
> 0-FastVol-dht: Migration operation on dir 
> /for_ybest_fsdir/user/Weixin.oClDcjhe/Ny/5F/1MsH5--BcoGRAJPI took 20.95 secs
> [2015-12-13 21:30:31.528704] I [dht-rebalance.c:1010:dht_migrate_file] 
> 0-FastVol-dht: 
> /for_ybest_fsdir/user/Weixin.oClDcjhe/Kn/hM/oHcPMp4hKq5Tq2ZQ/flag_finished:
> attempting to move from FastVol-client-0 to FastVol-client-1 
> [2015-12-13 21:30:31.543901] I [dht-rebalance.c:1010:dht_migrate_file] 
> 0-FastVol-dht: 
> /for_ybest_fsdir/user/Weixin.oClDcjhe/PU/ps/qUa-n38i8QBgeMdI/userPoint: 
> attempting to move from FastVol-client-0 to FastVol-client-1 
> [2015-12-13 21:31:37.210496] I [MSGID: 109081] 
> [dht-common.c:3780:dht_setxattr] 0-FastVol-dht: fixing the layout of 
> /for_ybest_fsdir/user/Weixin.oClDcjhe/Ny/7Q 
> [2015-12-13 21:31:37.722825] I [MSGID: 109045] 
> [dht-selfheal.c:1508:dht_fix_layout_of_directory] 0-FastVol-dht: subvolume 0
> (FastVol-client-0): 1032124 chunks 
> [2015-12-13 21:31:37.722837] I [MSGID: 109045] 
> [dht-selfheal.c:1508:dht_fix_layout_of_directory] 0-FastVol-dht: subvolume 1
> (FastVol-client-1): 1032124 chunks 
> [2015-12-13 21:33:03.955539] I [MSGID: 109064] 
> [dht-layout.c:808:dht_layout_dir_mismatch] 0-FastVol-dht: subvol: 
> FastVol-client-0; inode layout - 0 - 2146817919 - 1; disk layout - 
> 2146817920 - 4294967295 - 1 
> [2015-12-13 21:33:04.069859] I [MSGID: 109018] 
> [dht-common.c:806:dht_revalidate_cbk] 0-FastVol-dht: Mismatching layouts for
> /for_ybest_fsdir/user/Weixin.oClDcjhe/Ny/7Q, gfid = 
> f38c4ed2-a26a-4d83-adfd-6b0331831738 
> [2015-12-13 21:33:04.118800] I [MSGID: 109064] 
> [dht-layout.c:808:dht_layout_dir_mismatch] 0-FastVol-dht: subvol: 
> FastVol-client-1; inode layout - 2146817920 - 4294967295 - 1; disk layout -
> 0 - 2146817919 - 1 
> [2015-12-13 21:33:19.979507] I [MSGID: 109022] 
> [dht-rebalance.c:1290:dht_migrate_file] 0-FastVol-dht: completed migration 
> of 
> /for_ybest_fsdir/user/Weixin.oClDcjhe/Kn/hM/oHcPMp4hKq5Tq2ZQ/flag_finished 
> from subvolume FastVol-client-0 to FastVol-client-1 
> [2015-12-13 21:33:19.979459] I [MSGID: 109022] 
> [dht-rebalance.c:1290:dht_migrate_file] 0-FastVol-dht: completed migration 
> of /for_ybest_fsdir/user/Weixin.oClDcjhe/PU/ps/qUa-n38i8QBgeMdI/userPoint 
> from subvolume FastVol-client-0 to FastVol-client-1 
> [2015-12-13 21:33:25.543941] I [dht-rebalance.c:1010:dht_migrate_file] 
> 0-FastVol-dht: 
> /for_ybest_fsdir/user/Weixin.oClDcjhe/PU/ps/qUa-n38i8QBgeMdI/portrait_origin.jpg:
> attempting to move from FastVol-client-0 to FastVol-client-1 
> [2015-12-13 21:33:25.962547] I [dht-rebalance.c:1010:dht_migrate_file] 
> 0-FastVol-dht: 
> /for_ybest_fsdir/user/Weixin.oClDcjhe/PU/ps/qUa-n38i8QBgeMdI/portrait_small.jpg:
> attempting to move from FastVol-client-0 to FastVol-client-1 
> 
> 
> Cloudor 
> 
> 
> 
> From: Sakshi Bansal 
> Date: 2015-12-12 13:02 
> To: ?? 
> CC: gluster-users 
> Subject: Re: [Gluster-users] How to diagnose volume rebalance failure? 
> In the rebalance log file you can check the file/directory for which the 
> rebalance has failed. It may also mention the fop for which the failure
> happened. 
> 
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

PuYun

2015-Dec-15 13:08 UTC

[Gluster-users] How to diagnose volume rebalance failure?

Hi Susant,
    Thank you for your advice. I have set that parameter and started again in
lazy mode. I think it may not fail this time, because every failure so far has
happened after migration started. Another important point: a fix-layout
rebalance, which does no migration, was able to complete after running for
25 hours.

    Also, I can't find any errors in the rebalance log file using
"grep ' E '  FastVol-rebalance.log".

    Thank you.


PuYun
 