thr3ads.net - Gluster users - [Gluster-users] How to diagnose volume rebalance failure? [Dec 2015]

蒲云

2015-Dec-12 02:32 UTC

[Gluster-users] How to diagnose volume rebalance failure?

Hi,
I have tried to rebalance my volume several times, but every attempt failed.
The Gluster version is 3.7.4.

Status and info:
[root@d001 ~]# gluster volume rebalance FastVol status
     Node  Rebalanced-files     size  scanned  failures  skipped  status  run time in secs
---------  ----------------  -------  -------  --------  -------  ------  ----------------
localhost             51251  553.4MB  3422092         0        0  failed          13211.00
volume rebalance: FastVol: success:

[root@d001 ~]# gluster volume status
Status of volume: FastVol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick d001:/mnt/c1/brick                    49154     0          Y       32111
Brick d001:/mnt/b1/brick                    49155     0          Y       24557
Task Status of Volume FastVol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : cf1b25a0-4e33-4abf-9bb9-64cfd7bad115
Status               : failed

[root@d001 ~]# gluster volume info
Volume Name: FastVol
Type: Distribute
Volume ID: dbee250a-e3fe-4448-b905-b76c5ba80b25
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: d001:/mnt/c1/brick
Brick2: d001:/mnt/b1/brick
Options Reconfigured:
nfs.disable: true
auth.allow: 127.0.0.1,10.*


I checked FastVol-rebalance.log and found no errors when searching for " E ",
but searching for " W " turned up some warnings:
[2015-12-11 15:51:57.402661] W [MSGID: 109009]
[dht-common.c:569:dht_lookup_dir_cbk] 0-FastVol-dht:
/for_ybest_fsdir/user/Weixin.oClDcji6/7g/yg/3LAF3uXtLlQndyFA: gfid different on
FastVol-client-1. gfid local = 393d4a1a-20b8-49b2-0000-000000000000, gfid subvol
= 393d4a1a-20b8-49b2-8f79-cb17472579e2
[2015-12-11 15:52:12.071984] W [MSGID: 109009]
[dht-common.c:569:dht_lookup_dir_cbk] 0-FastVol-dht:
/for_ybest_fsdir/user/Weixin.oClDcji6/ZA: gfid different on FastVol-client-1.
gfid local = 5de2d8a9-954a-437a-8a4f-fe6ab30b646d, gfid subvol =
5de2d8a9-954a-437a-8a4f-fe6ab30b646d
[2015-12-11 16:04:24.346027] W [MSGID: 109009]
[dht-common.c:569:dht_lookup_dir_cbk] 0-FastVol-dht:
/for_ybest_fsdir/user/Weixin.oClDcjtd/2q: gfid different on FastVol-client-1.
gfid local = 49c3a238-c204-4b05-0000-000000000000, gfid subvol =
49c3a238-c204-4b05-85ea-9a400044def6
[2015-12-11 17:55:46.232418] W [MSGID: 109009]
[dht-common.c:569:dht_lookup_dir_cbk] 0-FastVol-dht:
/for_ybest_fsdir/user/li/ur/on/gzhi/linkwrap/49138: gfid different on
FastVol-client-1. gfid local = ae68fd66-36c8-4bd7-0000-000000000000, gfid subvol
= ae68fd66-36c8-4bd7-a183-94390fb5704c
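
Warnings like these can be tabulated with a short script. The following is only a minimal sketch in Python, assuming each log entry sits on a single line (as it does in the log file itself, before mail wrapping). The names `parse_gfid_warnings` and `second_half_zero` are hypothetical helpers written for this example and are not part of Gluster. The check for a zeroed second half only reports what is visible in the warnings above; it is not a diagnosis.

```python
import re

# Matches DHT "gfid different" warnings (MSGID 109009), one entry per line.
GFID_WARNING = re.compile(
    r"\[(?P<ts>[^\]]+)\] W \[MSGID: 109009\] .*?: "
    r"(?P<path>/.+?): gfid different on (?P<subvol>\S+)\. "
    r"gfid local = (?P<local>[0-9a-f-]{36}), "
    r"gfid subvol = (?P<remote>[0-9a-f-]{36})"
)


def parse_gfid_warnings(lines):
    """Yield (path, subvol, local_gfid, subvol_gfid) for each matching line."""
    for line in lines:
        m = GFID_WARNING.search(line)
        if m:
            yield (m.group("path"), m.group("subvol"),
                   m.group("local"), m.group("remote"))


def second_half_zero(gfid):
    """True if the last 8 bytes of the gfid are all zero."""
    return gfid.endswith("0000-000000000000")


# Two entries copied from the warnings above, joined onto one line each.
SAMPLE = [
    "[2015-12-11 15:51:57.402661] W [MSGID: 109009] "
    "[dht-common.c:569:dht_lookup_dir_cbk] 0-FastVol-dht: "
    "/for_ybest_fsdir/user/Weixin.oClDcji6/7g/yg/3LAF3uXtLlQndyFA: "
    "gfid different on FastVol-client-1. "
    "gfid local = 393d4a1a-20b8-49b2-0000-000000000000, "
    "gfid subvol = 393d4a1a-20b8-49b2-8f79-cb17472579e2",
    "[2015-12-11 15:52:12.071984] W [MSGID: 109009] "
    "[dht-common.c:569:dht_lookup_dir_cbk] 0-FastVol-dht: "
    "/for_ybest_fsdir/user/Weixin.oClDcji6/ZA: "
    "gfid different on FastVol-client-1. "
    "gfid local = 5de2d8a9-954a-437a-8a4f-fe6ab30b646d, "
    "gfid subvol = 5de2d8a9-954a-437a-8a4f-fe6ab30b646d",
]

if __name__ == "__main__":
    for path, subvol, local, remote in parse_gfid_warnings(SAMPLE):
        note = "local gfid has zeroed second half" if second_half_zero(local) else ""
        same = "identical" if local == remote else "differ"
        print(path, subvol, same, note)
```

To run it against the real log, replace `SAMPLE` with the open log file object; iterating over a file yields one line at a time.
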


I also checked etc-glusterfs-glusterd.vol.log, but found no errors or warnings
logged after the rebalance task started. The latest lines in
etc-glusterfs-glusterd.vol.log are:
[2015-12-11 16:03:26.709198] W [socket.c:588:__socket_rwv] 0-nfs: readv on
/var/run/gluster/b87982e05d7252cd3efe66bb7c634115.socket failed (Invalid
argument)
[2015-12-11 16:03:29.709626] W [socket.c:588:__socket_rwv] 0-nfs: readv on
/var/run/gluster/b87982e05d7252cd3efe66bb7c634115.socket failed (Invalid
argument)
[2015-12-11 16:03:30.315759] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2015-12-11 16:03:30.318867] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already
stopped
[2015-12-11 16:03:30.323944] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already
stopped
[2015-12-11 16:03:30.326917] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2015-12-11 16:03:30.329868] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2015-12-11 16:03:30.371050] I [run.c:190:runner_log] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7f819756863b] (-->
/usr/lib64/libglusterfs.so.0(runner_log+0x105)[0x7f81975bd5a5] (-->
/usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x4cc)[0x7f818c027cbc]
(-->
/usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(+0xeefd2)[0x7f818c027fd2]
(--> /lib64/libpthread.so.0(+0x79d1)[0x7f81966509d1] ))))) 0-management: Ran
script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=FastVol -o
nfs.disable=on --gd-workdir=/var/lib/glusterd
[2015-12-11 16:03:30.389063] I [run.c:190:runner_log] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7f819756863b] (-->
/usr/lib64/libglusterfs.so.0(runner_log+0x105)[0x7f81975bd5a5] (-->
/usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x4cc)[0x7f818c027cbc]
(-->
/usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(+0xeefd2)[0x7f818c027fd2]
(--> /lib64/libpthread.so.0(+0x79d1)[0x7f81966509d1] ))))) 0-management: Ran
script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh
--volname=FastVol -o nfs.disable=on --gd-workdir=/var/lib/glusterd
The message "I [MSGID: 106006]
[glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has
disconnected from glusterd." repeated 39 times between [2015-12-11
16:01:32.695911] and [2015-12-11 16:03:29.709689]
[2015-12-11 16:05:44.813587] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already
stopped
[2015-12-11 16:05:44.823077] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already
stopped
[2015-12-11 16:05:44.825986] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2015-12-11 16:05:44.829007] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2015-12-11 16:05:44.865623] I [run.c:190:runner_log] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7f819756863b] (-->
/usr/lib64/libglusterfs.so.0(runner_log+0x105)[0x7f81975bd5a5] (-->
/usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x4cc)[0x7f818c027cbc]
(-->
/usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(+0xeefd2)[0x7f818c027fd2]
(--> /lib64/libpthread.so.0(+0x79d1)[0x7f81966509d1] ))))) 0-management: Ran
script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=FastVol -o
nfs.disable=true --gd-workdir=/var/lib/glusterd
[2015-12-11 16:05:44.873447] I [run.c:190:runner_log] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7f819756863b] (-->
/usr/lib64/libglusterfs.so.0(runner_log+0x105)[0x7f81975bd5a5] (-->
/usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x4cc)[0x7f818c027cbc]
(-->
/usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(+0xeefd2)[0x7f818c027fd2]
(--> /lib64/libpthread.so.0(+0x79d1)[0x7f81966509d1] ))))) 0-management: Ran
script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh
--volname=FastVol -o nfs.disable=true --gd-workdir=/var/lib/glusterd
[2015-12-11 19:26:37.779065] W [socket.c:588:__socket_rwv] 0-management: readv
on /var/run/gluster/gluster-rebalance-dbee250a-e3fe-4448-b905-b76c5ba80b25.sock
failed (No data available)
[2015-12-11 19:26:38.220385] I [MSGID: 106007]
[glusterd-rebalance.c:162:__glusterd_defrag_notify] 0-management: Rebalance
process for volume FastVol has disconnected.
[2015-12-11 19:26:38.220446] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy]
0-management: size=588 max=1 total=1235
[2015-12-11 19:26:38.220462] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy]
0-management: size=124 max=1 total=1235
[2015-12-12 01:11:13.920354] I [MSGID: 106488]
[glusterd-handler.c:1463:__glusterd_handle_cli_get_volume] 0-glusterd: Received
get vol req
[2015-12-12 01:11:34.302028] I [MSGID: 106499]
[glusterd-handler.c:4258:__glusterd_handle_status_volume] 0-management: Received
status volume req for volume FastVol

Thanks



Cloudor

Sakshi Bansal

2015-Dec-12 05:02 UTC

In the rebalance log file you can find the file or directory for which the
rebalance failed. The log entry may also mention the fop (file operation) for
which the failure happened.
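
One way to locate such entries is to group the warning and error lines of the log by the function that emitted them. The following is a minimal sketch in Python, not an official Gluster tool. It assumes the log line layout seen earlier in this thread (`[timestamp] LEVEL [MSGID: n] [file:line:function] ...`, with the MSGID part optional) and one entry per line. The name `summarize` is a hypothetical helper, and the log path mentioned in the usage note is an assumption to adjust for your installation.

```python
import re
from collections import Counter

# One log entry per line: [timestamp] LEVEL [MSGID: n] [file:line:function] rest
ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\] (?P<rest>.*)$"
)


def summarize(lines, levels=("E", "W")):
    """Count entries of the given severity levels per (level, function)."""
    counts = Counter()
    for line in lines:
        m = ENTRY.match(line.rstrip("\n"))
        if m and m.group("level") in levels:
            function = m.group("src").split(":")[-1]
            counts[(m.group("level"), function)] += 1
    return counts


# Entries taken from the logs quoted in this thread, one per line.
SAMPLE = [
    "[2015-12-11 19:26:37.779065] W [socket.c:588:__socket_rwv] 0-management: "
    "readv on /var/run/gluster/gluster-rebalance-dbee250a-e3fe-4448-b905-"
    "b76c5ba80b25.sock failed (No data available)",
    "[2015-12-11 19:26:38.220385] I [MSGID: 106007] "
    "[glusterd-rebalance.c:162:__glusterd_defrag_notify] 0-management: "
    "Rebalance process for volume FastVol has disconnected.",
    "[2015-12-11 16:04:24.346027] W [MSGID: 109009] "
    "[dht-common.c:569:dht_lookup_dir_cbk] 0-FastVol-dht: "
    "/for_ybest_fsdir/user/Weixin.oClDcjtd/2q: gfid different on "
    "FastVol-client-1. gfid local = 49c3a238-c204-4b05-0000-000000000000, "
    "gfid subvol = 49c3a238-c204-4b05-85ea-9a400044def6",
]

if __name__ == "__main__":
    for (level, function), n in summarize(SAMPLE).most_common():
        print(level, function, n)
```

For the real log, pass an open file instead of `SAMPLE`, for example `summarize(open("/var/log/glusterfs/FastVol-rebalance.log"))`, where the path is assumed and may differ on your system. The messages of the most frequent functions are then the first places to look for the failing path and operation.
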