Nithya,
Is there any way to increase the logging level of the brick? There is
nothing obvious (to me) in the log (see below for the same time period as
the latest rebalance failure). This is the only brick on that server that
has disconnects like this.
Steve
[2017-10-17 02:22:13.453575] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from node-dc4-03-5825-2017/08/30-20:45:55:170091-video-client-4-2-318
(version: 3.8.15)
[2017-10-17 02:22:31.353286] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-403
[2017-10-17 02:22:31.353326] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-video-server: Shutting down
connection node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-403
[2017-10-17 02:22:42.288856] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404
(version: 3.8.13)
[2017-10-17 02:29:04.889303] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404
[2017-10-17 02:29:04.889347] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-video-server: Shutting down
connection node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404
[2017-10-17 02:29:15.327604] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-405
(version: 3.8.13)
[2017-10-17 02:33:30.745314] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-594
[2017-10-17 02:33:30.745360] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
[2017-10-17 02:33:30.745396] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-video-server: Shutting down
connection node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-594
[2017-10-17 02:33:41.563748] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-595
(version: 3.8.13)
[2017-10-17 02:36:43.833304] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-595
[2017-10-17 02:36:43.833342] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
[2017-10-17 02:36:43.833371] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-video-server: Shutting down
connection node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-595
[2017-10-17 02:36:54.569836] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-596
(version: 3.8.13)
[2017-10-17 02:38:16.697306] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-596
[2017-10-17 02:38:16.697370] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
[2017-10-17 02:38:16.697432] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-video-server: Shutting down
connection node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-596
[2017-10-17 02:38:34.591506] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-597
(version: 3.8.13)
[2017-10-17 02:55:56.473306] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-19
[2017-10-17 02:55:56.473366] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-video-server: Shutting down
connection titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-19
[2017-10-17 02:56:07.161790] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-20
(version: 3.8.8)
[2017-10-17 03:15:13.529281] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-597
[2017-10-17 03:15:13.529330] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
[2017-10-17 03:15:13.529400] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-video-server: Shutting down
connection node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-597
[2017-10-17 03:15:41.764247] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-598
(version: 3.8.13)
[2017-10-17 03:20:28.921396] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-0
[2017-10-17 03:20:28.921498] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-video-server: Shutting down
connection node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-0
[2017-10-17 03:20:39.348678] I [login.c:76:gf_auth] 0-auth/login:
allowed user names: be603ada-6523-44d3-a900-zzzzzzzzzzzz
[2017-10-17 03:20:39.348909] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-1
(version: 3.8.7)
[2017-10-17 03:27:18.385374] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-1
[2017-10-17 03:27:18.385423] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-video-server: Shutting down
connection node-dc3-02-15013-2017/10/14-18:04:51:499320-video-client-4-0-1
[2017-10-17 03:31:47.325285] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-598
[2017-10-17 03:31:47.325340] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
[2017-10-17 03:31:47.325384] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-video-server: Shutting down
connection node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-598
[2017-10-17 03:32:00.855905] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-599
(version: 3.8.13)
[2017-10-17 03:33:23.001337] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-599
[2017-10-17 03:33:23.001400] I [MSGID: 115013]
[server-helpers.c:293:do_fd_cleanup] 0-video-server: fd cleanup on /xx
[2017-10-17 03:33:23.001450] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-video-server: Shutting down
connection node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-599
[2017-10-17 03:33:33.860452] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from node-dc4-01-6174-2017/07/13-10:46:48:503667-video-client-4-7-600
(version: 3.8.13)
[2017-10-17 03:54:05.433317] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-405
[2017-10-17 03:54:05.433353] I [MSGID: 101055]
[client_t.c:415:gf_client_unref] 0-video-server: Shutting down
connection node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-405
[2017-10-17 03:54:15.739343] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-406
(version: 3.8.13)
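
To see which clients account for the disconnects in an excerpt like the one above, the records can be tallied per client host. This is only a rough Python sketch, not part of any Gluster tooling. The names `join_records` and `disconnects_per_host` are made up for illustration. It assumes that every record starts with a bracketed timestamp, that MSGID 115036 marks a disconnect, and that client ids have the form host-pid-date-..., as they appear in this excerpt.

```python
import re
from collections import Counter

# Assumption: every log record begins with a "[YYYY-MM-DD HH:MM:SS.ffffff]" stamp.
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\]")
# Assumption: MSGID 115036 is the "disconnecting connection from <client-id>" record.
DISCONNECT = re.compile(r"MSGID: 115036\].*disconnecting connection from (\S+)")
# Assumption: client ids look like "<host>-<pid>-<yyyy>/<mm>/<dd>-...".
CLIENT_HOST = re.compile(r"^(.+)-\d+-\d{4}/\d{2}/\d{2}-")


def join_records(lines):
    """Re-join log records that the mail client wrapped across several lines."""
    record = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) and record:
            yield " ".join(record)
            record = []
        record.append(line)
    if record:
        yield " ".join(record)


def disconnects_per_host(lines):
    """Count disconnect records per client host name."""
    counts = Counter()
    for record in join_records(lines):
        match = DISCONNECT.search(record)
        if not match:
            continue
        client_id = match.group(1)
        host = CLIENT_HOST.match(client_id)
        # Fall back to the full client id if the host cannot be split off.
        counts[host.group(1) if host else client_id] += 1
    return counts


# A few records copied from the excerpt above, wrapped as in the mail.
SAMPLE = """\
[2017-10-17 02:22:31.353286] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-403
[2017-10-17 02:22:42.288856] I [MSGID: 115029]
[server-handshake.c:692:server_setvolume] 0-video-server: accepted
client from node-dc4-02-29040-2017/08/04-09:31:22:842268-video-client-4-7-404
(version: 3.8.13)
[2017-10-17 02:55:56.473306] I [MSGID: 115036]
[server.c:548:server_rpc_notify] 0-video-server: disconnecting
connection from
titan-17527-2017/09/18-19:57:41:611709-video-client-4-0-19
"""

if __name__ == "__main__":
    for host, count in disconnects_per_host(SAMPLE.splitlines()).most_common():
        print(host, count)
```

Run against the whole brick log (for example by passing an open file object instead of `SAMPLE.splitlines()`), this would show whether the disconnects are spread across all clients or concentrated on a few.
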
On 17 October 2017 at 10:26, Nithya Balachandran <nbalacha at redhat.com>
wrote:
>
>
> On 17 October 2017 at 14:48, Stephen Remde <stephen.remde at gaist.co.uk>
> wrote:
>
>> Hi,
>>
>>
>> I have a rebalance that has failed on one peer twice now. Rebalance logs
>> are below (directories anonymised and some irrelevant log lines cut). It
>> looks like it loses connection to the brick, but immediately stops the
>> rebalance on that peer instead of waiting for reconnection, which happens
>> roughly ten seconds later.
>> Is this normal behaviour? So far it has been the same server and the same
>> (remote) brick.
>>
>>
>> The brick shows a high number of disconnects compared to the other
>> bricks on the same server:
>>
>>
>> ./export-md0-brick.log.1 2
>> ./export-md1-brick.log.1 2
>> ./export-md2-brick.log.1 181
>> ./export-md3-brick.log.1 2
>>
>>
>> Any clues? What could be causing this? There is nothing in the log to
>> indicate the cause.
>>
> The rebalance process requires all DHT child subvols to be up for the
> duration of the operation, because it needs to reapply the directory
> layouts. As this is a pure distribute volume, even a single brick getting
> disconnected is enough to cause the process to stop.
>
> You would need to figure out why that brick is disconnecting so often. The
> brick logs might help with that.
>
> Regards,
> Nithya
>
>
>>
>> Steve
>>
>>
>> gluster volume info video
>>
>> Volume Name: video
>> Type: Distribute
>> Volume ID: ccdac37f-9b0e-415f-b62e-9071d8168199
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 9
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.0.0.31:/export/md0/brick
>> Brick2: 10.0.0.32:/export/md0/brick
>> Brick3: 10.0.0.31:/export/md1/brick
>> Brick4: 10.0.0.32:/export/md1/brick
>> Brick5: 10.0.0.31:/export/md2/brick
>> Brick6: 10.0.0.32:/export/md2/brick
>> Brick7: 10.0.0.31:/export/md3/brick
>> Brick8: 10.0.0.32:/export/md3/brick
>> Brick9: 10.0.0.33:/export/md0/brick
>> Options Reconfigured:
>> network.ping-timeout: 10
>> cluster.min-free-disk: 1%
>> transport.address-family: inet
>> performance.readdir-ahead: on
>> nfs.disable: on
>> cluster.rebal-throttle: lazy
>>
>> [2017-10-12 23:00:55.099153] W [socket.c:590:__socket_rwv]
0-video-client-4: readv on 10.0.0.31:49164 failed (Connection reset by peer)
>> [2017-10-12 23:00:55.099709] I [MSGID: 114018]
[client.c:2280:client_rpc_notify] 0-video-client-4: disconnected from
video-client-4. Client process will keep trying to connect to glusterd until
brick's port is available
>> [2017-10-12 23:00:55.099741] W [MSGID: 109073]
[dht-common.c:8839:dht_notify] 0-video-dht: Received CHILD_DOWN. Exiting
>> [2017-10-12 23:00:55.099752] I [MSGID: 109029]
[dht-rebalance.c:4195:gf_defrag_stop] 0-: Received stop command on rebalance
>> [2017-10-12 23:01:05.478462] I [rpc-clnt.c:1947:rpc_clnt_reconfig]
0-video-client-4: changing port to 49164 (from 0)
>> [2017-10-12 23:01:05.481180] I [MSGID: 114057]
[client-handshake.c:1446:select_server_supported_programs] 0-video-client-4:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
>> [2017-10-12 23:01:05.482630] I [MSGID: 114046]
[client-handshake.c:1222:client_setvolume_cbk] 0-video-client-4: Connected to
video-client-4, attached to remote volume '/export/md2/brick'.
>> [2017-10-12 23:01:05.482659] I [MSGID: 114047]
[client-handshake.c:1233:client_setvolume_cbk] 0-video-client-4: Server and
Client lk-version numbers are not same, reopening the fds
>> [2017-10-12 23:01:05.483365] I [MSGID: 114035]
[client-handshake.c:201:client_set_lk_version_cbk] 0-video-client-4: Server lk
version = 1
>> [2017-10-12 23:01:30.310089] I
[dht-rebalance.c:2819:gf_defrag_process_dir] 0-DHT: Found critical error from
gf_defrag_get_entry
>> [2017-10-12 23:01:30.310166] E [MSGID: 109111]
[dht-rebalance.c:3090:gf_defrag_fix_layout] 0-video-dht: gf_defrag_process_dir
failed for directory: /y/y/y/y/y
>> [2017-10-12 23:01:30.380574] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for
/y/y/y/y/y
>> [2017-10-12 23:01:30.380756] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for
/y/y/y/y
>> [2017-10-12 23:01:30.380879] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for
/y/y/y
>> [2017-10-12 23:01:30.380965] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for
/y/y
>> [2017-10-12 23:03:09.285157] W [glusterfsd.c:1327:cleanup_and_exit]
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f112b6d16ba]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55b325019545]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55b3250193b4] ) 0-: received
signum (15), shutting down
>>
>> [2017-10-17 03:20:28.921512] W [socket.c:590:__socket_rwv]
0-video-client-4: readv on 10.0.0.31:49164 failed (Connection reset by peer)
>> [2017-10-17 03:20:28.921554] I [MSGID: 114018]
[client.c:2280:client_rpc_notify] 0-video-client-4: disconnected from
video-client-4. Client process will keep trying to connect to glusterd until
brick's port is available
>> [2017-10-17 03:20:28.921570] W [MSGID: 109073]
[dht-common.c:8839:dht_notify] 0-video-dht: Received CHILD_DOWN. Exiting
>> [2017-10-17 03:20:28.921578] I [MSGID: 109029]
[dht-rebalance.c:4195:gf_defrag_stop] 0-: Received stop command on rebalance
>> [2017-10-17 03:20:39.344417] I [rpc-clnt.c:1947:rpc_clnt_reconfig]
0-video-client-4: changing port to 49164 (from 0)
>> [2017-10-17 03:20:39.347440] I [MSGID: 114057]
[client-handshake.c:1446:select_server_supported_programs] 0-video-client-4:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
>> [2017-10-17 03:20:39.349244] I [MSGID: 114046]
[client-handshake.c:1222:client_setvolume_cbk] 0-video-client-4: Connected to
video-client-4, attached to remote volume '/export/md2/brick'.
>> [2017-10-17 03:20:39.349261] I [MSGID: 114047]
[client-handshake.c:1233:client_setvolume_cbk] 0-video-client-4: Server and
Client lk-version numbers are not same, reopening the fds
>> [2017-10-17 03:20:39.350611] I [MSGID: 114035]
[client-handshake.c:201:client_set_lk_version_cbk] 0-video-client-4: Server lk
version = 1
>> [2017-10-17 03:27:17.231133] I
[dht-rebalance.c:2819:gf_defrag_process_dir] 0-DHT: Found critical error from
gf_defrag_get_entry
>> [2017-10-17 03:27:17.231214] E [MSGID: 109111]
[dht-rebalance.c:3090:gf_defrag_fix_layout] 0-video-dht: gf_defrag_process_dir
failed for directory: /x/x/x/x/x
>> [2017-10-17 03:27:17.562481] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for
/x/x/x/x/x
>> [2017-10-17 03:27:17.562619] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for
/x/x/x/x
>> [2017-10-17 03:27:17.562726] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for
/x/x/x
>> [2017-10-17 03:27:17.562810] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout failed for
/x/x
>> [2017-10-17 03:27:18.379825] W [glusterfsd.c:1327:cleanup_and_exit]
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f700b9696ba]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55f9c0022545]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55f9c00223b4] ) 0-: received
signum (15), shutting down
>>
>>
>>
>>
>
>
--
Dr Stephen Remde
Director, Innovation and Research
T: 01535 280066
M: 07764 740920
E: stephen.remde at gaist.co.uk
W: www.gaist.co.uk