Okay, so it was fixed by killing Gluster and rebooting the node again.
--
Respectfully
Mahdi A. Mahdi
________________________________
From: gluster-users-bounces at gluster.org <gluster-users-bounces at gluster.org> on behalf of Mahdi Adnan <mahdi.adnan at outlook.com>
Sent: Wednesday, May 3, 2017 10:15:45 AM
To: gluster-users at gluster.org
Subject: [Gluster-users] Gluster long healing process
Hi,
I have a 4-node Gluster volume; each node has 24 SSD bricks and runs Gluster
3.8.10 (two volumes). I updated one of the nodes to 3.8.11 and rebooted it.
After it came back online the healing process started, and it has not finished.
It has been 24 hours and the healing is still going. gluster vol heal $VOL info
returns the number of entries that need healing, and that number decreases and
increases randomly.
The node is writing many gigabytes and I don't know if this is normal or if I'm
missing something.
Volume details:
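To put numbers on the "decreases and increases" part, the per-brick counts from the heal info output can be added up and compared between runs. Below is a minimal Python sketch, assuming the usual output layout (a "Brick host:/path" line, then the pending entries, then "Status:" and "Number of entries:" lines). The sample text and the entry paths in it are made up for illustration, and the function name is hypothetical.

```python
import re


def count_heal_entries(heal_info_output):
    """Return {brick: pending_entries} from `gluster volume heal VOL info` text.

    Assumes each brick section starts with "Brick <host>:<path>" and ends
    with "Number of entries: <n>". Sections whose count is not a number
    (for example a brick that is not connected) are skipped.
    """
    counts = {}
    brick = None
    for raw in heal_info_output.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif brick is not None:
            match = re.match(r"Number of entries:\s*(\d+)$", line)
            if match:
                counts[brick] = int(match.group(1))
                brick = None
    return counts


# Made-up sample in the assumed layout; the entry paths are placeholders.
SAMPLE = """\
Brick gluster01:/mnt/ovirt_disk1/ovirt_imgs
/example/path/one
/example/path/two
Status: Connected
Number of entries: 2

Brick gluster03:/mnt/ovirt_disk1/ovirt_imgs
Status: Connected
Number of entries: 0
"""

counts = count_heal_entries(SAMPLE)
total = sum(counts.values())
print(counts, total)
```

Running this against saved output every few minutes and logging the total would show whether the backlog is trending down overall or just oscillating.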
Volume Name: ovirt_imgs
Type: Distributed-Replicate
Volume ID: 40d1354b-8e85-4464-8c71-9e2efbe10a63
Status: Started
Snapshot Count: 0
Number of Bricks: 26 x 2 = 52
Transport-type: tcp
Bricks:
Brick1: gluster01:/mnt/ovirt_disk1/ovirt_imgs
Brick2: gluster03:/mnt/ovirt_disk1/ovirt_imgs
Brick3: gluster02:/mnt/ovirt_disk1/ovirt_imgs
Brick4: gluster04:/mnt/ovirt_disk1/ovirt_imgs
Brick5: gluster01:/mnt/ovirt_disk2/ovirt_imgs
Brick6: gluster03:/mnt/ovirt_disk2/ovirt_imgs
Brick7: gluster02:/mnt/ovirt_disk2/ovirt_imgs
Brick8: gluster04:/mnt/ovirt_disk2/ovirt_imgs
Brick9: gluster01:/mnt/ovirt_disk3/ovirt_imgs
Brick10: gluster03:/mnt/ovirt_disk3/ovirt_imgs
Brick11: gluster02:/mnt/ovirt_disk3/ovirt_imgs
Brick12: gluster04:/mnt/ovirt_disk3/ovirt_imgs
Brick13: gluster01:/mnt/ovirt_disk4/ovirt_imgs
Brick14: gluster03:/mnt/ovirt_disk4/ovirt_imgs
Brick15: gluster02:/mnt/ovirt_disk4/ovirt_imgs
Brick16: gluster04:/mnt/ovirt_disk4/ovirt_imgs
Brick17: gluster01:/mnt/ovirt_disk5/ovirt_imgs
Brick18: gluster03:/mnt/ovirt_disk5/ovirt_imgs
Brick19: gluster02:/mnt/ovirt_disk5/ovirt_imgs
Brick20: gluster04:/mnt/ovirt_disk5/ovirt_imgs
Brick21: gluster01:/mnt/ovirt_disk6/ovirt_imgs
Brick22: gluster03:/mnt/ovirt_disk6/ovirt_imgs
Brick23: gluster02:/mnt/ovirt_disk6/ovirt_imgs
Brick24: gluster04:/mnt/ovirt_disk6/ovirt_imgs
Brick25: gluster01:/mnt/ovirt_disk7/ovirt_imgs
Brick26: gluster03:/mnt/ovirt_disk7/ovirt_imgs
Brick27: gluster02:/mnt/ovirt_disk7/ovirt_imgs
Brick28: gluster04:/mnt/ovirt_disk7/ovirt_imgs
Brick29: gluster01:/mnt/ovirt_disk8/ovirt_imgs
Brick30: gluster03:/mnt/ovirt_disk8/ovirt_imgs
Brick31: gluster02:/mnt/ovirt_disk8/ovirt_imgs
Brick32: gluster04:/mnt/ovirt_disk8/ovirt_imgs
Brick33: gluster01:/mnt/ovirt_disk9/ovirt_imgs
Brick34: gluster03:/mnt/ovirt_disk9/ovirt_imgs
Brick35: gluster02:/mnt/ovirt_disk9/ovirt_imgs
Brick36: gluster04:/mnt/ovirt_disk9/ovirt_imgs
Brick37: gluster01:/mnt/ovirt_disk10/ovirt_imgs
Brick38: gluster03:/mnt/ovirt_disk10/ovirt_imgs
Brick39: gluster02:/mnt/ovirt_disk10/ovirt_imgs
Brick40: gluster04:/mnt/ovirt_disk10/ovirt_imgs
Brick41: gluster01:/mnt/ovirt_disk11/ovirt_imgs
Brick42: gluster03:/mnt/ovirt_disk11/ovirt_imgs
Brick43: gluster02:/mnt/ovirt_disk11/ovirt_imgs
Brick44: gluster04:/mnt/ovirt_disk11/ovirt_imgs
Brick45: gluster01:/mnt/ovirt_disk12/ovirt_imgs
Brick46: gluster03:/mnt/ovirt_disk12/ovirt_imgs
Brick47: gluster02:/mnt/ovirt_disk12/ovirt_imgs
Brick48: gluster04:/mnt/ovirt_disk12/ovirt_imgs
Brick49: gluster01:/mnt/ovirt_disk13/ovirt_imgs
Brick50: gluster03:/mnt/ovirt_disk13/ovirt_imgs
Brick51: gluster02:/mnt/ovirt_disk13/ovirt_imgs
Brick52: gluster04:/mnt/ovirt_disk13/ovirt_imgs
Options Reconfigured:
ganesha.enable: off
features.cache-invalidation: off
features.shard-block-size: 256MB
storage.owner-gid: 36
storage.owner-uid: 36
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: none
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.server-quorum-ratio: 51%
nfs-ganesha: enable
cluster.enable-shared-storage: enable
OS: CentOS 7.3 (latest).
Gluster heal log sample:
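For reading the log against the brick list: the log names bricks as ovirt_imgs-client-N and replica pairs as ovirt_imgs-replicate-M. The sketch below assumes that client-N is Brick N+1 in the list above and that each replica pair is two consecutive bricks in that order, which is consistent with the log sample (client-44 on /mnt/ovirt_disk12 belongs to replicate-22). The node order and disk count are copied from the brick list; the function names are hypothetical.

```python
REPLICA_COUNT = 2  # from "Number of Bricks: 26 x 2 = 52"
NODES_IN_ORDER = ["gluster01", "gluster03", "gluster02", "gluster04"]
DISKS = 13


def build_bricks():
    """Rebuild the brick list in the order shown in the volume details."""
    bricks = []
    for disk in range(1, DISKS + 1):
        for node in NODES_IN_ORDER:
            bricks.append(f"{node}:/mnt/ovirt_disk{disk}/ovirt_imgs")
    return bricks


def locate(client_index, bricks, replica=REPLICA_COUNT):
    """Map a client-N index to its brick, replicate-M index and pair partners.

    Assumption: client-N is the (N+1)-th brick, and replica sets are formed
    from consecutive bricks in list order.
    """
    replicate = client_index // replica
    first = replicate * replica
    partners = [i for i in range(first, first + replica) if i != client_index]
    return {
        "brick_number": client_index + 1,
        "brick": bricks[client_index],
        "replicate": replicate,
        "partner_bricks": [bricks[i] for i in partners],
    }


bricks = build_bricks()
info = locate(44, bricks)
print(info)
```

With this layout every replica pair spans either gluster01/gluster03 or gluster02/gluster04, so the replicate-M numbers in the log narrow down which pairs were affected by the rebooted node.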
[2017-05-03 07:01:29.487108] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
0-ovirt_imgs-client-45: changing port to 49571 (from 0)
[2017-05-03 07:01:29.489004] I [MSGID: 114020] [client.c:2356:notify]
0-ovirt_imgs-client-47: parent translators are ready, attempting connect on
transport
[2017-05-03 07:01:29.491077] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-44: Connected
to ovirt_imgs-client-44, attached to remote volume
'/mnt/ovirt_disk12/ovirt_imgs'.
[2017-05-03 07:01:29.491092] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-44: Server
and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.491123] I [MSGID: 108005] [afr-common.c:4387:afr_notify]
0-ovirt_imgs-replicate-22: Subvolume 'ovirt_imgs-client-44' came back
up; going online.
[2017-05-03 07:01:29.491173] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-44:
Server lk version = 1
[2017-05-03 07:01:29.491280] I [MSGID: 114057]
[client-handshake.c:1440:select_server_supported_programs]
0-ovirt_imgs-client-45: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2017-05-03 07:01:29.491331] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
0-ovirt_imgs-client-46: changing port to 49521 (from 0)
[2017-05-03 07:01:29.493119] I [MSGID: 114020] [client.c:2356:notify]
0-ovirt_imgs-client-48: parent translators are ready, attempting connect on
transport
[2017-05-03 07:01:29.495480] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-45: Connected
to ovirt_imgs-client-45, attached to remote volume
'/mnt/ovirt_disk12/ovirt_imgs'.
[2017-05-03 07:01:29.495496] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-45: Server
and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.495670] I [MSGID: 114057]
[client-handshake.c:1440:select_server_supported_programs]
0-ovirt_imgs-client-46: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2017-05-03 07:01:29.495729] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-45:
Server lk version = 1
[2017-05-03 07:01:29.495798] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
0-ovirt_imgs-client-47: changing port to 49465 (from 0)
[2017-05-03 07:01:29.497438] I [MSGID: 114020] [client.c:2356:notify]
0-ovirt_imgs-client-49: parent translators are ready, attempting connect on
transport
[2017-05-03 07:01:29.499871] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-46: Connected
to ovirt_imgs-client-46, attached to remote volume
'/mnt/ovirt_disk12/ovirt_imgs'.
[2017-05-03 07:01:29.499887] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-46: Server
and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.499915] I [MSGID: 108005] [afr-common.c:4387:afr_notify]
0-ovirt_imgs-replicate-23: Subvolume 'ovirt_imgs-client-46' came back
up; going online.
[2017-05-03 07:01:29.500015] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-46:
Server lk version = 1
[2017-05-03 07:01:29.500032] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
0-ovirt_imgs-client-48: changing port to 49645 (from 0)
[2017-05-03 07:01:29.500052] I [MSGID: 114057]
[client-handshake.c:1440:select_server_supported_programs]
0-ovirt_imgs-client-47: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2017-05-03 07:01:29.501776] I [MSGID: 114020] [client.c:2356:notify]
0-ovirt_imgs-client-50: parent translators are ready, attempting connect on
transport
[2017-05-03 07:01:29.504191] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-47: Connected
to ovirt_imgs-client-47, attached to remote volume
'/mnt/ovirt_disk12/ovirt_imgs'.
[2017-05-03 07:01:29.504208] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-47: Server
and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.504313] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-47:
Server lk version = 1
[2017-05-03 07:01:29.504330] I [MSGID: 114057]
[client-handshake.c:1440:select_server_supported_programs]
0-ovirt_imgs-client-48: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2017-05-03 07:01:29.504462] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
0-ovirt_imgs-client-49: changing port to 49572 (from 0)
[2017-05-03 07:01:29.506374] I [MSGID: 114020] [client.c:2356:notify]
0-ovirt_imgs-client-51: parent translators are ready, attempting connect on
transport
[2017-05-03 07:01:29.508431] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-48: Connected
to ovirt_imgs-client-48, attached to remote volume
'/mnt/ovirt_disk13/ovirt_imgs'.
[2017-05-03 07:01:29.508456] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-48: Server
and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.508498] I [MSGID: 108005] [afr-common.c:4387:afr_notify]
0-ovirt_imgs-replicate-24: Subvolume 'ovirt_imgs-client-48' came back
up; going online.
[2017-05-03 07:01:29.508556] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-48:
Server lk version = 1
[2017-05-03 07:01:29.508603] I [MSGID: 114057]
[client-handshake.c:1440:select_server_supported_programs]
0-ovirt_imgs-client-49: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2017-05-03 07:01:29.508725] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
0-ovirt_imgs-client-50: changing port to 49522 (from 0)
[2017-05-03 07:01:29.510779] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-49: Connected
to ovirt_imgs-client-49, attached to remote volume
'/mnt/ovirt_disk13/ovirt_imgs'.
[2017-05-03 07:01:29.510796] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-49: Server
and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.510903] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-49:
Server lk version = 1
[2017-05-03 07:01:29.511062] I [rpc-clnt.c:1965:rpc_clnt_reconfig]
0-ovirt_imgs-client-51: changing port to 49466 (from 0)
[2017-05-03 07:01:29.512828] I [MSGID: 114057]
[client-handshake.c:1440:select_server_supported_programs]
0-ovirt_imgs-client-50: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2017-05-03 07:01:29.513197] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-50: Connected
to ovirt_imgs-client-50, attached to remote volume
'/mnt/ovirt_disk13/ovirt_imgs'.
[2017-05-03 07:01:29.513214] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-50: Server
and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.513236] I [MSGID: 108005] [afr-common.c:4387:afr_notify]
0-ovirt_imgs-replicate-25: Subvolume 'ovirt_imgs-client-50' came back
up; going online.
[2017-05-03 07:01:29.513314] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-50:
Server lk version = 1
[2017-05-03 07:01:29.515127] I [MSGID: 114057]
[client-handshake.c:1440:select_server_supported_programs]
0-ovirt_imgs-client-51: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2017-05-03 07:01:29.515520] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-51: Connected
to ovirt_imgs-client-51, attached to remote volume
'/mnt/ovirt_disk13/ovirt_imgs'.
[2017-05-03 07:01:29.515530] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-51: Server
and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:29.515628] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-51:
Server lk version = 1
[2017-05-03 07:01:30.009624] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-ovirt_imgs-client-40: Connected
to ovirt_imgs-client-40, attached to remote volume
'/mnt/ovirt_disk11/ovirt_imgs'.
[2017-05-03 07:01:30.009653] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-ovirt_imgs-client-40: Server
and Client lk-version numbers are not same, reopening the fds
[2017-05-03 07:01:30.234722] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-ovirt_imgs-client-40:
Server lk version = 1
[2017-05-03 07:01:30.235633] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-0: selecting
local read_child ovirt_imgs-client-0
[2017-05-03 07:01:30.236983] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-2: selecting
local read_child ovirt_imgs-client-4
[2017-05-03 07:01:30.237492] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-4: selecting
local read_child ovirt_imgs-client-8
[2017-05-03 07:01:30.238310] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-6: selecting
local read_child ovirt_imgs-client-12
[2017-05-03 07:01:30.238553] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-8: selecting
local read_child ovirt_imgs-client-16
[2017-05-03 07:01:30.238670] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-10: selecting
local read_child ovirt_imgs-client-20
[2017-05-03 07:01:30.238791] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-12: selecting
local read_child ovirt_imgs-client-24
[2017-05-03 07:01:30.238881] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-14: selecting
local read_child ovirt_imgs-client-28
[2017-05-03 07:01:30.238961] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-16: selecting
local read_child ovirt_imgs-client-32
[2017-05-03 07:01:30.239014] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-18: selecting
local read_child ovirt_imgs-client-36
[2017-05-03 07:01:30.239100] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-22: selecting
local read_child ovirt_imgs-client-44
[2017-05-03 07:01:30.239140] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-20: selecting
local read_child ovirt_imgs-client-40
[2017-05-03 07:01:30.239150] I [MSGID: 104041]
[glfs-resolve.c:885:__glfs_active_subvol] 0-ovirt_imgs: switched to graph
676c7573-7465-7230-312d-31333836322d (0)
[2017-05-03 07:01:30.239200] I [MSGID: 108031]
[afr-common.c:2157:afr_local_discovery_cbk] 0-ovirt_imgs-replicate-24: selecting
local read_child ovirt_imgs-client-48
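The sample above is wrapped by the mail client, so one log record spans several lines. A minimal Python sketch for rejoining the records and listing the "came back up" events (MSGID 108005) follows. It assumes every record starts with a bracketed timestamp; the two sample records are copied from the log above, and the function names are hypothetical.

```python
import re

# A record is assumed to start with a timestamp like [2017-05-03 07:01:29.491123].
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\]")
CAME_BACK = re.compile(
    r"\[MSGID: 108005\].*? 0-(\S+): Subvolume '([^']+)' came back up"
)


def join_wrapped(lines):
    """Merge continuation lines into the record that precedes them."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def came_back_events(records):
    """Return (replicate_subvolume, client) pairs for MSGID 108005 records."""
    events = []
    for record in records:
        match = CAME_BACK.search(record)
        if match:
            events.append((match.group(1), match.group(2)))
    return events


SAMPLE = """\
[2017-05-03 07:01:29.489004] I [MSGID: 114020] [client.c:2356:notify]
0-ovirt_imgs-client-47: parent translators are ready, attempting connect on
transport
[2017-05-03 07:01:29.491123] I [MSGID: 108005] [afr-common.c:4387:afr_notify]
0-ovirt_imgs-replicate-22: Subvolume 'ovirt_imgs-client-44' came back
up; going online.
"""

records = join_wrapped(SAMPLE.splitlines())
events = came_back_events(records)
print(events)
```

All records in this sample are informational (level I) connection messages, so a filter like this mainly helps confirm which subvolumes reconnected and when.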
I appreciate the help.
Thanks
--
Respectfully
Mahdi A. Mahdi