Roman
2014-Nov-06 11:12 UTC
[Gluster-users] healing never ends (or never starts?) on replicated volume with virtual block device
Hi,

Another stupid/interesting situation:

root@stor1:~# gluster volume heal HA-WIN-TT-1T info
Brick stor1:/exports/NFS-WIN/1T/
/disk - Possibly undergoing heal
Number of entries: 1

Brick stor2:/exports/NFS-WIN/1T/
/test
/disk - Possibly undergoing heal
Number of entries: 2

For testing I brought the stor1 port on the switch down and then back up. After that, one of the volumes (the one holding virtual machines) was restored and healed successfully, while the other one still claims (about 2 hours by now) that a heal is in progress, even though there is no traffic between the servers or between client and server.

/test is just a new file I created while stor1 was down.
/disk is a simple virtual block device made of /dev/null, 900 GB in size, which is mounted on a Windows server via iscsitarget :). It looks like it will keep healing forever, as if it cannot decide which copy of the file is the right one?
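For reference, a minimal sketch of what can be checked on the brick side in a situation like this, assuming a plain two-brick replica (the brick paths are the ones from the heal info output above; the trusted.afr.* changelog xattrs are what AFR uses to track pending operations):

# On stor1 and stor2: dump the xattrs of the file in question;
# non-zero trusted.afr.HA-WIN-TT-1T-client-* values mean one copy still
# has pending changes recorded against the other
getfattr -d -m . -e hex /exports/NFS-WIN/1T/disk

# Re-check the heal queue and, if the self-heal daemon looks stuck,
# ask it for a full crawl of the volume
gluster volume heal HA-WIN-TT-1T info
gluster volume heal HA-WIN-TT-1T full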
Logs from the gluster client machine, where the volume for the iSCSI target is mounted:

[2014-11-06 08:19:36.949092] W [client-rpc-fops.c:1812:client3_3_fxattrop_cbk] 0-HA-WIN-TT-1T-client-0: remote operation failed: Transport endpoint is not connected
[2014-11-06 08:19:36.949148] W [client-rpc-fops.c:1812:client3_3_fxattrop_cbk] 0-HA-WIN-TT-1T-client-0: remote operation failed: Transport endpoint is not connected
[2014-11-06 08:19:36.951202] W [client-rpc-fops.c:1580:client3_3_finodelk_cbk] 0-HA-WIN-TT-1T-client-0: remote operation failed: Transport endpoint is not connected
[2014-11-06 08:19:57.682937] W [socket.c:522:__socket_rwv] 0-glusterfs: readv on 10.250.0.1:24007 failed (Connection timed out)
[2014-11-06 08:20:17.950981] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to 10.250.0.1:24007 failed (No route to host)
[2014-11-06 08:20:40.062928] E [socket.c:2161:socket_connect_finish] 0-HA-WIN-TT-1T-client-0: connection to 10.250.0.1:24007 failed (Connection timed out)
[2014-11-06 08:30:15.638197] W [dht-diskusage.c:232:dht_is_subvol_filled] 0-HA-WIN-TT-1T-dht: disk space on subvolume 'HA-WIN-TT-1T-replicate-0' is getting full (95.00 %), consider adding more nodes
[2014-11-06 08:36:18.385659] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2014-11-06 08:36:18.386573] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-WIN-TT-1T-client-0: changing port to 49160 (from 0)
[2014-11-06 08:36:18.387182] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-11-06 08:36:18.387414] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Connected to 10.250.0.1:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-11-06 08:36:18.387433] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-11-06 08:36:18.387446] I [client-handshake.c:1314:client_post_handshake] 0-HA-WIN-TT-1T-client-0: 1 fds open - Delaying child_up until they are re-opened
[2014-11-06 08:36:18.387730] I [client-handshake.c:936:client_child_up_reopen_done] 0-HA-WIN-TT-1T-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2014-11-06 08:36:18.387862] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-0: Server lk version = 1

Brick log on stor1:

[2014-11-06 08:38:04.269503] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-WIN-TT-1T-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-11-06 08:38:04.269908] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Connected to 10.250.0.2:49160, attached to remote volume '/exports/NFS-WIN/1T'.
[2014-11-06 08:38:04.269962] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-WIN-TT-1T-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-11-06 08:38:04.270560] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-WIN-TT-1T-client-1: Server lk version = 1
[2014-11-06 08:39:33.277219] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 08:49:33.327786] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 08:59:33.375835] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:09:33.430726] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:19:33.486488] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:29:33.541596] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:39:33.595242] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:49:33.648526] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 09:59:33.702368] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:09:33.756633] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:19:33.810984] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:29:33.865172] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:39:33.918765] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:49:33.973283] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0
[2014-11-06 10:59:34.028836] I [afr-self-heald.c:1690:afr_dir_exclusive_crawl] 0-HA-WIN-TT-1T-replicate-0: Another crawl is in progress for HA-WIN-TT-1T-client-0

Same on stor2.

--
Best regards,
Roman.
Roman
2014-Nov-06 12:40 UTC
[Gluster-users] healing never ends (or never starts?) on replicated volume with virtual block device
Oh, never mind. It is synced now. It just took a LOT of time :)
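For anyone hitting the same thing, a quick sketch of how completion can be confirmed (same volume name and brick path as in the first message; using watch here is just a convenience, any polling works):

# Poll until both bricks report "Number of entries: 0"
watch -n 60 'gluster volume heal HA-WIN-TT-1T info'

# On the bricks, the trusted.afr.* changelog xattrs of the file should be
# all zeroes once neither copy has pending operations against the other
getfattr -d -m . -e hex /exports/NFS-WIN/1T/disk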
--
Best regards,
Roman.