Hi
I still don't know what caused it, whether it was the failure of one gluster
node (it lost a SATA controller and was rebooted) or some user activity, but
gluster became quite unusable. Even basic commands such as gluster volume
heal home0 info stopped working, either hanging or returning "operation
failed". I finally managed to stop the volume after numerous attempts and
restarted gluster on all nodes. However, I still can't do anything useful.
Most commands fail and the log shows:
==> etc-glusterfs-glusterd.vol.log <==
[2012-12-13 14:59:49.713103] I
[glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management:
Received heal vol req for volume home0
[2012-12-13 14:59:49.713194] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd:
Unable to get lock for uuid: c3ce6b9c-6297-4e77-924c-b44e2c13e58f, lock held by:
c3ce6b9c-6297-4e77-924c-b44e2c13e58f
[2012-12-13 14:59:49.713234] E [glusterd-handler.c:458:glusterd_op_txn_begin]
0-management: Unable to acquire local lock, ret: -1
I've googled and seen other people hit by this at times, but never found a
resolution. Is there some way to clear this lock? It has been in effect for
well over an hour, so the generic 30-minute lock timeout that one of the
search results claimed exists doesn't seem to apply here.
Any help would be appreciated.
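For anyone checking their own logs for the same symptom, here is a minimal
Python sketch that flags the case shown above, where the UUID asking for the
lock is the same UUID that already holds it. The function name
find_self_held_locks and the sample string are only illustrative; the regular
expression is written against the message format in the pasted log and may
need adjusting for other gluster versions.

```python
import re

# Matches the glusterd_lock error as it appears in the pasted log.
LOCK_RE = re.compile(
    r"Unable to get lock for uuid: (?P<requester>[0-9a-f-]{36}), "
    r"lock held by: (?P<holder>[0-9a-f-]{36})"
)

def find_self_held_locks(log_text):
    """Return UUIDs that were refused a lock they already hold themselves."""
    # The pasted log is wrapped across lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    stale = []
    for m in LOCK_RE.finditer(flat):
        requester = m.group("requester")
        if requester == m.group("holder") and requester not in stale:
            stale.append(requester)
    return stale

sample = """\
[2012-12-13 14:59:49.713194] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd:
Unable to get lock for uuid: c3ce6b9c-6297-4e77-924c-b44e2c13e58f, lock held by:
c3ce6b9c-6297-4e77-924c-b44e2c13e58f
"""
print(find_self_held_locks(sample))
```

A non-empty result only says that the node was refused its own lock; it does
not say why the lock was never released.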
[root@se1 home0]# gluster volume info
Volume Name: home0
Type: Distributed-Replicate
Volume ID: 8e594854-16e1-445e-8434-1d597cef1749
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: 192.168.1.241:/d35
Brick2: 192.168.1.242:/d35
Brick3: 192.168.1.243:/d35
Brick4: 192.168.1.244:/d35
Brick5: 192.168.1.245:/d35
Brick6: 192.168.1.240:/d35
Brick7: 192.168.1.241:/d36
Brick8: 192.168.1.242:/d36
Brick9: 192.168.1.243:/d36
Brick10: 192.168.1.244:/d36
Brick11: 192.168.1.245:/d36
Brick12: 192.168.1.240:/d36
Options Reconfigured:
cluster.quorum-type: auto
cluster.lookup-unhashed: off
performance.client-io-threads: on
cluster.data-self-heal: on
performance.stat-prefetch
[root@se1 home0]# gluster volume status
Status of volume: home0
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 192.168.1.241:/d35                        24009   Y       7137
Brick 192.168.1.242:/d35                        24009   Y       6804
Brick 192.168.1.243:/d35                        24009   Y       5763
Brick 192.168.1.244:/d35                        24009   Y       10378
Brick 192.168.1.245:/d35                        24009   Y       3770
Brick 192.168.1.240:/d35                        24009   Y       21112
Brick 192.168.1.241:/d36                        24010   Y       7143
Brick 192.168.1.242:/d36                        24010   Y       6810
Brick 192.168.1.243:/d36                        24010   Y       5771
Brick 192.168.1.244:/d36                        24010   Y       10384
Brick 192.168.1.245:/d36                        24010   Y       3781
Brick 192.168.1.240:/d36                        24010   Y       21120
NFS Server on localhost                         38467   Y       13552
Self-heal Daemon on localhost                   N/A     Y       13792
NFS Server on 192.168.1.242                     38467   Y       21254
Self-heal Daemon on 192.168.1.242               N/A     Y       21267
NFS Server on 192.168.1.243                     38467   Y       8865
Self-heal Daemon on 192.168.1.243               N/A     Y       8871
NFS Server on 192.168.1.240                     38467   Y       18806
Self-heal Daemon on 192.168.1.240               N/A     Y       19045
NFS Server on 192.168.1.244                     38467   Y       536
Self-heal Daemon on 192.168.1.244               N/A     Y       745
NFS Server on 192.168.1.245                     38467   Y       8689
Self-heal Daemon on 192.168.1.245               N/A     Y       8955
[root@se1 home0]#
[root@se1 home0]# gluster volume heal home0 info
==> cli.log <==
[2012-12-13 15:09:33.476616] W
[rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing 'option
transport-type'. defaulting to "socket"
==> etc-glusterfs-glusterd.vol.log <==
[2012-12-13 15:09:33.565022] I
[glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management:
Received heal vol req for volume home0
[2012-12-13 15:09:33.565122] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd:
Cluster lock held by c3ce6b9c-6297-4e77-924c-b44e2c13e58f
[2012-12-13 15:09:33.565136] I [glusterd-handler.c:463:glusterd_op_txn_begin]
0-management: Acquired local lock
[2012-12-13 15:09:33.565938] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC
from uuid: 663ecbfb-4209-417e-a955-6c9f72751dbc
[2012-12-13 15:09:33.565999] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC
from uuid: f1a89ed2-a2f5-49a9-9482-1c6984c37945
[2012-12-13 15:09:33.566024] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC
from uuid: b1ce84be-de0b-4ae1-a1e8-758d828b8872
[2012-12-13 15:09:33.566047] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC
from uuid: 0f61d484-0f93-4144-b166-2145f4ea4427
[2012-12-13 15:09:33.566069] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC
from uuid: d9b48655-4b25-4ad2-be19-c5ec8768a789
[2012-12-13 15:09:33.566224] I
[glusterd-op-sm.c:2039:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to
5 peers
[2012-12-13 15:09:33.566420] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk]
0-glusterd: Received ACC from uuid: b1ce84be-de0b-4ae1-a1e8-758d828b8872
[2012-12-13 15:09:33.566450] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk]
0-glusterd: Received ACC from uuid: d9b48655-4b25-4ad2-be19-c5ec8768a789
[2012-12-13 15:09:33.566499] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk]
0-glusterd: Received ACC from uuid: f1a89ed2-a2f5-49a9-9482-1c6984c37945
[2012-12-13 15:09:33.566524] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk]
0-glusterd: Received ACC from uuid: 0f61d484-0f93-4144-b166-2145f4ea4427
[2012-12-13 15:09:33.566667] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk]
0-glusterd: Received ACC from uuid: 663ecbfb-4209-417e-a955-6c9f72751dbc
<hangs here>
ctrl+C
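To see how far a transaction like the one above got before it hung, one
option is to count the "Received ACC" lines per callback. The Python sketch
below does that; acks_per_callback is a made-up helper name, the sample holds
just two of the lines from the log above, and the pattern assumes the same
message layout. In the full log both glusterd3_1_cluster_lock_cbk and
glusterd3_1_stage_op_cbk were acknowledged by all 5 peers, and nothing is
logged after that.

```python
import re
from collections import defaultdict

# Captures the callback function name and the peer UUID from each ACC line.
ACC_RE = re.compile(
    r"\[[\w.-]+:\d+:(?P<func>\w+)\] \S+ "
    r"Received ACC from uuid: (?P<uuid>[0-9a-f-]{36})"
)

def acks_per_callback(log_text):
    """Map each callback name to the set of peer UUIDs that acknowledged it."""
    flat = " ".join(log_text.split())  # undo line wrapping in the pasted log
    acks = defaultdict(set)
    for m in ACC_RE.finditer(flat):
        acks[m.group("func")].add(m.group("uuid"))
    return dict(acks)

sample = """\
[2012-12-13 15:09:33.565938] I
[glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC
from uuid: 663ecbfb-4209-417e-a955-6c9f72751dbc
[2012-12-13 15:09:33.566420] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk]
0-glusterd: Received ACC from uuid: b1ce84be-de0b-4ae1-a1e8-758d828b8872
"""
for func, peers in acks_per_callback(sample).items():
    print(func, len(peers))
```

Comparing each count with the number of peers the op was sent to shows which
step, if any, is still waiting on a reply.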
[root@se1 home0]# gluster volume heal home0
operation failed
[root@se1 home0]#
==> cli.log <==
[2012-12-13 15:10:00.686308] W
[rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing 'option
transport-type'. defaulting to "socket"
[2012-12-13 15:10:00.842108] I [cli-rpc-ops.c:5928:gf_cli3_1_heal_volume_cbk]
0-cli: Received resp to heal volume
[2012-12-13 15:10:00.842187] I [input.c:46:cli_batch] 0-: Exiting with: -1
==> etc-glusterfs-glusterd.vol.log <==
[2012-12-13 15:10:00.841789] I
[glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management:
Received heal vol req for volume home0
[2012-12-13 15:10:00.841910] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd:
Unable to get lock for uuid: c3ce6b9c-6297-4e77-924c-b44e2c13e58f, lock held by:
c3ce6b9c-6297-4e77-924c-b44e2c13e58f
[2012-12-13 15:10:00.841926] E [glusterd-handler.c:458:glusterd_op_txn_begin]
0-management: Unable to acquire local lock, ret: -1
Mario Kadastik, PhD
Researcher
---
"Physics is like sex, sure it may have practical reasons, but that's
not why we do it"
-- Richard P. Feynman