Matthew Temple
2013-Feb-21 14:33 UTC
[Gluster-users] One brick in replicated/distributed volume not being written to.
Hi, all.

I thought I had everything set correctly on my volume, but something is wrong. Here is the volume, made of 4 bricks:

Volume Name: gf2
Type: Distributed-Replicate
Volume ID: a9e64630-9166-4957-8243-e2933791b24b
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gf2ibp-1:/mnt/d0-0
Brick2: gf2ibp-1r:/mnt/d0-0
Brick3: gf2ibp-2:/mnt/d0-0
Brick4: gf2ibp-2r:/mnt/d0-0

I have volume gf2 mounted on a computer we call "rcapps", and about 6 TB have been written to the volume. When I look at /mnt/d0-0 on all four bricks, three look correct, but Brick1 has only 48 GB written to it. Brick2, which should replicate Brick1, has 4 TB. Brick3 and Brick4 seem to have the same amount of data as each other.

The status of the volume looks correct:

gluster> volume status gf2
Status of volume: gf2
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick gf2ibp-1:/mnt/d0-0                       24011   Y       30754
Brick gf2ibp-1r:/mnt/d0-0                      24011   Y       17824
Brick gf2ibp-2:/mnt/d0-0                       24011   Y       31516
Brick gf2ibp-2r:/mnt/d0-0                      24011   Y       29119
NFS Server on localhost                        38467   Y       30760
Self-heal Daemon on localhost                  N/A     Y       30766
NFS Server on gf2ibp-2                         38467   Y       31522
Self-heal Daemon on gf2ibp-2                   N/A     Y       31528
NFS Server on gf2ibp-2r                        38467   Y       29125
Self-heal Daemon on gf2ibp-2r                  N/A     Y       29131
NFS Server on gf2ibp-1r                        38467   Y       17830
Self-heal Daemon on gf2ibp-1r                  N/A     Y       17836

I then noticed that I had the firewall turned on for Brick2 (even though it could still be written to), so I turned it off.

I thought I should try to heal the volume, but when I tried this through the gluster console, the operation failed. In the glusterd log I see the following (it can't get a lock that is held by itself?):

[2013-02-21 09:24:39.501612] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume gf2
[2013-02-21 09:24:39.501732] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: f5edea20-9467-48ed-b4f1-dc566a9b6d02, lock held by: f5edea20-9467-48ed-b4f1-dc566a9b6d02
[2013-02-21 09:24:39.501759] E [glusterd-handler.c:458:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1

And here is what I see in cli.log, which I can't interpret:

[2013-02-21 09:31:38.689316] W [cli-rl.c:116:cli_rl_process_line] 0-glusterfs: failed to process line
[2013-02-21 09:31:48.952950] I [cli-rpc-ops.c:5928:gf_cli3_1_heal_volume_cbk] 0-cli: Received resp to heal volume
[2013-02-21 09:31:48.953366] W [dict.c:2339:dict_unserialize] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x120) [0x333440f8b0] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x333440f0b5] (-->gluster(gf_cli3_1_heal_volume_cbk+0x2e3) [0x41ca43]))) 0-dict: buf is null!
[2013-02-21 09:31:48.953410] E [cli-rpc-ops.c:5968:gf_cli3_1_heal_volume_cbk] 0-: Unable to allocate memory
[2013-02-21 09:31:48.953490] W [cli-rl.c:116:cli_rl_process_line] 0-glusterfs: failed to process line
[2013-02-21 09:31:56.419708] I [cli-rpc-ops.c:5928:gf_cli3_1_heal_volume_cbk] 0-cli: Received resp to heal volume
[2013-02-21 09:31:56.419859] W [dict.c:2339:dict_unserialize] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x120) [0x333440f8b0] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x333440f0b5] (-->gluster(gf_cli3_1_heal_volume_cbk+0x2e3) [0x41ca43]))) 0-dict: buf is null!
[2013-02-21 09:31:56.419894] E [cli-rpc-ops.c:5968:gf_cli3_1_heal_volume_cbk] 0-: Unable to allocate memory
[2013-02-21 09:31:56.419979] W [cli-rl.c:116:cli_rl_process_line] 0-glusterfs: failed to process line

Any ideas of what I should do next?
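In case it helps to see what I was about to try, here is the rough sequence I had in mind, sketched from memory (I'm assuming the 3.3-style "volume heal" syntax, and that restarting glusterd on the node holding the stale lock is harmless; please correct me if either assumption is wrong):

# See which files the self-heal daemon still thinks need healing
gluster volume heal gf2 info

# If the "lock held by: <own uuid>" error is just a stale cluster lock
# left over from an earlier failed transaction, restart glusterd on
# that node to release it
service glusterd restart

# Then trigger a full self-heal so the good replica repopulates Brick1
gluster volume heal gf2 full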
Right now I have one pair of bricks that replicates fine and one pair that does not, in a distributed/replicated cluster. I need to get Brick2 to send its files back to Brick1.

Thanks in advance.

Matt Temple

------
Matt Temple
Director, Research Computing
Dana-Farber Cancer Institute.
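P.S. To convince myself that Brick2 (gf2ibp-1r) really holds the data that Brick1 (gf2ibp-1) is missing, my plan was to compare the extended attributes on the same file on both bricks, roughly like this (the file path below is just a placeholder, and I'm not certain I'm reading the trusted.afr.* changelog keys correctly):

# Run against the same file on gf2ibp-1 and on gf2ibp-1r and compare
getfattr -d -m . -e hex /mnt/d0-0/path/to/some/file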