On 03/07/2016 07:40 AM, songxin wrote:
> Hi all,
> I have a problem with how to recover a replicate volume.
>
> Precondition:
> glusterfs version: 3.7.6
> brick of A board: 128.224.95.140:/data/brick/gv0
> brick of B board: 128.224.162.255:/data/brick/gv0
>
> Steps to reproduce:
> 1. gluster peer probe 128.224.162.255                     (on A board)
> 2. gluster volume create gv0 replica 2 128.224.95.140:/data/brick/gv0
>    128.224.162.255:/data/brick/gv0 force                  (on A board)
> 3. gluster volume start gv0                               (on A board)
> 4. reboot the B board
>
> After the B board reboots, I sometimes see the following problems.
> 1. The peer status is sometimes Rejected when I run "gluster peer
>    status".

This is where you get into the problem. I am really not sure what happens
when you reboot a board. In our earlier conversation about a similar
problem you mentioned that a board reboot doesn't wipe out
/var/lib/glusterd; please double-confirm that. Also, please send
cmd_history.log along with the glusterd log from both nodes. And post
reboot, are you also trying to detach/probe A? If so, were A and B in a
connected cluster state before the detach?

>    (on A or B board)
> 2. The brick on the B board is sometimes offline when I run "gluster
>    volume status".
>    (on A or B board)
>
> I want to know what I should do to recover my replicate volume.
>
> PS.
> Currently I do the following to recover the replicate volume, but
> sometimes I can't sync all the files in the volume even if I run
> "heal full".
> 1. gluster volume remove-brick gv0 replica 1
>    128.224.162.255:/data/brick/gv0 force                  (on A board)
> 2. gluster peer detach 128.224.162.255                    (on A board)
> 3. gluster peer probe 128.224.162.255                     (on A board)
> 4. gluster volume add-brick gv0 replica 2
>    128.224.162.255:/data/brick/gv0 force                  (on A board)
>
> Please help me.
>
> Thanks,
> Xin
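For what it's worth, the sequence usually suggested for a peer stuck in the
Rejected state is to resync its configuration from the healthy node rather
than removing and re-adding the brick. A minimal sketch, assuming the reboot
leaves /var/lib/glusterd intact and that B (128.224.162.255) is the rejected
node; adjust the service commands to your init system:

    # on the rejected node (B board)
    service glusterd stop
    mv /var/lib/glusterd/glusterd.info /tmp/   # keep this node's own UUID
    rm -rf /var/lib/glusterd/*                 # drop the stale volume/peer config
    mv /tmp/glusterd.info /var/lib/glusterd/
    service glusterd start
    gluster peer probe 128.224.95.140          # pull the config back from A
    service glusterd restart
    gluster peer status                        # expect "Peer in Cluster (Connected)"

If the brick on B still shows offline in "gluster volume status" after that,
"gluster volume start gv0 force" respawns the missing brick process, and
"gluster volume heal gv0 full" can then resync the data.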
Hi all,

I have a file with a gfid-mismatch problem, as shown below.

stat: cannot stat '/mnt/c//public_html/cello/ior_files/nameroot.ior': Input/output error

Remote:
getfattr -d -m . -e hex opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
# file: opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000000256ded2f6000ad80f
trusted.gfid=0x771221a7bb3c4f1aade40ce9e38a95ee

Local:
getfattr -d -m . -e hex opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
# file: opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
trusted.bit-rot.version=0x000000000000000256ded38f000e3a51
trusted.gfid=0x8ea33f46703c4e2d95c09153c1b858fd

The documentation at
https://gluster.readthedocs.org/en/latest/Troubleshooting/split-brain/ says:

"This is done by observing the afr changelog extended attributes of the file
on the bricks using the getfattr command; then identifying the type of
split-brain (data split-brain, metadata split-brain, entry split-brain or
split-brain due to gfid-mismatch); and finally determining which of the
bricks contains the 'good copy' of the file."

So a gfid-mismatch is also a kind of split-brain. But I found that "gluster
volume heal gv0 info split-brain" does not show split-brain entries caused
by a gfid-mismatch.

My questions are:
1. Which command can be used to show split-brain due to gfid-mismatch?
2. How do I heal it? Is the procedure the same as for a data split-brain?

Thanks,
Xin
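As far as I know, in this release "heal info split-brain" only reports
data/metadata split-brains tracked by the afr changelog attributes; a gfid
mismatch surfaces only as the Input/output error on lookup and has to be
resolved by hand. A sketch of the usual manual procedure, assuming the brick
root is /opt/lvmdir/c2/brick and that the Local brick holds the bad copy
(deciding which copy is "good" is up to you); the gfid below is taken from
your getfattr output:

    # on the node whose brick has the BAD copy only -- here assumed Local
    BRICK=/opt/lvmdir/c2/brick
    F=public_html/cello/ior_files/nameroot.ior

    rm -f $BRICK/$F
    # remove the gfid hard link too: gfid 0x8ea33f46... lives under
    # .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full uuid>
    rm -f $BRICK/.glusterfs/8e/a3/8ea33f46-703c-4e2d-95c0-9153c1b858fd

    # then trigger a lookup from a client mount so AFR recreates the
    # file from the good copy:
    stat /mnt/c/public_html/cello/ior_files/nameroot.ior

If the file has other hard links on that brick, they need to be removed as
well before triggering the heal.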
Hi,

I have created a replicate volume and I want to run "gluster volume heal
gv0 full". I found that if I run "gluster volume heal gv0 full" on one
board, it always fails with the error below.

Launching heal operation to perform full self heal on volume gv0 has been unsuccessful

But if I run "heal full" on the other board, it always succeeds.

I found this code in glusterfs:

        if (gf_uuid_compare (brickinfo->uuid, candidate) > 0)
                gf_uuid_copy (candidate, brickinfo->uuid);

        if ((*index) % hxl_children == 0) {
                if (!gf_uuid_compare (MY_UUID, candidate)) {
                        _add_hxlator_to_dict (dict, volinfo,
                                              ((*index)-1)/hxl_children,
                                              (*hxlator_count));
                        (*hxlator_count)++;
                }
                gf_uuid_clear (candidate);
        }

My questions are:
1. Must I run "heal full" on the board whose UUID is the biggest?
2. If so, how can I know which board has the biggest UUID before trying to
   run "heal full" on every board?

Thanks,
Xin
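If it helps while you investigate: each node's own UUID is stored in
/var/lib/glusterd/glusterd.info, and the peers' UUIDs show up in "gluster
peer status", so you can compare them without trying "heal full" on every
board. A sketch, assuming the stock CLI and default state-file location:

    # local node's UUID (run on each board):
    gluster system:: uuid get
    # ...or read it straight from glusterd's state file:
    grep UUID= /var/lib/glusterd/glusterd.info

    # the other peers' UUIDs:
    gluster peer status

Since gf_uuid_compare() orders UUIDs by their fields in the same order they
appear in the canonical hex string, sorting the two UUID strings
lexicographically should tell you which board is the "candidate", i.e. the
one on which "heal full" will succeed.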