Ravishankar N
2018-Nov-14 04:34 UTC
[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
On 11/13/2018 01:09 PM, mabi wrote:
> ------- Original Message -------
> On Friday, November 9, 2018 2:11 AM, Ravishankar N <ravishankar at redhat.com> wrote:
>
>> Please re-create the symlink on node 2 to match how it is in the other
>> nodes and launch heal again. Check if this is the case for other entries
>> too.
>> -Ravi
>
> Please ignore my previous mail, I was looking for a symlink with the GFID of node 1 or node 3 on my node 2, whereas I should of course have been looking with the GFID of node 2. I have now found the symlink on node 2 pointing to that problematic directory and it looks like this:
>
> node2# cd /data/myvol-pro/brick/.glusterfs/d9/ac
> node2# ls -la | grep d9ac19
> lrwxrwxrwx 1 root root 66 Nov 5 14:12 d9ac192c-e85e-4402-af10-5551f587ed9a -> ../../10/ec/10ec1eb1-c854-4ff2-a36c-325681713093/oc_dir
>
> When you say "re-create the symlink", do you mean I should delete the current symlink on node 2 (d9ac192c-e85e-4402-af10-5551f587ed9a) and re-create it with the GFID which is used on my node 1 and node 3, like this?

I thought it was missing, which is why I asked you to create it. The trusted.gfid xattr for any given file or directory must be the same in all 3 bricks, but it looks like that isn't the case. Are the gfids and the symlinks for all the dirs leading to the parent dir of oc_dir the same on all nodes (i.e. every directory in /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/)?

> node2# cd /data/myvol-pro/brick/.glusterfs/d9/ac
> node2# rm d9ac192c-e85e-4402-af10-5551f587ed9a
> node2# cd /data/myvol-pro/brick/.glusterfs/25/e2
> node2# ln -s ../../10/ec/10ec1eb1-c854-4ff2-a36c-325681713093/oc_dir 25e2616b-4fb6-4b2a-8945-1afc956fff19
>
> Just want to make sure I understood you correctly before doing that. Could you please let me know if this is correct?

Let us see if the parents' gfids are the same before deleting anything. Is the heal info still showing 4 entries? Please also share the getfattr output of the parent directory (i.e. dir11).

Thanks,
Ravi

> Thanks
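For anyone following along, checking one directory the way Ravi describes boils down to two things per brick: reading its trusted.gfid xattr and verifying the matching symlink under .glusterfs. A minimal sketch in bash, using the anonymised names and GFIDs from this thread and assuming the volume-relative path sits directly under the brick root (brick path adjusted per node):

# Run on each node against that node's brick
# (node 3's brick is /srv/glusterfs/myvol-pro/brick).
BRICK=/data/myvol-pro/brick
DIR=$BRICK/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir

# 1. gfid assigned to the directory on this brick (hex form)
getfattr -n trusted.gfid -e hex --absolute-names "$DIR"

# 2. the matching gfid symlink must exist under .glusterfs/<aa>/<bb>/ (first two
#    byte pairs of the gfid) and point to <parent-gfid>/<dirname>, e.g. on node 2:
readlink $BRICK/.glusterfs/d9/ac/d9ac192c-e85e-4402-af10-5551f587ed9a
# expected target form: ../../<pp>/<qq>/<parent-gfid>/oc_dir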
mabi
2018-Nov-14 09:49 UTC
[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
------- Original Message -------
On Wednesday, November 14, 2018 5:34 AM, Ravishankar N <ravishankar at redhat.com> wrote:

> I thought it was missing, which is why I asked you to create it. The
> trusted.gfid xattr for any given file or directory must be the same in
> all 3 bricks, but it looks like that isn't the case. Are the gfids and
> the symlinks for all the dirs leading to the parent dir of oc_dir the
> same on all nodes (i.e. every directory in
> /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/)?

I have now checked the GFIDs of all directories leading back down to the parent dir (13 directories in total). On node 1 and node 3 the GFIDs of all of these directories match each other. On node 2 they are also all the same except for the two deepest directories (".../dir11" and ".../dir11/oc_dir"). It is exactly these two directories which are also listed in the "volume heal info" output under node 1 and node 3 and which do not get healed.

For your reference I have pasted below the GFIDs of all these directories on all 3 nodes. The list starts at the top with the deepest directory (oc_dir) and ends at the bottom with the top-level parent directory (/data).

# NODE 1
trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19   # /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir
trusted.gfid=0x70c894ca422b4bceacf15cfb4669abbd   # /data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11
trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269   # ...
trusted.gfid=0xdbc0bfa0a052405ca3fad2d1ca137f82
trusted.gfid=0xbb75051c24ba4c119351bef938c55ad4
trusted.gfid=0x0002ad0c3fbe4806a75f8e68304f5b94
trusted.gfid=0xf120657977274247900db4e9cc8129dd
trusted.gfid=0x8afeb00bb1e74cbab932acea705b7dd9
trusted.gfid=0x2174086880fc4fd19b187d1384300add
trusted.gfid=0x2057e87cf4cc43f9bbad160cbec43d01   # ...
trusted.gfid=0xa7d78519db61459399e01fad2badf3fb   # /data/dir1/dir2
trusted.gfid=0xfaa0ed7ccaf84f6c8bdb20a7f657c4b4   # /data/dir1
trusted.gfid=0x2683990126724adbb6416b911180e62b   # /data

# NODE 2
trusted.gfid=0xd9ac192ce85e4402af105551f587ed9a
trusted.gfid=0x10ec1eb1c8544ff2a36c325681713093
trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269
trusted.gfid=0xdbc0bfa0a052405ca3fad2d1ca137f82
trusted.gfid=0xbb75051c24ba4c119351bef938c55ad4
trusted.gfid=0x0002ad0c3fbe4806a75f8e68304f5b94
trusted.gfid=0xf120657977274247900db4e9cc8129dd
trusted.gfid=0x8afeb00bb1e74cbab932acea705b7dd9
trusted.gfid=0x2174086880fc4fd19b187d1384300add
trusted.gfid=0x2057e87cf4cc43f9bbad160cbec43d01
trusted.gfid=0xa7d78519db61459399e01fad2badf3fb
trusted.gfid=0xfaa0ed7ccaf84f6c8bdb20a7f657c4b4
trusted.gfid=0x2683990126724adbb6416b911180e62b

# NODE 3
trusted.gfid=0x25e2616b4fb64b2a89451afc956fff19
trusted.gfid=0x70c894ca422b4bceacf15cfb4669abbd
trusted.gfid=0x7d7d2165f4804edf8c93de01c8768269
trusted.gfid=0xdbc0bfa0a052405ca3fad2d1ca137f82
trusted.gfid=0xbb75051c24ba4c119351bef938c55ad4
trusted.gfid=0x0002ad0c3fbe4806a75f8e68304f5b94
trusted.gfid=0xf120657977274247900db4e9cc8129dd
trusted.gfid=0x8afeb00bb1e74cbab932acea705b7dd9
trusted.gfid=0x2174086880fc4fd19b187d1384300add
trusted.gfid=0x2057e87cf4cc43f9bbad160cbec43d01
trusted.gfid=0xa7d78519db61459399e01fad2badf3fb
trusted.gfid=0xfaa0ed7ccaf84f6c8bdb20a7f657c4b4
trusted.gfid=0x2683990126724adbb6416b911180e62b
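For reference, a per-directory sweep like the listing above can be reproduced on each node with a short loop, roughly as follows (a sketch in bash; BRICK and the anonymised directory names are the placeholders used in this thread, and it assumes the volume-relative path sits directly under the brick root):

BRICK=/data/myvol-pro/brick   # /srv/glusterfs/myvol-pro/brick on node 3
d=$BRICK/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir
# walk from oc_dir back up to the brick root, printing one gfid per directory
while [ "$d" != "$BRICK" ]; do
    getfattr -n trusted.gfid -e hex --absolute-names "$d" | grep trusted.gfid
    d=$(dirname "$d")
done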
> Let us see if the parents' gfids are the same before deleting anything.
> Is the heal info still showing 4 entries? Please also share the getfattr
> output of the parent directory (i.e. dir11).

Yes, the heal info still shows the 4 entries, but on node 1 the directory names are not shown anymore, only the GFIDs. This is the actual output of a "volume heal info":

Brick node1:/data/myvol-pro/brick
<gfid:25e2616b-4fb6-4b2a-8945-1afc956fff19>
<gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360>
<gfid:70c894ca-422b-4bce-acf1-5cfb4669abbd>
<gfid:aae4098a-1a71-4155-9cc9-e564b89957cf>
Status: Connected
Number of entries: 4

Brick node2:/data/myvol-pro/brick
Status: Connected
Number of entries: 0

Brick node3:/srv/glusterfs/myvol-pro/brick
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11
/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/oc_dir
<gfid:aae4098a-1a71-4155-9cc9-e564b89957cf>
<gfid:3c92459b-8fa1-4669-9a3d-b38b8d41c360>
Status: Connected
Number of entries: 4

What are the next steps in order to fix this?
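For reference, the getfattr output requested earlier for dir11 is usually gathered as a full xattr dump on each brick, and the pending entries re-checked afterwards. A sketch using the placeholder paths from this thread (brick path adjusted per node; the volume name is assumed from the brick paths):

# full xattr dump of dir11 on this brick (run on all 3 nodes)
getfattr -d -m . -e hex --absolute-names \
    /data/myvol-pro/brick/data/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11

# re-check the pending heal entries from any node
gluster volume heal myvol-pro info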