Ravishankar N
2018-Nov-09 01:11 UTC
[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
On 11/08/2018 06:09 PM, mabi wrote:
> ------- Original Message -------
> On Thursday, November 8, 2018 11:05 AM, Ravishankar N <ravishankar at redhat.com> wrote:
>
>> It is not a split-brain. Nodes 1 and 3 have xattrs indicating a pending
>> entry heal on node2 , so heal must have happened ideally. Can you check
>> a few things?
>> - Is there any disconnects between each of the shds and the brick
>> processes (check via statedump or look for disconnect messages in
>> glustershd.log). Does restarting shd via a `volume start force` solve
>> the problem?
> Yes there is one disconnect at 14:21 (UTC 13:21) because node2 ran out of memory (although it has 32 GB of RAM) and I had to reboot it. Here are the relevant log entries taken from glustershd.log on node1:
>
> [2018-11-05 13:21:16.284239] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-myvol-pro-client-1: server 192.168.10.33:49154 has not responded in the last 42 seconds, disconnecting.
> [2018-11-05 13:21:16.284385] I [MSGID: 114018] [client.c:2254:client_rpc_notify] 0-myvol-pro-client-1: disconnected from myvol-pro-client-1. Client process will keep trying to connect to glusterd until brick's port is available
> [2018-11-05 13:21:16.284889] W [rpc-clnt-ping.c:222:rpc_clnt_ping_cbk] 0-myvol-pro-client-1: socket disconnected
>
> I also just ran a "volume start force" and saw that the glustershd processes got restarted on all 3 nodes but that did not trigger any healing. There are still the same amount of files/dirs pending heal...
>
>> - Is the symlink pointing to oc_dir present inside .glusterfs/25/e2 in
>> all 3 bricks?
> They are yes for node1 and node3 but node2 there is no such symlink...
>
> I hope that helps to debug the issue further, else please let me know if you need more info

Please re-create the symlink on node 2 to match how it is in the other
nodes and launch heal again. Check if this is the case for other entries
too.
-Ravi
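
For reference, the `.glusterfs/25/e2` location asked about above follows from the GFID itself: the first two and the next two hex characters of the GFID name the two directory levels under `.glusterfs`. A minimal sketch of that mapping is below. The helper name `gfid_path` is hypothetical (it is not a Gluster command), and the brick path and GFID are the ones quoted later in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gluster): print where the .glusterfs
# entry for a given GFID is expected to live on a brick.
# Layout assumed: <brick>/.glusterfs/<chars 1-2>/<chars 3-4>/<gfid>
gfid_path() {
    brick=$1
    gfid=$2
    first=$(printf '%s' "$gfid" | cut -c1-2)
    second=$(printf '%s' "$gfid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$first" "$second" "$gfid"
}

# Brick path and GFID taken from this thread.
gfid_path /data/myvol-pro/brick 25e2616b-4fb6-4b2a-8945-1afc956fff19
# prints /data/myvol-pro/brick/.glusterfs/25/e2/25e2616b-4fb6-4b2a-8945-1afc956fff19

# On each node, a read-only check of the entry could then be:
#   ls -l "$(gfid_path /data/myvol-pro/brick 25e2616b-4fb6-4b2a-8945-1afc956fff19)"
```

Running the same `ls -l` on all three bricks shows whether the symlink exists on each one and where it points, without modifying anything.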
mabi
2018-Nov-12 08:49 UTC
[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
------- Original Message -------
On Friday, November 9, 2018 2:11 AM, Ravishankar N <ravishankar at redhat.com> wrote:

> Please re-create the symlink on node 2 to match how it is in the other
> nodes and launch heal again. Check if this is the case for other entries
> too.
> -Ravi

I can't create the missing symlink on node2 because the target (../../70/c8/70c894ca-422b-4bce-acf1-5cfb4669abbd/oc_dir) of that link does not exist. So basically the symlink and the target of that symlink are missing. Or shall I create a symlink to a non-existing target?
mabi
2018-Nov-13 07:39 UTC
[Gluster-users] Self-healing not healing 27k files on GlusterFS 4.1.5 3 nodes replica
------- Original Message -------
On Friday, November 9, 2018 2:11 AM, Ravishankar N <ravishankar at redhat.com> wrote:

> Please re-create the symlink on node 2 to match how it is in the other
> nodes and launch heal again. Check if this is the case for other entries
> too.
> -Ravi

Please ignore my previous mail, I was looking for a symlink with the GFID of node1 or node 3 on my node2 whereas I should have been looking with the GFID of node2 of course. I have now found the symlink on node2 pointing to that problematic directory and it looks like this:

node2# cd /data/myvol-pro/brick/.glusterfs/d9/ac
node2# ls -la | grep d9ac19
lrwxrwxrwx 1 root root 66 Nov 5 14:12 d9ac192c-e85e-4402-af10-5551f587ed9a -> ../../10/ec/10ec1eb1-c854-4ff2-a36c-325681713093/oc_dir

When you say "re-create the symlink", do you mean I should delete the current symlink on node2 (d9ac192c-e85e-4402-af10-5551f587ed9a) and re-create it with the GFID which is used on my node 1 and node 3 like this?

node2# cd /data/myvol-pro/brick/.glusterfs/d9/ac
node2# rm d9ac192c-e85e-4402-af10-5551f587ed9a
node2# cd /data/myvol-pro/brick/.glusterfs/25/e2
node2# ln -s ../../10/ec/10ec1eb1-c854-4ff2-a36c-325681713093/oc_dir 25e2616b-4fb6-4b2a-8945-1afc956fff19

Just want to make sure I understood you correctly before doing that. Could you please let me know if this is correct?

Thanks
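
Before changing anything on node2, one read-only way to confirm which GFID each brick holds for `oc_dir` is to read the directory's `trusted.gfid` extended attribute with `getfattr` and compare it across nodes. With `-e hex` the value is printed as one undashed hex string, so the sketch below converts it to the dashed form used for the `.glusterfs` symlink names. The helper name `hex_to_gfid` is hypothetical, and the location of `oc_dir` on the brick is not given in this thread, so it appears only as a placeholder. This is a way to inspect the mismatch, not a confirmation that the `rm`/`ln -s` steps above are the right fix.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gluster): convert the hex value of the
# trusted.gfid xattr, e.g. 0x25e2616b4fb64b2a89451afc956fff19, to the
# dashed 8-4-4-4-12 form used for entry names under .glusterfs.
hex_to_gfid() {
    hex=${1#0x}   # drop the leading "0x" if present
    printf '%s-%s-%s-%s-%s\n' \
        "$(printf '%s' "$hex" | cut -c1-8)" \
        "$(printf '%s' "$hex" | cut -c9-12)" \
        "$(printf '%s' "$hex" | cut -c13-16)" \
        "$(printf '%s' "$hex" | cut -c17-20)" \
        "$(printf '%s' "$hex" | cut -c21-32)"
}

# Example using the GFID that node1 and node3 report in this thread.
hex_to_gfid 0x25e2616b4fb64b2a89451afc956fff19
# prints 25e2616b-4fb6-4b2a-8945-1afc956fff19

# On each brick (as root), the raw value could be read with something like
# the following; <path-to> is a placeholder, the real path is not in the thread:
#   getfattr -n trusted.gfid -e hex /data/myvol-pro/brick/<path-to>/oc_dir
```

If node2 prints a value that converts to d9ac192c-e85e-4402-af10-5551f587ed9a while node1 and node3 convert to 25e2616b-4fb6-4b2a-8945-1afc956fff19, the directory itself carries a different GFID on node2, which is worth mentioning in the reply alongside the symlink question.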