Nicolas Ecarnot
2012-Mar-27 09:14 UTC
[Samba] ctdb_recovery_lock: Failed to get recovery lock
Hi,

I'm happily progressing toward the successful setup of my two-node Samba cluster: cman, qdisk, clvm, gfs2, ctdb, samba, winbind, ad. And now I'm in the testing phase.

When my cluster is up and running, I can move each IP address to one node or the other, seamlessly. The nodes can fence each other.

But I still have one big issue: though they have been set up as clones, they don't behave identically. When I shut down node 1, node 0 takes over every part of the ctdb setup (IP, recmaster, services). But when I stop the ctdb daemon on node 1, though ctdb on node 0 correctly stops its child daemons (nmbd, smbd and winbind) and kills itself, node 1 complains:

  ctdb_recovery_lock: Failed to get recovery lock on '/ctdb/.ctdb.lock'

(This directory is clvm + gfs2 shared, writable and correctly accessible from both nodes.)

This leads to node 1 getting banned. Then (I guess), once it is unbanned, a reelection occurs, but I get:

  Recmaster node 1 no longer available. Force reelection

I suppose that node 1 can't become recmaster because it cannot get the recovery lock. But I see no reason why this node claims it cannot take this lock.

I don't know if this may help, but:
- I removed the lock file, and restarting ctdb recreates it correctly
- Every process is run as root, which can obviously write to this directory
- I don't know if it is correct, but this file is zero bytes in size?

While waiting for your advice, I'm going to read the source code, in the hope that I may understand what's wrong.

-- 
Nicolas Ecarnot
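
A way to test the lock outside of ctdb, for anyone who wants to reproduce this: a minimal sketch, assuming ctdb takes the recovery lock as a non-blocking fcntl write lock on the first byte of the file (I have not confirmed this against the source yet). The function name try_recovery_lock is mine and is not part of ctdb. It should only be run while ctdb is stopped on both nodes, since it takes the same lock.

```python
import errno
import fcntl
import os


def try_recovery_lock(path):
    """Try to take an exclusive, non-blocking lock on byte 0 of `path`.

    Returns the open file descriptor on success (the lock is held for as
    long as the descriptor stays open), or None if another process
    already holds a conflicting lock.
    """
    # Assumption: the file is opened read-write and created if missing.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Lock 1 byte starting at offset 0. fcntl locks may cover bytes
        # past end-of-file, so a zero-byte file can still be locked.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # somebody else holds the lock
        raise  # any other error points at the filesystem itself
    return fd
```

Calling try_recovery_lock('/ctdb/.ctdb.lock') on node 0 and keeping the returned descriptor open, then calling it on node 1, should give None on node 1 if gfs2 propagates the lock across the cluster. If both nodes get a descriptor, or if the call raises an error other than "already locked", the problem is probably in the cluster filesystem locking rather than in ctdb.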