Chalcogen
2014-Feb-21 00:21 UTC
[Gluster-users] Failed cleanup on peer probe tmp file causes volume re-initialization problems
Hi everybody,

This is more a wishlist item than anything else. I found out that when the
user performs a peer probe, mgmt/glusterd writes a file named after the
hostname of the peer in question into /var/lib/glusterd/peers. On a
successful probe, this file is replaced with one named after the UUID of
the glusterd instance on the peer, while a failed probe causes the
temporary file to simply be deleted. Here's an illustration:

    root@someserver:/var/lib/glusterd/peers] gluster peer probe some_non_host &
    [1] 25918
    root@someserver:/var/lib/glusterd/peers] cat some_non_host
    uuid=00000000-0000-0000-0000-000000000000
    state=0
    hostname1=some_non_host
    root@someserver:/var/lib/glusterd/peers]
    peer probe: failed: Probe returned with unknown errno 107
    [1]+  Exit 1    gluster peer probe some_non_host
    root@someserver:/var/lib/glusterd/peers] ls
    root@someserver:/var/lib/glusterd/peers]

Here's the deal: if glusterd is killed before it gets a chance to clean up
the temporary file (say, for a peer that doesn't actually exist), then
after a reboot that leftover file breaks mgmt/glusterd's recovery graph,
and glusterd is unable to initialize any of the existing volumes until the
file is deleted manually.

It seems to me that mgmt/glusterd should have the intelligence to
distinguish between a genuine peer file and a temporary file created
during a probe, so that the temp file does not affect the recovery graph
after a reboot. Perhaps name it <peer-name>.tmp, and preferably also
delete any such temp files discovered during recovery at startup? (A rough
sketch of what I mean follows in the P.S. below.)

I reported a bug over this at Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1067733

Thanks,
Anirban
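
P.S. For concreteness, here is a rough, standalone sketch of the kind of
startup cleanup I have in mind. It is not actual glusterd code: the peers
path and the assumption that every genuine peer file is named after a UUID
(so anything else, e.g. a leftover <peer-name>.tmp, can be pruned) are my
own, based on the behaviour described above.

    /* prune_stale_probes.c - hypothetical sketch, not glusterd source.
     * Build with: gcc prune_stale_probes.c -o prune_stale_probes -luuid
     */
    #include <dirent.h>
    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <uuid/uuid.h>   /* libuuid, already a glusterd dependency */

    /* A committed peer file is named after the peer's UUID. */
    static int is_uuid_name(const char *name)
    {
        uuid_t u;
        return uuid_parse(name, u) == 0;
    }

    int main(void)
    {
        const char *dir = "/var/lib/glusterd/peers";  /* assumed path */
        char path[PATH_MAX];
        struct dirent *ent;
        DIR *dp = opendir(dir);

        if (!dp) {
            perror("opendir");
            return 1;
        }
        while ((ent = readdir(dp)) != NULL) {
            if (ent->d_name[0] == '.')
                continue;
            if (is_uuid_name(ent->d_name))
                continue;   /* genuine peer, keep it */
            /* Anything else is a leftover from an interrupted probe
             * (e.g. "<peer-name>" today, or the proposed
             * "<peer-name>.tmp") and is safe to drop before building
             * the recovery graph. */
            snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
            if (unlink(path) == 0)
                fprintf(stderr, "pruned stale probe file %s\n", path);
            else
                perror("unlink");
        }
        closedir(dp);
        return 0;
    }

Run against the directory in the illustration above, this would have
removed the stale some_non_host file on its own after the reboot, instead
of requiring a manual delete before glusterd could bring the volumes back
up.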