Chalcogen
2014-Feb-21 00:21 UTC
[Gluster-users] Failed cleanup on peer probe tmp file causes volume re-initialization problems
Hi everybody,

This is more a wishlist item than anything else. I found out that when the
user performs a peer probe, mgmt/glusterd writes a file named after the
hostname of the peer in question into /var/lib/glusterd/peers. On a
successful probe, this file is replaced with one named after the UUID of
the glusterd instance on the peer, while a failed probe causes the
temporary file to simply be deleted. Here's an illustration:

    root@someserver:/var/lib/glusterd/peers] gluster peer probe some_non_host &
    [1] 25918
    root@someserver:/var/lib/glusterd/peers] cat some_non_host
    uuid=00000000-0000-0000-0000-000000000000
    state=0
    hostname1=some_non_host
    root@someserver:/var/lib/glusterd/peers]
    peer probe: failed: Probe returned with unknown errno 107
    [1]+  Exit 1    gluster peer probe some_non_host
    root@someserver:/var/lib/glusterd/peers] ls
    root@someserver:/var/lib/glusterd/peers]

Here's the deal: if glusterd is killed before it gets a chance to clean up
the temporary file (say, for a peer that doesn't actually exist), then
after a reboot that leftover file breaks mgmt/glusterd's recovery graph,
and glusterd is unable to initialize any of the existing volumes until the
file is deleted manually.

It seems to me that mgmt/glusterd should have the intelligence to
distinguish between a genuine peer file and a temporary file created
during a probe, so that the temp file does not affect the recovery graph
after a reboot. Perhaps name it <peer-name>.tmp, and preferably also
delete any such temp files discovered during recovery at startup? (A rough
sketch of what I mean follows in the P.S. below.)

I reported a bug over this at Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1067733

Thanks,
Anirban
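
P.S. For concreteness, here is a rough, standalone sketch of the kind of
startup cleanup I have in mind. It is not actual glusterd code: the peers
path and the assumption that every genuine peer file is named after a UUID
(so anything else, e.g. a leftover <peer-name>.tmp, can be pruned) are my
own, based on the behaviour described above.

    /* prune_stale_probes.c - hypothetical sketch, not glusterd source.
     * Build with: gcc prune_stale_probes.c -o prune_stale_probes -luuid
     */
    #include <dirent.h>
    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <uuid/uuid.h>   /* libuuid, already a glusterd dependency */

    /* A committed peer file is named after the peer's UUID. */
    static int is_uuid_name(const char *name)
    {
        uuid_t u;
        return uuid_parse(name, u) == 0;
    }

    int main(void)
    {
        const char *dir = "/var/lib/glusterd/peers";  /* assumed path */
        char path[PATH_MAX];
        struct dirent *ent;
        DIR *dp = opendir(dir);

        if (!dp) {
            perror("opendir");
            return 1;
        }
        while ((ent = readdir(dp)) != NULL) {
            if (ent->d_name[0] == '.')
                continue;
            if (is_uuid_name(ent->d_name))
                continue;   /* genuine peer, keep it */
            /* Anything else is a leftover from an interrupted probe
             * (e.g. "<peer-name>" today, or the proposed
             * "<peer-name>.tmp") and is safe to drop before building
             * the recovery graph. */
            snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
            if (unlink(path) == 0)
                fprintf(stderr, "pruned stale probe file %s\n", path);
            else
                perror("unlink");
        }
        closedir(dp);
        return 0;
    }

Run against the directory in the illustration above, this would have
removed the stale some_non_host file on its own after the reboot, instead
of requiring a manual delete before glusterd could bring the volumes back
up.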