Using tincd 1.0.7, if I send a SIGALRM to tincd when a host is
unresolvable, it gets stuck in a nasty loop:
Feb 12 19:33:02 rosalyn tinc.slamb.org[2925]: Got ALRM signal
Feb 12 19:33:02 rosalyn tinc.slamb.org[2925]: Trying to connect to calvin (216.136.66.56 port 655)
Feb 12 19:33:02 rosalyn tinc.slamb.org[2925]: Error looking up slamb-linux.dyn.slamb.org port 4500: Name or service not known
Feb 12 19:33:02 rosalyn tinc.slamb.org[2925]: Could not set up a meta connection to slamb_linux
Feb 12 19:33:02 rosalyn tinc.slamb.org[2925]: Trying to re-establish outgoing connection in 15 seconds
Feb 12 19:33:03 rosalyn tinc.slamb.org[2925]: Flushing event queue
Feb 12 19:33:03 rosalyn tinc.slamb.org[2925]: Error looking up slamb-linux.dyn.slamb.org port 4500: Name or service not known
Feb 12 19:33:03 rosalyn tinc.slamb.org[2925]: Could not set up a meta connection to slamb_linux
Feb 12 19:33:03 rosalyn tinc.slamb.org[2925]: Trying to re-establish outgoing connection in 20 seconds
Feb 12 19:33:03 rosalyn tinc.slamb.org[2925]: Error looking up slamb-linux.dyn.slamb.org port 4500: Name or service not known
Feb 12 19:33:03 rosalyn tinc.slamb.org[2925]: Could not set up a meta connection to slamb_linux
Feb 12 19:33:03 rosalyn tinc.slamb.org[2925]: Trying to re-establish outgoing connection in 25 seconds
Feb 12 19:33:03 rosalyn tinc.slamb.org[2925]: Error looking up slamb-linux.dyn.slamb.org port 4500: Name or service not known
Feb 12 19:33:03 rosalyn tinc.slamb.org[2925]: Could not set up a meta connection to slamb_linux
Feb 12 19:33:03 rosalyn tinc.slamb.org[2925]: Trying to re-establish outgoing connection in 30 seconds
Feb 12 19:33:03 rosalyn tinc.slamb.org[2925]: Error looking up slamb-linux.dyn.slamb.org port 4500: Name or service not known
Feb 12 19:33:03 rosalyn tinc.slamb.org[2925]: Could not set up a meta connection to slamb_linux
...
While it is stuck in this loop, tincd keeps consuming memory until the
kernel's out-of-memory killer gets rid of it. If I take the
unresolvable address out of the configuration, it works fine.
Looks like the problem is this:
    logger(LOG_INFO, _("Flushing event queue"));
    while(event_tree->head) {
        event = event_tree->head->data;
        event->handler(event->data);
        event_del(event);
    }
There's initially a setup_outgoing_connection() event there. It calls
do_outgoing_connection(), which on resolve failure calls
retry_outgoing(), which adds another setup_outgoing_connection()
event. Events are added as fast as they are taken away, so the flush
never terminates. And apparently connections are only removed in
build_fdset() and terminate_connection(), neither of which gets called
in this tight loop, which would explain the memory growth.
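To make the failure mode concrete, here is a minimal standalone sketch.
It is not tinc code: struct queue, queue_add(), retry_handler(),
naive_flush() and flush_existing() are hypothetical names, and the
queue is a plain FIFO rather than tinc's event tree. naive_flush() has
the same shape as the loop quoted above, with an iteration cap added
only so the demonstration terminates. flush_existing() shows one
possible way to bound the flush, by stopping after the last event that
was queued when the flush began.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an event queue: a singly linked FIFO. */
struct event {
    void (*handler)(void *data);
    void *data;
    struct event *next;
};

struct queue {
    struct event *head, *tail;
    size_t allocated;               /* number of events ever queued */
};

static void queue_add(struct queue *q, void (*handler)(void *), void *data) {
    struct event *e = malloc(sizeof *e);
    assert(e != NULL);
    e->handler = handler;
    e->data = data;
    e->next = NULL;
    if (q->tail)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
    q->allocated++;
}

/* Remove and free the head event; the caller guarantees it exists. */
static void queue_pop(struct queue *q) {
    struct event *e = q->head;
    q->head = e->next;
    if (!q->head)
        q->tail = NULL;
    free(e);
}

/* Models a connection attempt that always fails and reschedules itself. */
static void retry_handler(void *data) {
    struct queue *q = data;
    queue_add(q, retry_handler, q);
}

/* Same shape as the quoted loop: run until the queue is empty.
 * The cap exists only for this demo. Returns 1 if drained, 0 if capped. */
static int naive_flush(struct queue *q, size_t max_iterations) {
    size_t n = 0;
    while (q->head) {
        if (n++ == max_iterations)
            return 0;
        q->head->handler(q->head->data);
        queue_pop(q);
    }
    return 1;
}

/* Bounded variant: handle only events that were queued before the flush
 * started. Returns the number of events handled. */
static size_t flush_existing(struct queue *q) {
    struct event *last = q->tail;   /* last pre-existing event, or NULL */
    size_t handled = 0;
    while (q->head) {
        int was_last = (q->head == last);
        q->head->handler(q->head->data);
        queue_pop(q);
        handled++;
        if (was_last)
            break;
    }
    return handled;
}
```

With a single retry_handler event queued, naive_flush() hits its cap
without ever draining the queue, while flush_existing() handles that one
event, leaves the rescheduled one queued for later, and returns.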
I've attached a patch (tinc-1.0.7-flushfix.patch) that only flushes
events that already exist. I've also attached a patch (tinc-1.0.7-
leaks.patch) that fixes a couple of minor memory leaks I spotted in
"valgrind --tool=memcheck" output while looking for this problem.
Cheers,
Scott
--
Scott Lamb <http://www.slamb.org/>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tinc-1.0.7-flushfix.patch
Type: application/octet-stream
Size: 1782 bytes
Desc: not available
Url :
http://brouwer.uvt.nl/pipermail/tinc/attachments/20070213/3a5dbac0/tinc-1.0.7-flushfix.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tinc-1.0.7-leaks.patch
Type: application/octet-stream
Size: 1460 bytes
Desc: not available
Url :
http://brouwer.uvt.nl/pipermail/tinc/attachments/20070213/3a5dbac0/tinc-1.0.7-leaks.obj