Hi there,
I'm seeing fairly frequent segfaults in check_dead_connections() and
terminate_connection() when the TCP meta connection to a node times out
(or is, e.g., firewalled). It usually happens under heavy packet loss:
Program terminated with signal 11, Segmentation fault.
#0 edge_del (e=0x1b71ba0) at edge.c:96
96 avl_delete(e->from->edge_tree, e);
(gdb) bt
#0 edge_del (e=0x1b71ba0) at edge.c:96
#1 0x0000000000408a65 in terminate_connection (report=false,
c=0x1abca00) at net.c:188
#2 terminate_connection (c=0x1abca00, report=false) at net.c:168
#3 0x0000000000409579 in check_dead_connections () at net.c:263
#4 main_loop () at net.c:444
#5 0x000000000040478a in main (argc=<optimized out>, argv=<optimized
out>) at tincd.c:656
(gdb) p *e
$1 = {from = 0x3932366432203231, to = 0x61626c6120373462, address = {
... }, options = 908079418, weight = 1663055157, connection =
0x3034393120, reverse = 0x0}
(gdb) up
#1 0x0000000000408a65 in terminate_connection (report=false,
c=0x1abca00) at net.c:188
188 edge_del(c->edge);
(gdb) p *c
$2 = {name = 0x1abb560 "...", address = { ... }, hostname = 0x1b6dc70
"... port 655", protocol_version = 17, socket = 15, options = 0,
status
= {pinged = 0, active = 0, connecting = 0, unused_termreq = 0, remove =
1, timeout = 0, encryptout = 0, decryptin = 0, mst = 0, unused = 0},
estimated_weight = 1011, start = {tv_sec = 1383624081, tv_usec =
156843}, outgoing = 0x1abc4c0, node = 0x1ad8ef0, edge = 0x1b71ba0,
rsa_key = 0x0, incipher = 0x7ff0d99eea20, outcipher = 0x7ff0d99eea20,
inctx = 0x0, outctx = 0x0, inkey = 0x0, outkey = 0x0, inkeylength = 0,
outkeylength = 0, indigest = 0x7ff0d99ef8c0, outdigest = 0x7ff0d99ef8c0,
inmaclength = 0, outmaclength = 0, incompression = 0, outcompression =
0, mychallenge = 0x0, hischallenge = 0x0, buffer = "...", buflen = 0,
reqlen = 0, tcplen = 0, allow_request = 0, outbuf = 0x1ab75d0 "0 ...
17\n", outbufstart = 0, outbuflen = 0, outbufsize = 14, last_ping_time =
1383624085, last_flushed_time = 1383624085, config_tree = 0x1abb580}
(gdb) p c->status.remove
$3 = 1
(gdb) p now
$4 = 1383624087
(gdb) p pingtimeout
$5 = 2
It looks as if something else had already cleaned up the connection:
c->status.remove == 1 and *c->edge contains garbage, yet we still reached
the terminate_connection() call at net.c:263.
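To make the suspected double teardown concrete, here is a minimal sketch of
a guarded teardown. The types and the *_sketch functions are simplified
stand-ins I made up for illustration, not the real definitions from net.c or
edge.c, and I have not verified that a guard like this would be the right
fix in the actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-ins for illustration only; these are NOT the real
   edge_t / connection_t definitions. */
typedef struct edge { int weight; } edge_t;

typedef struct connection {
    edge_t *edge;
    struct { unsigned remove:1; } status;
} connection_t;

/* Counts frees so a caller can check that no edge is freed twice. */
static int edges_freed = 0;

/* Hypothetical stand-in for edge_del(): just releases the edge. */
static void edge_del_sketch(edge_t *e) {
    free(e);
    edges_freed++;
}

/* Hypothetical guarded teardown: safe to call more than once on the
   same connection. */
static void terminate_connection_sketch(connection_t *c) {
    assert(c != NULL);
    if (c->status.remove)
        return;             /* already torn down, nothing left to do */
    c->status.remove = 1;
    if (c->edge) {
        edge_del_sketch(c->edge);
        c->edge = NULL;     /* do not leave a dangling pointer behind */
    }
}
```

With this shape, a second call on the same connection returns early instead
of passing a stale c->edge to the edge deletion, which is the pattern the
backtrace above appears to show.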
Another:
Program terminated with signal 11, Segmentation fault.
#0 edge_del (e=0x2598c20) at edge.c:93
93 e->reverse->reverse = NULL;
(gdb) bt
#0 edge_del (e=0x2598c20) at edge.c:93
#1 0x0000000000408a65 in terminate_connection (report=false,
c=0x258ccf0) at net.c:188
#2 terminate_connection (c=0x258ccf0, report=false) at net.c:168
#3 0x0000000000409579 in check_dead_connections () at net.c:263
#4 main_loop () at net.c:444
#5 0x000000000040478a in main (argc=<optimized out>, argv=<optimized
out>) at tincd.c:656
(gdb) p *e
$1 = {from = 0x2598c40, to = 0x52779ed8, address = { ... }, options =
824193328, weight = 960048688, connection = 0x32332f30312e312e, reverse
= 0x303123}
(gdb) p *e->reverse
Cannot access memory at address 0x303123
(gdb) p c->status
$2 = {pinged = 0, active = 0, connecting = 0, unused_termreq = 0, remove
= 1, timeout = 0, encryptout = 0, decryptin = 0, mst = 0, unused = 0}
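One detail in the two dumps above: several of the bogus pointer values are
printable ASCII when read as little-endian bytes, which would be consistent
with the edge having been freed and its memory reused for a text buffer. That
interpretation is only a guess. A small sketch to decode the values
(decode_le is a hypothetical helper name, not part of the code base):

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: render a 64-bit value as the 8 bytes it occupies
   in little-endian memory (as on x86-64). Non-printable bytes become '?'.
   out must have room for 9 chars (8 bytes plus the terminating NUL). */
static void decode_le(uint64_t v, char out[9]) {
    assert(out != NULL);
    memset(out, 0, 9);
    for (int i = 0; i < 8; i++) {
        unsigned char b = (unsigned char)(v >> (8 * i));
        out[i] = isprint(b) ? (char)b : '?';
    }
}
```

Applied to the dumps, e->from = 0x3932366432203231 decodes to "12 2d629",
e->to = 0x61626c6120373462 to "b47 alba", and, from the second crash,
e->connection = 0x32332f30312e312e to ".1.10/32", which looks like the tail
of a subnet string.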
One more:
Program terminated with signal 11, Segmentation fault.
#0 avl_search_closest_node (tree=0x10001, data=0x183f820,
result=0x7fffd5ba5e9c) at avl_tree.c:346
346 node = tree->root;
(gdb) bt
#0 avl_search_closest_node (tree=0x10001, data=0x183f820,
result=0x7fffd5ba5e9c) at avl_tree.c:346
#1 0x0000000000404ede in avl_search_node (tree=<optimized out>,
data=<optimized out>) at avl_tree.c:335
#2 0x0000000000405469 in avl_delete (tree=0x10001, data=<optimized
out>) at avl_tree.c:645
#3 0x0000000000408a65 in terminate_connection (report=false,
c=0x1803790) at net.c:188
#4 terminate_connection (c=0x1803790, report=false) at net.c:168
#5 0x0000000000409579 in check_dead_connections () at net.c:263
#6 main_loop () at net.c:444
#7 0x000000000040478a in main (argc=<optimized out>, argv=<optimized
out>) at tincd.c:656
(gdb) p *tree
Cannot access memory at address 0x10001
(gdb) p c->status
$2 = {pinged = 0, active = 0, connecting = 0, unused_termreq = 0, remove
= 1, timeout = 0, encryptout = 1, decryptin = 0, mst = 0, unused = 0}
These are all with 1.0.23, but we saw similar crashes with 1.0.21. The OS
is Ubuntu 12.04.
Any ideas? Let me know if any additional information would be helpful.
Thanks!
-Tuomas Silen