Hello,
I am trying the 1.1 branch and I get a segmentation fault when tincd receives
an ALRM signal. This looks like a race condition.
I have my tincd daemon instantiated manually in if-up.d/jmuchemb (without
IF_TINC_NET), and when if-up.d/tinc runs, it sends an ALRM signal that makes
tincd crash.
It fails here:
Core was generated by `tincd -D -n jmuchemb -d -o ConnectTo srv -o srv.Address 81.x.y.z -o Connect'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000040a685 in retry () at net.c:349
349                     if(c->outgoing && !c->node) {
(gdb) p c
$1 = (connection_t *) 0x0
Here is the end of the strace output:
> read(24, "-----BEGIN RSA PUBLIC KEY-----\nM"..., 4096) = 426
> close(24)                               = 0
> munmap(0x7f9f1978d000, 4096)            = 0
> epoll_wait(14, {{EPOLLOUT, {u32=23, u64=23}}}, 32, 4907) = 1
> sendto(23, "0 tecra 17.0\n1 94 64 0 0 A2B583B"..., 538, 0, NULL, 0) = 538
> epoll_ctl(14, EPOLL_CTL_MOD, 23, {EPOLLIN, {u32=23, u64=23}}) = 0
> epoll_wait(14, 69d660, 32, 4904)        = -1 EINTR (Interrupted system call)
> --- SIGALRM (Alarm clock) @ 0 (0) ---
> sendto(15, "\16", 1, 0, NULL, 0)        = 1
> rt_sigreturn(0xf)                       = -1 EINTR (Interrupted system call)
> epoll_wait(14, {{EPOLLIN, {u32=16, u64=16}}}, 32, 4886) = 1
> recvfrom(16, "\16", 1024, 0, NULL, NULL) = 1
> recvfrom(16, 0x7f9f191aa4e0, 1024, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> futex(0x7f9f18926840, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> write(2, "Got Alarm clock signal\n", 23) = 23
> write(2, "Could not set up a meta connecti"..., 42) = 42
> write(2, "Trying to re-establish outgoing "..., 56) = 56
> epoll_ctl(14, EPOLL_CTL_DEL, 22, {EPOLLIN, {u32=22, u64=22}}) = 0
> close(22)                               = 0
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
It's easily reproducible for me so I can send more information if you want,
including core dump, binaries (Debian) and strace log.
Regards,
Julien
On Tue, Jun 26, 2012 at 01:38:20PM +0200, Julien Muchembled wrote:

> I am trying 1.1 branch and I experience a segmentation fault upon ALRM signal.
> This looks like a race condition.

It should not be a race condition, since the signals are handled by the main
event loop, not in a special signal context. So it is something more serious...

> I have my tincd daemon instantiated manually in if-up.d/jmuchemb (without
> IF_TINC_NET) and when if-up.d/tinc runs, it sends a ALRM signal that makes
> tincd crash.
>
> It fails here:
>
> Core was generated by `tincd -D -n jmuchemb -d -o ConnectTo srv -o srv.Address 81.x.y.z -o Connect'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x000000000040a685 in retry () at net.c:349
> 349                     if(c->outgoing && !c->node) {
> (gdb) p c
> $1 = (connection_t *) 0x0

Thanks for the backtrace!

> > write(2, "Got Alarm clock signal\n", 23) = 23
> > write(2, "Could not set up a meta connecti"..., 42) = 42
> > write(2, "Trying to re-establish outgoing "..., 56) = 56
> > epoll_ctl(14, EPOLL_CTL_DEL, 22, {EPOLLIN, {u32=22, u64=22}}) = 0
> > close(22)                               = 0
> > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
>
> It's easily reproducible for me so I can send more information if you want,
> including core dump, binaries (Debian) and strace log.

Ok, I see the problem already, retry() calls do_outgoing_connection(), which
can call connection_del(), which means "node = node->next" in retry() will
give wrong results. Expect a fix soon.

-- 
Met vriendelijke groet / with kind regards,
     Guus Sliepen <guus at tinc-vpn.org>
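
For readers of the archive, the failure mode described above (advancing with
"node = node->next" after a callee may have freed the current node) can be
sketched in isolation. This is not tinc's code and not the actual fix: the
types and functions below (conn_t, conn_list_t, conn_add, conn_del, retry_one,
retry_all_safe) are hypothetical stand-ins, and the has_node flag only mimics
the `c->node` test from the backtrace. The sketch shows one common remedy,
which is to save the next pointer before calling anything that may delete the
current element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a connection list; NOT tinc's real types. */
typedef struct conn {
	int id;
	bool outgoing;       /* we initiated this connection */
	bool has_node;       /* mimics "c->node != NULL" */
	struct conn *next;
} conn_t;

typedef struct {
	conn_t *head;
	size_t count;
} conn_list_t;

/* Prepend a new connection to the list; returns NULL if allocation fails. */
static conn_t *conn_add(conn_list_t *list, int id, bool outgoing, bool has_node) {
	conn_t *c = calloc(1, sizeof *c);
	if(!c)
		return NULL;
	c->id = id;
	c->outgoing = outgoing;
	c->has_node = has_node;
	c->next = list->head;
	list->head = c;
	list->count++;
	return c;
}

/* Unlink and free one connection; afterwards victim->next must not be read. */
static void conn_del(conn_list_t *list, conn_t *victim) {
	for(conn_t **p = &list->head; *p; p = &(*p)->next) {
		if(*p == victim) {
			*p = victim->next;
			free(victim);
			list->count--;
			return;
		}
	}
}

/* Stand-in for a retry attempt that fails and deletes the connection. */
static void retry_one(conn_list_t *list, conn_t *c) {
	conn_del(list, c);
}

/* Walk the list, saving the successor BEFORE the callee can free c.
 * Returns the number of connections that were retried. */
static size_t retry_all_safe(conn_list_t *list) {
	assert(list != NULL);
	size_t retried = 0;
	for(conn_t *c = list->head, *next; c; c = next) {
		next = c->next;  /* must happen before retry_one() */
		if(c->outgoing && !c->has_node) {
			retry_one(list, c);
			retried++;
		}
	}
	return retried;
}
```

Saving the successor is only sufficient if the callee deletes nothing but the
current element. If it could also delete the saved successor, the loop would
need a different strategy, such as restarting from the head or deferring
deletions until the walk is finished.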