Here is an update:

1. We reviewed 'core show locks' with the help of russellb @ #asterisk-devs
last Friday.

2. He recommended recompiling Asterisk with DONT_OPTIMIZE and getting a
stack trace with:

    # gdb /usr/sbin/asterisk $(pidof asterisk)
    (gdb) set pagination off
    (gdb) thread apply all bt

We reinstalled Asterisk with the new compile flags back then and have just
experienced another hang (the weekend, Monday and Tuesday were very
low-activity days).

Unfortunately, gdb seems to hang on startup, right after printing what looks
like a thread list. It never reaches the "Reading symbols from..." step, so
there is no gdb prompt and no stack trace! :-/

ps shows the gdb process as <defunct>, and as such it responds to no signals;
Asterisk does not seem to respond to signals either... (Maybe that's why gdb
hangs; I really do not know how gdb attaches itself to a running process.)

Again, we have 'core show locks' and 'core show threads' output from Asterisk,
which we lack the skills to read...

Lastly, the Asterisk log shows this 12 times...

    [Jun 11 09:41:07] ERROR[4837] chan_sip.c: SIP transaction failed: 588233f5261d52ac621587ca327b5083 at 192.168.161.40
    [Jun 11 09:41:07] ERROR[4837] chan_sip.c: We could NOT get the channel lock for SIP/000e08de4cbe-097555c8!

...then...

    [Jun 11 09:41:19] WARNING[4837] chan_sip.c: Maximum retries exceeded on transmission 588233f5261d52ac621587ca327b5083 at 192.168.161.40 for seqno 102 (Critical Request)

...and finally about 1200 of these:

    [Jun 11 09:42:59] WARNING[4842] chan_iax2.c: Max retries exceeded to host 192.168.166.40 on IAX2/private-13779 (type = 6, subclass = 11, ts=40022, seqno=10)

...in several combinations of:
- the number inside WARNING[xxx]: 13 different values
- the host IP: 192.168.166.40 and 192.168.170.40
- the IAX channel: 12 different values

Until today, our gut feelings were:

1. The TC400B installation / usage change
   (idea: Asterisk responds to no signals because it is waiting in kernel
   space; maybe something is wrong with zaptel, wctc4xxp or our hardware?)

2. The activation of a voicemail account with MWI

We now have an extra possibility:

- This system exchanges IAX calls with several other systems.
- The hanging one is running Asterisk 1.4.20.1, but all the others are
  running 1.4.19.
- The changelog from 1.4.19 to 1.4.20.1 includes several chan_iax2 fixes
  --> could the absence of those fixes in this system's IAX peers be causing
  it to hang?

Hence possibility 3: upgrade all peers to 1.4.20.1.

Again, if anyone can chime in, thanks in advance.

Question of the day: why on earth does gdb hang?! (Our guess: because
Asterisk does not respond to signals... but why?!)

Cheers,
--
exvito
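To compare the "combinations" of the IAX warnings from one hang to the next, a small tally script may help. This is a minimal, hypothetical Python 3 sketch (the name iax_retries.py is made up), assuming each warning occupies a single line of the log in the format quoted above. It counts the chan_iax2 "Max retries exceeded" warnings per bracketed number, per host and per channel.

```python
#!/usr/bin/env python3
"""Tally chan_iax2 'Max retries exceeded' warnings in Asterisk log files."""
import re
import sys
from collections import Counter

# Matches (one line in the real log):
# [Jun 11 09:42:59] WARNING[4842] chan_iax2.c: Max retries exceeded to host
#   192.168.166.40 on IAX2/private-13779 (type = 6, ...)
IAX_RETRY = re.compile(
    r"^\[[^\]]+\] WARNING\[(?P<tid>\d+)\] chan_iax2\.c: "
    r"Max retries exceeded to host (?P<host>\S+) on (?P<chan>\S+)"
)


def tally(lines):
    """Return Counters keyed by bracketed number, host and channel."""
    ids, hosts, chans = Counter(), Counter(), Counter()
    for line in lines:
        m = IAX_RETRY.match(line)
        if m:
            ids[m["tid"]] += 1
            hosts[m["host"]] += 1
            chans[m["chan"]] += 1
    return ids, hosts, chans


def main(paths):
    """Print a per-file summary; does nothing when no paths are given."""
    for path in paths:
        with open(path, errors="replace") as f:
            ids, hosts, chans = tally(f)
        print(f"{path}: {sum(ids.values())} warnings")
        for label, counter in (("WARNING[id]", ids), ("host", hosts), ("channel", chans)):
            print(f"  {len(counter)} distinct {label}:")
            for key, n in counter.most_common():
                print(f"    {n:>6}  {key}")


if __name__ == "__main__":
    main(sys.argv[1:])  # pass one or more log files
```

For example: python3 iax_retries.py /var/log/asterisk/messages (assuming the default log location).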
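One way to test the "waiting in kernel space" theory without gdb is to read the state of each Asterisk thread from /proc. The minimal Python 3 sketch below assumes a Linux /proc filesystem; the script name threadstates.py is made up. A thread in state 'D' (uninterruptible sleep) is blocked inside a kernel call, often in a driver, and will not act on signals until that call returns. gdb attaches via ptrace and waits for every thread to stop, so a single such thread could plausibly explain both the ignored signals and the stuck gdb. The wchan column, where the kernel exposes it, names the kernel function the thread is sleeping in. On Linux the number in the log's ERROR[...]/WARNING[...] brackets is, as far as I know, the kernel thread id, so it may be matched against the first column.

```python
#!/usr/bin/env python3
"""Print the scheduler state of every thread of a process (Linux /proc only).

State 'D' means uninterruptible sleep: the thread is blocked inside a
kernel call (often a driver) and will not act on signals until it returns.
"""
import os
import sys


def parse_stat(text):
    """Return (tid, comm, state) from the contents of a /proc/.../stat file.

    comm is wrapped in parentheses and may contain spaces or ')', so split
    on the last ')' instead of on whitespace.
    """
    left, _, right = text.rpartition(")")
    tid_str, _, comm = left.partition(" (")
    return int(tid_str), comm, right.split()[0]


def thread_states(pid):
    """Yield (tid, comm, state, wchan) for each thread of pid."""
    task_dir = f"/proc/{pid}/task"
    for tid in sorted(os.listdir(task_dir), key=int):
        base = os.path.join(task_dir, tid)
        try:
            with open(os.path.join(base, "stat")) as f:
                tid_num, comm, state = parse_stat(f.read())
        except OSError:
            continue  # thread exited while we were scanning
        try:
            # Kernel function the thread sleeps in; may read as "0" or be
            # unreadable depending on kernel version, config and privileges.
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip() or "-"
        except OSError:
            wchan = "?"
        yield tid_num, comm, state, wchan


if __name__ == "__main__":
    # Default to our own PID so the script can be smoke-tested without Asterisk.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for tid, comm, state, wchan in thread_states(pid):
        note = "  <-- uninterruptible (D)" if state == "D" else ""
        print(f"{tid:>7} {state} {wchan:<28} {comm}{note}")
```

Run it as root while Asterisk is hung, e.g. python3 threadstates.py $(pidof asterisk). Reading /proc directly avoids ptrace entirely, so unlike gdb it should not block on a stuck thread.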
Steve Totaro
2008-Jun-11 11:33 UTC
[asterisk-users] 1.4.20.1 hang -- extra info + gdb hangs
On Wed, Jun 11, 2008 at 6:23 AM, Ex Vito <ex.vitorino at gmail.com> wrote:
> Here is an update,
> [...]

Try switching from IAX to SIP.

Thanks,
Steve T