Ex Vito
2008-Jun-06 12:01 UTC
[asterisk-users] 1.4.20.1 hang -- three times in 1.5 days (TC400B at fault ?)
Hi list,

Looking to share info and obtain peer feedback.
Current possibilities: bad config, bad hw or asterisk/zaptel bug.

System: HP Proliant DL380 G5
Installed HW: TE220B to PSTN, TE122 to ChannelBank and TC400B.
OS: CentOS 5.1 kernel 2.6.18-53.1.21.el5
Asterisk: 1.4.20.1
Zaptel: 1.4.11

Events / History
---------------------
May 29th
- started production in the evening
- TC400B was not on the system as it was not available by then

June 4th
- installed TC400B at the end of the day
- test IAX/G.729 calls ok

June 5th
- 10.40: hang
- 16.40: hang
- 19.00: rebuild asterisk with DEBUG_LOCK + THREAD

June 6th (today)
- 11.35h: hang

So, in short, after we installed the TC400B, the system appears to hang
systematically (which is really bad because we already had to RMA a
TC400B twice for this system).

Detail
---------
When it hung about an hour ago we tested:
- FXS @ channelbank @ TE122 => voicemail FAILS
- FXS @ channelbank @ TE122 => PSTN @ TE220B WORKED ONCE
- FXS @ channelbank @ TE122 => SIP phone FAILS
- SIP phone => anywhere (voicemail, SIP, FXS @ channelbank, PSTN @ PRI) FAILS
- PSTN @ TE220B => anywhere FAILS
- IAX => anywhere FAILS

asterisk log has 12 of:
ERROR[14733] chan_sip.c: We could NOT get the channel lock for SIP/000e08dfdc72-0a107670!
ERROR[14733] chan_sip.c: SIP transaction failed: 43e5ad5f6dc5b58c46c597cd2af0c31e at 192.168.161.40

...followed by thousands of:
NOTICE[30599] chan_iax2.c: Avoiding IAX destroy deadlock

(log contains similar messages for yesterday's hangs)

- asterisk CPU usage is apparently none
- load is at about 3
- network access to the system is ok
- dmesg kernel message buffer looks ok
- CLI core show locks shows lots of info which we're not able to decode (attached)
- CLI stop now has no effect
- kill <pid> has no effect
- kill -9 <pid> leads to <zombie> process
- shutdown -r now leads to kernel panic, probably while stopping zaptel,
  because the TE122 and TE220B drivers were not unloaded (attached)

In Our Heads
------------------
- we're suspecting that the presence of the TC400B is making asterisk behave
  in different ways that lead to what we're now calling a hang (that is the
  apparent change in the system since it started mis-behaving)
- as such we're considering removing the TC400B to see if the system
  stabilizes
- however removing it may remove the possibility of further
  diagnosing this issue and trying fixes
- of course, we're trying to manage customer expectations and
  satisfaction at the same time

Extra Context Info
------------------------
- system serves ~100 SIP extensions
- system peers with a dozen other systems within the VPN (dundi+iax)

Thanks in advance for any feedback or pointer that can help us identify,
work around and, ideally, fix this behaviour.

Cheers,
--
exvito
-------------- next part --------------
A non-text attachment was scrubbed...
Name: summary-log.txt.gz
Type: application/x-gzip
Size: 7202 bytes
Desc: not available
Url : http://lists.digium.com/pipermail/asterisk-users/attachments/20080606/30aa0ef0/attachment.bin
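
The log observation above ("12 of" one message, "thousands of" another) can be reproduced with a small tally. What follows is only a sketch, not part of the original report: it assumes each log line contains a `LEVEL[thread-id] source.c: message` fragment like the ones quoted in the post, and the names `LOG_LINE`, `tally` and `SAMPLE` are made up for illustration. It assumes a Python 3 interpreter is available somewhere the log can be copied to.

```python
import re
import sys
from collections import Counter

# Matches fragments such as "ERROR[14733] chan_sip.c: some text".
# Anything before the level (for example a timestamp) is ignored.
LOG_LINE = re.compile(
    r"\b(DEBUG|VERBOSE|NOTICE|WARNING|ERROR)\s*\[(\d+)\]\s+([\w.]+):\s*(.*)"
)

# Sample lines copied from the post, used only to demonstrate the tally.
SAMPLE = [
    "ERROR[14733] chan_sip.c: We could NOT get the channel lock for SIP/000e08dfdc72-0a107670!",
    "ERROR[14733] chan_sip.c: SIP transaction failed: 43e5ad5f6dc5b58c46c597cd2af0c31e at 192.168.161.40",
    "NOTICE[30599] chan_iax2.c: Avoiding IAX destroy deadlock",
    "NOTICE[30599] chan_iax2.c: Avoiding IAX destroy deadlock",
]


def tally(lines):
    """Count log lines per (level, source file, thread id)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            level, thread_id, source, _message = match.groups()
            counts[(level, source, thread_id)] += 1
    return counts


if __name__ == "__main__":
    # With a file argument, tally that log; otherwise tally the sample lines.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as handle:
            result = tally(handle)
    else:
        result = tally(SAMPLE)
    for (level, source, thread_id), count in result.most_common():
        print(count, level, source, thread_id)
```

Grouping by thread id as well as source file shows whether a flood of messages comes from a single thread, as the two thread ids quoted in the post suggest.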
Ex Vito
2008-Jun-06 13:27 UTC
[asterisk-users] 1.4.20.1 hang -- three times in 1.5 days (TC400B at fault ?)
On Fri, Jun 6, 2008 at 1:01 PM, Ex Vito <ex.vitorino at gmail.com> wrote:
> In Our Heads
> ------------------
> - we're suspecting that the presence of the TC400B is making asterisk behave
>   in different ways that lead to what we're now calling a hang (that is the
>   apparent change in the system since it started mis-behaving)
> - as such we're considering removing the TC400B to see if the system
>   stabilizes
> - however removing it may remove the possibility of further
>   diagnosing this issue and trying fixes
> - of course, we're trying to manage customer expectations and
>   satisfaction at the same time
>

...other possibility:
- instead of removing the TC400B, change the IAX trunk codec to GSM instead
  of G.729... this would prevent the TC400B usage and may lead to different
  (as in stable) behaviour

More troubleshooting ideas ?
--
exvito
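
One further check that may fit the symptoms reported in this thread (no CPU usage, load around 3, kill -9 not fully removing the process): on Linux the load average also counts tasks in uninterruptible sleep (state `D`), so listing the per-thread states of the hung process could show whether threads are stuck inside the kernel, for example in a driver. The sketch below is an illustration only, not something the poster ran. It assumes a Linux `/proc` filesystem and Python 3, and the function names `parse_state` and `thread_states` are made up here.

```python
import os
import sys


def parse_state(stat_text):
    """Return the one-letter task state from the text of a /proc stat file."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so take everything after the last ')' before splitting.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def thread_states(pid):
    """Map each thread id of a process to its state letter (R, S, D, Z, ...)."""
    states = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open("%s/%s/stat" % (task_dir, tid)) as handle:
                states[int(tid)] = parse_state(handle.read())
        except OSError:
            continue  # the thread exited while we were reading
    return states


if __name__ == "__main__":
    # Usage: pass the pid of the hung process as the only argument.
    for tid, state in sorted(thread_states(int(sys.argv[1])).items()):
        print(tid, state)
```

Threads reported as `D` across repeated runs would point towards a kernel-side wait rather than a pure userspace deadlock, although this alone does not say which driver is involved.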