Ex Vito
2009-Nov-11 20:08 UTC
[asterisk-users] TE121 - Idle system load at ~0.3 - Bad DAHDI 2.2.0.2 behaviour ?!
Hi Asterisk Users,

We've been having a tough time with a new Asterisk installation connected to the PSTN via an ISDN PRI with a Digium TE121 with the optional VPMADT032 echo cancellation module.

For now, I'll focus on something very specific, which is summarized in this email's subject. However, here are some general facts for context:

 - System pbxfri went into production about a month ago.
 - System pbxfrv is a HW+SW "copy+paste" of pbxfri, not in production yet.
 - Had several incidents where the PSTN connection was not operational (calls had bad quality/echo, or the PRI trunk could not be used for either inbound or outbound calls).
 - Most of the incidents (maybe all of them, haven't verified thoroughly) are associated with hundreds/thousands of "HDLC Abort" / "Bad FCS" messages in the Asterisk log.
 - DAHDI + Asterisk + libpri never seemed to recover from those conditions. We had to manually stop Asterisk, unload+load DAHDI, and start Asterisk.
 - Had at least one kernel panic on DAHDI load.
 - We have logs + traces and are working with the telco to try to fully diagnose what's going on here.

For now we'd like to focus on the following (but if you think we should start somewhere else, please, by all means, fire away!):

 - Lots of info out there (google) seems to associate the "HDLC Abort" / "Bad FCS" messages with a system hardware issue, whatever it is: interrupts, badly behaved NICs, disk array controllers, etc.

   Question #1: What do these messages actually mean? Can they be associated with a bad link or telco switch configuration?

 - We've noticed that the system load at idle is about 0.3 when DAHDI is loaded. If we unload DAHDI, system load at idle goes to approximately 0, as expected.

   Question #2: This looks like very odd behaviour. We've installed several other systems (different HW/SW versions, however) without seeing such behaviour. Is this expected, or could it be related to the "HDLC Aborts" / "Bad FCS" messages and the general failures we've been experiencing?
System info (same for both):

  HW:       HP Proliant ML310 G5
            TE121 + VPMADT032
            AEX410 + 4x FXS + without DSP
  OS:       CentOS 5.3, kernel 2.6.18-164.el5
  DAHDI:    2.2.0.2
  libpri:   1.4.10.2
  Asterisk: 1.4.26.2

Here is a session transcript for pbxfrv (not in production) showing the odd DAHDI / system load behaviour. It starts with DAHDI unloaded:

# uname -a
Linux pbxfrv.replaced 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT 2009 i686 i686 i386 GNU/Linux

# cat /proc/cmdline
ro root=/dev/vg0/lv00 console=tty0 console=ttyS1,115200

# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:   25288985   25275219   25290489   25274409    IO-APIC-edge  timer
  1:          3          0          0          0    IO-APIC-edge  i8042
  3:      24819      20503      24395      19262    IO-APIC-edge  serial
  8:         14         16         13         11    IO-APIC-edge  rtc
  9:          0          0          0          0   IO-APIC-level  acpi
 12:          3          0          1          0    IO-APIC-edge  i8042
 74:          0          0          0          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
 82:         21         24         21         30   IO-APIC-level  uhci_hcd:usb6
 90:         17         16         14         16   IO-APIC-level  ata_piix, ata_piix
106:      77476          0          0          0         PCI-MSI  eth0
169:    1912615    1909646    1911302    1910566   IO-APIC-level  ioc0
NMI:          0          0          0          0
LOC:  101129266  101132004  101132444  101128234
ERR:          0
MIS:          0

# uptime
 17:52:15 up 1 day, 4:07, 1 user, load average: 0.00, 0.07, 0.06

# dmesg
...
ACPI: PCI interrupt for device 0000:05:08.0 disabled
Freed a Wildcard
ACPI: PCI interrupt for device 0000:08:08.0 disabled
Freed a Wildcard TE12xP.
dahdi: Telephony Interface Unloaded

# /etc/init.d/dahdi start
Loading DAHDI hardware modules:
  wcte12xp: [ OK ]
  wctdm24xxp: [ OK ]
Running dahdi_cfg: [ OK ]

# dmesg
...
dahdi: Telephony Interface Registered on major 196
dahdi: Version: 2.2.0.2
PCI: Enabling device 0000:08:08.0 (0150 -> 0153)
ACPI: PCI Interrupt 0000:08:08.0[A] -> GSI 19 (level, low) -> IRQ 185
wcte12xp: VPM present and operational (Firmware version 117)
wcte12xp: Setting up global serial parameters for E1
wcte12xp: Found a Wildcard TE121
PCI: Enabling device 0000:05:08.0 (0150 -> 0153)
wcte12xp0: Missed interrupt. Increasing latency to 4 ms in order to compensate.
ACPI: PCI Interrupt 0000:05:08.0[A] -> GSI 18 (level, low) -> IRQ 177
Port 1: Installed -- AUTO FXS/DPO
Port 2: Installed -- AUTO FXS/DPO
Port 3: Installed -- AUTO FXS/DPO
Port 4: Installed -- AUTO FXS/DPO
VPM100: Not Present
Found a Wildcard TDM: Wildcard AEX410 (4 modules)
dahdi: Registered tone zone 0 (United States / North America)
dahdi_echocan_mg2: Registered echo canceler 'MG2'
wcte12xp0: Missed interrupt. Increasing latency to 5 ms in order to compensate.
wctdm24xxp0: Missed interrupt. Increasing latency to 4 ms in order to compensate.
dahdi: Registered tone zone 25 (Portugal)
wcte12xp: Span configured for CCS/HDB3/CRC4

# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:     143066     143547     142836     143602    IO-APIC-edge  timer
  1:          2          1          0          0    IO-APIC-edge  i8042
  3:        104        122        115        125    IO-APIC-edge  serial
  8:          0          1          1          1    IO-APIC-edge  rtc
  9:          0          0          0          0   IO-APIC-level  acpi
 12:          2          1          0          1    IO-APIC-edge  i8042
 74:          0          0          0          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
 82:         35         17         28         16   IO-APIC-level  uhci_hcd:usb6
 90:         17         15         13         18   IO-APIC-level  ata_piix, ata_piix
 98:       2820          0          0          0         PCI-MSI  eth0
169:      27848      28037      27938      27759   IO-APIC-level  ioc0
177:      21087      21083      21051      21149   IO-APIC-level  wctdm24xxp0
185:      39542      38625      39457      38761   IO-APIC-level  wcte12xp0
NMI:          0          0          0          0
LOC:     572349     572761     572610     572485
ERR:          0
MIS:          0

# sleep 60
# uptime
 17:56:13 up 1 day, 4:11, 1 user, load average: 0.39, 0.19, 0.10

# /etc/init.d/dahdi stop
Unloading DAHDI hardware modules: done

# dmesg | tail
...
ACPI: PCI interrupt for device 0000:05:08.0 disabled
Freed a Wildcard
ACPI: PCI interrupt for device 0000:08:08.0 disabled
Freed a Wildcard TE12xP.
dahdi: Telephony Interface Unloaded

# sleep 60
# uptime
 18:00:37 up 1 day, 4:15, 1 user, load average: 0.01, 0.10, 0.08

Extra information / question:

A quick peek at https://issues.asterisk.org/view.php?id=15498&nbn=18 also led me to test loading the wcte12xp driver with "vpmsupport=0".

The system load behaviour is exactly the same. Do you think it could be related? How would you go about diagnosing this behaviour?

Thanks in advance.
Kind regards,
--
exvito
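One possible first step for the diagnosis question above is to measure how fast each IRQ counter in /proc/interrupts grows while the drivers are loaded, rather than reading a single snapshot. The following is only a minimal sketch, assuming a reasonably recent Python on the box. The function names (parse_interrupts, irq_rates, measure) are made up for illustration and are not part of DAHDI or Asterisk. The SAMPLE text reuses two counter lines from the transcript above.

```python
import time

# Header plus two IRQ lines copied from the second /proc/interrupts listing above.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
177:      21087      21083      21051      21149   IO-APIC-level  wctdm24xxp0
185:      39542      38625      39457      38761   IO-APIC-level  wcte12xp0
ERR:          0
"""


def parse_interrupts(text):
    """Map each IRQ label to (count summed over all CPUs, description)."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # rows such as ERR/MIS carry fewer counters
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def irq_rates(before, after, seconds):
    """Interrupts per second for every IRQ label present in both snapshots."""
    rates = {}
    for label, (count, _description) in after.items():
        if label in before:
            rates[label] = (count - before[label][0]) / seconds
    return rates


def measure(seconds=10, path="/proc/interrupts"):
    """Take two snapshots `seconds` apart and return per-IRQ rates."""
    with open(path) as handle:
        before = parse_interrupts(handle.read())
    time.sleep(seconds)
    with open(path) as handle:
        after = parse_interrupts(handle.read())
    return irq_rates(before, after, seconds)
```

Calling measure() once with DAHDI loaded and once with it unloaded would show whether the wcte12xp0 and wctdm24xxp0 rates are steady, and whether another device sharing an interrupt line is unusually busy. Labels are matched by IRQ number, so a device whose IRQ changes between snapshots (as eth0 does in the transcript) would need matching by description instead.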
Shaun Ruffell
2009-Nov-13 19:18 UTC
[asterisk-users] TE121 - Idle system load at ~0.3 - Bad DAHDI 2.2.0.2 behaviour ?!
On 11/11/2009 02:08 PM, Ex Vito wrote:
> We've been experiencing some tough time regarding a new Asterisk installation
> connected to the PSTN via an ISDN PRI with a Digium TE121 with the optional
> VPMADT032 echo cancellation module.

It appears there may be a regression in dahdi-linux 2.2.0 with regard to the wcte12xp driver and the VPMADT032 module (as discussed in https://issues.asterisk.org/view.php?id=15724). Would you be willing to try at least revision 7584 of http://svn.asterisk.org/svn/dahdi/linux/branches/2.2 and report your results on that issue?

> Extra information / question:
>
> A quick peek at https://issues.asterisk.org/view.php?id=15498&nbn=18
> also led me to test loading the wcte12xp driver with "vpmsupport=0".
>
> The system load behaviour is exactly the same.

The idle load you're seeing can be a little misleading, but essentially, once you load the drivers for both the wctdm24xxp and wcte12xp, there is a fixed cost associated with continuously moving the TDM data to/from the card. The load imposed by the drivers would only go up after this point if a) software echocan is enabled, or b) you're conferencing many calls in the kernel. Otherwise, it's fixed.

--
Shaun Ruffell
Digium, Inc. | Linux Kernel Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: www.digium.com & www.asterisk.org
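The "fixed cost" point can be illustrated with some back-of-the-envelope arithmetic. This is only a sketch under stated assumptions, none of which come from the thread: 8 kHz, 8-bit PCM per channel, a DAHDI chunk of 8 samples (1 ms of audio), 31 channels on the E1 span and 4 on the FXS card. The function name tdm_fixed_cost is hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # assumed: standard telephony PCM sample rate
CHUNK_SAMPLES = 8      # assumed: DAHDI chunk size, i.e. 1 ms of audio

E1_CHANNELS = 31       # assumed: channels exposed by one E1 span
FXS_CHANNELS = 4       # the four FXS ports listed in the system info


def tdm_fixed_cost(channels):
    """Return (chunks per second, bytes per second in each direction).

    There is deliberately no "active calls" parameter: under these
    assumptions the samples are moved whether or not a call is up.
    """
    chunks_per_second = SAMPLE_RATE_HZ // CHUNK_SAMPLES
    bytes_per_second = channels * SAMPLE_RATE_HZ  # one byte per sample
    return chunks_per_second, bytes_per_second


print(tdm_fixed_cost(E1_CHANNELS))   # (1000, 248000)
print(tdm_fixed_cost(FXS_CHANNELS))  # (1000, 32000)
```

Under these assumptions each card needs servicing about a thousand times per second regardless of call volume, which is consistent with a small but nonzero load on an otherwise idle system.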