Christian Weeks
2010-Sep-08 15:38 UTC
[asterisk-users] Problem with new AEX800 card dying because of interrupt problems
Hello,

I purchased an AEX800 card to replace the ageing cheap channel bank/T1 card solution a few months ago, assuming that it would be a more robust solution for my small scale phone system. However, it appears to be anything but that.

Originally implemented as a XEN dom-u virtual machine on a large server class machine, using PCI passthrough to pass the AEX800 and a small older TDM400, then recently migrated to the dom-0, the aex800 has continued to experience interrupt errors:

wctdm24xxp 0000:04:08.0: Missed interrupt. Increasing latency to 8 ms in order to compensate.
wctdm24xxp 0000:04:08.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.

Eventually, it gets to be too much and the card dies:

wctdm24xxp 0000:04:08.0: Host failed to service card interrupt within 128 ms which is a hardunderun.

Oh, and also:

wctdm24xxp 0000:04:08.0: Power alarm on module 4, resetting!

Now, these interrupt problems have occurred both inside and outside a VM. So far I've had the very unhelpful advice to "move the interrupt". Given that this is a PCI express board, and should be delivering MSI interrupts which are immovable, that seems to be somewhat impossible. The BIOS certainly has no such option (and no machine I've had in the past few years seems to have had such a feature; interrupts are programmed by the APIC these days).

So I am asking the list, do you have any advice except perhaps to go back to the broken channel bank? Is it really true that my modern server class machine (quad core xeon) cannot handle the AEX800, whereas my seven year old AMD desktop (previous host to the T1) could handle what seems to have been about 3x the capacity? Isn't this a massive regression?

I tried upgrading dahdi to 2.4.0 because there was promise of an interrupt handler rewrite there for the wctdm24xxp driver, but it has made no difference.

It should also be noted that when the driver was inside the dom-u, I got about a week's uptime from the card. In the dom-0 I'm getting about 8 hours of uptime.

Many thanks
Christian
Shaun Ruffell
2010-Sep-08 16:06 UTC
[asterisk-users] Problem with new AEX800 card dying because of interrupt problems
On 09/08/2010 10:38 AM, Christian Weeks wrote:
> So I am asking the list, do you have any advice except perhaps to go
> back to the broken channel bank? Is it really true that my modern server
> class machine (quad core xeon) cannot handle the AEX800, whereas my
> seven year old AMD desktop (previous host to the T1) could handle what
> seems to have been about 3x the capacity? Isn't this a massive
> regression?

Does the AEX800 work fine in your old AMD desktop?

If the wctdm24xxp driver is having problems servicing the interrupt in a timely fashion in your server, I would be surprised if other cards in the same system didn't also experience high interrupt latencies, which would probably manifest themselves as pops and noise on the channels.

Some server class machines can have problems since they aren't optimized for "real-time" performance but are instead (typically) optimized for overall throughput, and there are timing requirements for telephony. In other words, it doesn't matter if your server can handle a thousand channels...if it can't service any one channel within 25ms consistently, you're going to have issues with audio.

I would recommend:

a) Checking the transfer rate to your hard drive ('hdparm -t /dev/[sda|hda]'). If it's below 4MB/s that's the likely culprit. Sometimes adding "hda=none" to the kernel command line can help, depending on the kernel version you're using. I've also seen slow transfer rates fixed by changing BIOS settings.

b) Using cyclictest (https://rt.wiki.kernel.org/index.php/Cyclictest) and then stressing your system to make sure maximum latencies remain low without DAHDI loaded. System Management Interrupts / Baseboard Management Controllers can cause problems here on some servers.

If cyclictest shows you have some maximum latency above 128ms, I would recommend trying to fix that first, but if for some reason you can't, you could trade some of your system memory for increased tolerance to system conditions by editing the DRING_SIZE in drivers/dahdi/voicebus.h to 256 or 512, depending on the maximum latency cyclictest reported. Keep in mind this isn't a "fix" since you'll still have problems in your audio for any latency above 25ms.

Good luck,

--
Shaun Ruffell
Digium, Inc. | Linux Kernel Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: www.digium.com & www.asterisk.org
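
The DRING_SIZE advice above can be sketched as a small helper. This is only an illustration under stated assumptions: it assumes one ring entry buffers roughly 1 ms of audio (inferred from the driver's 128 ms hard-underrun message, not confirmed anywhere in this thread), and `pick_dring_size` and `audio_still_affected` are hypothetical names, not part of DAHDI. The 25 ms and 128 ms thresholds and the 256/512 candidates are the figures quoted in the message.

```python
# Hypothetical helper: nothing here is part of DAHDI itself.
# ASSUMPTION: each ring entry covers about 1 ms, so a ring size in entries
# is treated as the tolerated host latency in milliseconds.

AUDIO_GLITCH_MS = 25          # above this, audio problems remain (per the thread)
HARD_UNDERRUN_MS = 128        # latency at which the driver reports a hard underrun
CANDIDATE_SIZES = (256, 512)  # the DRING_SIZE values suggested in the thread


def pick_dring_size(max_latency_ms):
    """Return a suggested DRING_SIZE, or None if no change is suggested.

    max_latency_ms is the worst-case latency reported by cyclictest.
    """
    if max_latency_ms <= HARD_UNDERRUN_MS:
        # At or below 128 ms the thread does not suggest touching DRING_SIZE.
        return None
    for size in CANDIDATE_SIZES:
        if max_latency_ms < size:
            return size
    raise ValueError("latency exceeds the largest suggested size; fix the host first")


def audio_still_affected(max_latency_ms):
    """A larger ring only keeps the card alive; it does not remove glitches."""
    return max_latency_ms > AUDIO_GLITCH_MS
```

For example, a cyclictest maximum of 200 ms would suggest 256 here, while `audio_still_affected(200)` stays true, which matches the point that a larger ring is not a fix.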
Benny Amorsen
2010-Sep-13 14:52 UTC
[asterisk-users] Problem with new AEX800 card dying because of interrupt problems
Christian Weeks <cpw at weeksfamily.ca> writes:

> Hello
> I purchased an AEX800 card to replace the ageing cheap channel bank/T1
> card solution a few months ago, assuming that it would be a more robust
> solution for my small scale phone system. However, it appears to be
> anything but that.
>
> Originally implemented as a XEN dom-u virtual machine on a large server
> class machine, using PCI passthrough to pass the AEX800 and a small
> older TDM400, then recently migrated to the dom-0, the aex800 has
> continued to experience interrupt errors:
>
> wctdm24xxp 0000:04:08.0: Missed interrupt. Increasing latency to 8 ms in
> order to compensate.
> wctdm24xxp 0000:04:08.0: ERROR: Unable to service card within 25 ms and
> unable to further increase latency.

Can you do a cat /proc/interrupts?

/Benny
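
When reading /proc/interrupts for this problem, the lines of interest are the ones naming the wctdm24xxp driver: which interrupt type the card uses and whether it shares its IRQ with other devices. A minimal sketch for picking those lines out follows. `find_irq_lines` and `is_shared` are hypothetical helpers, the parsing assumes the usual layout (a header row of CPU names, then one row per IRQ with one count per CPU), and the sample text is made up for illustration; it is not output from the poster's machine.

```python
# Illustrative sample only: NOT taken from the machine discussed in the thread.
SAMPLE = """\
           CPU0       CPU1
 16:       1200          0   IO-APIC-fasteoi   wctdm24xxp, uhci_hcd:usb3
 19:         50          3   IO-APIC-fasteoi   ahci
"""


def find_irq_lines(text, driver="wctdm24xxp"):
    """Return (irq, per_cpu_counts, description) for rows naming `driver`."""
    lines = text.splitlines()
    if not lines:
        return []
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    matches = []
    for line in lines[1:]:
        # Split on the first colon only; device names may contain colons.
        label, sep, rest = line.partition(":")
        if not sep or driver not in rest:
            continue
        fields = rest.split()
        counts = [int(f) for f in fields[:n_cpus]]
        description = " ".join(fields[n_cpus:])
        matches.append((label.strip(), counts, description))
    return matches


def is_shared(description):
    """Several comma-separated device names mean the IRQ line is shared."""
    return "," in description


for irq, counts, description in find_irq_lines(SAMPLE):
    print(irq, counts, description, "shared" if is_shared(description) else "not shared")
```

On a live system the same function can be given the contents of /proc/interrupts read from disk instead of `SAMPLE`.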