thr3ads.net - asterisk users - [asterisk-users] Problem with new AEX800 card dying because of interrupt problems [Sep 2010]

Christian Weeks

2010-Sep-08 15:38 UTC

[asterisk-users] Problem with new AEX800 card dying because of interrupt problems

Hello
I purchased an AEX800 card to replace the ageing cheap channel bank/T1
card solution a few months ago, assuming that it would be a more robust
solution for my small scale phone system. However, it appears to be
anything but that.

The system was originally set up as a Xen dom-u virtual machine on a large
server-class machine, using PCI passthrough to pass through the AEX800 and
a small, older TDM400. I recently migrated it to the dom-0. In both setups
the AEX800 has continued to experience interrupt errors:

wctdm24xxp 0000:04:08.0: Missed interrupt. Increasing latency to 8 ms in
order to compensate.
wctdm24xxp 0000:04:08.0: ERROR: Unable to service card within 25 ms and
unable to further increase latency.

Eventually, it gets to be too much and the card dies:
wctdm24xxp 0000:04:08.0: Host failed to service card interrupt within
128 ms which is a hardunderun.
I also see:
wctdm24xxp 0000:04:08.0: Power alarm on module 4, resetting!


These interrupt problems have occurred both inside and outside a VM. So
far the only advice I've had is the very unhelpful "move the interrupt".
Given that this is a PCI Express board, and should be delivering MSI
interrupts, which cannot be moved, that seems to be impossible. The BIOS
certainly has no such option (no machine I've had in the past few years
seems to have had such a feature; interrupts are programmed by the APIC
these days).

So I am asking the list, do you have any advice except perhaps to go
back to the broken channel bank? Is it really true that my modern server
class machine (quad core xeon) cannot handle the AEX800, whereas my
seven year old AMD desktop (previous host to the T1) could handle what
seems to have been about 3x the capacity? Isn't this a massive
regression?

I tried upgrading DAHDI to 2.4.0 because it promised an interrupt
handler rewrite for the wctdm24xxp driver, but it has made no
difference. Note also that when the driver was inside the dom-u, I got
about a week's uptime from the card; in the dom-0 I'm getting about 8
hours.

Many thanks
Christian

Shaun Ruffell

2010-Sep-08 16:06 UTC

On 09/08/2010 10:38 AM, Christian Weeks wrote:
> So I am asking the list, do you have any advice except perhaps to go
> back to the broken channel bank? Is it really true that my modern server
> class machine (quad core xeon) cannot handle the AEX800, whereas my
> seven year old AMD desktop (previous host to the T1) could handle what
> seems to have been about 3x the capacity? Isn't this a massive
> regression?
Does the AEX800 work fine in your old AMD desktop?  If the wctdm24xxp
driver is having problems servicing the interrupt in a timely fashion in
your server I would be surprised if other cards in the same system
wouldn't also experience high interrupt latencies which would probably
manifest itself as pops and noise on the channels.

Some server-class machines can have problems because they are typically
optimized for overall throughput rather than "real-time" performance,
and telephony has timing requirements. In other words, it doesn't matter
if your server can handle a thousand channels...if it can't service any
one channel within 25ms consistently, you're going to have issues with
audio.
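A minimal Python sketch of that timing argument, using only the two thresholds that appear in this thread: 25 ms before audio suffers, and 128 ms before the driver reports a hard underrun. The function name and labels are hypothetical, and the 128 ms limit is assumed to apply to the driver's default buffering.

```python
# Thresholds taken from the thread: the driver complains when it cannot service
# the card within 25 ms, and declares a hard underrun at 128 ms.
AUDIO_BUDGET_MS = 25
HARD_UNDERRUN_MS = 128  # assumption: only valid for the default buffering


def classify_latency(max_latency_ms: float) -> str:
    """Classify a worst-case interrupt service latency, in milliseconds."""
    if max_latency_ms <= AUDIO_BUDGET_MS:
        return "ok"
    if max_latency_ms < HARD_UNDERRUN_MS:
        return "audio glitches"  # card survives, but pops and noise are likely
    return "hard underrun"       # the card stops, as in the original report


for sample in (3.2, 40.0, 150.0):
    print(sample, classify_latency(sample))
```

The point of the middle band is the reply's argument: raw throughput is irrelevant if a single late interrupt lands above 25 ms.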

I would recommend:

a) checking the transfer rate to your hard drive ('hdparm -t
/dev/[sda|hda]'). If it's below 4MB/s, that's the likely culprit.
Sometimes setting the kernel command line parameter "hda=none" can
help, depending on the kernel version you're using. I've also seen slow
transfer rates fixed by changing BIOS settings.

b) using cyclictest (https://rt.wiki.kernel.org/index.php/Cyclictest) and
then stressing your system to make sure maximum latencies remain low
without DAHDI loaded. System Management Interrupts / Baseboard
Management Controllers can cause problems here on some servers.
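The two checks above can be automated roughly as follows. This is a sketch only: it assumes the usual summary formats of `hdparm -t` (a "= N MB/sec" rate) and cyclictest (per-thread lines ending in "Max: N", in microseconds), and it assumes "low" means at or below the 25 ms audio budget mentioned earlier. The function names and the sample outputs are made up for illustration.

```python
import re

# Assumed hdparm -t summary, e.g.
# " Timing buffered disk reads: 200 MB in  3.01 seconds =  66.45 MB/sec"
_HDPARM_RATE = re.compile(r"=\s*([\d.]+)\s*(kB|MB)/sec")

# Assumed cyclictest per-thread summary (latencies in microseconds), e.g.
# "T: 0 ( 3431) P:80 I:1000 C: 100000 Min:      5 Act:   10 Avg:    9 Max:      39"
_CYCLICTEST_MAX = re.compile(r"^T:\s*\d+.*\bMax:\s*(\d+)", re.MULTILINE)


def disk_rate_mb_s(hdparm_output: str) -> float:
    """Buffered read rate in MB/s parsed from hdparm -t output."""
    match = _HDPARM_RATE.search(hdparm_output)
    if match is None:
        raise ValueError("no transfer rate found in hdparm output")
    value, unit = float(match.group(1)), match.group(2)
    return value / 1024 if unit == "kB" else value


def worst_latency_ms(cyclictest_output: str) -> float:
    """Largest 'Max:' value across all cyclictest threads, converted to ms."""
    maxima = [int(v) for v in _CYCLICTEST_MAX.findall(cyclictest_output)]
    if not maxima:
        raise ValueError("no cyclictest summary lines found")
    return max(maxima) / 1000.0


def check_host(hdparm_output: str, cyclictest_output: str) -> list:
    """Apply the two rules of thumb: disk >= 4 MB/s, worst latency <= 25 ms."""
    problems = []
    if disk_rate_mb_s(hdparm_output) < 4.0:
        problems.append("slow disk")
    if worst_latency_ms(cyclictest_output) > 25.0:
        problems.append("high scheduling latency")
    return problems


hd = " Timing buffered disk reads: 200 MB in  3.01 seconds =  66.45 MB/sec"
ct = ("T: 0 ( 3431) P:80 I:1000 C: 100000 Min:      5 Act:   10 Avg:    9 Max:      39\n"
      "T: 1 ( 3432) P:80 I:1500 C:  66667 Min:      6 Act:   12 Avg:   11 Max:   31250\n")
print(check_host(hd, ct))  # ['high scheduling latency']
```

Taking the maximum over all threads matters because a single late wakeup on any core is enough to miss the card's deadline.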

If cyclictest shows a maximum latency above 128ms, I would recommend
trying to fix that first. If for some reason you can't, you could trade
some of your system memory for more tolerance of system conditions by
changing DRING_SIZE in drivers/dahdi/voicebus.h to 256 or 512, depending
on the maximum latency cyclictest reported. Keep in mind this isn't a
"fix", since you'll still have problems with your audio for any latency
above 25ms.
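A sketch of how one might pick the ring size from a measured worst case. It assumes (this is not stated in the thread) that each voicebus descriptor buffers 1 ms of audio, so the default ring of 128 matches the 128 ms hard-underrun limit; the candidate sizes 256 and 512 are the ones suggested in the reply. The function names are hypothetical.

```python
# Assumption: one descriptor = 1 ms of audio, default ring = 128 descriptors.
MS_PER_DESCRIPTOR = 1
DEFAULT_DRING_SIZE = 128
CANDIDATE_SIZES = (DEFAULT_DRING_SIZE, 256, 512)  # 256/512 suggested in the reply
AUDIO_BUDGET_MS = 25  # above this, audio still suffers whatever the ring size


def suggest_dring_size(max_latency_ms: float) -> int:
    """Smallest candidate ring whose buffered time exceeds the worst latency."""
    for size in CANDIDATE_SIZES:
        if size * MS_PER_DESCRIPTOR > max_latency_ms:
            return size
    raise ValueError("latency too high to absorb by enlarging the ring; fix the host")


def audio_still_affected(max_latency_ms: float) -> bool:
    """A larger ring only prevents underruns; it cannot hide latency above 25 ms."""
    return max_latency_ms > AUDIO_BUDGET_MS


for latency in (31.25, 200.0, 400.0):
    print(latency, suggest_dring_size(latency), audio_still_affected(latency))
```

The memory cost grows linearly with the ring, which is why the reply frames this as a trade rather than a fix.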

Good luck,

-- 
Shaun Ruffell
Digium, Inc. | Linux Kernel Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: www.digium.com & www.asterisk.org

Benny Amorsen

2010-Sep-13 14:52 UTC

Christian Weeks <cpw at weeksfamily.ca> writes:
> Hello
> I purchased an AEX800 card to replace the ageing cheap channel bank/T1
> card solution a few months ago, assuming that it would be a more robust
> solution for my small scale phone system. However, it appears to be
> anything but that.
>
> Originally implemented as a XEN dom-u virtual machine on a large server
> class machine, using PCI passthrough to pass the AEX800 and a small
> older TDM400, then recently migrated to the dom-0, the aex800 has
> continued to experience interrupt errors:
>
> wctdm24xxp 0000:04:08.0: Missed interrupt. Increasing latency to 8 ms in
> order to compensate.
> wctdm24xxp 0000:04:08.0: ERROR: Unable to service card within 25 ms and
> unable to further increase latency.
Can you do a cat /proc/interrupts?
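A rough Python sketch for reading that output, assuming the layout of kernels from that era: a header row of CPU columns, then one row per IRQ with per-CPU counts, the controller type, and comma-separated device names. The function name `find_driver_irq` is hypothetical and the sample text is made up. It reports whether the wctdm24xxp row is MSI (relevant to the "move the interrupt" question) and whether the line is shared with other devices; both are heuristics over the text.

```python
def find_driver_irq(proc_interrupts: str, driver: str = "wctdm24xxp"):
    """Return (irq, per_cpu_counts, uses_msi, shared) for the driver's row, or None."""
    lines = proc_interrupts.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        irq, _, rest = line.partition(":")
        fields = rest.split()
        tail = fields[ncpu:]  # controller/type column(s), then device names
        if driver not in (t.rstrip(",") for t in tail):
            continue
        counts = [int(c) for c in fields[:ncpu]]
        uses_msi = any("MSI" in t for t in tail)
        # Heuristic: a trailing comma means several devices share this IRQ.
        shared = any(t.endswith(",") for t in tail)
        return irq.strip(), counts, uses_msi, shared
    return None


sample = """\
           CPU0       CPU1       CPU2       CPU3
  0:        45          0          0          0   IO-APIC-edge      timer
 16:   1234567          0          0          0   IO-APIC-fasteoi   wctdm24xxp, uhci_hcd:usb3
"""
print(find_driver_irq(sample))  # ('16', [1234567, 0, 0, 0], False, True)
```

A shared, non-MSI line like the made-up one above would mean the driver's handler can be delayed by whatever else sits on that IRQ.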


/Benny
