thr3ads.net - asterisk users - [asterisk-users] How are your PRI interrupts balanced? (+ Soft lockup BUG) [Mar 2010]

If this information is useful, please help other people find it:
Share via:

James Lamanna

2010-Mar-30 00:30 UTC

[asterisk-users] How are your PRI interrupts balanced? (+ Soft lockup BUG)

Hi,
I'm trying to figure out the cause of a soft lockup I experienced:

Mar 29 09:38:24 pstn1 kernel: BUG: soft lockup - CPU#0 stuck for 10s!
[asterisk:32029]
Mar 29 09:38:24 pstn1 kernel: Pid: 32029, comm:             asterisk
Mar 29 09:38:24 pstn1 kernel: EIP: 0060:[<c046e7fe>] CPU: 0
Mar 29 09:38:24 pstn1 kernel: EIP is at kfree+0x68/0x6c
Mar 29 09:38:24 pstn1 kernel:  EFLAGS: 00000286    Tainted: GF
(2.6.18-128.1.10.el5 #1)
Mar 29 09:38:24 pstn1 kernel: EAX: 00000029 EBX: f7ff9380 ECX:
f7fff880 EDX: c11ff9a0
Mar 29 09:38:24 pstn1 kernel: ESI: 00000286 EDI: cffcda00 EBP:
e5e10c80 DS: 007b ES: 007b
Mar 29 09:38:24 pstn1 kernel: CR0: 80050033 CR2: b7ce39e0 CR3:
0f911000 CR4: 000006d0
Mar 29 09:38:24 pstn1 kernel:  [<c05b067c>] kfree_skbmem+0x8/0x61
Mar 29 09:38:24 pstn1 kernel:  [<c05e9aaf>] __udp_queue_rcv_skb+0x4a/0x51
Mar 29 09:38:24 pstn1 kernel:  [<c05ad993>] release_sock+0x44/0x91
Mar 29 09:38:24 pstn1 kernel:  [<c05ea939>] udp_sendmsg+0x44e/0x514
Mar 29 09:38:24 pstn1 kernel:  [<c05efdec>] inet_sendmsg+0x35/0x3f
Mar 29 09:38:24 pstn1 kernel:  [<c05ab30c>] sock_sendmsg+0xce/0xe8
Mar 29 09:38:24 pstn1 kernel:  [<c043464f>]
autoremove_wake_function+0x0/0x2d
Mar 29 09:38:24 pstn1 kernel:  [<c04ea17b>] copy_from_user+0x17/0x5d
Mar 29 09:38:24 pstn1 kernel:  [<c04ea3a1>] copy_to_user+0x31/0x48
Mar 29 09:38:24 pstn1 kernel:  [<f89ab141>] zt_chan_read+0x1e0/0x20b
[zaptel]
Mar 29 09:38:24 pstn1 kernel:  [<c04ea195>] copy_from_user+0x31/0x5d
Mar 29 09:38:24 pstn1 kernel:  [<c05ac4c4>] sys_sendto+0x116/0x140
Mar 29 09:38:24 pstn1 kernel:  [<c0415d4f>] flush_tlb_page+0x74/0x77
Mar 29 09:38:24 pstn1 kernel:  [<c0461331>] do_wp_page+0x3bf/0x40a
Mar 29 09:38:24 pstn1 kernel:  [<c04284f1>] current_fs_time+0x4a/0x55
Mar 29 09:38:24 pstn1 kernel:  [<c0488f9b>] touch_atime+0x60/0x91
Mar 29 09:38:24 pstn1 kernel:  [<c047d9d0>] pipe_readv+0x315/0x321
Mar 29 09:38:24 pstn1 kernel:  [<c05acde4>] sys_socketcall+0x106/0x19e
Mar 29 09:38:24 pstn1 kernel:  [<c0404f17>] syscall_call+0x7/0xb
Mar 29 09:38:24 pstn1 kernel:  ======================

This occurred during a "high load" period (52 calls across 3 PRI
spans).

A couple days ago I moved the interrupts for my PRI card to CPU0 from
CPU3, because CPU3 was handling everything else:
           CPU0       CPU1       CPU2       CPU3
  0:        306          0          0 3684057379    IO-APIC-edge  timer
  1:          0          0          0      13468    IO-APIC-edge  i8042
  8:          0          0          0          3    IO-APIC-edge  rtc
  9:          0          0          0          0   IO-APIC-level  acpi
 12:          0          0          0          4    IO-APIC-edge  i8042
169:          0          0          0          0   IO-APIC-level  uhci_hcd:usb2
177:          0          0          0   18392593   IO-APIC-level  ata_piix
185:          0          0          0          1   IO-APIC-level  ehci_hcd:usb1
193:          0          0          0          0   IO-APIC-level  uhci_hcd:usb3
201:          0          0          0 2090021759   IO-APIC-level  eth0
209:  149621223          0          0 3534419461   IO-APIC-level  wct4xxp


(The CPU3 number for wct4xxp is not increasing any more).

What is the interrupt distribution of other people's systems?
Before I made this change I was having a problem with D-channels
dropping occasionally, so I thought it might be an interrupt/load
issue.

Thank you.

-- James

Matt Watson

2010-Mar-30 03:38 UTC

head link

[asterisk-users] How are your PRI interrupts balanced? (+ Soft lockup BUG)

Dell server by any chance?

I have a similar problem with a TE220B in a Dell 1950 III server - i've seen
several other people having issues with digium cards in dell servers as
well.

I've actually done something similar to what you have done - isolated the
TE220B onto its own IRQ and set processor affinity for all the IRQs to
particular cores... so far I haven't had kernel pancs since doing this, but
its still a little too early to say if it has fixed the issue 100% or not.

--
Matt

On Mon, Mar 29, 2010 at 8:30 PM, James Lamanna <jlamanna at gmail.com>
wrote:
> Hi,
> I'm trying to figure out the cause of a soft lockup I experienced:
>
> Mar 29 09:38:24 pstn1 kernel: BUG: soft lockup - CPU#0 stuck for 10s!
> [asterisk:32029]
> Mar 29 09:38:24 pstn1 kernel: Pid: 32029, comm:             asterisk
> Mar 29 09:38:24 pstn1 kernel: EIP: 0060:[<c046e7fe>] CPU: 0
> Mar 29 09:38:24 pstn1 kernel: EIP is at kfree+0x68/0x6c
> Mar 29 09:38:24 pstn1 kernel:  EFLAGS: 00000286    Tainted: GF
> (2.6.18-128.1.10.el5 #1)
> Mar 29 09:38:24 pstn1 kernel: EAX: 00000029 EBX: f7ff9380 ECX:
> f7fff880 EDX: c11ff9a0
> Mar 29 09:38:24 pstn1 kernel: ESI: 00000286 EDI: cffcda00 EBP:
> e5e10c80 DS: 007b ES: 007b
> Mar 29 09:38:24 pstn1 kernel: CR0: 80050033 CR2: b7ce39e0 CR3:
> 0f911000 CR4: 000006d0
> Mar 29 09:38:24 pstn1 kernel:  [<c05b067c>] kfree_skbmem+0x8/0x61
> Mar 29 09:38:24 pstn1 kernel:  [<c05e9aaf>]
__udp_queue_rcv_skb+0x4a/0x51
> Mar 29 09:38:24 pstn1 kernel:  [<c05ad993>] release_sock+0x44/0x91
> Mar 29 09:38:24 pstn1 kernel:  [<c05ea939>] udp_sendmsg+0x44e/0x514
> Mar 29 09:38:24 pstn1 kernel:  [<c05efdec>] inet_sendmsg+0x35/0x3f
> Mar 29 09:38:24 pstn1 kernel:  [<c05ab30c>] sock_sendmsg+0xce/0xe8
> Mar 29 09:38:24 pstn1 kernel:  [<c043464f>]
> autoremove_wake_function+0x0/0x2d
> Mar 29 09:38:24 pstn1 kernel:  [<c04ea17b>] copy_from_user+0x17/0x5d
> Mar 29 09:38:24 pstn1 kernel:  [<c04ea3a1>] copy_to_user+0x31/0x48
> Mar 29 09:38:24 pstn1 kernel:  [<f89ab141>] zt_chan_read+0x1e0/0x20b
> [zaptel]
> Mar 29 09:38:24 pstn1 kernel:  [<c04ea195>] copy_from_user+0x31/0x5d
> Mar 29 09:38:24 pstn1 kernel:  [<c05ac4c4>] sys_sendto+0x116/0x140
> Mar 29 09:38:24 pstn1 kernel:  [<c0415d4f>] flush_tlb_page+0x74/0x77
> Mar 29 09:38:24 pstn1 kernel:  [<c0461331>] do_wp_page+0x3bf/0x40a
> Mar 29 09:38:24 pstn1 kernel:  [<c04284f1>] current_fs_time+0x4a/0x55
> Mar 29 09:38:24 pstn1 kernel:  [<c0488f9b>] touch_atime+0x60/0x91
> Mar 29 09:38:24 pstn1 kernel:  [<c047d9d0>] pipe_readv+0x315/0x321
> Mar 29 09:38:24 pstn1 kernel:  [<c05acde4>]
sys_socketcall+0x106/0x19e
> Mar 29 09:38:24 pstn1 kernel:  [<c0404f17>] syscall_call+0x7/0xb
> Mar 29 09:38:24 pstn1 kernel:  ======================>
>
> This occurred during a "high load" period (52 calls across 3 PRI
spans).
>
> A couple days ago I moved the interrupts for my PRI card to CPU0 from
> CPU3, because CPU3 was handling everything else:
>           CPU0       CPU1       CPU2       CPU3
>  0:        306          0          0 3684057379    IO-APIC-edge  timer
>  1:          0          0          0      13468    IO-APIC-edge  i8042
>  8:          0          0          0          3    IO-APIC-edge  rtc
>  9:          0          0          0          0   IO-APIC-level  acpi
>  12:          0          0          0          4    IO-APIC-edge  i8042
> 169:          0          0          0          0   IO-APIC-level
>  uhci_hcd:usb2
> 177:          0          0          0   18392593   IO-APIC-level  ata_piix
> 185:          0          0          0          1   IO-APIC-level
>  ehci_hcd:usb1
> 193:          0          0          0          0   IO-APIC-level
>  uhci_hcd:usb3
> 201:          0          0          0 2090021759   IO-APIC-level  eth0
> 209:  149621223          0          0 3534419461   IO-APIC-level  wct4xxp
>
>
> (The CPU3 number for wct4xxp is not increasing any more).
>
> What is the interrupt distribution of other people's systems?
> Before I made this change I was having a problem with D-channels
> dropping occasionally, so I thought it might be an interrupt/load
> issue.
>
> Thank you.
>
> -- James
>
> --
> _____________________________________________________________________
> -- Bandwidth and Colocation Provided by http://www.api-digital.com --
> New to Asterisk? Join us for a live introductory webinar every Thurs:
>               http://www.asterisk.org/hello
>
> asterisk-users mailing list
> To UNSUBSCRIBE or update options visit:
>   http://lists.digium.com/mailman/listinfo/asterisk-users
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
http://lists.digium.com/pipermail/asterisk-users/attachments/20100329/ffbfcedf/attachment.htm

Maybe Matching Threads

Search for more reasonably related threads

asterisk users - Mar 2010 - How are your PRI interrupts balanced? (+ Soft lockup BUG)

[asterisk-users] How are your PRI interrupts balanced? (+ Soft lockup BUG)

[asterisk-users] How are your PRI interrupts balanced? (+ Soft lockup BUG)

Maybe Matching Threads