Actually, Steve did post the oops (appended).
It's possible this was a cset before the alignment fix which would have
exercised skb copying more heavily, but that's no excuse for it
crashing.
Ian
n4h34 login: Unable to handle kernel paging request at virtual address c0976590
printing eip:
c03d6ed7
*pde = ma 3cf91067 pa 00f91067
*pte = ma 00000000 pa fffff000
Oops: 0000 [#1]
SMP
Modules linked in: iptable_filter ip_tables x_tables bridge drbd ipv6 nfsd lockd sunrpc e100 tulip softdog 3c59x evdev sd_mod dm_mod thermal processor fan e1000 eepro100 mii tg3
CPU: 0
EIP: 0061:[<c03d6ed7>] Not tainted VLI
EFLAGS: 00010206 (2.6.16.13-xen #3)
EIP is at skb_copy_bits+0x127/0x280
eax: c0976000 ebx: 000005a8 ecx: 0000016a edx: c3711720
esi: c0976590 edi: c37110e0 ebp: 000005a8 esp: c00818c8
ds: 007b es: 007b ss: 0069
Process drbd12_receiver (pid: 3924, threadinfo=c0080000 task=c3efda90)
Stack: <0>c1012ec0 00000002 c03d65ef c1d51a00 000005ea 00000042 00000000 00000000
00000020 c3edc800 c376fd64 c03d6b0f c376fd64 00000042 c37110e0 000005a8
c1a70000 c54690c0 00000000 c1d51ac0 c3edc800 c376fd64 c03dc478 c376fd64
Call Trace:
[<c03d65ef>] pskb_expand_head+0xdf/0x140
[<c03d6b0f>] __pskb_pull_tail+0x7f/0x320
[<c54690c0>] br_nf_dev_queue_xmit+0x0/0x50 [bridge]
[<c03dc478>] dev_queue_xmit+0x328/0x370
[<c5462f7e>] br_dev_queue_push_xmit+0xbe/0x140 [bridge]
[<c5469212>] br_nf_post_routing+0x102/0x1c0 [bridge]
[<c54690c0>] br_nf_dev_queue_xmit+0x0/0x50 [bridge]
[<c5462ec0>] br_dev_queue_push_xmit+0x0/0x140 [bridge]
[<c03f43f8>] nf_iterate+0x78/0x90
[<c5462ec0>] br_dev_queue_push_xmit+0x0/0x140 [bridge]
[<c5462ec0>] br_dev_queue_push_xmit+0x0/0x140 [bridge]
[<c03f447e>] nf_hook_slow+0x6e/0x110
[<c5462ec0>] br_dev_queue_push_xmit+0x0/0x140 [bridge]
[<c5463061>] br_forward_finish+0x61/0x70 [bridge]
[<c5462ec0>] br_dev_queue_push_xmit+0x0/0x140 [bridge]
[<c5468995>] br_nf_forward_finish+0x75/0x130 [bridge]
[<c5463000>] br_forward_finish+0x0/0x70 [bridge]
[<c5468b38>] br_nf_forward_ip+0xe8/0x190 [bridge]
[<c5468920>] br_nf_forward_finish+0x0/0x130 [bridge]
[<c5463000>] br_forward_finish+0x0/0x70 [bridge]
[<c03f43f8>] nf_iterate+0x78/0x90
[<c5463000>] br_forward_finish+0x0/0x70 [bridge]
[<c5463000>] br_forward_finish+0x0/0x70 [bridge]
[<c03f447e>] nf_hook_slow+0x6e/0x110
[<c5463000>] br_forward_finish+0x0/0x70 [bridge]
[<c5463167>] __br_forward+0x77/0x80 [bridge]
[<c5463000>] br_forward_finish+0x0/0x70 [bridge]
[<c5463fbf>] br_handle_frame_finish+0xdf/0x160 [bridge]
[<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
[<c5467d89>] br_nf_pre_routing_finish+0xf9/0x370 [bridge]
[<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
[<c0322e3a>] loopback_start_xmit+0xba/0x110
[<c0400d70>] ip_finish_output+0x0/0x220
[<c03dc07e>] dev_hard_start_xmit+0x5e/0x130
[<c03dc3b5>] dev_queue_xmit+0x265/0x370
[<c03f43f8>] nf_iterate+0x78/0x90
[<c5467c90>] br_nf_pre_routing_finish+0x0/0x370 [bridge]
[<c5467c90>] br_nf_pre_routing_finish+0x0/0x370 [bridge]
[<c03f447e>] nf_hook_slow+0x6e/0x110
[<c5467c90>] br_nf_pre_routing_finish+0x0/0x370 [bridge]
[<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
[<c54685fc>] br_nf_pre_routing+0x26c/0x520 [bridge]
[<c5467c90>] br_nf_pre_routing_finish+0x0/0x370 [bridge]
[<c03f43f8>] nf_iterate+0x78/0x90
[<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
[<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
[<c03f447e>] nf_hook_slow+0x6e/0x110
[<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
[<c546422d>] br_handle_frame+0x1ed/0x230 [bridge]
[<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
[<c03dcb21>] netif_receive_skb+0x1a1/0x330
[<c03dcd87>] process_backlog+0xd7/0x190
[<c03dcf2a>] net_rx_action+0xea/0x230
[<c0125915>] __do_softirq+0xf5/0x120
[<c01259d5>] do_softirq+0x95/0xa0
[<c0125a42>] local_bh_enable+0x62/0xa0
[<c0406bd1>] tcp_prequeue_process+0x71/0x80
[<c04070e9>] tcp_recvmsg+0x349/0x750
[<c50eebfc>] dm_request+0xbc/0x100 [dm_mod]
[<c03d5085>] sock_common_recvmsg+0x55/0x70
[<c03d11cf>] sock_recvmsg+0xef/0x110
[<c03143fa>] force_evtchn_callback+0xa/0x10
[<c0147163>] mempool_alloc+0x33/0xe0
[<c01367d0>] autoremove_wake_function+0x0/0x60
[<c50eebfc>] dm_request+0xbc/0x100 [dm_mod]
[<c02b2ee0>] generic_make_request+0xf0/0x160
[<c529c680>] drbd_recv+0x90/0x190 [drbd]
[<c529cdec>] drbd_recv_header+0x2c/0xf0 [drbd]
[<c529e580>] receive_DataRequest+0x0/0x7d0 [drbd]
[<c52a045c>] drbdd+0x1c/0x150 [drbd]
[<c52a105a>] drbdd_init+0x7a/0x1a0 [drbd]
[<c52a7136>] drbd_thread_setup+0x86/0xf0 [drbd]
[<c52a70b0>] drbd_thread_setup+0x0/0xf0 [drbd]
[<c0102f75>] kernel_thread_helper+0x5/0x10
Code: 8b 4c 24 30 8b 7c 24 34 8b 91 a0 00 00 00 8b 4c 24 18 0f b7 74 ca 18 8b 4c 24 14 8d 34 06 01 fe 29 ce 8b 7c 24 38 89 d9 c1 e9 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 89 04 24 ba 02 00 00 00 89 54
<0>Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
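
The faulting EIP is skb_copy_bits+0x127, reached (apparently, going by the
trace) while __pskb_pull_tail()/pskb_expand_head() were linearising an skb on
the bridge transmit path. The marked bytes <f3> a5 in the Code: dump are a
rep movsl, and its source register %esi (c0976590) matches the faulting
virtual address, so the copy loop inside skb_copy_bits() read from an unmapped
source address. As a rough illustration only -- this is a simplified
userspace model, not the kernel code, and every name in it is made up -- the
two-part copy that skb_copy_bits() performs (linear area first, then paged
fragments) has roughly this shape:

/* Simplified userspace model of what skb_copy_bits() does: copy `len`
 * bytes starting at `offset` out of a buffer that is split into a
 * linear part plus a paged fragment.  This is NOT the kernel code; it
 * only shows why a stale fragment or head pointer makes the inner
 * memcpy() fault partway through, as in the oops above.
 */
#include <stdio.h>
#include <string.h>

struct fake_skb {
    const char *linear;     /* "head" data, directly addressable */
    size_t      linear_len;
    const char *frag_data;  /* stands in for one paged fragment  */
    size_t      frag_len;
};

/* Returns 0 on success, -1 if the request runs past the buffer. */
static int fake_skb_copy_bits(const struct fake_skb *skb, size_t offset,
                              void *to, size_t len)
{
    char *dst = to;

    /* Part 1: copy whatever overlaps the linear area. */
    if (offset < skb->linear_len) {
        size_t chunk = skb->linear_len - offset;
        if (chunk > len)
            chunk = len;
        memcpy(dst, skb->linear + offset, chunk); /* bulk copy; a bad source pointer faults here */
        dst += chunk;
        offset += chunk;
        len -= chunk;
    }
    if (!len)
        return 0;

    /* Part 2: copy the remainder out of the fragment. */
    offset -= skb->linear_len;
    if (offset + len > skb->frag_len)
        return -1;          /* request larger than the buffer holds */
    memcpy(dst, skb->frag_data + offset, len);
    return 0;
}

int main(void)
{
    char head[64] = "linear-part:";
    char frag[64] = "frag-part";
    struct fake_skb skb = { head, strlen(head), frag, strlen(frag) };
    char out[128] = { 0 };

    if (fake_skb_copy_bits(&skb, 0, out, skb.linear_len + skb.frag_len) == 0)
        printf("copied: %s\n", out);
    return 0;
}

In the real function the fragment data comes from pages that have to be
mapped and the buffer geometry comes from the skb itself; a corrupted length
or a stale head/page pointer there would send the inner copy off the end of a
valid mapping, which would be consistent with the source address faulting
here.
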
> -----Original Message-----
> From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
> Sent: 12 August 2006 10:50
> To: Steve Traugott; Ian Pratt
> Cc: xen-devel
> Subject: Re: Unable to handle kernel paging request
>
>
>
>
> On 12/8/06 6:50 am, "Steve Traugott" <stevegt@TerraLuna.Org> wrote:
>
> > Another data point... Dom0 seems to only want to crash when more than
> > 3 or 4 domU's are running (each with their own DRBD root, with DRBD
> > running in dom0), and the below 'nc' command is run in the last
> > domU...
> >
> > Still looking.
>
> What does the crash look like? There was no oops message from
> domain0 in the kernel logs that you posted.
>
> -- Keir
>
>
>
Keir Fraser
2006-Aug-13 00:08 UTC
[Xen-devel] Re: Unable to handle kernel paging request

On 12/8/06 7:48 pm, "Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote:

> It's possible this was a cset before the alignment fix which would have
> exercised skb copying more heavily, but that's no excuse for it
> crashing.

If the problem has only appeared with recent changesets then it might be
worth working backwards to find which one introduced the problem. The
network driver changes for GSO would be an obvious candidate.

 -- Keir
Steve Traugott
2006-Aug-14 05:44 UTC
[Xen-devel] Re: Unable to handle kernel paging request
On Sun, Aug 13, 2006 at 01:08:53AM +0100, Keir Fraser wrote:
> On 12/8/06 7:48 pm, "Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> > It's possible this was a cset before the alignment fix which would have
> > exercised skb copying more heavily, but that's no excuse for it
> > crashing.
>
> If the problem has only appeared with recent changesets then it might be
> worth working backwards to find which one introduced the problem. The
> network driver changes for GSO would be an obvious candidate.

So far -testing tip (changeset 9762) looks like it avoids both the soft
lockups that I was getting earlier in -testing, and the various crashes
I've been seeing in -unstable 10868.

(In addition to the dom0 oops we're talking about in this thread, 10868
also randomly crashes domUs when xendomains restores them during boot; I
haven't captured data on that since it hasn't been as critical and can
be worked around. It should be easy enough for folks to duplicate if
anyone wants to chase it down -- get a few domUs running on a dual-CPU
box, then reboot dom0, then check the console of the domUs after
everything's back up. About half of the domUs wound up oopsed and hung
in my case, possibly the odd-numbered ones, but I'm not sure if it was
that consistent. If you can't duplicate it, let me know and I'll move
some boxes back to 10868 and have another go.)

Overall, I'm *really* wishing I had time to set up a stress test suite
that exercises DRBD, aoe, heavy disk and net I/O, etc., and run daily or
weekly changesets across it on a dedicated set of hardware, posting the
results here. Maybe after the dust settles on this rollout I'm in the
middle of right now...

Steve

--
Stephen G. Traugott (KG6HDQ)
Managing Partner, TerraLuna LLC
stevegt@TerraLuna.Org -- http://www.t7a.org
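
For the network-I/O leg of a harness like the one Steve describes, the load
only needs to push bulk TCP through the bridged path the way the 'nc'
pipeline that triggers the oops does (the exact 'nc' invocation isn't shown
in this thread). A minimal stand-in in C -- the default address, port, and
transfer size below are placeholders, and any TCP sink listening on the
receiving domU will do:

/* Trivial bulk TCP sender: a stand-in for the "nc" traffic used to
 * trigger the crash, usable as the network-I/O leg of a stress harness.
 * Usage: ./blast [host] [port]; the defaults below are placeholders.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *host = argc > 1 ? argv[1] : "192.168.1.10"; /* placeholder */
    int port = argc > 2 ? atoi(argv[2]) : 5001;             /* placeholder */
    long long total = 1LL << 30;                            /* 1 GiB per run */
    char buf[65536];
    memset(buf, 0xA5, sizeof(buf));

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, host, &sin.sin_addr) != 1) {
        fprintf(stderr, "bad address: %s\n", host);
        return 1;
    }
    if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        perror("connect");
        return 1;
    }

    /* Push data as fast as the path will take it. */
    for (long long sent = 0; sent < total; ) {
        ssize_t n = write(fd, buf, sizeof(buf));
        if (n <= 0) { perror("write"); break; }
        sent += n;
    }
    close(fd);
    return 0;
}

Running a few of these in parallel between domUs, alongside DRBD and disk
load, should exercise dom0's bridge and skb-copy paths in roughly the same
way as the setup that crashes here.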