Actually, Steve did post the oops (appended). It's possible this was a cset before the alignment fix which would have exercised skb copying more heavily, but that's no excuse for it crashing.

Ian

n4h34 login: Unable to handle kernel paging request at virtual address c0976590
 printing eip:
c03d6ed7
*pde = ma 3cf91067 pa 00f91067
*pte = ma 00000000 pa fffff000
Oops: 0000 [#1]
SMP
Modules linked in: iptable_filter ip_tables x_tables bridge drbd ipv6 nfsd lockd sunrpc e100 tulip softdog 3c59x evdev sd_mod dm_mod thermal processor fan e1000 eepro100 mii tg3
CPU:    0
EIP:    0061:[<c03d6ed7>]    Not tainted VLI
EFLAGS: 00010206   (2.6.16.13-xen #3)
EIP is at skb_copy_bits+0x127/0x280
eax: c0976000   ebx: 000005a8   ecx: 0000016a   edx: c3711720
esi: c0976590   edi: c37110e0   ebp: 000005a8   esp: c00818c8
ds: 007b   es: 007b   ss: 0069
Process drbd12_receiver (pid: 3924, threadinfo=c0080000 task=c3efda90)
Stack: <0>c1012ec0 00000002 c03d65ef c1d51a00 000005ea 00000042 00000000 00000000
       00000020 c3edc800 c376fd64 c03d6b0f c376fd64 00000042 c37110e0 000005a8
       c1a70000 c54690c0 00000000 c1d51ac0 c3edc800 c376fd64 c03dc478 c376fd64
Call Trace:
 [<c03d65ef>] pskb_expand_head+0xdf/0x140
 [<c03d6b0f>] __pskb_pull_tail+0x7f/0x320
 [<c54690c0>] br_nf_dev_queue_xmit+0x0/0x50 [bridge]
 [<c03dc478>] dev_queue_xmit+0x328/0x370
 [<c5462f7e>] br_dev_queue_push_xmit+0xbe/0x140 [bridge]
 [<c5469212>] br_nf_post_routing+0x102/0x1c0 [bridge]
 [<c54690c0>] br_nf_dev_queue_xmit+0x0/0x50 [bridge]
 [<c5462ec0>] br_dev_queue_push_xmit+0x0/0x140 [bridge]
 [<c03f43f8>] nf_iterate+0x78/0x90
 [<c5462ec0>] br_dev_queue_push_xmit+0x0/0x140 [bridge]
 [<c5462ec0>] br_dev_queue_push_xmit+0x0/0x140 [bridge]
 [<c03f447e>] nf_hook_slow+0x6e/0x110
 [<c5462ec0>] br_dev_queue_push_xmit+0x0/0x140 [bridge]
 [<c5463061>] br_forward_finish+0x61/0x70 [bridge]
 [<c5462ec0>] br_dev_queue_push_xmit+0x0/0x140 [bridge]
 [<c5468995>] br_nf_forward_finish+0x75/0x130 [bridge]
 [<c5463000>] br_forward_finish+0x0/0x70 [bridge]
 [<c5468b38>] br_nf_forward_ip+0xe8/0x190 [bridge]
 [<c5468920>] br_nf_forward_finish+0x0/0x130 [bridge]
 [<c5463000>] br_forward_finish+0x0/0x70 [bridge]
 [<c03f43f8>] nf_iterate+0x78/0x90
 [<c5463000>] br_forward_finish+0x0/0x70 [bridge]
 [<c5463000>] br_forward_finish+0x0/0x70 [bridge]
 [<c03f447e>] nf_hook_slow+0x6e/0x110
 [<c5463000>] br_forward_finish+0x0/0x70 [bridge]
 [<c5463167>] __br_forward+0x77/0x80 [bridge]
 [<c5463000>] br_forward_finish+0x0/0x70 [bridge]
 [<c5463fbf>] br_handle_frame_finish+0xdf/0x160 [bridge]
 [<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
 [<c5467d89>] br_nf_pre_routing_finish+0xf9/0x370 [bridge]
 [<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
 [<c0322e3a>] loopback_start_xmit+0xba/0x110
 [<c0400d70>] ip_finish_output+0x0/0x220
 [<c03dc07e>] dev_hard_start_xmit+0x5e/0x130
 [<c03dc3b5>] dev_queue_xmit+0x265/0x370
 [<c03f43f8>] nf_iterate+0x78/0x90
 [<c5467c90>] br_nf_pre_routing_finish+0x0/0x370 [bridge]
 [<c5467c90>] br_nf_pre_routing_finish+0x0/0x370 [bridge]
 [<c03f447e>] nf_hook_slow+0x6e/0x110
 [<c5467c90>] br_nf_pre_routing_finish+0x0/0x370 [bridge]
 [<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
 [<c54685fc>] br_nf_pre_routing+0x26c/0x520 [bridge]
 [<c5467c90>] br_nf_pre_routing_finish+0x0/0x370 [bridge]
 [<c03f43f8>] nf_iterate+0x78/0x90
 [<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
 [<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
 [<c03f447e>] nf_hook_slow+0x6e/0x110
 [<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
 [<c546422d>] br_handle_frame+0x1ed/0x230 [bridge]
 [<c5463ee0>] br_handle_frame_finish+0x0/0x160 [bridge]
 [<c03dcb21>] netif_receive_skb+0x1a1/0x330
 [<c03dcd87>] process_backlog+0xd7/0x190
 [<c03dcf2a>] net_rx_action+0xea/0x230
 [<c0125915>] __do_softirq+0xf5/0x120
 [<c01259d5>] do_softirq+0x95/0xa0
 [<c0125a42>] local_bh_enable+0x62/0xa0
 [<c0406bd1>] tcp_prequeue_process+0x71/0x80
 [<c04070e9>] tcp_recvmsg+0x349/0x750
 [<c50eebfc>] dm_request+0xbc/0x100 [dm_mod]
 [<c03d5085>] sock_common_recvmsg+0x55/0x70
 [<c03d11cf>] sock_recvmsg+0xef/0x110
 [<c03143fa>] force_evtchn_callback+0xa/0x10
 [<c0147163>] mempool_alloc+0x33/0xe0
 [<c01367d0>] autoremove_wake_function+0x0/0x60
 [<c50eebfc>] dm_request+0xbc/0x100 [dm_mod]
 [<c02b2ee0>] generic_make_request+0xf0/0x160
 [<c529c680>] drbd_recv+0x90/0x190 [drbd]
 [<c529cdec>] drbd_recv_header+0x2c/0xf0 [drbd]
 [<c529e580>] receive_DataRequest+0x0/0x7d0 [drbd]
 [<c52a045c>] drbdd+0x1c/0x150 [drbd]
 [<c52a105a>] drbdd_init+0x7a/0x1a0 [drbd]
 [<c52a7136>] drbd_thread_setup+0x86/0xf0 [drbd]
 [<c52a70b0>] drbd_thread_setup+0x0/0xf0 [drbd]
 [<c0102f75>] kernel_thread_helper+0x5/0x10
Code: 8b 4c 24 30 8b 7c 24 34 8b 91 a0 00 00 00 8b 4c 24 18 0f b7 74 ca 18 8b 4c 24 14 8d 34 06 01 fe 29 ce 8b 7c 24 38 89 d9 c1 e9 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 89 04 24 ba 02 00 00 00 89 54
<0>Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

> -----Original Message-----
> From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
> Sent: 12 August 2006 10:50
> To: Steve Traugott; Ian Pratt
> Cc: xen-devel
> Subject: Re: Unable to handle kernel paging request
>
> On 12/8/06 6:50 am, "Steve Traugott" <stevegt@TerraLuna.Org> wrote:
>
> > Another data point... Dom0 seems to only want to crash when more than
> > 3 or 4 domU's are running (each with their own DRBD root, with DRBD
> > running in dom0), and the below 'nc' command is run in the last
> > domU...
> >
> > Still looking.
>
> What does the crash look like? There was no oops message from
> domain0 in the kernel logs that you posted.
>
>  -- Keir
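A note on where this faults, for anyone reading along: the faulting instruction in the Code: dump above (f3 a5, a rep movsl) is a bulk copy, the fault address c0976590 is exactly the source register %esi, and the call chain shows skb_copy_bits() copying packet data into the new buffer that pskb_expand_head() allocated on behalf of __pskb_pull_tail() in the bridge-netfilter transmit path. In other words, the skb is pointing at source memory that is no longer mapped. What follows is a minimal userspace sketch of that style of two-part copy loop (linear area first, then page fragments), purely to illustrate how a stale source page turns into this kind of paging request. The struct and function names are simplified stand-ins, not the actual kernel code, and the sketch makes no claim about which page went bad in this particular crash.

/* fake_skb_copy.c: simplified, userspace-only model of the two-part copy
 * that skb_copy_bits() performs -- linear header area first, then each
 * page fragment.  "fake_skb" and friends are illustrative stand-ins for
 * the kernel structures.  Build: cc -o fake fake_skb_copy.c */
#include <stdio.h>
#include <string.h>

struct fake_frag {
    const unsigned char *page;   /* in the kernel: a struct page the skb holds a reference on */
    size_t size;
};

struct fake_skb {
    const unsigned char *head;   /* linear packet data */
    size_t headlen;
    struct fake_frag frags[4];
    int nr_frags;
};

/* Copy 'len' bytes starting at 'offset' out of the skb into 'to'. */
static int fake_skb_copy_bits(const struct fake_skb *skb, size_t offset,
                              unsigned char *to, size_t len)
{
    size_t copy, frag_start;
    int i;

    /* Part 1: the linear area. */
    if (offset < skb->headlen) {
        copy = skb->headlen - offset;
        if (copy > len)
            copy = len;
        memcpy(to, skb->head + offset, copy);
        to += copy; offset += copy; len -= copy;
    }

    /* Part 2: the page fragments.  If a fragment (or the linear area above)
     * refers to memory that is no longer mapped, this memcpy is where an
     * "unable to handle kernel paging request" fires, with the source
     * pointer in %esi -- the situation in the oops above. */
    frag_start = skb->headlen;
    for (i = 0; i < skb->nr_frags && len > 0; i++) {
        const struct fake_frag *f = &skb->frags[i];
        if (offset < frag_start + f->size) {
            size_t inner = offset - frag_start;
            copy = f->size - inner;
            if (copy > len)
                copy = len;
            memcpy(to, f->page + inner, copy);
            to += copy; offset += copy; len -= copy;
        }
        frag_start += f->size;
    }

    return len ? -1 : 0;   /* roughly -EFAULT: the skb was shorter than claimed */
}

int main(void)
{
    unsigned char head[] = "headhead", frag[] = "fragfrag", out[16];
    struct fake_skb skb = { head, 8, { { frag, 8 } }, 1 };

    if (fake_skb_copy_bits(&skb, 4, out, 10) == 0)
        printf("copied: %.10s\n", (const char *)out);
    return 0;
}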
On 12/8/06 7:48 pm, "Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote:

> It's possible this was a cset before the alignment fix which would have
> exercised skb copying more heavily, but that's no excuse for it
> crashing.

If the problem has only appeared with recent changesets then it might be worth working backwards to find which one introduced the problem. The network driver changes for GSO would be an obvious candidate.

 -- Keir
Steve Traugott
2006-Aug-14 05:44 UTC
[Xen-devel] Re: Unable to handle kernel paging request
On Sun, Aug 13, 2006 at 01:08:53AM +0100, Keir Fraser wrote:
> On 12/8/06 7:48 pm, "Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> > It's possible this was a cset before the alignment fix which would have
> > exercised skb copying more heavily, but that's no excuse for it
> > crashing.
>
> If the problem has only appeared with recent changesets then it might be
> worth working backwards to find which one introduced the problem. The
> network driver changes for GSO would be an obvious candidate.

So far the -testing tip (changeset 9762) looks like it avoids both the soft lockups I was getting earlier in -testing and the various crashes I've been seeing in -unstable 10868.

(In addition to the dom0 oops we're talking about in this thread, 10868 also randomly crashes domUs when xendomains restores them during boot; I haven't captured data on that since it hasn't been as critical and can be worked around. It should be easy enough for folks to duplicate if anyone wants to chase it down -- get a few domUs running on a dual-CPU box, reboot dom0, then check the consoles of the domUs after everything is back up. About half of the domUs wound up oopsed and hung in my case, possibly the odd-numbered ones, though I'm not sure it was that consistent. If you can't duplicate it, let me know and I'll move some boxes back to 10868 and have another go.)

Overall, I'm *really* wishing I had time to set up a stress test suite that exercises DRBD, aoe, heavy disk and net I/O, etc., run it against daily or weekly changesets on a dedicated set of hardware, and post the results here. Maybe after the dust settles on the rollout I'm in the middle of right now...

Steve

--
Stephen G. Traugott (KG6HDQ)
Managing Partner, TerraLuna LLC
stevegt@TerraLuna.Org -- http://www.t7a.org
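Since the thread keeps coming back to reproducing this under bridged bulk traffic: the 'nc' command referred to in the quoted message is not shown in this excerpt, so as a rough stand-in, the sketch below just streams large zero-filled writes over TCP, which is the same general load pattern -- a sustained stream crossing the dom0 bridge on its way to a domU. The host and port are placeholders, and any TCP sink in the target domU (a netcat listener, for instance) completes the pair; this illustrates the kind of traffic involved, not Steve's exact test.

/* blast.c: trivial bulk TCP sender, a rough stand-in for the nc pipeline
 * mentioned in the thread.  Streams 64 KiB zero-filled writes to <host> <port>
 * until interrupted.  Build: cc -O2 -o blast blast.c   Run: ./blast 10.0.0.2 9999 */
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
    struct addrinfo hints, *res;
    static char buf[64 * 1024];          /* large writes -> large, fragmented skbs */
    unsigned long long total = 0;
    int fd;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <host> <port>\n", argv[0]);
        return 1;
    }
    signal(SIGPIPE, SIG_IGN);            /* treat a dropped connection as a write error */

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(argv[1], argv[2], &hints, &res) != 0) {
        fprintf(stderr, "getaddrinfo failed\n");
        return 1;
    }

    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
        perror("connect");
        return 1;
    }
    freeaddrinfo(res);

    for (;;) {
        ssize_t n = write(fd, buf, sizeof(buf));
        if (n < 0) {
            perror("write");
            break;
        }
        total += (unsigned long long)n;
        if ((total & ((1ULL << 30) - 1)) < sizeof(buf))   /* progress roughly every GiB */
            fprintf(stderr, "sent %llu bytes\n", total);
    }
    close(fd);
    return 0;
}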