I hit the following oops a couple of times a day - it seems to correspond to tearing down a vif:

Jul 3 01:30:13 ubuntu kernel: ------------[ cut here ]------------
Jul 3 01:30:13 ubuntu kernel: kernel BUG at include/linux/dcache.h:293!
Jul 3 01:30:13 ubuntu kernel: invalid operand: 0000 [#1]
Jul 3 01:30:13 ubuntu kernel: SMP
Jul 3 01:30:13 ubuntu kernel: Modules linked in: video thermal processor fan button battery ac mptscsih mptbase
Jul 3 01:30:13 ubuntu kernel: CPU: 0
Jul 3 01:30:13 ubuntu kernel: EIP: 0061:[<c0193100>] Not tainted VLI
Jul 3 01:30:13 ubuntu kernel: EFLAGS: 00010246 (2.6.11.12-xen0)
Jul 3 01:30:13 ubuntu kernel: EIP is at sysfs_remove_dir+0x100/0x110
Jul 3 01:30:13 ubuntu kernel: eax: 00000000 ebx: d557b3d4 ecx: dcfd4234 edx: d557b3d4
Jul 3 01:30:13 ubuntu kernel: esi: da0e1a20 edi: dd4d1424 ebp: 00000006 esp: c089de64
Jul 3 01:30:13 ubuntu kernel: ds: 007b es: 007b ss: 0069
Jul 3 01:30:13 ubuntu kernel: Process events/0 (pid: 10, threadinfo=c089c000 task=c075ca40)
Jul 3 01:30:13 ubuntu kernel: Stack: c0191f02 dcfd4dbc dc576000 d557b3d4 da0e1a20 dc576000 00000006 c0211070
Jul 3 01:30:13 ubuntu kernel: d557b3d4 00000002 d557b340 c03f087a d557b3d4 d557b340 da0e1a20 c03f1948
Jul 3 01:30:13 ubuntu kernel: da0e1a20 dc576000 c04c84a0 dc576000 00000006 dc576144 c012cd55 c04c84a0
Jul 3 01:30:13 ubuntu kernel: Call Trace:
Jul 3 01:30:13 ubuntu kernel: [<c0191f02>] sysfs_hash_and_remove+0x52/0xe9
Jul 3 01:30:13 ubuntu kernel: [<c0211070>] kobject_del+0x20/0x30
Jul 3 01:30:13 ubuntu kernel: [<c03f087a>] br_del_if+0x3a/0x5c
Jul 3 01:30:13 ubuntu kernel: [<c03f1948>] br_device_event+0xb8/0x100
Jul 3 01:30:13 ubuntu kernel: [<c012cd55>] notifier_call_chain+0x25/0x40
Jul 3 01:30:13 ubuntu kernel: [<c03a4a2f>] unregister_netdevice+0x14f/0x270
Jul 3 01:30:13 ubuntu kernel: [<c03a4b65>] unregister_netdev+0x15/0x1e
Jul 3 01:30:13 ubuntu kernel: [<c02be4f5>] netif_destroy+0x75/0x90
Jul 3 01:30:13 ubuntu kernel: [<c02bdeb4>] netif_ctrlif_rx+0x64/0xb0
Jul 3 01:30:13 ubuntu kernel: [<c0105550>] __ctrl_if_rxmsg_deferred+0x40/0x50
Jul 3 01:30:13 ubuntu kernel: [<c012fbc8>] worker_thread+0x1d8/0x260
Jul 3 01:30:14 ubuntu kernel: [<c0105510>] __ctrl_if_rxmsg_deferred+0x0/0x50
Jul 3 01:30:14 ubuntu kernel: [<c011a930>] default_wake_function+0x0/0x20
Jul 3 01:30:14 ubuntu kernel: [<c011a930>] default_wake_function+0x0/0x20
Jul 3 01:30:14 ubuntu kernel: [<c012f9f0>] worker_thread+0x0/0x260
Jul 3 01:30:14 ubuntu kernel: [<c01341ad>] kthread+0xbd/0x100
Jul 3 01:30:14 ubuntu kernel: [<c01340f0>] kthread+0x0/0x100
Jul 3 01:30:14 ubuntu kernel: [<c0107b15>] kernel_thread_helper+0x5/0x10
Jul 3 01:30:14 ubuntu kernel: Code: 89 44 24 08 8b 00 89 04 24 e8 0d 25 fb ff 8b 54 24 08 8b 42 04 89 04 24 e8 7e e0 07 00 8b 44 24 08 89 04 24 e8 f2 24 fb ff eb 92 <0f> 0b 25 01 53 65 42 c0 e9 13 ff ff ff 8d 76 00 83 ec 20 89 5c
Jul 3 01:31:57 ubuntu xenstored: xenstored corruption: connection id 0: err Bad address: Unknown error 14 (Bad address)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2005-Jul-03 08:27 UTC
[Xen-devel] Re: Recurring OOPS in latest -unstable kernel
On 3 Jul 2005, at 02:33, Kip Macy wrote:

> I hit the following oops a couple of times a day - it seems to
> correspond to tearing down a vif:

Are you actually trying to tear down a vif when the crash occurs, or is its refcnt falling to zero because of a bug?

We've had this bug report at least once before, but I couldn't find any obvious problem from reading through the backtrace...

 -- Keir
This happens periodically when a domU crashes, so I can't say for sure. I've been more focused on debugging my domU :-)

-Kip

On 7/3/05, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
>
> On 3 Jul 2005, at 02:33, Kip Macy wrote:
>
> > I hit the following oops a couple of times a day - it seems to
> > correspond to tearing down a vif:
>
> Are you actually trying to tear down a vif when the crash occurs, or is
> its refcnt falling to zero because of a bug?
>
> We've had this bug report at least once before, but I couldn't find any
> obvious problem from reading through the backtrace...
>
> -- Keir
> -----Original Message-----
> > I hit the following oops a couple of times a day - it seems to
> > correspond to tearing down a vif:
>
> Are you actually trying to tear down a vif when the crash
> occurs, or is its refcnt falling to zero because of a bug?
>
> We've had this bug report at least once before, but I
> couldn't find any obvious problem from reading through the
> backtrace...

This sounds rather like the bug that's being seen with the ported SuSE kernel. Appended is a summary of the info we have on it.

Ian

The problem really looks obscure to me: a request seems to be routed to the wrong netback (vifX.0) device, the refcount drops to 0, and then we oops. (The normal oops path is the BUG() in line 101 of netback/interface.c; I patched the kernel to get a backtrace at the place where we schedule the work.)

The same code (in netback) works in 2.6.9rc2/2.6.11.x, so something screws up the ringbuffers -- should we start reviewing the path down from hypervisor_callback?

Something strange seems to happen there with ringbuffer assignment to interfaces, and I guess we need to review the upcall path. Somewhere, we may clobber an argument, possibly involving CONFIG_REGPARM ... I don't know the code well enough to see it without adding a lot of instrumentation to the code.
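To make the hypothesized failure mode concrete, here is a minimal user-space sketch of how a completion routed to the wrong interface can drop that interface's refcount to zero while it is still in use. This is an illustration only, not the netback code: `fake_netif`, `fake_netif_get`, `fake_netif_put`, `complete_request` and `simulate_misrouted_put` are all hypothetical names, and the real driver uses atomic counters and schedules teardown work rather than setting a flag.

```c
#include <assert.h>

/* Hypothetical stand-in for a netback interface; NOT the real netif_t. */
struct fake_netif {
    int id;
    int refcnt;     /* references currently held */
    int torn_down;  /* set once the last reference is dropped */
};

/* Take one reference on the interface. */
static void fake_netif_get(struct fake_netif *nif)
{
    nif->refcnt++;
}

/* Drop one reference; returns 1 if this put triggered teardown. */
static int fake_netif_put(struct fake_netif *nif)
{
    assert(nif->refcnt > 0);  /* an underflow here would be a BUG() in a kernel */
    if (--nif->refcnt == 0) {
        nif->torn_down = 1;   /* the real code would schedule teardown work */
        return 1;
    }
    return 0;
}

/* Complete a request against the interface at index idx. */
static int complete_request(struct fake_netif *table, int idx)
{
    return fake_netif_put(&table[idx]);
}

/*
 * A request is submitted on vif 0 (taking a reference), but its completion
 * is misrouted to vif 1. Returns 1 if vif 1 was torn down while vif 0
 * leaked its reference.
 */
int simulate_misrouted_put(void)
{
    struct fake_netif vifs[2] = { { 0, 1, 0 }, { 1, 1, 0 } };

    fake_netif_get(&vifs[0]);    /* submit on vif 0: refcnt 1 -> 2 */
    complete_request(vifs, 1);   /* bug: put lands on vif 1: refcnt 1 -> 0 */

    return vifs[1].torn_down && vifs[0].refcnt == 2;
}
```

In this sketch the gets and puts balance in total, but not per interface, so the victim interface is torn down early while the other one never reaches zero. That would match the symptom described above, where an otherwise healthy vif is destroyed without anyone having asked for it.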
Just to clarify - this is straight out of the -unstable tree from yesterday with no CONFIG_REGPARM. Nonetheless, a few things are different:

CONFIG_MK8=y
CONFIG_SMP=y
# CONFIG_PREEMPT is not set

-Kip

> The same code (in netback) works in 2.6.9rc2/2.6.11.x, so something
> screws up the ringbuffers -- should we start reviewing the path down
> from hypervisor_callback?
>
> Something strange seems to happen there with ringbuffer assignment to
> interfaces and I guess we need to review the upcall path.
> Somewhere, we may clobber an argument, possibly involving CONFIG_REGPARM
> ...
> I don't know the code well enough to see it without adding a lot of
> instrumentation to the code.
Let me know if there is anything I can do to help out. Having to reboot every second or third dom create is frustrating. I know you have a v40z there, so it would be surprising if you couldn't reproduce it.

-Kip

On 7/3/05, Kip Macy <kip.macy@gmail.com> wrote:
> Just to clarify - this is straight out of the -unstable tree from
> yesterday with no CONFIG_REGPARM. Nonetheless, a few things are
> different:
> CONFIG_MK8=y
> CONFIG_SMP=y
> # CONFIG_PREEMPT is not set
>
> -Kip
>
> > The same code (in netback) works in 2.6.9rc2/2.6.11.x, so something
> > screws up the ringbuffers -- should we start reviewing the path down
> > from hypervisor_callback?
> >
> > Something strange seems to happen there with ringbuffer assignment to
> > interfaces and I guess we need to review the upcall path.
> > Somewhere, we may clobber an argument, possibly involving CONFIG_REGPARM
> > ...
> > I don't know the code well enough to see it without adding a lot of
> > instrumentation to the code.