thr3ads.net - Xen devel - [Xen-devel] Recurring OOPS in latest -unstable kernel [Jul 2005]

Kip Macy

2005-Jul-03 01:33 UTC

[Xen-devel] Recurring OOPS in latest -unstable kernel

I hit the following oops a couple of times a day - it seems to
correspond to tearing down a vif:

Jul  3 01:30:13 ubuntu kernel: ------------[ cut here ]------------
Jul  3 01:30:13 ubuntu kernel: kernel BUG at include/linux/dcache.h:293!
Jul  3 01:30:13 ubuntu kernel: invalid operand: 0000 [#1]
Jul  3 01:30:13 ubuntu kernel: SMP
Jul  3 01:30:13 ubuntu kernel: Modules linked in: video thermal processor fan button battery ac mptscsih mptbase
Jul  3 01:30:13 ubuntu kernel: CPU:    0
Jul  3 01:30:13 ubuntu kernel: EIP:    0061:[<c0193100>]    Not tainted VLI
Jul  3 01:30:13 ubuntu kernel: EFLAGS: 00010246   (2.6.11.12-xen0)
Jul  3 01:30:13 ubuntu kernel: EIP is at sysfs_remove_dir+0x100/0x110
Jul  3 01:30:13 ubuntu kernel: eax: 00000000   ebx: d557b3d4   ecx: dcfd4234   edx: d557b3d4
Jul  3 01:30:13 ubuntu kernel: esi: da0e1a20   edi: dd4d1424   ebp: 00000006   esp: c089de64
Jul  3 01:30:13 ubuntu kernel: ds: 007b   es: 007b   ss: 0069
Jul  3 01:30:13 ubuntu kernel: Process events/0 (pid: 10, threadinfo=c089c000 task=c075ca40)
Jul  3 01:30:13 ubuntu kernel: Stack: c0191f02 dcfd4dbc dc576000 d557b3d4 da0e1a20 dc576000 00000006 c0211070
Jul  3 01:30:13 ubuntu kernel:        d557b3d4 00000002 d557b340 c03f087a d557b3d4 d557b340 da0e1a20 c03f1948
Jul  3 01:30:13 ubuntu kernel:        da0e1a20 dc576000 c04c84a0 dc576000 00000006 dc576144 c012cd55 c04c84a0
Jul  3 01:30:13 ubuntu kernel: Call Trace:
Jul  3 01:30:13 ubuntu kernel:  [<c0191f02>] sysfs_hash_and_remove+0x52/0xe9
Jul  3 01:30:13 ubuntu kernel:  [<c0211070>] kobject_del+0x20/0x30
Jul  3 01:30:13 ubuntu kernel:  [<c03f087a>] br_del_if+0x3a/0x5c
Jul  3 01:30:13 ubuntu kernel:  [<c03f1948>] br_device_event+0xb8/0x100
Jul  3 01:30:13 ubuntu kernel:  [<c012cd55>] notifier_call_chain+0x25/0x40
Jul  3 01:30:13 ubuntu kernel:  [<c03a4a2f>] unregister_netdevice+0x14f/0x270
Jul  3 01:30:13 ubuntu kernel:  [<c03a4b65>] unregister_netdev+0x15/0x1e
Jul  3 01:30:13 ubuntu kernel:  [<c02be4f5>] netif_destroy+0x75/0x90
Jul  3 01:30:13 ubuntu kernel:  [<c02bdeb4>] netif_ctrlif_rx+0x64/0xb0
Jul  3 01:30:13 ubuntu kernel:  [<c0105550>] __ctrl_if_rxmsg_deferred+0x40/0x50
Jul  3 01:30:13 ubuntu kernel:  [<c012fbc8>] worker_thread+0x1d8/0x260
Jul  3 01:30:14 ubuntu kernel:  [<c0105510>] __ctrl_if_rxmsg_deferred+0x0/0x50
Jul  3 01:30:14 ubuntu kernel:  [<c011a930>] default_wake_function+0x0/0x20
Jul  3 01:30:14 ubuntu kernel:  [<c011a930>] default_wake_function+0x0/0x20
Jul  3 01:30:14 ubuntu kernel:  [<c012f9f0>] worker_thread+0x0/0x260
Jul  3 01:30:14 ubuntu kernel:  [<c01341ad>] kthread+0xbd/0x100
Jul  3 01:30:14 ubuntu kernel:  [<c01340f0>] kthread+0x0/0x100
Jul  3 01:30:14 ubuntu kernel:  [<c0107b15>] kernel_thread_helper+0x5/0x10
Jul  3 01:30:14 ubuntu kernel: Code: 89 44 24 08 8b 00 89 04 24 e8 0d 25 fb ff 8b 54 24 08 8b 42 04 89 04 24 e8 7e e0 07 00 8b 44 24 08 89 04 24 e8 f2 24 fb ff eb 92 <0f> 0b 25 01 53 65 42 c0 e9 13 ff ff ff 8d 76 00 83 ec 20 89 5c
Jul  3 01:31:57 ubuntu xenstored: xenstored corruption: connection id 0: err Bad address: Unknown error 14 (Bad address)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2005-Jul-03 08:27 UTC

[Xen-devel] Re: Recurring OOPS in latest -unstable kernel

On 3 Jul 2005, at 02:33, Kip Macy wrote:

> I hit the following oops a couple of times a day - it seems to
> correspond to tearing down a vif:

Are you actually trying to tear down a vif when the crash occurs, or is
its refcnt falling to zero because of a bug?

We've had this bug report at least once before, but I couldn't find any
obvious problem from reading through the backtrace...

  -- Keir


Kip Macy

2005-Jul-03 16:21 UTC

[Xen-devel] Re: Recurring OOPS in latest -unstable kernel

This happens periodically when a domU crashes, so I can't say for
sure. I've been more focused on debugging my domU :-)

                           -Kip

On 7/3/05, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
>
> On 3 Jul 2005, at 02:33, Kip Macy wrote:
>
> > I hit the following oops a couple of times a day - it seems to
> > correspond to tearing down a vif:
>
> Are you actually trying to tear down a vif when the crash occurs, or is
> its refcnt falling to zero because of a bug?
>
> We've had this bug report at least once before, but I couldn't find any
> obvious problem from reading through the backtrace...
>
>  -- Keir

Ian Pratt

2005-Jul-03 19:36 UTC

[Xen-devel] RE: Recurring OOPS in latest -unstable kernel

> -----Original Message-----
> > I hit the following oops a couple of times a day - it seems to
> > correspond to tearing down a vif:
>
> Are you actually trying to tear down a vif when the crash
> occurs, or is its refcnt falling to zero because of a bug?
>
> We've had this bug report at least once before, but I
> couldn't find any obvious problem from reading through the
> backtrace...

This sounds rather like the bug that's being seen with the ported SuSE
kernel. Appended is a summary of the info we have on it.

Ian


The problem looks really obscure to me: a request seems to be routed to
the wrong netback (vifX.0) device, the refcount drops to 0, and then we
oops. (The normal oops path is the BUG() at line 101 of
netback/interface.c; I patched the kernel to get a backtrace at the
place where we schedule the work.)

The same code (in netback) works in 2.6.9rc2/2.6.11.x, so something
screws up the ringbuffers -- should we start reviewing the path down
from hypervisor_callback?

Something strange seems to happen there with ringbuffer assignment to
interfaces and I guess we need to review the upcall path.
Somewhere, we may clobber an argument, possibly involving CONFIG_REGPARM
...
I don't know the code well enough to see it without adding a lot of
instrumentation to the code.


Kip Macy

2005-Jul-03 20:28 UTC

[Xen-devel] Re: Recurring OOPS in latest -unstable kernel

Just to clarify  - this is straight out of the -unstable tree from
yesterday with no CONFIG_REGPARM. Nonetheless, a few things are
different:
CONFIG_MK8=y
CONFIG_SMP=y
# CONFIG_PREEMPT is not set

              -Kip

> The same code (in netback) works in 2.6.9rc2/2.6.11.x, so something
> screws up the ringbuffers -- should we start reviewing the path down
> from hypervisor_callback?
>
> Something strange seems to happen there with ringbuffer assignment to
> interfaces and I guess we need to review the upcall path.
> Somewhere, we may clobber an argument, possibly involving CONFIG_REGPARM
> ...
> I don't know the code well enough to see it without adding a lot of
> instrumentation to the code.

Kip Macy

2005-Jul-03 22:20 UTC

[Xen-devel] Re: Recurring OOPS in latest -unstable kernel

Let me know if there is anything I can do to help out. Having to
reboot every second or third dom create is frustrating. I know you
have a v40z there, so it would be surprising if you couldn't reproduce
it.

     -Kip

On 7/3/05, Kip Macy <kip.macy@gmail.com> wrote:
> Just to clarify - this is straight out of the -unstable tree from
> yesterday with no CONFIG_REGPARM. Nonetheless, a few things are
> different:
> CONFIG_MK8=y
> CONFIG_SMP=y
> # CONFIG_PREEMPT is not set
>
>               -Kip
> >
> > The same code (in netback) works in 2.6.9rc2/2.6.11.x, so something
> > screws up the ringbuffers -- should we start reviewing the path down
> > from hypervisor_callback?
> >
> > Something strange seems to happen there with ringbuffer assignment to
> > interfaces and I guess we need to review the upcall path.
> > Somewhere, we may clobber an argument, possibly involving CONFIG_REGPARM
> > ...
> > I don't know the code well enough to see it without adding a lot of
> > instrumentation to the code.