thr3ads.net - Xen devel - [Xen-devel] dom0 crash while booting from AOE devices [Sep 2006]


Jayesh Salvi

2006-Sep-23 21:50 UTC

[Xen-devel] dom0 crash while booting from AOE devices

Hello,

I have encountered a crash in the dom0 kernel while booting a domU from an AOE
device. I haven't seen such crashes when booting from local partitions, LVM
volumes, or loopback file systems. I also haven't seen this crash when doing
repetitive I/O to these AOE devices. As the call trace indicates, the crash is
in the xenolinux kernel. The crash is also reliably reproducible.

I am currently using Xen 3.0.1, but I saw the same thing happening in 3.0.2
some time back. If time permits, I can try to reproduce it on the latest Xen
builds.

The domU's disks look like this:
'phy:/dev/etherd/e0.4,sda1,w'
'phy:/dev/etherd/e1.4,sda2,w'

Inside the domU, sda1 is treated as root device and sda2 is treated as swap.
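For readers less familiar with the two naming schemes involved here, the short Python sketch below splits each disk entry into its fields: the `phy:` backend type, the dom0-side device path, the device name the domU sees, and the access mode (`w` meaning read-write). It also extracts the shelf and slot numbers from the `/dev/etherd/e<shelf>.<slot>` device name, following the Linux aoe driver's naming convention. The helper names (`DiskSpec`, `parse_disk_spec`, `aoe_address`) are hypothetical and are not part of Xen or the aoe tools.

```python
import re
from typing import NamedTuple, Optional, Tuple


class DiskSpec(NamedTuple):
    backend: str  # e.g. "phy" for a block device exported from dom0
    device: str   # dom0-side path, e.g. "/dev/etherd/e0.4"
    vdev: str     # device name seen inside the domU, e.g. "sda1"
    mode: str     # "w" for read-write, "r" for read-only


# Assumed convention: AoE block devices appear as /dev/etherd/e<shelf>.<slot>.
AOE_NAME = re.compile(r"^/dev/etherd/e(\d+)\.(\d+)$")


def parse_disk_spec(spec: str) -> DiskSpec:
    """Split a 'backend:device,vdev,mode' string into its four fields."""
    target, vdev, mode = spec.split(",")
    backend, device = target.split(":", 1)
    return DiskSpec(backend, device, vdev, mode)


def aoe_address(device: str) -> Optional[Tuple[int, int]]:
    """Return (shelf, slot) for an AoE device path, or None for other devices."""
    m = AOE_NAME.match(device)
    return (int(m.group(1)), int(m.group(2))) if m else None


disks = ["phy:/dev/etherd/e0.4,sda1,w", "phy:/dev/etherd/e1.4,sda2,w"]
for spec in disks:
    d = parse_disk_spec(spec)
    # First line printed: sda1 <- /dev/etherd/e0.4 shelf/slot (0, 4) mode w
    print(d.vdev, "<-", d.device, "shelf/slot", aoe_address(d.device), "mode", d.mode)
```

So in this setup the root disk comes from shelf 0, slot 4 and the swap disk from shelf 1, slot 4, both attached read-write.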


The AOE setup involves vblade servers running on the server machine that
export some disks over AOE. The dom0 instance in question is a client of this
AOE server. It has the 'aoe' module loaded, and the aoe-tools version is 10.

The stack trace of the crash is as follows:

Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c012cc32
*pde = ma 8da99067 pa 32e99067
*pte = ma 00000000 pa 55555000
Oops: 0002 [#1]
SMP
Modules linked in: ipt_physdev iptable_filter ip_tables aoe bridge nfs lockd
ppdev vmnet vmmon sg parport_pc lp parport autofs4 sunrpc af_packet
binfmt_misc dm_mirror dm_multipath video thermal processor fan button
battery ac ipv6 md ohci1394 ieee1394 uhci_hcd intel_agp agpgart i2c_i801
i2c_core pci_hotplug snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss
snd_pcm snd_timer snd soundcore snd_page_alloc e1000 floppy unix sd_mod
aacraid scsi_mod ext3 jbd dm_mod
CPU:    0
EIP:    0061:[<c012cc32>]    Tainted: P      VLI
EFLAGS: 00010012   (2.6.12.6-xen)
EIP is at run_timer_softirq+0xa2/0x1c0
eax: 00000000   ebx: 00000000   ecx: f33dbe00   edx: c03f3f0c
esi: 00000100   edi: c26deda0   ebp: 00000000   esp: c03f3ef8
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, threadinfo=c03f2000 task=c0369fc0)
Stack: 00000000 c03f3f7c 00000100 c01438a0 c03f2000 f33dbe00 c0449260 20000000
       00000011 c03ecda8 c0420ea0 00000000 c0127ee6 c03ecda8 0000000a c03f2000
       00000001 00000000 00000000 c0128005 00000000 fbf7e000 c010ef32 c0105a00
Call Trace:
 [<c01438a0>] handle_IRQ_event+0x60/0xb0
 [<c0127ee6>] __do_softirq+0x96/0x130
 [<c0128005>] do_softirq+0x85/0xa0
 [<c010ef32>] do_IRQ+0x22/0x30
 [<c0105a00>] evtchn_do_upcall+0x90/0x100
 [<c010a88c>] hypervisor_callback+0x2c/0x34
 [<c01082aa>] xen_idle+0x4a/0xa0
 [<c0108369>] cpu_idle+0x69/0xb0
 [<c03f49fa>] start_kernel+0x1ca/0x220
 [<c03f4370>] unknown_bootoption+0x0/0x1f0
Code: 00 8b 53 04 8d 6c 24 14 8b 44 24 14 89 69 04 89 4c 24 14 89 50 04 89 02 89 5b 04 89 5e 0c eb 66 8b 51 04 8b 01 8b 69 14 8b 59 18 <89> 50 04 89 02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 89 4f 08
 <0>Kernel panic - not syncing: Fatal exception in interrupt
(XEN) Domain 0 shutdown: rebooting machine.
(XEN) Reboot disabled on cmdline: require manual reset


Before getting this crash I get some warnings on the serial console that
look like the following:

Uninitialised timer!
This is just a warning.  Your computer is OK
function=0xc02344b0, data=0xf1b9d460

But I guess these have nothing to do with the crash.


I also watched the AOE traffic with tcpdump when the crash occurred, but
nothing seemed unusual to me, except that the packets stopped flowing after
the AOE client dom0 crashed. Furthermore, there is no problem with the AOE
servers: after a reboot I can use the same AOE devices again (apart from the
inconsistent file system). My earlier attempts at putting printk's in the AOE
driver source also didn't reveal any helpful information.

Please let me know if any bug fixes were made in recent versions in the area
where this crash is being seen (handle_IRQ_event is at the top of the call
trace; the EIP itself is in run_timer_softirq). Any other suggestions for
tackling the problem are welcome.

Thanks,
-- 
Jayesh
------------------------------------------------------------------------
Everything you can imagine is real


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Ian Pratt

2006-Sep-24 00:15 UTC


RE: [Xen-devel] dom0 crash while booting from AOE devices

> I am currently using xen 3.0.1, but I have seen the same thing happening in
> 3.0.2 some time back. If time permits I can try to reproduce it on latest
> Xen builds.

Please do.

> EIP is at run_timer_softirq+0xa2/0x1c0

> Before getting this crash I get some warnings on the serial console that
> look like following:
>
> Uninitialised timer!
>
> This is just a warning.  Your computer is OK
>
> function=0xc02344b0, data=0xf1b9d460

Looking up the address of the above function might be interesting if you
can repro on:

http://xenbits.staging.xensource.com/xen-3.0.3-testing.hg or
xen-unstable.hg

Thanks,
Ian 
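Ian's suggestion amounts to mapping `function=0xc02344b0` from the "Uninitialised timer!" warning back to a symbol name. A minimal Python sketch of one way to do this follows, assuming the `System.map` matching the running 2.6.12.6-xen dom0 kernel is available (lines of the form `ADDRESS TYPE NAME`): take the symbol with the largest address that does not exceed the target. The helper names and the sample entries are hypothetical. System.map only covers the core kernel image, so an address inside a loaded module (such as aoe) would have to be looked up another way, for example in `/proc/kallsyms` on the running kernel, whose lines start with the same three fields and should parse the same way.

```python
import bisect
from typing import Iterable, List, Optional, Tuple


def load_system_map(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Parse 'ADDRESS TYPE NAME' lines into (address, name) pairs sorted by address."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or short lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols: List[Tuple[int, str]], address: int) -> Optional[str]:
    """Return 'name+0xOFFSET' for the closest symbol at or below address."""
    addresses = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addresses, address) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = symbols[i]
    return f"{name}+{address - base:#x}"


# Hypothetical entries for illustration only. On a real system you would read
# the System.map matching the running dom0 kernel instead, e.g.
# open("/boot/System.map-2.6.12.6-xen") (path assumed).
sample = [
    "c0234400 t example_timer_fn",
    "c0234500 t another_fn",
]
print(resolve(load_system_map(sample), 0xc02344b0))  # example_timer_fn+0xb0
```

The name that comes back is the timer callback that was registered without initialising its timer first, which is what Ian suggests would be worth knowing.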

