Hi!

I got a hypervisor crash when shutting down an HVM guest.

Xen: 64bit, c/s 17893
Dom0: Linux, 64bit
HVM guest: NetBSD/amd64
Hardware: AMD Athlon, DualCore, RevF

When it entered ACPI S5, the hypervisor crashed and provided this trace:

(XEN) ----[ Xen-3.3-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff828c80196c62>] oos_fixup_remove+0x5a/0x79
(XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor
(XEN) rax: 00000000000009c0 rbx: ffff828401905e10 rcx: 0000000000000000
(XEN) rdx: ffff8300deeec9c0 rsi: 00000000000a025a rdi: ffff8300dee10100
(XEN) rbp: ffff828c80267a38 rsp: ffff828c80267a38 r8: ffffffffffffffff
(XEN) r9: 0000000000000000 r10: 0000000000000001 r11: 0000ffff0000ffff
(XEN) r12: 00000000000a025a r13: ffff8300dee10100 r14: ffff8300dfde6100
(XEN) r15: ffff8300dee10100 cr0: 000000008005003b cr4: 00000000000006f0
(XEN) cr3: 00000001168ef000 cr2: ffff8300deeec9c0
(XEN) ds: 0000 es: 0000 fs: 0063 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff828c80267a38:
(XEN) ffff828c80267a68 ffff828c8019998f 00000000000a025a 00000000000a025a
(XEN) 0000000000000008 ffff83010010f010 ffff828c80267ab8 ffff828c801aea53
(XEN) 0000000002802a30 000000000010010e ffff828c80267ac8 ffff828402802a30
(XEN) 0000000000000000 ffff83010010f010 000000000010010e ffff8300dee10100
(XEN) ffff828c80267ac8 ffff828c8019816c ffff828c80267b28 ffff828c801b2884
(XEN) 0000000002802a58 001040000010034c 000000010010f000 000000000010010f
(XEN) 0000000000000002 ffff828402802a58 ffff83010034c000 0000000000000000
(XEN) 000000000010010f 000000010034c000 ffff828c80267b38 ffff828c8019819d
(XEN) ffff828c80267b88 ffff828c801b2ae2 0000000000100351 ffff8300dee10100
(XEN) 000000000010034c ffff8284028083e0 ffff830100351000 0000000000000000
(XEN) 000000000010034c ffff8300dee10100 ffff828c80267b98 ffff828c801981a4
(XEN) ffff828c80267bf8 ffff828c801b2d97 0010400000000008 001040001c1ef000
(XEN) 0000000100351000 0000000000100351 0000000000000000 ffff8300dfde6100
(XEN) 0000000000000002 0000000000000001 0000000000100351 ffff8284028084a8
(XEN) ffff828c80267c08 ffff828c801981ab ffff828c80267c68 ffff828c8019b1a1
(XEN) ffff828402c64b00 ffff8300dfde7068 ffff8300dee10100 ffff8300dfde7068
(XEN) 0000000000000000 ffff8300dfde6100 0000000000000000 0000000000000004
(XEN) 0000000000000000 0000000000000002 ffff828c80267cc8 ffff828c8019b561
(XEN) 0000000000000020 0000000000000000 0000000000000002 ffff8300dfde7098
(XEN) ffff828c80267f28 ffff8300dfde70a8 0000000000000000 ffff8300dfde6100
(XEN) Xen call trace:
(XEN) [<ffff828c80196c62>] oos_fixup_remove+0x5a/0x79
(XEN) [<ffff828c8019998f>] shadow_demote+0x14b/0x15c
(XEN) [<ffff828c801aea53>] sh_destroy_l1_shadow__guest_4+0xc5/0x2e2
(XEN) [<ffff828c8019816c>] sh_destroy_shadow+0xf5/0x166
(XEN) [<ffff828c801b2884>] sh_destroy_l2_shadow__guest_4+0x297/0x2d9
(XEN) [<ffff828c8019819d>] sh_destroy_shadow+0x126/0x166
(XEN) [<ffff828c801b2ae2>] sh_destroy_l3_shadow__guest_4+0x21c/0x266
(XEN) [<ffff828c801981a4>] sh_destroy_shadow+0x12d/0x166
(XEN) [<ffff828c801b2d97>] sh_destroy_l4_shadow__guest_4+0x26b/0x2ad
(XEN) [<ffff828c801981ab>] sh_destroy_shadow+0x134/0x166
(XEN) [<ffff828c8019b1a1>] _shadow_prealloc+0x2ed/0x4d5
(XEN) [<ffff828c8019b561>] sh_set_allocation+0x1d8/0x31c
(XEN) [<ffff828c8019b953>] shadow_teardown+0x2ae/0x43b
(XEN) [<ffff828c80193e5c>] paging_teardown+0x2d/0x3c
(XEN) [<ffff828c8013c067>] domain_relinquish_resources+0x55/0x1e1
(XEN) [<ffff828c8010635b>] domain_kill+0x77/0x16c
(XEN) [<ffff828c80104805>] do_domctl+0x6d1/0xe6c
(XEN) [<ffff828c801bc1bf>] syscall_enter+0xef/0x149
(XEN) Pagetable walk from ffff8300deeec9c0:
(XEN) L4[0x106] = 00000000df06d027 5555555555555555
(XEN) L3[0x003] = 00000000df071027 5555555555555555
(XEN) L2[0x0f7] = 000000011ffff063 5555555555555555
(XEN) L1[0x0ec] = 00000000deeec262 5555555555555555
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff8300deeec9c0
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

It looks to me, the recent L1 shadow changesets introduced a bug.

Christoph

--
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address): Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register court Dresden: HRA 4896
General partner authorized to represent: AMD Saxony LLC (registered office Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
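For readers decoding the panic above: on x86, bit 0 of the page-fault error code says whether the fault hit a present page, bit 1 whether it was a write, and bit 2 whether it came from user mode. The sketch below applies that to the values in the trace. The helper names (`decode_pf`, `pte_present`) are made up for illustration and are not Xen functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural x86 page-fault error code bits. */
#define PF_PRESENT (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE   (1u << 1) /* 0: read access, 1: write access */
#define PF_USER    (1u << 2) /* 0: supervisor mode, 1: user mode */

/* Hypothetical helper type, for illustration only. */
struct pf_info {
    int present;
    int write;
    int user;
};

static struct pf_info decode_pf(uint32_t error_code)
{
    struct pf_info info;
    info.present = (error_code & PF_PRESENT) != 0;
    info.write   = (error_code & PF_WRITE) != 0;
    info.user    = (error_code & PF_USER) != 0;
    return info;
}

/* Bit 0 of an x86-64 page table entry is the Present flag. */
static int pte_present(uint64_t pte)
{
    return (int)(pte & 1u);
}

static void print_pf(uint32_t error_code)
{
    struct pf_info info = decode_pf(error_code);
    printf("%s, %s access, %s mode\n",
           info.present ? "protection violation" : "page not present",
           info.write ? "write" : "read",
           info.user ? "user" : "supervisor");
}
```

With `error_code=0000` this decodes to a supervisor-mode read of a non-present page. That matches the pagetable walk, where the L1 entry `00000000deeec262` has bit 0 clear, and it is what one would expect if the page at the faulting address had already been freed.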
Hello,

Sorry for the late reply. Sending emails is sometimes more complicated than what you expect.

On Jun 25, 2008, at 3:17 PM, Christoph Egger wrote:
> It looks to me, the recent L1 shadow changesets introduced a bug.

Yes. This bug is triggered when the guest shuts down with pages still OOS (e.g., paging still enabled and recently touched L1s pagetables, or supposedly so). I unfortunately could not reproduce it, but the inline patch should fix this.

Thanks!
Gianluca

Signed-off-by: Gianluca Guida <gianluca.guida@eu.citrix.com>

diff -r 2be3c309e446 xen/arch/x86/mm/shadow/common.c
--- a/xen/arch/x86/mm/shadow/common.c Wed Jun 25 13:39:14 2008 -0400
+++ b/xen/arch/x86/mm/shadow/common.c Wed Jun 25 14:54:34 2008 -0400
@@ -630,6 +630,11 @@
     struct domain *d = v->domain;
 
     perfc_incr(shadow_oos_fixup_remove);
+
+    /* If the domain is dying we might get called when deallocating
+     * the shadows. Fixup tables are already freed so exit now. */
+    if (d->is_dying)
+        return;
 
     idx = mfn_x(gmfn) % SHADOW_OOS_FT_HASH;
     for_each_vcpu(d, v)
@@ -3168,6 +3173,7 @@
         {
             free_xenheap_pages(v->arch.paging.shadow.oos_fixups,
                                SHADOW_OOS_FT_ORDER);
+            v->arch.paging.shadow.oos_fixups = NULL;
         }
 
         {
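The pattern the patch applies can be sketched outside of Xen as follows. This is a simplified model, not the hypervisor code: the structs, the table size `FT_ENTRIES`, and the functions `fixup_remove` and `teardown` are hypothetical stand-ins, and plain `calloc`/`free` replace the xenheap allocator. It only illustrates the two changes: bail out early once the domain is dying, and clear the pointer after freeing so a stale table is never dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define FT_ENTRIES 16 /* made-up table size, not SHADOW_OOS_FT_HASH */

/* Hypothetical, heavily reduced stand-ins for Xen's structures. */
struct domain {
    bool is_dying;
};

struct vcpu {
    struct domain *domain;
    unsigned long *oos_fixups; /* per-vcpu fixup table */
};

/* Returns 1 if the table was touched, 0 if we exited early. */
static int fixup_remove(struct vcpu *v, unsigned long gmfn)
{
    /* During teardown the table may already be freed, so exit now. */
    if (v->domain->is_dying)
        return 0;

    assert(v->oos_fixups != NULL);
    v->oos_fixups[gmfn % FT_ENTRIES] = 0;
    return 1;
}

/* Marks the domain as dying, then releases the table. */
static void teardown(struct vcpu *v)
{
    v->domain->is_dying = true;
    free(v->oos_fixups);
    v->oos_fixups = NULL; /* avoid leaving a dangling pointer behind */
}
```

In this model, a call to `fixup_remove` after `teardown` returns 0 without touching memory, whereas without the `is_dying` check it would write through a freed pointer, which is the kind of access that faulted in `oos_fixup_remove` in the reported trace.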
On Thursday 26 June 2008 12:32:11 Gianluca Guida wrote:
> Hello,
>
> Sorry for the late reply. Sending emails is sometimes more
> complicated than what you expect.
>
> On Jun 25, 2008, at 3:17 PM, Christoph Egger wrote:
> > It looks to me, the recent L1 shadow changesets introduced a bug.
>
> Yes. This bug is triggered when the guest shuts down with pages still
> OOS (e.g., paging still enabled and recently touched L1s pagetables,
> or supposedly so). I unfortunately could not reproduce it, but the
> inline patch should fix this.

I can confirm this patch fixes the crash. Tnx.

Keir: Please apply the patch.

Christoph

> Thanks!
> Gianluca
>
> Signed-off-by: Gianluca Guida <gianluca.guida@eu.citrix.com>
>
> diff -r 2be3c309e446 xen/arch/x86/mm/shadow/common.c
> --- a/xen/arch/x86/mm/shadow/common.c Wed Jun 25 13:39:14 2008 -0400
> +++ b/xen/arch/x86/mm/shadow/common.c Wed Jun 25 14:54:34 2008 -0400
> @@ -630,6 +630,11 @@
>      struct domain *d = v->domain;
>
>      perfc_incr(shadow_oos_fixup_remove);
> +
> +    /* If the domain is dying we might get called when deallocating
> +     * the shadows. Fixup tables are already freed so exit now. */
> +    if (d->is_dying)
> +        return;
>
>      idx = mfn_x(gmfn) % SHADOW_OOS_FT_HASH;
>      for_each_vcpu(d, v)
> @@ -3168,6 +3173,7 @@
>          {
>              free_xenheap_pages(v->arch.paging.shadow.oos_fixups,
>                                 SHADOW_OOS_FT_ORDER);
> +            v->arch.paging.shadow.oos_fixups = NULL;
>          }
>
>          {