Is this familiar to anybody? I can reproduce this during live migration
and heavy disk usage in both 3.1 and 3.2 on 64-bit.

regards
john

Xen panic[dom=0xffff8300e2e86100/vcpu=0xffff8300e2edc100]: FATAL PAGE FAULT
[error_code=0000]
Faulting linear address: ffff81c0ffc07c48

rdi: ffff828c801efdf0   rsi: ffff8300e2ef7a18   rdx: ffff8300e2ef7a48
rcx:                8   r8:         800000000   r9:  ffff8300e3de8100
rax: ffff8300e2ef7b08   rbx: ffff8300e2edc100   rbp: ffff8300e2ef7af8
r10: ffff828c80203468   r11: ffff8300e2ef7a88   r12:              282
r13:       3000000008   r14: ffff828c8013577a   r15:        3e2ef79a8
fsb:                0   gsb: ffffff00ba9dc580   ds:                4b
es:                4b   fs:                 0   gs:               1c3
cs:              e008   rfl:              282   rsp: ffff8300e2ef7a00
rip: ffff828c8015b2eb   ss:                 0
cr0:         8005003b   cr2: ffff81c0ffc07c48   cr3:        1cc4c1000
cr4:              6f0

Xen panic[dom=0xffff8300e2e86100/vcpu=0xffff8300e2edc100]: FATAL PAGE FAULT
[error_code=0000]
Faulting linear address: ffff81c0ffc07c48

ffff8300e2ef7af8 xpv:do_page_fault+13d
ffff8300e2ef7b38 xpv:handle_exception+4b
ffff8300e2ef7b68 0xffff8300e2e86100 (in Xen)
ffff8300e2ef7c58 xpv:sh_page_fault__shadow_4_guest_4+598
ffff8300e2ef7e58 xpv:paging_fault+3c
ffff8300e2ef7e88 xpv:fixup_page_fault+22b
ffff8300e2ef7ed8 xpv:do_page_fault+40
ffff8300e2ef7f18 xpv:handle_exception+4b

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
> Is this familiar to anybody? I can reproduce this during live migration
> and heavy disk usage in both 3.1 and 3.2 on 64-bit.

What guest OS? How many VCPUs?

Ian

> regards
> john
>
> Xen panic[dom=0xffff8300e2e86100/vcpu=0xffff8300e2edc100]: FATAL PAGE FAULT
> [error_code=0000]
> Faulting linear address: ffff81c0ffc07c48
> [...]
On Wed, Jan 16, 2008 at 01:47:00AM -0000, Ian Pratt wrote:
> > Is this familiar to anybody? I can reproduce this during live migration
> > and heavy disk usage in both 3.1 and 3.2 on 64-bit.
>
> What guest OS? How many VCPUs?

Solaris domU + dom0. Both have 4 VCPUs. I can test a Linux domU and try
to reproduce: I'll do that tomorrow.

regards
john
On Wed, Jan 16, 2008 at 01:47:00AM -0000, Ian Pratt wrote:
> > Is this familiar to anybody? I can reproduce this during live migration
> > and heavy disk usage in both 3.1 and 3.2 on 64-bit.
>
> What guest OS? How many VCPUs?

I can reproduce with:

    Solaris domU with > 1 VCPU

I can't reproduce with:

    Solaris domU with 1 VCPU
    Linux domU with 4 VCPUs

Seems like there's some unusual race with SMP Solaris domUs.

regards
john
On Wed, Jan 16, 2008 at 01:18:53AM +0000, John Levon wrote:
> ffff8300e2ef7c58 xpv:sh_page_fault__shadow_4_guest_4+598

Looking at what I can of the disasm, this looks like we're here:

2817         /* Make sure there is enough free shadow memory to build a chain of
2818          * shadow tables: one SHADOW_MAX_ORDER chunk will always be enough
2819          * to allocate all we need.  (We never allocate a top-level shadow
2820          * on this path, only a 32b l1, pae l2+1 or 64b l3+2+1) */
2821         shadow_prealloc(d, SHADOW_MAX_ORDER);
2822
2823         /* Acquire the shadow.  This must happen before we figure out the rights
2824          * for the shadow entry, since we might promote a page here. */
2825         ptr_sl1e = shadow_get_and_create_l1e(v, &gw, &sl1mfn, ft);
>----<

So we're taking a fault somewhere in shadow_get_and_create_l1e().
Unfortunately the exact point doesn't look easy to find, since the
stack trace makes no sense:

ffff8300e2ef7b38 xpv`do_page_fault+0x13d(ffff8300e2ef7b48)
ffff8300e2ef7b68 0xffff828c801d354b()
ffff8300e2ef7c58 0xffff8300e2e86100()
ffff8300e2ef7e58 xpv`sh_page_fault__shadow_4_guest_4+0x598()

Looking through the stack by hand, I do see:

> ffff828c8014e5f2=p
xpv`guest_get_eff_l1e+0xb9

but of course this might just be stack junk.

regards
john
If you have a debug build of Xen then the backtrace should be
trustworthy. Are there addresses in the backtrace that don't look to be
within Xen text? Your backtraces don't appear to be in the usual Xen
format, so I'm not entirely sure what I'm looking at.

 -- Keir

On 16/1/08 20:37, "John Levon" <levon@movementarian.org> wrote:

> On Wed, Jan 16, 2008 at 01:18:53AM +0000, John Levon wrote:
>
>> ffff8300e2ef7c58 xpv:sh_page_fault__shadow_4_guest_4+598
>
> Looking at what I can of the disasm, this looks like we're here:
> [...]
On Wed, Jan 16, 2008 at 09:43:41PM +0000, Keir Fraser wrote:
> If you have a debug build of Xen then the backtrace should be trustworthy.
> Are there addresses in the backtrace that don't look to be within Xen text?
> Your backtraces don't appear to be in the usual Xen format, so I'm not
> entirely sure what I'm looking at.

I'll try turning off our panic support to see if the Xen-reported stack
is any better.

regards
john
On Wed, Jan 16, 2008 at 09:43:41PM +0000, Keir Fraser wrote:
> If you have a debug build of Xen then the backtrace should be trustworthy.
> Are there addresses in the backtrace that don't look to be within Xen text?

Here's what I got without the panic patch (sigh):

(XEN) sh error: sh_page_fault__shadow_4_guest_4(): Recursive shadow fault: lock was taken by sh_page_fault__shadow_4_guest_4
(XEN) ----[ Xen-3.1.2  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff828c80168822>] shadow_set_l1e+0x32/0x1b0
(XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 800000012521c067   rcx: 0000000000132b04
(XEN) rdx: 800000012521c067   rsi: 00000000000000b1   rdi: ffff8300e2ed2080
(XEN) rbp: ffff8300e2e0fc08   rsp: ffff8300e2e0fbc8   r8:  0000000000000006
(XEN) r9:  0000000000000006   r10: 0000000132b05118   r11: 0000000132b07ff0
(XEN) r12: ffff8300e2ed2080   r13: 00000000000000b1   r14: 0000000000132b04
(XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 0000000132b07000   cr2: 00000000000000b1
(XEN) ds: 004b   es: 004b   fs: 0000   gs: 01c3   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff8300e2e0fbc8:
(XEN)    ffff8300e2e0fc08 0000000080167b02 800000012521c067
(XEN)    ffff8300e2e0ff28 ffff8300e2ed2080 ffff8300e2fb6080
(XEN)    ffff828c80221b20 0000000000000001 ffff8300e2e0fe18
(XEN)    ffff828c8016af61 0000000000000000 ffffff0003fd3ac0
(XEN)    0000000000000000 ffff828c80221b20 ffff8300e2e0fd98
(XEN)    ffff828c8010d757 ffff830184aeb000 0000000000000008
(XEN)    ffff8300e2fb6080 ffff828c801c52b8 0000000000132b07
(XEN)    ffff81c0ffc00118 00000000000000b1 ffff8300e2e0fcf0
(XEN)    0000000000000008 000000000012521c 00000006e2e0fd68
(XEN)    ffff8300e2e0fe78 ffffff000475d848 ffff8300e2e06080
(XEN)    ffff8300e2e0fcf8 800000012521c067 0000000132b04067
(XEN)    0000000132b05067 0000000132b06067 0000000000132b06
(XEN)    0000000000132b05 0000000000132b04 ffff8300e2e0fd18
(XEN)    0000000000000082 0000000000003000 ffff8300e2e06080
(XEN)    ffff8300e2e0fd28 ffff828c801355b2 ffff8300e2e0fe78
(XEN)    ffff828c801288db ffff828c801c8100 0000005878a902d7
(XEN)    ffff8300e2ed2080 ffff8300e2e06080 0000000000000086
(XEN)    0000000000003000 ffff8300e2e0ff28 ffff8300e2e06080
(XEN)    ffff8300e2ed2080 ffff8300e2e0fdc0 ffff828c801252d7
(XEN)    820000000000efff ffffff000475d848 ffff8140a0502ff0
(XEN) Xen call trace:
(XEN)    [<ffff828c80168822>] shadow_set_l1e+0x32/0x1b0
(XEN)    [<ffff828c8016af61>] sh_page_fault__shadow_4_guest_4+0xb61/0x10b0
(XEN)    [<ffff828c8013b1c2>] do_page_fault+0x1f2/0x500
(XEN)    [<ffff828c8017a495>] handle_exception_saved+0x2d/0x6b
(XEN)
(XEN) Pagetable walk from 00000000000000b1:
(XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 00000000000000b1
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
Well that's a lot saner even without being a debug build. Possibly Tim
has some insight into how this can happen... I expect the 'recursive
shadow fault' is simply a result of the fault in shadow_set_l1e()
causing an unexpected re-entry into shadow code.

It'd be interesting to know which invocation of shadow_set_l1e() is on
the backtrace. That might be easier to work out if you can repro the
crash with a debug build of Xen. Alternatively, since there is
obviously a very bogus nearly-NULL pointer involved, perhaps you could
add some tracing to pick up on that? Possibly the sl1e argument to
shadow_set_l1e() is the thing that is bogus here.

 -- Keir

On 16/1/08 22:37, "John Levon" <levon@movementarian.org> wrote:

> On Wed, Jan 16, 2008 at 09:43:41PM +0000, Keir Fraser wrote:
>
>> If you have a debug build of Xen then the backtrace should be trustworthy.
>> Are there addresses in the backtrace that don't look to be within Xen text?
>
> Here's what I got without the panic patch (sigh):
>
> (XEN) sh error: sh_page_fault__shadow_4_guest_4(): Recursive shadow fault:
> lock was taken by sh_page_fault__shadow_4_guest_4
> (XEN) ----[ Xen-3.1.2  x86_64  debug=n  Not tainted ]----
> [...]
On Wed, Jan 16, 2008 at 11:01:21PM +0000, Keir Fraser wrote:
> Well that's a lot saner even without being a debug build. Possibly Tim has

I totally missed that I didn't have a debug build. My little script to
build Xen itself was broken, and wasn't setting 'debug=y'. This also
explains the bad Solaris panic stack (no frame pointer, and we expect
it). There was also another problem[1].

I'll reproduce in debug mode, and look some more at stuff around
shadow_set_l1e() as you suggest, and get back to you soon.

cheers
john

[1] because we don't want to get stuck, the Solaris panic path doesn't
take any locks, and that means no console_start_sync(), so we were
leaving the printk serial buffer unprinted.
On Wed, Jan 16, 2008 at 11:01:21PM +0000, Keir Fraser wrote:
> > (XEN) sh error: sh_page_fault__shadow_4_guest_4(): Recursive shadow fault:
> > lock was taken by sh_page_fault__shadow_4_guest_4
> > (XEN) ----[ Xen-3.1.2  x86_64  debug=n  Not tainted ]----

This one might be bogus... noticed a problem. I'll report back with the
proper panic shortly.

regards
john
On Wed, Jan 16, 2008 at 11:01:21PM +0000, Keir Fraser wrote:
> Well that's a lot saner even without being a debug build. Possibly Tim has

Right, I added something to dig out the pending serial console buffer:

> ::serlog
(XEN) ----[ Xen-3.1.2-xvm  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff828c801b4848>] shadow_get_and_create_l1e+0x47/0x32f
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) rax: ffff81c0ffc07c48   rbx: ffff8300e2e86100   rcx: 0000000000000000
(XEN) rdx: ffff8300e2ef7dd8   rsi: ffff8300e2ef7ad0   rdi: 00000000001d2cab
(XEN) rbp: ffff8300e2ef7c58   rsp: ffff8300e2ef7bf8   r8:  0000000000000006
(XEN) r9:  0000000000000006   r10: ffff8300e2edc100   r11: ffffff00f683e540
(XEN) r12: ffffff00b96fa858   r13: ffffff00b96f8050   r14: ffffff01f12d1e20
(XEN) r15: ffffff0105d8b980   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 00000001cc4c1000   cr2: ffff81c0ffc07c48
(XEN) ds: 004b   es: 004b   fs: 0000   gs: 01c3   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff8300e2ef7bf8:
(XEN)    ffff828404e45778 0000000000000000 00000000001f4efc
(XEN)    0000000000000050 0000000000000002 ffff81c0ffc07c48
(XEN)    00000000001d2cab ffff8300e2e87280 00000006e2e20100
(XEN)    ffff8300e2ef7da8 ffff8300e2ef7dd8 ffff8300e2edc100
(XEN)    ffff8300e2ef7e58 ffff828c801b68e8 0000000000000000
(XEN)    ffff8300e2e2ac38 ffff8301bc91ffc0 00000000001bc91f
(XEN)    0000000000071916 ffff8300e2ef7d98 0000000000000006
(XEN)    0000000100000006 ffff8300e2ef7d18 ffff828c8014e1fa
(XEN)    0000000498000004 ffff830000000008 000000040076ddc8
(XEN)    00000006a0000005 0000000100000000 0000000100000008
(XEN)    0000000098000004 ffff8300e2ef7f28 ffff8300e2ef7d38
(XEN)    ffff8300e2ef7f28 ffff8300e2ef7d48 0000000000000086
(XEN)    ffff828c80296838 000000038023d2a0 ffff8300e2ef7d48
(XEN)    ffff828c8015c5ef ffff828c80296838 ffff8300e2ef7f28
(XEN)    ffff8300e2ef7d68 ffff828c8015c5b9 00000002e2ef7d78
(XEN)    ffff828c8027c780 00007cff1d108267 ffff828c801464e1
(XEN)    ffffff00b5428008 0000000000000000 0000000000000008
(XEN)    ffff817f80f89690 ffff8300e2ef7e60 000000088027c780
(XEN)    0000000000000000 ffff8300e2ef7e58 0000000000094720
(XEN)    ffff828c8014e5f2 0000000000094720 0000002700000002
(XEN) Xen call trace:
(XEN)    [<ffff828c801b4848>] shadow_get_and_create_l1e+0x47/0x32f
(XEN)    [<ffff828c801b68e8>] sh_page_fault__shadow_4_guest_4+0x598/0xb9e
(XEN)    [<ffff828c80162fff>] paging_fault+0x3c/0x3e
(XEN)    [<ffff828c80162fa9>] fixup_page_fault+0x22b/0x245
(XEN)    [<ffff828c80163041>] do_page_fault+0x40/0x15c
(XEN)
(XEN) Pagetable walk from ffff81c0ffc07c48:
(XEN)  L4[0x103] = 00000001cc4c1063 0000000000015d86
(XEN)  L3[0x103] = 00000001cc4c1063 0000000000015d86
(XEN)  L2[0x1fe] = 00000001a8d36067 0000000000015b61
(XEN)  L1[0x007] = 0000000000000000 ffffffffffffffff

> shadow_get_and_create_l1e+0x47::dis
xpv`shadow_get_and_create_l1e+0x1a:  leaq   -0x30(%rbp),%rdx
xpv`shadow_get_and_create_l1e+0x1e:  movq   -0x10(%rbp),%rsi
xpv`shadow_get_and_create_l1e+0x22:  movq   -0x8(%rbp),%rdi
xpv`shadow_get_and_create_l1e+0x26:  call   -0x107   <xpv`shadow_get_and_create_l2e>
xpv`shadow_get_and_create_l1e+0x2b:  movq   %rax,-0x38(%rbp)
xpv`shadow_get_and_create_l1e+0x2f:  cmpq   $0x0,-0x38(%rbp)
xpv`shadow_get_and_create_l1e+0x34:  jne    +0xd     <xpv`shadow_get_and_create_l1e+0x43>
xpv`shadow_get_and_create_l1e+0x36:  movq   $0x0,-0x58(%rbp)
xpv`shadow_get_and_create_l1e+0x3e:  jmp    +0x2df   <xpv`shadow_get_and_create_l1e+0x322>
xpv`shadow_get_and_create_l1e+0x43:  movq   -0x38(%rbp),%rax
xpv`shadow_get_and_create_l1e+0x47:  movq   (%rax),%rdi
xpv`shadow_get_and_create_l1e+0x4a:  call   -0x8f3   <xpv`shadow_l2e_get_flags>
xpv`shadow_get_and_create_l1e+0x4f:  andl   $0x1,%eax
xpv`shadow_get_and_create_l1e+0x52:  testl  %eax,%eax
xpv`shadow_get_and_create_l1e+0x54:  je     +0x96    <xpv`shadow_get_and_create_l1e+0xf0>
xpv`shadow_get_and_create_l1e+0x5a:  cmpl   $0x6,-0x1c(%rbp)
xpv`shadow_get_and_create_l1e+0x5e:  jne    +0x3d    <xpv`shadow_get_and_create_l1e+0x9d>
xpv`shadow_get_and_create_l1e+0x60:  movq   -0x10(%rbp),%rax
xpv`shadow_get_and_create_l1e+0x64:  movq   0x8(%rax),%rax
xpv`shadow_get_and_create_l1e+0x68:  movl   (%rax),%edi
xpv`shadow_get_and_create_l1e+0x6a:  call   -0x1f1b  <xpv`guest_l2e_get_flags>

(I'm back on 3.1 bits here)

1894         sl2e = shadow_get_and_create_l2e(v, gw, &sl2mfn, ft);
1895         if ( sl2e == NULL ) return NULL;
1896         /* Install the sl1 in the l2e if it wasn't there or if we need to
1897          * re-do it to fix a PSE dirty bit. */
1898         if ( shadow_l2e_get_flags(*sl2e) & _PAGE_PRESENT

So sl2e is non-zero, but bogus:

> ffff81c0ffc07c48::dump
                   0 1 2 3  4 5 6 7  \/ 9 a b  c d e f  01234567v9abcdef
mdb: failed to read data at 0xffff81c0ffc07c48: no mapping for address

This pointer is a constant though (right?)

regards
john
On 17/1/08 02:42, "John Levon" <levon@movementarian.org> wrote:

> On Wed, Jan 16, 2008 at 11:01:21PM +0000, Keir Fraser wrote:
>
> (I'm back on 3.1 bits here)
>
> 1894         sl2e = shadow_get_and_create_l2e(v, gw, &sl2mfn, ft);
> 1895         if ( sl2e == NULL ) return NULL;
> 1896         /* Install the sl1 in the l2e if it wasn't there or if we need to
> 1897          * re-do it to fix a PSE dirty bit. */
> 1898         if ( shadow_l2e_get_flags(*sl2e) & _PAGE_PRESENT
>
> So sl2e is non-zero, but bogus:
>
>> ffff81c0ffc07c48::dump
> 0 1 2 3  4 5 6 7  \/ 9 a b  c d e f  01234567v9abcdef
> mdb: failed to read data at 0xffff81c0ffc07c48: no mapping for address
>
> This pointer is a constant though (right?)

What do you mean by 'a constant'? It's a pointer into the guest linear
pagetable, which I suppose is what we expect, and for some reason there
is no PTE at that location to be read. Clearly a higher-level page
directory is missing. Possibly shadow code has got confused and thought
a page directory was present when it wasn't, or perhaps the page
directory went away (and/or was in the process of disappearing from
TLBs) as the shadow fault handler went about its business. I'm sure Tim
will have some insights. :-)

 -- Keir
On 17/1/08 02:42, "John Levon" <levon@movementarian.org> wrote:

> Right, I added something to dig out the pending serial console buffer:
>
>> ::serlog
> (XEN) ----[ Xen-3.1.2-xvm  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    3
> (XEN) RIP:    e008:[<ffff828c801b4848>] shadow_get_and_create_l1e+0x47/0x32f

Oh, also this debug build backtrace is very different from the
non-debug one. Do you think the non-debug backtrace was totally bogus,
or are we looking at a common-mode fault that can have both symptoms
(i.e. an almost-NULL pointer in shadow_set_l1e() *and* a bogus linear
pagetable pointer in shadow_get_and_create_l1e())?

 -- Keir
At 23:01 +0000 on 16 Jan (1200524481), Keir Fraser wrote:
> Well that's a lot saner even without being a debug build. Possibly Tim has
> some insight into how this can happen... I expect the 'recursive shadow
> fault' is simply a result of the fault in shadow_set_l1e() causing an
> unexpected re-entry into shadow code.

Yep, that's exactly it. It's there to stop an unexpected #PF in the
shadow code itself from being even more confusing by having the shadow
code try to handle it and then fail in some other, weirder, way later.
Instead, we bail out and let the normal fatal-page-fault handler take
over.

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
At 08:10 +0000 on 17 Jan (1200557415), Keir Fraser wrote:
> What do you mean by 'a constant'? It's a pointer into the guest linear
> pagetable, which I suppose is what we expect, and for some reason there is
> no PTE at that location to be read. Clearly a higher-level page directory is
> missing. Possibly shadow code has got confused and thought a page directory
> was present when it wasn't, or perhaps the page directory went away (and/or
> was in the process of disappearing from TLBs) as the shadow fault handler
> went about its business. I'm sure Tim will have some insights. :-)

Hmm. Yes, it's a pointer into the (shadow) linear PT, and we've just
checked that it's valid or made it so. Code inspection has led to a lot
of dead ends so far; can you try the attached patch?

Cheers,

Tim.
On Thu, Jan 17, 2008 at 08:11:56AM +0000, Keir Fraser wrote:
> Oh, also this debug build backtrace is very different from the non-debug
> one. Do you think the non-debug backtrace was totally bogus, or are we

It was totally bogus, I think (my apologies).

regards
john
On Thu, Jan 17, 2008 at 10:53:12AM +0000, Tim Deegan wrote:
> Hmm. Yes, it's a pointer into the (shadow) linear PT, and we've just
> checked that it's valid or made it so. Code inspection has led to a
> lot of dead ends so far; can you try the attached patch?

I haven't reproduced the same panic yet, but I did get the one below
instead. I'm still trying to get it to go down the path where you added
the debugging. This one looks pretty similar though.

regards
john

(XEN) ----[ Xen-3.1.2-xvm  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff828c801b26ba>] shadow_set_l1e+0x4f/0x14c
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) rax: ffff81ff80d64ff0   rbx: ffff8300e2ed8100   rcx: 000000000015c505
(XEN) rdx: 00000001c96ef065   rsi: ffff81ff80d64ff0   rdi: ffff8300e2e56100
(XEN) rbp: ffff828c80267c58   rsp: ffff828c80267c08   r8:  0000000000000002
(XEN) r9:  0000000000000002   r10: ffff8300e2e56100   r11: ffffff015cdaa808
(XEN) r12: ffffff01ac9fe3c0   r13: ffffff02bdbb1648   r14: ffffff02bdbb1540
(XEN) r15: ffffff014e4db008   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 00000001cb03f000   cr2: ffff81ff80d64ff0
(XEN) ds: 004b   es: 004b   fs: 0000   gs: 01c3   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff828c80267c08:
(XEN)    00000000001686df ffff828c80267c18 ffff8301686dfff0
(XEN)    ffff8300e2e56100 ffff8300e2ed8100 0000000080267d98
(XEN)    000000000015c505 00000001c96ef065 ffff81ff80d64ff0
(XEN)    ffff8300e2e56100 ffff828c80267e58 ffff828c801b5efe
(XEN)    0000000800000000 0000000000000004 ffff8301686dfff0
(XEN)    00000000001686df 00000000001c96ef ffff828c80267d98
(XEN)    0000000000000002 0000000100000002 ffff828c80144306
(XEN)    ffff828c8023d2b8 0000027b0000027a ffff828c80267cf0
(XEN)    0000000000000082 00000002e2e56248 0000000100000000
(XEN)    ffff8300e2e02248 00000000cb03f000 ffff828c80267d10
(XEN)    ffff828c80141ad3 ffff8300e2e02248 00000000e2e56100
(XEN)    ffff828c80267db0 ffff828c80141a95 820000060000efff
(XEN)    000000000000ffff ffff828c80267d70 ffff828c8023d2a0
(XEN)    ffff8300e2e56488 00000000000000a8 ffff828c80267f28
(XEN)    0000000000000000 0000002000000000 0000002000000020
(XEN)    ffff828c80267e40 0000000180142209 0000000000000000
(XEN)    0000000000000008 ffff81ff80d64ff0 00000001c96ef065
(XEN)    000000088013c486 000000000015c505 ffff828c80267e58
(XEN)    00000000001c96ef ffff828c8014e5e2 00000000001c96ef
(XEN)    0000002780267e00 ffffff01ac9fe3c8 ffff8140a0502ff0
(XEN) Xen call trace:
(XEN)    [<ffff828c801b26ba>] shadow_set_l1e+0x4f/0x14c
(XEN)    [<ffff828c801b5efe>] sh_page_fault__shadow_4_guest_4+0x6fe/0xb9e
(XEN)    [<ffff828c8016234f>] paging_fault+0x3c/0x3e
(XEN)    [<ffff828c801622f9>] fixup_page_fault+0x22b/0x245
(XEN)    [<ffff828c80162391>] do_page_fault+0x40/0x15c
(XEN)
(XEN) Pagetable walk from ffff81ff80d64ff0:
(XEN)  L4[0x103] = 00000001cb03f063 000000000000063b
(XEN)  L3[0x1fe] = 00000001d358e067 00000000000005d4
(XEN)  L2[0x006] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff81ff80d64ff0
(XEN) ****************************************
At 22:25 +0000 on 17 Jan (1200608703), John Levon wrote:
> (XEN) Xen call trace:
> (XEN)    [<ffff828c801b26ba>] shadow_set_l1e+0x4f/0x14c
> (XEN)    [<ffff828c801b5efe>] sh_page_fault__shadow_4_guest_4+0x6fe/0xb9e
> (XEN)    [<ffff828c8016234f>] paging_fault+0x3c/0x3e
> (XEN)    [<ffff828c801622f9>] fixup_page_fault+0x22b/0x245
> (XEN)    [<ffff828c80162391>] do_page_fault+0x40/0x15c
> (XEN)
> (XEN) Pagetable walk from ffff81ff80d64ff0:
> (XEN)  L4[0x103] = 00000001cb03f063 000000000000063b
> (XEN)  L3[0x1fe] = 00000001d358e067 00000000000005d4
> (XEN)  L2[0x006] = 0000000000000000 ffffffffffffffff

Hmmm. This is the same error, one function further down the chain,
which tells us something interesting. In this case, the shadow l3e has
been written into the l3 linear map and then used successfully via the
l2 linear map to write the l2e, but is now missing when we come to the
l1 linear map. Bizarre.

Either something has changed the sl4e or sl3e under our feet (surely
not - we have the shadow lock), or it could still be a missing TLB
flush. If we changed the sl4e (from one present entry to another) but
didn't flush the TLB, it could cause this.

Tim.
On Fri, Jan 18, 2008 at 09:41:05AM +0000, Tim Deegan wrote:
> Either something has changed the sl4e or sl3e under our feet (surely not
> - we have the shadow lock), or it could still be a missing TLB flush.
> If we changed the sl4e (from one present entry to another) but didn't
> flush the TLB, it could cause this.

Here's another one. This time, I was running with
SHADOW_OPTIMIZATIONS == 0.

(XEN) Xen call trace:
(XEN)    [<ffff828c8018809b>] shadow_set_l2e+0x41/0x40c
(XEN)    [<ffff828c80189520>] shadow_get_and_create_l1e+0x2b3/0x344
(XEN)    [<ffff828c8018b610>] sh_page_fault__shadow_4_guest_4+0x5a7/0xba3
(XEN)    [<ffff828c8014aad0>] fixup_page_fault+0x1e0/0x1f2
(XEN)    [<ffff828c8014ab8a>] do_page_fault+0xa8/0x186
(XEN)
(XEN) Pagetable walk from ffff81c0ffc09348:
(XEN)  L4[0x103] = 00000001ca552063 00000000000004b9
(XEN)  L3[0x103] = 00000001ca552063 00000000000004b9
(XEN)  L2[0x1fe] = 00000001ca1e2067 00000000000007cd

regards
john
At 09:41 +0000 on 18 Jan (1200649265), Tim Deegan wrote:
> Either something has changed the sl4e or sl3e under our feet (surely not
> - we have the shadow lock), or it could still be a missing TLB flush.
> If we changed the sl4e (from one present entry to another) but didn't
> flush the TLB it could cause this.

So: another patch for you; can you see if this makes the crashes go away?

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
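[Editorial note: for readers following the hypothesis in the quoted paragraph, the suspected race can be modelled in a few lines of C. This is a toy sketch with invented names (set_sl4e, walk_sl4e, tlb_flush) -- it is not Xen code and not Tim's patch -- showing how rewriting a present entry without a flush leaves a later walk using the stale cached translation.]

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the hypothesized bug: the "TLB" caches the shadow L4
 * entry on the first walk; changing the entry without a flush leaves
 * later walks using the stale value.  Invented names -- not Xen code. */
static uint64_t sl4e;          /* in-memory shadow L4 entry */
static uint64_t tlb_cached;    /* what the TLB captured on the last walk */
static int      tlb_valid;

void set_sl4e(uint64_t e) { sl4e = e; }      /* write the entry, no flush */
void tlb_flush(void)      { tlb_valid = 0; } /* the possibly-missing step */

uint64_t walk_sl4e(void)       /* a walk uses the cached copy if valid */
{
    if (!tlb_valid) { tlb_cached = sl4e; tlb_valid = 1; }
    return tlb_cached;
}
```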
On Fri, Jan 18, 2008 at 04:53:24PM +0000, Tim Deegan wrote:
> At 09:41 +0000 on 18 Jan (1200649265), Tim Deegan wrote:
> > Either something has changed the sl4e or sl3e under our feet (surely not
> > - we have the shadow lock), or it could still be a missing TLB flush.
> > If we changed the sl4e (from one present entry to another) but didn't
> > flush the TLB it could cause this.
>
> So: another patch for you; can you see if this makes the crashes go away?

I'm afraid not:

ffff828c80267a88 xpv:do_page_fault+13d
ffff828c80267ac8 xpv:handle_exception+4b
ffff828c80267af8 0xffff8300e2e44100 (in Xen)
ffff828c80267be8 xpv:shadow_get_and_create_l1e+26b
ffff828c80267c58 xpv:sh_page_fault__shadow_4_guest_4+598
ffff828c80267e58 xpv:paging_fault+3c
ffff828c80267e88 xpv:fixup_page_fault+22b
ffff828c80267ed8 xpv:do_page_fault+40
ffff828c80267f18 xpv:handle_exception+4b
ffff828c80267f48 eb7c54b8

regards
john
At 16:55 +0000 on 20 Jan (1200848136), John Levon wrote:
> On Fri, Jan 18, 2008 at 04:53:24PM +0000, Tim Deegan wrote:
> > So: another patch for you; can you see if this makes the crashes go away?
>
> I'm afraid not:

Argh.  Well, here's more debugging, since you seem to hit the _l1e case
more often.  This patch includes the previous two as well.

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
On Tue, Jan 22, 2008 at 09:45:41AM +0000, Tim Deegan wrote:
> Argh.  Well, here's more debugging, since you seem to hit the _l1e case
> more often.  This patch includes the previous two as well.

See below. I also saw the "Can't see the l1e" version as well.

cheers
john

(XEN) sh error: shadow_get_and_create_l1e(): Can't see the l2e, even with TLB flush
Pagetable walk from ffff81c0ffc06928:
(XEN)  L4[0x103] = 00000001d2f4d063 000000000007dd4e
(XEN)  L3[0x103] = 00000001d2f4d063 000000000007dd4e
(XEN)  L2[0x1fe] = 00000001f73ca067 000000000007dc91
(XEN)  L1[0x006] = 0000000000000000 ffffffffffffffff
(XEN) Pagetable walk from ffffff01a4a6e8f0:
(XEN)  L4[0x1fe] = 00000001f73ca067 000000000007dc91
(XEN)  L3[0x006] = 0000000000000000 ffffffffffffffff
(XEN) v->arch.shadow_table[0] == 0x1d2f4d
(XEN) CR3 = 0x1d2f4d000
(XEN) Xen WARN at multi.c:1910
(XEN) ----[ Xen-3.1.2-xvm  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff828c801b3ca0>] shadow_get_and_create_l1e+0x147/0x46c
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) rax: ffff828c802035e4   rbx: ffff8300e3daa100   rcx: 0000000000000008
(XEN) rdx: ffff828c8027dbf2   rsi: 000000000000000a   rdi: ffff828c802035e4
(XEN) rbp: ffff8300e2e0fc38   rsp: ffff8300e2e0fb98   r8:  00000000ffffffff
(XEN) r9:  00000000ffffffff   r10: ffff828c8027dfdf   r11: ffff828c8027dbe6
(XEN) r12: ffffff01a4a6e8b8   r13: ffffff01a48594c0   r14: ffffff01a4859480
(XEN) r15: ffffff0146e28608   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 00000001d2f4d000   cr2: ffff81c0ffc06928
(XEN) ds: 004b   es: 004b   fs: 0000   gs: 01c3   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff8300e2e0fb98:
(XEN)    00000020e2e0fbc0 ffff8300e2e0fbc0 ffff8300e2e0fbc8
(XEN)    ffff828c8015ba1d ffffffffffffffff 0000000000000000
(XEN)    ffff8300e2e0fc38 ffff8300e2e0fbf8 0000000000000008
(XEN)    ffff81c0ffc06928 0000000000000008 0000000000000008
(XEN)    0000000000000000 ffff81c0ffc06928 000000000015d83e
(XEN)    ffff8300e3dab280 00000006e2eda100 ffff8300e2e0fda8
(XEN)    ffff8300e2e0fdd8 ffff8300e2eca100 ffff8300e2e0fe58
(XEN)    ffff828c801b5da0 000000fc00000000 0000000800000002
(XEN)    0000000000000044 ffff8301c1e7cab8 00000000001c1e7c
(XEN)    00000000001c60c1 ffff8300e2e0fd98 0000000000000006
(XEN)    0000000100000006 000000008015b93f 00000001c60c1065
(XEN)    0000000000000000 ffff8300e2e0fc98 ffff81ff80a5bab8
(XEN)    0000000000000008 0000000000000000 ffff8300e2e0fd20
(XEN)    00000006e2e0fd20 0000000100000000 ffff8300e2e0fd20
(XEN)    ffff8300e2e0fd08 ffff828c8015b5f9 000000208021b300
(XEN)    0000000000000000 0000000000000004 ffff8300e2e0fe90
(XEN)    ffffff000414e4d0 0000000400000020 ffff8300e2e0fe8c
(XEN)    ffffff000414e4cc ffff8300e2e0fd88 ffff828c801668a3
(XEN)    ffff8300e2e0fd68 0000000000000000 0000000000000004
(XEN)    ffff8300e2e0fe8c ffffff000414e4cc 000000008023f4c0
(XEN) Xen call trace:
(XEN)    [<ffff828c801b3ca0>] shadow_get_and_create_l1e+0x147/0x46c
(XEN)    [<ffff828c801b5da0>] sh_page_fault__shadow_4_guest_4+0x598/0xce7
(XEN)    [<ffff828c8016234f>] paging_fault+0x3c/0x3e
(XEN)    [<ffff828c801622f9>] fixup_page_fault+0x22b/0x245
(XEN)    [<ffff828c80162391>] do_page_fault+0x40/0x15c
(XEN)
(XEN) ----[ Xen-3.1.2-xvm  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff828c801b3cb3>] shadow_get_and_create_l1e+0x15a/0x46c
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) rax: ffff81c0ffc06928   rbx: ffff8300e3daa100   rcx: 0000000000000008
(XEN) rdx: ffff828c8027dbf2   rsi: 000000000000000a   rdi: ffff828c802035e4
(XEN) rbp: ffff8300e2e0fc38   rsp: ffff8300e2e0fb98   r8:  00000000ffffffff
(XEN) r9:  00000000ffffffff   r10: ffff828c8027dfdf   r11: ffff828c8027dbe6
(XEN) r12: ffffff01a4a6e8b8   r13: ffffff01a48594c0   r14: ffffff01a4859480
(XEN) r15: ffffff0146e28608   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 00000001d2f4d000   cr2: ffff81c0ffc06928
(XEN) ds: 004b   es: 004b   fs: 0000   gs: 01c3   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff8300e2e0fb98:
(XEN)    00000020e2e0fbc0 ffff8300e2e0fbc0 ffff8300e2e0fbc8
(XEN)    ffff828c8015ba1d ffffffffffffffff 0000000000000000
(XEN)    ffff8300e2e0fc38 ffff8300e2e0fbf8 0000000000000008
(XEN)    ffff81c0ffc06928 0000000000000008 0000000000000008
(XEN)    0000000000000000 ffff81c0ffc06928 000000000015d83e
(XEN)    ffff8300e3dab280 00000006e2eda100 ffff8300e2e0fda8
(XEN)    ffff8300e2e0fdd8 ffff8300e2eca100 ffff8300e2e0fe58
(XEN)    ffff828c801b5da0 000000fc00000000 0000000800000002
(XEN)    0000000000000044 ffff8301c1e7cab8 00000000001c1e7c
(XEN)    00000000001c60c1 ffff8300e2e0fd98 0000000000000006
(XEN)    0000000100000006 000000008015b93f 00000001c60c1065
(XEN)    0000000000000000 ffff8300e2e0fc98 ffff81ff80a5bab8
(XEN)    0000000000000008 0000000000000000 ffff8300e2e0fd20
(XEN)    00000006e2e0fd20 0000000100000000 ffff8300e2e0fd20
(XEN)    ffff8300e2e0fd08 ffff828c8015b5f9 000000208021b300
(XEN)    0000000000000000 0000000000000004 ffff8300e2e0fe90
(XEN)    ffffff000414e4d0 0000000400000020 ffff8300e2e0fe8c
(XEN)    ffffff000414e4cc ffff8300e2e0fd88 ffff828c801668a3
(XEN)    ffff8300e2e0fd68 0000000000000000 0000000000000004
(XEN)    ffff8300e2e0fe8c ffffff000414e4cc 000000008023f4c0
(XEN) Xen call trace:
(XEN)    [<ffff828c801b3cb3>] shadow_get_and_create_l1e+0x15a/0x46c
(XEN)    [<ffff828c801b5da0>] sh_page_fault__shadow_4_guest_4+0x598/0xce7
(XEN)    [<ffff828c8016234f>] paging_fault+0x3c/0x3e
(XEN)    [<ffff828c801622f9>] fixup_page_fault+0x22b/0x245
(XEN)    [<ffff828c80162391>] do_page_fault+0x40/0x15c
(XEN)
(XEN) Pagetable walk from ffff81c0ffc06928:
(XEN)  L4[0x103] = 00000001d2f4d063 000000000007dd4e
(XEN)  L3[0x103] = 00000001d2f4d063 000000000007dd4e
(XEN)  L2[0x1fe] = 00000001f73ca067 000000000007dc91
(XEN)  L1[0x006] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff81c0ffc06928
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
Any progress on this one?  We may be seeing it too (on 3.1.3 near final);
at least the call trace looks very similar to one of the traces that
John previously posted on this thread.

In our case, the problem occurred on an xm create after heavy usage
for >24 hours.  64-bit Xen, 32-bit dom0, AMD x86_64 x 8, if that helps.

Thanks,
Dan

(XEN) Xen call trace:
(XEN)    [<ffff828c8016a02f>] shadow_set_l1e+0x2f/0x1b0
(XEN)    [<ffff828c8016e5d8>] sh_page_fault__shadow_4_guest_4+0x8e8/0xec0
(XEN)    [<ffff828c80169699>] sh_make_shadow+0x479/0x4b0
(XEN)    [<ffff828c8016d459>] sh_update_cr3__shadow_4_guest_4+0x409/0x510
(XEN)    [<ffff828c80166f85>] shadow_update_paging_modes+0x95/0xd0
(XEN)    [<ffff828c8015906f>] svm_cr_access+0xecf/0xf50
(XEN)    [<ffff828c8015509c>] get_effective_addr_modrm64+0x13c/0x3d0
(XEN)    [<ffff828c8014b1d0>] hvm_io_assist+0xe30/0xe60
(XEN)    [<ffff828c80146297>] hvm_do_resume+0x27/0x150
(XEN)    [<ffff828c80151ff6>] vlapic_has_interrupt+0x26/0x60
(XEN)    [<ffff828c801595c8>] svm_vmexit_handler+0x4d8/0x15f0
(XEN)    [<ffff828c80114676>] vcpu_periodic_timer_work+0x16/0x80
(XEN)    [<ffff828c80151f46>] vlapic_get_ppr+0x26/0xb0
(XEN)    [<ffff828c8014b4d4>] is_isa_irq_masked+0x34/0x90
(XEN)    [<ffff828c80151ff6>] vlapic_has_interrupt+0x26/0x60
(XEN)    [<ffff828c8014b5ac>] cpu_has_pending_irq+0x2c/0x60
(XEN)    [<ffff828c8015b08a>] svm_stgi_label+0x8/0xe

(more crash dump data if needed)

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com]On Behalf Of John Levon
> Sent: Wednesday, January 23, 2008 12:16 PM
> To: Tim Deegan
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] 3.1/2 live migration panic
>
> On Tue, Jan 22, 2008 at 09:45:41AM +0000, Tim Deegan wrote:
>
> > Argh.  Well, here's more debugging, since you seem to hit the _l1e case
> > more often.  This patch includes the previous two as well.
>
> See below. I also saw the "Can't see the l1e" version as well.
>
> [snip: crash dump quoted in full in the previous message]
On Fri, Feb 01, 2008 at 02:19:40PM -0700, Dan Magenheimer wrote:
> Any progress on this one?  We may be seeing it too (on 3.1.3 near final);
> at least the call trace looks very similar to one of the traces that
> John previously posted on this thread.

It does look pretty similar.

> In our case, the problem occurred on an xm create after heavy usage
> for >24 hours.  64-bit Xen, 32-bit dom0, AMD x86_64 x 8, if that helps.

More details on that AMD box? It transpires that I can only reproduce it
on one single machine, a 4-way AMD Sun Fire V40Z. I'm investigating if
there's a BIOS update needed at the moment. I've tested on a number of
other Intel and AMD boxes and can't reproduce the problem.

regards
john