Further to my previous report:

http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00257.html
Message-ID: <19629.39326.337589.71778@wylie.me.uk>

I've added some debugging and have tracked down the crash to the
recently modified code in arch/x86/xen/mmu.c

Since the last version of the code that worked for me, mmu.c has been
modified with a lot of P2M changes. It now crashes in
get_phys_to_machine().

Having tracked down the crash and the offending value of pfn, I then
further modified the code only to print if ( pfn == 0x18C3 ), and also
to print intermediate values.

<7>ALANW get_phys_to_machine pfn 000018C3
<7> topidx 00000000
<7> mididx 0000000C
<7> idx 000000C3
(XEN) d0:v0: unhandled page fault (ec=0000)

If there is any more debugging that I can do, I'll be only too happy to
oblige.

System: Supermicro SM-SC825TQ-R720LPB, 8GB RAM
Motherboard: X8DTL
Processor: 1 x Intel XEON E5506 quad core
RAID controller: LSI MegaRAID SAS 8708

git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
branch xen/stable-2.6.32.x

commit 179eca50d08fa05d7650fcb8a0d3e6598cf2388a
Merge commit 'v2.6.32.24' into xen/next-2.6.32

------8<------8<------8<------8<------8<------8<------8<------8<------8<------8<
/* initial changes to mmu.c to track down crash */
+static char hex[9];

 unsigned long get_phys_to_machine(unsigned long pfn)
 {
     unsigned topidx, mididx, idx;
+    unsigned long rv;
+
+    longtohex(pfn);
+    xen_raw_printk(KERN_DEBUG "ALANW get_phys_to_machine %s", hex );

     if (unlikely(pfn >= MAX_P2M_PFN))
         return INVALID_P2M_ENTRY;
@@ -406,7 +432,12 @@ unsigned long get_phys_to_machine(unsigned long pfn)
     mididx = p2m_mid_index(pfn);
     idx = p2m_index(pfn);

-    return p2m_top[topidx][mididx][idx];
+    rv=p2m_top[topidx][mididx][idx];
+
+    longtohex(rv);
+    xen_raw_printk(KERN_DEBUG " returns %s\n", hex );
+
+    return rv;
 }
------8<------8<------8<------8<------8<------8<------8<------8<------8<------8<

...
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: ffffffff81000000->ffffffff816b1000
(XEN) Init. ramdisk: ffffffff816b1000->ffffffff816b1000
(XEN) Phys-Mach map: ffffffff816b1000->ffffffff818b1000
(XEN) Start info: ffffffff818b1000->ffffffff818b14b4
(XEN) Page tables: ffffffff818b2000->ffffffff818c3000
(XEN) Boot stack: ffffffff818c3000->ffffffff818c4000
(XEN) TOTAL: ffffffff80000000->ffffffff81c00000
(XEN) ENTRY ADDRESS: ffffffff814cc200

...

<7>ALANW get_phys_to_machine 0003FFFC<7> returns 0017A544
<7>ALANW get_phys_to_machine 0003FFFD<7> returns 0017A545
<7>ALANW get_phys_to_machine 0003FFFE<7> returns 0017A546
<7>ALANW get_phys_to_machine 0003FFFF<7> returns 0017A547
<7>ALANW get_phys_to_machine 000002ED<7> returns 002382ED
<7>ALANW get_phys_to_machine 000002ED<7> returns 002382ED
init_memory_mapping: 0000000100000000-00000002bf780000
0100000000 - 02bf780000 page 4k
kernel direct mapping tables up to 2bf780000 @ 18c3000-2ecb000
<7>ALANW get_phys_to_machine 000018C3(XEN) d0:v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffffffff816bd618:
(XEN) L4[0x1ff] = 0000000239003067 0000000000001003
(XEN) L3[0x1fe] = 0000000239007067 0000000000001007
(XEN) L2[0x00b] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.0.2-rc1-pre x86_64 debug=n Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff8100c393>]
(XEN) RFLAGS: 0000000000000206 EM: 1 CONTEXT: pv guest
(XEN) rax: ffffffff816bd000 rbx: 00000000000000c3 rcx: 0000000000000000
(XEN) rdx: ffffffff8158b000 rsi: 0000000000000025 rdi: 0000000000000000
(XEN) rbp: ffffffffffffffff rsp: ffffffff81445c00 r8: 000000000000000a
(XEN) r9: ffffffff8157bf90 r10: ffffffff8157bd90 r11: 0000000000000200
(XEN) r12: 00000000018c3000 r13: 8000000000000163 r14: 0000000000000001
(XEN) r15: 00000000000009ff cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 0000000239001000 cr2: ffffffff816bd618
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff81445c00:
(XEN) 0000000000000000 0000000000000200 0000000000000000 ffffffff8100c393
(XEN) 000000010000e030 0000000000010006 ffffffff81445c40 000000000000e02b
(XEN) ffffffff81445de8 80000000018c3063 00000000018c3000 ffffffff8100c657
(XEN) 80000000018c3063 ffffffff8100c72a 0000000000000000 ffffffff8100b789
(XEN) 0000000239002040 ffffffff815553e0 000000000000000f 8000000000000163
(XEN) 80000000018c3063 ffffffffff400000 ffffffff81536000 00000002bf780000
(XEN) ffffffff814de1a3 0000000000000001 ffffffff814c30a0 ffffffffff400000
(XEN) 0000000139002038 ffffffff815553e0 ffffffff81445d90 00000000018c3000
(XEN) ffff8802bf780000 00000002bf780000 00000002bf780000 0000000000000005
(XEN) ffffffff813085a8 ffff880001002048 0000000240000000 ffff8802bf780000
(XEN) ffffffff814f8091 0000000000000001 ffffffff814c30a0 8000000000000163
(XEN) 0000000000000000 0000000000000004 0000000000000000 0000000000000000
(XEN) ffff880001002000 ffffffff8100b76b 00000000000003bf ffffffff815553e0
(XEN) ffffffff81001880 00000002bf780000 ffff8802bf780000 ffffffff813c7fad
(XEN) ffff8802bf780000 0000000000000000 ffffffff814f823a 00000002bf780000
(XEN) ffffffff813196e4 0000000000000020 ffff880100000000 ffffffff81445e08
(XEN) 0000000040000000 0000000040000000 ffffffff81445e78 0000000000000001
(XEN) 0000000000000001 ffffffff813c7fad 0000000000000000 00000002bf780000
(XEN) ffffffff813083d2 0000000000000000 0000000000000000 ffffffff00000000
(XEN) 0000000100000000 ffff880000000000 0000000000000000 0000000100000000
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

--
Alan J. Wylie http://www.wylie.me.uk/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
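
The index values in the debug output and the faulting address in the register dump can be cross-checked with a little arithmetic. What follows is a minimal sketch, not the kernel's code: it assumes 4 KiB pages and 8-byte entries, so 512 entries per level, and every `sketch_*` name is hypothetical. Under those assumptions pfn 0x18C3 splits into topidx 0, mididx 0xC, idx 0xC3, as logged, and rax + idx * 8 equals cr2, which is consistent with the fault being the final load from the leaf page. The leaf address in rax also lies 12 pages into the Phys-Mach map range printed by Xen, which matches mididx, although nothing in the report confirms that the leaf pointer was set up that way.

```c
#include <stdint.h>

/* Assumed geometry: 4096-byte page / 8-byte entry = 512 entries per level. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_ENTRIES   (SKETCH_PAGE_SIZE / sizeof(uint64_t))

/* Hypothetical stand-ins for p2m_top_index(), p2m_mid_index(), p2m_index(). */
uint64_t sketch_top_index(uint64_t pfn)
{
    return pfn / (SKETCH_ENTRIES * SKETCH_ENTRIES);
}

uint64_t sketch_mid_index(uint64_t pfn)
{
    return (pfn / SKETCH_ENTRIES) % SKETCH_ENTRIES;
}

uint64_t sketch_index(uint64_t pfn)
{
    return pfn % SKETCH_ENTRIES;
}

/* Address of entry idx in a leaf page made of 8-byte entries. */
uint64_t sketch_entry_addr(uint64_t leaf_base, uint64_t idx)
{
    return leaf_base + idx * sizeof(uint64_t);
}

/* Number of whole pages between the start of a region and an address in it. */
uint64_t sketch_pages_into(uint64_t region_start, uint64_t addr)
{
    return (addr - region_start) / SKETCH_PAGE_SIZE;
}
```

Feeding in the logged values (pfn 0x18C3, rax ffffffff816bd000, Phys-Mach map start ffffffff816b1000) reproduces the indices and the cr2 value from the dump.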
Jeremy Fitzhardinge
2010-Oct-13 00:19 UTC
Re: [Xen-devel] Xen dom0 crash in get_phys_to_machine
On 10/12/2010 12:55 AM, Alan J. Wylie wrote:
> Further to my previous report:
>
> http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00257.html
> Message-ID: <19629.39326.337589.71778@wylie.me.uk>
>
> I've added some debugging and have tracked down the crash to the
> recently modified code in arch/x86/xen/mmu.c

Thanks, this is useful info. I'll try to get to it tomorrow.

    J
Alan J. Wylie
2010-Oct-22 12:47 UTC
Re: [Xen-devel] Xen dom0 crash in get_phys_to_machine
I've pulled the latest tree from Jeremy's git repository and would
just like to report that the changes to mmu.c in
7510ae89101a20046d03c551bd7db056ada84933 haven't stopped the crashing.

--
Alan J. Wylie http://www.wylie.me.uk/
Gianni Tedesco
2010-Oct-22 13:05 UTC
Re: [Xen-devel] Xen dom0 crash in get_phys_to_machine
On Tue, 2010-10-12 at 08:55 +0100, Alan J. Wylie wrote:
> Further to my previous report:
>
> http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00257.html
> Message-ID: <19629.39326.337589.71778@wylie.me.uk>
>
> I've added some debugging and have tracked down the crash to the
> recently modified code in arch/x86/xen/mmu.c
>
> Since the last version of the code that worked for me, mmu.c has been
> modified with a lot of P2M changes. It now crashes in
> get_phys_to_machine().
>
> Having tracked down the crash and the offending value of pfn, I then
> further modified the code only to print if ( pfn == 0x18C3 ), and also
> to print intermediate values.
>
> <7>ALANW get_phys_to_machine pfn 000018C3
> <7> topidx 00000000
> <7> mididx 0000000C
> <7> idx 000000C3
> (XEN) d0:v0: unhandled page fault (ec=0000)
>
> If there is any more debugging that I can do, I'll be only too happy to
> oblige.

FWIW, when I was checking for any call where pfn > max_pfn - and I got:

p2m_top[0][10][104] max_pfn=0

The p2m seems to have been correctly initialised:

xen_build_dynamic_phys_to_machine: topidx=0 mididx=375 max_pfn=192512

But then it looks like something is trampling max_pfn and possibly other
important data structures.

I can get a working pvops dom0 by reverting to commit
e6b9b2cbca5093e8e38d3e314e2f6415ad951c60 - with the same config.

git-bisect between that commit and head turned up some nonsense about a
ata_piix change which just added a spinlock
876b3a81850fc237f643a065ea78ce2ad7665767 - so I assume that is a bisect
problem and that this commit is unrelated...

Gianni
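
The two debug lines in the message above can be related to each other by going from indices back to a pfn. This is a rough sketch under the assumption of 512 entries per level (4 KiB pages, 8-byte entries); the `sketch_*` names are hypothetical and are not kernel functions. On that reading, p2m_top[0][10][104] is pfn 5224 (0x1468), well below the 192512 frames reported at initialisation, and 192512 frames end at mid index 375, which matches the logged mididx if that value is the index of the last populated leaf.

```c
#include <stdint.h>

/* Assumed: 4096-byte page / 8-byte entry = 512 entries per level. */
#define SKETCH_ENTRIES 512ULL

/* Rebuild a pfn from the three indices of a p2m_top[top][mid][idx] lookup. */
uint64_t sketch_pfn_from_indices(uint64_t top, uint64_t mid, uint64_t idx)
{
    return (top * SKETCH_ENTRIES + mid) * SKETCH_ENTRIES + idx;
}

/* Mid-level index of the last pfn when max_pfn frames (max_pfn > 0) exist. */
uint64_t sketch_last_mid_index(uint64_t max_pfn)
{
    return ((max_pfn - 1) / SKETCH_ENTRIES) % SKETCH_ENTRIES;
}

/* Non-zero when pfn lies inside the range [0, max_pfn). */
int sketch_pfn_in_range(uint64_t pfn, uint64_t max_pfn)
{
    return pfn < max_pfn;
}
```

With max_pfn taken as 0, as in the first debug line, the same pfn fails the range check, which is what a "pfn > max_pfn" trap would report regardless of whether the table itself is intact.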
at 14:05 on Fri 22-Oct-2010 Gianni Tedesco (gianni.tedesco@citrix.com) wrote:

> FWIW, when I was checking for any call where pfn > max_pfn - and I got:
>
> p2m_top[0][10][104] max_pfn=0
>
> The p2m seems to have been correctly initialised:
>
> xen_build_dynamic_phys_to_machine: topidx=0 mididx=375 max_pfn=192512
>
> But then it looks like something is trampling max_pfn and possibly other
> important data structures.

I've just been reading through the Documentation/development-process
and discovered "sparse".

Five minutes ago I ran it on mmu.c and got the following interesting
output:

/usr/src/jeremy-git-xen/arch/x86/xen/mmu.c:385:23: warning: symbol 'max_pfn' shadows an earlier one
/usr/src/jeremy-git-xen/arch/x86/include/asm/page_64_types.h:58:22: originally declared here
/usr/src/jeremy-git-xen/arch/x86/xen/mmu.c:289:47: warning: potentially expensive pointer subtraction
/usr/src/jeremy-git-xen/arch/x86/include/asm/xen/page.h:84:9: warning: incorrect type in argument 1 (different address spaces)
/usr/src/jeremy-git-xen/arch/x86/include/asm/xen/page.h:84:9: expected void const volatile [noderef] <asn:1>*<noident>
/usr/src/jeremy-git-xen/arch/x86/include/asm/xen/page.h:84:9: got unsigned long *
/usr/src/jeremy-git-xen/arch/x86/include/asm/xen/page.h:84:9: warning: cast adds address space to expression (<asn:1>)
/usr/src/jeremy-git-xen/arch/x86/include/asm/xen/page.h:84:9: warning: cast adds address space to expression (<asn:1>)
/usr/src/jeremy-git-xen/arch/x86/include/asm/xen/page.h:84:9: warning: cast adds address space to expression (<asn:1>)
/usr/src/jeremy-git-xen/arch/x86/include/asm/xen/page.h:84:9: warning: cast adds address space to expression (<asn:1>)
/usr/src/jeremy-git-xen/include/linux/mm.h:603:16: warning: potentially expensive pointer subtraction
/usr/src/jeremy-git-xen/arch/x86/xen/mmu.c:1269:37: warning: potentially expensive pointer subtraction
/usr/src/jeremy-git-xen/include/linux/mm.h:603:16: warning: potentially expensive pointer subtraction
/usr/src/jeremy-git-xen/include/linux/mm.h:603:16: warning: potentially expensive pointer subtraction
/usr/src/jeremy-git-xen/arch/x86/xen/mmu.c:1410:37: warning: potentially expensive pointer subtraction
/usr/src/jeremy-git-xen/include/linux/mm.h:603:16: warning: potentially expensive pointer subtraction
/usr/src/jeremy-git-xen/arch/x86/xen/mmu.c:1684:17: error: bad constant expression

Is it just a co-incidence that the first two lines refer to the same
symbol that you have just mentioned?

I'm going to try renaming the local symbol and see if things still crash.

The trouble is that the box I've been testing on is supposed to be our
backup file server and is currently doing a rsync of 280GB of files
from a 7 year old windows box. At least I'll be able to leave it
running undisturbed over the weekend.

--
Alan J. Wylie http://www.wylie.me.uk/
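
The first sparse warning describes ordinary C shadowing: a local declared with the same name as a global hides the global for the rest of that scope. The sketch below only illustrates that language rule. The function names are hypothetical, the global is a stand-in initialised to the value from the log, and none of it is taken from mmu.c.

```c
/* Stand-in for the kernel-wide global; the value is borrowed from the log. */
unsigned long max_pfn = 192512;

/* Hypothetical function with a local that shadows the global above. */
unsigned long sketch_local_view(unsigned long nr_pages)
{
    unsigned long max_pfn = nr_pages; /* hides the global inside this body */

    /* Any debug print of max_pfn placed here would show the local value. */
    return max_pfn;
}

/* With no local of that name in scope, the global is what gets read. */
unsigned long sketch_global_view(void)
{
    return max_pfn;
}
```

A debug statement added inside a function like `sketch_local_view` therefore reports the local, and a value of 0 there says nothing about whether the global has been overwritten.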
Gianni Tedesco
2010-Oct-22 13:45 UTC
Re: [Xen-devel] Xen dom0 crash in get_phys_to_machine
On Fri, 2010-10-22 at 14:33 +0100, Alan J. Wylie wrote:
> at 14:05 on Fri 22-Oct-2010 Gianni Tedesco (gianni.tedesco@citrix.com) wrote:
>
> > FWIW, when I was checking for any call where pfn > max_pfn - and I got:
> >
> > p2m_top[0][10][104] max_pfn=0
> >
> > The p2m seems to have been correctly initialised:
> >
> > xen_build_dynamic_phys_to_machine: topidx=0 mididx=375 max_pfn=192512
> >
> > But then it looks like something is trampling max_pfn and possibly other
> > important data structures.
>
> I've just been reading through the Documentation/development-process
> and discovered "sparse".
>
> Five minutes ago I ran it on mmu.c and got the following interesting
> output:
>
> /usr/src/jeremy-git-xen/arch/x86/xen/mmu.c:385:23: warning: symbol 'max_pfn' shadows an earlier one
> /usr/src/jeremy-git-xen/arch/x86/include/asm/page_64_types.h:58:22: originally declared here
>
> Is it just a co-incidence that the first two lines refer to the same
> symbol that you have just mentioned?

Hmm, sort of, I assumed I was printing the global max_pfn but it looks
like the shadowing is deliberate (if a little thoughtless in the
naming). It does reverse my finding that 'max_pfn' (the global one) is
getting corrupted.

> I'm going to try renaming the local symbol and see if things still crash.

Sadly, I'm almost certain things will still crash. You may get more play
out of initialising the global max_pfn. But I am not sure how this code
is supposed to work and am busy with other things right now.

> The trouble is that the box I've been testing on is supposed to be our
> backup file server and is currently doing a rsync of 280GB of files
> from a 7 year old windows box. At least I'll be able to leave it
> running undisturbed over the weekend.

This happens for you after a full boot then? Mine gets as far as this:

(XEN) Freed 204kB init memory.
xen_build_dynamic_phys_to_machine: topidx=0 mididx=375 max_pfn=192512
mapping kernel into physical memory
Xen: setup ISA identity maps
xen_build_mfn_list_list: topidx=0 mididx=375
about to get started...
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.32.24-g2472a9c-dirty (scara@dt09) (gcc version 4.5.1 20100907 (Red Hat 4.5.1-3) (GCC) ) #51 SMP Thu Oct 21 15:46:56 BST 2010
[ 0.000000] Command line: ro root=/dev/sda2 console=hvc0 initcall_debug max_cstate=1 earlyprintk=xen
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Centaur CentaurHauls
[ 0.000000] xen_release_chunk: looking at area pfn 9e-a0: 2 pages freed
[ 0.000000] released 2 pages of unused memory
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] Xen: 0000000000000000 - 000000000009e000 (usable)
[ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
[ 0.000000] Xen: 0000000000100000 - 000000002f000000 (usable)
[ 0.000000] Xen: 00000000bf699000 - 00000000bf6af000 (reserved)
[ 0.000000] Xen: 00000000bf6af000 - 00000000bf6ce000 (ACPI data)
[ 0.000000] Xen: 00000000bf6ce000 - 00000000c0000000 (reserved)
[ 0.000000] Xen: 00000000e0000000 - 00000000f0000000 (reserved)
[ 0.000000] Xen: 00000000fe000000 - 0000000100000000 (reserved)
[ 0.000000] Xen: 0000000240000000 - 00000002d069b000 (usable)
[ 0.000000] bootconsole [xenboot0] enabled
[ 0.000000] DMI 2.6 present.
[ 0.000000] last_pfn = 0x2d069b max_arch_pfn = 0x400000000
[ 0.000000] x86 PAT enabled: cpu 0, old 0x50100070406, new 0x7010600070106
[ 0.000000] last_pfn = 0x2f000 max_arch_pfn = 0x400000000
[ 0.000000] init_memory_mapping: 0000000000000000-000000002f000000
[ 0.000000] init_memory_mapping: 0000000100000000-00000002d069b000
(XEN) d0:v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffffffff817d2030:
(XEN) L4[0x1ff] = 0000000239003067 0000000000001003
(XEN) L3[0x1fe] = 0000000239007067 0000000000001007
(XEN) L2[0x00b] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.1-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff81212bbf>]
(XEN) RFLAGS: 0000000000000246 EM: 1 CONTEXT: pv guest
(XEN) rax: ffffffff817d2000 rbx: 0000000000000046 rcx: 00000000ffffffff
(XEN) rdx: 00000000deadbeef rsi: 00000000deadbeef rdi: 00000000deadbeef
(XEN) rbp: ffffffff813c7c58 rsp: ffffffff813c7be0 r8: 0000000000000766
(XEN) r9: 00000000ffffffff r10: 0000000000000006 r11: ffffffff813c7c88
(XEN) r12: ffffffff8148bb87 r13: 0000000000000767 r14: 0000000000000046
(XEN) r15: 00000000ffffffff cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 0000000239001000 cr2: ffffffff817d2030
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff813c7be0:
(XEN) 00000000ffffffff ffffffff813c7c88 0000000000000000 ffffffff81212bbf
(XEN) 000000010000e030 0000000000010046 ffffffff813c7c28 000000000000e02b
(XEN) ffffffff81212b13 ffffffff813c7c78 ffffffff813e7ee0 ffffffff813ffae0
(XEN) 0000000000000767 0000000000000046 00000000ffffffff ffffffff813c7c88
(XEN) ffffffff8104c284 00000000000007ad 00000000000007ad 00000000000007ad
(XEN) 0000000000000000 ffffffff813c7ca8 ffffffff8104c2e5 ffffffff813c7ca8
(XEN) 00000000fffff853 ffffffff813c7cd8 ffffffff8104c580 ffffffff8150b55a
(XEN) 000000000000004c ffffffff813c7d08 0000000000000036 ffffffff813c7d78
(XEN) ffffffff8104cb25 ffffffff813c7d16 0000000000000000 0000000faaaaaaaa
(XEN) ffffffff813c7d17 302e30202020205b 00205d3030303030 0000000000000000
(XEN) aaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaa 0000000000007ff0 0000000000000000
(XEN) 00000000deadbeef 0000000000000000 000000000153b3c5 0000000000000000
(XEN) ffffffff813c7f60 ffffffffffffffff 0000000000000000 ffffffff813c7dd8
(XEN) ffffffff81322650 0000000000000018 ffffffff813c7de8 ffffffff813c7da8
(XEN) 00003ffffffff000 ffffffff813c7dd8 0000000100000000 00000002d069b000
(XEN) 0000000000100000 0000000000007ff0 aaaaaaaaaaaaaaaa ffffffff813c7eb8
(XEN) ffffffff81311686 302e30202020205b 0000000000000000 0000000100000000
(XEN) 00000002d069b000 0000000000000000 000000002f000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
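
The "Pagetable walk" lines in the dump above can be reproduced from cr2 alone. The sketch below assumes standard x86-64 4-level paging with 4 KiB pages, where each level consumes 9 bits of the virtual address starting at bit 12; the `sketch_*` names are hypothetical. Applied to this dump (cr2 ffffffff817d2030) and to the one in the original report (cr2 ffffffff816bd618), it gives L4[0x1ff], L3[0x1fe], L2[0x00b] in both cases, so both faulting addresses sit in the same 2 MiB region starting at ffffffff81600000. That is only arithmetic on the logged addresses and says nothing about why the L2 entry is empty.

```c
#include <stdint.h>

/*
 * Index into the page table at the given level (1..4) for a virtual
 * address, assuming 4-level paging: level 1 starts at bit 12, level 2
 * at bit 21, level 3 at bit 30, level 4 at bit 39, 9 bits each.
 */
unsigned sketch_pt_index(uint64_t vaddr, int level)
{
    return (unsigned)((vaddr >> (12 + 9 * (level - 1))) & 0x1ff);
}

/* Start of the 2 MiB region covered by the L2 entry for this address. */
uint64_t sketch_l2_region_base(uint64_t vaddr)
{
    return vaddr & ~((UINT64_C(1) << 21) - 1);
}
```

Comparing `sketch_l2_region_base` for the two cr2 values is a quick way to see whether separate crash reports are touching the same unmapped region.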
at 14:45 on Fri 22-Oct-2010 Gianni Tedesco (gianni.tedesco@citrix.com) wrote:

> On Fri, 2010-10-22 at 14:33 +0100, Alan J. Wylie wrote:

>>> But then it looks like something is trampling max_pfn and possibly other
>>> important data structures.

>> I've just been reading through the Documentation/development-process
>> and discovered "sparse".

>> Five minutes ago I ran it on mmu.c and got the following interesting
>> output:

>> /usr/src/jeremy-git-xen/arch/x86/xen/mmu.c:385:23: warning: symbol 'max_pfn'
>> shadows an earlier one
>> /usr/src/jeremy-git-xen/arch/x86/include/asm/page_64_types.h:58:22:
>> originally declared here

>> Is it just a co-incidence that the first two lines refer to the
>> same symbol that you have just mentioned?

> Hmm, sort of, I assumed I was printing the global max_pfn but it
> looks like the shadowing is deliberate (if a little thoughtless in
> the naming). It does reverse my finding that 'max_pfn' (the global
> one) is getting corrupted.

>> I'm going to try renaming the local symbol and see if things still crash.

> Sadly, I'm almost certain things will still crash.

You are quite right - it still crashes.

>> At least I'll be able to leave it running undisturbed over the
>> weekend.

> This happens for you after a full boot then?

No - I have an old 2.6.32.18 kernel that boots fine.

--
Alan J. Wylie http://www.wylie.me.uk/
Jeremy Fitzhardinge
2010-Oct-22 22:26 UTC
Re: [Xen-devel] Xen dom0 crash in get_phys_to_machine
On 10/22/2010 06:05 AM, Gianni Tedesco wrote:
> FWIW, when I was checking for any call where pfn > max_pfn - and I got:
>
> p2m_top[0][10][104] max_pfn=0
>
> The p2m seems to have been correctly initialised:
>
> xen_build_dynamic_phys_to_machine: topidx=0 mididx=375 max_pfn=192512
>
> But then it looks like something is trampling max_pfn and possibly other
> important data structures.
>
> I can get a working pvops dom0 by reverting to commit
> e6b9b2cbca5093e8e38d3e314e2f6415ad951c60 - with the same config.
>
> git-bisect between that commit and head turned up some nonsense about a
> ata_piix change which just added a spinlock
> 876b3a81850fc237f643a065ea78ce2ad7665767 - so I assume that is a bisect
> problem and that this commit is unrelated...

Yeah. If the problem appears as a function of kernel size, then
bisection is going to give you more or less random results,
unfortunately.

    J
Hello,

is there any news regarding this problem?

This evening I tried next-2.6.32 again, but still no luck.

(..had the slight hope the recent changes about lowest-megabyte memory
area might have fixed this, too .. but ..)

Regards,

Sven
Jeremy Fitzhardinge
2010-Nov-13 00:53 UTC
Re: [Xen-devel] Xen dom0 crash in get_phys_to_machine
On 11/12/2010 02:41 PM, sven wrote:
> is there any news regarding this problem?
>
> This evening I tried next-2.6.32 again, but still no luck.
>
> (..had the slight hope the recent changes about lowest-megabyte memory
> area might have fixed this, too .. but ..)

Sorry, I'd set it to one side while dealing with all the 2.6.37
upstreaming work.

Unfortunately I don't have a machine which reproduces this problem, so
I've been relying on other people's reports, and they haven't shown any
smoking guns yet. But I have been seeing other odd things occasionally
which could be the same problem in different guises, so I'll see if I
can track those down.

Thanks,

    J