Konrad Rzeszutek Wilk
2010-Nov-23 20:41 UTC
[Xen-devel] [RFC PATCH] fix 32-bit PAE bootup of Xen PV guests.
I was wondering if you could help me out. The git commit
b40827fa7268fda8a62490728a61c2856f33830b ("x86-32, mm: Add an initial
page table for core bootstrapping") crashes 32-bit PAE Xen PV guests.

I am attaching an RFC patch that works around your patch, but honestly
I was wondering if there is a better way to make this work for 2.6.37?
Any recommendations?

Without this patch, a PV 32-bit PAE guest crashes during early bootup as so:

mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) d0:v0: unhandled page fault (ec=0003)
(XEN) Pagetable walk from c158901c:
(XEN) L3[0x003] = 0000000025645001 00001645
(XEN) L2[0x00a] = 00000000258ae067 00019086
(XEN) L1[0x189] = 0000000025589061 00001589
(XEN) domain_crash_sync called from entry.S (ff1d10d5)
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.1-101123 x86_32p debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) EIP: e019:[<c151d371>]
(XEN) EFLAGS: 00000286 EM: 1 CONTEXT: pv guest
(XEN) eax: 01586001 ebx: 00000000 ecx: 00000000 edx: 00000000
(XEN) esi: c150cdc0 edi: c1511840 ebp: c14d5fa4 esp: c14d5f20
(XEN) cr0: 8005003b cr4: 000026f0 cr3: 0028aca0 cr2: c158901c
(XEN) ds: e021 es: e021 fs: 00d8 gs: 0000 ss: e021 cs: e019
(XEN) Guest stack trace from esp=c14d5f20:
(XEN) 00000003 c151d371 0001e019 00010086 c14e09a4 c103b02d 00000000 0000000f
(XEN) 00000035 00000000 00000000 c14d5f65 00000000 205b0000 30202020 3030302e
(XEN) c14d5fb4 c11a0020 c14df334 c14d5f78 c10073aa 00000000 00000000 c14df334
(XEN) c14d5f94 c100651a 00000000 c154d230 c14df334 c14d5fa4 00000000 c154d230
(XEN) c14df334 c14d5fbc c1518646 c143996d c139a010 c15180ab 18f8a000 c14d5fd4
(XEN) c15180df 017b2000 00000000 017b2000 c154d230 c14d5ffc c151afd5 1fc98375
(XEN) 80000481 00040800 00000f62 00000001 00000000 d9076000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2010-Nov-23 20:41 UTC
[Xen-devel] [PATCH] xen/mmu, x86-32: Make swapper_pg_dir writeable before calling setup_arch.
git commit b40827fa7268fda8a62490728a61c2856f33830b ("x86-32, mm: Add
an initial page table for core bootstrapping") makes swapper_pg_dir be
empty during early bootup and uses initial_page_table for the startup
code. In setup_arch() it copies the contents of initial_page_table to
swapper_pg_dir and pivots the cr3 to use swapper_pg_dir. Later on, at
the end of setup_arch(), it copies swapper_pg_dir contents to
initial_page_table.

While that works for baremetal, under Xen, before setup_arch() is
called we set up swapper_pg_dir to be RO and also load it in %cr3, and
initial_page_table is empty.

To work with the requirement of copying the contents of
initial_page_table to swapper_pg_dir, and then later vice-versa, this
patch introduces a new mechanism to pivot over to initial_page_table
right before calling setup_arch() and also set swapper_pg_dir to be
writeable. Then right before setup_arch() calls clone_pgd_range() to
copy swapper_pg_dir to initial_page_table (so back), we pivot our PGD
from initial_page_table to swapper_pg_dir and set initial_page_table
writeable.

There is an extra piece of logic where we inhibit loading of cr3
between the start of setup_arch() up to
x86_init.paging.pagetable_setup_start(). This is b/c Xen requires the
PGD to be _RO and there are no calls in between those code paths that
do that, so without the inhibit we would end up with a nasty error from
Xen; during that window cr3 is simply not updated.

This patch makes it possible to boot up a PAE Xen PV guest. 2.6.36 did
not have this issue as it did not have git commit
b40827fa7268fda8a62490728a61c2856f33830b.

CC: Borislav Petkov <bp@alien8.de>
CC: H. Peter Anvin <hpa@linux.intel.com>
CC: Ian Campbell <Ian.Campbell@eu.citrix.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c |    1 +
 arch/x86/xen/mmu.c       |  100 ++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/xen/xen-ops.h   |    2 +
 3 files changed, 103 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 7250bef..8114fdb 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1256,6 +1256,7 @@ asmlinkage void __init xen_start_kernel(void)
 	/* Start the world */
 #ifdef CONFIG_X86_32
+	xen_mk_swapper_pg_dir_writeable();
 	i386_start_kernel();
 #else
 	x86_64_start_reservations((char *)__pa_symbol(&boot_params));
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index bd2713a..b4b03cd 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1590,6 +1590,9 @@ void xen_exit_mmap(struct mm_struct *mm)
 static __init void xen_pagetable_setup_start(pgd_t *base)
 {
+#ifdef CONFIG_X86_32
+	xen_mk_swapper_pg_dir_writeable_fixup();
+#endif
 }
 static void xen_post_allocator_init(void);
@@ -2264,7 +2267,104 @@ __init void xen_ident_map_ISA(void)
 	xen_flush_tlb();
 }
+#ifdef CONFIG_X86_64
+__init void xen_mk_swapper_pg_dir_writeable(void) { }
+#else
+static RESERVE_BRK_ARRAY(pmd_t, level2_kernel_pgt_bak, PTRS_PER_PMD);
+
+static void xen_mk_swapper_pg_dir_writeable_write_cr3(unsigned long cr3)
+{
+}
+
+__init void xen_mk_swapper_pg_dir_writeable(void)
+{
+	/* The purpose of this code is to make swapper_pg_dir writeable.
+	 *
+	 * We need that b/c in setup_arch clone_pgd_range is used to copy
+	 * from initial_page_table to swapper_pg_dir and we fault early.
+	 *
+	 * To make swapper_pg_dir writeable we need to stop using the
+	 * swapper_pg_dir as PGD and create a whole new pgd with puds copied
+	 * from the swapper_pg_dir. For that we deep-copy (one level) the
+	 * swapper_pg_dir to initial_page_table and swap over to that one.
+	 */
+	/* Inhibit the write_cr3 as Xen requires the pagetable that is to be
+	 * the PGD to be RO and the code in setup_arch does not set it as so,
+	 * so to inhibit Xen hypervisor from throwing errors. We set this to
+	 * stub (later in xen_mk_swapper_pg_dir_writeable_fixup we restore it). */
+	pv_mmu_ops.write_cr3 = xen_mk_swapper_pg_dir_writeable_write_cr3;
+
+	/* Create a deep (one-level) copy of swapper_pg_dir */
+	level2_kernel_pgt_bak = extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
+
+	memcpy(level2_kernel_pgt_bak,
+	       m2v(swapper_pg_dir[KERNEL_PGD_BOUNDARY].pgd),
+	       sizeof(pmd_t) * PTRS_PER_PMD);
+
+	memcpy(initial_page_table, swapper_pg_dir, sizeof(pgd_t) * PTRS_PER_PGD);
+	set_pgd(&initial_page_table[KERNEL_PGD_BOUNDARY],
+		__pgd(__pa(level2_kernel_pgt_bak) | _PAGE_PRESENT));
+
+	/* L3 and the slot it points to _MUST_ be RO */
+	set_page_prot(initial_page_table, PAGE_KERNEL_RO);
+	set_page_prot(level2_kernel_pgt_bak, PAGE_KERNEL_RO);
+
+	/* Pivot over to the new PGD. */
+	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(swapper_pg_dir)));
+
+	xen_write_cr3(__pa(initial_page_table));
+
+	pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(initial_page_table)));
+
+	/* And now swapper_pg_dir + level2_kernel_pgt is writeable. */
+	set_page_prot(swapper_pg_dir, PAGE_KERNEL);
+	set_page_prot(level2_kernel_pgt, PAGE_KERNEL);
+
+	printk(KERN_INFO "Pivoting from PGD: 0x%lx to 0x%lx\n",
+		PFN_DOWN(__pa(swapper_pg_dir)),
+		PFN_DOWN(__pa(initial_page_table)));
+}
+__init void xen_mk_swapper_pg_dir_writeable_fixup(void)
+{
+	int i;
+	/* We MUST copy over the new L2 data as it has been updated. */
+	memcpy(level2_kernel_pgt, level2_kernel_pgt_bak,
+	       sizeof(pmd_t) * PTRS_PER_PMD);
+
+	set_pgd(&swapper_pg_dir[KERNEL_PGD_BOUNDARY],
+		__pgd(__pa(level2_kernel_pgt) | _PAGE_PRESENT));
+
+	for (i = 0; i < KERNEL_PGD_PTRS; i++) {
+		if (swapper_pg_dir[i].pgd != initial_page_table[i].pgd)
+			xen_raw_printk("[%3d] 0x%lx != 0x%lx\n", i, swapper_pg_dir[i],
+				initial_page_table[i]);
+		if (swapper_pg_dir[i].pgd != 0)
+			xen_raw_printk("[%3d] 0x%lx\n", i, swapper_pg_dir[i]);
+	}
+
+	set_page_prot(swapper_pg_dir, PAGE_KERNEL_RO);
+	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
+
+	/* Swap PGD over to swapper_pg_dir. */
+	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(initial_page_table)));
+
+	xen_write_cr3(__pa(swapper_pg_dir));
+
+	pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(swapper_pg_dir)));
+
+	/* And now initial page table is writeable. */
+	set_page_prot(initial_page_table, PAGE_KERNEL);
+	set_page_prot(level2_kernel_pgt_bak, PAGE_KERNEL);
+
+	/* Restore the CR3 operation. */
+	pv_mmu_ops.write_cr3 = xen_write_cr3;
+
+	printk(KERN_INFO "Pivoting back from PGD: 0x%lx to 0x%lx\n",
+		PFN_DOWN(__pa(initial_page_table)),
+		PFN_DOWN(__pa(swapper_pg_dir)));
+}
+#endif
 static __init void xen_post_allocator_init(void)
 {
 	pv_mmu_ops.set_pte = xen_set_pte;
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 6404474..11e4aec 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -30,6 +30,8 @@ void xen_setup_machphys_mapping(void);
 pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
 void xen_ident_map_ISA(void);
 void xen_reserve_top(void);
+void xen_mk_swapper_pg_dir_writeable(void);
+void xen_mk_swapper_pg_dir_writeable_fixup(void);
 extern unsigned long xen_max_p2m_pfn;
 void xen_set_pat(u64);
-- 
1.7.1
Konrad Rzeszutek Wilk
2010-Nov-23 21:18 UTC
[Xen-devel] Re: [RFC PATCH] fix 32-bit PAE bootup of Xen PV guests.
On Tue, Nov 23, 2010 at 03:41:35PM -0500, Konrad Rzeszutek Wilk wrote:
> I was wondering if you could help me out. The git commit
> b40827fa7268fda8a62490728a61c2856f33830b ("x86-32, mm: Add an initial
> page table for core bootstrapping") crashes 32-bit PAE Xen PV guests.
>
> I am attaching an RFC patch that works around your patch, but honestly
> I was wondering if there is a better way to make this work for 2.6.37?
> Any recommendations?

Ignore this patch please. Ian's fix
(http://marc.info/?l=linux-kernel&m=128879861204118&w=2) is much cleaner.