Stefano Stabellini
2011-Feb-28 18:24 UTC
[Xen-devel] [PATCH 0/2] x86: cleanup highmap after brk is concluded
Hi all,
a little while ago I sent a patch titled "x86/mm/init: respect memblock
reserved regions when destroying mappings"
(https://lkml.org/lkml/2011/1/31/232) to fix a serious boot crash
problem on Xen (full logs attached):

Pid: 0, comm: swapper Not tainted 2.6.38-rc6+ #1270 Hewlett-Packard HP xw8600 Workstation/0A98h
RIP: e030:[<ffffffff81008314>] [<ffffffff81008314>] get_phys_to_machine+0x44/0x50
RSP: e02b:ffffffff82001ca0 EFLAGS: 00010002
RAX: ffffffff824ce000 RBX: 0000000126004067 RCX: 0000000000000010
RDX: 0000000000000000 RSI: 00000001cfdc2000 RDI: 0000000000000004
RBP: ffffffff82001ca0 R08: 0000000000000020 R09: 0000000000000000
R10: 0000000000000007 R11: 00000000ffffffff R12: 0000000000126004
R13: 0000000000002004 R14: ffff880100000000 R15: ffff8801cfdc2000
FS: 0000000000000000(0000) GS:ffffffff82162000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000002003000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
Process swapper (pid: 0, threadinfo ffffffff82000000, task ffffffff8200b020)
Stack:
 ffffffff82001cd0 ffffffff8100582c ffffffff81dce1bc ffffffff82001e10
 00000001cfdc2000 ffffffff82003880 ffffffff82001ce0 ffffffff8100587e
 ffffffff82001d98 ffffffff8100498f 00000000ffffffff 0000000000000007
Call Trace:
 [<ffffffff8100582c>] pte_mfn_to_pfn+0x8c/0xb0
 [<ffffffff8100587e>] xen_pgd_val+0xe/0x10
 [<ffffffff8100498f>] __raw_callee_save_xen_pgd_val+0x11/0x1e
 [<ffffffff813ba570>] ? xenboot_write_console+0x0/0xd0
 [<ffffffff821c24b8>] ? kernel_physical_mapping_init+0x83/0x1db
 [<ffffffff8195469f>] init_memory_mapping+0x31f/0x6d0
 [<ffffffff821989fd>] ? memblock_reserve+0x1b/0x21
 [<ffffffff8217de95>] setup_arch+0xa59/0xd89
 [<ffffffff819b9c90>] ? _raw_spin_unlock_irqrestore+0x20/0x30
 [<ffffffff810074bd>] ? __raw_callee_save_xen_irq_disable+0x11/0x1e
 [<ffffffff82177b35>] start_kernel+0xc6/0x4df
 [<ffffffff821772c5>] x86_64_start_reservations+0xa5/0xc9
 [<ffffffff8217b6fa>] xen_start_kernel+0x5d3/0x6a9


Even though a clear solution wasn't reached in the following discussion,
Yinghai Lu sent a patch to move cleanup_highmap() after reserve_brk() so
that we don't have to clear the initial mappings in two steps.
The patch is a nice cleanup and, with a few small changes to honour the
variable max_pfn_mapped, can be used to fix the boot issue on Xen: all we
have to do is set max_pfn_mapped to the last valid pfn mapped on Xen,
that is, the page backing _end.


The list of patches with diffstat follows, comments and suggestions are
very welcome:

Stefano Stabellini (1):
      xen: set max_pfn_mapped to the last pfn mapped

Yinghai Lu (1):
      x86: Cleanup highmap after brk is concluded

 arch/x86/kernel/head64.c |    3 ---
 arch/x86/kernel/setup.c  |    6 ++++++
 arch/x86/mm/init.c       |   19 -------------------
 arch/x86/mm/init_64.c    |   11 ++++++-----
 arch/x86/xen/mmu.c       |   13 +++++++------
 5 files changed, 19 insertions(+), 33 deletions(-)


A git branch based on 2.6.38-rc6 is available here:

git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.38-rc6-mm-fix

Cheers,

Stefano

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
stefano.stabellini@eu.citrix.com
2011-Feb-28 18:25 UTC
[Xen-devel] [PATCH 1/2] x86: Cleanup highmap after brk is concluded
From: Yinghai Lu <yinghai@kernel.org>

Currently the cleanup of the high mapping is actually done in two steps:
one is early in head64.c and only clears above _end; a second one is in
init_memory_mapping() and tries to clean from _brk_end to _end.
The second step should check whether those boundaries are PMD_SIZE
aligned, but currently does not.
Also init_memory_mapping() is called several times for NUMA or memory
hotplug, so we really should not handle initial kernel mappings there.

This patch moves cleanup_highmap() down after _brk_end is settled so
we can do everything in one step.
Also we honor max_pfn_mapped in the implementation of cleanup_highmap.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/kernel/head64.c |    3 ---
 arch/x86/kernel/setup.c  |    6 ++++++
 arch/x86/mm/init.c       |   19 -------------------
 arch/x86/mm/init_64.c    |   11 ++++++-----
 4 files changed, 12 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 2d2673c..5655c22 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -77,9 +77,6 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	/* Make NULL pointers segfault */
 	zap_identity_mappings();
 
-	/* Cleanup the over mapped high alias */
-	cleanup_highmap();
-
 	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
 
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d3cfe26..f03e6e0 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -297,6 +297,9 @@ static void __init init_gbpages(void)
 static inline void init_gbpages(void)
 {
 }
+static void __init cleanup_highmap(void)
+{
+}
 #endif
 
 static void __init reserve_brk(void)
@@ -922,6 +925,9 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	reserve_brk();
 
+	/* Cleanup the over mapped high alias after _brk_end*/
+	cleanup_highmap();
+
 	memblock.current_limit = get_max_mapped();
 	memblock_x86_fill();
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 947f42a..f13ff3a 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -279,25 +279,6 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 	load_cr3(swapper_pg_dir);
 #endif
 
-#ifdef CONFIG_X86_64
-	if (!after_bootmem && !start) {
-		pud_t *pud;
-		pmd_t *pmd;
-
-		mmu_cr4_features = read_cr4();
-
-		/*
-		 * _brk_end cannot change anymore, but it and _end may be
-		 * located on different 2M pages. cleanup_highmap(), however,
-		 * can only consider _end when it runs, so destroy any
-		 * mappings beyond _brk_end here.
-		 */
-		pud = pud_offset(pgd_offset_k(_brk_end), _brk_end);
-		pmd = pmd_offset(pud, _brk_end - 1);
-		while (++pmd <= pmd_offset(pud, (unsigned long)_end - 1))
-			pmd_clear(pmd);
-	}
-#endif
 	__flush_tlb_all();
 
 	if (!after_bootmem && e820_table_end > e820_table_start)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 71a5929..a8d08c2 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -51,6 +51,7 @@
 #include <asm/numa.h>
 #include <asm/cacheflush.h>
 #include <asm/init.h>
+#include <asm/setup.h>
 
 static int __init parse_direct_gbpages_off(char *arg)
 {
@@ -293,18 +294,18 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
  * to the compile time generated pmds. This results in invalid pmds up
  * to the point where we hit the physaddr 0 mapping.
  *
- * We limit the mappings to the region from _text to _end. _end is
- * rounded up to the 2MB boundary. This catches the invalid pmds as
+ * We limit the mappings to the region from _text to _brk_end. _brk_end
+ * is rounded up to the 2MB boundary. This catches the invalid pmds as
  * well, as they are located before _text:
  */
 void __init cleanup_highmap(void)
 {
 	unsigned long vaddr = __START_KERNEL_map;
-	unsigned long end = roundup((unsigned long)_end, PMD_SIZE) - 1;
+	unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
+	unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
 	pmd_t *pmd = level2_kernel_pgt;
-	pmd_t *last_pmd = pmd + PTRS_PER_PMD;
 
-	for (; pmd < last_pmd; pmd++, vaddr += PMD_SIZE) {
+	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
 		if (pmd_none(*pmd))
 			continue;
 		if (vaddr < (unsigned long) _text || vaddr > end)
-- 
1.5.6.5
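To make the effect of the new loop bound in cleanup_highmap() easier to see, here is a minimal user-space sketch of the walk. It is only a model under stated assumptions: the array stands in for level2_kernel_pgt, a non-zero entry means "present", and the names cleanup_highmap_model, round_up_pow2 and START_KERNEL_MAP are hypothetical, not kernel symbols. The constants assume 4 KiB pages and 2 MiB PMD entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants: x86-64 with 4 KiB pages and 2 MiB PMD entries. */
#define PAGE_SHIFT        12
#define PMD_SIZE          (UINT64_C(1) << 21)
#define PTRS_PER_PMD      512
#define START_KERNEL_MAP  UINT64_C(0xffffffff80000000)

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model of the patched loop: pmd[] stands in for
 * level2_kernel_pgt. Returns the number of entries cleared.
 */
static size_t cleanup_highmap_model(uint64_t *pmd, uint64_t text,
				    uint64_t brk_end, uint64_t max_pfn_mapped)
{
	uint64_t vaddr = START_KERNEL_MAP;
	uint64_t vaddr_end = START_KERNEL_MAP + (max_pfn_mapped << PAGE_SHIFT);
	uint64_t end = round_up_pow2(brk_end, PMD_SIZE) - 1;
	size_t cleared = 0;

	/* The walk must stay inside the single 512-entry table. */
	assert(max_pfn_mapped <= (PTRS_PER_PMD * PMD_SIZE) >> PAGE_SHIFT);

	/* Stop at the last 2 MiB slot that lies fully below vaddr_end. */
	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
		if (*pmd == 0)
			continue;	/* nothing mapped in this slot */
		if (vaddr < text || vaddr > end) {
			*pmd = 0;	/* models clearing the pmd entry */
			cleared++;
		}
	}
	return cleared;
}
```

In this model, entries whose virtual address lies at or beyond the address derived from max_pfn_mapped are never visited, so whatever is mapped there is left alone. The old loop, bounded by pmd + PTRS_PER_PMD, would have visited every entry in the table.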
stefano.stabellini@eu.citrix.com
2011-Feb-28 18:25 UTC
[Xen-devel] [PATCH 2/2] xen: set max_pfn_mapped to the last pfn mapped
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Do not set max_pfn_mapped to the end of the initial memory mappings,
which also contain pages that don't belong in pfn space (like the mfn
list).
Set max_pfn_mapped to the last real pfn mapped in the initial memory
mappings, that is, the pfn backing _end.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/xen/mmu.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 5e92b61..6092f73 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1653,9 +1653,6 @@ static __init void xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
 		for (pteidx = 0; pteidx < PTRS_PER_PTE; pteidx++, pfn++) {
 			pte_t pte;
 
-			if (pfn > max_pfn_mapped)
-				max_pfn_mapped = pfn;
-
 			if (!pte_none(pte_page[pteidx]))
 				continue;
 
@@ -1713,6 +1710,12 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 	pud_t *l3;
 	pmd_t *l2;
 
+	/* max_pfn_mapped is the last pfn mapped in the initial memory
+	 * mappings. Considering that on Xen after the kernel mappings we
+	 * have the mappings of some pages that don't exist in pfn space, we
+	 * set max_pfn_mapped to the last real pfn mapped. */
+	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
+
 	/* Zap identity mapping */
 	init_level4_pgt[0] = __pgd(0);
 
@@ -1817,9 +1820,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 	initial_kernel_pmd =
 		extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
 
-	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
-				  xen_start_info->nr_pt_frames * PAGE_SIZE +
-				  512*1024);
+	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
 
 	kernel_pmd = m2v(pgd[KERNEL_PGD_BOUNDARY].pgd);
 	memcpy(initial_kernel_pmd, kernel_pmd, sizeof(pmd_t) * PTRS_PER_PMD);
-- 
1.5.6.5
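The arithmetic behind the two max_pfn_mapped expressions in this patch can be sketched in user space as follows. This is an illustration only: the function names are hypothetical, the inputs are plain physical addresses (standing in for the results of __pa() on the xen_start_info fields), and 4 KiB pages are assumed.

```c
#include <stdint.h>

/* Assumed 4 KiB pages, as on x86. */
#define PAGE_SHIFT  12
#define PAGE_SIZE   (UINT64_C(1) << PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/*
 * Expression removed by the last hunk: the end of the initial page
 * tables plus 512 KiB, converted to a page frame number.
 */
static uint64_t max_pfn_mapped_old(uint64_t pt_base_pa, uint64_t nr_pt_frames)
{
	return PFN_DOWN(pt_base_pa + nr_pt_frames * PAGE_SIZE + 512 * 1024);
}

/*
 * Expression added by the patch: the page frame number of the
 * physical address of the mfn list.
 */
static uint64_t max_pfn_mapped_new(uint64_t mfn_list_pa)
{
	return PFN_DOWN(mfn_list_pa);
}
```

Whenever the page tables sit above the mfn list in physical memory, the new expression yields a smaller pfn than the old one, so code that honours max_pfn_mapped stops before the pages the commit message describes as not belonging in pfn space.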
Yinghai Lu
2011-Feb-28 18:42 UTC
[Xen-devel] Re: [PATCH 0/2] x86: cleanup highmap after brk is concluded
On 02/28/2011 10:24 AM, Stefano Stabellini wrote:
> Hi all,
> a little while ago I sent a patch titled "x86/mm/init: respect memblock
> reserved regions when destroying mappings"
> (https://lkml.org/lkml/2011/1/31/232) to fix a serious boot crash
> problem on Xen (full logs attached):
>
> Pid: 0, comm: swapper Not tainted 2.6.38-rc6+ #1270 Hewlett-Packard HP xw8600 Workstation/0A98h
> RIP: e030:[<ffffffff81008314>] [<ffffffff81008314>] get_phys_to_machine+0x44/0x50
> RSP: e02b:ffffffff82001ca0 EFLAGS: 00010002
> RAX: ffffffff824ce000 RBX: 0000000126004067 RCX: 0000000000000010
> RDX: 0000000000000000 RSI: 00000001cfdc2000 RDI: 0000000000000004
> RBP: ffffffff82001ca0 R08: 0000000000000020 R09: 0000000000000000
> R10: 0000000000000007 R11: 00000000ffffffff R12: 0000000000126004
> R13: 0000000000002004 R14: ffff880100000000 R15: ffff8801cfdc2000
> FS: 0000000000000000(0000) GS:ffffffff82162000(0000) knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 0000000002003000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
> Process swapper (pid: 0, threadinfo ffffffff82000000, task ffffffff8200b020)
> Stack:
>  ffffffff82001cd0 ffffffff8100582c ffffffff81dce1bc ffffffff82001e10
>  00000001cfdc2000 ffffffff82003880 ffffffff82001ce0 ffffffff8100587e
>  ffffffff82001d98 ffffffff8100498f 00000000ffffffff 0000000000000007
> Call Trace:
>  [<ffffffff8100582c>] pte_mfn_to_pfn+0x8c/0xb0
>  [<ffffffff8100587e>] xen_pgd_val+0xe/0x10
>  [<ffffffff8100498f>] __raw_callee_save_xen_pgd_val+0x11/0x1e
>  [<ffffffff813ba570>] ? xenboot_write_console+0x0/0xd0
>  [<ffffffff821c24b8>] ? kernel_physical_mapping_init+0x83/0x1db
>  [<ffffffff8195469f>] init_memory_mapping+0x31f/0x6d0
>  [<ffffffff821989fd>] ? memblock_reserve+0x1b/0x21
>  [<ffffffff8217de95>] setup_arch+0xa59/0xd89
>  [<ffffffff819b9c90>] ? _raw_spin_unlock_irqrestore+0x20/0x30
>  [<ffffffff810074bd>] ? __raw_callee_save_xen_irq_disable+0x11/0x1e
>  [<ffffffff82177b35>] start_kernel+0xc6/0x4df
>  [<ffffffff821772c5>] x86_64_start_reservations+0xa5/0xc9
>  [<ffffffff8217b6fa>] xen_start_kernel+0x5d3/0x6a9
>
>
> Even though a clear solution wasn't reached in the following discussion,
> Yinghai Lu sent a patch to move cleanup_highmap() after reserve_brk() so
> that we don't have to clear the initial mappings in two steps.
> The patch is a nice cleanup and, with a few small changes to honour the
> variable max_pfn_mapped, can be used to fix the boot issue on Xen: all we
> have to do is set max_pfn_mapped to the last valid pfn mapped on Xen,
> that is, the page backing _end.
>
>
> The list of patches with diffstat follows, comments and suggestions are
> very welcome:
>
> Stefano Stabellini (1):
>       xen: set max_pfn_mapped to the last pfn mapped
>
> Yinghai Lu (1):
>       x86: Cleanup highmap after brk is concluded
>
>  arch/x86/kernel/head64.c |    3 ---
>  arch/x86/kernel/setup.c  |    6 ++++++
>  arch/x86/mm/init.c       |   19 -------------------
>  arch/x86/mm/init_64.c    |   11 ++++++-----
>  arch/x86/xen/mmu.c       |   13 +++++++------
>  5 files changed, 19 insertions(+), 33 deletions(-)
>
>
> A git branch based on 2.6.38-rc6 is available here:
>

Can you please rebase them on top of tip/x86/mm?

http://people.redhat.com/mingo/tip.git/readme.txt

Thanks

Yinghai Lu
Stefano Stabellini
2011-Mar-01 15:13 UTC
[Xen-devel] Re: [PATCH 0/2] x86: cleanup highmap after brk is concluded
On Mon, 28 Feb 2011, Yinghai Lu wrote:
> Can you please rebase them on top of tip/x86/mm?
>
> http://people.redhat.com/mingo/tip.git/readme.txt
>

Sure, I rebased the two patches on the very latest tip/x86/mm:

git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.38-tip-mm-fix
Stefano Stabellini
2011-Mar-08 14:27 UTC
[Xen-devel] Re: [PATCH 0/2] x86: cleanup highmap after brk is concluded
On Tue, 1 Mar 2011, Stefano Stabellini wrote:
> On Mon, 28 Feb 2011, Yinghai Lu wrote:
> > Can you please rebase them on top of tip/x86/mm?
> >
> > http://people.redhat.com/mingo/tip.git/readme.txt
> >
>
> Sure, I rebased the two patches on the very latest tip/x86/mm:
>
> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.38-tip-mm-fix
>

ping?