Stefano Stabellini
2011-Feb-28 18:24 UTC
[Xen-devel] [PATCH 0/2] x86: cleanup highmap after brk is concluded
Hi all,
a little while ago I sent a patch titled "x86/mm/init: respect memblock
reserved regions when destroying mappings"
(https://lkml.org/lkml/2011/1/31/232) to fix a serious boot crash
problem on Xen (full logs attached):
Pid: 0, comm: swapper Not tainted 2.6.38-rc6+ #1270 Hewlett-Packard HP xw8600 Workstation/0A98h
RIP: e030:[<ffffffff81008314>] [<ffffffff81008314>] get_phys_to_machine+0x44/0x50
RSP: e02b:ffffffff82001ca0 EFLAGS: 00010002
RAX: ffffffff824ce000 RBX: 0000000126004067 RCX: 0000000000000010
RDX: 0000000000000000 RSI: 00000001cfdc2000 RDI: 0000000000000004
RBP: ffffffff82001ca0 R08: 0000000000000020 R09: 0000000000000000
R10: 0000000000000007 R11: 00000000ffffffff R12: 0000000000126004
R13: 0000000000002004 R14: ffff880100000000 R15: ffff8801cfdc2000
FS: 0000000000000000(0000) GS:ffffffff82162000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000002003000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
Process swapper (pid: 0, threadinfo ffffffff82000000, task ffffffff8200b020)
Stack:
ffffffff82001cd0 ffffffff8100582c ffffffff81dce1bc ffffffff82001e10
00000001cfdc2000 ffffffff82003880 ffffffff82001ce0 ffffffff8100587e
ffffffff82001d98 ffffffff8100498f 00000000ffffffff 0000000000000007
Call Trace:
[<ffffffff8100582c>] pte_mfn_to_pfn+0x8c/0xb0
[<ffffffff8100587e>] xen_pgd_val+0xe/0x10
[<ffffffff8100498f>] __raw_callee_save_xen_pgd_val+0x11/0x1e
[<ffffffff813ba570>] ? xenboot_write_console+0x0/0xd0
[<ffffffff821c24b8>] ? kernel_physical_mapping_init+0x83/0x1db
[<ffffffff8195469f>] init_memory_mapping+0x31f/0x6d0
[<ffffffff821989fd>] ? memblock_reserve+0x1b/0x21
[<ffffffff8217de95>] setup_arch+0xa59/0xd89
[<ffffffff819b9c90>] ? _raw_spin_unlock_irqrestore+0x20/0x30
[<ffffffff810074bd>] ? __raw_callee_save_xen_irq_disable+0x11/0x1e
[<ffffffff82177b35>] start_kernel+0xc6/0x4df
[<ffffffff821772c5>] x86_64_start_reservations+0xa5/0xc9
[<ffffffff8217b6fa>] xen_start_kernel+0x5d3/0x6a9
Even though a clear solution wasn't reached in the following
discussion, Yinghai Lu sent a patch to move cleanup_highmap() after
reserve_brk() so that we don't have to clear the initial mappings in
two steps.
The patch is a nice cleanup and, with a few small changes to honour the
variable max_pfn_mapped, it can be used to fix the boot issue on Xen:
all we have to do is set max_pfn_mapped to the last valid pfn mapped
on Xen, that is, the page backing _end.
The list of patches with diffstat follows; comments and suggestions are
very welcome:
Stefano Stabellini (1):
xen: set max_pfn_mapped to the last pfn mapped
Yinghai Lu (1):
x86: Cleanup highmap after brk is concluded
arch/x86/kernel/head64.c | 3 ---
arch/x86/kernel/setup.c | 6 ++++++
arch/x86/mm/init.c | 19 -------------------
arch/x86/mm/init_64.c | 11 ++++++-----
arch/x86/xen/mmu.c | 13 +++++++------
5 files changed, 19 insertions(+), 33 deletions(-)
A git branch based on 2.6.38-rc6 is available here:
git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.38-rc6-mm-fix
Cheers,
Stefano
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
stefano.stabellini@eu.citrix.com
2011-Feb-28 18:25 UTC
[Xen-devel] [PATCH 1/2] x86: Cleanup highmap after brk is concluded
From: Yinghai Lu <yinghai@kernel.org>
Currently cleanup_highmap() is done in two steps: the first runs early
in head64.c and only clears above _end; the second runs in
init_memory_mapping() and tries to clean from _brk_end to _end.
It should check whether those boundaries are PMD_SIZE aligned, but
currently does not.
Also, init_memory_mapping() is called several times for NUMA or memory
hotplug, so we really should not handle the initial kernel mappings
there.
This patch moves cleanup_highmap() down to after _brk_end is settled,
so we can do everything in one step.
We also honor max_pfn_mapped in the implementation of cleanup_highmap().
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
arch/x86/kernel/head64.c | 3 ---
arch/x86/kernel/setup.c | 6 ++++++
arch/x86/mm/init.c | 19 -------------------
arch/x86/mm/init_64.c | 11 ++++++-----
4 files changed, 12 insertions(+), 27 deletions(-)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 2d2673c..5655c22 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -77,9 +77,6 @@ void __init x86_64_start_kernel(char * real_mode_data)
/* Make NULL pointers segfault */
zap_identity_mappings();
- /* Cleanup the over mapped high alias */
- cleanup_highmap();
-
max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d3cfe26..f03e6e0 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -297,6 +297,9 @@ static void __init init_gbpages(void)
static inline void init_gbpages(void)
{
}
+static void __init cleanup_highmap(void)
+{
+}
#endif
static void __init reserve_brk(void)
@@ -922,6 +925,9 @@ void __init setup_arch(char **cmdline_p)
*/
reserve_brk();
+ /* Cleanup the over mapped high alias after _brk_end*/
+ cleanup_highmap();
+
memblock.current_limit = get_max_mapped();
memblock_x86_fill();
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 947f42a..f13ff3a 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -279,25 +279,6 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
load_cr3(swapper_pg_dir);
#endif
-#ifdef CONFIG_X86_64
- if (!after_bootmem && !start) {
- pud_t *pud;
- pmd_t *pmd;
-
- mmu_cr4_features = read_cr4();
-
- /*
- * _brk_end cannot change anymore, but it and _end may be
- * located on different 2M pages. cleanup_highmap(), however,
- * can only consider _end when it runs, so destroy any
- * mappings beyond _brk_end here.
- */
- pud = pud_offset(pgd_offset_k(_brk_end), _brk_end);
- pmd = pmd_offset(pud, _brk_end - 1);
- while (++pmd <= pmd_offset(pud, (unsigned long)_end - 1))
- pmd_clear(pmd);
- }
-#endif
__flush_tlb_all();
if (!after_bootmem && e820_table_end > e820_table_start)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 71a5929..a8d08c2 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -51,6 +51,7 @@
#include <asm/numa.h>
#include <asm/cacheflush.h>
#include <asm/init.h>
+#include <asm/setup.h>
static int __init parse_direct_gbpages_off(char *arg)
{
@@ -293,18 +294,18 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
* to the compile time generated pmds. This results in invalid pmds up
* to the point where we hit the physaddr 0 mapping.
*
- * We limit the mappings to the region from _text to _end. _end is
- * rounded up to the 2MB boundary. This catches the invalid pmds as
+ * We limit the mappings to the region from _text to _brk_end. _brk_end
+ * is rounded up to the 2MB boundary. This catches the invalid pmds as
* well, as they are located before _text:
*/
void __init cleanup_highmap(void)
{
unsigned long vaddr = __START_KERNEL_map;
- unsigned long end = roundup((unsigned long)_end, PMD_SIZE) - 1;
+ unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
+ unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
pmd_t *pmd = level2_kernel_pgt;
- pmd_t *last_pmd = pmd + PTRS_PER_PMD;
- for (; pmd < last_pmd; pmd++, vaddr += PMD_SIZE) {
+ for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
if (pmd_none(*pmd))
continue;
if (vaddr < (unsigned long) _text || vaddr > end)
--
1.5.6.5
stefano.stabellini@eu.citrix.com
2011-Feb-28 18:25 UTC
[Xen-devel] [PATCH 2/2] xen: set max_pfn_mapped to the last pfn mapped
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Do not set max_pfn_mapped to the end of the initial memory mappings,
which also contain pages that don't belong in pfn space (like the mfn
list).
Set max_pfn_mapped to the last real pfn mapped in the initial memory
mappings, that is, the pfn backing _end.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
arch/x86/xen/mmu.c | 13 +++++++------
1 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 5e92b61..6092f73 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1653,9 +1653,6 @@ static __init void xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
for (pteidx = 0; pteidx < PTRS_PER_PTE; pteidx++, pfn++) {
pte_t pte;
- if (pfn > max_pfn_mapped)
- max_pfn_mapped = pfn;
-
if (!pte_none(pte_page[pteidx]))
continue;
@@ -1713,6 +1710,12 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
pud_t *l3;
pmd_t *l2;
+ /* max_pfn_mapped is the last pfn mapped in the initial memory
+ * mappings. Considering that on Xen after the kernel mappings we
+ * have the mappings of some pages that don't exist in pfn space, we
+ * set max_pfn_mapped to the last real pfn mapped. */
+ max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
+
/* Zap identity mapping */
init_level4_pgt[0] = __pgd(0);
@@ -1817,9 +1820,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
initial_kernel_pmd = extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
- max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
- xen_start_info->nr_pt_frames * PAGE_SIZE +
- 512*1024);
+ max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
kernel_pmd = m2v(pgd[KERNEL_PGD_BOUNDARY].pgd);
memcpy(initial_kernel_pmd, kernel_pmd, sizeof(pmd_t) * PTRS_PER_PMD);
--
1.5.6.5
Yinghai Lu
2011-Feb-28 18:42 UTC
[Xen-devel] Re: [PATCH 0/2] x86: cleanup highmap after brk is concluded
On 02/28/2011 10:24 AM, Stefano Stabellini wrote:
> Hi all,
> a little while ago I sent a patch titled "x86/mm/init: respect memblock
> reserved regions when destroying mappings"
[...]
> A git branch based on 2.6.38-rc6 is available here:

Can you please rebase them on top of tip/x86/mm?

http://people.redhat.com/mingo/tip.git/readme.txt

Thanks

Yinghai Lu
Stefano Stabellini
2011-Mar-01 15:13 UTC
[Xen-devel] Re: [PATCH 0/2] x86: cleanup highmap after brk is concluded
On Mon, 28 Feb 2011, Yinghai Lu wrote:
> Can you please rebase them on top of tip/x86/mm?
>
> http://people.redhat.com/mingo/tip.git/readme.txt
>

Sure, I rebased the two patches on the very latest tip/x86/mm:

git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.38-tip-mm-fix
Stefano Stabellini
2011-Mar-08 14:27 UTC
[Xen-devel] Re: [PATCH 0/2] x86: cleanup highmap after brk is concluded
On Tue, 1 Mar 2011, Stefano Stabellini wrote:
> On Mon, 28 Feb 2011, Yinghai Lu wrote:
> > Can you please rebase them on top of tip/x86/mm?
> >
> > http://people.redhat.com/mingo/tip.git/readme.txt
> >
>
> Sure, I rebased the two patches on the very latest tip/x86/mm:
>
> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.38-tip-mm-fix
>

ping?