<stefano.stabellini@eu.citrix.com>
2011-Jul-18 12:25 UTC
[Xen-devel] [PATCH tip/x86/mm] x86_32: calculate additional memory needed by the fixmap
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com> When NR_CPUS increases the fixmap might need more than the page allocated by head_32.S. This patch introduces the logic to calculate the additional memory that is going to be required by early_ioremap_page_table_range_init: - enough memory to allocate the pte pages needed to cover the fixmap virtual memory range, minus the single page allocated by head_32.S; - account for the page already allocated by early_ioremap_init; - account for the two additional pages that might be needed to make sure that the kmap''s ptes are contiguous. Unfortunately this code is rather complex and depends on the behaviour of other functions but I hope to have covered all the corner cases. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/mm/init.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 47 insertions(+), 0 deletions(-) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index e72c9f8..a7ee16b 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -33,6 +33,9 @@ static void __init find_early_table_space(unsigned long start, { unsigned long pmds = 0, ptes = 0, tables = 0, good_end = end, pud_mapped = 0, pmd_mapped = 0, size = end - start; + int kmap_begin_pmd_idx, kmap_end_pmd_idx; + int fixmap_begin_pmd_idx, fixmap_end_pmd_idx; + int btmap_begin_pmd_idx; phys_addr_t base; pud_mapped = DIV_ROUND_UP(PFN_PHYS(max_pfn_mapped), @@ -92,6 +95,50 @@ static void __init find_early_table_space(unsigned long start, } else ptes = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; +#ifdef CONFIG_X86_32 + fixmap_begin_pmd_idx = __fix_to_virt(__end_of_fixed_addresses - 1) + >> PMD_SHIFT; + /* + * fixmap_end_pmd_idx is the end of the fixmap minus the PMD that + * has been defined in the data section by head_32.S (see + * initial_pg_fixmap). + * Note: This is similar to what early_ioremap_page_table_range_init + * does except that the "end" has PMD_SIZE expunged as per previous + * comment. + */ + fixmap_end_pmd_idx = (FIXADDR_TOP - 1) >> PMD_SHIFT; + btmap_begin_pmd_idx = __fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT; + + size = fixmap_end_pmd_idx - fixmap_begin_pmd_idx; + /* + * early_ioremap_init has already allocated a PMD at + * btmap_begin_pmd_idx + */ + if (btmap_begin_pmd_idx < fixmap_end_pmd_idx) + size--; + +#ifdef CONFIG_HIGHMEM + /* + * see page_table_kmap_check: if the kmap spans multiple PMDs, make + * sure the pte pages are allocated contiguously. It might need up + * to two additional pte pages to replace the page declared by + * head_32.S and the one allocated by early_ioremap_init, if they + * are even partially used for the kmap. + */ + kmap_begin_pmd_idx = __fix_to_virt(FIX_KMAP_END) >> PMD_SHIFT; + kmap_end_pmd_idx = __fix_to_virt(FIX_KMAP_BEGIN) >> PMD_SHIFT; + if (kmap_begin_pmd_idx != kmap_end_pmd_idx) { + if (kmap_end_pmd_idx == fixmap_end_pmd_idx) + size++; + if (btmap_begin_pmd_idx >= kmap_begin_pmd_idx && + btmap_begin_pmd_idx <= kmap_end_pmd_idx) + size++; + } +#endif + + ptes += (size * PMD_SIZE + PAGE_SIZE - 1) >> PAGE_SHIFT; +#endif + tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE); if (!tables) -- 1.7.2.3 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2011-Jul-18 16:29 UTC
Re: [Xen-devel] [PATCH tip/x86/mm] x86_32: calculate additional memory needed by the fixmap
stefano.stabellini@eu.citrix.com writes ("[Xen-devel] [PATCH tip/x86/mm] x86_32: calculate additional memory needed by the fixmap"):> Unfortunately this code is rather complex and depends on the behaviour > of other functions but I hope to have covered all the corner cases.I''m sorry to say that think this is really the wrong approach. I agree that *some* change needs to be made, as this is currently a serious regression in tip/x86/mm. But the correct change is in fact to undo the reversion of "x86,xen: introduce x86_init.mapping.pagetable_reserve" That was a hook with a reasonable and defined interface. What we are having instead, now, is a fragile piece of code that tries to second-guess the complex config- and hardware-dependent memory-allocation behaviour of the rest of the startup code. This is a recipe for making things break. Indeed, since the reversion of "mapping.pagetable_reserve" and its replacement with the new "exact calculation" code, tip/x86/mm has already had to have two separated config-dependent fixes - and the thing hasn''t even been pushed to Linus''s tip yet ! This is a hopelessly fragile approach. We should go back to the new pvop. If the interface documentation in 279b706bf800b596 is not satisfactory I would be happy to help improve it. To be honest I think much of the contents of the commit message should be in comments in the code. Indeed, I agree that the current lack of coherent semantic specification for the pvops is a problem. But the solution is not to abolish pvops in favour of fragile duplication of logic. The solution is to fix the specification comments. But, if the x86 maintainers absolutely won''t have the new pvop then this new second-guessing code, fragile as it is, has to go in to fix the regression. Thanks, Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Jul-20 13:09 UTC
Re: [Xen-devel] [PATCH tip/x86/mm] x86_32: calculate additional memory needed by the fixmap
On Mon, Jul 18, 2011 at 05:29:21PM +0100, Ian Jackson wrote:> stefano.stabellini@eu.citrix.com writes ("[Xen-devel] [PATCH tip/x86/mm] x86_32: calculate additional memory needed by the fixmap"): > > Unfortunately this code is rather complex and depends on the behaviour > > of other functions but I hope to have covered all the corner cases. > > I''m sorry to say that think this is really the wrong approach. > I agree that *some* change needs to be made, as this is currently a > serious regression in tip/x86/mm.tip/x86/mm should boot with this patch, right?> > But the correct change is in fact to undo the reversion of > "x86,xen: introduce x86_init.mapping.pagetable_reserve" > That was a hook with a reasonable and defined interface. > > What we are having instead, now, is a fragile piece of code that tries > to second-guess the complex config- and hardware-dependent > memory-allocation behaviour of the rest of the startup code. This is > a recipe for making things break.Stefano did an excellent job at doing an archaeological excavation and finding all the little pieces and bringing them up to surface via this function. We have now the full accounting of the early bootup code and its page table creation. And based on that we can tighten the dance between pgt_end/pgt_begin/pgt_top to WARN much sooner - and I hope make the code more simpler. Sure there are some bugs that we won''t find until more folks start testing it - I''ve only found this one b/c of playing with a bigger set of .config variants. But that is part of the 3.1-rcX cycle - to find them and fix them. Or if we cannot fix them, then revert this whole patchset and think then some more - maybe at the Linux Plumbers Conference or the Linux Kernel mini-summits. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Reasonably Related Threads
- [vhost:linux-next 16/17] include/linux/page_reporting.h:9:34: note: in expansion of macro 'pageblock_order'
- Centos 6 2.6.32-696.18.7.el6.x86_64 does not boot in Xen PV mode
- Centos 6 2.6.32-696.18.7.el6.x86_64 does not boot in Xen PV mode
- Centos 6 2.6.32-696.18.7.el6.x86_64 does not boot in Xen PV mode
- Fw: [RFC] makedumpfile: xen extraction