Sengul Thomas observed that Xen was updating the domheap mappings without a
lock and that this caused problems on SMP systems.

The underlying problem is that the domheap mappings are supposed to be
per-PCPU but were not: all PCPUs currently share the same pagetables and
therefore use the same mappings. Implement per-PCPU page tables and give each
PCPU its own domheap mapping pages.

Sengul, thanks for your bug report. I wonder if you could test this series?

WRT the freeze, this change represents a bug fix, not a feature...

Ian.
Ian Campbell
2013-Apr-22 16:43 UTC
[PATCH 1/4] arm: remove incorrect dcache flush at start of day.
flush_xen_dcache flushes by virtual address and at this point boot_ttbr
contains a physical address.

The actual virtual address of the root pagetable is within the region flushed
by the following flush of the _start to _end range.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/mm.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index ba3140d..0801239 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -306,7 +306,6 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)

     /* Change pagetables to the copy in the relocated Xen */
     boot_ttbr = (uintptr_t) xen_pgtable + phys_offset;
-    flush_xen_dcache(boot_ttbr);
     flush_xen_dcache_va_range((void*)dest_va, _end - _start);
     flush_xen_text_tlb();
--
1.7.2.5
Ian Campbell
2013-Apr-22 16:43 UTC
[PATCH 2/4] xen: arm: rename xen_pgtable to boot_pgtable
The intention is that in a subsequent patch each PCPU will have its own
pagetables and that xen_pgtable will become a per-cpu variable. The boot
pagetables will become the boot cpu's pagetables.

For now leave a #define in place for those places which semantically do mean
xen_pgtable and not boot_pgtable.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/arm32/head.S |    2 +-
 xen/arch/arm/arm64/head.S |    4 ++--
 xen/arch/arm/mm.c         |   18 +++++++++++-------
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
index f2f581d..0b4cfde 100644
--- a/xen/arch/arm/arm32/head.S
+++ b/xen/arch/arm/arm32/head.S
@@ -206,7 +206,7 @@ skip_bss:
        mcr   CP32(r0, HSCTLR)

        /* Write Xen's PT's paddr into the HTTBR */
-       ldr   r4, =xen_pgtable
+       ldr   r4, =boot_pgtable
        add   r4, r4, r10            /* r4 := paddr (xen_pagetable) */
        mov   r5, #0                 /* r4:r5 is paddr (xen_pagetable) */
        mcrr  CP64(r4, r5, HTTBR)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index c18ef2b..f0d9066 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -190,14 +190,14 @@ skip_bss:
        msr   SCTLR_EL2, x0

        /* Write Xen's PT's paddr into the HTTBR */
-       ldr   x4, =xen_pgtable
+       ldr   x4, =boot_pgtable
        add   x4, x4, x20            /* x4 := paddr (xen_pagetable) */
        msr   TTBR0_EL2, x4

        /* Non-boot CPUs don't need to rebuild the pagetable */
        cbnz  x22, pt_ready

-       ldr   x1, =xen_first
+       ldr   x1, =boot_first
        add   x1, x1, x20            /* x1 := paddr (xen_first) */
        mov   x3, #PT_PT             /* x2 := table map of xen_first */
        orr   x2, x1, x3             /*       (+ rights for linear PT) */

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 0801239..f4179d8 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -40,10 +40,10 @@
 struct domain *dom_xen, *dom_io, *dom_cow;

 /* Static start-of-day pagetables that we use before the allocators are up */
-/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
-lpae_t xen_pgtable[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
+/* boot_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
+lpae_t boot_pgtable[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
 #ifdef CONFIG_ARM_64
-lpae_t xen_first[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
+lpae_t boot_first[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
 #endif
 /* N.B. The second-level table is 4 contiguous pages long, and covers
  * all addresses from 0 to 0xffffffff. Offsets into it are calculated
@@ -52,6 +52,10 @@ lpae_t xen_second[LPAE_ENTRIES*4] __attribute__((__aligned__(4096*4)));
 lpae_t xen_fixmap[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
 static lpae_t xen_xenmap[LPAE_ENTRIES] __attribute__((__aligned__(4096)));

+/* boot_pgtable becomes the boot processors pagetable, eventually this will
+ * become a per-cpu variable */
+#define xen_pgtable boot_pgtable
+
 /* Non-boot CPUs use this to find the correct pagetables. */
 uint64_t boot_ttbr;

@@ -284,11 +288,11 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
     /* Beware! Any state we modify between now and the PT switch may be
      * discarded when we switch over to the copy. */

-    /* Update the copy of xen_pgtable to use the new paddrs */
-    p = (void *) xen_pgtable + dest_va - (unsigned long) _start;
+    /* Update the copy of boot_pgtable to use the new paddrs */
+    p = (void *) boot_pgtable + dest_va - (unsigned long) _start;
 #ifdef CONFIG_ARM_64
     p[0].pt.base += (phys_offset - boot_phys_offset) >> PAGE_SHIFT;
-    p = (void *) xen_first + dest_va - (unsigned long) _start;
+    p = (void *) boot_first + dest_va - (unsigned long) _start;
 #endif
     for ( i = 0; i < 4; i++)
         p[i].pt.base += (phys_offset - boot_phys_offset) >> PAGE_SHIFT;
@@ -305,7 +309,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
         p[i].pt.base += (phys_offset - boot_phys_offset) >> PAGE_SHIFT;

     /* Change pagetables to the copy in the relocated Xen */
-    boot_ttbr = (uintptr_t) xen_pgtable + phys_offset;
+    boot_ttbr = (uintptr_t) boot_pgtable + phys_offset;
     flush_xen_dcache_va_range((void*)dest_va, _end - _start);
     flush_xen_text_tlb();
--
1.7.2.5
Ian Campbell
2013-Apr-22 16:43 UTC
[PATCH 3/4] arm: parenthesise argument to *_linear_offset macros
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/asm-arm/page.h |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index 14e63eb..a6a312f 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -323,9 +323,9 @@ static inline int gva_to_ipa(vaddr_t va, paddr_t *paddr)
 #define FIRST_MASK (~(FIRST_SIZE - 1))

 /* Calculate the offsets into the pagetables for a given VA */
-#define first_linear_offset(va) (va >> FIRST_SHIFT)
-#define second_linear_offset(va) (va >> SECOND_SHIFT)
-#define third_linear_offset(va) (va >> THIRD_SHIFT)
+#define first_linear_offset(va) ((va) >> FIRST_SHIFT)
+#define second_linear_offset(va) ((va) >> SECOND_SHIFT)
+#define third_linear_offset(va) ((va) >> THIRD_SHIFT)

 #define TABLE_OFFSET(offs) ((unsigned int)(offs) & LPAE_ENTRY_MASK)
 #define first_table_offset(va) TABLE_OFFSET(first_linear_offset(va))
--
1.7.2.5
Ian Campbell
2013-Apr-22 16:43 UTC
[PATCH 4/4] arm: allocate per-PCPU domheap pagetable pages
The domheap mappings are supposed to be per-PCPU. Therefore xen_pgtable
becomes a per-PCPU variable and we allocate and setup the page tables for each
secondary PCPU just before we tell it to come up.

Each secondary PCPU starts out on the boot page table but switches to its own
page tables ASAP.

The boot PCPU uses the boot pagetables as its own.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/mm.c            |  138 +++++++++++++++++++++++++++++++++++++-----
 xen/arch/arm/smpboot.c       |    6 ++
 xen/include/asm-arm/config.h |    4 +
 xen/include/asm-arm/mm.h     |    4 +-
 4 files changed, 136 insertions(+), 16 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index f4179d8..e3b8541 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -39,22 +39,47 @@

 struct domain *dom_xen, *dom_io, *dom_cow;

-/* Static start-of-day pagetables that we use before the allocators are up */
-/* boot_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
+/* Static start-of-day pagetables that we use before the
+ * allocators are up. These go on to become the boot CPUs real pagetables.
+ */
 lpae_t boot_pgtable[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
 #ifdef CONFIG_ARM_64
 lpae_t boot_first[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
 #endif
-/* N.B. The second-level table is 4 contiguous pages long, and covers
- * all addresses from 0 to 0xffffffff. Offsets into it are calculated
- * with second_linear_offset(), not second_table_offset(). */
+
+/*
+ * xen_pgtable and xen_dommap are per-PCPU and are allocated before
+ * bringing up each CPU. On 64-bit a first level table is also allocated.
+ *
+ * xen_second, xen_fixmap and xen_xenmap are shared between all PCPUS.
+ */
+
+/* Per-CPU pagetable pages */
+/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
+static DEFINE_PER_CPU(lpae_t *, xen_pgtable);
+/* xen_dommap == pages used by map_domain_page, these pages contain
+ * the second level pagetables which mapp the domheap region
+ * DOMHEAP_VIRT_START...DOMHEAP_VIRT_END in 2MB chunks. */
+static DEFINE_PER_CPU(lpae_t *, xen_dommap);
+
+/* Common pagetable leaves */
+/* Second level page tables.
+ *
+ * The second-level table is 2 contiguous pages long, and covers all
+ * addresses from 0 to 0x7fffffff.
+ *
+ * Addresses 0x80000000 to 0xffffffff are covered by the per-cpu
+ * xen_domheap mappings described above. However we allocate 4 pages
+ * here for use in the boot page tables and the second two pages
+ * become the boot CPUs xen_dommap pages.
+ */
 lpae_t xen_second[LPAE_ENTRIES*4] __attribute__((__aligned__(4096*4)));
+/* First level page table used for fixmap */
 lpae_t xen_fixmap[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
+/* First level page table used to map Xen itself with the XN bit set
+ * as appropriate. */
 static lpae_t xen_xenmap[LPAE_ENTRIES] __attribute__((__aligned__(4096)));

-/* boot_pgtable becomes the boot processors pagetable, eventually this will
- * become a per-cpu variable */
-#define xen_pgtable boot_pgtable

 /* Non-boot CPUs use this to find the correct pagetables. */
 uint64_t boot_ttbr;

@@ -107,12 +132,17 @@ done:
 void dump_hyp_walk(vaddr_t addr)
 {
     uint64_t ttbr = READ_SYSREG64(TTBR0_EL2);
+    lpae_t *pgtable = this_cpu(xen_pgtable);

-    printk("Walking Hypervisor VA 0x%"PRIvaddr" via TTBR 0x%016"PRIx64"\n",
-           addr, ttbr);
+    printk("Walking Hypervisor VA 0x%"PRIvaddr" "
+           "on CPU%d via TTBR 0x%016"PRIx64"\n",
+           addr, smp_processor_id(), ttbr);

-    BUG_ON( (lpae_t *)(unsigned long)(ttbr - phys_offset) != xen_pgtable );
-    dump_pt_walk(xen_pgtable, addr);
+    if ( smp_processor_id() == 0 )
+        BUG_ON( (lpae_t *)(unsigned long)(ttbr - phys_offset) != pgtable );
+    else
+        BUG_ON( virt_to_maddr(pgtable) != ttbr );
+    dump_pt_walk(pgtable, addr);
 }

 /* Map a 4k page in a fixmap entry */
@@ -138,7 +168,7 @@ void clear_fixmap(unsigned map)
 void *map_domain_page(unsigned long mfn)
 {
     unsigned long flags;
-    lpae_t *map = xen_second + second_linear_offset(DOMHEAP_VIRT_START);
+    lpae_t *map = this_cpu(xen_dommap);
     unsigned long slot_mfn = mfn & ~LPAE_ENTRY_MASK;
     vaddr_t va;
     lpae_t pte;
@@ -204,7 +234,7 @@ void *map_domain_page(unsigned long mfn)
 void unmap_domain_page(const void *va)
 {
     unsigned long flags;
-    lpae_t *map = xen_second + second_linear_offset(DOMHEAP_VIRT_START);
+    lpae_t *map = this_cpu(xen_dommap);
     int slot = ((unsigned long) va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;

     local_irq_save(flags);
@@ -219,7 +249,7 @@ void unmap_domain_page(const void *va)

 unsigned long domain_page_map_to_mfn(const void *va)
 {
-    lpae_t *map = xen_second + second_linear_offset(DOMHEAP_VIRT_START);
+    lpae_t *map = this_cpu(xen_dommap);
     int slot = ((unsigned long) va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;
     unsigned long offset = ((unsigned long)va>>THIRD_SHIFT) & LPAE_ENTRY_MASK;

@@ -361,11 +391,89 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
     WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2);
     /* Flush everything after setting WXN bit. */
     flush_xen_text_tlb();
+
+    per_cpu(xen_pgtable, 0) = boot_pgtable;
+    per_cpu(xen_dommap, 0) = xen_second +
+        second_linear_offset(DOMHEAP_VIRT_START);
+
+    /* Some of these slots may have been used during start of day and/or
+     * relocation. Make sure they are clear now. */
+    memset(this_cpu(xen_dommap), 0, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
+    flush_xen_dcache_va_range(this_cpu(xen_dommap),
+                              DOMHEAP_SECOND_PAGES*PAGE_SIZE);
+}
+
+int init_secondary_pagetables(int cpu)
+{
+    lpae_t *root, *first, *domheap, pte;
+    int i;
+
+    root = alloc_xenheap_page();
+#ifdef CONFIG_ARM_64
+    first = alloc_xenheap_page();
+#else
+    first = root; /* root == first level on 32-bit 3-level trie */
+#endif
+    domheap = alloc_xenheap_pages(get_order_from_pages(DOMHEAP_SECOND_PAGES), 0);
+
+    if ( root == NULL || domheap == NULL || first == NULL
+        )
+    {
+        printk("Not enough free memory for secondary CPU%d pagetables\n", cpu);
+        free_xenheap_pages(domheap, get_order_from_pages(DOMHEAP_SECOND_PAGES));
+#ifdef CONFIG_ARM_64
+        free_xenheap_page(first);
+#endif
+        free_xenheap_page(root);
+        return -ENOMEM;
+    }
+
+    /* Initialise root pagetable from root of boot tables */
+    memcpy(root, boot_pgtable, PAGE_SIZE);
+
+#ifdef CONFIG_ARM_64
+    /* Initialise first pagetable from first level of boot tables, and
+     * hook into the new root. */
+    memcpy(first, boot_first, PAGE_SIZE);
+    pte = mfn_to_xen_entry(virt_to_mfn(first));
+    pte.pt.table = 1;
+    write_pte(root, pte);
+#endif
+
+    /* Ensure the domheap has no stray mappings */
+    memset(domheap, 0, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
+
+    /* Update the first level mapping to reference the local CPUs
+     * domheap mapping pages. */
+    for ( i = 0; i < 2; i++ )
+    {
+        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES));
+        pte.pt.table = 1;
+        write_pte(&first[first_table_offset(DOMHEAP_VIRT_START+i*FIRST_SIZE)], pte);
+    }
+
+    per_cpu(xen_pgtable, cpu) = root;
+    per_cpu(xen_dommap, cpu) = domheap;
+
+    return 0;
 }

 /* MMU setup for secondary CPUS (which already have paging enabled) */
 void __cpuinit mmu_init_secondary_cpu(void)
 {
+    uint64_t ttbr;
+
+    /* Change to this CPUs pagetables */
+    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
+    flush_xen_dcache_va_range(this_cpu(xen_pgtable), PAGE_SIZE);
+    flush_xen_dcache_va_range(this_cpu(xen_dommap),
+                              DOMHEAP_SECOND_PAGES*PAGE_SIZE);
+    flush_xen_text_tlb();
+
+    WRITE_SYSREG64(ttbr, TTBR0_EL2);
+    dsb(); /* Ensure visibility of HTTBR update */
+    flush_xen_text_tlb();
+
     /* From now on, no mapping may be both writable and executable. */
     WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2);
     flush_xen_text_tlb();

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index bd353c8..8011987 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -220,6 +220,12 @@ void stop_cpu(void)
 /* Bring up a remote CPU */
 int __cpu_up(unsigned int cpu)
 {
+    int rc;
+
+    rc = init_secondary_pagetables(cpu);
+    if ( rc < 0 )
+        return rc;
+
     /* Tell the remote CPU which stack to boot on. */
     init_stack = idle_vcpu[cpu]->arch.stack;

diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
index 8be8563..79f6353 100644
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -98,12 +98,16 @@
 #define EARLY_VMAP_VIRT_START  mk_unsigned_long(0x10000000)
 #define XENHEAP_VIRT_START     mk_unsigned_long(0x40000000)
 #define DOMHEAP_VIRT_START     mk_unsigned_long(0x80000000)
+#define DOMHEAP_VIRT_END       mk_unsigned_long(0xffffffff)

 #define EARLY_VMAP_VIRT_END    XENHEAP_VIRT_START
 #define HYPERVISOR_VIRT_START  XEN_VIRT_START

 #define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */

+/* Number domheap pagetable pages require at the second level (2MB mappings) */
+#define DOMHEAP_SECOND_PAGES ((DOMHEAP_VIRT_END - DOMHEAP_VIRT_START + 1) >> FIRST_SHIFT)
+
 /* Fixmap slots */
 #define FIXMAP_CONSOLE  0  /* The primary UART */
 #define FIXMAP_PT       1  /* Temporary mappings of pagetable pages */

diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 4be383e..26c271e 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -138,7 +138,9 @@ extern unsigned long total_pages;

 /* Boot-time pagetable setup */
 extern void setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr);
-/* MMU setup for secondary CPUS (which already have paging enabled) */
+/* Allocate and initialise pagetables for a secondary CPU */
+extern int __cpuinit init_secondary_pagetables(int cpu);
+/* Switch secondary CPUS to its own pagetables and finalise MMU setup */
 extern void __cpuinit mmu_init_secondary_cpu(void);
 /* Second stage paging setup, to be called on all CPUs */
 extern void __cpuinit setup_virt_paging(void);
--
1.7.2.5
Julien Grall
2013-Apr-22 17:43 UTC
Re: [PATCH 1/4] arm: remove incorrect dcache flush at start of day.
On 04/22/2013 05:43 PM, Ian Campbell wrote:

> flush_xen_dcache flushes by virtual address and at this point boot_ttbr
> contains a physical address.

I think this part of the comment is wrong. flush_xen_dcache will flush
the address of boot_ttbr, not the one contained in the variable.

> The actual virtual address of the root pagetable is within the region flushed
> by the following flush of the _start to _end range.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  xen/arch/arm/mm.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index ba3140d..0801239 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -306,7 +306,6 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>
>      /* Change pagetables to the copy in the relocated Xen */
>      boot_ttbr = (uintptr_t) xen_pgtable + phys_offset;
> -    flush_xen_dcache(boot_ttbr);
>      flush_xen_dcache_va_range((void*)dest_va, _end - _start);
>      flush_xen_text_tlb();
Stefano Stabellini
2013-Apr-22 18:16 UTC
Re: [PATCH 4/4] arm: allocate per-PCPU domheap pagetable pages
On Mon, 22 Apr 2013, Ian Campbell wrote:
> The domheap mappings are supposed to be per-PCPU. Therefore xen_pgtable
> becomes a per-PCPU variable and we allocate and setup the page tables for each
> secondary PCPU just before we tell it to come up.
>
> Each secondary PCPU starts out on the boot page table but switches to its own
> page tables ASAP.
>
> The boot PCPU uses the boot pagetables as its own.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  xen/arch/arm/mm.c            |  138 +++++++++++++++++++++++++++++++++++++-----
>  xen/arch/arm/smpboot.c       |    6 ++
>  xen/include/asm-arm/config.h |    4 +
>  xen/include/asm-arm/mm.h     |    4 +-
>  4 files changed, 136 insertions(+), 16 deletions(-)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index f4179d8..e3b8541 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -39,22 +39,47 @@
>
>  struct domain *dom_xen, *dom_io, *dom_cow;
>
> -/* Static start-of-day pagetables that we use before the allocators are up */
> -/* boot_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
> +/* Static start-of-day pagetables that we use before the
> + * allocators are up. These go on to become the boot CPUs real pagetables.
> + */
>  lpae_t boot_pgtable[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
>  #ifdef CONFIG_ARM_64
>  lpae_t boot_first[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
>  #endif
> -/* N.B. The second-level table is 4 contiguous pages long, and covers
> - * all addresses from 0 to 0xffffffff. Offsets into it are calculated
> - * with second_linear_offset(), not second_table_offset(). */
> +
> +/*
> + * xen_pgtable and xen_dommap are per-PCPU and are allocated before
> + * bringing up each CPU. On 64-bit a first level table is also allocated.
> + *
> + * xen_second, xen_fixmap and xen_xenmap are shared between all PCPUS.
> + */
> +
> +/* Per-CPU pagetable pages */
> +/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
> +static DEFINE_PER_CPU(lpae_t *, xen_pgtable);
> +/* xen_dommap == pages used by map_domain_page, these pages contain
> + * the second level pagetables which mapp the domheap region
> + * DOMHEAP_VIRT_START...DOMHEAP_VIRT_END in 2MB chunks. */
> +static DEFINE_PER_CPU(lpae_t *, xen_dommap);
> +
> +/* Common pagetable leaves */
> +/* Second level page tables.
> + *
> + * The second-level table is 2 contiguous pages long, and covers all
> + * addresses from 0 to 0x7fffffff.
> + *
> + * Addresses 0x80000000 to 0xffffffff are covered by the per-cpu
> + * xen_domheap mappings described above. However we allocate 4 pages
> + * here for use in the boot page tables and the second two pages
> + * become the boot CPUs xen_dommap pages.
> + */
>  lpae_t xen_second[LPAE_ENTRIES*4] __attribute__((__aligned__(4096*4)));
> +/* First level page table used for fixmap */
>  lpae_t xen_fixmap[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
> +/* First level page table used to map Xen itself with the XN bit set
> + * as appropriate. */
>  static lpae_t xen_xenmap[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
>
> -/* boot_pgtable becomes the boot processors pagetable, eventually this will
> - * become a per-cpu variable */
> -#define xen_pgtable boot_pgtable
>
>  /* Non-boot CPUs use this to find the correct pagetables. */
>  uint64_t boot_ttbr;
> @@ -107,12 +132,17 @@ done:
>  void dump_hyp_walk(vaddr_t addr)
>  {
>      uint64_t ttbr = READ_SYSREG64(TTBR0_EL2);
> +    lpae_t *pgtable = this_cpu(xen_pgtable);
>
> -    printk("Walking Hypervisor VA 0x%"PRIvaddr" via TTBR 0x%016"PRIx64"\n",
> -           addr, ttbr);
> +    printk("Walking Hypervisor VA 0x%"PRIvaddr" "
> +           "on CPU%d via TTBR 0x%016"PRIx64"\n",
> +           addr, smp_processor_id(), ttbr);
>
> -    BUG_ON( (lpae_t *)(unsigned long)(ttbr - phys_offset) != xen_pgtable );
> -    dump_pt_walk(xen_pgtable, addr);
> +    if ( smp_processor_id() == 0 )
> +        BUG_ON( (lpae_t *)(unsigned long)(ttbr - phys_offset) != pgtable );
> +    else
> +        BUG_ON( virt_to_maddr(pgtable) != ttbr );
> +    dump_pt_walk(pgtable, addr);
>  }
>
>  /* Map a 4k page in a fixmap entry */
> @@ -138,7 +168,7 @@ void clear_fixmap(unsigned map)
>  void *map_domain_page(unsigned long mfn)
>  {
>      unsigned long flags;
> -    lpae_t *map = xen_second + second_linear_offset(DOMHEAP_VIRT_START);
> +    lpae_t *map = this_cpu(xen_dommap);
>      unsigned long slot_mfn = mfn & ~LPAE_ENTRY_MASK;
>      vaddr_t va;
>      lpae_t pte;
> @@ -204,7 +234,7 @@ void *map_domain_page(unsigned long mfn)
>  void unmap_domain_page(const void *va)
>  {
>      unsigned long flags;
> -    lpae_t *map = xen_second + second_linear_offset(DOMHEAP_VIRT_START);
> +    lpae_t *map = this_cpu(xen_dommap);
>      int slot = ((unsigned long) va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;
>
>      local_irq_save(flags);
> @@ -219,7 +249,7 @@ void unmap_domain_page(const void *va)
>
>  unsigned long domain_page_map_to_mfn(const void *va)
>  {
> -    lpae_t *map = xen_second + second_linear_offset(DOMHEAP_VIRT_START);
> +    lpae_t *map = this_cpu(xen_dommap);
>      int slot = ((unsigned long) va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;
>      unsigned long offset = ((unsigned long)va>>THIRD_SHIFT) & LPAE_ENTRY_MASK;
>
> @@ -361,11 +391,89 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>      WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2);
>      /* Flush everything after setting WXN bit. */
>      flush_xen_text_tlb();
> +
> +    per_cpu(xen_pgtable, 0) = boot_pgtable;
> +    per_cpu(xen_dommap, 0) = xen_second +
> +        second_linear_offset(DOMHEAP_VIRT_START);
> +
> +    /* Some of these slots may have been used during start of day and/or
> +     * relocation. Make sure they are clear now. */
> +    memset(this_cpu(xen_dommap), 0, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
> +    flush_xen_dcache_va_range(this_cpu(xen_dommap),
> +                              DOMHEAP_SECOND_PAGES*PAGE_SIZE);
> +}
> +
> +int init_secondary_pagetables(int cpu)
> +{
> +    lpae_t *root, *first, *domheap, pte;
> +    int i;
> +
> +    root = alloc_xenheap_page();
> +#ifdef CONFIG_ARM_64
> +    first = alloc_xenheap_page();
> +#else
> +    first = root; /* root == first level on 32-bit 3-level trie */
> +#endif
> +    domheap = alloc_xenheap_pages(get_order_from_pages(DOMHEAP_SECOND_PAGES), 0);
> +
> +    if ( root == NULL || domheap == NULL || first == NULL
> +        )

code style

> +    {
> +        printk("Not enough free memory for secondary CPU%d pagetables\n", cpu);
> +        free_xenheap_pages(domheap, get_order_from_pages(DOMHEAP_SECOND_PAGES));
> +#ifdef CONFIG_ARM_64
> +        free_xenheap_page(first);
> +#endif
> +        free_xenheap_page(root);
> +        return -ENOMEM;
> +    }
> +
> +    /* Initialise root pagetable from root of boot tables */
> +    memcpy(root, boot_pgtable, PAGE_SIZE);
> +
> +#ifdef CONFIG_ARM_64
> +    /* Initialise first pagetable from first level of boot tables, and
> +     * hook into the new root. */
> +    memcpy(first, boot_first, PAGE_SIZE);
> +    pte = mfn_to_xen_entry(virt_to_mfn(first));
> +    pte.pt.table = 1;
> +    write_pte(root, pte);
> +#endif
> +
> +    /* Ensure the domheap has no stray mappings */
> +    memset(domheap, 0, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
> +
> +    /* Update the first level mapping to reference the local CPUs
> +     * domheap mapping pages. */
> +    for ( i = 0; i < 2; i++ )

instead of being hardcoded to "2", shouldn't the limit be based on
DOMHEAP_SECOND_PAGES?

> +    {
> +        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES));
> +        pte.pt.table = 1;
> +        write_pte(&first[first_table_offset(DOMHEAP_VIRT_START+i*FIRST_SIZE)], pte);

Also shouldn't we add an ASSERT to check that DOMHEAP_VIRT_START is
properly aligned?

> +    }
> +
> +    per_cpu(xen_pgtable, cpu) = root;
> +    per_cpu(xen_dommap, cpu) = domheap;
> +
> +    return 0;
> }
>
>  /* MMU setup for secondary CPUS (which already have paging enabled) */
>  void __cpuinit mmu_init_secondary_cpu(void)
>  {
> +    uint64_t ttbr;
> +
> +    /* Change to this CPUs pagetables */
> +    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));

we should be flushing this ttbr write

> +    flush_xen_dcache_va_range(this_cpu(xen_pgtable), PAGE_SIZE);
> +    flush_xen_dcache_va_range(this_cpu(xen_dommap),
> +                              DOMHEAP_SECOND_PAGES*PAGE_SIZE);

Given that these pagetable pages are written by cpu0, I wonder whether we
actually need to execute any of these flushes on secondary cpus. I think
they should be moved to init_secondary_pagetables.

> +    flush_xen_text_tlb();
>
> +    WRITE_SYSREG64(ttbr, TTBR0_EL2);
> +    dsb(); /* Ensure visibility of HTTBR update */
> +    flush_xen_text_tlb();

The two flush_xen_text_tlb are probably necessary at least for the
I-cache and the BP.
> Sengul, thanks for your bugreport, I wonder if you could test this
> series?

Sure, no problem.

Thomas
Ian Campbell
2013-Apr-23 08:42 UTC
Re: [PATCH 1/4] arm: remove incorrect dcache flush at start of day.
On Mon, 2013-04-22 at 18:43 +0100, Julien Grall wrote:
> On 04/22/2013 05:43 PM, Ian Campbell wrote:
>
> > flush_xen_dcache flushes by virtual address and at this point boot_ttbr
> > contains a physical address.
>
> I think this part of the comment is wrong. flush_xen_dcache will flush
> the address of boot_ttbr, not the one contained in the variable.

Ah yes, this is a macro which takes &argument and flushes that.

However the global variable boot_ttbr is also part of the following
flush_xen_dcache_va_range so I think the removal itself is correct
(sounds like you agree?)

I reworded the commit message as:

    arm: remove redundant dcache flush at start of day.

    boot_ttbr is part of the region flushed by the subsequent
    flush_xen_dcache_va_range from _start to _end so there is no need to
    flush separately.

    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    ---
    v2: Corrected changelog description of why the flush is unnecessary.

Ian.
Tim Deegan
2013-Apr-23 08:57 UTC
Re: [PATCH 1/4] arm: remove incorrect dcache flush at start of day.
At 09:42 +0100 on 23 Apr (1366710155), Ian Campbell wrote:
> On Mon, 2013-04-22 at 18:43 +0100, Julien Grall wrote:
> > On 04/22/2013 05:43 PM, Ian Campbell wrote:
> >
> > > flush_xen_dcache flushes by virtual address and at this point boot_ttbr
> > > contains a physical address.
> >
> > I think this part of the comment is wrong. flush_xen_dcache will flush
> > the address of boot_ttbr, not the one contained in the variable.
>
> Ah yes, this is a macro which takes &argument and flushes that.
>
> However the global variable boot_ttbr is also part of the following
> flush_xen_dcache_va_range so I think the removal itself is correct

I don't think the removal is correct. This is the _unrelocated_ copy of
boot_ttbr we're flushing here, and we do mean to flush the variable
itself, for the benefit of the other CPUs which will use it when they
turn on paging.

Tim.
Ian Campbell
2013-Apr-23 09:35 UTC
Re: [PATCH 1/4] arm: remove incorrect dcache flush at start of day.
On Tue, 2013-04-23 at 09:57 +0100, Tim Deegan wrote:
> At 09:42 +0100 on 23 Apr (1366710155), Ian Campbell wrote:
> > On Mon, 2013-04-22 at 18:43 +0100, Julien Grall wrote:
> > > On 04/22/2013 05:43 PM, Ian Campbell wrote:
> > >
> > > > flush_xen_dcache flushes by virtual address and at this point boot_ttbr
> > > > contains a physical address.
> > >
> > > I think this part of the comment is wrong. flush_xen_dcache will flush
> > > the address of boot_ttbr, not the one contained in the variable.
> >
> > Ah yes, this is a macro which takes &argument and flushes that.
> >
> > However the global variable boot_ttbr is also part of the following
> > flush_xen_dcache_va_range so I think the removal itself is correct
>
> I don't think the removal is correct. This is the _unrelocated_ copy of
> boot_ttbr we're flushing here, and we do mean to flush the variable
> itself, for the benefit of the other CPUs which will use it when they
> turn on paging.

Ah, that makes sense. I'll drop this patch.

Ian.
Ian Campbell
2013-Apr-23 09:48 UTC
Re: [PATCH 4/4] arm: allocate per-PCPU domheap pagetable pages
On Mon, 2013-04-22 at 19:16 +0100, Stefano Stabellini wrote:
> On Mon, 22 Apr 2013, Ian Campbell wrote:
> > +    if ( root == NULL || domheap == NULL || first == NULL
> > +        )
>
> code style

Remnant of an ifdef around first, fixed.

> > +    /* Update the first level mapping to reference the local CPUs
> > +     * domheap mapping pages. */
> > +    for ( i = 0; i < 2; i++ )
>
> instead of being hardcoded to "2", shouldn't the limit be based on
> DOMHEAP_SECOND_PAGES?

Yes, thought I'd caught all these, thanks!

> > +    {
> > +        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES));
> > +        pte.pt.table = 1;
> > +        write_pte(&first[first_table_offset(DOMHEAP_VIRT_START+i*FIRST_SIZE)], pte);
>
> Also shouldn't we add an ASSERT to check that DOMHEAP_VIRT_START is
> properly aligned?

It would be a BUILD_BUG_ON I think and this would be far from the first
place where that would be a problem, are we terribly worried about
people changing config.h in ways which would break this? They'd notice
pretty quickly..

> > +    }
> > +
> > +    per_cpu(xen_pgtable, cpu) = root;
> > +    per_cpu(xen_dommap, cpu) = domheap;
> > +
> > +    return 0;
> > }
> >
> > /* MMU setup for secondary CPUS (which already have paging enabled) */
> > void __cpuinit mmu_init_secondary_cpu(void)
> > {
> > +    uint64_t ttbr;
> > +
> > +    /* Change to this CPUs pagetables */
> > +    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
>
> we should be flushing this ttbr write

ttbr is a local variable, it's probably only in a register, it's used
for the following SYSREG_WRITE to TTBR0_EL2.

> > +    flush_xen_dcache_va_range(this_cpu(xen_pgtable), PAGE_SIZE);
> > +    flush_xen_dcache_va_range(this_cpu(xen_dommap),
> > +                              DOMHEAP_SECOND_PAGES*PAGE_SIZE);
>
> Given that these pagetable pages are written by cpu0, I wonder whether we
> actually need to execute any of these flushes on secondary cpus. I think
> they should be moved to init_secondary_pagetables.

I wondered about that too, TBH I'm not sure but I think you are probably
right, I will move them.

> > +    flush_xen_text_tlb();
> >
> > +    WRITE_SYSREG64(ttbr, TTBR0_EL2);
> > +    dsb(); /* Ensure visibility of HTTBR update */
> > +    flush_xen_text_tlb();
>
> The two flush_xen_text_tlb are probably necessary at least for the
> I-cache and the BP.

You are agreeing with the code, rather than suggesting a change, right?

If you are suggesting a change I've not understood what it is ;-)

Ian.
On Tue, Apr 23, 2013 at 8:42 AM, Sengul Thomas <thomas.sengul@gmail.com> wrote:
>> Sengul, thanks for your bugreport, I wonder if you could test this
>> series?
>
> Sure, no problem.

After several rounds of repeated tests, it looks like no more
map/unmap_domain_page assertions occur.
I will post again if something different turns out.

Thomas
On Tue, 2013-04-23 at 11:01 +0100, Sengul Thomas wrote:
> On Tue, Apr 23, 2013 at 8:42 AM, Sengul Thomas <thomas.sengul@gmail.com> wrote:
> >> Sengul, thanks for your bugreport, I wonder if you could test this
> >> series?
> >
> > Sure, no problem.
>
> After several times of repeatible tests, it looks no more
> map/unmap_domain_page assertions occur.
> I will post again if something different turns out.

Awesome, thanks. I'm just about to send a second version of the series
with comments addressed.

Ian.
Stefano Stabellini
2013-Apr-23 10:12 UTC
Re: [PATCH 4/4] arm: allocate per-PCPU domheap pagetable pages
On Tue, 23 Apr 2013, Ian Campbell wrote:
> > > +    {
> > > +        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES));
> > > +        pte.pt.table = 1;
> > > +        write_pte(&first[first_table_offset(DOMHEAP_VIRT_START+i*FIRST_SIZE)], pte);
> >
> > Also shouldn't we add an ASSERT to check that DOMHEAP_VIRT_START is
> > properly aligned?
>
> It would be a BUILD_BUG_ON I think and this would be far from the first
> place where that would be a problem, are we terribly worried about
> people changing config.h in ways which would break this? They'd notice
> pretty quickly..

No, but the fewer assumptions we make, the less error prone the code is.

> > > +    }
> > > +
> > > +    per_cpu(xen_pgtable, cpu) = root;
> > > +    per_cpu(xen_dommap, cpu) = domheap;
> > > +
> > > +    return 0;
> > > }
> > >
> > > /* MMU setup for secondary CPUS (which already have paging enabled) */
> > > void __cpuinit mmu_init_secondary_cpu(void)
> > > {
> > > +    uint64_t ttbr;
> > > +
> > > +    /* Change to this CPUs pagetables */
> > > +    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
> >
> > we should be flushing this ttbr write
>
> ttbr is a local variable, it's probably only in a register, it's used
> for the following SYSREG_WRITE to TTBR0_EL2.

Ah, that's right.

> > > +    flush_xen_dcache_va_range(this_cpu(xen_pgtable), PAGE_SIZE);
> > > +    flush_xen_dcache_va_range(this_cpu(xen_dommap),
> > > +                              DOMHEAP_SECOND_PAGES*PAGE_SIZE);
> >
> > Given that these pagetable pages are written by cpu0, I wonder whether we
> > actually need to execute any of these flushes on secondary cpus. I think
> > they should be moved to init_secondary_pagetables.
>
> I wondered about that too, TBH I'm not sure but I think you are probably
> right, I will move them.
>
> > > +    flush_xen_text_tlb();
> > >
> > > +    WRITE_SYSREG64(ttbr, TTBR0_EL2);
> > > +    dsb(); /* Ensure visibility of HTTBR update */
> > > +    flush_xen_text_tlb();
> >
> > The two flush_xen_text_tlb are probably necessary at least for the
> > I-cache and the BP.
>
> You are agreeing with the code, rather than suggesting a change, right?
>
> If you are suggesting a change I've not understood what it is ;-)

Yes, I was agreeing :)
Ian Campbell
2013-Apr-23 10:14 UTC
Re: [PATCH 4/4] arm: allocate per-PCPU domheap pagetable pages
On Tue, 2013-04-23 at 11:12 +0100, Stefano Stabellini wrote:
> On Tue, 23 Apr 2013, Ian Campbell wrote:
> > > > +    {
> > > > +        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES));
> > > > +        pte.pt.table = 1;
> > > > +        write_pte(&first[first_table_offset(DOMHEAP_VIRT_START+i*FIRST_SIZE)], pte);
> > >
> > > Also shouldn't we add an ASSERT to check that DOMHEAP_VIRT_START is
> > > properly aligned?
> >
> > It would be a BUILD_BUG_ON I think and this would be far from the first
> > place where that would be a problem, are we terribly worried about
> > people changing config.h in ways which would break this? They'd notice
> > pretty quickly..
>
> No, but the fewer assumptions we make, the less error prone the code is.

I've thrown a patch onto the end of the V2 series which we can see if we
like...

Ian.