Sengul Thomas observed that Xen was updating the domheap mappings without a
lock and that this caused problems on SMP systems.

The underlying problem here is that the domheap mappings are supposed to be
per-PCPU but were not; instead all PCPUs are currently sharing the same
pagetables and therefore using the same mappings.

Implement per-PCPU page tables and give each PCPU its own domheap mapping
pages.

Sengul, thanks for your bug report. I wonder if you could test this series?

WRT the freeze, this change represents a bug fix, not a feature...

Changes in V2:
  * Dropped patch "arm: remove incorrect dcache flush at start of day",
    which was itself incorrect.
  * Addressed Stefano's review of the meaty patch.
  * Added a patch to BUILD_BUG_ON various address space alignment
    constraints.

Ian.
Ian Campbell
2013-Apr-23 10:19 UTC
[PATCH V2 1/4] xen: arm: rename xen_pgtable to boot_pgtable
The intention is that in a subsequent patch each PCPU will have its own
pagetables and that xen_pgtable will become a per-cpu variable. The boot
pagetables will become the boot cpu's pagetables.

For now leave a #define in place for those places which semantically do mean
xen_pgtable and not boot_pgtable.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/arm32/head.S |    2 +-
 xen/arch/arm/arm64/head.S |    4 ++--
 xen/arch/arm/mm.c         |   18 +++++++++++-------
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
index f2f581d..0b4cfde 100644
--- a/xen/arch/arm/arm32/head.S
+++ b/xen/arch/arm/arm32/head.S
@@ -206,7 +206,7 @@ skip_bss:
         mcr   CP32(r0, HSCTLR)

         /* Write Xen's PT's paddr into the HTTBR */
-        ldr   r4, =xen_pgtable
+        ldr   r4, =boot_pgtable
         add   r4, r4, r10            /* r4 := paddr (xen_pagetable) */
         mov   r5, #0                 /* r4:r5 is paddr (xen_pagetable) */
         mcrr  CP64(r4, r5, HTTBR)
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index c18ef2b..f0d9066 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -190,14 +190,14 @@ skip_bss:
         msr   SCTLR_EL2, x0

         /* Write Xen's PT's paddr into the HTTBR */
-        ldr   x4, =xen_pgtable
+        ldr   x4, =boot_pgtable
         add   x4, x4, x20            /* x4 := paddr (xen_pagetable) */
         msr   TTBR0_EL2, x4

         /* Non-boot CPUs don't need to rebuild the pagetable */
         cbnz  x22, pt_ready

-        ldr   x1, =xen_first
+        ldr   x1, =boot_first
         add   x1, x1, x20            /* x1 := paddr (xen_first) */
         mov   x3, #PT_PT             /* x2 := table map of xen_first */
         orr   x2, x1, x3             /*       (+ rights for linear PT) */
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index ba3140d..3cb852b 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -40,10 +40,10 @@
 struct domain *dom_xen, *dom_io, *dom_cow;

 /* Static start-of-day pagetables that we use before the allocators are up */
-/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
-lpae_t xen_pgtable[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
+/* boot_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
+lpae_t boot_pgtable[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
 #ifdef CONFIG_ARM_64
-lpae_t xen_first[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
+lpae_t boot_first[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
 #endif
 /* N.B. The second-level table is 4 contiguous pages long, and covers
  * all addresses from 0 to 0xffffffff. Offsets into it are calculated
@@ -52,6 +52,10 @@ lpae_t xen_second[LPAE_ENTRIES*4] __attribute__((__aligned__(4096*4)));
 lpae_t xen_fixmap[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
 static lpae_t xen_xenmap[LPAE_ENTRIES] __attribute__((__aligned__(4096)));

+/* boot_pgtable becomes the boot processors pagetable, eventually this will
+ * become a per-cpu variable */
+#define xen_pgtable boot_pgtable
+
 /* Non-boot CPUs use this to find the correct pagetables. */
 uint64_t boot_ttbr;

@@ -284,11 +288,11 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
     /* Beware!  Any state we modify between now and the PT switch may be
      * discarded when we switch over to the copy. */

-    /* Update the copy of xen_pgtable to use the new paddrs */
-    p = (void *) xen_pgtable + dest_va - (unsigned long) _start;
+    /* Update the copy of boot_pgtable to use the new paddrs */
+    p = (void *) boot_pgtable + dest_va - (unsigned long) _start;
 #ifdef CONFIG_ARM_64
     p[0].pt.base += (phys_offset - boot_phys_offset) >> PAGE_SHIFT;
-    p = (void *) xen_first + dest_va - (unsigned long) _start;
+    p = (void *) boot_first + dest_va - (unsigned long) _start;
 #endif
     for ( i = 0; i < 4; i++)
         p[i].pt.base += (phys_offset - boot_phys_offset) >> PAGE_SHIFT;
@@ -305,7 +309,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
         p[i].pt.base += (phys_offset - boot_phys_offset) >> PAGE_SHIFT;

     /* Change pagetables to the copy in the relocated Xen */
-    boot_ttbr = (uintptr_t) xen_pgtable + phys_offset;
+    boot_ttbr = (uintptr_t) boot_pgtable + phys_offset;
     flush_xen_dcache(boot_ttbr);
     flush_xen_dcache_va_range((void*)dest_va, _end - _start);
     flush_xen_text_tlb();
--
1.7.2.5
Ian Campbell
2013-Apr-23 10:19 UTC
[PATCH V2 2/4] arm: parenthesise argument to *_linear_offset macros
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/asm-arm/page.h |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index 14e63eb..a6a312f 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -323,9 +323,9 @@ static inline int gva_to_ipa(vaddr_t va, paddr_t *paddr)
 #define FIRST_MASK (~(FIRST_SIZE - 1))

 /* Calculate the offsets into the pagetables for a given VA */
-#define first_linear_offset(va) (va >> FIRST_SHIFT)
-#define second_linear_offset(va) (va >> SECOND_SHIFT)
-#define third_linear_offset(va) (va >> THIRD_SHIFT)
+#define first_linear_offset(va) ((va) >> FIRST_SHIFT)
+#define second_linear_offset(va) ((va) >> SECOND_SHIFT)
+#define third_linear_offset(va) ((va) >> THIRD_SHIFT)

 #define TABLE_OFFSET(offs) ((unsigned int)(offs) & LPAE_ENTRY_MASK)
 #define first_table_offset(va)  TABLE_OFFSET(first_linear_offset(va))
--
1.7.2.5
Ian Campbell
2013-Apr-23 10:19 UTC
[PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
The domheap mappings are supposed to be per-PCPU. Therefore xen_pgtable
becomes a per-PCPU variable and we allocate and setup the page tables for each
secondary PCPU just before we tell it to come up.

Each secondary PCPU starts out on the boot page table but switches to its own
page tables ASAP.

The boot PCPU uses the boot pagetables as its own.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Coding style fixes.
    Use DOMHEAP_SECOND_PAGES instead of "2"
    Flush the secondary CPU's pagetables on CPU0 when we create/write them not
    on the secondary CPU.
---
 xen/arch/arm/mm.c            |  137 +++++++++++++++++++++++++++++++++++++-----
 xen/arch/arm/smpboot.c       |    6 ++
 xen/include/asm-arm/config.h |    4 +
 xen/include/asm-arm/mm.h     |    4 +-
 4 files changed, 135 insertions(+), 16 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 3cb852b..58ce4e1 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -39,22 +39,47 @@

 struct domain *dom_xen, *dom_io, *dom_cow;

-/* Static start-of-day pagetables that we use before the allocators are up */
-/* boot_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
+/* Static start-of-day pagetables that we use before the
+ * allocators are up. These go on to become the boot CPUs real pagetables.
+ */
 lpae_t boot_pgtable[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
 #ifdef CONFIG_ARM_64
 lpae_t boot_first[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
 #endif
-/* N.B. The second-level table is 4 contiguous pages long, and covers
- * all addresses from 0 to 0xffffffff. Offsets into it are calculated
- * with second_linear_offset(), not second_table_offset(). */
+
+/*
+ * xen_pgtable and xen_dommap are per-PCPU and are allocated before
+ * bringing up each CPU. On 64-bit a first level table is also allocated.
+ *
+ * xen_second, xen_fixmap and xen_xenmap are shared between all PCPUS.
+ */
+
+/* Per-CPU pagetable pages */
+/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
+static DEFINE_PER_CPU(lpae_t *, xen_pgtable);
+/* xen_dommap == pages used by map_domain_page, these pages contain
+ * the second level pagetables which mapp the domheap region
+ * DOMHEAP_VIRT_START...DOMHEAP_VIRT_END in 2MB chunks. */
+static DEFINE_PER_CPU(lpae_t *, xen_dommap);
+
+/* Common pagetable leaves */
+/* Second level page tables.
+ *
+ * The second-level table is 2 contiguous pages long, and covers all
+ * addresses from 0 to 0x7fffffff.
+ *
+ * Addresses 0x80000000 to 0xffffffff are covered by the per-cpu
+ * xen_domheap mappings described above. However we allocate 4 pages
+ * here for use in the boot page tables and the second two pages
+ * become the boot CPUs xen_dommap pages.
+ */
 lpae_t xen_second[LPAE_ENTRIES*4] __attribute__((__aligned__(4096*4)));
+/* First level page table used for fixmap */
 lpae_t xen_fixmap[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
+/* First level page table used to map Xen itself with the XN bit set
+ * as appropriate. */
 static lpae_t xen_xenmap[LPAE_ENTRIES] __attribute__((__aligned__(4096)));

-/* boot_pgtable becomes the boot processors pagetable, eventually this will
- * become a per-cpu variable */
-#define xen_pgtable boot_pgtable

 /* Non-boot CPUs use this to find the correct pagetables. */
 uint64_t boot_ttbr;

@@ -107,12 +132,17 @@ done:
 void dump_hyp_walk(vaddr_t addr)
 {
     uint64_t ttbr = READ_SYSREG64(TTBR0_EL2);
+    lpae_t *pgtable = this_cpu(xen_pgtable);

-    printk("Walking Hypervisor VA 0x%"PRIvaddr" via TTBR 0x%016"PRIx64"\n",
-           addr, ttbr);
+    printk("Walking Hypervisor VA 0x%"PRIvaddr" "
+           "on CPU%d via TTBR 0x%016"PRIx64"\n",
+           addr, smp_processor_id(), ttbr);

-    BUG_ON( (lpae_t *)(unsigned long)(ttbr - phys_offset) != xen_pgtable );
-    dump_pt_walk(xen_pgtable, addr);
+    if ( smp_processor_id() == 0 )
+        BUG_ON( (lpae_t *)(unsigned long)(ttbr - phys_offset) != pgtable );
+    else
+        BUG_ON( virt_to_maddr(pgtable) != ttbr );
+    dump_pt_walk(pgtable, addr);
 }

 /* Map a 4k page in a fixmap entry */
@@ -138,7 +168,7 @@ void clear_fixmap(unsigned map)
 void *map_domain_page(unsigned long mfn)
 {
     unsigned long flags;
-    lpae_t *map = xen_second + second_linear_offset(DOMHEAP_VIRT_START);
+    lpae_t *map = this_cpu(xen_dommap);
     unsigned long slot_mfn = mfn & ~LPAE_ENTRY_MASK;
     vaddr_t va;
     lpae_t pte;
@@ -204,7 +234,7 @@ void *map_domain_page(unsigned long mfn)
 void unmap_domain_page(const void *va)
 {
     unsigned long flags;
-    lpae_t *map = xen_second + second_linear_offset(DOMHEAP_VIRT_START);
+    lpae_t *map = this_cpu(xen_dommap);
     int slot = ((unsigned long) va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;

     local_irq_save(flags);
@@ -219,7 +249,7 @@ void unmap_domain_page(const void *va)

 unsigned long domain_page_map_to_mfn(const void *va)
 {
-    lpae_t *map = xen_second + second_linear_offset(DOMHEAP_VIRT_START);
+    lpae_t *map = this_cpu(xen_dommap);
     int slot = ((unsigned long) va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;
     unsigned long offset = ((unsigned long)va>>THIRD_SHIFT) & LPAE_ENTRY_MASK;

@@ -362,11 +392,88 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
     WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2);
     /* Flush everything after setting WXN bit. */
     flush_xen_text_tlb();
+
+    per_cpu(xen_pgtable, 0) = boot_pgtable;
+    per_cpu(xen_dommap, 0) = xen_second +
+        second_linear_offset(DOMHEAP_VIRT_START);
+
+    /* Some of these slots may have been used during start of day and/or
+     * relocation. Make sure they are clear now. */
+    memset(this_cpu(xen_dommap), 0, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
+    flush_xen_dcache_va_range(this_cpu(xen_dommap),
+                              DOMHEAP_SECOND_PAGES*PAGE_SIZE);
+}
+
+int init_secondary_pagetables(int cpu)
+{
+    lpae_t *root, *first, *domheap, pte;
+    int i;
+
+    root = alloc_xenheap_page();
+#ifdef CONFIG_ARM_64
+    first = alloc_xenheap_page();
+#else
+    first = root; /* root == first level on 32-bit 3-level trie */
+#endif
+    domheap = alloc_xenheap_pages(get_order_from_pages(DOMHEAP_SECOND_PAGES), 0);
+
+    if ( root == NULL || domheap == NULL || first == NULL )
+    {
+        printk("Not enough free memory for secondary CPU%d pagetables\n", cpu);
+        free_xenheap_pages(domheap, get_order_from_pages(DOMHEAP_SECOND_PAGES));
+#ifdef CONFIG_ARM_64
+        free_xenheap_page(first);
+#endif
+        free_xenheap_page(root);
+        return -ENOMEM;
+    }
+
+    /* Initialise root pagetable from root of boot tables */
+    memcpy(root, boot_pgtable, PAGE_SIZE);
+
+#ifdef CONFIG_ARM_64
+    /* Initialise first pagetable from first level of boot tables, and
+     * hook into the new root. */
+    memcpy(first, boot_first, PAGE_SIZE);
+    pte = mfn_to_xen_entry(virt_to_mfn(first));
+    pte.pt.table = 1;
+    write_pte(root, pte);
+#endif
+
+    /* Ensure the domheap has no stray mappings */
+    memset(domheap, 0, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
+
+    /* Update the first level mapping to reference the local CPUs
+     * domheap mapping pages. */
+    for ( i = 0; i < DOMHEAP_SECOND_PAGES; i++ )
+    {
+        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES));
+        pte.pt.table = 1;
+        write_pte(&first[first_table_offset(DOMHEAP_VIRT_START+i*FIRST_SIZE)], pte);
+    }
+
+    flush_xen_dcache_va_range(root, PAGE_SIZE);
+    flush_xen_dcache_va_range(domheap, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
+
+    per_cpu(xen_pgtable, cpu) = root;
+    per_cpu(xen_dommap, cpu) = domheap;
+
+    return 0;
 }

 /* MMU setup for secondary CPUS (which already have paging enabled) */
 void __cpuinit mmu_init_secondary_cpu(void)
 {
+    uint64_t ttbr;
+
+    /* Change to this CPUs pagetables */
+    flush_xen_text_tlb();
+
+    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
+    WRITE_SYSREG64(ttbr, TTBR0_EL2);
+    dsb(); /* Ensure visibility of HTTBR update */
+    flush_xen_text_tlb();
+
     /* From now on, no mapping may be both writable and executable. */
     WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2);
     flush_xen_text_tlb();
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index bd353c8..8011987 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -220,6 +220,12 @@ void stop_cpu(void)
 /* Bring up a remote CPU */
 int __cpu_up(unsigned int cpu)
 {
+    int rc;
+
+    rc = init_secondary_pagetables(cpu);
+    if ( rc < 0 )
+        return rc;
+
     /* Tell the remote CPU which stack to boot on. */
     init_stack = idle_vcpu[cpu]->arch.stack;

diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
index 8be8563..79f6353 100644
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -98,12 +98,16 @@
 #define EARLY_VMAP_VIRT_START  mk_unsigned_long(0x10000000)
 #define XENHEAP_VIRT_START     mk_unsigned_long(0x40000000)
 #define DOMHEAP_VIRT_START     mk_unsigned_long(0x80000000)
+#define DOMHEAP_VIRT_END       mk_unsigned_long(0xffffffff)

 #define EARLY_VMAP_VIRT_END    XENHEAP_VIRT_START
 #define HYPERVISOR_VIRT_START  XEN_VIRT_START

 #define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */

+/* Number domheap pagetable pages require at the second level (2MB mappings) */
+#define DOMHEAP_SECOND_PAGES ((DOMHEAP_VIRT_END - DOMHEAP_VIRT_START + 1) >> FIRST_SHIFT)
+
 /* Fixmap slots */
 #define FIXMAP_CONSOLE  0  /* The primary UART */
 #define FIXMAP_PT       1  /* Temporary mappings of pagetable pages */
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 4be383e..26c271e 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -138,7 +138,9 @@ extern unsigned long total_pages;

 /* Boot-time pagetable setup */
 extern void setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr);
-/* MMU setup for secondary CPUS (which already have paging enabled) */
+/* Allocate and initialise pagetables for a secondary CPU */
+extern int __cpuinit init_secondary_pagetables(int cpu);
+/* Switch secondary CPUS to its own pagetables and finalise MMU setup */
 extern void __cpuinit mmu_init_secondary_cpu(void);
 /* Second stage paging setup, to be called on all CPUs */
 extern void __cpuinit setup_virt_paging(void);
--
1.7.2.5
Ian Campbell
2013-Apr-23 10:19 UTC
[PATCH V2 4/4] arm: add build time asserts for various virtual address alignment constraints
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
Not totally sold on the need for this, but I guess it is harmless enough...
---
 xen/arch/arm/mm.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 58ce4e1..a2f9e88 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -99,6 +99,17 @@ unsigned long total_pages;

 extern char __init_begin[], __init_end[];

+/* Checking VA memory layout alignment. */
+static inline void check_memory_layout_alignment_constraints(void) {
+    /* 2MB aligned regions */
+    BUILD_BUG_ON(XEN_VIRT_START & ~SECOND_MASK);
+    BUILD_BUG_ON(FIXMAP_ADDR(0) & ~SECOND_MASK);
+    BUILD_BUG_ON(BOOT_MISC_VIRT_START & ~SECOND_MASK);
+    /* 1GB aligned regions */
+    BUILD_BUG_ON(XENHEAP_VIRT_START & ~FIRST_MASK);
+    BUILD_BUG_ON(DOMHEAP_VIRT_START & ~FIRST_MASK);
+}
+
 void dump_pt_walk(lpae_t *first, paddr_t addr)
 {
     lpae_t *second = NULL, *third = NULL;
--
1.7.2.5
Tim Deegan
2013-Apr-23 10:48 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
At 11:19 +0100 on 23 Apr (1366715994), Ian Campbell wrote:
> The domheap mappings are supposed to be per-PCPU. Therefore xen_pgtable
> becomes a per-PCPU variable and we allocate and setup the page tables for each
> secondary PCPU just before we tell it to come up.
>
> Each secondary PCPU starts out on the boot page table but switches to its own
> page tables ASAP.
>
> The boot PCPU uses the boot pagetables as its own.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
> v2: Coding style fixes.
>     Use DOMHEAP_SECOND_PAGES instead of "2"
>     Flush the secondary CPU's pagetables on CPU0 when we create/write them not
>     on the secondary CPU.
> ---
>  xen/arch/arm/mm.c            |  137 +++++++++++++++++++++++++++++++++++++-----
>  xen/arch/arm/smpboot.c       |    6 ++
>  xen/include/asm-arm/config.h |    4 +
>  xen/include/asm-arm/mm.h     |    4 +-
>  4 files changed, 135 insertions(+), 16 deletions(-)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 3cb852b..58ce4e1 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -39,22 +39,47 @@
>
>  struct domain *dom_xen, *dom_io, *dom_cow;
>
> -/* Static start-of-day pagetables that we use before the allocators are up */
> -/* boot_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
> +/* Static start-of-day pagetables that we use before the
> + * allocators are up. These go on to become the boot CPUs real pagetables.

                                                           ^'s

> + */
>  lpae_t boot_pgtable[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
>  #ifdef CONFIG_ARM_64
>  lpae_t boot_first[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
>  #endif
> -/* N.B. The second-level table is 4 contiguous pages long, and covers
> - * all addresses from 0 to 0xffffffff. Offsets into it are calculated
> - * with second_linear_offset(), not second_table_offset(). */
> +
> +/*
> + * xen_pgtable and xen_dommap are per-PCPU and are allocated before
> + * bringing up each CPU. On 64-bit a first level table is also allocated.
> + *
> + * xen_second, xen_fixmap and xen_xenmap are shared between all PCPUS.

s/PCPUS/PCPUs/

> + */
> +
> +/* Per-CPU pagetable pages */
> +/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
> +static DEFINE_PER_CPU(lpae_t *, xen_pgtable);
> +/* xen_dommap == pages used by map_domain_page, these pages contain
> + * the second level pagetables which mapp the domheap region
> + * DOMHEAP_VIRT_START...DOMHEAP_VIRT_END in 2MB chunks. */
> +static DEFINE_PER_CPU(lpae_t *, xen_dommap);
> +
> +/* Common pagetable leaves */
> +/* Second level page tables.
> + *
> + * The second-level table is 2 contiguous pages long, and covers all
> + * addresses from 0 to 0x7fffffff.

Can we keep the comment about using second_linear_offset to access
xen_second?

> + *
> + * Addresses 0x80000000 to 0xffffffff are covered by the per-cpu
> + * xen_domheap mappings described above. However we allocate 4 pages
> + * here for use in the boot page tables and the second two pages
> + * become the boot CPUs xen_dommap pages.
> + */
>  lpae_t xen_second[LPAE_ENTRIES*4] __attribute__((__aligned__(4096*4)));
> +/* First level page table used for fixmap */
>  lpae_t xen_fixmap[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
> +/* First level page table used to map Xen itself with the XN bit set
> + * as appropriate. */
>  static lpae_t xen_xenmap[LPAE_ENTRIES] __attribute__((__aligned__(4096)));
>
> -/* boot_pgtable becomes the boot processors pagetable, eventually this will
> - * become a per-cpu variable */
> -#define xen_pgtable boot_pgtable
>
>  /* Non-boot CPUs use this to find the correct pagetables. */
>  uint64_t boot_ttbr;
> @@ -107,12 +132,17 @@ done:
>  void dump_hyp_walk(vaddr_t addr)
>  {
>      uint64_t ttbr = READ_SYSREG64(TTBR0_EL2);
> +    lpae_t *pgtable = this_cpu(xen_pgtable);
>
> -    printk("Walking Hypervisor VA 0x%"PRIvaddr" via TTBR 0x%016"PRIx64"\n",
> -           addr, ttbr);
> +    printk("Walking Hypervisor VA 0x%"PRIvaddr" "
> +           "on CPU%d via TTBR 0x%016"PRIx64"\n",
> +           addr, smp_processor_id(), ttbr);
>
> -    BUG_ON( (lpae_t *)(unsigned long)(ttbr - phys_offset) != xen_pgtable );
> -    dump_pt_walk(xen_pgtable, addr);
> +    if ( smp_processor_id() == 0 )
> +        BUG_ON( (lpae_t *)(unsigned long)(ttbr - phys_offset) != pgtable );
> +    else
> +        BUG_ON( virt_to_maddr(pgtable) != ttbr );
> +    dump_pt_walk(pgtable, addr);
>  }
>
>  /* Map a 4k page in a fixmap entry */
> @@ -138,7 +168,7 @@ void clear_fixmap(unsigned map)
>  void *map_domain_page(unsigned long mfn)
>  {
>      unsigned long flags;
> -    lpae_t *map = xen_second + second_linear_offset(DOMHEAP_VIRT_START);
> +    lpae_t *map = this_cpu(xen_dommap);
>      unsigned long slot_mfn = mfn & ~LPAE_ENTRY_MASK;
>      vaddr_t va;
>      lpae_t pte;
> @@ -204,7 +234,7 @@ void *map_domain_page(unsigned long mfn)
>  void unmap_domain_page(const void *va)
>  {
>      unsigned long flags;
> -    lpae_t *map = xen_second + second_linear_offset(DOMHEAP_VIRT_START);
> +    lpae_t *map = this_cpu(xen_dommap);
>      int slot = ((unsigned long) va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;
>
>      local_irq_save(flags);
> @@ -219,7 +249,7 @@ void unmap_domain_page(const void *va)
>
>  unsigned long domain_page_map_to_mfn(const void *va)
>  {
> -    lpae_t *map = xen_second + second_linear_offset(DOMHEAP_VIRT_START);
> +    lpae_t *map = this_cpu(xen_dommap);
>      int slot = ((unsigned long) va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;
>      unsigned long offset = ((unsigned long)va>>THIRD_SHIFT) & LPAE_ENTRY_MASK;
>
> @@ -362,11 +392,88 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>      WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2);
>      /* Flush everything after setting WXN bit. */
>      flush_xen_text_tlb();
> +
> +    per_cpu(xen_pgtable, 0) = boot_pgtable;
> +    per_cpu(xen_dommap, 0) = xen_second +
> +        second_linear_offset(DOMHEAP_VIRT_START);
> +
> +    /* Some of these slots may have been used during start of day and/or
> +     * relocation. Make sure they are clear now. */
> +    memset(this_cpu(xen_dommap), 0, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
> +    flush_xen_dcache_va_range(this_cpu(xen_dommap),
> +                              DOMHEAP_SECOND_PAGES*PAGE_SIZE);
> +}
> +
> +int init_secondary_pagetables(int cpu)
> +{
> +    lpae_t *root, *first, *domheap, pte;
> +    int i;
> +
> +    root = alloc_xenheap_page();
> +#ifdef CONFIG_ARM_64
> +    first = alloc_xenheap_page();
> +#else
> +    first = root; /* root == first level on 32-bit 3-level trie */
> +#endif
> +    domheap = alloc_xenheap_pages(get_order_from_pages(DOMHEAP_SECOND_PAGES), 0);
> +
> +    if ( root == NULL || domheap == NULL || first == NULL )
> +    {
> +        printk("Not enough free memory for secondary CPU%d pagetables\n", cpu);
> +        free_xenheap_pages(domheap, get_order_from_pages(DOMHEAP_SECOND_PAGES));
> +#ifdef CONFIG_ARM_64
> +        free_xenheap_page(first);
> +#endif
> +        free_xenheap_page(root);
> +        return -ENOMEM;
> +    }
> +
> +    /* Initialise root pagetable from root of boot tables */
> +    memcpy(root, boot_pgtable, PAGE_SIZE);
> +
> +#ifdef CONFIG_ARM_64
> +    /* Initialise first pagetable from first level of boot tables, and
> +     * hook into the new root. */
> +    memcpy(first, boot_first, PAGE_SIZE);
> +    pte = mfn_to_xen_entry(virt_to_mfn(first));
> +    pte.pt.table = 1;
> +    write_pte(root, pte);
> +#endif
> +
> +    /* Ensure the domheap has no stray mappings */
> +    memset(domheap, 0, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
> +
> +    /* Update the first level mapping to reference the local CPUs
> +     * domheap mapping pages. */
> +    for ( i = 0; i < DOMHEAP_SECOND_PAGES; i++ )
> +    {
> +        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES));
> +        pte.pt.table = 1;
> +        write_pte(&first[first_table_offset(DOMHEAP_VIRT_START+i*FIRST_SIZE)], pte);
> +    }
> +
> +    flush_xen_dcache_va_range(root, PAGE_SIZE);
> +    flush_xen_dcache_va_range(domheap, DOMHEAP_SECOND_PAGES*PAGE_SIZE);

Don't we need to flush 'first' as well?

> +
> +    per_cpu(xen_pgtable, cpu) = root;
> +    per_cpu(xen_dommap, cpu) = domheap;
> +
> +    return 0;
>  }
>
>  /* MMU setup for secondary CPUS (which already have paging enabled) */
>  void __cpuinit mmu_init_secondary_cpu(void)
>  {
> +    uint64_t ttbr;
> +
> +    /* Change to this CPUs pagetables */
> +    flush_xen_text_tlb();
> +
> +    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
> +    WRITE_SYSREG64(ttbr, TTBR0_EL2);

isb() here, to make sure the CP write completes?

> +    dsb(); /* Ensure visibility of HTTBR update */
> +    flush_xen_text_tlb();
> +
>      /* From now on, no mapping may be both writable and executable. */
>      WRITE_SYSREG32(READ_SYSREG32(SCTLR_EL2) | SCTLR_WXN, SCTLR_EL2);
>      flush_xen_text_tlb();
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index bd353c8..8011987 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -220,6 +220,12 @@ void stop_cpu(void)
>  /* Bring up a remote CPU */
>  int __cpu_up(unsigned int cpu)
>  {
> +    int rc;
> +
> +    rc = init_secondary_pagetables(cpu);
> +    if ( rc < 0 )
> +        return rc;
> +
>      /* Tell the remote CPU which stack to boot on. */
>      init_stack = idle_vcpu[cpu]->arch.stack;
>
> diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
> index 8be8563..79f6353 100644
> --- a/xen/include/asm-arm/config.h
> +++ b/xen/include/asm-arm/config.h
> @@ -98,12 +98,16 @@
>  #define EARLY_VMAP_VIRT_START  mk_unsigned_long(0x10000000)
>  #define XENHEAP_VIRT_START     mk_unsigned_long(0x40000000)
>  #define DOMHEAP_VIRT_START     mk_unsigned_long(0x80000000)
> +#define DOMHEAP_VIRT_END       mk_unsigned_long(0xffffffff)
>
>  #define EARLY_VMAP_VIRT_END    XENHEAP_VIRT_START
>  #define HYPERVISOR_VIRT_START  XEN_VIRT_START
>
>  #define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */
>
> +/* Number domheap pagetable pages require at the second level (2MB mappings) */

         ^of                       ^d

Cheers,

Tim.
At 11:19 +0100 on 23 Apr (1366715977), Ian Campbell wrote:
> Sengul Thomas observed that Xen was updating the domheap mappings
> without a lock and that this caused problems on SMP systems.
>
> The underlying problem here is that the domheap mappings are supposed to
> be per-PCPU but were not, instead all PCPUs are currently sharing the
> same pagetables and therefore using the same mappings.
>
> Implement per-PCPU page tables and give each PCPU its own domheap
> mapping pages.

1, 2 and 4: Acked-by: Tim Deegan <tim@xen.org>

Tim.
Stefano Stabellini
2013-Apr-23 10:55 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
On Tue, 23 Apr 2013, Ian Campbell wrote:
> +int init_secondary_pagetables(int cpu)
> +{
> +    lpae_t *root, *first, *domheap, pte;
> +    int i;
> +
> +    root = alloc_xenheap_page();
> +#ifdef CONFIG_ARM_64
> +    first = alloc_xenheap_page();
> +#else
> +    first = root; /* root == first level on 32-bit 3-level trie */
> +#endif
> +    domheap = alloc_xenheap_pages(get_order_from_pages(DOMHEAP_SECOND_PAGES), 0);
> +
> +    if ( root == NULL || domheap == NULL || first == NULL )
> +    {
> +        printk("Not enough free memory for secondary CPU%d pagetables\n", cpu);
> +        free_xenheap_pages(domheap, get_order_from_pages(DOMHEAP_SECOND_PAGES));
> +#ifdef CONFIG_ARM_64
> +        free_xenheap_page(first);
> +#endif
> +        free_xenheap_page(root);
> +        return -ENOMEM;
> +    }
> +
> +    /* Initialise root pagetable from root of boot tables */
> +    memcpy(root, boot_pgtable, PAGE_SIZE);
> +
> +#ifdef CONFIG_ARM_64
> +    /* Initialise first pagetable from first level of boot tables, and
> +     * hook into the new root. */
> +    memcpy(first, boot_first, PAGE_SIZE);
> +    pte = mfn_to_xen_entry(virt_to_mfn(first));
> +    pte.pt.table = 1;
> +    write_pte(root, pte);
> +#endif
> +
> +    /* Ensure the domheap has no stray mappings */
> +    memset(domheap, 0, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
> +
> +    /* Update the first level mapping to reference the local CPUs
> +     * domheap mapping pages. */
> +    for ( i = 0; i < DOMHEAP_SECOND_PAGES; i++ )
> +    {
> +        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES));
> +        pte.pt.table = 1;
> +        write_pte(&first[first_table_offset(DOMHEAP_VIRT_START+i*FIRST_SIZE)], pte);
> +    }
> +
> +    flush_xen_dcache_va_range(root, PAGE_SIZE);
> +    flush_xen_dcache_va_range(domheap, DOMHEAP_SECOND_PAGES*PAGE_SIZE);

Don't we need to flush first too (on ARMv8)?

> +    per_cpu(xen_pgtable, cpu) = root;
> +    per_cpu(xen_dommap, cpu) = domheap;

I think we should flush the two per_cpu variables here
Ian Campbell
2013-Apr-23 11:14 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
On Tue, 2013-04-23 at 11:55 +0100, Stefano Stabellini wrote:
> > +    flush_xen_dcache_va_range(root, PAGE_SIZE);
> > +    flush_xen_dcache_va_range(domheap, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
>
> Don't we need to flush first too (on ARMv8)?

Yes, I think so.

> > +    per_cpu(xen_pgtable, cpu) = root;
> > +    per_cpu(xen_dommap, cpu) = domheap;
>
> I think we should flush the two per_cpu variables here

These are just normal per-cpu variables, they aren't passed to the MMU
hardware or anything like that, so I don't think we do need to flush.

If we need to flush per_cpu writes in the general case then we have a
big problem, but I don't think this is the case.

Ian.
Ian Campbell
2013-Apr-23 11:21 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
On Tue, 2013-04-23 at 11:48 +0100, Tim Deegan wrote:
> s/PCPUS/PCPUs/

Ack to all the various typos...

> > + */
> > +
> > +/* Per-CPU pagetable pages */
> > +/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
> > +static DEFINE_PER_CPU(lpae_t *, xen_pgtable);
> > +/* xen_dommap == pages used by map_domain_page, these pages contain
> > + * the second level pagetables which mapp the domheap region
> > + * DOMHEAP_VIRT_START...DOMHEAP_VIRT_END in 2MB chunks. */
> > +static DEFINE_PER_CPU(lpae_t *, xen_dommap);
> > +
> > +/* Common pagetable leaves */
> > +/* Second level page tables.
> > + *
> > + * The second-level table is 2 contiguous pages long, and covers all
> > + * addresses from 0 to 0x7fffffff.
>
> Can we keep the comment about using second_linear_offset to access
> xen_second?

Yes, sorry I rewrote this a few times and must have dropped it.

> > +    flush_xen_dcache_va_range(root, PAGE_SIZE);
> > +    flush_xen_dcache_va_range(domheap, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
>
> Don't we need to flush 'first' as well?

Yes.

> > +    uint64_t ttbr;
> > +
> > +    /* Change to this CPUs pagetables */
> > +    flush_xen_text_tlb();
> > +
> > +    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
> > +    WRITE_SYSREG64(ttbr, TTBR0_EL2);
>
> isb() here, to make sure the CP write completes?

Hrm, we don't have one in the primary CPU bring up either.

flush_xen_text_tlb starts with an isb, which I suspect is sufficient? (I
vaguely remember having a conversation along those lines when this code
was changed to look like this).

> > +    dsb(); /* Ensure visibility of HTTBR update */

I think this comment is actually misleading since it doesn't actually
ensure the visibility, just that things won't cross the boundary.

I'll try and improve the comment here and in the other site too, does
this sound right?:

        WRITE_SYSREG64(ttbr, TTBR0_EL2);
        dsb(); /* ensure memory accesses do not cross over the TTBR0 write */
        /* flush_xen_text_tlb contains an ISB which ensures the TTBR0_EL2
         * update has completed. */
        flush_xen_text_tlb();

?
Stefano Stabellini
2013-Apr-23 11:23 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
On Tue, 23 Apr 2013, Ian Campbell wrote:
> On Tue, 2013-04-23 at 11:55 +0100, Stefano Stabellini wrote:
> > > +    flush_xen_dcache_va_range(root, PAGE_SIZE);
> > > +    flush_xen_dcache_va_range(domheap, DOMHEAP_SECOND_PAGES*PAGE_SIZE);
> >
> > Don't we need to flush first too (on ARMv8)?
>
> Yes, I think so.
>
> > > +    per_cpu(xen_pgtable, cpu) = root;
> > > +    per_cpu(xen_dommap, cpu) = domheap;
> >
> > I think we should flush the two per_cpu variables here
>
> These are just normal per-cpu variables, they aren't passed to the MMU
> hardware or anything like that, so I don't think we do need to flush.
>
> If we need to flush per_cpu writes in the general case then we have a
> big problem, but I don't think this is the case.

I noticed that you used xen_pgtable at the beginning of
mmu_init_secondary_cpu, which is executed by the second cpu. However
the second cpu already has paging enabled and doesn't actually pass the
variable to the hardware, so I think you are right, we don't need to
flush them.
Tim Deegan
2013-Apr-23 11:33 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
Hi,

At 12:21 +0100 on 23 Apr (1366719671), Ian Campbell wrote:
> > > +    /* Change to this CPUs pagetables */
> > > +    flush_xen_text_tlb();
> > > +
> > > +    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
> > > +    WRITE_SYSREG64(ttbr, TTBR0_EL2);
> >
> > isb() here, to make sure the CP write completes?
>
> Hrm, we don't have one in the primary CPU bring up either.
>
> flush_xen_text_tlb starts with an isb, which I suspect is sufficient? (I
> vaguely remember having a conversation along those lines when this code
> was changed to look like this).

Hrmn. I guess that does turn out OK, but given that this happens
exactly once per CPU, the extra ISB is a cheap enough price to pay for
making the code obviously correct.

Tim.

> > > +    dsb(); /* Ensure visibility of HTTBR update */
>
> I think this comment is actually misleading since it doesn't actually
> ensure the visibility, just that things won't cross the boundary.
>
> I'll try and improve the comment here and in the other site too, does
> ths sound right?:
>         WRITE_SYSREG64(ttbr, TTBR0_EL2);
>         dsb(); /* ensure memory accesses do not cross over the TTBR0 write */
>         /* flush_xen_text_tlb contains an ISB which ensures the TTBR0_EL2
>          * update has completed. */
>         flush_xen_text_tlb();
> ?
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
Ian Campbell
2013-Apr-23 11:38 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
On Tue, 2013-04-23 at 12:33 +0100, Tim Deegan wrote:
> Hi,
>
> At 12:21 +0100 on 23 Apr (1366719671), Ian Campbell wrote:
> > > > +    /* Change to this CPUs pagetables */
> > > > +    flush_xen_text_tlb();
> > > > +
> > > > +    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
> > > > +    WRITE_SYSREG64(ttbr, TTBR0_EL2);
> > >
> > > isb() here, to make sure the CP write completes?
> >
> > Hrm, we don't have one in the primary CPU bring up either.
> >
> > flush_xen_text_tlb starts with an isb, which I suspect is sufficient? (I
> > vaguely remember having a conversation along those lines when this code
> > was changed to look like this).
>
> Hrmn. I guess that does turn out OK, but given that this happens
> exactly once per CPU, the extra ISB is a cheap enough price to pay for
> making the code obviously correct.

True. I refactored this code into:

    static void write_ttbr(uint64_t ttbr)
    {
        flush_xen_text_tlb();

        WRITE_SYSREG64(boot_ttbr, TTBR0_EL2);
        dsb(); /* ensure memory accesses do not cross over the TTBR0 write */
        isb(); /* ensure that the TTBR write has completed */
        flush_xen_text_tlb();
    }

I wasn't 100% sure about carrying that first flush_xen_text_tlb to here
instead of leaving it in the caller, but my rationale was that it ensures
things are in a consistent state before we make the switch, and both
callers include it.

Ian.
Ian Campbell
2013-Apr-23 11:42 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
On Tue, 2013-04-23 at 12:38 +0100, Ian Campbell wrote:
> On Tue, 2013-04-23 at 12:33 +0100, Tim Deegan wrote:
> > Hi,
> >
> > At 12:21 +0100 on 23 Apr (1366719671), Ian Campbell wrote:
> > > > > +    /* Change to this CPUs pagetables */
> > > > > +    flush_xen_text_tlb();
> > > > > +
> > > > > +    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
> > > > > +    WRITE_SYSREG64(ttbr, TTBR0_EL2);
> > > >
> > > > isb() here, to make sure the CP write completes?
> > >
> > > Hrm, we don't have one in the primary CPU bring up either.
> > >
> > > flush_xen_text_tlb starts with an isb, which I suspect is sufficient? (I
> > > vaguely remember having a conversation along those lines when this code
> > > was changed to look like this).
> >
> > Hrmn. I guess that does turn out OK, but given that this happens
> > exactly once per CPU, the extra ISB is a cheap enough price to pay for
> > making the code obviously correct.
>
> True. I refactored this code into:
>     static void write_ttbr(uint64_t ttbr)
>     {
>         flush_xen_text_tlb();
>
>         WRITE_SYSREG64(boot_ttbr, TTBR0_EL2);

s/boot_//g of course...

>         dsb(); /* ensure memory accesses do not cross over the TTBR0 write */
>         isb(); /* ensure that the TTBT write has completed */
>         flush_xen_text_tlb();
>     }
>
> I wasn't 100% sure about carrying that first flush_xen_text_tlb to here
> instead of leaving it in the caller but my rationale was that it ensures
> things are in a consistent state before we make the switch and both
> callers include it.
>
> Ian.
Ian Campbell
2013-Apr-23 11:52 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
On Tue, 2013-04-23 at 12:38 +0100, Ian Campbell wrote:
> On Tue, 2013-04-23 at 12:33 +0100, Tim Deegan wrote:
> > Hi,
> >
> > At 12:21 +0100 on 23 Apr (1366719671), Ian Campbell wrote:
> > > > > +    /* Change to this CPUs pagetables */
> > > > > +    flush_xen_text_tlb();
> > > > > +
> > > > > +    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
> > > > > +    WRITE_SYSREG64(ttbr, TTBR0_EL2);
> > > >
> > > > isb() here, to make sure the CP write completes?
> > >
> > > Hrm, we don't have one in the primary CPU bring up either.
> > >
> > > flush_xen_text_tlb starts with an isb, which I suspect is sufficient? (I
> > > vaguely remember having a conversation along those lines when this code
> > > was changed to look like this).
> >
> > Hrmn. I guess that does turn out OK, but given that this happens
> > exactly once per CPU, the extra ISB is a cheap enough price to pay for
> > making the code obviously correct.
>
> True. I refactored this code into:
>     static void write_ttbr(uint64_t ttbr)
>     {
>         flush_xen_text_tlb();
>
>         WRITE_SYSREG64(boot_ttbr, TTBR0_EL2);
>         dsb(); /* ensure memory accesses do not cross over the TTBR0 write */
>         isb(); /* ensure that the TTBT write has completed */
>         flush_xen_text_tlb();
>     }

Weird, if I don't mark this function inline then it hangs at this point
(on the model).

If I inline it then all is fine.

Sounds like a missing flush or barrier which is exposed by the return
"bx lr" or possibly the stack manipulations in the function epilogue.
But I can't spot it...

Ian.
Tim Deegan
2013-Apr-23 12:26 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
At 12:38 +0100 on 23 Apr (1366720695), Ian Campbell wrote:
> On Tue, 2013-04-23 at 12:33 +0100, Tim Deegan wrote:
> > Hi,
> >
> > At 12:21 +0100 on 23 Apr (1366719671), Ian Campbell wrote:
> > > > > +    /* Change to this CPUs pagetables */
> > > > > +    flush_xen_text_tlb();
> > > > > +
> > > > > +    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
> > > > > +    WRITE_SYSREG64(ttbr, TTBR0_EL2);
> > > >
> > > > isb() here, to make sure the CP write completes?
> > >
> > > Hrm, we don't have one in the primary CPU bring up either.
> > >
> > > flush_xen_text_tlb starts with an isb, which I suspect is sufficient? (I
> > > vaguely remember having a conversation along those lines when this code
> > > was changed to look like this).
> >
> > Hrmn. I guess that does turn out OK, but given that this happens
> > exactly once per CPU, the extra ISB is a cheap enough price to pay for
> > making the code obviously correct.
>
> True. I refactored this code into:
>     static void write_ttbr(uint64_t ttbr)
>     {
>         flush_xen_text_tlb();
>
>         WRITE_SYSREG64(boot_ttbr, TTBR0_EL2);
>         dsb(); /* ensure memory accesses do not cross over the TTBR0 write */
>         isb(); /* ensure that the TTBT write has completed */

Actually I'm going to u-turn entirely here: looking at
flush_xen_text_tlb(), maybe we can drop the explicit ISB/DSB here
entirely, with a comment saying that they're taken care of.

If not, I think these need to be the other way around: isb to complete
the CP write, _then_ dsb to stop anything getting hoisted.

Also, I don't remember why we needed the extra flush_xen_text_tlb before
the TTBR write. ISTR it was needed when we moved on to real hardware,
but can't recall exactly why.

>         flush_xen_text_tlb();
>     }
>
> I wasn't 100% sure about carrying that first flush_xen_text_tlb to here
> instead of leaving it in the caller but my rationale was that it ensures
> things are in a consistent state before we make the switch and both
> callers include it.

Yep, sounds like a good idea.

Tim.
Ian Campbell
2013-Apr-23 15:21 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
On Tue, 2013-04-23 at 13:26 +0100, Tim Deegan wrote:
> At 12:38 +0100 on 23 Apr (1366720695), Ian Campbell wrote:
> > On Tue, 2013-04-23 at 12:33 +0100, Tim Deegan wrote:
> > > Hi,
> > >
> > > At 12:21 +0100 on 23 Apr (1366719671), Ian Campbell wrote:
> > > > > > +    /* Change to this CPUs pagetables */
> > > > > > +    flush_xen_text_tlb();
> > > > > > +
> > > > > > +    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
> > > > > > +    WRITE_SYSREG64(ttbr, TTBR0_EL2);
> > > > >
> > > > > isb() here, to make sure the CP write completes?
> > > >
> > > > Hrm, we don't have one in the primary CPU bring up either.
> > > >
> > > > flush_xen_text_tlb starts with an isb, which I suspect is sufficient? (I
> > > > vaguely remember having a conversation along those lines when this code
> > > > was changed to look like this).
> > >
> > > Hrmn. I guess that does turn out OK, but given that this happens
> > > exactly once per CPU, the extra ISB is a cheap enough price to pay for
> > > making the code obviously correct.
> >
> > True. I refactored this code into:
> >     static void write_ttbr(uint64_t ttbr)
> >     {
> >         flush_xen_text_tlb();
> >
> >         WRITE_SYSREG64(boot_ttbr, TTBR0_EL2);
> >         dsb(); /* ensure memory accesses do not cross over the TTBR0 write */
> >         isb(); /* ensure that the TTBT write has completed */
>
> Actually I'm going to u-turn entirely here: looking at
> flush_xen_text_tlb(), maybe we can drop the explicit ISB/DSB here
> entirely, with a comment saying that they're taken care of.

flush_xen_text_tlb is:

        "isb;"                  /* Ensure synchronization with previous changes to text */
        STORE_CP32(0, TLBIALLH) /* Flush hypervisor TLB */
        STORE_CP32(0, ICIALLU)  /* Flush I-cache */
        STORE_CP32(0, BPIALL)   /* Flush branch predictor */
        "dsb;"                  /* Ensure completion of TLB+BP flush */
        "isb;"

So I think we still need the dsb?

> If not, I think these need to be the other way around: isb to complete
> the CP write, _then_ dsb to stop anything getting hoisted.

Isn't it the case that because there are no instructions between the
isb/dsb, and neither of them access memory themselves, it doesn't really
matter which way around they are?

> Also, I don't remember why we needed the extra flush_xen_text_tlb before
> the TTBR write. ISTR it was needed when we moved on to real hardware,
> but can't recall exactly why.

Stefano seems to have added it in a8c8110333, but it replaced an open
coded asm inline with mostly the same effect (the function has an extra
BPIALL). The asm inline came from the initial checkin...

Ian.
Tim Deegan
2013-Apr-23 15:57 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
At 16:21 +0100 on 23 Apr (1366734090), Ian Campbell wrote:
> > > True. I refactored this code into:
> > >     static void write_ttbr(uint64_t ttbr)
> > >     {
> > >         flush_xen_text_tlb();
> > >
> > >         WRITE_SYSREG64(boot_ttbr, TTBR0_EL2);
> > >         dsb(); /* ensure memory accesses do not cross over the TTBR0 write */
> > >         isb(); /* ensure that the TTBT write has completed */
> >
> > Actually I'm going to u-turn entirely here: looking at
> > flush_xen_text_tlb(), maybe we can drop the explicit ISB/DSB here
> > entirely, with a comment saying that they're taken care of.
>
> flush_xen_text_tlb is:
>         "isb;"                  /* Ensure synchronization with previous changes to text */
>         STORE_CP32(0, TLBIALLH) /* Flush hypervisor TLB */
>         STORE_CP32(0, ICIALLU)  /* Flush I-cache */
>         STORE_CP32(0, BPIALL)   /* Flush branch predictor */
>         "dsb;"                  /* Ensure completion of TLB+BP flush */
>         "isb;"
>
> So I think we still need the dsb?

That depends what the DSB is for. :)

If it's to stop any later data accesses being hoisted past the TTBR
write, I think the one in flush_xen_text_tlb() will do -- none of the
instructions before it are memory operations.

If it's to stop any earlier writes being sunk past the TTBR write,
surely it needs to go before the WRITE_SYSREG64() itself, not before
the ISB.

Or maybe I'm (once again) confused about exactly what these barriers
are doing. Clearly something's going on here that I don't quite
understand yet, if the return from write_ttbr is crashing.

> > If not, I think these need to be the other way around: isb to complete
> > the CP write, _then_ dsb to stop anything getting hoisted.
>
> Isn't it the case that because there are no instructions between the
> isb/dsb and neither of them access memory themselves it doesn't really
> matter which way around they are?

I think that since the purpose of the ISB is to synchronize against the
MMU finishing the TTBR update, then allowing reads to be hoisted before
it must be a Bad Thing[tm].

> > Also, I don't remember why we needed the extra flush_xen_text_tlb before
> > the TTBR write. ISTR it was needed when we moved on to real hardware,
> > but can't recall exactly why.
>
> Stefano seems to have added it in a8c8110333 but it replaced an open
> coded asm inline with mostly the same affect (the function has an extra
> BPIALL). The asm inline came from the initial checkin...

Oh. :| In that case I'm not sure what use it is; it seems like a simple
'isb; dsb;' should do to make sure we're ready for the TTBR switch.

Tim.
Ian Campbell
2013-Apr-23 16:36 UTC
Re: [PATCH V2 3/4] arm: allocate per-PCPU domheap pagetable pages
On Tue, 2013-04-23 at 12:52 +0100, Ian Campbell wrote:
> On Tue, 2013-04-23 at 12:38 +0100, Ian Campbell wrote:
> > On Tue, 2013-04-23 at 12:33 +0100, Tim Deegan wrote:
> > > Hi,
> > >
> > > At 12:21 +0100 on 23 Apr (1366719671), Ian Campbell wrote:
> > > > > > +    /* Change to this CPUs pagetables */
> > > > > > +    flush_xen_text_tlb();
> > > > > > +
> > > > > > +    ttbr = (uintptr_t) virt_to_maddr(this_cpu(xen_pgtable));
> > > > > > +    WRITE_SYSREG64(ttbr, TTBR0_EL2);
> > > > >
> > > > > isb() here, to make sure the CP write completes?
> > > >
> > > > Hrm, we don't have one in the primary CPU bring up either.
> > > >
> > > > flush_xen_text_tlb starts with an isb, which I suspect is sufficient? (I
> > > > vaguely remember having a conversation along those lines when this code
> > > > was changed to look like this).
> > >
> > > Hrmn. I guess that does turn out OK, but given that this happens
> > > exactly once per CPU, the extra ISB is a cheap enough price to pay for
> > > making the code obviously correct.
> >
> > True. I refactored this code into:
> >     static void write_ttbr(uint64_t ttbr)
> >     {
> >         flush_xen_text_tlb();
> >
> >         WRITE_SYSREG64(boot_ttbr, TTBR0_EL2);
> >         dsb(); /* ensure memory accesses do not cross over the TTBR0 write */
> >         isb(); /* ensure that the TTBT write has completed */
> >         flush_xen_text_tlb();
> >     }
>
> Weird, if I don't mark this function inline then it hangs at this point
> (on the model).
>
> If I inline then all is fine.
>
> Sounds like a missing flush or barrier which is exposed by the return
> "bx lr" or possibly the stack manipulations on the function epilogue.
> But I can't spot it...

The function is pushing the fp to the stack in the prologue, then
reading it back in the epilogue after the switch and getting some stale
value, which is pretty obvious now I think of it.

The compiler is being a bit dumb, since there are no stack local
variables, so it does:

    0x00241e04 <+0>:  push {r11}        ; (str r11, [sp, #-4]!)
    0x00241e08 <+4>:  add r11, sp, #0
    ...
    0x00241e50 <+76>: add sp, r11, #0
    0x00241e54 <+80>: pop {r11}

which is all a bit pointless since the function has 0 local variables
and never touches the stack otherwise, but oh well.

I'm going to stick an always_inline in there with a comment.

Ian.