Ian Campbell
2013-Jun-28 16:10 UTC
[PATCH 00/10] xen: arm: map normal memory as inner shareable, reduce scope of various barriers
Currently Xen maps all RAM as Outer-Shareable, since that seemed like the
most conservative option early on when we didn't really know what Inner-
vs. Outer-Shareable meant. However we have long suspected that actually
Inner-Shareable would be the correct type to use.

After reading the docs many times, getting more confused each time, I
finally got a reasonable explanation from a man (and a dog) down the pub:
Inner-Shareable == the processors in an SMP system, while Outer-Shareable
== devices. (NB: Not a random man, he knew what he was talking about...).
With that in mind switch all of Xen's memory mapping, page table walks,
TLB flushes and an appropriate subset of the barriers to be inner
shareable.

In addition I have switched barriers to use the correct read/write/any
variants for their types. Note that I have only tackled the generic
mb/rmb/wmb and smp_* barriers (mainly used by common code) here. There are
also quite a few open-coded full-system dsb's in the arch code which I
will look at separately (10 patches is quite enough for now). Since those
deal with e.g. pagetable updates they will be fun ;-)

I have tested this series on Cortex A15 and AEMv8 fast models, in both
cases with 2 CPUs, and I can start a guest in both cases. I have not
tested on any real hardware at all.

These changes should result in a performance improvement, although with
only models to go on I haven't actually bothered to measure. I would
appreciate anyone with access to real hardware giving it a go. I have
pushed the patches to:

        git://xenbits.xen.org/people/ianc/xen.git inner-shareable-v1

Ian Campbell (10):
  xen: arm: map memory as inner shareable.
  xen: arm: Only upgrade guest barriers to inner shareable.
  xen: arm: reduce instruction cache and tlb flushes to inner-shareable.
  xen: arm: consolidate barrier definitions
  xen: use SMP barrier in common code dealing with shared memory protocols
  xen: arm: Use SMP barriers when that is all which is required.
  xen: arm: Use dmb for smp barriers
  xen: arm: add scope to dsb and dmb macros
  xen: arm: weaken SMP barriers to inner shareable.
  xen: arm: use more specific barriers for read and write barriers.

 xen/arch/arm/arm32/head.S            |    8 +++---
 xen/arch/arm/arm64/head.S            |    8 +++---
 xen/arch/arm/domain.c                |    2 +-
 xen/arch/arm/gic.c                   |    8 +++---
 xen/arch/arm/mm.c                    |   28 ++++++++++------------
 xen/arch/arm/platforms/vexpress.c    |    6 ++--
 xen/arch/arm/smpboot.c               |    6 ++--
 xen/arch/arm/time.c                  |    2 +-
 xen/arch/arm/traps.c                 |    2 +-
 xen/common/domain.c                  |    2 +-
 xen/common/domctl.c                  |    2 +-
 xen/common/grant_table.c             |    4 +-
 xen/common/page_alloc.c              |    2 +-
 xen/common/smp.c                     |    4 +-
 xen/common/spinlock.c                |    6 ++--
 xen/common/tmem_xen.c                |   10 ++++----
 xen/common/trace.c                   |    8 +++---
 xen/drivers/char/console.c           |    2 +-
 xen/drivers/video/arm_hdlcd.c        |    2 +-
 xen/include/asm-arm/arm32/flushtlb.h |   12 +++++-----
 xen/include/asm-arm/arm32/io.h       |    4 +-
 xen/include/asm-arm/arm32/page.h     |   12 +++++-----
 xen/include/asm-arm/arm32/system.h   |   16 -------------
 xen/include/asm-arm/arm64/flushtlb.h |    4 +-
 xen/include/asm-arm/arm64/io.h       |    4 +-
 xen/include/asm-arm/arm64/page.h     |   10 ++++----
 xen/include/asm-arm/arm64/system.h   |   17 -------------
 xen/include/asm-arm/page.h           |   42 +++++++++++++++++++++++++++++-----
 xen/include/asm-arm/system.h         |   21 +++++++++++++++++
 xen/include/xen/event.h              |    4 +-
 xen/xsm/flask/ss/sidtab.c            |    4 +-
 31 files changed, 139 insertions(+), 123 deletions(-)

Ian.
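To make the distinction concrete, here is a minimal sketch of where the
series ends up (the names mirror xen/include/asm-arm/system.h as of
patches 07-10; this is an illustration, not the verbatim header):

    /* Regular barriers stay full-system ("sy"): they may have to order
     * accesses against devices as well as against other CPUs. */
    #define dsb(scope) asm volatile("dsb " scope : : : "memory")
    #define dmb(scope) asm volatile("dmb " scope : : : "memory")

    #define mb()      dsb("sy")     /* any-any, full system */
    #define smp_mb()  dmb("ish")    /* any-any, inner shareable: CPUs only */
    #define smp_wmb() dmb("ishst")  /* store-store, inner shareable */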
Ian Campbell
2013-Jun-28 16:10 UTC
[PATCH 01/10] xen: arm: map memory as inner shareable.
The inner shareable domain contains all SMP processors, including
different clusters (e.g. big.LITTLE). Therefore this is the correct thing
to use for Xen memory mappings. The outer shareable domain is for devices
on busses which are barrier-aware (e.g. AMBA4), while the system domain
is for things behind bridges which are not.

One wrinkle is that Normal memory with attributes Inner Non-cacheable,
Outer Non-cacheable (which we call BUFFERABLE) must be mapped Outer
Shareable on ARMv7. Therefore change the prototype of mfn_to_xen_entry to
take the attribute index so we can DTRT. On ARMv8 the shareability is
ignored and considered to always be Outer Shareable.

While I'm here, change all the dmb/dsb with an implicit sy to an explicit
sy, to make future changes simpler. Other than that don't adjust the
barriers, flushes etc; those remain as they were (which is more than is
now required). I'll change those in a later patch.

Many thanks to Leif for explaining the difference between Inner- and
Outer-Shareable in words of two or fewer syllables; I hope I've
replicated that explanation properly above!

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
---
 xen/arch/arm/arm32/head.S          |    8 +++---
 xen/arch/arm/arm64/head.S          |    8 +++---
 xen/arch/arm/mm.c                  |   24 ++++++++++------------
 xen/include/asm-arm/arm32/system.h |    4 +-
 xen/include/asm-arm/page.h         |   38 ++++++++++++++++++++++++++++++---
 5 files changed, 55 insertions(+), 27 deletions(-)

diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
index 0588d54..464c351 100644
--- a/xen/arch/arm/arm32/head.S
+++ b/xen/arch/arm/arm32/head.S
@@ -24,8 +24,8 @@

 #define ZIMAGE_MAGIC_NUMBER 0x016f2818

-#define PT_PT     0xe7f /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=111 T=1 P=1 */
-#define PT_MEM    0xe7d /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=111 T=0 P=1 */
+#define PT_PT     0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
+#define PT_MEM    0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
 #define PT_DEV    0xe71 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=0 P=1 */
 #define PT_DEV_L3 0xe73 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=1 P=1 */

@@ -206,10 +206,10 @@ skip_bss:
        mcr   CP32(r1, HMAIR1)

        /* Set up the HTCR:
-        * PT walks use Outer-Shareable accesses,
+        * PT walks use Inner-Shareable accesses,
         * PT walks are write-back, write-allocate in both cache levels,
         * Full 32-bit address space goes through this table. */
-       ldr   r0, =0x80002500
+       ldr   r0, =0x80003500
        mcr   CP32(r0, HTCR)

        /* Set up the HSCTLR:
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 21b7e4d..ffcb880 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -24,8 +24,8 @@
 #include <asm/page.h>
 #include <asm/asm_defns.h>

-#define PT_PT     0xe7f /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=111 T=1 P=1 */
-#define PT_MEM    0xe7d /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=111 T=0 P=1 */
+#define PT_PT     0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
+#define PT_MEM    0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
 #define PT_DEV    0xe71 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=0 P=1 */
 #define PT_DEV_L3 0xe73 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=1 P=1 */

@@ -178,10 +178,10 @@ skip_bss:
        /* Set up the HTCR:
         * PASize -- 4G
         * Top byte is used
-        * PT walks use Outer-Shareable accesses,
+        * PT walks use Inner-Shareable accesses,
         * PT walks are write-back, write-allocate in both cache levels,
         * Full 64-bit address space goes through this table. */
-       ldr   x0, =0x80802500
+       ldr   x0, =0x80803500
        msr   tcr_el2, x0

        /* Set up the HSCTLR:
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index d1290cd..c5213f2 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -163,9 +163,8 @@ void dump_hyp_walk(vaddr_t addr)
 /* Map a 4k page in a fixmap entry */
 void set_fixmap(unsigned map, unsigned long mfn, unsigned attributes)
 {
-    lpae_t pte = mfn_to_xen_entry(mfn);
+    lpae_t pte = mfn_to_xen_entry(mfn, attributes);
     pte.pt.table = 1; /* 4k mappings always have this bit set */
-    pte.pt.ai = attributes;
     pte.pt.xn = 1;
     write_pte(xen_fixmap + third_table_offset(FIXMAP_ADDR(map)), pte);
     flush_xen_data_tlb_range_va(FIXMAP_ADDR(map), PAGE_SIZE);
@@ -212,7 +211,7 @@ void *map_domain_page(unsigned long mfn)
         if ( map[slot].pt.avail == 0 )
         {
             /* Commandeer this 2MB slot */
-            pte = mfn_to_xen_entry(slot_mfn);
+            pte = mfn_to_xen_entry(slot_mfn, WRITEALLOC);
             pte.pt.avail = 1;
             write_pte(map + slot, pte);
             break;
@@ -340,7 +339,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)

     /* Map the destination in the boot misc area. */
     dest_va = BOOT_MISC_VIRT_START;
-    pte = mfn_to_xen_entry(xen_paddr >> PAGE_SHIFT);
+    pte = mfn_to_xen_entry(xen_paddr >> PAGE_SHIFT, WRITEALLOC);
     write_pte(xen_second + second_table_offset(dest_va), pte);
     flush_xen_data_tlb_range_va(dest_va, SECOND_SIZE);
@@ -387,7 +386,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)

     /* Link in the fixmap pagetable */
     pte = mfn_to_xen_entry((((unsigned long) xen_fixmap) + phys_offset)
-                           >> PAGE_SHIFT);
+                           >> PAGE_SHIFT, WRITEALLOC);
     pte.pt.table = 1;
     write_pte(xen_second + second_table_offset(FIXMAP_ADDR(0)), pte);
     /*
@@ -402,7 +401,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
         unsigned long va = XEN_VIRT_START + (i << PAGE_SHIFT);
         if ( !is_kernel(va) )
             break;
-        pte = mfn_to_xen_entry(mfn);
+        pte = mfn_to_xen_entry(mfn, WRITEALLOC);
        pte.pt.table = 1; /* 4k mappings always have this bit set */
        if ( is_kernel_text(va) || is_kernel_inittext(va) )
        {
@@ -415,7 +414,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
         /* No flush required here as page table is not hooked in yet. */
     }
     pte = mfn_to_xen_entry((((unsigned long) xen_xenmap) + phys_offset)
-                           >> PAGE_SHIFT);
+                           >> PAGE_SHIFT, WRITEALLOC);
     pte.pt.table = 1;
     write_pte(xen_second + second_linear_offset(XEN_VIRT_START), pte);
     /* TLBFLUSH and ISB would be needed here, but wait until we set WXN */
@@ -467,7 +466,7 @@ int init_secondary_pagetables(int cpu)
     /* Initialise first pagetable from first level of boot tables, and
      * hook into the new root. */
     memcpy(first, boot_first, PAGE_SIZE);
-    pte = mfn_to_xen_entry(virt_to_mfn(first));
+    pte = mfn_to_xen_entry(virt_to_mfn(first), WRITEALLOC);
     pte.pt.table = 1;
     write_pte(root, pte);
 #endif
@@ -479,7 +478,7 @@ int init_secondary_pagetables(int cpu)
      * domheap mapping pages. */
     for ( i = 0; i < DOMHEAP_SECOND_PAGES; i++ )
     {
-        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES));
+        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES), WRITEALLOC);
         pte.pt.table = 1;
         write_pte(&first[first_table_offset(DOMHEAP_VIRT_START+i*FIRST_SIZE)], pte);
     }
@@ -525,7 +524,7 @@ static void __init create_mappings(unsigned long virt,

     count = nr_mfns / LPAE_ENTRIES;
     p = xen_second + second_linear_offset(virt);
-    pte = mfn_to_xen_entry(base_mfn);
+    pte = mfn_to_xen_entry(base_mfn, WRITEALLOC);
     pte.pt.hint = 1;  /* These maps are in 16-entry contiguous chunks. */
     for ( i = 0; i < count; i++ )
     {
@@ -595,7 +594,7 @@ static int create_xen_table(lpae_t *entry)
     if ( p == NULL )
         return -ENOMEM;
     clear_page(p);
-    pte = mfn_to_xen_entry(virt_to_mfn(p));
+    pte = mfn_to_xen_entry(virt_to_mfn(p), WRITEALLOC);
     pte.pt.table = 1;
     write_pte(entry, pte);
     return 0;
@@ -641,9 +640,8 @@ static int create_xen_entries(enum xenmap_operation op,
                        addr, mfn);
                 return -EINVAL;
             }
-            pte = mfn_to_xen_entry(mfn);
+            pte = mfn_to_xen_entry(mfn, ai);
             pte.pt.table = 1;
-            pte.pt.ai = ai;
             write_pte(&third[third_table_offset(addr)], pte);
             break;
         case REMOVE:
diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-arm/arm32/system.h
index 60148cb..b3736f4 100644
--- a/xen/include/asm-arm/arm32/system.h
+++ b/xen/include/asm-arm/arm32/system.h
@@ -7,8 +7,8 @@
 #define wfi() __asm__ __volatile__ ("wfi" : : : "memory")

 #define isb() __asm__ __volatile__ ("isb" : : : "memory")
-#define dsb() __asm__ __volatile__ ("dsb" : : : "memory")
-#define dmb() __asm__ __volatile__ ("dmb" : : : "memory")
+#define dsb() __asm__ __volatile__ ("dsb sy" : : : "memory")
+#define dmb() __asm__ __volatile__ ("dmb sy" : : : "memory")

 #define mb()            dsb()
 #define rmb()           dsb()
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index 41e9eff..cd38956 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -185,7 +185,7 @@ typedef union {
 /* Standard entry type that we'll use to build Xen's own pagetables.
  * We put the same permissions at every level, because they're ignored
  * by the walker in non-leaf entries. */
-static inline lpae_t mfn_to_xen_entry(unsigned long mfn)
+static inline lpae_t mfn_to_xen_entry(unsigned long mfn, unsigned attr)
 {
     paddr_t pa = ((paddr_t) mfn) << PAGE_SHIFT;
     lpae_t e = (lpae_t) {
@@ -193,10 +193,9 @@ static inline lpae_t mfn_to_xen_entry(unsigned long mfn)
         .xn = 1,              /* No need to execute outside .text */
         .ng = 1,              /* Makes TLB flushes easier */
         .af = 1,              /* No need for access tracking */
-        .sh = LPAE_SH_OUTER,  /* Xen mappings are globally coherent */
         .ns = 1,              /* Hyp mode is in the non-secure world */
         .user = 1,            /* See below */
-        .ai = WRITEALLOC,
+        .ai = attr,
         .table = 0,           /* Set to 1 for links and 4k maps */
         .valid = 1,           /* Mappings are present */
     }};
@@ -205,6 +204,37 @@ static inline lpae_t mfn_to_xen_entry(unsigned long mfn)
      * pagetables in User mode it's OK. If this changes, remember
      * to update the hard-coded values in head.S too */

+    switch ( attr )
+    {
+    case BUFFERABLE:
+        /*
+         * ARM ARM: Overlaying the shareability attribute (B3-1376 to 1377)
+         *
+         * A memory region with a resultant memory type attribute of Normal,
+         * and a resultant cacheability attribute of Inner Non-cacheable,
+         * Outer Non-cacheable, must have a resultant shareability attribute
+         * of Outer Shareable, otherwise shareability is UNPREDICTABLE.
+         *
+         * On ARMv8 shareability is ignored and explicitly treated as Outer
+         * Shareable for Normal Inner Non-cacheable, Outer Non-cacheable.
+         */
+        e.pt.sh = LPAE_SH_OUTER;
+        break;
+    case UNCACHED:
+    case DEV_SHARED:
+        /* Shareability is ignored for non-Normal memory, Outer is as
+         * good as anything.
+         *
+         * On ARMv8 shareability is ignored and explicitly treated as Outer
+         * Shareable for any device memory type.
+         */
+        e.pt.sh = LPAE_SH_OUTER;
+        break;
+    default:
+        e.pt.sh = LPAE_SH_INNER;  /* Xen mappings are SMP coherent */
+        break;
+    }
+
     ASSERT(!(pa & ~PAGE_MASK));
     ASSERT(!(pa & ~PADDR_MASK));

@@ -219,7 +249,7 @@ static inline lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr)
     lpae_t e = (lpae_t) {
         .p2m.xn = 0,
         .p2m.af = 1,
-        .p2m.sh = LPAE_SH_OUTER,
+        .p2m.sh = LPAE_SH_INNER,
         .p2m.write = 1,
         .p2m.read = 1,
         .p2m.mattr = mattr,
--
1.7.2.5
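A hedged sketch of how a caller now uses mfn_to_xen_entry() (the wrapper
function below is illustrative, not part of the patch): the attribute
index drives both the memory type and, via the switch above, the
shareability, so callers no longer set pte.pt.ai by hand.

    /* Illustrative caller. Normal write-back RAM uses WRITEALLOC and so
     * ends up Inner Shareable; BUFFERABLE and the device attributes come
     * back Outer Shareable per the ARM ARM rule quoted in the patch. */
    static void map_one_4k_page(lpae_t *entry, unsigned long mfn)
    {
        lpae_t pte = mfn_to_xen_entry(mfn, WRITEALLOC);
        pte.pt.table = 1; /* 4k mappings always have this bit set */
        write_pte(entry, pte);
    }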
Ian Campbell
2013-Jun-28 16:10 UTC
[PATCH 02/10] xen: arm: Only upgrade guest barriers to inner shareable.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/traps.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 398d209..e40ae2e 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -62,7 +62,7 @@ void __cpuinit init_traps(void)
     WRITE_SYSREG((vaddr_t)hyp_traps_vector, VBAR_EL2);

     /* Setup hypervisor traps */
-    WRITE_SYSREG(HCR_PTW|HCR_BSU_OUTER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TWI|HCR_TSC, HCR_EL2);
+    WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TWI|HCR_TSC, HCR_EL2);
     isb();
 }
--
1.7.2.5
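For context: HCR.BSU ("Barrier Shareability Upgrade") forces barriers
executed by the guest to apply to at least the given shareability domain;
now that guest memory is mapped inner shareable, upgrading to inner
shareable is enough. A sketch of the field encoding (bits [11:10] on both
ARMv7 HCR and ARMv8 HCR_EL2; the HCR_BSU_* names are assumed to follow
this layout, they are not quoted from the Xen headers):

    #define HCR_BSU_SHIFT 10
    #define HCR_BSU_NONE  (0UL << HCR_BSU_SHIFT) /* no upgrade */
    #define HCR_BSU_INNER (1UL << HCR_BSU_SHIFT) /* at least inner shareable */
    #define HCR_BSU_OUTER (2UL << HCR_BSU_SHIFT) /* at least outer shareable */
    #define HCR_BSU_FULL  (3UL << HCR_BSU_SHIFT) /* full system */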
Ian Campbell
2013-Jun-28 16:10 UTC
[PATCH 03/10] xen: arm: reduce instruction cache and tlb flushes to inner-shareable.
Now that Xen maps memory and performs pagetable walks as inner shareable
we don't need to push updates down so far when modifying page tables etc.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/asm-arm/arm32/flushtlb.h |    4 ++--
 xen/include/asm-arm/arm32/page.h     |    8 ++++----
 xen/include/asm-arm/arm64/flushtlb.h |    4 ++--
 xen/include/asm-arm/arm64/page.h     |    6 +++---
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/xen/include/asm-arm/arm32/flushtlb.h b/xen/include/asm-arm/arm32/flushtlb.h
index a258f58..14e8827 100644
--- a/xen/include/asm-arm/arm32/flushtlb.h
+++ b/xen/include/asm-arm/arm32/flushtlb.h
@@ -6,7 +6,7 @@ static inline void flush_tlb_local(void)
 {
     dsb();

-    WRITE_CP32((uint32_t) 0, TLBIALL);
+    WRITE_CP32((uint32_t) 0, TLBIALLIS);

     dsb();
     isb();
@@ -17,7 +17,7 @@ static inline void flush_tlb_all_local(void)
 {
     dsb();

-    WRITE_CP32((uint32_t) 0, TLBIALLNSNH);
+    WRITE_CP32((uint32_t) 0, TLBIALLNSNHIS);

     dsb();
     isb();
diff --git a/xen/include/asm-arm/arm32/page.h b/xen/include/asm-arm/arm32/page.h
index 38bcffd..e573502 100644
--- a/xen/include/asm-arm/arm32/page.h
+++ b/xen/include/asm-arm/arm32/page.h
@@ -39,8 +39,8 @@ static inline void flush_xen_text_tlb(void)
     asm volatile (
         "isb;"                  /* Ensure synchronization with previous changes to text */
         STORE_CP32(0, TLBIALLH) /* Flush hypervisor TLB */
-        STORE_CP32(0, ICIALLU)  /* Flush I-cache */
-        STORE_CP32(0, BPIALL)   /* Flush branch predictor */
+        STORE_CP32(0, ICIALLUIS)  /* Flush I-cache */
+        STORE_CP32(0, BPIALLIS)   /* Flush branch predictor */
         "dsb;"                  /* Ensure completion of TLB+BP flush */
         "isb;"
         : : "r" (r0) /*dummy*/ : "memory");
@@ -54,7 +54,7 @@ static inline void flush_xen_data_tlb(void)
 {
     register unsigned long r0 asm ("r0");
     asm volatile("dsb;" /* Ensure preceding are visible */
-                 STORE_CP32(0, TLBIALLH)
+                 STORE_CP32(0, TLBIALLHIS)
                 "dsb;" /* Ensure completion of the TLB flush */
                 "isb;"
                 : : "r" (r0) /* dummy */: "memory");
@@ -69,7 +69,7 @@ static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long size)
     unsigned long end = va + size;
     dsb(); /* Ensure preceding are visible */
     while ( va < end ) {
-        asm volatile(STORE_CP32(0, TLBIMVAH)
+        asm volatile(STORE_CP32(0, TLBIMVAHIS)
                      : : "r" (va) : "memory");
         va += PAGE_SIZE;
     }
diff --git a/xen/include/asm-arm/arm64/flushtlb.h b/xen/include/asm-arm/arm64/flushtlb.h
index d0535a0..3a6d2cb 100644
--- a/xen/include/asm-arm/arm64/flushtlb.h
+++ b/xen/include/asm-arm/arm64/flushtlb.h
@@ -6,7 +6,7 @@ static inline void flush_tlb_local(void)
 {
     asm volatile(
         "dsb sy;"
-        "tlbi vmalle1;"
+        "tlbi vmalle1is;"
         "dsb sy;"
         "isb;"
         : : : "memory");
@@ -17,7 +17,7 @@ static inline void flush_tlb_all_local(void)
 {
     asm volatile(
         "dsb sy;"
-        "tlbi alle1;"
+        "tlbi alle1is;"
         "dsb sy;"
         "isb;"
         : : : "memory");
diff --git a/xen/include/asm-arm/arm64/page.h b/xen/include/asm-arm/arm64/page.h
index bd48fe3..28748d3 100644
--- a/xen/include/asm-arm/arm64/page.h
+++ b/xen/include/asm-arm/arm64/page.h
@@ -33,7 +33,7 @@ static inline void flush_xen_text_tlb(void)
     asm volatile (
         "isb;"          /* Ensure synchronization with previous changes to text */
         "tlbi alle2;"   /* Flush hypervisor TLB */
-        "ic iallu;"     /* Flush I-cache */
+        "ic ialluis;"   /* Flush I-cache */
         "dsb sy;"       /* Ensure completion of TLB flush */
         "isb;"
         : : : "memory");
@@ -47,7 +47,7 @@ static inline void flush_xen_data_tlb(void)
 {
     asm volatile (
         "dsb sy;"       /* Ensure visibility of PTE writes */
-        "tlbi alle2;"   /* Flush hypervisor TLB */
+        "tlbi alle2is;" /* Flush hypervisor TLB */
         "dsb sy;"       /* Ensure completion of TLB flush */
*/ "isb;" : : : "memory"); @@ -62,7 +62,7 @@ static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long s unsigned long end = va + size; dsb(); /* Ensure preceding are visible */ while ( va < end ) { - asm volatile("tlbi vae2, %0;" + asm volatile("tlbi vae2is, %0;" : : "r" (va>>PAGE_SHIFT) : "memory"); va += PAGE_SIZE; } -- 1.7.2.5
Ian Campbell
2013-Jun-28 16:10 UTC
[PATCH 04/10] xen: arm: consolidate barrier definitions
These are effectively identical on both 32- and 64-bit.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/asm-arm/arm32/system.h |   16 ----------------
 xen/include/asm-arm/arm64/system.h |   17 -----------------
 xen/include/asm-arm/system.h       |   16 ++++++++++++++++
 3 files changed, 16 insertions(+), 33 deletions(-)

diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-arm/arm32/system.h
index b3736f4..9f233fe 100644
--- a/xen/include/asm-arm/arm32/system.h
+++ b/xen/include/asm-arm/arm32/system.h
@@ -2,22 +2,6 @@
 #ifndef __ASM_ARM32_SYSTEM_H
 #define __ASM_ARM32_SYSTEM_H

-#define sev() __asm__ __volatile__ ("sev" : : : "memory")
-#define wfe() __asm__ __volatile__ ("wfe" : : : "memory")
-#define wfi() __asm__ __volatile__ ("wfi" : : : "memory")
-
-#define isb() __asm__ __volatile__ ("isb" : : : "memory")
-#define dsb() __asm__ __volatile__ ("dsb sy" : : : "memory")
-#define dmb() __asm__ __volatile__ ("dmb sy" : : : "memory")
-
-#define mb()            dsb()
-#define rmb()           dsb()
-#define wmb()           mb()
-
-#define smp_mb()        mb()
-#define smp_rmb()       rmb()
-#define smp_wmb()       wmb()
-
 extern void __bad_xchg(volatile void *, int);

 static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
diff --git a/xen/include/asm-arm/arm64/system.h b/xen/include/asm-arm/arm64/system.h
index 4e41913..46f901c 100644
--- a/xen/include/asm-arm/arm64/system.h
+++ b/xen/include/asm-arm/arm64/system.h
@@ -2,23 +2,6 @@
 #ifndef __ASM_ARM64_SYSTEM_H
 #define __ASM_ARM64_SYSTEM_H

-#define sev()           asm volatile("sev" : : : "memory")
-#define wfe()           asm volatile("wfe" : : : "memory")
-#define wfi()           asm volatile("wfi" : : : "memory")
-
-#define isb()           asm volatile("isb" : : : "memory")
-#define dsb()           asm volatile("dsb sy" : : : "memory")
-#define dmb()           asm volatile("dmb sy" : : : "memory")
-
-#define mb()            dsb()
-#define rmb()           dsb()
-#define wmb()           mb()
-
-#define smp_mb()        mb()
-#define smp_rmb()       rmb()
-#define smp_wmb()       wmb()
-
-
 extern void __bad_xchg(volatile void *, int);

 static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
index 290d38d..e003624 100644
--- a/xen/include/asm-arm/system.h
+++ b/xen/include/asm-arm/system.h
@@ -8,6 +8,22 @@
 #define nop() \
     asm volatile ( "nop" )

+#define sev()           asm volatile("sev" : : : "memory")
+#define wfe()           asm volatile("wfe" : : : "memory")
+#define wfi()           asm volatile("wfi" : : : "memory")
+
+#define isb()           asm volatile("isb" : : : "memory")
+#define dsb()           asm volatile("dsb sy" : : : "memory")
+#define dmb()           asm volatile("dmb sy" : : : "memory")
+
+#define mb()            dsb()
+#define rmb()           dsb()
+#define wmb()           mb()
+
+#define smp_mb()        mb()
+#define smp_rmb()       rmb()
+#define smp_wmb()       wmb()
+
 #define xchg(ptr,x) \
         ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
--
1.7.2.5
Ian Campbell
2013-Jun-28 16:10 UTC
[PATCH 05/10] xen: use SMP barrier in common code dealing with shared memory protocols
Xen currently makes no strong distinction between the SMP barriers
(smp_mb etc) and the regular barriers (mb etc). In Linux, where we
inherited these names from having imported Linux code which uses them,
the SMP barriers are intended to be sufficient for implementing
shared-memory protocols between processors in an SMP system, while the
standard barriers are useful for MMIO etc.

On x86, with its stronger ordering model, there is not much practical
difference here, but ARM has weaker barriers available which are suitable
for use as SMP barriers.

Therefore ensure that common code uses the SMP barriers when that is all
which is required.

On both ARM and x86 both types of barrier are currently identical so
there is no actual change. A future patch will change smp_mb to a weaker
barrier on ARM.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: jbeuich@suse.com
Cc: keir@xen.org
---
I'm not convinced some of these mb() couldn't actually be wmb(), but I
didn't think too hard about this or make any changes along those lines.
---
 xen/common/domain.c        |    2 +-
 xen/common/domctl.c        |    2 +-
 xen/common/grant_table.c   |    4 ++--
 xen/common/page_alloc.c    |    2 +-
 xen/common/smp.c           |    4 ++--
 xen/common/spinlock.c      |    6 +++---
 xen/common/tmem_xen.c      |   10 +++++-----
 xen/common/trace.c         |    8 ++++----
 xen/drivers/char/console.c |    2 +-
 xen/include/xen/event.h    |    4 ++--
 xen/xsm/flask/ss/sidtab.c  |    4 ++--
 11 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index fac3470..1380ea9 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -932,7 +932,7 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset)
     v->vcpu_info_mfn = page_to_mfn(page);

     /* Set new vcpu_info pointer /before/ setting pending flags. */
-    wmb();
+    smp_wmb();

     /*
      * Mark everything as being pending just to make sure nothing gets
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 9bd8f80..c653efb 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -533,7 +533,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)

             /* Install vcpu array /then/ update max_vcpus. */
             d->vcpu = vcpus;
-            wmb();
+            smp_wmb();
             d->max_vcpus = max;
         }

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 3f97328..eb50288 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -426,7 +426,7 @@ static int _set_status_v2(domid_t domid,

     /* Make sure guest sees status update before checking if flags are
        still valid */
-    mb();
+    smp_mb();

     scombo.word = *(u32 *)shah;
     barrier();
@@ -1670,7 +1670,7 @@ gnttab_transfer(
             guest_physmap_add_page(e, sha->full_page.frame, mfn, 0);
             sha->full_page.frame = mfn;
         }
-        wmb();
+        smp_wmb();
         shared_entry_header(e->grant_table, gop.ref)->flags |=
             GTF_transfer_completed;

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 2162ef1..25a7d3d 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1472,7 +1472,7 @@ int assign_pages(
         ASSERT(page_get_owner(&pg[i]) == NULL);
         ASSERT((pg[i].count_info & ~(PGC_allocated | 1)) == 0);
         page_set_owner(&pg[i], d);
-        wmb(); /* Domain pointer must be visible before updating refcnt. */
*/ pg[i].count_info = PGC_allocated | 1; page_list_add_tail(&pg[i], &d->page_list); } diff --git a/xen/common/smp.c b/xen/common/smp.c index b2b056b..482a203 100644 --- a/xen/common/smp.c +++ b/xen/common/smp.c @@ -89,12 +89,12 @@ void smp_call_function_interrupt(void) if ( call_data.wait ) { (*func)(info); - mb(); + smp_mb(); cpumask_clear_cpu(cpu, &call_data.selected); } else { - mb(); + smp_mb(); cpumask_clear_cpu(cpu, &call_data.selected); (*func)(info); } diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c index bfb9670..575cc6d 100644 --- a/xen/common/spinlock.c +++ b/xen/common/spinlock.c @@ -218,7 +218,7 @@ void _spin_barrier(spinlock_t *lock) u64 loop = 0; check_barrier(&lock->debug); - do { mb(); loop++;} while ( _raw_spin_is_locked(&lock->raw) ); + do { smp_mb(); loop++;} while ( _raw_spin_is_locked(&lock->raw) ); if ((loop > 1) && lock->profile) { lock->profile->time_block += NOW() - block; @@ -226,9 +226,9 @@ void _spin_barrier(spinlock_t *lock) } #else check_barrier(&lock->debug); - do { mb(); } while ( _raw_spin_is_locked(&lock->raw) ); + do { smp_mb(); } while ( _raw_spin_is_locked(&lock->raw) ); #endif - mb(); + smp_mb(); } int _spin_trylock_recursive(spinlock_t *lock) diff --git a/xen/common/tmem_xen.c b/xen/common/tmem_xen.c index 736a8c3..54ec09f 100644 --- a/xen/common/tmem_xen.c +++ b/xen/common/tmem_xen.c @@ -173,7 +173,7 @@ EXPORT int tmh_copy_from_client(pfp_t *pfp, return -EFAULT; } } - mb(); + smp_mb(); if ( len == PAGE_SIZE && !tmem_offset && !pfn_offset && cli_va ) tmh_copy_page(tmem_va, cli_va); else if ( (tmem_offset+len <= PAGE_SIZE) && @@ -216,7 +216,7 @@ EXPORT int tmh_compress_from_client(tmem_cli_mfn_t cmfn, return 0; else if ( copy_from_guest(scratch, clibuf, PAGE_SIZE) ) return -EFAULT; - mb(); + smp_mb(); ret = lzo1x_1_compress(cli_va ?: scratch, PAGE_SIZE, dmem, out_len, wmem); ASSERT(ret == LZO_E_OK); *out_va = dmem; @@ -260,7 +260,7 @@ EXPORT int tmh_copy_to_client(tmem_cli_mfn_t cmfn, pfp_t *pfp, unmap_domain_page(tmem_va); if ( cli_va ) cli_put_page(cli_va, cli_pfp, cli_mfn, 1); - mb(); + smp_mb(); return rc; } @@ -289,7 +289,7 @@ EXPORT int tmh_decompress_to_client(tmem_cli_mfn_t cmfn, void *tmem_va, cli_put_page(cli_va, cli_pfp, cli_mfn, 1); else if ( copy_to_guest(clibuf, scratch, PAGE_SIZE) ) return -EFAULT; - mb(); + smp_mb(); return 1; } @@ -311,7 +311,7 @@ EXPORT int tmh_copy_tze_to_client(tmem_cli_mfn_t cmfn, void *tmem_va, if ( len < PAGE_SIZE ) memset((char *)cli_va+len,0,PAGE_SIZE-len); cli_put_page(cli_va, cli_pfp, cli_mfn, 1); - mb(); + smp_mb(); return 1; } diff --git a/xen/common/trace.c b/xen/common/trace.c index fd4ac48..63ea0b7 100644 --- a/xen/common/trace.c +++ b/xen/common/trace.c @@ -255,7 +255,7 @@ static int alloc_trace_bufs(unsigned int pages) opt_tbuf_size = pages; printk("xentrace: initialised\n"); - wmb(); /* above must be visible before tb_init_done flag set */ + smp_wmb(); /* above must be visible before tb_init_done flag set */ tb_init_done = 1; return 0; @@ -414,7 +414,7 @@ int tb_control(xen_sysctl_tbuf_op_t *tbc) int i; tb_init_done = 0; - wmb(); + smp_wmb(); /* Clear any lost-record info so we don''t get phantom lost records next time we * start tracing. Grab the lock to make sure we''re not racing anyone. After this * hypercall returns, no more records should be placed into the buffers. 
@@ -607,7 +607,7 @@ static inline void __insert_record(struct t_buf *buf,
         memcpy(next_page, (char *)rec + remaining, rec_size - remaining);
     }

-    wmb();
+    smp_wmb();

     next += rec_size;
     if ( next >= 2*data_size )
@@ -718,7 +718,7 @@ void __trace_var(u32 event, bool_t cycles, unsigned int extra,
         return;

     /* Read tb_init_done /before/ t_bufs. */
-    rmb();
+    smp_rmb();

     spin_lock_irqsave(&this_cpu(t_lock), flags);

diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 7cd7bf6..b696b3e 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -648,7 +648,7 @@ void __init console_init_postirq(void)
     for ( i = conringc ; i != conringp; i++ )
         ring[i & (opt_conring_size - 1)] = conring[i & (conring_size - 1)];
     conring = ring;
-    wmb(); /* Allow users of console_force_unlock() to see larger buffer. */
+    smp_wmb(); /* Allow users of console_force_unlock() to see larger buffer. */
     conring_size = opt_conring_size;
     spin_unlock_irq(&console_lock);

diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 4ac39ad..6f60162 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -85,7 +85,7 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
         if ( condition )                                \
             break;                                      \
         set_bit(_VPF_blocked_in_xen, &current->pause_flags); \
-        mb(); /* set blocked status /then/ re-evaluate condition */ \
+        smp_mb(); /* set blocked status /then/ re-evaluate condition */ \
         if ( condition )                                \
         {                                               \
             clear_bit(_VPF_blocked_in_xen, &current->pause_flags); \
@@ -99,7 +99,7 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
     do {                                                \
         set_bit(_VPF_blocked_in_xen, &current->pause_flags); \
         raise_softirq(SCHEDULE_SOFTIRQ);                \
-        mb(); /* set blocked status /then/ caller does his work */ \
+        smp_mb(); /* set blocked status /then/ caller does his work */ \
     } while ( 0 )

 #endif /* __XEN_EVENT_H__ */
diff --git a/xen/xsm/flask/ss/sidtab.c b/xen/xsm/flask/ss/sidtab.c
index 586033c..cd1360c 100644
--- a/xen/xsm/flask/ss/sidtab.c
+++ b/xen/xsm/flask/ss/sidtab.c
@@ -79,13 +79,13 @@ int sidtab_insert(struct sidtab *s, u32 sid, struct context *context)
     if ( prev )
     {
         newnode->next = prev->next;
-        wmb();
+        smp_wmb();
         prev->next = newnode;
     }
     else
     {
         newnode->next = s->htable[hvalue];
-        wmb();
+        smp_wmb();
         s->htable[hvalue] = newnode;
     }
--
1.7.2.5
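The pairing these call sites rely on is the usual publish/consume
pattern; a minimal sketch ('data' and 'ready' are illustrative names, not
Xen variables):

    static int data;
    static volatile int ready;

    void publish(void)          /* e.g. alloc_trace_bufs() above */
    {
        data = 42;
        smp_wmb();  /* make data visible before the flag that guards it */
        ready = 1;
    }

    int consume(void)           /* e.g. __trace_var() above */
    {
        if ( !ready )
            return -1;
        smp_rmb();  /* read the flag before the data it guards */
        return data;
    }

Since only other processors ever observe these writes, a CPU-only (SMP)
barrier is sufficient; nothing here needs to be ordered with respect to a
device.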
Ian Campbell
2013-Jun-28 16:10 UTC
[PATCH 06/10] xen: arm: Use SMP barriers when that is all which is required.
As explained in the previous commit, SMP barriers can be used when all we
care about is synchronising against other processors.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/mm.c      |    2 +-
 xen/arch/arm/smpboot.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index c5213f2..3f049cb 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -776,7 +776,7 @@ void share_xen_page_with_guest(struct page_info *page,
     page->u.inuse.type_info |= PGT_validated | 1;

     page_set_owner(page, d);
-    wmb(); /* install valid domain ptr before updating refcnt. */
+    smp_wmb(); /* install valid domain ptr before updating refcnt. */
     ASSERT((page->count_info & ~PGC_xen_heap) == 0);

     /* Only add to the allocation list if the domain isn't dying. */
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 8011987..727e09f 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -170,11 +170,11 @@ void __cpuinit start_secondary(unsigned long boot_phys_offset,

     /* Run local notifiers */
     notify_cpu_starting(cpuid);
-    wmb();
+    smp_wmb();

     /* Now report this CPU is up */
     cpumask_set_cpu(cpuid, &cpu_online_map);
-    wmb();
+    smp_wmb();

     local_irq_enable();
--
1.7.2.5
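For instance, the smpboot change pairs with the boot CPU's wait for the
secondary, roughly like this (a sketch, not the literal Xen code):

    /* Boot CPU side: only CPUs observe cpu_online_map, so SMP barriers
     * are sufficient on both ends of the handshake. */
    static void wait_for_cpu_up(unsigned int cpuid)
    {
        while ( !cpumask_test_cpu(cpuid, &cpu_online_map) )
            cpu_relax();
        smp_rmb(); /* see the online bit before state the CPU published */
    }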
Ian Campbell
2013-Jun-28 16:10 UTC
[PATCH 07/10] xen: arm: Use dmb for smp barriers
The full power of dsb is not required in this context. Also change wmb()
to be dsb() directly instead of indirectly via mb(), for clarity.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/asm-arm/system.h |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
index e003624..89c61ef 100644
--- a/xen/include/asm-arm/system.h
+++ b/xen/include/asm-arm/system.h
@@ -18,11 +18,11 @@

 #define mb()            dsb()
 #define rmb()           dsb()
-#define wmb()           mb()
+#define wmb()           dsb()

-#define smp_mb()        mb()
-#define smp_rmb()       rmb()
-#define smp_wmb()       wmb()
+#define smp_mb()        dmb()
+#define smp_rmb()       dmb()
+#define smp_wmb()       dmb()

 #define xchg(ptr,x) \
         ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
--
1.7.2.5
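The distinction being exploited: dmb only orders memory accesses relative
to other observers, while dsb additionally waits for completion, which
matters around things like TLB maintenance but not for plain
shared-memory ordering. A hedged illustration (the struct and function
names are made up for the example):

    struct mailbox { int payload; volatile int flag; };

    void order_only(struct mailbox *m, int value)
    {
        m->payload = value;
        dmb();  /* ordering is enough: CPUs must see payload before flag */
        m->flag = 1;
    }

    void completion_needed(lpae_t *p, lpae_t pte)
    {
        write_pte(p, pte);
        dsb();  /* completion required before the TLB flush is issued */
        flush_xen_data_tlb();
    }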
Ian Campbell
2013-Jun-28 16:10 UTC
[PATCH 08/10] xen: arm: add scope to dsb and dmb macros
Everywhere currently passes "sy"stem, so no actual change.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/domain.c                |    2 +-
 xen/arch/arm/gic.c                   |    8 ++++----
 xen/arch/arm/mm.c                    |    2 +-
 xen/arch/arm/platforms/vexpress.c    |    6 +++---
 xen/arch/arm/smpboot.c               |    2 +-
 xen/arch/arm/time.c                  |    2 +-
 xen/drivers/video/arm_hdlcd.c        |    2 +-
 xen/include/asm-arm/arm32/flushtlb.h |    8 ++++----
 xen/include/asm-arm/arm32/io.h       |    4 ++--
 xen/include/asm-arm/arm32/page.h     |    4 ++--
 xen/include/asm-arm/arm64/io.h       |    4 ++--
 xen/include/asm-arm/arm64/page.h     |    4 ++--
 xen/include/asm-arm/page.h           |    4 ++--
 xen/include/asm-arm/system.h         |   16 ++++++++--------
 14 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 4c434a1..0bd8f6b 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -44,7 +44,7 @@ void idle_loop(void)
         local_irq_disable();
         if ( cpu_is_haltable(smp_processor_id()) )
         {
-            dsb();
+            dsb("sy");
             wfi();
         }
         local_irq_enable();
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 177560e..42095ee 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -432,7 +432,7 @@ void send_SGI_mask(const cpumask_t *cpumask, enum gic_sgi sgi)
     ASSERT(mask < 0x100); /* The target bitmap only supports 8 CPUs */

-    dsb();
+    dsb("sy");

     GICD[GICD_SGIR] = GICD_SGI_TARGET_LIST
         | (mask<<GICD_SGI_TARGET_SHIFT)
@@ -449,7 +449,7 @@ void send_SGI_self(enum gic_sgi sgi)
 {
     ASSERT(sgi < 16); /* There are only 16 SGIs */

-    dsb();
+    dsb("sy");

     GICD[GICD_SGIR] = GICD_SGI_TARGET_SELF
         | sgi;
@@ -459,7 +459,7 @@ void send_SGI_allbutself(enum gic_sgi sgi)
 {
     ASSERT(sgi < 16); /* There are only 16 SGIs */

-    dsb();
+    dsb("sy");

     GICD[GICD_SGIR] = GICD_SGI_TARGET_OTHERS
         | sgi;
@@ -546,7 +546,7 @@ static int __setup_irq(struct irq_desc *desc, unsigned int irq,
     desc->action  = new;
     desc->status &= ~IRQ_DISABLED;
-    dsb();
+    dsb("sy");

     desc->handler->startup(desc);

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 3f049cb..b287a9b 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -324,7 +324,7 @@ void __cpuinit setup_virt_paging(void)
 #define WRITE_TTBR(ttbr) \
     flush_xen_text_tlb(); \
     WRITE_SYSREG64(ttbr, TTBR0_EL2); \
-    dsb(); /* ensure memory accesses do not cross over the TTBR0 write */ \
+    dsb("sy"); /* ensure memory accesses do not cross over the TTBR0 write */ \
    /* flush_xen_text_tlb contains an initial isb which ensures the \
     * write to TTBR0 has completed. */ \
    flush_xen_text_tlb()
diff --git a/xen/arch/arm/platforms/vexpress.c b/xen/arch/arm/platforms/vexpress.c
index 8fc30c4..6f6869e 100644
--- a/xen/arch/arm/platforms/vexpress.c
+++ b/xen/arch/arm/platforms/vexpress.c
@@ -46,7 +46,7 @@ static inline int vexpress_ctrl_start(uint32_t *syscfg, int write,
     /* wait for complete flag to be set */
     do {
         stat = syscfg[V2M_SYS_CFGSTAT/4];
-        dsb();
+        dsb("sy");
     } while ( !(stat & V2M_SYS_CFG_COMPLETE) );

     /* check error status and return error flag if set */
@@ -111,10 +111,10 @@ static void vexpress_reset(void)

     /* switch to slow mode */
     iowritel(sp810, 0x3);
-    dsb(); isb();
+    dsb("sy"); isb();

     /* writing any value to SCSYSSTAT reg will reset the system */
     iowritel(sp810 + 4, 0x1);
-    dsb(); isb();
+    dsb("sy"); isb();

     iounmap(sp810);
 }
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 727e09f..b88355f 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -211,7 +211,7 @@ void stop_cpu(void)
     local_irq_disable();
     cpu_is_dead = 1;
     /* Make sure the write happens before we sleep forever */
-    dsb();
+    dsb("sy");
     isb();
     while ( 1 )
         wfi();
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index 4ed7882..2c254f4 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -252,7 +252,7 @@ void udelay(unsigned long usecs)
     s_time_t deadline = get_s_time() + 1000 * (s_time_t) usecs;
     while ( get_s_time() - deadline < 0 )
         ;
-    dsb();
+    dsb("sy");
     isb();
 }

diff --git a/xen/drivers/video/arm_hdlcd.c b/xen/drivers/video/arm_hdlcd.c
index 72979ea..5c09b0e 100644
--- a/xen/drivers/video/arm_hdlcd.c
+++ b/xen/drivers/video/arm_hdlcd.c
@@ -77,7 +77,7 @@ void (*video_puts)(const char *) = vga_noop_puts;

 static void hdlcd_flush(void)
 {
-    dsb();
+    dsb("sy");
 }

 static int __init get_color_masks(const char* bpp, struct color_masks **masks)
diff --git a/xen/include/asm-arm/arm32/flushtlb.h b/xen/include/asm-arm/arm32/flushtlb.h
index 14e8827..2776375 100644
--- a/xen/include/asm-arm/arm32/flushtlb.h
+++ b/xen/include/asm-arm/arm32/flushtlb.h
@@ -4,22 +4,22 @@
 /* Flush local TLBs, current VMID only */
 static inline void flush_tlb_local(void)
 {
-    dsb();
+    dsb("sy");

     WRITE_CP32((uint32_t) 0, TLBIALLIS);

-    dsb();
+    dsb("sy");
     isb();
 }

 /* Flush local TLBs, all VMIDs, non-hypervisor mode */
 static inline void flush_tlb_all_local(void)
 {
-    dsb();
+    dsb("sy");

     WRITE_CP32((uint32_t) 0, TLBIALLNSNHIS);

-    dsb();
+    dsb("sy");
     isb();
 }
diff --git a/xen/include/asm-arm/arm32/io.h b/xen/include/asm-arm/arm32/io.h
index ec7e0ff..cb0bc96 100644
--- a/xen/include/asm-arm/arm32/io.h
+++ b/xen/include/asm-arm/arm32/io.h
@@ -30,14 +30,14 @@ static inline uint32_t ioreadl(const volatile void __iomem *addr)
     asm volatile("ldr %1, %0"
                  : "+Qo" (*(volatile uint32_t __force *)addr),
                    "=r" (val));
-    dsb();
+    dsb("sy");

     return val;
 }

 static inline void iowritel(const volatile void __iomem *addr, uint32_t val)
 {
-    dsb();
+    dsb("sy");
     asm volatile("str %1, %0"
                  : "+Qo" (*(volatile uint32_t __force *)addr)
                  : "r" (val));
diff --git a/xen/include/asm-arm/arm32/page.h b/xen/include/asm-arm/arm32/page.h
index e573502..f8dfbd3 100644
--- a/xen/include/asm-arm/arm32/page.h
+++ b/xen/include/asm-arm/arm32/page.h
@@ -67,13 +67,13 @@ static inline void flush_xen_data_tlb(void)
 static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long size)
 {
     unsigned long end = va + size;
-    dsb(); /* Ensure preceding are visible */
+    dsb("sy"); /* Ensure preceding are visible */
     while ( va < end ) {
         asm volatile(STORE_CP32(0, TLBIMVAHIS)
                      : : "r" (va) : "memory");
         va += PAGE_SIZE;
     }
-    dsb(); /* Ensure completion of the TLB flush */
+    dsb("sy"); /* Ensure completion of the TLB flush */
     isb();
 }
diff --git a/xen/include/asm-arm/arm64/io.h b/xen/include/asm-arm/arm64/io.h
index ec041cd..0a100ad 100644
--- a/xen/include/asm-arm/arm64/io.h
+++ b/xen/include/asm-arm/arm64/io.h
@@ -24,14 +24,14 @@ static inline uint32_t ioreadl(const volatile void __iomem *addr)
     uint32_t val;

     asm volatile("ldr %w0, [%1]" : "=r" (val) : "r" (addr));
-    dsb();
+    dsb("sy");

     return val;
 }

 static inline void iowritel(const volatile void __iomem *addr, uint32_t val)
 {
-    dsb();
+    dsb("sy");
     asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
 }

diff --git a/xen/include/asm-arm/arm64/page.h b/xen/include/asm-arm/arm64/page.h
index 28748d3..aca1590 100644
--- a/xen/include/asm-arm/arm64/page.h
+++ b/xen/include/asm-arm/arm64/page.h
@@ -60,13 +60,13 @@ static inline void flush_xen_data_tlb(void)
 static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long size)
 {
     unsigned long end = va + size;
-    dsb(); /* Ensure preceding are visible */
+    dsb("sy"); /* Ensure preceding are visible */
     while ( va < end ) {
         asm volatile("tlbi vae2is, %0;"
                      : : "r" (va>>PAGE_SHIFT) : "memory");
         va += PAGE_SIZE;
     }
-    dsb(); /* Ensure completion of the TLB flush */
+    dsb("sy"); /* Ensure completion of the TLB flush */
     isb();
 }

diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index cd38956..eb007ac 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -284,10 +284,10 @@ extern size_t cacheline_bytes;
 static inline void flush_xen_dcache_va_range(void *p, unsigned long size)
 {
     void *end;
-    dsb();           /* So the CPU issues all writes to the range */
+    dsb("sy");       /* So the CPU issues all writes to the range */
     for ( end = p + size; p < end; p += cacheline_bytes )
         asm volatile (__flush_xen_dcache_one(0) : : "r" (p));
-    dsb();           /* So we know the flushes happen before continuing */
+    dsb("sy");       /* So we know the flushes happen before continuing */
 }

 /* Macro for flushing a single small item.  The predicate is always
diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
index 89c61ef..68efba9 100644
--- a/xen/include/asm-arm/system.h
+++ b/xen/include/asm-arm/system.h
@@ -13,16 +13,16 @@
 #define wfi()           asm volatile("wfi" : : : "memory")

 #define isb()           asm volatile("isb" : : : "memory")
-#define dsb()           asm volatile("dsb sy" : : : "memory")
-#define dmb()           asm volatile("dmb sy" : : : "memory")
+#define dsb(scope)      asm volatile("dsb " scope : : : "memory")
+#define dmb(scope)      asm volatile("dmb " scope : : : "memory")

-#define mb()            dsb()
-#define rmb()           dsb()
-#define wmb()           dsb()
+#define mb()            dsb("sy")
+#define rmb()           dsb("sy")
+#define wmb()           dsb("sy")

-#define smp_mb()        dmb()
-#define smp_rmb()       dmb()
-#define smp_wmb()       dmb()
+#define smp_mb()        dmb("sy")
+#define smp_rmb()       dmb("sy")
+#define smp_wmb()       dmb("sy")

 #define xchg(ptr,x) \
         ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
--
1.7.2.5
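With the scope now explicit at every call site, later patches can relax
individual callers. The two expansions, written out (illustrative):

    dsb("sy");  /* -> asm volatile("dsb sy"  : : : "memory"): full system */
    dmb("ish"); /* -> asm volatile("dmb ish" : : : "memory"): inner shareable */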
Ian Campbell
2013-Jun-28 16:10 UTC
[PATCH 09/10] xen: arm: weaken SMP barriers to inner shareable.
Since all processors are in the inner-shareable domain and we map
everything that way, this is sufficient.

The non-SMP barriers remain full system. Although in principle they could
become outer-shareable barriers for some hardware, this would require us
to know which class a given device is. Given the small number of device
drivers in Xen itself it's probably not worth worrying over, although
maybe someone will benchmark at some point.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/asm-arm/system.h |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
index 68efba9..7c3e42d 100644
--- a/xen/include/asm-arm/system.h
+++ b/xen/include/asm-arm/system.h
@@ -20,9 +20,9 @@
 #define rmb()           dsb("sy")
 #define wmb()           dsb("sy")

-#define smp_mb()        dmb("sy")
-#define smp_rmb()       dmb("sy")
-#define smp_wmb()       dmb("sy")
+#define smp_mb()        dmb("ish")
+#define smp_rmb()       dmb("ish")
+#define smp_wmb()       dmb("ish")

 #define xchg(ptr,x) \
         ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
--
1.7.2.5
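For contrast, a hedged sketch of the two situations after this patch (all
names are made up for the example): CPU-to-CPU signalling can use the
inner-shareable barrier, while device-facing code must keep the
full-system one:

    /* Illustrative only: 'data', 'flag' and 'doorbell' are not Xen names. */
    void publish_to_cpu(int *data, volatile int *flag)
    {
        *data = 42;
        smp_wmb();  /* dmb ish: only other CPUs need to see this ordering */
        *flag = 1;
    }

    void publish_to_device(int *desc, void __iomem *doorbell)
    {
        *desc = 42;
        wmb();      /* still full system: the device must see the ordering */
        iowritel(doorbell, 1);
    }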
Ian Campbell
2013-Jun-28 16:10 UTC
[PATCH 10/10] xen: arm: use more specific barriers for read and write barriers.
Note that 32-bit does not provide a load variant of the inner-shareable
barrier, so on 32-bit smp_rmb() remains an any-any inner-shareable
barrier.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/include/asm-arm/system.h |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
index 7c3e42d..efaf645 100644
--- a/xen/include/asm-arm/system.h
+++ b/xen/include/asm-arm/system.h
@@ -17,12 +17,17 @@
 #define dmb(scope)      asm volatile("dmb " scope : : : "memory")

 #define mb()            dsb("sy")
-#define rmb()           dsb("sy")
-#define wmb()           dsb("sy")
+#define rmb()           dsb("ld")
+#define wmb()           dsb("st")

 #define smp_mb()        dmb("ish")
-#define smp_rmb()       dmb("ish")
-#define smp_wmb()       dmb("ish")
+#ifdef CONFIG_ARM_64
+#define smp_rmb()       dmb("ishld")
+#else
+#define smp_rmb()       dmb("ish") /* 32-bit has no ishld variant. */
+#endif
+
+#define smp_wmb()       dmb("ishst")

 #define xchg(ptr,x) \
         ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
--
1.7.2.5
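Putting the whole series together, the barrier macros now map to
instructions as follows (a summary read off the patches, not taken from
any single header):

    mb()       ->  dsb sy     /* any-any, full system */
    rmb()      ->  dsb ld     /* load barrier, full system */
    wmb()      ->  dsb st     /* store-store, full system */
    smp_mb()   ->  dmb ish    /* any-any, inner shareable */
    smp_rmb()  ->  dmb ishld  /* arm64; arm32 keeps dmb ish */
    smp_wmb()  ->  dmb ishst  /* store-store, inner shareable */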
Ian Campbell
2013-Jun-28 16:11 UTC
Re: [PATCH 00/10] xen: arm: map normal memory as inner shareable, reduce scope of various barriers
I forgot to add an "RFC" to the subject line. I'm sure I don't need to
point out that this isn't for 4.3!

On Fri, 2013-06-28 at 17:10 +0100, Ian Campbell wrote:
> [full cover letter and diffstat quoted]
Ian Campbell
2013-Jun-28 16:15 UTC
Re: [PATCH 05/10] xen: use SMP barrier in common code dealing with shared memory protocols
On Fri, 2013-06-28 at 17:10 +0100, Ian Campbell wrote:
> [...]
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: jbeuich@suse.com

I knew something was up here, and double checked "eu" vs "ue", but failed
to notice the entire missing letter. Sorry Jan. CC line corrected.

> Cc: keir@xen.org
> [rest of the patch quoted in full]
Keir Fraser
2013-Jun-28 16:20 UTC
Re: [PATCH 05/10] xen: use SMP barrier in common code dealing with shared memory protocols
On 28/06/2013 17:10, "Ian Campbell" <ian.campbell@citrix.com> wrote:

> Xen currently makes no strong distinction between the SMP barriers (smp_mb
> etc) and the regular barrier (mb etc). In Linux, where we inherited these
> names from having imported Linux code which uses them, the SMP barriers are
> intended to be sufficient for implementing shared-memory protocols between
> processors in an SMP system while the standard barriers are useful for MMIO
> etc.
>
> On x86 with the stronger ordering model there is not much practical difference
> here but ARM has weaker barriers available which are suitable for use as SMP
> barriers.
>
> Therefore ensure that common code uses the SMP barriers when that is all which
> is required.
>
> On both ARM and x86 both types of barrier are currently identical so there is
> no actual change. A future patch will change smp_mb to a weaker barrier on
> ARM.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: jbeulich@suse.com
> Cc: keir@xen.org
> ---

Acked-by: Keir Fraser <keir@xen.org>

> I'm not convinced some of these mb() couldn't actually be wmb(), but I didn't
> think too hard about this or make any changes along those lines.

Would logically be a separate patch anyway.

 -- Keir

> [...]
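To make the distinction Keir is acking concrete: the smp_* variants order plain-RAM accesses between CPUs, while the unprefixed variants are still wanted when a device must observe the ordering. A minimal sketch of the two cases (the flag, data and doorbell names are invented for illustration, not taken from the series):

    /* CPU-to-CPU publish/consume over ordinary shared RAM: smp_* is enough. */
    static unsigned int data, ready;

    void produce(void)
    {
        data = 42;
        smp_wmb();             /* order the data store before the flag store */
        ready = 1;
    }

    unsigned int consume(void)
    {
        while ( !ready )
            ;                  /* spin until the flag is seen */
        smp_rmb();             /* order the flag load before the data load */
        return data;
    }

    /* Device-visible ordering, e.g. before ringing an MMIO doorbell:
     * the full-strength barrier is still required. */
    void kick(volatile unsigned int *doorbell)
    {
        wmb();                 /* prior RAM writes visible to the device... */
        *doorbell = 1;         /* ...before the device is told to look */
    }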
Stefano Stabellini
2013-Jul-01 15:19 UTC
Re: [PATCH 06/10] xen: arm: Use SMP barriers when that is all which is required.
On Fri, 28 Jun 2013, Ian Campbell wrote:

> As explained in the previous commit SMP barriers can be used when all we care
> about is synchronising against other processors.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  xen/arch/arm/mm.c      |    2 +-
>  xen/arch/arm/smpboot.c |    4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index c5213f2..3f049cb 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -776,7 +776,7 @@ void share_xen_page_with_guest(struct page_info *page,
>      page->u.inuse.type_info |= PGT_validated | 1;
>
>      page_set_owner(page, d);
> -    wmb(); /* install valid domain ptr before updating refcnt. */
> +    smp_wmb(); /* install valid domain ptr before updating refcnt. */
>      ASSERT((page->count_info & ~PGC_xen_heap) == 0);
>
>      /* Only add to the allocation list if the domain isn't dying. */
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index 8011987..727e09f 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -170,11 +170,11 @@ void __cpuinit start_secondary(unsigned long boot_phys_offset,
>
>      /* Run local notifiers */
>      notify_cpu_starting(cpuid);
> -    wmb();
> +    smp_wmb();
>
>      /* Now report this CPU is up */
>      cpumask_set_cpu(cpuid, &cpu_online_map);
> -    wmb();
> +    smp_wmb();
>
>      local_irq_enable();

Did you miss a few mb() in smpboot.c?
Stefano Stabellini
2013-Jul-01 15:20 UTC
Re: [PATCH 07/10] xen: arm: Use dmb for smp barriers
On Fri, 28 Jun 2013, Ian Campbell wrote:

> The full power of dsb is not required in this context.
>
> Also change wmb() to be dsb() directly instead of indirectly via mb(), for
> clarity.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

>  xen/include/asm-arm/system.h |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
> index e003624..89c61ef 100644
> --- a/xen/include/asm-arm/system.h
> +++ b/xen/include/asm-arm/system.h
> @@ -18,11 +18,11 @@
>
>  #define mb()            dsb()
>  #define rmb()           dsb()
> -#define wmb()           mb()
> +#define wmb()           dsb()
>
> -#define smp_mb()        mb()
> -#define smp_rmb()       rmb()
> -#define smp_wmb()       wmb()
> +#define smp_mb()        dmb()
> +#define smp_rmb()       dmb()
> +#define smp_wmb()       dmb()
>
>  #define xchg(ptr,x) \
>          ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
> --
> 1.7.2.5
>
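The distinction the ack relies on, roughly: dmb only constrains the order in which memory accesses become visible to other observers, while dsb additionally stalls until all outstanding accesses (and, on ARM, cache/TLB maintenance) have completed. For shared-memory protocols only the ordering matters, so the cheaper instruction suffices; completion semantics are still needed around things like TLB flushes, which is why dsb remains underneath mb(). A sketch of what the two macro families now emit (macro bodies taken from the hunk above):

    smp_mb();   /* emits "dmb sy": ordering only, between observers */
    mb();       /* emits "dsb sy": ordering plus completion, system-wide */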
Stefano Stabellini
2013-Jul-01 15:21 UTC
Re: [PATCH 08/10] xen: arm: add scope to dsb and dmb macros
On Fri, 28 Jun 2013, Ian Campbell wrote:

> Everywhere currently passes "sy"stem, so no actual change.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

>  xen/arch/arm/domain.c                |    2 +-
>  xen/arch/arm/gic.c                   |    8 ++++----
>  xen/arch/arm/mm.c                    |    2 +-
>  xen/arch/arm/platforms/vexpress.c    |    6 +++---
>  xen/arch/arm/smpboot.c               |    2 +-
>  xen/arch/arm/time.c                  |    2 +-
>  xen/drivers/video/arm_hdlcd.c        |    2 +-
>  xen/include/asm-arm/arm32/flushtlb.h |    8 ++++----
>  xen/include/asm-arm/arm32/io.h       |    4 ++--
>  xen/include/asm-arm/arm32/page.h     |    4 ++--
>  xen/include/asm-arm/arm64/io.h       |    4 ++--
>  xen/include/asm-arm/arm64/page.h     |    4 ++--
>  xen/include/asm-arm/page.h           |    4 ++--
>  xen/include/asm-arm/system.h         |   16 ++++++++--------
>  14 files changed, 34 insertions(+), 34 deletions(-)
>
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 4c434a1..0bd8f6b 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -44,7 +44,7 @@ void idle_loop(void)
>          local_irq_disable();
>          if ( cpu_is_haltable(smp_processor_id()) )
>          {
> -            dsb();
> +            dsb("sy");
>              wfi();
>          }
>          local_irq_enable();
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 177560e..42095ee 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -432,7 +432,7 @@ void send_SGI_mask(const cpumask_t *cpumask, enum gic_sgi sgi)
>
>      ASSERT(mask < 0x100); /* The target bitmap only supports 8 CPUs */
>
> -    dsb();
> +    dsb("sy");
>
>      GICD[GICD_SGIR] = GICD_SGI_TARGET_LIST
>          | (mask<<GICD_SGI_TARGET_SHIFT)
> @@ -449,7 +449,7 @@ void send_SGI_self(enum gic_sgi sgi)
>  {
>      ASSERT(sgi < 16); /* There are only 16 SGIs */
>
> -    dsb();
> +    dsb("sy");
>
>      GICD[GICD_SGIR] = GICD_SGI_TARGET_SELF
>          | sgi;
> @@ -459,7 +459,7 @@ void send_SGI_allbutself(enum gic_sgi sgi)
>  {
>      ASSERT(sgi < 16); /* There are only 16 SGIs */
>
> -    dsb();
> +    dsb("sy");
>
>      GICD[GICD_SGIR] = GICD_SGI_TARGET_OTHERS
>          | sgi;
> @@ -546,7 +546,7 @@ static int __setup_irq(struct irq_desc *desc, unsigned int irq,
>
>      desc->action  = new;
>      desc->status &= ~IRQ_DISABLED;
> -    dsb();
> +    dsb("sy");
>
>      desc->handler->startup(desc);
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 3f049cb..b287a9b 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -324,7 +324,7 @@ void __cpuinit setup_virt_paging(void)
>  #define WRITE_TTBR(ttbr)                                                \
>      flush_xen_text_tlb();                                               \
>      WRITE_SYSREG64(ttbr, TTBR0_EL2);                                    \
> -    dsb(); /* ensure memory accesses do not cross over the TTBR0 write */ \
> +    dsb("sy"); /* ensure memory accesses do not cross over the TTBR0 write */ \
>      /* flush_xen_text_tlb contains an initial isb which ensures the    \
>       * write to TTBR0 has completed. */                                \
>      flush_xen_text_tlb()
> diff --git a/xen/arch/arm/platforms/vexpress.c b/xen/arch/arm/platforms/vexpress.c
> index 8fc30c4..6f6869e 100644
> --- a/xen/arch/arm/platforms/vexpress.c
> +++ b/xen/arch/arm/platforms/vexpress.c
> @@ -46,7 +46,7 @@ static inline int vexpress_ctrl_start(uint32_t *syscfg, int write,
>      /* wait for complete flag to be set */
>      do {
>          stat = syscfg[V2M_SYS_CFGSTAT/4];
> -        dsb();
> +        dsb("sy");
>      } while ( !(stat & V2M_SYS_CFG_COMPLETE) );
>
>      /* check error status and return error flag if set */
> @@ -111,10 +111,10 @@ static void vexpress_reset(void)
>
>      /* switch to slow mode */
>      iowritel(sp810, 0x3);
> -    dsb(); isb();
> +    dsb("sy"); isb();
>      /* writing any value to SCSYSSTAT reg will reset the system */
>      iowritel(sp810 + 4, 0x1);
> -    dsb(); isb();
> +    dsb("sy"); isb();
>
>      iounmap(sp810);
>  }
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index 727e09f..b88355f 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -211,7 +211,7 @@ void stop_cpu(void)
>      local_irq_disable();
>      cpu_is_dead = 1;
>      /* Make sure the write happens before we sleep forever */
> -    dsb();
> +    dsb("sy");
>      isb();
>      while ( 1 )
>          wfi();
> diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> index 4ed7882..2c254f4 100644
> --- a/xen/arch/arm/time.c
> +++ b/xen/arch/arm/time.c
> @@ -252,7 +252,7 @@ void udelay(unsigned long usecs)
>      s_time_t deadline = get_s_time() + 1000 * (s_time_t) usecs;
>      while ( get_s_time() - deadline < 0 )
>          ;
> -    dsb();
> +    dsb("sy");
>      isb();
>  }
>
> diff --git a/xen/drivers/video/arm_hdlcd.c b/xen/drivers/video/arm_hdlcd.c
> index 72979ea..5c09b0e 100644
> --- a/xen/drivers/video/arm_hdlcd.c
> +++ b/xen/drivers/video/arm_hdlcd.c
> @@ -77,7 +77,7 @@ void (*video_puts)(const char *) = vga_noop_puts;
>
>  static void hdlcd_flush(void)
>  {
> -    dsb();
> +    dsb("sy");
>  }
>
>  static int __init get_color_masks(const char* bpp, struct color_masks **masks)
> diff --git a/xen/include/asm-arm/arm32/flushtlb.h b/xen/include/asm-arm/arm32/flushtlb.h
> index 14e8827..2776375 100644
> --- a/xen/include/asm-arm/arm32/flushtlb.h
> +++ b/xen/include/asm-arm/arm32/flushtlb.h
> @@ -4,22 +4,22 @@
>  /* Flush local TLBs, current VMID only */
>  static inline void flush_tlb_local(void)
>  {
> -    dsb();
> +    dsb("sy");
>
>      WRITE_CP32((uint32_t) 0, TLBIALLIS);
>
> -    dsb();
> +    dsb("sy");
>      isb();
>  }
>
>  /* Flush local TLBs, all VMIDs, non-hypervisor mode */
>  static inline void flush_tlb_all_local(void)
>  {
> -    dsb();
> +    dsb("sy");
>
>      WRITE_CP32((uint32_t) 0, TLBIALLNSNHIS);
>
> -    dsb();
> +    dsb("sy");
>      isb();
>  }
>
> diff --git a/xen/include/asm-arm/arm32/io.h b/xen/include/asm-arm/arm32/io.h
> index ec7e0ff..cb0bc96 100644
> --- a/xen/include/asm-arm/arm32/io.h
> +++ b/xen/include/asm-arm/arm32/io.h
> @@ -30,14 +30,14 @@ static inline uint32_t ioreadl(const volatile void __iomem *addr)
>      asm volatile("ldr %1, %0"
>                   : "+Qo" (*(volatile uint32_t __force *)addr),
>                     "=r" (val));
> -    dsb();
> +    dsb("sy");
>
>      return val;
>  }
>
>  static inline void iowritel(const volatile void __iomem *addr, uint32_t val)
>  {
> -    dsb();
> +    dsb("sy");
>      asm volatile("str %1, %0"
>                   : "+Qo" (*(volatile uint32_t __force *)addr)
>                   : "r" (val));
> diff --git a/xen/include/asm-arm/arm32/page.h b/xen/include/asm-arm/arm32/page.h
> index e573502..f8dfbd3 100644
> --- a/xen/include/asm-arm/arm32/page.h
> +++ b/xen/include/asm-arm/arm32/page.h
> @@ -67,13 +67,13 @@ static inline void flush_xen_data_tlb(void)
>  static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long size)
>  {
>      unsigned long end = va + size;
> -    dsb(); /* Ensure preceding are visible */
> +    dsb("sy"); /* Ensure preceding are visible */
>      while ( va < end ) {
>          asm volatile(STORE_CP32(0, TLBIMVAHIS)
>                       : : "r" (va) : "memory");
>          va += PAGE_SIZE;
>      }
> -    dsb(); /* Ensure completion of the TLB flush */
> +    dsb("sy"); /* Ensure completion of the TLB flush */
>      isb();
>  }
>
> diff --git a/xen/include/asm-arm/arm64/io.h b/xen/include/asm-arm/arm64/io.h
> index ec041cd..0a100ad 100644
> --- a/xen/include/asm-arm/arm64/io.h
> +++ b/xen/include/asm-arm/arm64/io.h
> @@ -24,14 +24,14 @@ static inline uint32_t ioreadl(const volatile void __iomem *addr)
>      uint32_t val;
>
>      asm volatile("ldr %w0, [%1]" : "=r" (val) : "r" (addr));
> -    dsb();
> +    dsb("sy");
>
>      return val;
>  }
>
>  static inline void iowritel(const volatile void __iomem *addr, uint32_t val)
>  {
> -    dsb();
> +    dsb("sy");
>      asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
>  }
>
> diff --git a/xen/include/asm-arm/arm64/page.h b/xen/include/asm-arm/arm64/page.h
> index 28748d3..aca1590 100644
> --- a/xen/include/asm-arm/arm64/page.h
> +++ b/xen/include/asm-arm/arm64/page.h
> @@ -60,13 +60,13 @@ static inline void flush_xen_data_tlb(void)
>  static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long size)
>  {
>      unsigned long end = va + size;
> -    dsb(); /* Ensure preceding are visible */
> +    dsb("sy"); /* Ensure preceding are visible */
>      while ( va < end ) {
>          asm volatile("tlbi vae2is, %0;"
>                       : : "r" (va>>PAGE_SHIFT) : "memory");
>          va += PAGE_SIZE;
>      }
> -    dsb(); /* Ensure completion of the TLB flush */
> +    dsb("sy"); /* Ensure completion of the TLB flush */
>      isb();
>  }
>
> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> index cd38956..eb007ac 100644
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -284,10 +284,10 @@ extern size_t cacheline_bytes;
>  static inline void flush_xen_dcache_va_range(void *p, unsigned long size)
>  {
>      void *end;
> -    dsb();           /* So the CPU issues all writes to the range */
> +    dsb("sy");       /* So the CPU issues all writes to the range */
>      for ( end = p + size; p < end; p += cacheline_bytes )
>          asm volatile (__flush_xen_dcache_one(0) : : "r" (p));
> -    dsb();           /* So we know the flushes happen before continuing */
> +    dsb("sy");       /* So we know the flushes happen before continuing */
>  }
>
>  /* Macro for flushing a single small item. The predicate is always
> diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
> index 89c61ef..68efba9 100644
> --- a/xen/include/asm-arm/system.h
> +++ b/xen/include/asm-arm/system.h
> @@ -13,16 +13,16 @@
>  #define wfi()           asm volatile("wfi" : : : "memory")
>
>  #define isb()           asm volatile("isb" : : : "memory")
> -#define dsb()           asm volatile("dsb sy" : : : "memory")
> -#define dmb()           asm volatile("dmb sy" : : : "memory")
> +#define dsb(scope)      asm volatile("dsb " scope : : : "memory")
> +#define dmb(scope)      asm volatile("dmb " scope : : : "memory")
>
> -#define mb()            dsb()
> -#define rmb()           dsb()
> -#define wmb()           dsb()
> +#define mb()            dsb("sy")
> +#define rmb()           dsb("sy")
> +#define wmb()           dsb("sy")
>
> -#define smp_mb()        dmb()
> -#define smp_rmb()       dmb()
> -#define smp_wmb()       dmb()
> +#define smp_mb()        dmb("sy")
> +#define smp_rmb()       dmb("sy")
> +#define smp_wmb()       dmb("sy")
>
>  #define xchg(ptr,x) \
>          ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
> --
> 1.7.2.5
>
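One nice property of threading the scope through as a string literal: each call site now reads like the instruction it emits, and the preprocessor simply pastes the scope into the mnemonic. For example (usage sketch based on the macro definitions in the hunk above):

    dsb("sy");   /* expands to: asm volatile("dsb sy"  : : : "memory") */
    dmb("ish");  /* expands to: asm volatile("dmb ish" : : : "memory") */

Passing anything other than a string literal will not compile, which keeps the set of scopes in use plainly visible at each call site.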
Stefano Stabellini
2013-Jul-01 15:21 UTC
Re: [PATCH 09/10] xen: arm: weaken SMP barriers to inner shareable.
On Fri, 28 Jun 2013, Ian Campbell wrote:

> Since all processors are in the inner-shareable domain and we map everything
> that way this is sufficient.
>
> The non-SMP barriers remain full system. Although in principle they could
> become outer shareable barriers for some hardware this would require us to
> know which class a given device is. Given the small number of device drivers
> in Xen itself it's probably not worth worrying over, although maybe someone
> will benchmark at some point.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

>  xen/include/asm-arm/system.h |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
> index 68efba9..7c3e42d 100644
> --- a/xen/include/asm-arm/system.h
> +++ b/xen/include/asm-arm/system.h
> @@ -20,9 +20,9 @@
>  #define rmb()           dsb("sy")
>  #define wmb()           dsb("sy")
>
> -#define smp_mb()        dmb("sy")
> -#define smp_rmb()       dmb("sy")
> -#define smp_wmb()       dmb("sy")
> +#define smp_mb()        dmb("ish")
> +#define smp_rmb()       dmb("ish")
> +#define smp_wmb()       dmb("ish")
>
>  #define xchg(ptr,x) \
>          ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
> --
> 1.7.2.5
>
Stefano Stabellini
2013-Jul-01 15:22 UTC
Re: [PATCH 10/10] xen: arm: use more specific barriers for read and write barriers.
On Fri, 28 Jun 2013, Ian Campbell wrote:

> Note that 32-bit does not provide a load variant of the inner shareable
> barrier, so that remains a full any-any barrier.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

>  xen/include/asm-arm/system.h |   13 +++++++++----
>  1 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
> index 7c3e42d..efaf645 100644
> --- a/xen/include/asm-arm/system.h
> +++ b/xen/include/asm-arm/system.h
> @@ -17,12 +17,17 @@
>  #define dmb(scope)      asm volatile("dmb " scope : : : "memory")
>
>  #define mb()            dsb("sy")
> -#define rmb()           dsb("sy")
> -#define wmb()           dsb("sy")
> +#define rmb()           dsb("ld")
> +#define wmb()           dsb("st")
>
>  #define smp_mb()        dmb("ish")
> -#define smp_rmb()       dmb("ish")
> -#define smp_wmb()       dmb("ish")
> +#ifdef CONFIG_ARM_64
> +#define smp_rmb()       dmb("ishld")
> +#else
> +#define smp_rmb()       dmb("ish") /* 32-bit has no ishld variant. */
> +#endif
> +
> +#define smp_wmb()       dmb("ishst")
>
>  #define xchg(ptr,x) \
>          ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
> --
> 1.7.2.5
>
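Collating the diffs from patches 7-10, the barrier macros end up as follows (this recap just restates the hunks above in one place):

    #define mb()        dsb("sy")    /* full system, any-any */
    #define rmb()       dsb("ld")    /* full system, loads */
    #define wmb()       dsb("st")    /* full system, stores */

    #define smp_mb()    dmb("ish")   /* Inner Shareable, any-any */
    #ifdef CONFIG_ARM_64
    #define smp_rmb()   dmb("ishld") /* Inner Shareable, loads */
    #else
    #define smp_rmb()   dmb("ish")   /* 32-bit has no ishld variant */
    #endif
    #define smp_wmb()   dmb("ishst") /* Inner Shareable, stores */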
Stefano Stabellini
2013-Jul-01 15:22 UTC
Re: [PATCH 08/10] xen: arm: add scope to dsb and dmb macros
On Mon, 1 Jul 2013, Stefano Stabellini wrote:

> On Fri, 28 Jun 2013, Ian Campbell wrote:
> > Everywhere currently passes "sy"stem, so no actual change.
> >
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

sorry I meant Acked-by

> > [...]
Stefano Stabellini
2013-Jul-01 15:22 UTC
Re: [PATCH 09/10] xen: arm: weaken SMP barriers to inner shareable.
On Mon, 1 Jul 2013, Stefano Stabellini wrote:

> On Fri, 28 Jun 2013, Ian Campbell wrote:
> > Since all processors are in the inner-shareable domain and we map everything
> > that way this is sufficient.
> >
> > The non-SMP barriers remain full system. Although in principle they could
> > become outer shareable barriers for some hardware this would require us to
> > know which class a given device is. Given the small number of device drivers
> > in Xen itself it's probably not worth worrying over, although maybe someone
> > will benchmark at some point.
> >
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

same here

> > [...]
Stefano Stabellini
2013-Jul-01 15:24 UTC
Re: [PATCH 02/10] xen: arm: Only upgrade guest barriers to inner shareable.
On Fri, 28 Jun 2013, Ian Campbell wrote:

> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

>  xen/arch/arm/traps.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 398d209..e40ae2e 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -62,7 +62,7 @@ void __cpuinit init_traps(void)
>      WRITE_SYSREG((vaddr_t)hyp_traps_vector, VBAR_EL2);
>
>      /* Setup hypervisor traps */
> -    WRITE_SYSREG(HCR_PTW|HCR_BSU_OUTER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TWI|HCR_TSC, HCR_EL2);
> +    WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TWI|HCR_TSC, HCR_EL2);
>      isb();
>  }
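For readers without the ARM ARM to hand: HCR.BSU ("Barrier Shareability Upgrade") widens every barrier executed by the guest to at least the named shareability domain. A sketch of the field's encoding (bits [11:10] of HCR; shown for illustration and worth checking against Xen's processor.h rather than trusting this note):

    #define HCR_BSU_NONE    (0 << 10)  /* guest barriers not upgraded */
    #define HCR_BSU_INNER   (1 << 10)  /* upgrade to at least Inner Shareable */
    #define HCR_BSU_OUTER   (2 << 10)  /* upgrade to at least Outer Shareable */
    #define HCR_BSU_FULL    (3 << 10)  /* upgrade to full system */

With guest RAM now mapped Inner Shareable, upgrading guest barriers any further than that buys nothing, hence OUTER becomes INNER.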
Ian Campbell
2013-Jul-01 15:24 UTC
Re: [PATCH 06/10] xen: arm: Use SMP barriers when that is all which is required.
On Mon, 2013-07-01 at 16:19 +0100, Stefano Stabellini wrote:

> On Fri, 28 Jun 2013, Ian Campbell wrote:
> > As explained in the previous commit SMP barriers can be used when all we care
> > about is synchronising against other processors.
> >
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > [...]
>
> Did you miss a few mb() in smpboot.c?

The ones in __cpu_disable and __cpu_die? I think I just wasn't 100% sure they
might not be touching hardware (i.e. some platform register to shut down a CPU)
and since they weren't performance critical I punted on them.

Looking at it again the first half of that logic seems to be bogus (that code
goes nowhere near any peripheral).

Ian.
Stefano Stabellini
2013-Jul-01 15:25 UTC
Re: [PATCH 03/10] xen: arm: reduce instruction cache and tlb flushes to inner-shareable.
On Fri, 28 Jun 2013, Ian Campbell wrote:

> Now that Xen maps memory and performs pagetable walks as inner shareable we
> don't need to push updates down so far when modifying page tables etc.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

>  xen/include/asm-arm/arm32/flushtlb.h |    4 ++--
>  xen/include/asm-arm/arm32/page.h     |    8 ++++----
>  xen/include/asm-arm/arm64/flushtlb.h |    4 ++--
>  xen/include/asm-arm/arm64/page.h     |    6 +++---
>  4 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/xen/include/asm-arm/arm32/flushtlb.h b/xen/include/asm-arm/arm32/flushtlb.h
> index a258f58..14e8827 100644
> --- a/xen/include/asm-arm/arm32/flushtlb.h
> +++ b/xen/include/asm-arm/arm32/flushtlb.h
> @@ -6,7 +6,7 @@ static inline void flush_tlb_local(void)
>  {
>      dsb();
>
> -    WRITE_CP32((uint32_t) 0, TLBIALL);
> +    WRITE_CP32((uint32_t) 0, TLBIALLIS);
>
>      dsb();
>      isb();
> @@ -17,7 +17,7 @@ static inline void flush_tlb_all_local(void)
>  {
>      dsb();
>
> -    WRITE_CP32((uint32_t) 0, TLBIALLNSNH);
> +    WRITE_CP32((uint32_t) 0, TLBIALLNSNHIS);
>
>      dsb();
>      isb();
> diff --git a/xen/include/asm-arm/arm32/page.h b/xen/include/asm-arm/arm32/page.h
> index 38bcffd..e573502 100644
> --- a/xen/include/asm-arm/arm32/page.h
> +++ b/xen/include/asm-arm/arm32/page.h
> @@ -39,8 +39,8 @@ static inline void flush_xen_text_tlb(void)
>      asm volatile (
>          "isb;"                  /* Ensure synchronization with previous changes to text */
>          STORE_CP32(0, TLBIALLH) /* Flush hypervisor TLB */
> -        STORE_CP32(0, ICIALLU)  /* Flush I-cache */
> -        STORE_CP32(0, BPIALL)   /* Flush branch predictor */
> +        STORE_CP32(0, ICIALLUIS) /* Flush I-cache */
> +        STORE_CP32(0, BPIALLIS) /* Flush branch predictor */
>          "dsb;"                  /* Ensure completion of TLB+BP flush */
>          "isb;"
>          : : "r" (r0) /*dummy*/ : "memory");
> @@ -54,7 +54,7 @@ static inline void flush_xen_data_tlb(void)
>  {
>      register unsigned long r0 asm ("r0");
>      asm volatile("dsb;" /* Ensure preceding are visible */
> -                 STORE_CP32(0, TLBIALLH)
> +                 STORE_CP32(0, TLBIALLHIS)
>                   "dsb;" /* Ensure completion of the TLB flush */
>                   "isb;"
>                   : : "r" (r0) /* dummy */: "memory");
> @@ -69,7 +69,7 @@ static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long s
>      unsigned long end = va + size;
>      dsb(); /* Ensure preceding are visible */
>      while ( va < end ) {
> -        asm volatile(STORE_CP32(0, TLBIMVAH)
> +        asm volatile(STORE_CP32(0, TLBIMVAHIS)
>                       : : "r" (va) : "memory");
>          va += PAGE_SIZE;
>      }
> diff --git a/xen/include/asm-arm/arm64/flushtlb.h b/xen/include/asm-arm/arm64/flushtlb.h
> index d0535a0..3a6d2cb 100644
> --- a/xen/include/asm-arm/arm64/flushtlb.h
> +++ b/xen/include/asm-arm/arm64/flushtlb.h
> @@ -6,7 +6,7 @@ static inline void flush_tlb_local(void)
>  {
>      asm volatile(
>          "dsb sy;"
> -        "tlbi vmalle1;"
> +        "tlbi vmalle1is;"
>          "dsb sy;"
>          "isb;"
>          : : : "memory");
> @@ -17,7 +17,7 @@ static inline void flush_tlb_all_local(void)
>  {
>      asm volatile(
>          "dsb sy;"
> -        "tlbi alle1;"
> +        "tlbi alle1is;"
>          "dsb sy;"
>          "isb;"
>          : : : "memory");
> diff --git a/xen/include/asm-arm/arm64/page.h b/xen/include/asm-arm/arm64/page.h
> index bd48fe3..28748d3 100644
> --- a/xen/include/asm-arm/arm64/page.h
> +++ b/xen/include/asm-arm/arm64/page.h
> @@ -33,7 +33,7 @@ static inline void flush_xen_text_tlb(void)
>      asm volatile (
>          "isb;"         /* Ensure synchronization with previous changes to text */
>          "tlbi alle2;"  /* Flush hypervisor TLB */
> -        "ic iallu;"    /* Flush I-cache */
> +        "ic ialluis;"  /* Flush I-cache */
>          "dsb sy;"      /* Ensure completion of TLB flush */
>          "isb;"
>          : : : "memory");
> @@ -47,7 +47,7 @@ static inline void flush_xen_data_tlb(void)
>  {
>      asm volatile (
>          "dsb sy;"        /* Ensure visibility of PTE writes */
> -        "tlbi alle2;"    /* Flush hypervisor TLB */
> +        "tlbi alle2is;"  /* Flush hypervisor TLB */
>          "dsb sy;"        /* Ensure completion of TLB flush */
>          "isb;"
>          : : : "memory");
> @@ -62,7 +62,7 @@ static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long s
>      unsigned long end = va + size;
>      dsb(); /* Ensure preceding are visible */
>      while ( va < end ) {
> -        asm volatile("tlbi vae2, %0;"
> +        asm volatile("tlbi vae2is, %0;"
>                       : : "r" (va>>PAGE_SHIFT) : "memory");
>          va += PAGE_SIZE;
>      }
> --
> 1.7.2.5
>
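The IS-suffixed operations broadcast the maintenance request to every core in the Inner Shareable domain, rather than applying it only to the executing PE (a non-IS TLBI is local). A sketch of the AArch64 pair, mirroring the hunks above:

    /* Invalidate stage-1 EL1 TLB entries on this PE only: */
    asm volatile("tlbi vmalle1" : : : "memory");

    /* Invalidate the same entries on all PEs in the Inner Shareable domain: */
    asm volatile("tlbi vmalle1is" : : : "memory");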
Stefano Stabellini
2013-Jul-01 15:25 UTC
Re: [PATCH 04/10] xen: arm: consolidate barrier definitions
On Fri, 28 Jun 2013, Ian Campbell wrote:

> These are effectively identical on both 32- and 64-bit.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

>  xen/include/asm-arm/arm32/system.h |   16 ----------------
>  xen/include/asm-arm/arm64/system.h |   17 -----------------
>  xen/include/asm-arm/system.h       |   16 ++++++++++++++++
>  3 files changed, 16 insertions(+), 33 deletions(-)
>
> diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-arm/arm32/system.h
> index b3736f4..9f233fe 100644
> --- a/xen/include/asm-arm/arm32/system.h
> +++ b/xen/include/asm-arm/arm32/system.h
> @@ -2,22 +2,6 @@
>  #ifndef __ASM_ARM32_SYSTEM_H
>  #define __ASM_ARM32_SYSTEM_H
>
> -#define sev() __asm__ __volatile__ ("sev" : : : "memory")
> -#define wfe() __asm__ __volatile__ ("wfe" : : : "memory")
> -#define wfi() __asm__ __volatile__ ("wfi" : : : "memory")
> -
> -#define isb() __asm__ __volatile__ ("isb" : : : "memory")
> -#define dsb() __asm__ __volatile__ ("dsb sy" : : : "memory")
> -#define dmb() __asm__ __volatile__ ("dmb sy" : : : "memory")
> -
> -#define mb()            dsb()
> -#define rmb()           dsb()
> -#define wmb()           mb()
> -
> -#define smp_mb()        mb()
> -#define smp_rmb()       rmb()
> -#define smp_wmb()       wmb()
> -
>  extern void __bad_xchg(volatile void *, int);
>
>  static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
> diff --git a/xen/include/asm-arm/arm64/system.h b/xen/include/asm-arm/arm64/system.h
> index 4e41913..46f901c 100644
> --- a/xen/include/asm-arm/arm64/system.h
> +++ b/xen/include/asm-arm/arm64/system.h
> @@ -2,23 +2,6 @@
>  #ifndef __ASM_ARM64_SYSTEM_H
>  #define __ASM_ARM64_SYSTEM_H
>
> -#define sev()           asm volatile("sev" : : : "memory")
> -#define wfe()           asm volatile("wfe" : : : "memory")
> -#define wfi()           asm volatile("wfi" : : : "memory")
> -
> -#define isb()           asm volatile("isb" : : : "memory")
> -#define dsb()           asm volatile("dsb sy" : : : "memory")
> -#define dmb()           asm volatile("dmb sy" : : : "memory")
> -
> -#define mb()            dsb()
> -#define rmb()           dsb()
> -#define wmb()           mb()
> -
> -#define smp_mb()        mb()
> -#define smp_rmb()       rmb()
> -#define smp_wmb()       wmb()
> -
> -
>  extern void __bad_xchg(volatile void *, int);
>
>  static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
> diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
> index 290d38d..e003624 100644
> --- a/xen/include/asm-arm/system.h
> +++ b/xen/include/asm-arm/system.h
> @@ -8,6 +8,22 @@
>  #define nop() \
>      asm volatile ( "nop" )
>
> +#define sev()           asm volatile("sev" : : : "memory")
> +#define wfe()           asm volatile("wfe" : : : "memory")
> +#define wfi()           asm volatile("wfi" : : : "memory")
> +
> +#define isb()           asm volatile("isb" : : : "memory")
> +#define dsb()           asm volatile("dsb sy" : : : "memory")
> +#define dmb()           asm volatile("dmb sy" : : : "memory")
> +
> +#define mb()            dsb()
> +#define rmb()           dsb()
> +#define wmb()           mb()
> +
> +#define smp_mb()        mb()
> +#define smp_rmb()       rmb()
> +#define smp_wmb()       wmb()
> +
>  #define xchg(ptr,x) \
>          ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
>
> --
> 1.7.2.5
>
Stefano Stabellini
2013-Jul-01 15:39 UTC
Re: [PATCH 01/10] xen: arm: map memory as inner shareable.
On Fri, 28 Jun 2013, Ian Campbell wrote:

> The inner shareable domain contains all SMP processors, including different
> clusters (e.g. big.LITTLE). Therefore this is the correct thing to use for Xen
> memory mappings. The outer shareable domain is for devices on busses which are
> barriers (e.g. AMBA4). While the system domain is for things behind bridges
> which do not.
>
> One wrinkle is that Normal memory with attributes Inner Non-cacheable, Outer
> Non-cacheable (which we call BUFFERABLE) must be mapped Outer Shareable on ARM
> v7. Therefore change the prototype of mfn_to_xen_entry to take the attribute
> index so we can DTRT. On ARMv8 the shareability is ignored and considered to
> always be Outer Shareable.
>
> While I'm here change all the dmb/dsb with an implicit sy to an explicit sy,
> to make future changes simpler. Other than that don't adjust the barriers,
> flushes etc, those remain as they were (which is more than is now required).
> I'll change those in a later patch.
>
> Many thanks to Leif for explaining the difference between Inner- and
> Outer-Shareable in words of two or less syllables, I hope I've replicated that
> explanation properly above!
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Leif Lindholm <leif.lindholm@linaro.org>

It looks OK. I would have kept the dsb sy changes separate.

>  xen/arch/arm/arm32/head.S          |    8 +++---
>  xen/arch/arm/arm64/head.S          |    8 +++---
>  xen/arch/arm/mm.c                  |   24 ++++++++++------------
>  xen/include/asm-arm/arm32/system.h |    4 +-
>  xen/include/asm-arm/page.h         |   38 ++++++++++++++++++++++++++++++++---
>  5 files changed, 55 insertions(+), 27 deletions(-)
>
> diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
> index 0588d54..464c351 100644
> --- a/xen/arch/arm/arm32/head.S
> +++ b/xen/arch/arm/arm32/head.S
> @@ -24,8 +24,8 @@
>
>  #define ZIMAGE_MAGIC_NUMBER 0x016f2818
>
> -#define PT_PT     0xe7f /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=111 T=1 P=1 */
> -#define PT_MEM    0xe7d /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=111 T=0 P=1 */
> +#define PT_PT     0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
> +#define PT_MEM    0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
>  #define PT_DEV    0xe71 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=0 P=1 */
>  #define PT_DEV_L3 0xe73 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=1 P=1 */
>
> @@ -206,10 +206,10 @@ skip_bss:
>          mcr   CP32(r1, HMAIR1)
>
>          /* Set up the HTCR:
> -         * PT walks use Outer-Shareable accesses,
> +         * PT walks use Inner-Shareable accesses,
>           * PT walks are write-back, write-allocate in both cache levels,
>           * Full 32-bit address space goes through this table. */
> -        ldr   r0, =0x80002500
> +        ldr   r0, =0x80003500
>          mcr   CP32(r0, HTCR)
>
>          /* Set up the HSCTLR:
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 21b7e4d..ffcb880 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -24,8 +24,8 @@
>  #include <asm/page.h>
>  #include <asm/asm_defns.h>
>
> -#define PT_PT     0xe7f /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=111 T=1 P=1 */
> -#define PT_MEM    0xe7d /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=111 T=0 P=1 */
> +#define PT_PT     0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
> +#define PT_MEM    0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
>  #define PT_DEV    0xe71 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=0 P=1 */
>  #define PT_DEV_L3 0xe73 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=1 P=1 */
>
> @@ -178,10 +178,10 @@ skip_bss:
>          /* Set up the HTCR:
>           * PASize -- 4G
>           * Top byte is used
> -         * PT walks use Outer-Shareable accesses,
> +         * PT walks use Inner-Shareable accesses,
>           * PT walks are write-back, write-allocate in both cache levels,
>           * Full 64-bit address space goes through this table. */
> -        ldr   x0, =0x80802500
> +        ldr   x0, =0x80803500
>          msr   tcr_el2, x0
>
>          /* Set up the HSCTLR:
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index d1290cd..c5213f2 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -163,9 +163,8 @@ void dump_hyp_walk(vaddr_t addr)
>  /* Map a 4k page in a fixmap entry */
>  void set_fixmap(unsigned map, unsigned long mfn, unsigned attributes)
>  {
> -    lpae_t pte = mfn_to_xen_entry(mfn);
> +    lpae_t pte = mfn_to_xen_entry(mfn, attributes);
>      pte.pt.table = 1; /* 4k mappings always have this bit set */
> -    pte.pt.ai = attributes;
>      pte.pt.xn = 1;
>      write_pte(xen_fixmap + third_table_offset(FIXMAP_ADDR(map)), pte);
>      flush_xen_data_tlb_range_va(FIXMAP_ADDR(map), PAGE_SIZE);
> @@ -212,7 +211,7 @@ void *map_domain_page(unsigned long mfn)
>          if ( map[slot].pt.avail == 0 )
>          {
>              /* Commandeer this 2MB slot */
> -            pte = mfn_to_xen_entry(slot_mfn);
> +            pte = mfn_to_xen_entry(slot_mfn, WRITEALLOC);
>              pte.pt.avail = 1;
>              write_pte(map + slot, pte);
>              break;
> @@ -340,7 +339,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>
>      /* Map the destination in the boot misc area. */
>      dest_va = BOOT_MISC_VIRT_START;
> -    pte = mfn_to_xen_entry(xen_paddr >> PAGE_SHIFT);
> +    pte = mfn_to_xen_entry(xen_paddr >> PAGE_SHIFT, WRITEALLOC);
>      write_pte(xen_second + second_table_offset(dest_va), pte);
>      flush_xen_data_tlb_range_va(dest_va, SECOND_SIZE);
>
> @@ -387,7 +386,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>
>      /* Link in the fixmap pagetable */
>      pte = mfn_to_xen_entry((((unsigned long) xen_fixmap) + phys_offset)
> -                           >> PAGE_SHIFT);
> +                           >> PAGE_SHIFT, WRITEALLOC);
>      pte.pt.table = 1;
>      write_pte(xen_second + second_table_offset(FIXMAP_ADDR(0)), pte);
>      /*
> @@ -402,7 +401,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>          unsigned long va = XEN_VIRT_START + (i << PAGE_SHIFT);
>          if ( !is_kernel(va) )
>              break;
> -        pte = mfn_to_xen_entry(mfn);
> +        pte = mfn_to_xen_entry(mfn, WRITEALLOC);
>          pte.pt.table = 1; /* 4k mappings always have this bit set */
>          if ( is_kernel_text(va) || is_kernel_inittext(va) )
>          {
> @@ -415,7 +414,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>          /* No flush required here as page table is not hooked in yet. */
>      }
>      pte = mfn_to_xen_entry((((unsigned long) xen_xenmap) + phys_offset)
> -                           >> PAGE_SHIFT);
> +                           >> PAGE_SHIFT, WRITEALLOC);
>      pte.pt.table = 1;
>      write_pte(xen_second + second_linear_offset(XEN_VIRT_START), pte);
>      /* TLBFLUSH and ISB would be needed here, but wait until we set WXN */
> @@ -467,7 +466,7 @@ int init_secondary_pagetables(int cpu)
>      /* Initialise first pagetable from first level of boot tables, and
>       * hook into the new root. */
>      memcpy(first, boot_first, PAGE_SIZE);
> -    pte = mfn_to_xen_entry(virt_to_mfn(first));
> +    pte = mfn_to_xen_entry(virt_to_mfn(first), WRITEALLOC);
>      pte.pt.table = 1;
>      write_pte(root, pte);
>  #endif
> @@ -479,7 +478,7 @@ int init_secondary_pagetables(int cpu)
>       * domheap mapping pages. */
>      for ( i = 0; i < DOMHEAP_SECOND_PAGES; i++ )
>      {
> -        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES));
> +        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES), WRITEALLOC);
>          pte.pt.table = 1;
>          write_pte(&first[first_table_offset(DOMHEAP_VIRT_START+i*FIRST_SIZE)], pte);
>      }
> @@ -525,7 +524,7 @@ static void __init create_mappings(unsigned long virt,
>
>      count = nr_mfns / LPAE_ENTRIES;
>      p = xen_second + second_linear_offset(virt);
> -    pte = mfn_to_xen_entry(base_mfn);
> +    pte = mfn_to_xen_entry(base_mfn, WRITEALLOC);
>      pte.pt.hint = 1;  /* These maps are in 16-entry contiguous chunks. */
>      for ( i = 0; i < count; i++ )
>      {
> @@ -595,7 +594,7 @@ static int create_xen_table(lpae_t *entry)
>      if ( p == NULL )
>          return -ENOMEM;
>      clear_page(p);
> -    pte = mfn_to_xen_entry(virt_to_mfn(p));
> +    pte = mfn_to_xen_entry(virt_to_mfn(p), WRITEALLOC);
>      pte.pt.table = 1;
>      write_pte(entry, pte);
>      return 0;
> @@ -641,9 +640,8 @@ static int create_xen_entries(enum xenmap_operation op,
>                         addr, mfn);
>                  return -EINVAL;
>              }
> -            pte = mfn_to_xen_entry(mfn);
> +            pte = mfn_to_xen_entry(mfn, ai);
>              pte.pt.table = 1;
> -            pte.pt.ai = ai;
>              write_pte(&third[third_table_offset(addr)], pte);
>              break;
>          case REMOVE:
> diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-arm/arm32/system.h
> index 60148cb..b3736f4 100644
> --- a/xen/include/asm-arm/arm32/system.h
> +++ b/xen/include/asm-arm/arm32/system.h
> @@ -7,8 +7,8 @@
>  #define wfi() __asm__ __volatile__ ("wfi" : : : "memory")
>
>  #define isb() __asm__ __volatile__ ("isb" : : : "memory")
> -#define dsb() __asm__ __volatile__ ("dsb" : : : "memory")
> -#define dmb() __asm__ __volatile__ ("dmb" : : : "memory")
> +#define dsb() __asm__ __volatile__ ("dsb sy" : : : "memory")
> +#define dmb() __asm__ __volatile__ ("dmb sy" : : : "memory")
>
>  #define mb()            dsb()
>  #define rmb()           dsb()
> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> index 41e9eff..cd38956 100644
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -185,7 +185,7 @@ typedef union {
>  /* Standard entry type that we'll use to build Xen's own pagetables.
>   * We put the same permissions at every level, because they're ignored
>   * by the walker in non-leaf entries. */
> -static inline lpae_t mfn_to_xen_entry(unsigned long mfn)
> +static inline lpae_t mfn_to_xen_entry(unsigned long mfn, unsigned attr)
>  {
>      paddr_t pa = ((paddr_t) mfn) << PAGE_SHIFT;
>      lpae_t e = (lpae_t) {
> @@ -193,10 +193,9 @@ static inline lpae_t mfn_to_xen_entry(unsigned long mfn)
>          .xn = 1,              /* No need to execute outside .text */
>          .ng = 1,              /* Makes TLB flushes easier */
>          .af = 1,              /* No need for access tracking */
> -        .sh = LPAE_SH_OUTER,  /* Xen mappings are globally coherent */
>          .ns = 1,              /* Hyp mode is in the non-secure world */
>          .user = 1,            /* See below */
> -        .ai = WRITEALLOC,
> +        .ai = attr,
>          .table = 0,           /* Set to 1 for links and 4k maps */
>          .valid = 1,           /* Mappings are present */
>      }};;
> @@ -205,6 +204,37 @@ static inline lpae_t mfn_to_xen_entry(unsigned long mfn)
>       * pagetables in User mode it's OK.  If this changes, remember
>       * to update the hard-coded values in head.S too */
>
> +    switch ( attr )
> +    {
> +    case BUFFERABLE:
> +        /*
> +         * ARM ARM: Overlaying the shareability attribute (B3-1376 to 1377)
> +         *
> +         * A memory region with a resultant memory type attribute of Normal,
> +         * and a resultant cacheability attribute of Inner Non-cacheable,
> +         * Outer Non-cacheable, must have a resultant shareability attribute
> +         * of Outer Shareable, otherwise shareability is UNPREDICTABLE.
> +         *
> +         * On ARMv8 shareability is ignored and explicitly treated as Outer
> +         * Shareable for Normal Inner Non_cacheable, Outer Non-cacheable.
> +         */
> +        e.pt.sh = LPAE_SH_OUTER;
> +        break;
> +    case UNCACHED:
> +    case DEV_SHARED:
> +        /* Shareability is ignored for non-Normal memory, Outer is as
> +         * good as anything.
> +         *
> +         * On ARMv8 shareability is ignored and explicitly treated as Outer
> +         * Shareable for any device memory type.
> +         */
> +        e.pt.sh = LPAE_SH_OUTER;
> +        break;
> +    default:
> +        e.pt.sh = LPAE_SH_INNER;  /* Xen mappings are SMP coherent */
> +        break;
> +    }
> +
>      ASSERT(!(pa & ~PAGE_MASK));
>      ASSERT(!(pa & ~PADDR_MASK));
>
> @@ -219,7 +249,7 @@ static inline lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr)
>      lpae_t e = (lpae_t) {
>          .p2m.xn = 0,
>          .p2m.af = 1,
> -        .p2m.sh = LPAE_SH_OUTER,
> +        .p2m.sh = LPAE_SH_INNER,
>          .p2m.write = 1,
>          .p2m.read = 1,
>          .p2m.mattr = mattr,
> --
> 1.7.2.5
>
Ian Campbell
2013-Jul-01 15:42 UTC
Re: [PATCH 01/10] xen: arm: map memory as inner shareable.
On Mon, 2013-07-01 at 16:39 +0100, Stefano Stabellini wrote:
> On Fri, 28 Jun 2013, Ian Campbell wrote:
> > The inner shareable domain contains all SMP processors, including different
> > clusters (e.g. big.LITTLE). Therefore this is the correct thing to use for Xen
> > memory mappings. The outer shareable domain is for devices on busses which are
> > barriers (e.g. AMBA4). While the system domain is for things behind bridges
> > which do not.
> >
> > One wrinkle is that Normal memory with attributes Inner Non-cacheable, Outer
> > Non-cacheable (which we call BUFFERABLE) must be mapped Outer Shareable on ARM
> > v7. Therefore change the prototype of mfn_to_xen_entry to take the attribute
> > index so we can DTRT. On ARMv8 the sharability is ignored and considered to
> > always be Outer Shareable.
> >
> > While I'm here change all the dmb/dsb with an implicit sy to an explicit sy,
> > to make future changes simpler. Other than that don't adjust the barriers,
> > flushes etc, those remain as they were (which is more than is now required).
> > I'll change those in a later patch.
> >
> > Many thanks to Leif for explaining the difference between Inner- and
> > Outer-Shareable in words of two or less syllables, I hope I've replicated that
> > explanation properly above!
> >
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Leif Lindholm <leif.lindholm@linaro.org>
>
> It looks OK.

Thanks.

> I would have kept the dsb sy changes separate.

So would I if I'd have known then that it was going to end up being so many
patches. I'll split them out when I resend.
Leif Lindholm
2013-Jul-02 14:09 UTC
Re: [PATCH 01/10] xen: arm: map memory as inner shareable.
On Fri, Jun 28, 2013 at 05:10:47PM +0100, Ian Campbell wrote:
> The inner shareable domain contains all SMP processors, including different
> clusters (e.g. big.LITTLE). Therefore this is the correct thing to use for Xen
> memory mappings. The outer shareable domain is for devices on busses which are
> barriers (e.g. AMBA4).

I think this should say something like "which are coherent and
barrier-aware". And to be technically correct, the example should say
"AMBA4 AXI with ACE".

> While the system domain is for things behind bridges
> which do not.

And given the above ... -> which "are" not.

> One wrinkle is that Normal memory with attributes Inner Non-cacheable, Outer
> Non-cacheable (which we call BUFFERABLE) must be mapped Outer Shareable on ARM
> v7. Therefore change the prototype of mfn_to_xen_entry to take the attribute
> index so we can DTRT. On ARMv8 the sharability is ignored and considered to
> always be Outer Shareable.
>
> While I'm here change all the dmb/dsb with an implicit sy to an explicit sy,
> to make future changes simpler. Other than that don't adjust the barriers,
> flushes etc, those remain as they were (which is more than is now required).
> I'll change those in a later patch.
>
> Many thanks to Leif for explaining the difference between Inner- and
> Outer-Shareable in words of two or less syllables, I hope I've replicated that
> explanation properly above!

Apart from my usual nitpicking, indeed :)

> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Leif Lindholm <leif.lindholm@linaro.org>
> ---
>  xen/arch/arm/arm32/head.S          |    8 +++---
>  xen/arch/arm/arm64/head.S          |    8 +++---
>  xen/arch/arm/mm.c                  |   24 ++++++++++------------
>  xen/include/asm-arm/arm32/system.h |    4 +-
>  xen/include/asm-arm/page.h         |   38 ++++++++++++++++++++++++++++++++---
>  5 files changed, 55 insertions(+), 27 deletions(-)
>
> diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
> index 0588d54..464c351 100644
> --- a/xen/arch/arm/arm32/head.S
> +++ b/xen/arch/arm/arm32/head.S
> @@ -24,8 +24,8 @@
>
>  #define ZIMAGE_MAGIC_NUMBER 0x016f2818
>
> -#define PT_PT     0xe7f /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=111 T=1 P=1 */
> -#define PT_MEM    0xe7d /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=111 T=0 P=1 */
> +#define PT_PT     0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
> +#define PT_MEM    0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
>  #define PT_DEV    0xe71 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=0 P=1 */
>  #define PT_DEV_L3 0xe73 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=1 P=1 */
>
> @@ -206,10 +206,10 @@ skip_bss:
>          mcr   CP32(r1, HMAIR1)
>
>          /* Set up the HTCR:
> -         * PT walks use Outer-Shareable accesses,
> +         * PT walks use Inner-Shareable accesses,
>           * PT walks are write-back, write-allocate in both cache levels,
>           * Full 32-bit address space goes through this table. */
> -        ldr   r0, =0x80002500
> +        ldr   r0, =0x80003500
>          mcr   CP32(r0, HTCR)
>
>          /* Set up the HSCTLR:
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 21b7e4d..ffcb880 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -24,8 +24,8 @@
>  #include <asm/page.h>
>  #include <asm/asm_defns.h>
>
> -#define PT_PT     0xe7f /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=111 T=1 P=1 */
> -#define PT_MEM    0xe7d /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=111 T=0 P=1 */
> +#define PT_PT     0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
> +#define PT_MEM    0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
>  #define PT_DEV    0xe71 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=0 P=1 */
>  #define PT_DEV_L3 0xe73 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=1 P=1 */
>
> @@ -178,10 +178,10 @@ skip_bss:
>          /* Set up the HTCR:
>           * PASize -- 4G
>           * Top byte is used
> -         * PT walks use Outer-Shareable accesses,
> +         * PT walks use Inner-Shareable accesses,
>           * PT walks are write-back, write-allocate in both cache levels,
>           * Full 64-bit address space goes through this table. */
> -        ldr   x0, =0x80802500
> +        ldr   x0, =0x80803500
>          msr   tcr_el2, x0
>
>          /* Set up the HSCTLR:
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index d1290cd..c5213f2 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -163,9 +163,8 @@ void dump_hyp_walk(vaddr_t addr)
>  /* Map a 4k page in a fixmap entry */
>  void set_fixmap(unsigned map, unsigned long mfn, unsigned attributes)
>  {
> -    lpae_t pte = mfn_to_xen_entry(mfn);
> +    lpae_t pte = mfn_to_xen_entry(mfn, attributes);
>      pte.pt.table = 1; /* 4k mappings always have this bit set */
> -    pte.pt.ai = attributes;
>      pte.pt.xn = 1;
>      write_pte(xen_fixmap + third_table_offset(FIXMAP_ADDR(map)), pte);
>      flush_xen_data_tlb_range_va(FIXMAP_ADDR(map), PAGE_SIZE);
> @@ -212,7 +211,7 @@ void *map_domain_page(unsigned long mfn)
>          if ( map[slot].pt.avail == 0 )
>          {
>              /* Commandeer this 2MB slot */
> -            pte = mfn_to_xen_entry(slot_mfn);
> +            pte = mfn_to_xen_entry(slot_mfn, WRITEALLOC);
>              pte.pt.avail = 1;
>              write_pte(map + slot, pte);
>              break;
> @@ -340,7 +339,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>
>      /* Map the destination in the boot misc area. */
>      dest_va = BOOT_MISC_VIRT_START;
> -    pte = mfn_to_xen_entry(xen_paddr >> PAGE_SHIFT);
> +    pte = mfn_to_xen_entry(xen_paddr >> PAGE_SHIFT, WRITEALLOC);
>      write_pte(xen_second + second_table_offset(dest_va), pte);
>      flush_xen_data_tlb_range_va(dest_va, SECOND_SIZE);
>
> @@ -387,7 +386,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>
>      /* Link in the fixmap pagetable */
>      pte = mfn_to_xen_entry((((unsigned long) xen_fixmap) + phys_offset)
> -                           >> PAGE_SHIFT);
> +                           >> PAGE_SHIFT, WRITEALLOC);
>      pte.pt.table = 1;
>      write_pte(xen_second + second_table_offset(FIXMAP_ADDR(0)), pte);
>      /*
> @@ -402,7 +401,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>          unsigned long va = XEN_VIRT_START + (i << PAGE_SHIFT);
>          if ( !is_kernel(va) )
>              break;
> -        pte = mfn_to_xen_entry(mfn);
> +        pte = mfn_to_xen_entry(mfn, WRITEALLOC);
>          pte.pt.table = 1; /* 4k mappings always have this bit set */
>          if ( is_kernel_text(va) || is_kernel_inittext(va) )
>          {
> @@ -415,7 +414,7 @@ void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
>          /* No flush required here as page table is not hooked in yet. */
>      }
>      pte = mfn_to_xen_entry((((unsigned long) xen_xenmap) + phys_offset)
> -                           >> PAGE_SHIFT);
> +                           >> PAGE_SHIFT, WRITEALLOC);
>      pte.pt.table = 1;
>      write_pte(xen_second + second_linear_offset(XEN_VIRT_START), pte);
>      /* TLBFLUSH and ISB would be needed here, but wait until we set WXN */
> @@ -467,7 +466,7 @@ int init_secondary_pagetables(int cpu)
>      /* Initialise first pagetable from first level of boot tables, and
>       * hook into the new root. */
>      memcpy(first, boot_first, PAGE_SIZE);
> -    pte = mfn_to_xen_entry(virt_to_mfn(first));
> +    pte = mfn_to_xen_entry(virt_to_mfn(first), WRITEALLOC);
>      pte.pt.table = 1;
>      write_pte(root, pte);
>  #endif
> @@ -479,7 +478,7 @@ int init_secondary_pagetables(int cpu)
>       * domheap mapping pages. */
>      for ( i = 0; i < DOMHEAP_SECOND_PAGES; i++ )
>      {
> -        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES));
> +        pte = mfn_to_xen_entry(virt_to_mfn(domheap+i*LPAE_ENTRIES), WRITEALLOC);
>          pte.pt.table = 1;
>          write_pte(&first[first_table_offset(DOMHEAP_VIRT_START+i*FIRST_SIZE)], pte);
>      }
> @@ -525,7 +524,7 @@ static void __init create_mappings(unsigned long virt,
>
>      count = nr_mfns / LPAE_ENTRIES;
>      p = xen_second + second_linear_offset(virt);
> -    pte = mfn_to_xen_entry(base_mfn);
> +    pte = mfn_to_xen_entry(base_mfn, WRITEALLOC);
>      pte.pt.hint = 1;  /* These maps are in 16-entry contiguous chunks. */
>      for ( i = 0; i < count; i++ )
>      {
> @@ -595,7 +594,7 @@ static int create_xen_table(lpae_t *entry)
>      if ( p == NULL )
>          return -ENOMEM;
>      clear_page(p);
> -    pte = mfn_to_xen_entry(virt_to_mfn(p));
> +    pte = mfn_to_xen_entry(virt_to_mfn(p), WRITEALLOC);
>      pte.pt.table = 1;
>      write_pte(entry, pte);
>      return 0;
> @@ -641,9 +640,8 @@ static int create_xen_entries(enum xenmap_operation op,
>                     addr, mfn);
>              return -EINVAL;
>          }
> -        pte = mfn_to_xen_entry(mfn);
> +        pte = mfn_to_xen_entry(mfn, ai);
>          pte.pt.table = 1;
> -        pte.pt.ai = ai;
>          write_pte(&third[third_table_offset(addr)], pte);
>          break;
>      case REMOVE:
> diff --git a/xen/include/asm-arm/arm32/system.h b/xen/include/asm-arm/arm32/system.h
> index 60148cb..b3736f4 100644
> --- a/xen/include/asm-arm/arm32/system.h
> +++ b/xen/include/asm-arm/arm32/system.h
> @@ -7,8 +7,8 @@
>  #define wfi()           __asm__ __volatile__ ("wfi" : : : "memory")
>
>  #define isb()           __asm__ __volatile__ ("isb" : : : "memory")
> -#define dsb()           __asm__ __volatile__ ("dsb" : : : "memory")
> -#define dmb()           __asm__ __volatile__ ("dmb" : : : "memory")
> +#define dsb()           __asm__ __volatile__ ("dsb sy" : : : "memory")
> +#define dmb()           __asm__ __volatile__ ("dmb sy" : : : "memory")
>
>  #define mb()            dsb()
>  #define rmb()           dsb()
> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> index 41e9eff..cd38956 100644
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -185,7 +185,7 @@ typedef union {
>  /* Standard entry type that we'll use to build Xen's own pagetables.
>   * We put the same permissions at every level, because they're ignored
>   * by the walker in non-leaf entries. */
> -static inline lpae_t mfn_to_xen_entry(unsigned long mfn)
> +static inline lpae_t mfn_to_xen_entry(unsigned long mfn, unsigned attr)
>  {
>      paddr_t pa = ((paddr_t) mfn) << PAGE_SHIFT;
>      lpae_t e = (lpae_t) {
> @@ -193,10 +193,9 @@ static inline lpae_t mfn_to_xen_entry(unsigned long mfn)
>              .xn = 1,              /* No need to execute outside .text */
>              .ng = 1,              /* Makes TLB flushes easier */
>              .af = 1,              /* No need for access tracking */
> -            .sh = LPAE_SH_OUTER,  /* Xen mappings are globally coherent */
>              .ns = 1,              /* Hyp mode is in the non-secure world */
>              .user = 1,            /* See below */
> -            .ai = WRITEALLOC,
> +            .ai = attr,
>              .table = 0,           /* Set to 1 for links and 4k maps */
>              .valid = 1,           /* Mappings are present */
>          }};;
> @@ -205,6 +204,37 @@ static inline lpae_t mfn_to_xen_entry(unsigned long mfn)
>       * pagetables un User mode it's OK.  If this changes, remember
>       * to update the hard-coded values in head.S too */
>
> +    switch ( attr )
> +    {
> +    case BUFFERABLE:
> +        /*
> +         * ARM ARM: Overlaying the shareability attribute (B3-1376 to 1377)

It would be worth indicating the revision of the ARM ARM here (in this
instance DDI 0406C.b).

> +         *
> +         * A memory region with a resultant memory type attribute of Normal,
> +         * and a resultant cacheability attribute of Inner Non-cacheable,
> +         * Outer Non-cacheable, must have a resultant shareability attribute
> +         * of Outer Shareable, otherwise shareability is UNPREDICTABLE.
> +         *
> +         * On ARMv8 sharability is ignored and explicitly treated as Outer
> +         * Shareable for Normal Inner Non_cacheable, Outer Non-cacheable.
> +         */
> +        e.pt.sh = LPAE_SH_OUTER;
> +        break;
> +    case UNCACHED:
> +    case DEV_SHARED:
> +        /* Shareability is ignored for non-Normal memory, Outer is as
> +         * good as anything.
> +         *
> +         * On ARMv8 sharability is ignored and explicitly treated as Outer
> +         * Shareable for any device memory type.
> +         */
> +        e.pt.sh = LPAE_SH_OUTER;
> +        break;
> +    default:
> +        e.pt.sh = LPAE_SH_INNER;  /* Xen mappings are SMP coherent */
> +        break;
> +    }
> +
>      ASSERT(!(pa & ~PAGE_MASK));
>      ASSERT(!(pa & ~PADDR_MASK));
>
> @@ -219,7 +249,7 @@ static inline lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr)
>      lpae_t e = (lpae_t) {
>          .p2m.xn = 0,
>          .p2m.af = 1,
> -        .p2m.sh = LPAE_SH_OUTER,
> +        .p2m.sh = LPAE_SH_INNER,
>          .p2m.write = 1,
>          .p2m.read = 1,
>          .p2m.mattr = mattr,
> --
> 1.7.2.5
>
Ian Campbell
2013-Jul-02 14:26 UTC
Re: [PATCH 01/10] xen: arm: map memory as inner shareable.
On Tue, 2013-07-02 at 16:09 +0200, Leif Lindholm wrote:
> On Fri, Jun 28, 2013 at 05:10:47PM +0100, Ian Campbell wrote:
> > The inner shareable domain contains all SMP processors, including different
> > clusters (e.g. big.LITTLE). Therefore this is the correct thing to use for Xen
> > memory mappings. The outer shareable domain is for devices on busses which are
> > barriers (e.g. AMBA4).
>
> I think this should say something like "which are coherent and
> barrier-aware".
>
> And to be technically correct, the example should say "AMBA4 AXI with
> ACE".
>
> > While the system domain is for things behind bridges
> > which do not.
>
> And given the above ... -> which "are" not.
>
> > One wrinkle is that Normal memory with attributes Inner Non-cacheable, Outer
> > Non-cacheable (which we call BUFFERABLE) must be mapped Outer Shareable on ARM
> > v7. Therefore change the prototype of mfn_to_xen_entry to take the attribute
> > index so we can DTRT. On ARMv8 the sharability is ignored and considered to
> > always be Outer Shareable.
> >
> > While I'm here change all the dmb/dsb with an implicit sy to an explicit sy,
> > to make future changes simpler. Other than that don't adjust the barriers,
> > flushes etc, those remain as they were (which is more than is now required).
> > I'll change those in a later patch.
> >
> > Many thanks to Leif for explaining the difference between Inner- and
> > Outer-Shareable in words of two or less syllables, I hope I've replicated that
> > explanation properly above!
>
> Apart from my usual nitpicking, indeed :)

Thanks, I was sure I was playing a bit fast and loose with the specifics!

> [...]
> > +    switch ( attr )
> > +    {
> > +    case BUFFERABLE:
> > +        /*
> > +         * ARM ARM: Overlaying the shareability attribute (B3-1376 to 1377)
>
> It would be worth indicating the revision of the ARM ARM here (in this
> instance DDI 0406C.b).

Good idea, thanks.

Ian.
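[The change agreed here is small; presumably the comment in mfn_to_xen_entry()
simply gains the document revision, along these lines -- a sketch of the
respin, not the actual v2 patch:

    /*
     * ARM ARM (DDI 0406C.b): Overlaying the shareability attribute
     * (B3-1376 to B3-1377)
     */
]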
Tim Deegan
2013-Jul-04 10:58 UTC
Re: [PATCH 01/10] xen: arm: map memory as inner shareable.
At 17:10 +0100 on 28 Jun (1372439447), Ian Campbell wrote:
> @@ -219,7 +249,7 @@ static inline lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr)
>      lpae_t e = (lpae_t) {
>          .p2m.xn = 0,
>          .p2m.af = 1,
> -        .p2m.sh = LPAE_SH_OUTER,
> +        .p2m.sh = LPAE_SH_INNER,
>          .p2m.write = 1,
>          .p2m.read = 1,
>          .p2m.mattr = mattr,

I think we need to leave MMIO mappings outer-shareable (or full-system
as appropriate), which means adding the same mechanism that you did for
the xen PT mappings.

Otherwise, this looks fine.

Cheers,

Tim.
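[What Tim is asking for might look something like the sketch below. This is
illustrative only, not the actual follow-up patch, and it assumes the
existing MATTR_DEV p2m attribute value is what identifies device/MMIO
mappings:

    /* In mfn_to_p2m_entry(), derive the shareability from the memory
     * attribute instead of hard-coding it. */
    switch ( mattr )
    {
    case MATTR_DEV:
        e.p2m.sh = LPAE_SH_OUTER;   /* MMIO stays outer-shareable */
        break;
    default:
        e.p2m.sh = LPAE_SH_INNER;   /* normal RAM: SMP coherence suffices */
        break;
    }
]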
Tim Deegan
2013-Jul-04 10:58 UTC
Re: [PATCH 02/10] xen: arm: Only upgrade guest barriers to inner shareable.
At 17:10 +0100 on 28 Jun (1372439448), Ian Campbell wrote:
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>

> ---
>  xen/arch/arm/traps.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 398d209..e40ae2e 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -62,7 +62,7 @@ void __cpuinit init_traps(void)
>      WRITE_SYSREG((vaddr_t)hyp_traps_vector, VBAR_EL2);
>
>      /* Setup hypervisor traps */
> -    WRITE_SYSREG(HCR_PTW|HCR_BSU_OUTER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TWI|HCR_TSC, HCR_EL2);
> +    WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TWI|HCR_TSC, HCR_EL2);
>      isb();
>  }
Ian Campbell
2013-Jul-04 11:03 UTC
Re: [PATCH 01/10] xen: arm: map memory as inner shareable.
On Thu, 2013-07-04 at 11:58 +0100, Tim Deegan wrote:
> At 17:10 +0100 on 28 Jun (1372439447), Ian Campbell wrote:
> > @@ -219,7 +249,7 @@ static inline lpae_t mfn_to_p2m_entry(unsigned long mfn, unsigned int mattr)
> >      lpae_t e = (lpae_t) {
> >          .p2m.xn = 0,
> >          .p2m.af = 1,
> > -        .p2m.sh = LPAE_SH_OUTER,
> > +        .p2m.sh = LPAE_SH_INNER,
> >          .p2m.write = 1,
> >          .p2m.read = 1,
> >          .p2m.mattr = mattr,
>
> I think we need to leave MMIO mappings outer-shareable (or full-system
> as appropriate), which means adding the same mechanism that you did for
> the xen PT mappings.

Yes, good catch. Thanks.

> Otherwise, this looks fine.
>
> Cheers,
>
> Tim.
Tim Deegan
2013-Jul-04 11:07 UTC
Re: [PATCH 03/10] xen: arm: reduce instruction cache and tlb flushes to inner-shareable.
At 17:10 +0100 on 28 Jun (1372439449), Ian Campbell wrote:
> Now that Xen maps memory and performs pagetable walks as inner shareable we
> don't need to push updates down so far when modifying page tables etc.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

> --- a/xen/include/asm-arm/arm32/page.h
> +++ b/xen/include/asm-arm/arm32/page.h
> @@ -39,8 +39,8 @@ static inline void flush_xen_text_tlb(void)
>      asm volatile (
>          "isb;"                    /* Ensure synchronization with previous changes to text */
>          STORE_CP32(0, TLBIALLH)   /* Flush hypervisor TLB */
> -        STORE_CP32(0, ICIALLU)    /* Flush I-cache */
> -        STORE_CP32(0, BPIALL)     /* Flush branch predictor */
> +        STORE_CP32(0, ICIALLUIS)  /* Flush I-cache */
> +        STORE_CP32(0, BPIALLIS)   /* Flush branch predictor */
>          "dsb;"                    /* Ensure completion of TLB+BP flush */
>          "isb;"
>          : : "r" (r0) /*dummy*/ : "memory");
> @@ -54,7 +54,7 @@ static inline void flush_xen_data_tlb(void)
>  {
>      register unsigned long r0 asm ("r0");
>      asm volatile("dsb;" /* Ensure preceding are visible */
> -                 STORE_CP32(0, TLBIALLH)
> +                 STORE_CP32(0, TLBIALLHIS)
>                   "dsb;" /* Ensure completion of the TLB flush */
>                   "isb;"
>                   : : "r" (r0) /* dummy */: "memory");
> @@ -69,7 +69,7 @@ static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long s
>      unsigned long end = va + size;
>      dsb(); /* Ensure preceding are visible */
>      while ( va < end ) {
> -        asm volatile(STORE_CP32(0, TLBIMVAH)
> +        asm volatile(STORE_CP32(0, TLBIMVAHIS)
>                       : : "r" (va) : "memory");
>          va += PAGE_SIZE;
>      }

That's OK for actual Xen data mappings, map_domain_page() &c., but now
set_fixmap() and clear_fixmap() need to use a stronger flush whenever
they map device memory.  The same goes for create_xen_entries() when
ai != WRITEALLOC.

> diff --git a/xen/include/asm-arm/arm64/flushtlb.h b/xen/include/asm-arm/arm64/flushtlb.h
> index d0535a0..3a6d2cb 100644
> --- a/xen/include/asm-arm/arm64/flushtlb.h
> +++ b/xen/include/asm-arm/arm64/flushtlb.h
> @@ -6,7 +6,7 @@ static inline void flush_tlb_local(void)
>  {
>      asm volatile(
>          "dsb sy;"
> -        "tlbi vmalle1;"
> +        "tlbi vmalle1is;"
>          "dsb sy;"
>          "isb;"
>          : : : "memory");
> @@ -17,7 +17,7 @@ static inline void flush_tlb_all_local(void)
>  {
>      asm volatile(
>          "dsb sy;"
> -        "tlbi alle1;"
> +        "tlbi alle1is;"
>          "dsb sy;"
>          "isb;"
>          : : : "memory");

Might these need to be stronger if we're using them on context switch and
guests have MMIO/outer-shareable mappings?

Tim.
Tim Deegan
2013-Jul-04 11:07 UTC
Re: [PATCH 04/10] xen: arm: consolidate barrier definitions
At 17:10 +0100 on 28 Jun (1372439450), Ian Campbell wrote:
> These are effectively identical on both 32- and 64-bit.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>
Tim Deegan
2013-Jul-04 11:19 UTC
Re: [PATCH 03/10] xen: arm: reduce instruction cache and tlb flushes to inner-shareable.
At 12:07 +0100 on 04 Jul (1372939621), Tim Deegan wrote:
> At 17:10 +0100 on 28 Jun (1372439449), Ian Campbell wrote:
> > Now that Xen maps memory and performs pagetable walks as inner shareable we
> > don't need to push updates down so far when modifying page tables etc.
> >
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>
> > --- a/xen/include/asm-arm/arm32/page.h
> > +++ b/xen/include/asm-arm/arm32/page.h
> > @@ -39,8 +39,8 @@ static inline void flush_xen_text_tlb(void)
> >      asm volatile (
> >          "isb;"                    /* Ensure synchronization with previous changes to text */
> >          STORE_CP32(0, TLBIALLH)   /* Flush hypervisor TLB */
> > -        STORE_CP32(0, ICIALLU)    /* Flush I-cache */
> > -        STORE_CP32(0, BPIALL)     /* Flush branch predictor */
> > +        STORE_CP32(0, ICIALLUIS)  /* Flush I-cache */
> > +        STORE_CP32(0, BPIALLIS)   /* Flush branch predictor */
> >          "dsb;"                    /* Ensure completion of TLB+BP flush */
> >          "isb;"
> >          : : "r" (r0) /*dummy*/ : "memory");
> > @@ -54,7 +54,7 @@ static inline void flush_xen_data_tlb(void)
> >  {
> >      register unsigned long r0 asm ("r0");
> >      asm volatile("dsb;" /* Ensure preceding are visible */
> > -                 STORE_CP32(0, TLBIALLH)
> > +                 STORE_CP32(0, TLBIALLHIS)
> >                   "dsb;" /* Ensure completion of the TLB flush */
> >                   "isb;"
> >                   : : "r" (r0) /* dummy */: "memory");
> > @@ -69,7 +69,7 @@ static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long s
> >      unsigned long end = va + size;
> >      dsb(); /* Ensure preceding are visible */
> >      while ( va < end ) {
> > -        asm volatile(STORE_CP32(0, TLBIMVAH)
> > +        asm volatile(STORE_CP32(0, TLBIMVAHIS)
> >                       : : "r" (va) : "memory");
> >          va += PAGE_SIZE;
> >      }
>
> That's OK for actual Xen data mappings, map_domain_page() &c., but now
> set_fixmap() and clear_fixmap() need to use a stronger flush whenever
> they map device memory.  The same goes for create_xen_entries() when
> ai != WRITEALLOC.

Ian has pointed out that this is actually making the flushes _stronger_
(and that in general the TLB flush operations need a bit of attention).

So I suggest that we drop the TLB-flush parts of this patch for now, and
address that whole area separately.  In the meantime, the cache-flush
parts are Acked-by: Tim Deegan <tim@xen.org>.

Cheers,

Tim.
Tim Deegan
2013-Jul-04 11:21 UTC
Re: [PATCH 03/10] xen: arm: reduce instruction cache and tlb flushes to inner-shareable.
At 12:19 +0100 on 04 Jul (1372940342), Tim Deegan wrote:
> At 12:07 +0100 on 04 Jul (1372939621), Tim Deegan wrote:
> > At 17:10 +0100 on 28 Jun (1372439449), Ian Campbell wrote:
> > > Now that Xen maps memory and performs pagetable walks as inner shareable we
> > > don't need to push updates down so far when modifying page tables etc.
> > >
> > > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> >
> > > --- a/xen/include/asm-arm/arm32/page.h
> > > +++ b/xen/include/asm-arm/arm32/page.h
> > > @@ -39,8 +39,8 @@ static inline void flush_xen_text_tlb(void)
> > >      asm volatile (
> > >          "isb;"                    /* Ensure synchronization with previous changes to text */
> > >          STORE_CP32(0, TLBIALLH)   /* Flush hypervisor TLB */
> > > -        STORE_CP32(0, ICIALLU)    /* Flush I-cache */
> > > -        STORE_CP32(0, BPIALL)     /* Flush branch predictor */
> > > +        STORE_CP32(0, ICIALLUIS)  /* Flush I-cache */
> > > +        STORE_CP32(0, BPIALLIS)   /* Flush branch predictor */
> > >          "dsb;"                    /* Ensure completion of TLB+BP flush */
> > >          "isb;"
> > >          : : "r" (r0) /*dummy*/ : "memory");
> > > @@ -54,7 +54,7 @@ static inline void flush_xen_data_tlb(void)
> > >  {
> > >      register unsigned long r0 asm ("r0");
> > >      asm volatile("dsb;" /* Ensure preceding are visible */
> > > -                 STORE_CP32(0, TLBIALLH)
> > > +                 STORE_CP32(0, TLBIALLHIS)
> > >                   "dsb;" /* Ensure completion of the TLB flush */
> > >                   "isb;"
> > >                   : : "r" (r0) /* dummy */: "memory");
> > > @@ -69,7 +69,7 @@ static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long s
> > >      unsigned long end = va + size;
> > >      dsb(); /* Ensure preceding are visible */
> > >      while ( va < end ) {
> > > -        asm volatile(STORE_CP32(0, TLBIMVAH)
> > > +        asm volatile(STORE_CP32(0, TLBIMVAHIS)
> > >                       : : "r" (va) : "memory");
> > >          va += PAGE_SIZE;
> > >      }
> >
> > That's OK for actual Xen data mappings, map_domain_page() &c., but now
> > set_fixmap() and clear_fixmap() need to use a stronger flush whenever
> > they map device memory.  The same goes for create_xen_entries() when
> > ai != WRITEALLOC.
>
> Ian has pointed out that this is actually making the flushes _stronger_
> (and that in general the TLB flush operations need a bit of attention).
>
> So I suggest that we drop the TLB-flush parts of this patch for now, and
> address that whole area separately.  In the meantime, the cache-flush
> parts are Acked-by: Tim Deegan <tim@xen.org>.

Wait, no - that made no sense. :)  I retract the ack, and let's start
again with the TLB-flush ops.  I don't think anything else in this series
relies on this patch.

Tim.
Tim Deegan
2013-Jul-04 11:26 UTC
Re: [PATCH 05/10] xen: use SMP barrier in common code dealing with shared memory protocols
At 17:10 +0100 on 28 Jun (1372439451), Ian Campbell wrote:
> Xen currently makes no strong distinction between the SMP barriers (smp_mb
> etc) and the regular barrier (mb etc). In Linux, where we inherited these
> names from having imported Linux code which uses them, the SMP barriers are
> intended to be sufficient for implementing shared-memory protocols between
> processors in an SMP system while the standard barriers are useful for MMIO
> etc.
>
> On x86 with the stronger ordering model there is not much practical difference
> here but ARM has weaker barriers available which are suitable for use as SMP
> barriers.
>
> Therefore ensure that common code uses the SMP barriers when that is all which
> is required.
>
> On both ARM and x86 both types of barrier are currently identical so there is
> no actual change. A future patch will change smp_mb to a weaker barrier on
> ARM.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Reviewed-by: Tim Deegan <tim@xen.org>
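[The kind of shared-memory protocol the commit message has in mind is an
inter-processor ring. A minimal sketch of why the SMP variant is the right
tool -- made-up names, not Xen's actual ring code:

    #define RING_SIZE 16

    struct ring {
        unsigned int prod;              /* written by the producing CPU */
        unsigned int cons;              /* written by the consuming CPU */
        unsigned long data[RING_SIZE];
    };

    static void ring_put(struct ring *r, unsigned long v)
    {
        r->data[r->prod % RING_SIZE] = v;
        /* The consumer is another *processor*, not a device, so the
         * payload only has to be ordered before the index update as seen
         * by other CPUs: smp_wmb() suffices, and on ARM can be a dmb
         * rather than a full dsb. */
        smp_wmb();
        r->prod++;
    }
]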
Tim Deegan
2013-Jul-04 11:30 UTC
Re: [PATCH 06/10] xen: arm: Use SMP barriers when that is all which is required.
At 16:24 +0100 on 01 Jul (1372695881), Ian Campbell wrote:
> On Mon, 2013-07-01 at 16:19 +0100, Stefano Stabellini wrote:
> > On Fri, 28 Jun 2013, Ian Campbell wrote:
> > > As explained in the previous commit SMP barriers can be used when all we care
> > > about is synchronising against other processors.
> > >
> > > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > > ---
> > >  xen/arch/arm/mm.c      |    2 +-
> > >  xen/arch/arm/smpboot.c |    4 ++--
> > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > > index c5213f2..3f049cb 100644
> > > --- a/xen/arch/arm/mm.c
> > > +++ b/xen/arch/arm/mm.c
> > > @@ -776,7 +776,7 @@ void share_xen_page_with_guest(struct page_info *page,
> > >      page->u.inuse.type_info |= PGT_validated | 1;
> > >
> > >      page_set_owner(page, d);
> > > -    wmb(); /* install valid domain ptr before updating refcnt. */
> > > +    smp_wmb(); /* install valid domain ptr before updating refcnt. */
> > >      ASSERT((page->count_info & ~PGC_xen_heap) == 0);
> > >
> > >      /* Only add to the allocation list if the domain isn't dying. */
> > > diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> > > index 8011987..727e09f 100644
> > > --- a/xen/arch/arm/smpboot.c
> > > +++ b/xen/arch/arm/smpboot.c
> > > @@ -170,11 +170,11 @@ void __cpuinit start_secondary(unsigned long boot_phys_offset,
> > >
> > >      /* Run local notifiers */
> > >      notify_cpu_starting(cpuid);
> > > -    wmb();
> > > +    smp_wmb();
> > >
> > >      /* Now report this CPU is up */
> > >      cpumask_set_cpu(cpuid, &cpu_online_map);
> > > -    wmb();
> > > +    smp_wmb();
> > >
> > >      local_irq_enable();
> >
> > Did you missed few mb() in smpboot.c?
>
> The ones in __cpu_disable and __cpu_die?
>
> I think I just wasn't 100% sure they might not be touching hardware
> (i.e. some platform register to shutdown a CPU) and since they weren't
> performance critical I punted on them.
>
> Looking it again the first half of that logic seems to be bogus (that
> code goes nowhere near any peripheral).

Yes, I think they can both be smp_mb().  With or without that change,

Acked-by: Tim Deegan <tim@xen.org>
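[The shape of the change being agreed for __cpu_die() is roughly this -- a
sketch only, the real smpboot.c code differs and the cpu_is_dead flag here
is illustrative:

    /* Wait for the dying CPU to signal that it is gone.  Both sides are
     * processors, no peripheral is involved, so CPU-to-CPU ordering is
     * all that is needed: */
    while ( !cpu_is_dead )
    {
        mdelay(100);
        smp_mb();          /* was mb(), i.e. a full-system dsb */
    }
]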
At 17:10 +0100 on 28 Jun (1372439453), Ian Campbell wrote:
> The full power of dsb is not required in this context.
>
> Also change wmb() to be dsb() directly instead of indirectly via mb(), for
> clarity.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>
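[Patch 07's diff is not quoted in this part of the thread; going by the
commit message, its effect is roughly this (sketch, not the actual diff):

    #define smp_mb()  mb()    /* before: mb() is dsb() */

    #undef  smp_mb
    #define smp_mb()  dmb()   /* after: dmb orders accesses between CPUs
                                 without waiting for them to complete */
    #define wmb()     dsb()   /* after: dsb() directly, not via mb() */
]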
Tim Deegan
2013-Jul-04 11:35 UTC
Re: [PATCH 09/10] xen: arm: weaken SMP barriers to inner shareable.
At 17:10 +0100 on 28 Jun (1372439455), Ian Campbell wrote:
> Since all processors are in the inner-shareable domain and we map everything
> that way this is sufficient.
>
> The non-SMP barriers remain full system. Although in principal they could

s/principal/principle/.

> become outer shareable barriers for some hardware this would require us to
> know which class a given device is. Given the small number of device drivers
> in Xen itself its probably not worth worrying over, although maybe someone
> will benchmark at some point.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>
Tim Deegan
2013-Jul-04 11:42 UTC
Re: [PATCH 10/10] xen: arm: use more specific barriers for read and write barriers.
At 17:10 +0100 on 28 Jun (1372439456), Ian Campbell wrote:
> Note that 32-bit does not provide a load variant of the inner shareable
> barrier, so that remains a full any-any barrier.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  xen/include/asm-arm/system.h |   13 +++++++++----
>  1 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
> index 7c3e42d..efaf645 100644
> --- a/xen/include/asm-arm/system.h
> +++ b/xen/include/asm-arm/system.h
> @@ -17,12 +17,17 @@
>  #define dmb(scope)      asm volatile("dmb " scope : : : "memory")
>
>  #define mb()            dsb("sy")
> -#define rmb()           dsb("sy")
> -#define wmb()           dsb("sy")
> +#define rmb()           dsb("ld")

This doesn't exist on arm32; it'll have to be dsb("sy") there, just like
you've done for smp_rmb() below.

With that change, Acked-by: Tim Deegan <tim@xen.org>

> +#define wmb()           dsb("st")
>
>  #define smp_mb()        dmb("ish")
> -#define smp_rmb()       dmb("ish")
> -#define smp_wmb()       dmb("ish")
> +#ifdef CONFIG_ARM_64
> +#define smp_rmb()       dmb("ishld")
> +#else
> +#define smp_rmb()       dmb("ish") /* 32-bit has no ishld variant. */
> +#endif
> +
> +#define smp_wmb()       dmb("ishst")
>
>  #define xchg(ptr,x) \
>          ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
> --
> 1.7.2.5
Ian Campbell
2013-Jul-04 11:46 UTC
Re: [PATCH 10/10] xen: arm: use more specific barriers for read and write barriers.
On Thu, 2013-07-04 at 12:42 +0100, Tim Deegan wrote:
> At 17:10 +0100 on 28 Jun (1372439456), Ian Campbell wrote:
> > Note that 32-bit does not provide a load variant of the inner shareable
> > barrier, so that remains a full any-any barrier.
> >
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > ---
> >  xen/include/asm-arm/system.h |   13 +++++++++----
> >  1 files changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h
> > index 7c3e42d..efaf645 100644
> > --- a/xen/include/asm-arm/system.h
> > +++ b/xen/include/asm-arm/system.h
> > @@ -17,12 +17,17 @@
> >  #define dmb(scope)      asm volatile("dmb " scope : : : "memory")
> >
> >  #define mb()            dsb("sy")
> > -#define rmb()           dsb("sy")
> > -#define wmb()           dsb("sy")
> > +#define rmb()           dsb("ld")
>
> This doesn't exist on arm32; it'll have to be dsb("sy") there, just like
> you've done for smp_rmb() below.

Right, suggests that rmb isn't actually used anywhere (since I'm sure I
compile tested) but let's at least make it correct...

> With that change, Acked-by: Tim Deegan <tim@xen.org>

Thanks,

Ian.

> > +#define wmb()           dsb("st")
> >
> >  #define smp_mb()        dmb("ish")
> > -#define smp_rmb()       dmb("ish")
> > -#define smp_wmb()       dmb("ish")
> > +#ifdef CONFIG_ARM_64
> > +#define smp_rmb()       dmb("ishld")
> > +#else
> > +#define smp_rmb()       dmb("ish") /* 32-bit has no ishld variant. */
> > +#endif
> > +
> > +#define smp_wmb()       dmb("ishst")
> >
> >  #define xchg(ptr,x) \
> >          ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
> > --
> > 1.7.2.5
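[With Tim's correction folded in, the definitions in
xen/include/asm-arm/system.h presumably end up along these lines -- a sketch
of the respun patch, not the committed version:

    #define mb()            dsb("sy")
    #ifdef CONFIG_ARM_64
    #define rmb()           dsb("ld")   /* load-variant dsb, arm64 only */
    #else
    #define rmb()           dsb("sy")   /* 32-bit has no ld variant */
    #endif
    #define wmb()           dsb("st")

    #define smp_mb()        dmb("ish")
    #ifdef CONFIG_ARM_64
    #define smp_rmb()       dmb("ishld")
    #else
    #define smp_rmb()       dmb("ish")  /* 32-bit has no ishld variant */
    #endif
    #define smp_wmb()       dmb("ishst")
]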