Jeremy Fitzhardinge
2008-Sep-07 22:21 UTC
[Xen-devel] [PATCH 0 of 7] x86: lay groundwork for Xen domain 0 support
Hi Ingo, This series begins to lay the groundwork for Xen domain 0 support. Domain 0 is much like a normal Xen domain, but it is also allowed to have direct access to the underlying hardware (including both IO space and various BIOS tables), so it can run device drivers, etc. This means that we need to be able to distinguish between a Xen domain''s normal pseudo-physical address space, and the real machine physical address space. This series: - Adds a new pte flag - _PAGE_IOMAP - used to indicate that a mapping is intended to map an IO device, and the physical address is actually a real machine physical address rather than a pseudo-physical address. This is exposed as PAGE_KERNEL_IO and so on. ioremap() and early_ioremap() are modified to set this flag, as they (should) always be used to map IO devices and not other memory. By default __supported_pte_mask masks this flag out, so it won''t end up in the final pagetable. The Xen (and any other) code which cares about this flag unmask it from __supported_pte_mask. - Add early_memremap() to deal with the cases where early_ioremap() actually being used to map normal memory. - Remove a bogus check in x86-64''s implementation of set_pte_vaddr_pud(), which prevents memory mappings from being updated. - Convert __acpi_map_table to use early_ioremap(), rather than have its own implementation. - Make __acpi_map_table always map the memory rather than assuming that the linear map maps some of the acpi tables. This won''t be true in the virtual case, and always mapping doesn''t hurt in the native case. I''ve tested these patches on both 32 and 64 native booting, and it all works for me. Thanks, J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-07 22:21 UTC
[Xen-devel] [PATCH 1 of 7] x86: add _PAGE_IOMAP pte flag for IO mappings
Use one of the software-defined PTE bits to indicate that a mapping is intended for an IO address. On native hardware this is irrelevent, since a physical address is a physical address. But in a virtual environment, physical addresses are also virtualized, so there needs to be some way to distinguish between pseudo-physical addresses and actual hardware addresses; _PAGE_IOMAP indicates this intent. By default, __supported_pte_mask masks out _PAGE_IOMAP, so it doesn''t even appear in the final pagetable. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/mm/init_32.c | 2 +- arch/x86/mm/init_64.c | 2 +- arch/x86/mm/ioremap.c | 8 ++++---- include/asm-x86/fixmap.h | 2 +- include/asm-x86/pgtable.h | 14 ++++++++++++-- 5 files changed, 19 insertions(+), 9 deletions(-) diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -503,7 +503,7 @@ int nx_enabled; -pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL); +pteval_t __supported_pte_mask __read_mostly = ~(_PAGE_NX | _PAGE_GLOBAL | _PAGE_IOMAP); EXPORT_SYMBOL_GPL(__supported_pte_mask); #ifdef CONFIG_X86_PAE diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -88,7 +88,7 @@ int after_bootmem; -unsigned long __supported_pte_mask __read_mostly = ~0UL; +pteval_t __supported_pte_mask __read_mostly = ~_PAGE_IOMAP; EXPORT_SYMBOL_GPL(__supported_pte_mask); static int do_not_nx __cpuinitdata; diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -223,16 +223,16 @@ switch (prot_val) { case _PAGE_CACHE_UC: default: - prot = PAGE_KERNEL_NOCACHE; + prot = PAGE_KERNEL_IO_NOCACHE; break; case _PAGE_CACHE_UC_MINUS: - prot = PAGE_KERNEL_UC_MINUS; + prot = PAGE_KERNEL_IO_UC_MINUS; break; case _PAGE_CACHE_WC: - prot = PAGE_KERNEL_WC; + prot = PAGE_KERNEL_IO_WC; break; case _PAGE_CACHE_WB: - prot = PAGE_KERNEL; + prot = PAGE_KERNEL_IO; break; } diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h --- a/include/asm-x86/pgtable.h +++ b/include/asm-x86/pgtable.h @@ -15,7 +15,7 @@ #define _PAGE_BIT_PAT 7 /* on 4KB pages */ #define _PAGE_BIT_GLOBAL 8 /* Global TLB entry PPro+ */ #define _PAGE_BIT_UNUSED1 9 /* available for programmer */ -#define _PAGE_BIT_UNUSED2 10 +#define _PAGE_BIT_IOMAP 10 /* flag used to indicate IO mapping */ #define _PAGE_BIT_HIDDEN 11 /* hidden by kmemcheck */ #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ #define _PAGE_BIT_SPECIAL _PAGE_BIT_UNUSED1 @@ -32,7 +32,7 @@ #define _PAGE_PSE (_AT(pteval_t, 1) << _PAGE_BIT_PSE) #define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL) #define _PAGE_UNUSED1 (_AT(pteval_t, 1) << _PAGE_BIT_UNUSED1) -#define _PAGE_UNUSED2 (_AT(pteval_t, 1) << _PAGE_BIT_UNUSED2) +#define _PAGE_IOMAP (_AT(pteval_t, 1) << _PAGE_BIT_IOMAP) #define _PAGE_HIDDEN (_AT(pteval_t, 1) << _PAGE_BIT_HIDDEN) #define _PAGE_PAT (_AT(pteval_t, 1) << _PAGE_BIT_PAT) #define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE) @@ -99,6 +99,11 @@ #define __PAGE_KERNEL_LARGE_NOCACHE (__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE) #define __PAGE_KERNEL_LARGE_EXEC (__PAGE_KERNEL_EXEC | _PAGE_PSE) +#define __PAGE_KERNEL_IO (__PAGE_KERNEL | _PAGE_IOMAP) +#define __PAGE_KERNEL_IO_NOCACHE (__PAGE_KERNEL_NOCACHE | _PAGE_IOMAP) +#define __PAGE_KERNEL_IO_UC_MINUS (__PAGE_KERNEL_UC_MINUS | _PAGE_IOMAP) +#define __PAGE_KERNEL_IO_WC (__PAGE_KERNEL_WC | _PAGE_IOMAP) + #define PAGE_KERNEL __pgprot(__PAGE_KERNEL) #define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO) #define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC) @@ -112,6 +117,11 @@ #define PAGE_KERNEL_LARGE_EXEC __pgprot(__PAGE_KERNEL_LARGE_EXEC) #define PAGE_KERNEL_VSYSCALL __pgprot(__PAGE_KERNEL_VSYSCALL) #define PAGE_KERNEL_VSYSCALL_NOCACHE __pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE) + +#define PAGE_KERNEL_IO __pgprot(__PAGE_KERNEL_IO) +#define PAGE_KERNEL_IO_NOCACHE __pgprot(__PAGE_KERNEL_IO_NOCACHE) +#define PAGE_KERNEL_IO_UC_MINUS __pgprot(__PAGE_KERNEL_IO_UC_MINUS) +#define PAGE_KERNEL_IO_WC __pgprot(__PAGE_KERNEL_IO_WC) /* xwr */ #define __P000 PAGE_NONE _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-07 22:21 UTC
[Xen-devel] [PATCH 2 of 7] x86: remove duplicate early_ioremap declarations
early_ioremap() is redeclared in several places; remove them. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- include/asm-x86/io.h | 14 -------------- include/asm-x86/io_64.h | 3 --- 2 files changed, 17 deletions(-) diff --git a/include/asm-x86/io.h b/include/asm-x86/io.h --- a/include/asm-x86/io.h +++ b/include/asm-x86/io.h @@ -4,20 +4,6 @@ #define ARCH_HAS_IOREMAP_WC #include <linux/compiler.h> - -/* - * early_ioremap() and early_iounmap() are for temporary early boot-time - * mappings, before the real ioremap() is functional. - * A boot-time mapping is currently limited to at most 16 pages. - */ -#ifndef __ASSEMBLY__ -extern void early_ioremap_init(void); -extern void early_ioremap_clear(void); -extern void early_ioremap_reset(void); -extern void *early_ioremap(unsigned long offset, unsigned long size); -extern void early_iounmap(void *addr, unsigned long size); -extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys); -#endif #define build_mmio_read(name, size, type, reg, barrier) \ static inline type name(const volatile void __iomem *addr) \ diff --git a/include/asm-x86/io_64.h b/include/asm-x86/io_64.h --- a/include/asm-x86/io_64.h +++ b/include/asm-x86/io_64.h @@ -165,9 +165,6 @@ #include <asm-generic/iomap.h> -extern void *early_ioremap(unsigned long addr, unsigned long size); -extern void early_iounmap(void *addr, unsigned long size); - /* * This one maps high address device memory and turns off caching for that area. * it''s useful if some control registers are in such an area and write combining _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-07 22:21 UTC
[Xen-devel] [PATCH 3 of 7] x86: add early_memremap()
early_ioremap() is also used to map normal memory when constructing the linear memory mapping. However, since we sometimes need to be able to distinguish between actual IO mappings and normal memory mappings, add a early_memremap() call, which maps with PAGE_KERNEL (as opposed to PAGE_KERNEL_IO for early_ioremap()), and use it when constructing pagetables. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/mm/init_64.c | 2 +- arch/x86/mm/ioremap.c | 22 +++++++++++++++++----- include/asm-x86/io.h | 1 + 3 files changed, 19 insertions(+), 6 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -312,7 +312,7 @@ if (pfn >= table_top) panic("alloc_low_page: ran out of memory"); - adr = early_ioremap(pfn * PAGE_SIZE, PAGE_SIZE); + adr = early_memremap(pfn * PAGE_SIZE, PAGE_SIZE); memset(adr, 0, PAGE_SIZE); *phys = pfn * PAGE_SIZE; return adr; diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -549,12 +549,12 @@ } static inline void __init early_set_fixmap(enum fixed_addresses idx, - unsigned long phys) + unsigned long phys, pgprot_t prot) { if (after_paging_init) - set_fixmap(idx, phys); + __set_fixmap(idx, phys, prot); else - __early_set_fixmap(idx, phys, PAGE_KERNEL); + __early_set_fixmap(idx, phys, prot); } static inline void __init early_clear_fixmap(enum fixed_addresses idx) @@ -582,7 +582,7 @@ } late_initcall(check_early_ioremap_leak); -void __init *early_ioremap(unsigned long phys_addr, unsigned long size) +static void __init *__early_ioremap(unsigned long phys_addr, unsigned long size, pgprot_t prot) { unsigned long offset, last_addr; unsigned int nrpages, nesting; @@ -631,7 +631,7 @@ idx0 = FIX_BTMAP_BEGIN - NR_FIX_BTMAPS*nesting; idx = idx0; while (nrpages > 0) { - early_set_fixmap(idx, phys_addr); + early_set_fixmap(idx, phys_addr, prot); phys_addr += PAGE_SIZE; --idx; --nrpages; @@ -640,6 +640,18 @@ printk(KERN_CONT "%08lx + %08lx\n", offset, fix_to_virt(idx0)); return (void *) (offset + fix_to_virt(idx0)); +} + +/* Remap an IO device */ +void __init *early_ioremap(unsigned long phys_addr, unsigned long size) +{ + return __early_ioremap(phys_addr, size, PAGE_KERNEL_IO); +} + +/* Remap memory */ +void __init *early_memremap(unsigned long phys_addr, unsigned long size) +{ + return __early_ioremap(phys_addr, size, PAGE_KERNEL); } void __init early_iounmap(void *addr, unsigned long size) diff --git a/include/asm-x86/io.h b/include/asm-x86/io.h --- a/include/asm-x86/io.h +++ b/include/asm-x86/io.h @@ -83,6 +83,7 @@ extern void early_ioremap_clear(void); extern void early_ioremap_reset(void); extern void *early_ioremap(unsigned long offset, unsigned long size); +extern void *early_memremap(unsigned long offset, unsigned long size); extern void early_iounmap(void *addr, unsigned long size); extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-07 22:21 UTC
[Xen-devel] [PATCH 4 of 7] x86: use early_memremap() in setup.c
The remappings in setup.c are all just ordinary memory, so use early_memremap() rather than early_ioremap(). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/kernel/setup.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -302,7 +302,7 @@ if (clen > MAX_MAP_CHUNK-slop) clen = MAX_MAP_CHUNK-slop; mapaddr = ramdisk_image & PAGE_MASK; - p = early_ioremap(mapaddr, clen+slop); + p = early_memremap(mapaddr, clen+slop); memcpy(q, p+slop, clen); early_iounmap(p, clen+slop); q += clen; @@ -379,7 +379,7 @@ return; pa_data = boot_params.hdr.setup_data; while (pa_data) { - data = early_ioremap(pa_data, PAGE_SIZE); + data = early_memremap(pa_data, PAGE_SIZE); switch (data->type) { case SETUP_E820_EXT: parse_e820_ext(data, pa_data); @@ -402,7 +402,7 @@ return; pa_data = boot_params.hdr.setup_data; while (pa_data) { - data = early_ioremap(pa_data, sizeof(*data)); + data = early_memremap(pa_data, sizeof(*data)); e820_update_range(pa_data, sizeof(*data)+data->len, E820_RAM, E820_RESERVED_KERN); found = 1; @@ -428,7 +428,7 @@ return; pa_data = boot_params.hdr.setup_data; while (pa_data) { - data = early_ioremap(pa_data, sizeof(*data)); + data = early_memremap(pa_data, sizeof(*data)); sprintf(buf, "setup data %x", data->type); reserve_early(pa_data, pa_data+sizeof(*data)+data->len, buf); pa_data = data->next; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-07 22:21 UTC
[Xen-devel] [PATCH 5 of 7] x86-64: don''t check for map replacement
The check prevents flags on mappings from being changed, which is not desireable. There''s no need to check for replacing a mapping, and x86-32 does not do this check. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/mm/init_64.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -195,9 +195,6 @@ } pte = pte_offset_kernel(pmd, vaddr); - if (!pte_none(*pte) && pte_val(new_pte) && - pte_val(*pte) != (pte_val(new_pte) & __supported_pte_mask)) - pte_ERROR(*pte); set_pte(pte, new_pte); /* _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-07 22:21 UTC
[Xen-devel] [PATCH 6 of 7] x86: use early_ioremap in __acpi_map_table
__acpi_map_table() effectively reimplements early_ioremap(). Rather than have that duplication, just implement it in terms of early_ioremap(). However, unlike early_ioremap(), __acpi_map_table() just maintains a single mapping which gets replaced each call, and has no corresponding unmap function. Implement this by just removing the previous mapping each time its called. Unfortunately, this will leave a stray mapping at the end. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/kernel/acpi/boot.c | 27 +++++++-------------------- include/asm-x86/acpi.h | 3 --- include/asm-x86/fixmap_32.h | 4 ---- include/asm-x86/fixmap_64.h | 4 ---- 4 files changed, 7 insertions(+), 31 deletions(-) diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -121,8 +121,8 @@ */ char *__init __acpi_map_table(unsigned long phys, unsigned long size) { - unsigned long base, offset, mapped_size; - int idx; + static char *prev_map; + static unsigned long prev_size; if (!phys || !size) return NULL; @@ -130,26 +130,13 @@ if (phys+size <= (max_low_pfn_mapped << PAGE_SHIFT)) return __va(phys); - offset = phys & (PAGE_SIZE - 1); - mapped_size = PAGE_SIZE - offset; - clear_fixmap(FIX_ACPI_END); - set_fixmap(FIX_ACPI_END, phys); - base = fix_to_virt(FIX_ACPI_END); + if (prev_map) + early_iounmap(prev_map, prev_size); - /* - * Most cases can be covered by the below. - */ - idx = FIX_ACPI_END; - while (mapped_size < size) { - if (--idx < FIX_ACPI_BEGIN) - return NULL; /* cannot handle this */ - phys += PAGE_SIZE; - clear_fixmap(idx); - set_fixmap(idx, phys); - mapped_size += PAGE_SIZE; - } + prev_size = size; + prev_map = early_ioremap(phys, size); - return ((unsigned char *)base + offset); + return prev_map; } #ifdef CONFIG_PCI_MMCONFIG diff --git a/include/asm-x86/acpi.h b/include/asm-x86/acpi.h --- a/include/asm-x86/acpi.h +++ b/include/asm-x86/acpi.h @@ -102,9 +102,6 @@ acpi_noirq = 1; } -/* Fixmap pages to reserve for ACPI boot-time tables (see fixmap.h) */ -#define FIX_ACPI_PAGES 4 - extern int acpi_gsi_to_irq(u32 gsi, unsigned int *irq); static inline void acpi_noirq_set(void) { acpi_noirq = 1; } diff --git a/include/asm-x86/fixmap_32.h b/include/asm-x86/fixmap_32.h --- a/include/asm-x86/fixmap_32.h +++ b/include/asm-x86/fixmap_32.h @@ -99,10 +99,6 @@ (__end_of_permanent_fixed_addresses & 255), FIX_BTMAP_BEGIN = FIX_BTMAP_END + NR_FIX_BTMAPS*FIX_BTMAPS_NESTING - 1, FIX_WP_TEST, -#ifdef CONFIG_ACPI - FIX_ACPI_BEGIN, - FIX_ACPI_END = FIX_ACPI_BEGIN + FIX_ACPI_PAGES - 1, -#endif #ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT FIX_OHCI1394_BASE, #endif diff --git a/include/asm-x86/fixmap_64.h b/include/asm-x86/fixmap_64.h --- a/include/asm-x86/fixmap_64.h +++ b/include/asm-x86/fixmap_64.h @@ -50,10 +50,6 @@ FIX_PARAVIRT_BOOTMAP, #endif __end_of_permanent_fixed_addresses, -#ifdef CONFIG_ACPI - FIX_ACPI_BEGIN, - FIX_ACPI_END = FIX_ACPI_BEGIN + FIX_ACPI_PAGES - 1, -#endif #ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT FIX_OHCI1394_BASE, #endif _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-07 22:21 UTC
[Xen-devel] [PATCH 7 of 7] x86: always explicitly map acpi memory
Always map acpi tables, rather than assuming we can use the normal linear mapping to access the acpi tables. This is necessary in a virtual environment where the linear mappings are to pseudo-physical memory, but the acpi tables exist at a real physical address. It doesn''t hurt to map in the normal non-virtual case, so just do it unconditionally. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/kernel/acpi/boot.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -126,9 +126,6 @@ if (!phys || !size) return NULL; - - if (phys+size <= (max_low_pfn_mapped << PAGE_SHIFT)) - return __va(phys); if (prev_map) early_iounmap(prev_map, prev_size); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Yinghai Lu
2008-Sep-07 23:35 UTC
Re: [Xen-devel] [PATCH 7 of 7] x86: always explicitly map acpi memory
On Sun, Sep 7, 2008 at 3:21 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:> Always map acpi tables, rather than assuming we can use the normal > linear mapping to access the acpi tables. This is necessary in a > virtual environment where the linear mappings are to pseudo-physical > memory, but the acpi tables exist at a real physical address. It > doesn''t hurt to map in the normal non-virtual case, so just do it > unconditionally. > > Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > --- > arch/x86/kernel/acpi/boot.c | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c > --- a/arch/x86/kernel/acpi/boot.c > +++ b/arch/x86/kernel/acpi/boot.c > @@ -126,9 +126,6 @@ > > if (!phys || !size) > return NULL; > - > - if (phys+size <= (max_low_pfn_mapped << PAGE_SHIFT)) > - return __va(phys); > > if (prev_map) > early_iounmap(prev_map, prev_size); >actually, case 1: acpi tables near mmio, range, we don''t map them from 2.6.27-rc1, and it is bigger than max_low_mapped... case 2: some strange system put acpi in the middle of RAM... like when 8G ram installed, but MMIO is 3.5G, BIOS put acpi tables around 2G.. YH _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-08 00:02 UTC
Re: [Xen-devel] [PATCH 7 of 7] x86: always explicitly map acpi memory
Yinghai Lu wrote:> actually, > case 1: acpi tables near mmio, range, we don''t map them from > 2.6.27-rc1, and it is bigger than max_low_mapped... > case 2: some strange system put acpi in the middle of RAM... like when > 8G ram installed, but MMIO is 3.5G, BIOS put acpi tables around 2G.. >OK, so what''s your conclusion? Is this change OK or not? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-08 00:03 UTC
[Xen-devel] Re: [PATCH 6 of 7] x86: use early_ioremap in __acpi_map_table
Andi Kleen wrote:> On Sun, Sep 07, 2008 at 03:21:18PM -0700, Jeremy Fitzhardinge wrote: > >> __acpi_map_table() effectively reimplements early_ioremap(). Rather >> than have that duplication, just implement it in terms of >> early_ioremap(). >> >> However, unlike early_ioremap(), __acpi_map_table() just maintains a >> single mapping which gets replaced each call, and has no corresponding >> unmap function. Implement this by just removing the previous mapping >> each time its called. Unfortunately, this will leave a stray mapping >> at the end. >> > > It would be better to just fix the ACPI code to unmap.I was concerned that would cause lots of cross-arch churn, but of course the only other relevant architecture is ia64. I''ll prep a followup patch. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Yinghai Lu
2008-Sep-08 00:14 UTC
Re: [Xen-devel] [PATCH 7 of 7] x86: always explicitly map acpi memory
On Sun, Sep 7, 2008 at 5:02 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:> Yinghai Lu wrote: >> actually, >> case 1: acpi tables near mmio, range, we don''t map them from >> 2.6.27-rc1, and it is bigger than max_low_mapped... >> case 2: some strange system put acpi in the middle of RAM... like when >> 8G ram installed, but MMIO is 3.5G, BIOS put acpi tables around 2G.. >> > > OK, so what''s your conclusion? Is this change OK or not? >Acked-by: Yinghai Lu <yhlu.kernel@gmail.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ingo Molnar
2008-Sep-08 14:26 UTC
[Xen-devel] Re: [PATCH 6 of 7] x86: use early_ioremap in __acpi_map_table
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:> >> However, unlike early_ioremap(), __acpi_map_table() just maintains > >> a single mapping which gets replaced each call, and has no > >> corresponding unmap function. Implement this by just removing the > >> previous mapping each time its called. Unfortunately, this will > >> leave a stray mapping at the end. > > > > It would be better to just fix the ACPI code to unmap. > > I was concerned that would cause lots of cross-arch churn, but of > course the only other relevant architecture is ia64. I''ll prep a > followup patch.uhm, there''s a nasty trap in that route: it can potentially cause a lot of breakage. It''s not robust to assume that the ACPI code is sane wrt. mapping/unmapping, because it currently simply doesnt rely on robust unmapping (in the linear range). I tried it in the past and i found tons of crappy ACPI code all around that just never unmapped tables. Leaking ACPI maps are hard to find as well, and it can occur anytime during bootup. As a general principle it might be worth fixing those places, and we''ve hardened up the early-ioremap code for leaks during the PAT rewrite, still please realize that it can become non-trivial and it might cause a lot of unhappy users. So i''d suggest a different, more carful approach: keep the new code you wrote, but print a WARN()ing if prev_map is not unmapped yet when the next mapping is acquired. That way the ACPI code can be fixed gradually and without breaking existing functionality. There''s another complication: ACPI might rely on multiple mappings being present at once, so unmapping the previous one might not be safe. But it _should_ be fine most of the time as __acpi_map_table() is only used inearly init code - and we fixed most of these things in the PAT patchset in any case. Ingo _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-08 16:29 UTC
[Xen-devel] Re: [PATCH 6 of 7] x86: use early_ioremap in __acpi_map_table
Ingo Molnar wrote:> uhm, there''s a nasty trap in that route: it can potentially cause a lot > of breakage. > > It''s not robust to assume that the ACPI code is sane wrt. > mapping/unmapping, because it currently simply doesnt rely on robust > unmapping (in the linear range). >You mean there''s code which just assumes that it can keep using a linear-mapped acpi even after __acpi_map_table() should have implicitly unmapped it?> I tried it in the past and i found tons of crappy ACPI code all around > that just never unmapped tables. Leaking ACPI maps are hard to find as > well, and it can occur anytime during bootup. >__acpi_map_table() is called by acpi_map_table(), which does have a acpi_unmap_table() counterpart. But it only calls iounmap() once we''re past the stage of calling early_*(). I could easily make it call __acpi_unmap_table()->early_iounmap(). But if the concern is that the early boot callers of acpi_map_table() "know" that they never need to unmap, then yes, I see the problem.> As a general principle it might be worth fixing those places, and we''ve > hardened up the early-ioremap code for leaks during the PAT rewrite, > still please realize that it can become non-trivial and it might cause a > lot of unhappy users. > > So i''d suggest a different, more carful approach: keep the new code you > wrote, but print a WARN()ing if prev_map is not unmapped yet when the > next mapping is acquired. That way the ACPI code can be fixed gradually > and without breaking existing functionality. >Yep.> There''s another complication: ACPI might rely on multiple mappings being > present at once, so unmapping the previous one might not be safe. But it > _should_ be fine most of the time as __acpi_map_table() is only used > inearly init code - and we fixed most of these things in the PAT > patchset in any case.And the current behaviour of __acpi_map_table() is to remove the previous mapping (implicitly, by overwriting the same fixmap slots), so its only an issue if the callers assume they can keep using linear-mapped acpi tables after a subsequent call to __acpi_map_table(). J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-08 19:41 UTC
[Xen-devel] Re: [PATCH 6 of 7] x86: use early_ioremap in __acpi_map_table
Ingo Molnar wrote:> uhm, there''s a nasty trap in that route: it can potentially cause a lot > of breakage. > > It''s not robust to assume that the ACPI code is sane wrt. > mapping/unmapping, because it currently simply doesnt rely on robust > unmapping (in the linear range). > > I tried it in the past and i found tons of crappy ACPI code all around > that just never unmapped tables. Leaking ACPI maps are hard to find as > well, and it can occur anytime during bootup. >Yes, you''re right, it''s a mess. I put a unmap and warning in there, but there''s lots of acpi code which doesn''t unmap after using its table. It doesn''t seem to intermix using two tables, fortunately, so the "unmap previous" behaviour of __acpi_map_table works OK, at least as far as I can see. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Avi Kivity
2008-Sep-09 13:32 UTC
[Xen-devel] Re: [PATCH 1 of 7] x86: add _PAGE_IOMAP pte flag for IO mappings
Jeremy Fitzhardinge wrote:> Use one of the software-defined PTE bits to indicate that a mapping is > intended for an IO address. On native hardware this is irrelevent, > since a physical address is a physical address. But in a virtual > environment, physical addresses are also virtualized, so there needs > to be some way to distinguish between pseudo-physical addresses and > actual hardware addresses; _PAGE_IOMAP indicates this intent. > > By default, __supported_pte_mask masks out _PAGE_IOMAP, so it doesn''t > even appear in the final pagetable. >Could PTE_SPECIAL, added for get_user_pages_really_fast(), be reused for this? -- error compiling committee.c: too many arguments to function _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-09 14:47 UTC
[Xen-devel] Re: [PATCH 1 of 7] x86: add _PAGE_IOMAP pte flag for IO mappings
Avi Kivity wrote:> Jeremy Fitzhardinge wrote: >> Use one of the software-defined PTE bits to indicate that a mapping is >> intended for an IO address. On native hardware this is irrelevent, >> since a physical address is a physical address. But in a virtual >> environment, physical addresses are also virtualized, so there needs >> to be some way to distinguish between pseudo-physical addresses and >> actual hardware addresses; _PAGE_IOMAP indicates this intent. >> >> By default, __supported_pte_mask masks out _PAGE_IOMAP, so it doesn''t >> even appear in the final pagetable. >> > > Could PTE_SPECIAL, added for get_user_pages_really_fast(), be reused > for this? >I''m not sure; I still don''t really understand how _PAGE_SPECIAL gets used, other than being user-mode mapping only. But in principle, _PAGE_IOMAP could be set on both kernel and user mappings (if you direct map a device into a process address space), so I think they would conflict then? Also, _PAGE_SPECIAL is also shared with _PAGE_CPA_TEST, which is only used on kernel mappings, so they can co-exist happily. Is _PAGE_IOMAP at all useful for device passthrough in kvm? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Avi Kivity
2008-Sep-09 14:56 UTC
[Xen-devel] Re: [PATCH 1 of 7] x86: add _PAGE_IOMAP pte flag for IO mappings
Jeremy Fitzhardinge wrote:> Avi Kivity wrote: > >> Jeremy Fitzhardinge wrote: >> >>> Use one of the software-defined PTE bits to indicate that a mapping is >>> intended for an IO address. On native hardware this is irrelevent, >>> since a physical address is a physical address. But in a virtual >>> environment, physical addresses are also virtualized, so there needs >>> to be some way to distinguish between pseudo-physical addresses and >>> actual hardware addresses; _PAGE_IOMAP indicates this intent. >>> >>> By default, __supported_pte_mask masks out _PAGE_IOMAP, so it doesn''t >>> even appear in the final pagetable. >>> >>> >> Could PTE_SPECIAL, added for get_user_pages_really_fast(), be reused >> for this? >> >> > > I''m not sure; I still don''t really understand how _PAGE_SPECIAL gets > used, other than being user-mode mapping only. But in principle, > _PAGE_IOMAP could be set on both kernel and user mappings (if you direct > map a device into a process address space), so I think they would > conflict then? > >It''s a "don''t refcount me" flag, which is not sematically the same as I/O, but may be close enough.> Is _PAGE_IOMAP at all useful for device passthrough in kvm? >No. We don''t care if a page is an I/O page or RAM (other than refcounting correctness and page attributes, which are handled by other means). -- error compiling committee.c: too many arguments to function _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Sep-09 15:29 UTC
Re: [Xen-devel] Re: [PATCH 1 of 7] x86: add _PAGE_IOMAP pte flag for IO mappings
On 9/9/08 15:56, "Avi Kivity" <avi@qumranet.com> wrote:>> I''m not sure; I still don''t really understand how _PAGE_SPECIAL gets >> used, other than being user-mode mapping only. But in principle, >> _PAGE_IOMAP could be set on both kernel and user mappings (if you direct >> map a device into a process address space), so I think they would >> conflict then? > > It''s a "don''t refcount me" flag, which is not sematically the same as > I/O, but may be close enough.That''s basically what our _PAGE_IO flag (in our old Linux patchset) means. We use it to cause pte_pfn() to return an invalid pfn and hence avoid reference counting that way. Since kernel mappings are never reference counted (I think?) perhaps we could use _PAGE_SPECIAL even if it is restricted to use on user mappings. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-09 15:48 UTC
Re: [Xen-devel] Re: [PATCH 1 of 7] x86: add _PAGE_IOMAP pte flag for IO mappings
Keir Fraser wrote:> That''s basically what our _PAGE_IO flag (in our old Linux patchset) means. > We use it to cause pte_pfn() to return an invalid pfn and hence avoid > reference counting that way. Since kernel mappings are never reference > counted (I think?) perhaps we could use _PAGE_SPECIAL even if it is > restricted to use on user mappings.Well, _PAGE_IOMAP''s most important semantic from Xen''s perspective is that the frame number is considered to already be an MFN and so isn''t converted. It may be that _PAGE_SPECIAL is also useful for its "no refcount" properties, but we could set both in that case. At present we can''t set _PAGE_SPECIAL on kernel mappings because we assume its usermode only, and its shared with _PAGE_CPA_TEST which is kernel-only. Most _PAGE_IOMAP mappings are in kernel space, but I wouldn''t discount user mappings if we end up allowing direct access to passthrough devices. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Sep-09 16:05 UTC
Re: [Xen-devel] Re: [PATCH 1 of 7] x86: add _PAGE_IOMAP pte flag for IO mappings
On 9/9/08 16:48, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:> Keir Fraser wrote: >> That''s basically what our _PAGE_IO flag (in our old Linux patchset) means. >> We use it to cause pte_pfn() to return an invalid pfn and hence avoid >> reference counting that way. Since kernel mappings are never reference >> counted (I think?) perhaps we could use _PAGE_SPECIAL even if it is >> restricted to use on user mappings. > > Well, _PAGE_IOMAP''s most important semantic from Xen''s perspective is > that the frame number is considered to already be an MFN and so isn''t > converted. It may be that _PAGE_SPECIAL is also useful for its "no > refcount" properties, but we could set both in that case.Oh, yes, our _PAGE_IO has the same meaning. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Avi Kivity
2008-Sep-10 09:55 UTC
Re: [Xen-devel] Re: [PATCH 1 of 7] x86: add _PAGE_IOMAP pte flag for IO mappings
Avi Kivity wrote:>> >> I''m not sure; I still don''t really understand how _PAGE_SPECIAL gets >> used, other than being user-mode mapping only. But in principle, >> _PAGE_IOMAP could be set on both kernel and user mappings (if you direct >> map a device into a process address space), so I think they would >> conflict then? >> >> > > It''s a "don''t refcount me" flag, which is not sematically the same as > I/O, but may be close enough.Actually it''s more of a "no struct page" flag, which implies no refcounting. And not having a struct page should correspond well to a pte not requiring pfn->mfn conversion and being an I/O page. -- error compiling committee.c: too many arguments to function _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-10 16:38 UTC
Re: [Xen-devel] Re: [PATCH 1 of 7] x86: add _PAGE_IOMAP pte flag for IO mappings
Avi Kivity wrote:> Actually it''s more of a "no struct page" flag, which implies no > refcounting.Hm, is that actually true enough to define it? Could we rename it something like _PAGE_NOSTRUCTPAGE or something a bit more specific than "special"?> And not having a struct page should correspond well to a pte not > requiring pfn->mfn conversion and being an I/O page.But _PAGE_SPECIAL is only set in a few places. It''s not set in ioremap mappings and so on. Should it be? There''s also the hiccup that it gets set in a pte with pte_mkspecial() - but at that point its too late because you''ve already constructed the pte and done the pfn->mfn conversion. _PAGE_IOMAP can only be set when you initially construct the pte out of a frame number and a pgprot. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Nick Piggin
2008-Sep-10 16:55 UTC
Re: [Xen-devel] Re: [PATCH 1 of 7] x86: add _PAGE_IOMAP pte flag for IO mappings
On Thursday 11 September 2008 02:38, Jeremy Fitzhardinge wrote:> Avi Kivity wrote: > > Actually it''s more of a "no struct page" flag, which implies no > > refcounting. > > Hm, is that actually true enough to define it? Could we rename it > something like _PAGE_NOSTRUCTPAGE or something a bit more specific than > "special"?It complements vm_normal_page, which was there first (and coined by Linus). It is the opposite of normal. This question always comes up and my answer is always yes, if you can convince Linus to rename vm_normal_page to the corresponding term :) It''s not exactly _PAGE_NOSTRUCTPAGE. There can be struct pages under there, but you''re not to touch them.> > And not having a struct page should correspond well to a pte not > > requiring pfn->mfn conversion and being an I/O page. > > But _PAGE_SPECIAL is only set in a few places. It''s not set in ioremap > mappings and so on. Should it be?Kernel address space, you mean? No, it is only ever used on user addresses.> There''s also the hiccup that it gets set in a pte with pte_mkspecial() - > but at that point its too late because you''ve already constructed the > pte and done the pfn->mfn conversion. _PAGE_IOMAP can only be set when > you initially construct the pte out of a frame number and a pgprot.I don''t see this would be any problem because the pte is always constructed in a single line in both places where it is used. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Sep-10 17:27 UTC
Re: [Xen-devel] Re: [PATCH 1 of 7] x86: add _PAGE_IOMAP pte flag for IO mappings
Nick Piggin wrote:> It complements vm_normal_page, which was there first (and coined by > Linus). It is the opposite of normal. This question always comes up > and my answer is always yes, if you can convince Linus to rename > vm_normal_page to the corresponding term :) >Not really. Normal is normal, but "special" doesn''t tell us what kind of special it is.> It''s not exactly _PAGE_NOSTRUCTPAGE. There can be struct pages under > there, but you''re not to touch them. >To the extent that the struct page may as well not exist? Does it contain any meaningful state? Are they always IO mappings? Could we just use _PAGE_IOMAP as the name for _PAGE_SPECIAL?>>> And not having a struct page should correspond well to a pte not >>> requiring pfn->mfn conversion and being an I/O page. >>> >> But _PAGE_SPECIAL is only set in a few places. It''s not set in ioremap >> mappings and so on. Should it be? >> > > Kernel address space, you mean? No, it is only ever used on user > addresses. >Right. But if we fold _PAGE_SPECIAL and _PAGE_IOMAP together, it would start getting used on kernel addresses (and obviously we''d need to rearrange _PAGE_CPA_TEST).>> There''s also the hiccup that it gets set in a pte with pte_mkspecial() - >> but at that point its too late because you''ve already constructed the >> pte and done the pfn->mfn conversion. _PAGE_IOMAP can only be set when >> you initially construct the pte out of a frame number and a pgprot. >> > > I don''t see this would be any problem because the pte is always constructed > in a single line in both places where it is used. >OK. If we were to fold these two together, then pte_mkspecial() would have to go, since it wouldn''t possible to use correctly in my use case. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel