Jeremy Fitzhardinge
2008-Nov-13 19:09 UTC
[Xen-devel] [PATCH 00 of 38] xen: add more Xen dom0 support
Hi Ingo,

Here's the chunk of patches to add Xen Dom0 support (it's probably
worth creating a new xen/dom0 topic branch for it).

A dom0 Xen domain is basically the same as a normal domU domain, but
it has extra privileges to directly access hardware. There are two
issues to deal with:

 - translating to and from the domain's pseudo-physical addresses and
   real machine addresses (for ioremap and setting up DMA)
 - routing hardware interrupts into the domain

ioremap is relatively easy to deal with. An earlier patch introduced
the _PAGE_IOMAP pte flag, which we use to distinguish between a
regular pseudo-physical mapping and a real machine mapping. Everything
falls out pretty cleanly. A consequence is that the various pieces of
table-parsing code - DMI, ACPI, MP, etc - work out of the box.

Similarly, the series adds hooks into swiotlb so that architectures
can allocate the swiotlb memory appropriately; on the x86/xen side,
Xen hooks these allocation functions to make special hypercalls to
guarantee that the allocated memory is contiguous in machine memory.

Interrupts are a very different affair. The descriptions in each patch
describe how it all fits together in detail, but the overview is:

 1. Xen owns the local APICs; the dom0 kernel controls the IO APICs
 2. Hardware interrupts are delivered on event channels like
    everything else
 3. To set this up, we intercept at pcibios_enable_irq:
    - given a dev+pin, we use ACPI to get a gsi
    - hook acpi_register_gsi to call xen_register_gsi, which
      - allocates an irq (generally not 1:1 with the gsi)
      - asks Xen for a vector and event channel for the irq
    - program the IO APIC to deliver the hardware interrupt to the
      allocated vector

The upshot is that the device driver gets an irq, and when the
hardware raises an interrupt, it gets delivered on that irq.

We maintain our own irq allocation space, since the hardware-bound
event channel irqs are intermixed with all the other normal Xen event
channel irqs (inter-domain, timers, IPIs, etc). For compatibility the
irqs 0-15 are reserved for legacy device interrupts, but the rest of
the range is dynamically allocated.

Initialization also requires care. The dom0 kernel parses the ACPI
tables as usual, in order to discover the local and IO APICs, and all
the rest of the ACPI-provided data the kernel requires. However,
because the kernel doesn't own the local APICs and can't directly map
the IO APICs, we must be sure to avoid actually touching the hardware
when running under Xen.

TODO: work out how to fit MSI into all this.

So, in summary, this series contains:

 - dom0 console support
 - dom0 xenbus support
 - CPU features and IO access for a privileged domain
 - mtrrs
 - making ioremap work on machine addresses
 - swiotlb allocation hooks
 - interrupts
   - introduce PV io_apic operations
   - add Xen-specific IRQ allocator
   - switch to using all-Xen event delivery
   - add pirq Xen interrupt type
   - table parsing and setup
   - intercept driver interrupt registration

All this code will compile away to nothing when CONFIG_XEN_DOM0 is not
enabled. If it is enabled, it will only have an effect if booted as a
dom0 kernel; normal native execution and domU execution should be
unaffected.

Thanks,
	J

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
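As a rough illustration of the irq numbering described in the cover letter (irqs 0-15 reserved for legacy devices, everything above handed out dynamically), here is a small userspace model. It is only a sketch: the names (model_claim_legacy_irq, model_alloc_dynamic_irq, MODEL_NR_IRQS) are invented for this example and are not the allocator added by the series.

```c
#include <stdbool.h>

#define MODEL_NR_LEGACY_IRQS 16	/* irqs 0-15: reserved for legacy devices */
#define MODEL_NR_IRQS        64	/* arbitrary table size for this model */

/* One flag per irq number; true once the irq has been handed out. */
static bool model_irq_used[MODEL_NR_IRQS];

/* Claim a specific legacy irq. Returns the irq, or -1 if it is
 * outside the legacy range or already taken. */
int model_claim_legacy_irq(int irq)
{
	if (irq < 0 || irq >= MODEL_NR_LEGACY_IRQS || model_irq_used[irq])
		return -1;
	model_irq_used[irq] = true;
	return irq;
}

/* Hand out the first free irq above the legacy range, so the result
 * is generally unrelated to the gsi it will end up bound to.
 * Returns -1 when the table is exhausted. */
int model_alloc_dynamic_irq(void)
{
	int irq;

	for (irq = MODEL_NR_LEGACY_IRQS; irq < MODEL_NR_IRQS; irq++) {
		if (!model_irq_used[irq]) {
			model_irq_used[irq] = true;
			return irq;
		}
	}
	return -1;
}
```

In this model a device on, say, gsi 40 would simply receive the next free number (16 on a fresh table), which is the "not 1:1 with the gsi" behaviour mentioned above.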
Jeremy Fitzhardinge
2008-Nov-13 19:09 UTC
[Xen-devel] [PATCH 01 of 38] xen: various whitespace and other formatting cleanups
From: Tej <bewith.tej@gmail.com> Signed-off-by: Tej <bewith.tej@gmail.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/xen/enlighten.c | 18 ++++++++++-------- arch/x86/xen/mmu.c | 17 ++++++++++------- arch/x86/xen/multicalls.c | 2 +- arch/x86/xen/setup.c | 9 +++++---- 4 files changed, 26 insertions(+), 20 deletions(-) diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -793,7 +793,7 @@ ret = 0; - switch(msr) { + switch (msr) { #ifdef CONFIG_X86_64 unsigned which; u64 base; @@ -1453,7 +1453,7 @@ ident_pte = 0; pfn = 0; - for(pmdidx = 0; pmdidx < PTRS_PER_PMD && pfn < max_pfn; pmdidx++) { + for (pmdidx = 0; pmdidx < PTRS_PER_PMD && pfn < max_pfn; pmdidx++) { pte_t *pte_page; /* Reuse or allocate a page of ptes */ @@ -1471,7 +1471,7 @@ } /* Install mappings */ - for(pteidx = 0; pteidx < PTRS_PER_PTE; pteidx++, pfn++) { + for (pteidx = 0; pteidx < PTRS_PER_PTE; pteidx++, pfn++) { pte_t pte; if (pfn > max_pfn_mapped) @@ -1485,7 +1485,7 @@ } } - for(pteidx = 0; pteidx < ident_pte; pteidx += PTRS_PER_PTE) + for (pteidx = 0; pteidx < ident_pte; pteidx += PTRS_PER_PTE) set_page_prot(&level1_ident_pgt[pteidx], PAGE_KERNEL_RO); set_page_prot(pmd, PAGE_KERNEL_RO); @@ -1499,7 +1499,7 @@ /* All levels are converted the same way, so just treat them as ptes. */ - for(i = 0; i < PTRS_PER_PTE; i++) + for (i = 0; i < PTRS_PER_PTE; i++) pte[i] = xen_make_pte(pte[i].pte); } @@ -1514,7 +1514,8 @@ * of the physical mapping once some sort of allocator has been set * up. 
*/ -static __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn) +static __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, + unsigned long max_pfn) { pud_t *l3; pmd_t *l2; @@ -1577,7 +1578,8 @@ #else /* !CONFIG_X86_64 */ static pmd_t level2_kernel_pgt[PTRS_PER_PMD] __page_aligned_bss; -static __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn) +static __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, + unsigned long max_pfn) { pmd_t *kernel_pmd; diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -154,13 +154,13 @@ { unsigned pfn, idx; - for(pfn = 0; pfn < MAX_DOMAIN_PAGES; pfn += P2M_ENTRIES_PER_PAGE) { + for (pfn = 0; pfn < MAX_DOMAIN_PAGES; pfn += P2M_ENTRIES_PER_PAGE) { unsigned topidx = p2m_top_index(pfn); p2m_top_mfn[topidx] = virt_to_mfn(p2m_top[topidx]); } - for(idx = 0; idx < ARRAY_SIZE(p2m_top_mfn_list); idx++) { + for (idx = 0; idx < ARRAY_SIZE(p2m_top_mfn_list); idx++) { unsigned topidx = idx * P2M_ENTRIES_PER_PAGE; p2m_top_mfn_list[idx] = virt_to_mfn(&p2m_top_mfn[topidx]); } @@ -179,7 +179,7 @@ unsigned long max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages); unsigned pfn; - for(pfn = 0; pfn < max_pfn; pfn += P2M_ENTRIES_PER_PAGE) { + for (pfn = 0; pfn < max_pfn; pfn += P2M_ENTRIES_PER_PAGE) { unsigned topidx = p2m_top_index(pfn); p2m_top[topidx] = &mfn_list[pfn]; @@ -207,7 +207,7 @@ p = (void *)__get_free_page(GFP_KERNEL | __GFP_NOFAIL); BUG_ON(p == NULL); - for(i = 0; i < P2M_ENTRIES_PER_PAGE; i++) + for (i = 0; i < P2M_ENTRIES_PER_PAGE; i++) p[i] = INVALID_P2M_ENTRY; if (cmpxchg(pp, p2m_missing, p) != p2m_missing) @@ -407,7 +407,8 @@ preempt_enable(); } -pte_t xen_ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr, pte_t *ptep) +pte_t xen_ptep_modify_prot_start(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) { /* Just return the pte as-is. 
We preserve the bits on commit */ return *ptep; @@ -871,7 +872,8 @@ if (user_pgd) { xen_pin_page(mm, virt_to_page(user_pgd), PT_PGD); - xen_do_pin(MMUEXT_PIN_L4_TABLE, PFN_DOWN(__pa(user_pgd))); + xen_do_pin(MMUEXT_PIN_L4_TABLE, + PFN_DOWN(__pa(user_pgd))); } } #else /* CONFIG_X86_32 */ @@ -986,7 +988,8 @@ pgd_t *user_pgd = xen_get_user_pgd(pgd); if (user_pgd) { - xen_do_pin(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(user_pgd))); + xen_do_pin(MMUEXT_UNPIN_TABLE, + PFN_DOWN(__pa(user_pgd))); xen_unpin_page(mm, virt_to_page(user_pgd), PT_PGD); } } diff --git a/arch/x86/xen/multicalls.c b/arch/x86/xen/multicalls.c --- a/arch/x86/xen/multicalls.c +++ b/arch/x86/xen/multicalls.c @@ -154,7 +154,7 @@ ret, smp_processor_id()); dump_stack(); for (i = 0; i < b->mcidx; i++) { - printk(" call %2d/%d: op=%lu arg=[%lx] result=%ld\n", + printk(KERN_DEBUG " call %2d/%d: op=%lu arg=[%lx] result=%ld\n", i+1, b->mcidx, b->debug[i].op, b->debug[i].args[0], diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -28,6 +28,9 @@ /* These are code, but not functions. 
Defined in entry.S */ extern const char xen_hypervisor_callback[]; extern const char xen_failsafe_callback[]; +extern void xen_sysenter_target(void); +extern void xen_syscall_target(void); +extern void xen_syscall32_target(void); /** @@ -110,7 +113,6 @@ void __cpuinit xen_enable_sysenter(void) { - extern void xen_sysenter_target(void); int ret; unsigned sysenter_feature; @@ -132,8 +134,6 @@ { #ifdef CONFIG_X86_64 int ret; - extern void xen_syscall_target(void); - extern void xen_syscall32_target(void); ret = register_callback(CALLBACKTYPE_syscall, xen_syscall_target); if (ret != 0) { @@ -160,7 +160,8 @@ HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_writable_pagetables); if (!xen_feature(XENFEAT_auto_translated_physmap)) - HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_pae_extended_cr3); + HYPERVISOR_vm_assist(VMASST_CMD_enable, + VMASST_TYPE_pae_extended_cr3); if (register_callback(CALLBACKTYPE_event, xen_hypervisor_callback) || register_callback(CALLBACKTYPE_failsafe, xen_failsafe_callback)) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 02 of 38] x86: remove unused iommu_nr_pages
The last usage was removed by the patch set culminating in

commit e3c449f526cebb8d287241c7e82faafd9709668b
Author: Joerg Roedel <joerg.roedel@amd.com>
Date:   Wed Oct 15 22:02:11 2008 -0700

    x86, AMD IOMMU: convert driver to generic iommu_num_pages function

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
diff -r 9b89e3b4ca90 arch/x86/include/asm/iommu.h
---
 arch/x86/include/asm/iommu.h |    2 --
 arch/x86/kernel/pci-dma.c    |    7 -------
 2 files changed, 9 deletions(-)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -8,8 +8,6 @@
 extern int iommu_detected;
 extern int dmar_disabled;
 
-extern unsigned long iommu_nr_pages(unsigned long addr, unsigned long len);
-
 /* 10 seconds */
 #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
 
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -125,13 +125,6 @@
 	pci_swiotlb_init();
 }
 
-unsigned long iommu_nr_pages(unsigned long addr, unsigned long len)
-{
-	unsigned long size = roundup((addr & ~PAGE_MASK) + len, PAGE_SIZE);
-
-	return size >> PAGE_SHIFT;
-}
-EXPORT_SYMBOL(iommu_nr_pages);
 #endif
 
 void *dma_generic_alloc_coherent(struct device *dev, size_t size,
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 03 of 38] swiotlb: allow architectures to override swiotlb pool allocation
Architectures may need to allocate memory specially for use with
the swiotlb. Create the weak functions swiotlb_alloc_boot() and
swiotlb_alloc() defaulting to the current behaviour.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 include/linux/swiotlb.h |    3 +++
 lib/swiotlb.c           |   16 +++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -10,6 +10,9 @@
 extern void
 swiotlb_init(void);
 
+extern void *swiotlb_alloc_boot(size_t bytes, unsigned long nslabs);
+extern void *swiotlb_alloc(unsigned order, unsigned long nslabs);
+
 extern void
 *swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 			dma_addr_t *dma_handle, gfp_t flags);
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -21,6 +21,7 @@
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/spinlock.h>
+#include <linux/swiotlb.h>
 #include <linux/string.h>
 #include <linux/types.h>
 #include <linux/ctype.h>
@@ -126,6 +127,16 @@
 __setup("swiotlb=", setup_io_tlb_npages);
 /* make io_tlb_overflow tunable too? */
 
+void * __weak swiotlb_alloc_boot(size_t size, unsigned long nslabs)
+{
+	return alloc_bootmem_low_pages(size);
+}
+
+void * __weak swiotlb_alloc(unsigned order, unsigned long nslabs)
+{
+	return (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN, order);
+}
+
 /*
  * Statically reserve bounce buffer space and initialize bounce buffer data
  * structures for the software IO TLB used to implement the DMA API.
@@ -145,7 +156,7 @@
 	/*
 	 * Get IO TLB memory from the low pages
 	 */
-	io_tlb_start = alloc_bootmem_low_pages(bytes);
+	io_tlb_start = swiotlb_alloc_boot(bytes, io_tlb_nslabs);
 	if (!io_tlb_start)
 		panic("Cannot allocate SWIOTLB buffer");
 	io_tlb_end = io_tlb_start + bytes;
@@ -202,8 +213,7 @@
 	bytes = io_tlb_nslabs << IO_TLB_SHIFT;
 
 	while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
-		io_tlb_start = (char *)__get_free_pages(GFP_DMA | __GFP_NOWARN,
-							order);
+		io_tlb_start = swiotlb_alloc(order, io_tlb_nslabs);
 		if (io_tlb_start)
 			break;
 		order--;
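For readers unfamiliar with weak symbols, the override mechanism this patch relies on can be sketched in userspace as follows. This is an illustration only, assuming GCC or Clang (the kernel's __weak expands to the same attribute); model_weak, model_pool_alloc and model_pool_init are made-up names, and malloc() merely stands in for the bootmem/page allocators.

```c
#include <stdlib.h>

/* Stand-in for the kernel's __weak annotation (GCC/Clang attribute). */
#define model_weak __attribute__((weak))

/*
 * Default allocation hook. Because it is weak, a non-weak definition
 * of model_pool_alloc() in some other object file silently replaces
 * this one at link time; no registration call is needed.
 */
void * model_weak model_pool_alloc(size_t bytes)
{
	return malloc(bytes);
}

/*
 * Generic code just calls the hook. Which definition runs is decided
 * by the linker, not by this function.
 */
void *model_pool_init(size_t bytes)
{
	return model_pool_alloc(bytes);
}
```

An architecture wanting special memory would add its own plain (strong) `void *model_pool_alloc(size_t bytes)` in a separate file; with no such file linked in, the default above is used, which matches the "defaulting to the current behaviour" wording of the patch description.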
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 04 of 38] swiotlb: move some definitions to header
From: Ian Campbell <ian.campbell@citrix.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/linux/swiotlb.h |   14 ++++++++++++++
 lib/swiotlb.c           |   14 +-------------
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -7,6 +7,20 @@
 struct dma_attrs;
 struct scatterlist;
 
+/*
+ * Maximum allowable number of contiguous slabs to map,
+ * must be a power of 2.  What is the appropriate value ?
+ * The complexity of {map,unmap}_single is linearly dependent on this value.
+ */
+#define IO_TLB_SEGSIZE	128
+
+
+/*
+ * log of the size of each IO TLB slab.  The number of slabs is command line
+ * controllable.
+ */
+#define IO_TLB_SHIFT 11
+
 extern void
 swiotlb_init(void);
 
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -23,6 +23,7 @@
 #include <linux/spinlock.h>
 #include <linux/swiotlb.h>
 #include <linux/string.h>
+#include <linux/swiotlb.h>
 #include <linux/types.h>
 #include <linux/ctype.h>
 
@@ -40,19 +41,6 @@
 #define SG_ENT_VIRT_ADDRESS(sg)	(sg_virt((sg)))
 #define SG_ENT_PHYS_ADDRESS(sg)	virt_to_bus(SG_ENT_VIRT_ADDRESS(sg))
 
-/*
- * Maximum allowable number of contiguous slabs to map,
- * must be a power of 2.  What is the appropriate value ?
- * The complexity of {map,unmap}_single is linearly dependent on this value.
- */
-#define IO_TLB_SEGSIZE	128
-
-/*
- * log of the size of each IO TLB slab.  The number of slabs is command line
- * controllable.
- */
-#define IO_TLB_SHIFT 11
-
 #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
 
 /*
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 05 of 38] swiotlb: allow architectures to override virt<->bus<->virt conversions
From: Ian Campbell <ian.campbell@citrix.com> Architectures may need to override these conversions. Implement a __weak hook point containing the default implementation. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- include/linux/swiotlb.h | 3 +++ lib/swiotlb.c | 42 ++++++++++++++++++++++++++---------------- 2 files changed, 29 insertions(+), 16 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -27,6 +27,9 @@ extern void *swiotlb_alloc_boot(size_t bytes, unsigned long nslabs); extern void *swiotlb_alloc(unsigned order, unsigned long nslabs); +extern unsigned long __weak swiotlb_virt_to_bus(volatile void *address); +extern void * __weak swiotlb_bus_to_virt(unsigned long address); + extern void *swiotlb_alloc_coherent(struct device *hwdev, size_t size, dma_addr_t *dma_handle, gfp_t flags); diff --git a/lib/swiotlb.c b/lib/swiotlb.c --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -125,6 +125,16 @@ return (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN, order); } +unsigned long __weak swiotlb_virt_to_bus(volatile void *address) +{ + return virt_to_bus(address); +} + +void * __weak swiotlb_bus_to_virt(unsigned long address) +{ + return bus_to_virt(address); +} + /* * Statically reserve bounce buffer space and initialize bounce buffer data * structures for the software IO TLB used to implement the DMA API. 
@@ -168,7 +178,7 @@ panic("Cannot allocate SWIOTLB overflow buffer!\n"); printk(KERN_INFO "Placing software IO TLB between 0x%lx - 0x%lx\n", - virt_to_bus(io_tlb_start), virt_to_bus(io_tlb_end)); + swiotlb_virt_to_bus(io_tlb_start), swiotlb_virt_to_bus(io_tlb_end)); } void __init @@ -250,7 +260,7 @@ printk(KERN_INFO "Placing %luMB software IO TLB between 0x%lx - " "0x%lx\n", bytes >> 20, - virt_to_bus(io_tlb_start), virt_to_bus(io_tlb_end)); + swiotlb_virt_to_bus(io_tlb_start), swiotlb_virt_to_bus(io_tlb_end)); return 0; @@ -298,7 +308,7 @@ unsigned long max_slots; mask = dma_get_seg_boundary(hwdev); - start_dma_addr = virt_to_bus(io_tlb_start) & mask; + start_dma_addr = swiotlb_virt_to_bus(io_tlb_start) & mask; offset_slots = ALIGN(start_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; max_slots = mask + 1 @@ -467,7 +477,7 @@ int order = get_order(size); ret = (void *)__get_free_pages(flags, order); - if (ret && address_needs_mapping(hwdev, virt_to_bus(ret), size)) { + if (ret && address_needs_mapping(hwdev, swiotlb_virt_to_bus(ret), size)) { /* * The allocated memory isn''t reachable by the device. * Fall back on swiotlb_map_single(). 
@@ -488,7 +498,7 @@ } memset(ret, 0, size); - dev_addr = virt_to_bus(ret); + dev_addr = swiotlb_virt_to_bus(ret); /* Confirm address can be DMA''d by device */ if (address_needs_mapping(hwdev, dev_addr, size)) { @@ -548,7 +558,7 @@ swiotlb_map_single_attrs(struct device *hwdev, void *ptr, size_t size, int dir, struct dma_attrs *attrs) { - dma_addr_t dev_addr = virt_to_bus(ptr); + dma_addr_t dev_addr = swiotlb_virt_to_bus(ptr); void *map; BUG_ON(dir == DMA_NONE); @@ -569,7 +579,7 @@ map = io_tlb_overflow_buffer; } - dev_addr = virt_to_bus(map); + dev_addr = swiotlb_virt_to_bus(map); /* * Ensure that the address returned is DMA''ble @@ -599,7 +609,7 @@ swiotlb_unmap_single_attrs(struct device *hwdev, dma_addr_t dev_addr, size_t size, int dir, struct dma_attrs *attrs) { - char *dma_addr = bus_to_virt(dev_addr); + char *dma_addr = swiotlb_bus_to_virt(dev_addr); BUG_ON(dir == DMA_NONE); if (is_swiotlb_buffer(dma_addr)) @@ -629,7 +639,7 @@ swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr, size_t size, int dir, int target) { - char *dma_addr = bus_to_virt(dev_addr); + char *dma_addr = swiotlb_bus_to_virt(dev_addr); BUG_ON(dir == DMA_NONE); if (is_swiotlb_buffer(dma_addr)) @@ -660,7 +670,7 @@ unsigned long offset, size_t size, int dir, int target) { - char *dma_addr = bus_to_virt(dev_addr) + offset; + char *dma_addr = swiotlb_bus_to_virt(dev_addr) + offset; BUG_ON(dir == DMA_NONE); if (is_swiotlb_buffer(dma_addr)) @@ -716,7 +726,7 @@ for_each_sg(sgl, sg, nelems, i) { addr = SG_ENT_VIRT_ADDRESS(sg); - dev_addr = virt_to_bus(addr); + dev_addr = swiotlb_virt_to_bus(addr); if (swiotlb_force || address_needs_mapping(hwdev, dev_addr, sg->length)) { void *map = map_single(hwdev, addr, sg->length, dir); @@ -729,7 +739,7 @@ sgl[0].dma_length = 0; return 0; } - sg->dma_address = virt_to_bus(map); + sg->dma_address = swiotlb_virt_to_bus(map); } else sg->dma_address = dev_addr; sg->dma_length = sg->length; @@ -760,7 +770,7 @@ for_each_sg(sgl, sg, nelems, i) { if 
(sg->dma_address != SG_ENT_PHYS_ADDRESS(sg)) - unmap_single(hwdev, bus_to_virt(sg->dma_address), + unmap_single(hwdev, swiotlb_bus_to_virt(sg->dma_address), sg->dma_length, dir); else if (dir == DMA_FROM_DEVICE) dma_mark_clean(SG_ENT_VIRT_ADDRESS(sg), sg->dma_length); @@ -793,7 +803,7 @@ for_each_sg(sgl, sg, nelems, i) { if (sg->dma_address != SG_ENT_PHYS_ADDRESS(sg)) - sync_single(hwdev, bus_to_virt(sg->dma_address), + sync_single(hwdev, swiotlb_bus_to_virt(sg->dma_address), sg->dma_length, dir, target); else if (dir == DMA_FROM_DEVICE) dma_mark_clean(SG_ENT_VIRT_ADDRESS(sg), sg->dma_length); @@ -817,7 +827,7 @@ int swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr) { - return (dma_addr == virt_to_bus(io_tlb_overflow_buffer)); + return (dma_addr == swiotlb_virt_to_bus(io_tlb_overflow_buffer)); } /* @@ -829,7 +839,7 @@ int swiotlb_dma_supported(struct device *hwdev, u64 mask) { - return virt_to_bus(io_tlb_end - 1) <= mask; + return swiotlb_virt_to_bus(io_tlb_end - 1) <= mask; } EXPORT_SYMBOL(swiotlb_map_single); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
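To show why a paravirtualized guest might want to override these conversions, here is a toy model of the pseudo-physical to machine translation mentioned in the cover letter. Everything in it is an assumption made for illustration: the table contents, the 4K page size, and the names (model_p2m, model_native_phys_to_bus, model_xen_phys_to_bus) are invented, and it is not the Xen implementation of the hooks.

```c
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_MASK  ((1UL << MODEL_PAGE_SHIFT) - 1)
#define MODEL_NR_FRAMES  4

/* Hypothetical p2m table: index = pseudo-physical frame number,
 * value = machine frame number backing it. */
static const unsigned long model_p2m[MODEL_NR_FRAMES] = { 7, 3, 12, 5 };

/* Native-style conversion: the bus address is the physical address. */
unsigned long model_native_phys_to_bus(unsigned long phys)
{
	return phys;
}

/* Guest-style conversion: swap the frame number through the p2m
 * table and keep the offset within the page. Returns ~0UL for a
 * frame outside the model's table. */
unsigned long model_xen_phys_to_bus(unsigned long phys)
{
	unsigned long pfn = phys >> MODEL_PAGE_SHIFT;

	if (pfn >= MODEL_NR_FRAMES)
		return ~0UL;
	return (model_p2m[pfn] << MODEL_PAGE_SHIFT) | (phys & MODEL_PAGE_MASK);
}
```

Note that frames 0 and 1 are adjacent in the guest's view but land on machine frames 7 and 3 in this table, which is the kind of discontiguity the swiotlb allocation hooks from patch 03 are meant to let an architecture avoid.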
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 06 of 38] xen: clean up asm-x86/xen/hypervisor.h
hypervisor.h had accumulated a lot of crud, including lots of spurious #includes. Clean it all up, and go around fixing up everything else accordingly. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/include/asm/xen/hypercall.h | 6 +++++ arch/x86/include/asm/xen/hypervisor.h | 39 ++++++--------------------------- arch/x86/include/asm/xen/page.h | 5 ++++ arch/x86/xen/enlighten.c | 1 drivers/xen/balloon.c | 4 ++- drivers/xen/features.c | 6 ++++- drivers/xen/grant-table.c | 1 include/xen/interface/event_channel.h | 2 + 8 files changed, 31 insertions(+), 33 deletions(-) diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h --- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -33,8 +33,14 @@ #ifndef _ASM_X86_XEN_HYPERCALL_H #define _ASM_X86_XEN_HYPERCALL_H +#include <linux/kernel.h> +#include <linux/spinlock.h> #include <linux/errno.h> #include <linux/string.h> +#include <linux/types.h> + +#include <asm/page.h> +#include <asm/pgtable.h> #include <xen/interface/xen.h> #include <xen/interface/sched.h> diff --git a/arch/x86/include/asm/xen/hypervisor.h b/arch/x86/include/asm/xen/hypervisor.h --- a/arch/x86/include/asm/xen/hypervisor.h +++ b/arch/x86/include/asm/xen/hypervisor.h @@ -33,39 +33,10 @@ #ifndef _ASM_X86_XEN_HYPERVISOR_H #define _ASM_X86_XEN_HYPERVISOR_H -#include <linux/types.h> -#include <linux/kernel.h> - -#include <xen/interface/xen.h> -#include <xen/interface/version.h> - -#include <asm/ptrace.h> -#include <asm/page.h> -#include <asm/desc.h> -#if defined(__i386__) -# ifdef CONFIG_X86_PAE -# include <asm-generic/pgtable-nopud.h> -# else -# include <asm-generic/pgtable-nopmd.h> -# endif -#endif -#include <asm/xen/hypercall.h> - /* arch/i386/kernel/setup.c */ extern struct shared_info *HYPERVISOR_shared_info; extern struct start_info *xen_start_info; -/* arch/i386/mach-xen/evtchn.c */ -/* Force a proper event-channel callback from Xen. 
*/ -extern void force_evtchn_callback(void); - -/* Turn jiffies into Xen system time. */ -u64 jiffies_to_st(unsigned long jiffies); - - -#define MULTI_UVMFLAGS_INDEX 3 -#define MULTI_UVMDOMID_INDEX 4 - enum xen_domain_type { XEN_NATIVE, XEN_PV_DOMAIN, @@ -74,9 +45,15 @@ extern enum xen_domain_type xen_domain_type; +#ifdef CONFIG_XEN #define xen_domain() (xen_domain_type != XEN_NATIVE) -#define xen_pv_domain() (xen_domain_type == XEN_PV_DOMAIN) +#else +#define xen_domain() (0) +#endif + +#define xen_pv_domain() (xen_domain() && xen_domain_type == XEN_PV_DOMAIN) +#define xen_hvm_domain() (xen_domain() && xen_domain_type == XEN_HVM_DOMAIN) + #define xen_initial_domain() (xen_pv_domain() && xen_start_info->flags & SIF_INITDOMAIN) -#define xen_hvm_domain() (xen_domain_type == XEN_HVM_DOMAIN) #endif /* _ASM_X86_XEN_HYPERVISOR_H */ diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -1,11 +1,16 @@ #ifndef _ASM_X86_XEN_PAGE_H #define _ASM_X86_XEN_PAGE_H +#include <linux/kernel.h> +#include <linux/types.h> +#include <linux/spinlock.h> #include <linux/pfn.h> #include <asm/uaccess.h> +#include <asm/page.h> #include <asm/pgtable.h> +#include <xen/interface/xen.h> #include <xen/features.h> /* Xen machine address */ diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -28,6 +28,7 @@ #include <linux/console.h> #include <xen/interface/xen.h> +#include <xen/interface/version.h> #include <xen/interface/physdev.h> #include <xen/interface/vcpu.h> #include <xen/features.h> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -44,13 +44,15 @@ #include <linux/list.h> #include <linux/sysdev.h> -#include <asm/xen/hypervisor.h> #include <asm/page.h> #include <asm/pgalloc.h> #include <asm/pgtable.h> #include <asm/uaccess.h> #include <asm/tlb.h> 
+#include <asm/xen/hypervisor.h> +#include <asm/xen/hypercall.h> +#include <xen/interface/xen.h> #include <xen/interface/memory.h> #include <xen/xenbus.h> #include <xen/features.h> diff --git a/drivers/xen/features.c b/drivers/xen/features.c --- a/drivers/xen/features.c +++ b/drivers/xen/features.c @@ -8,7 +8,11 @@ #include <linux/types.h> #include <linux/cache.h> #include <linux/module.h> -#include <asm/xen/hypervisor.h> + +#include <asm/xen/hypercall.h> + +#include <xen/interface/xen.h> +#include <xen/interface/version.h> #include <xen/features.h> u8 xen_features[XENFEAT_NR_SUBMAPS * 32] __read_mostly; diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -40,6 +40,7 @@ #include <xen/interface/xen.h> #include <xen/page.h> #include <xen/grant_table.h> +#include <asm/xen/hypercall.h> #include <asm/pgtable.h> #include <asm/sync_bitops.h> diff --git a/include/xen/interface/event_channel.h b/include/xen/interface/event_channel.h --- a/include/xen/interface/event_channel.h +++ b/include/xen/interface/event_channel.h @@ -9,6 +9,8 @@ #ifndef __XEN_PUBLIC_EVENT_CHANNEL_H__ #define __XEN_PUBLIC_EVENT_CHANNEL_H__ +#include <xen/interface/xen.h> + typedef uint32_t evtchn_port_t; DEFINE_GUEST_HANDLE(evtchn_port_t); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 07 of 38] x86: add io_apic_ops
Xen dom0 needs to paravirtualize IO operations to the IO APIC, so add
an io_apic_ops for it to intercept. Do this as an ops structure because
there's at least some chance that another paravirtualized environment
may want to intercept these.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/io_apic.h |    9 +++++++
 arch/x86/kernel/io_apic.c      |   50 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -21,6 +21,15 @@
 #define IO_APIC_REDIR_LEVEL_TRIGGER	(1 << 15)
 #define IO_APIC_REDIR_MASKED		(1 << 16)
 
+struct io_apic_ops {
+	void (*init)(void);
+	unsigned int (*read)(unsigned int apic, unsigned int reg);
+	void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
+	void (*modify)(unsigned int apic, unsigned int reg, unsigned int value);
+};
+
+void __init set_io_apic_ops(const struct io_apic_ops *);
+
 /*
  * The structure of the IO-APIC:
  */
diff --git a/arch/x86/kernel/io_apic.c b/arch/x86/kernel/io_apic.c
--- a/arch/x86/kernel/io_apic.c
+++ b/arch/x86/kernel/io_apic.c
@@ -67,6 +67,25 @@
 
 #define __apicdebuginit(type) static type __init
 
+static void __init __ioapic_init_mappings(void);
+static unsigned int __io_apic_read(unsigned int apic, unsigned int reg);
+static void __io_apic_write(unsigned int apic, unsigned int reg,
+			    unsigned int val);
+static void __io_apic_modify(unsigned int apic, unsigned int reg,
+			     unsigned int val);
+
+static struct io_apic_ops io_apic_ops = {
+	.init = __ioapic_init_mappings,
+	.read = __io_apic_read,
+	.write = __io_apic_write,
+	.modify = __io_apic_modify,
+};
+
+void __init set_io_apic_ops(const struct io_apic_ops *ops)
+{
+	io_apic_ops = *ops;
+}
+
 /*
  * Is the SiS APIC rmw bug present ?
  * -1 = don't know, 0 = no, 1 = yes
@@ -196,6 +215,24 @@
 	return pin;
 }
 
+static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+{
+	return io_apic_ops.read(apic, reg);
+}
+
+static inline void io_apic_write(unsigned int apic, unsigned int reg,
+				 unsigned int value)
+{
+	io_apic_ops.write(apic, reg, value);
+}
+
+static inline void io_apic_modify(unsigned int apic, unsigned int reg,
+				  unsigned int value)
+{
+	io_apic_ops.modify(apic, reg, value);
+}
+
+
 struct io_apic {
 	unsigned int index;
 	unsigned int unused[3];
@@ -208,14 +245,15 @@
 		+ (mp_ioapics[idx].mp_apicaddr & ~PAGE_MASK);
 }
 
-static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+static unsigned int __io_apic_read(unsigned int apic, unsigned int reg)
 {
 	struct io_apic __iomem *io_apic = io_apic_base(apic);
 	writel(reg, &io_apic->index);
 	return readl(&io_apic->data);
 }
 
-static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
+static void __io_apic_write(unsigned int apic, unsigned int reg,
+			    unsigned int value)
 {
 	struct io_apic __iomem *io_apic = io_apic_base(apic);
 	writel(reg, &io_apic->index);
@@ -228,7 +266,8 @@
  *
  * Older SiS APIC requires we rewrite the index register
  */
-static inline void io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
+static void __io_apic_modify(unsigned int apic, unsigned int reg,
+			     unsigned int value)
 {
 	struct io_apic __iomem *io_apic = io_apic_base(apic);
 
@@ -3848,6 +3887,11 @@
 
 void __init ioapic_init_mappings(void)
 {
+	io_apic_ops.init();
+}
+
+static void __init __ioapic_init_mappings(void)
+{
 	unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
 	struct resource *ioapic_res;
 	int i;
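The ops-structure pattern used by this patch can be tried out in userspace with a small model like the one below. It is a sketch under stated assumptions, not the kernel code: the register file is a plain array instead of MMIO, the "pv" backend only counts writes instead of issuing hypercalls, and all model_* names are invented for the example.

```c
/* Table of operations, mirroring the read/write part of io_apic_ops. */
struct model_io_apic_ops {
	unsigned int (*read)(unsigned int apic, unsigned int reg);
	void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
};

/* "Native" backend: a fake register file standing in for MMIO. */
static unsigned int model_native_regs[2][4];

static unsigned int model_native_read(unsigned int apic, unsigned int reg)
{
	return model_native_regs[apic][reg];
}

static void model_native_write(unsigned int apic, unsigned int reg,
			       unsigned int value)
{
	model_native_regs[apic][reg] = value;
}

/* "Paravirtual" backend: never touches the register file. A real one
 * would ask the hypervisor; this one only counts intercepted writes. */
static unsigned int model_pv_write_count;

static unsigned int model_pv_read(unsigned int apic, unsigned int reg)
{
	(void)apic;
	(void)reg;
	return 0;
}

static void model_pv_write(unsigned int apic, unsigned int reg,
			   unsigned int value)
{
	(void)apic;
	(void)reg;
	(void)value;
	model_pv_write_count++;
}

static const struct model_io_apic_ops model_pv_ops = {
	.read = model_pv_read,
	.write = model_pv_write,
};

/* Active table, initialised to the native backend. */
static struct model_io_apic_ops model_ops = {
	.read = model_native_read,
	.write = model_native_write,
};

/* Replace the whole table by value, as set_io_apic_ops() does. */
void model_set_io_apic_ops(const struct model_io_apic_ops *ops)
{
	model_ops = *ops;
}

/* Callers keep using these wrappers and never see which backend runs. */
static inline unsigned int model_io_apic_read(unsigned int apic,
					      unsigned int reg)
{
	return model_ops.read(apic, reg);
}

static inline void model_io_apic_write(unsigned int apic, unsigned int reg,
				       unsigned int value)
{
	model_ops.write(apic, reg, value);
}
```

Copying the structure by value keeps the table in the generic code's own storage, so the caller's copy can be const (or discarded after init) without affecting later calls.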
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 08 of 38] x86: make apic_* operations inline functions
Mainly to get proper type-checking and consistency.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/apic.h |   35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -125,12 +125,35 @@
 
 extern struct apic_ops *apic_ops;
 
-#define apic_read (apic_ops->read)
-#define apic_write (apic_ops->write)
-#define apic_icr_read (apic_ops->icr_read)
-#define apic_icr_write (apic_ops->icr_write)
-#define apic_wait_icr_idle (apic_ops->wait_icr_idle)
-#define safe_apic_wait_icr_idle (apic_ops->safe_wait_icr_idle)
+static inline u32 apic_read(u32 reg)
+{
+	return apic_ops->read(reg);
+}
+
+static inline void apic_write(u32 reg, u32 val)
+{
+	apic_ops->write(reg, val);
+}
+
+static inline u64 apic_icr_read(void)
+{
+	return apic_ops->icr_read();
+}
+
+static inline void apic_icr_write(u32 low, u32 high)
+{
+	apic_ops->icr_write(low, high);
+}
+
+static inline void apic_wait_icr_idle(void)
+{
+	apic_ops->wait_icr_idle();
+}
+
+static inline u32 safe_apic_wait_icr_idle(void)
+{
+	return apic_ops->safe_wait_icr_idle();
+}
 
 extern int get_physical_broadcast(void);
 
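One way to see what the macro form lets through is the userspace model below. It is only an illustration with invented names (model_apic_ops, model_apic_read_macro, model_misuse_macro); the point is that a macro aliasing the function pointer is an assignable expression, while an inline wrapper is an ordinary function with a fixed prototype.

```c
#include <stdint.h>

struct model_apic_ops {
	uint32_t (*read)(uint32_t reg);
};

static uint32_t model_native_read(uint32_t reg)
{
	return reg + 1;		/* arbitrary value for the model */
}

static uint32_t model_other_read(uint32_t reg)
{
	(void)reg;
	return 0;
}

static struct model_apic_ops model_ops_storage = { .read = model_native_read };
static struct model_apic_ops *model_apic_ops = &model_ops_storage;

/* Old style: the name is just an alias for the function pointer. */
#define model_apic_read_macro (model_apic_ops->read)

/* New style: a real function, so its argument and return types are
 * part of a prototype the compiler checks at every call site. */
static inline uint32_t model_apic_read(uint32_t reg)
{
	return model_apic_ops->read(reg);
}

/*
 * This compiles, because the macro expands to an lvalue: the "call
 * name" can be assigned to, silently replacing the hook for everyone.
 * The equivalent statement with the inline function,
 *     model_apic_read = model_other_read;
 * is rejected by the compiler.
 */
static void model_misuse_macro(void)
{
	model_apic_read_macro = model_other_read;
}
```

Both forms behave identically for a plain call such as `model_apic_read(4)`; the difference only shows up for misuse of the kind sketched in model_misuse_macro().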
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 09 of 38] x86: make sure we really have an hpet mapping before using it
When booting in Xen dom0, the hpet isn't really accessible, so make sure the mapping is non-NULL before use.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/kernel/hpet.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -834,10 +834,11 @@
 
 		hpet_address = force_hpet_address;
 		hpet_enable();
-		if (!hpet_virt_address)
-			return -ENODEV;
 	}
 
+	if (!hpet_virt_address)
+		return -ENODEV;
+
 	hpet_reserve_platform_timers(hpet_readl(HPET_ID));
 
 	for_each_online_cpu(cpu) {
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 10 of 38] x86: add handle_irq() to allow interrupt injection
Xen uses a different interrupt path, so introduce handle_irq() to allow interrupts to be inserted into the normal interrupt path. This is handled slightly differently on 32 and 64-bit.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/irq.h |    1 +
 arch/x86/kernel/irq_32.c   |   34 +++++++++++++++++++++-------------
 arch/x86/kernel/irq_64.c   |   27 ++++++++++++++++++---------
 3 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -39,6 +39,7 @@
 extern unsigned int do_IRQ(struct pt_regs *regs);
 extern void init_IRQ(void);
 extern void native_init_IRQ(void);
+extern bool handle_irq(unsigned irq, struct pt_regs *regs);
 
 /* Interrupt vector management */
 extern DECLARE_BITMAP(used_vectors, NR_VECTORS);
diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -191,6 +191,26 @@
 execute_on_irq_stack(int overflow, struct irq_desc *desc, int irq) { return 0; }
 #endif
 
+bool handle_irq(unsigned irq, struct pt_regs *regs)
+{
+	struct irq_desc *desc;
+	int overflow;
+
+	overflow = check_stack_overflow();
+
+	desc = irq_to_desc(irq);
+	if (unlikely(!desc))
+		return false;
+
+	if (!execute_on_irq_stack(overflow, desc, irq)) {
+		if (unlikely(overflow))
+			print_stack_overflow();
+		desc->handle_irq(irq, desc);
+	}
+
+	return true;
+}
+
 /*
  * do_IRQ handles all normal device IRQ's (the special
  * SMP cross-CPU interrupts have their own specific
@@ -200,31 +220,19 @@
 {
 	struct pt_regs *old_regs;
 	/* high bit used in ret_from_ code */
-	int overflow;
 	unsigned vector = ~regs->orig_ax;
-	struct irq_desc *desc;
 	unsigned irq;
 
-
 	old_regs = set_irq_regs(regs);
 	irq_enter();
 	irq = __get_cpu_var(vector_irq)[vector];
 
-	overflow = check_stack_overflow();
-
-	desc = irq_to_desc(irq);
-	if (unlikely(!desc)) {
+	if (!handle_irq(irq, regs)) {
 		printk(KERN_EMERG "%s: cannot handle IRQ %d vector %#x cpu %d\n",
 					__func__, irq, vector, smp_processor_id());
 		BUG();
 	}
 
-	if (!execute_on_irq_stack(overflow, desc, irq)) {
-		if (unlikely(overflow))
-			print_stack_overflow();
-		desc->handle_irq(irq, desc);
-	}
-
 	irq_exit();
 	set_irq_regs(old_regs);
 	return 1;
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -42,6 +42,22 @@
 }
 #endif
 
+bool handle_irq(unsigned irq, struct pt_regs *regs)
+{
+	struct irq_desc *desc;
+
+#ifdef CONFIG_DEBUG_STACKOVERFLOW
+	stack_overflow_check(regs);
+#endif
+
+	desc = irq_to_desc(irq);
+	if (unlikely(!desc))
+		return false;
+
+	generic_handle_irq_desc(irq, desc);
+	return true;
+}
+
 /*
  * do_IRQ handles all normal device IRQ's (the special
  * SMP cross-CPU interrupts have their own specific
@@ -50,7 +66,6 @@
 asmlinkage unsigned int do_IRQ(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
-	struct irq_desc *desc;
 
 	/* high bit used in ret_from_ code */
 	unsigned vector = ~regs->orig_ax;
@@ -58,16 +73,10 @@
 
 	exit_idle();
 	irq_enter();
+
 	irq = __get_cpu_var(vector_irq)[vector];
 
-#ifdef CONFIG_DEBUG_STACKOVERFLOW
-	stack_overflow_check(regs);
-#endif
-
-	desc = irq_to_desc(irq);
-	if (likely(desc))
-		generic_handle_irq_desc(irq, desc);
-	else {
+	if (!handle_irq(irq, regs)) {
 		if (!disable_apic)
 			ack_APIC_irq();
 
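The shape of handle_irq() (look up the descriptor, report failure with a bool, otherwise dispatch) can be modelled in user space. The sketch below is an assumption-laden toy: `demo_irq_desc`, `demo_irq_table` and `demo_upcall` are invented names, and `demo_upcall` merely stands in for some second entry point, such as an event-channel upcall, that wants to inject interrupts without going through do_IRQ().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_IRQS 8

/* Hypothetical, much-reduced stand-in for struct irq_desc. */
struct demo_irq_desc {
	void (*handle_irq)(unsigned irq, struct demo_irq_desc *desc);
	unsigned count;
};

static struct demo_irq_desc *demo_irq_table[DEMO_NR_IRQS];

/* Models irq_to_desc(): NULL when no descriptor is registered. */
static struct demo_irq_desc *demo_irq_to_desc(unsigned irq)
{
	return irq < DEMO_NR_IRQS ? demo_irq_table[irq] : NULL;
}

/* Example flow handler: just counts deliveries. */
static void demo_count_handler(unsigned irq, struct demo_irq_desc *desc)
{
	(void)irq;
	desc->count++;
}

/* Same contract as the patch: false means "no such irq". */
static bool demo_handle_irq(unsigned irq)
{
	struct demo_irq_desc *desc = demo_irq_to_desc(irq);

	if (!desc)
		return false;

	assert(desc->handle_irq != NULL);
	desc->handle_irq(irq, desc);
	return true;
}

/*
 * Hypothetical second caller: walk a pending bitmask and inject
 * each set irq through the common path. Returns how many were
 * actually handled.
 */
static unsigned demo_upcall(unsigned pending)
{
	unsigned irq, handled = 0;

	for (irq = 0; irq < DEMO_NR_IRQS; irq++) {
		if ((pending & (1u << irq)) && demo_handle_irq(irq))
			handled++;
	}
	return handled;
}
```

Returning a bool rather than calling BUG() inside the helper leaves the error policy to each caller, which is what lets the 32-bit and 64-bit do_IRQ() keep their different failure handling.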
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 11 of 38] x86: define no-op exit_idle() on 32-bit
exit_idle() is only used on 64-bit, but define a noop version on 32-bit to allow subsequent unification.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/idle.h |   10 ++++++++++
 arch/x86/kernel/apic.c      |    7 +------
 arch/x86/kernel/io_apic.c   |    2 --
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/idle.h b/arch/x86/include/asm/idle.h
--- a/arch/x86/include/asm/idle.h
+++ b/arch/x86/include/asm/idle.h
@@ -8,8 +8,18 @@
 void idle_notifier_register(struct notifier_block *n);
 void idle_notifier_unregister(struct notifier_block *n);
 
+#ifdef CONFIG_X86_64
 void enter_idle(void);
 void exit_idle(void);
+#else
+static inline void enter_idle(void)
+{
+}
+
+static inline void exit_idle(void)
+{
+}
+#endif /* CONFIG_X86_64 */
 
 void c1e_remove_cpu(int cpu);
 
diff --git a/arch/x86/kernel/apic.c b/arch/x86/kernel/apic.c
--- a/arch/x86/kernel/apic.c
+++ b/arch/x86/kernel/apic.c
@@ -808,9 +808,8 @@
 	 * Besides, if we don't timer interrupts ignore the global
 	 * interrupt lock, which is the WrongThing (tm) to do.
 	 */
-#ifdef CONFIG_X86_64
 	exit_idle();
-#endif
+
 	irq_enter();
 	local_apic_timer_interrupt();
 	irq_exit();
@@ -1668,9 +1667,7 @@
 {
 	u32 v;
 
-#ifdef CONFIG_X86_64
 	exit_idle();
-#endif
 	irq_enter();
 	/*
 	 * Check if this really is a spurious interrupt and ACK it
@@ -1699,9 +1696,7 @@
 {
 	u32 v, v1;
 
-#ifdef CONFIG_X86_64
 	exit_idle();
-#endif
 	irq_enter();
 	/* First tickle the hardware, only then report what went on. -- REW */
 	v = apic_read(APIC_ESR);
diff --git a/arch/x86/kernel/io_apic.c b/arch/x86/kernel/io_apic.c
--- a/arch/x86/kernel/io_apic.c
+++ b/arch/x86/kernel/io_apic.c
@@ -2242,9 +2242,7 @@
 {
 	unsigned vector, me;
 	ack_APIC_irq();
-#ifdef CONFIG_X86_64
 	exit_idle();
-#endif
 	irq_enter();
 
 	me = smp_processor_id();
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 12 of 38] xen/dom0: handle acpi lapic parsing in Xen dom0
When running in Xen dom0, we still want to parse the ACPI tables to find out about local and IO apics, but we don't want to actually use the lapics. Put a couple of tests for Xen to prevent lapics from being mapped or accessed.

This is very Xen-specific behaviour, so there didn't seem to be any point in adding more indirection.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/xen/hypervisor.h |    2 ++
 arch/x86/kernel/acpi/boot.c           |   10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/xen/hypervisor.h b/arch/x86/include/asm/xen/hypervisor.h
--- a/arch/x86/include/asm/xen/hypervisor.h
+++ b/arch/x86/include/asm/xen/hypervisor.h
@@ -46,6 +46,8 @@
 extern enum xen_domain_type xen_domain_type;
 
 #ifdef CONFIG_XEN
+#include <xen/interface/xen.h>
+
 #define xen_domain()		(xen_domain_type != XEN_NATIVE)
 #else
 #define xen_domain()		(0)
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -42,6 +42,8 @@
 #include <asm/mpspec.h>
 #include <asm/smp.h>
 
+#include <asm/xen/hypervisor.h>
+
 #ifdef CONFIG_X86_LOCAL_APIC
 # include <mach_apic.h>
 #endif
@@ -234,6 +236,10 @@
 {
 	unsigned int ver = 0;
 
+	/* We don't want to register lapics when in Xen dom0 */
+	if (xen_initial_domain())
+		return;
+
 	if (!enabled) {
 		++disabled_cpus;
 		return;
@@ -755,6 +761,10 @@
 
 static void __init acpi_register_lapic_address(unsigned long address)
 {
+	/* Xen dom0 doesn't have usable lapics */
+	if (xen_initial_domain())
+		return;
+
 	mp_lapic_addr = address;
 
 	set_fixmap_nocache(FIX_APIC_BASE, address);
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 13 of 38] x86: unstatic mp_find_ioapic so it can be used elsewhere
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/mpspec.h |    1 +
 arch/x86/kernel/acpi/boot.c   |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -63,6 +63,7 @@
 #ifdef CONFIG_X86_IO_APIC
 extern int mp_config_acpi_gsi(unsigned char number, unsigned int devfn, u8 pin,
 			      u32 gsi, int triggering, int polarity);
+extern int mp_find_ioapic(int gsi);
 #else
 static inline int
 mp_config_acpi_gsi(unsigned char number, unsigned int devfn, u8 pin,
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -865,7 +865,7 @@
 	DECLARE_BITMAP(pin_programmed, MP_MAX_IOAPIC_PIN + 1);
 } mp_ioapic_routing[MAX_IO_APICS];
 
-static int mp_find_ioapic(int gsi)
+int mp_find_ioapic(int gsi)
 {
 	int i = 0;
 
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 14 of 38] x86: add mp_find_ioapic_pin
Add mp_find_ioapic_pin() to find an IO APIC's specific pin from a GSI, and use this function within acpi/boot. Make it non-static so other code can use it too.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/mpspec.h |    1 +
 arch/x86/kernel/acpi/boot.c   |   16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -64,6 +64,7 @@
 extern int mp_config_acpi_gsi(unsigned char number, unsigned int devfn, u8 pin,
 			      u32 gsi, int triggering, int polarity);
 extern int mp_find_ioapic(int gsi);
+extern int mp_find_ioapic_pin(int ioapic, int gsi);
 #else
 static inline int
 mp_config_acpi_gsi(unsigned char number, unsigned int devfn, u8 pin,
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -880,6 +880,16 @@
 	return -1;
 }
 
+int mp_find_ioapic_pin(int ioapic, int gsi)
+{
+	if (WARN_ON(ioapic == -1))
+		return -1;
+	if (WARN_ON(gsi > mp_ioapic_routing[ioapic].gsi_end))
+		return -1;
+
+	return gsi - mp_ioapic_routing[ioapic].gsi_base;
+}
+
 static u8 __init uniq_ioapic_id(u8 id)
 {
 #ifdef CONFIG_X86_32
@@ -992,7 +1002,7 @@
 	ioapic = mp_find_ioapic(gsi);
 	if (ioapic < 0)
 		return;
-	pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
+	pin = mp_find_ioapic_pin(ioapic, gsi);
 
 	/*
 	 * TBD: This check is for faulty timer entries, where the override
@@ -1114,7 +1124,7 @@
 		return gsi;
 	}
 
-	ioapic_pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
+	ioapic_pin = mp_find_ioapic_pin(ioapic, gsi);
 
 #ifdef CONFIG_X86_32
 	if (ioapic_renumber_irq)
@@ -1203,7 +1213,7 @@
 	mp_irq.mp_srcbusirq = (((devfn >> 3) & 0x1f) << 2) | ((pin - 1) & 3);
 	ioapic = mp_find_ioapic(gsi);
 	mp_irq.mp_dstapic = mp_ioapic_routing[ioapic].apic_id;
-	mp_irq.mp_dstirq = gsi - mp_ioapic_routing[ioapic].gsi_base;
+	mp_irq.mp_dstirq = mp_find_ioapic_pin(ioapic, gsi);
 
 	save_mp_irq(&mp_irq);
 #endif
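To make the GSI-to-pin arithmetic concrete, here is a stand-alone sketch of the two lookups. The routing table contents (two IO APICs with 24 pins each) and all `demo_*` names are invented for illustration, and the range scan in `demo_find_ioapic` is only my reading of what mp_find_ioapic() does; the WARN_ON() calls are reduced to plain checks.

```c
#include <assert.h>

#define DEMO_NR_IOAPICS 2

/* Hypothetical reduced form of the mp_ioapic_routing[] entries. */
struct demo_ioapic_routing {
	int gsi_base;
	int gsi_end;
};

/* Made-up layout: GSIs 0-23 on IO APIC 0, 24-47 on IO APIC 1. */
static const struct demo_ioapic_routing demo_routing[DEMO_NR_IOAPICS] = {
	{ .gsi_base = 0,  .gsi_end = 23 },
	{ .gsi_base = 24, .gsi_end = 47 },
};

/* Find the IO APIC whose GSI range contains gsi, or -1. */
static int demo_find_ioapic(int gsi)
{
	int i;

	for (i = 0; i < DEMO_NR_IOAPICS; i++) {
		if (gsi >= demo_routing[i].gsi_base &&
		    gsi <= demo_routing[i].gsi_end)
			return i;
	}
	return -1;
}

/* Mirror of the patch's helper: pin is the offset from gsi_base. */
static int demo_find_ioapic_pin(int ioapic, int gsi)
{
	if (ioapic == -1)
		return -1;

	assert(ioapic >= 0 && ioapic < DEMO_NR_IOAPICS);
	if (gsi > demo_routing[ioapic].gsi_end)
		return -1;

	return gsi - demo_routing[ioapic].gsi_base;
}
```

In this made-up layout GSI 30 lands on IO APIC 1, pin 6, while an unrouted GSI yields -1 from both lookups.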
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 15 of 38] x86: unstatic ioapic entry funcs
Unstatic ioapic_write_entry and setup_ioapic_entry functions so that the Xen code can do its own ioapic routing setup. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/include/asm/io_apic.h | 6 ++++++ arch/x86/kernel/io_apic.c | 10 +++++----- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h --- a/arch/x86/include/asm/io_apic.h +++ b/arch/x86/include/asm/io_apic.h @@ -209,6 +209,12 @@ extern int probe_nr_irqs(void); +extern int setup_ioapic_entry(int apic, int irq, + struct IO_APIC_route_entry *entry, + unsigned int destination, int trigger, + int polarity, int vector); +extern void ioapic_write_entry(int apic, int pin, + struct IO_APIC_route_entry e); #else /* !CONFIG_X86_IO_APIC */ #define io_apic_assign_pci_irqs 0 static const int timer_through_8259 = 0; diff --git a/arch/x86/kernel/io_apic.c b/arch/x86/kernel/io_apic.c --- a/arch/x86/kernel/io_apic.c +++ b/arch/x86/kernel/io_apic.c @@ -337,7 +337,7 @@ io_apic_write(apic, 0x10 + 2*pin, eu.w1); } -static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e) +void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e) { unsigned long flags; spin_lock_irqsave(&ioapic_lock, flags); @@ -1275,10 +1275,10 @@ handle_edge_irq, "edge"); } -static int setup_ioapic_entry(int apic, int irq, - struct IO_APIC_route_entry *entry, - unsigned int destination, int trigger, - int polarity, int vector) +int setup_ioapic_entry(int apic, int irq, + struct IO_APIC_route_entry *entry, + unsigned int destination, int trigger, + int polarity, int vector) { /* * add it to the IO-APIC irq-routing table: _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 16 of 38] x86: include linux/init.h in asm/numa_64.h
It uses __init/__cpuinit, which are not defined otherwise.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/numa_64.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/numa_64.h b/arch/x86/include/asm/numa_64.h
--- a/arch/x86/include/asm/numa_64.h
+++ b/arch/x86/include/asm/numa_64.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_X86_NUMA_64_H
 #define _ASM_X86_NUMA_64_H
 
+#include <linux/init.h>
 #include <linux/nodemask.h>
 #include <asm/apicdef.h>
 
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 17 of 38] x86: add swiotlb allocation functions
Add x86-specific swiotlb allocation functions. These are purely default for the moment.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/kernel/pci-swiotlb_64.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/pci-swiotlb_64.c b/arch/x86/kernel/pci-swiotlb_64.c
--- a/arch/x86/kernel/pci-swiotlb_64.c
+++ b/arch/x86/kernel/pci-swiotlb_64.c
@@ -3,6 +3,8 @@
 #include <linux/pci.h>
 #include <linux/cache.h>
 #include <linux/module.h>
+#include <linux/swiotlb.h>
+#include <linux/bootmem.h>
 #include <linux/dma-mapping.h>
 
 #include <asm/iommu.h>
@@ -11,6 +13,16 @@
 
 int swiotlb __read_mostly;
 
+void *swiotlb_alloc_boot(size_t size, unsigned long nslabs)
+{
+	return alloc_bootmem_low_pages(size);
+}
+
+void *swiotlb_alloc(unsigned order, unsigned long nslabs)
+{
+	return (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN, order);
+}
+
 static dma_addr_t
 swiotlb_map_single_phys(struct device *hwdev, phys_addr_t paddr, size_t size,
 			int direction)
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 18 of 38] x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
swiotlb on 32 bit will be used by Xen domain 0 support. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> --- arch/x86/include/asm/dma-mapping.h | 6 +----- arch/x86/include/asm/pci.h | 2 ++ arch/x86/include/asm/pci_64.h | 1 - arch/x86/kernel/Makefile | 3 ++- arch/x86/kernel/pci-dma.c | 6 ++++-- arch/x86/kernel/pci-swiotlb_64.c | 2 ++ arch/x86/mm/init_32.c | 3 +++ 7 files changed, 14 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h --- a/arch/x86/include/asm/dma-mapping.h +++ b/arch/x86/include/asm/dma-mapping.h @@ -66,21 +66,17 @@ return dma_ops; else return dev->archdata.dma_ops; -#endif /* _ASM_X86_DMA_MAPPING_H */ +#endif } /* Make sure we keep the same behaviour */ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) { -#ifdef CONFIG_X86_32 - return 0; -#else struct dma_mapping_ops *ops = get_dma_ops(dev); if (ops->mapping_error) return ops->mapping_error(dev, dma_addr); return (dma_addr == bad_dma_address); -#endif } #define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f) diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h --- a/arch/x86/include/asm/pci.h +++ b/arch/x86/include/asm/pci.h @@ -84,6 +84,8 @@ static inline void early_quirks(void) { } #endif +extern void pci_iommu_alloc(void); + #endif /* __KERNEL__ */ #ifdef CONFIG_X86_32 diff --git a/arch/x86/include/asm/pci_64.h b/arch/x86/include/asm/pci_64.h --- a/arch/x86/include/asm/pci_64.h +++ b/arch/x86/include/asm/pci_64.h @@ -23,7 +23,6 @@ int reg, int len, u32 value); extern void dma32_reserve_bootmem(void); -extern void pci_iommu_alloc(void); /* The PCI address space does equal the physical memory * address space. 
The networking and block device layers use diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -115,6 +115,8 @@ obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o +obj-$(CONFIG_SWIOTLB) += pci-swiotlb_64.o # NB rename without _64 + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) @@ -128,7 +130,6 @@ obj-$(CONFIG_GART_IOMMU) += pci-gart_64.o aperture_64.o obj-$(CONFIG_CALGARY_IOMMU) += pci-calgary_64.o tce_64.o obj-$(CONFIG_AMD_IOMMU) += amd_iommu_init.o amd_iommu.o - obj-$(CONFIG_SWIOTLB) += pci-swiotlb_64.o obj-$(CONFIG_PCI_MMCONFIG) += mmconf-fam10h_64.o endif diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c --- a/arch/x86/kernel/pci-dma.c +++ b/arch/x86/kernel/pci-dma.c @@ -105,11 +105,15 @@ dma32_bootmem_ptr = NULL; dma32_bootmem_size = 0; } +#endif void __init pci_iommu_alloc(void) { +#ifdef CONFIG_X86_64 /* free the range so iommu could get some range less than 4G */ dma32_free_bootmem(); +#endif + /* * The order of these functions is important for * fall-back/fail-over reasons @@ -125,8 +129,6 @@ pci_swiotlb_init(); } -#endif - void *dma_generic_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr, gfp_t flag) { diff --git a/arch/x86/kernel/pci-swiotlb_64.c b/arch/x86/kernel/pci-swiotlb_64.c --- a/arch/x86/kernel/pci-swiotlb_64.c +++ b/arch/x86/kernel/pci-swiotlb_64.c @@ -62,8 +62,10 @@ void __init pci_swiotlb_init(void) { /* don''t initialize swiotlb if iommu=off (no_iommu=1) */ +#ifdef CONFIG_X86_64 if (!iommu_detected && !no_iommu && max_pfn > MAX_DMA32_PFN) swiotlb = 1; +#endif if (swiotlb_force) swiotlb = 1; if (swiotlb) { diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -21,6 +21,7 @@ #include <linux/init.h> #include <linux/highmem.h> #include <linux/pagemap.h> +#include <linux/pci.h> #include <linux/pfn.h> #include <linux/poison.h> #include <linux/bootmem.h> @@ -971,6 
+972,8 @@ int codesize, reservedpages, datasize, initsize; int tmp; + pci_iommu_alloc(); + #ifdef CONFIG_FLATMEM BUG_ON(!mem_map); #endif _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 19 of 38] x86: add arch specific version of the swiotlb virt<->bus<->virt functions
From: Ian Campbell <ian.campbell@citrix.com>

These are to be used later and contain the default implementation.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/kernel/pci-swiotlb_64.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/pci-swiotlb_64.c b/arch/x86/kernel/pci-swiotlb_64.c
--- a/arch/x86/kernel/pci-swiotlb_64.c
+++ b/arch/x86/kernel/pci-swiotlb_64.c
@@ -23,6 +23,16 @@
 	return (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN, order);
 }
 
+unsigned long swiotlb_virt_to_bus(volatile void *address)
+{
+	return virt_to_bus(address);
+}
+
+void * swiotlb_bus_to_virt(unsigned long address)
+{
+	return bus_to_virt(address);
+}
+
 static dma_addr_t
 swiotlb_map_single_phys(struct device *hwdev, phys_addr_t paddr, size_t size,
 			int direction)
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 20 of 38] xen dom0: Make hvc_xen console work for dom0
Use the console hypercalls for dom0 console. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Juan Quintela <quintela@redhat.com> --- drivers/char/hvc_xen.c | 100 +++++++++++++++++++++++++++++++----------------- drivers/xen/events.c | 2 include/xen/events.h | 2 3 files changed, 69 insertions(+), 35 deletions(-) diff --git a/drivers/char/hvc_xen.c b/drivers/char/hvc_xen.c --- a/drivers/char/hvc_xen.c +++ b/drivers/char/hvc_xen.c @@ -55,7 +55,7 @@ notify_remote_via_evtchn(xen_start_info->console.domU.evtchn); } -static int write_console(uint32_t vtermno, const char *data, int len) +static int domU_write_console(uint32_t vtermno, const char *data, int len) { struct xencons_interface *intf = xencons_interface(); XENCONS_RING_IDX cons, prod; @@ -76,7 +76,7 @@ return sent; } -static int read_console(uint32_t vtermno, char *buf, int len) +static int domU_read_console(uint32_t vtermno, char *buf, int len) { struct xencons_interface *intf = xencons_interface(); XENCONS_RING_IDX cons, prod; @@ -97,28 +97,62 @@ return recv; } -static struct hv_ops hvc_ops = { - .get_chars = read_console, - .put_chars = write_console, +static struct hv_ops domU_hvc_ops = { + .get_chars = domU_read_console, + .put_chars = domU_write_console, .notifier_add = notifier_add_irq, .notifier_del = notifier_del_irq, .notifier_hangup = notifier_hangup_irq, }; -static int __init xen_init(void) +static int dom0_read_console(uint32_t vtermno, char *buf, int len) +{ + return HYPERVISOR_console_io(CONSOLEIO_read, len, buf); +} + +/* + * Either for a dom0 to write to the system console, or a domU with a + * debug version of Xen + */ +static int dom0_write_console(uint32_t vtermno, const char *str, int len) +{ + int rc = HYPERVISOR_console_io(CONSOLEIO_write, len, (char *)str); + if (rc < 0) + return 0; + + return len; +} + +static struct hv_ops dom0_hvc_ops = { + .get_chars = dom0_read_console, + .put_chars = dom0_write_console, + .notifier_add = notifier_add_irq, + 
.notifier_del = notifier_del_irq, +}; + +static int __init xen_hvc_init(void) { struct hvc_struct *hp; + struct hv_ops *ops; - if (!xen_pv_domain() || - xen_initial_domain() || - !xen_start_info->console.domU.evtchn) - return -ENODEV; + if (!xen_pv_domain()) + return -ENODEV; - xencons_irq = bind_evtchn_to_irq(xen_start_info->console.domU.evtchn); + if (xen_initial_domain()) { + ops = &dom0_hvc_ops; + xencons_irq = bind_virq_to_irq(VIRQ_CONSOLE, 0); + } else { + if (!xen_start_info->console.domU.evtchn) + return -ENODEV; + + ops = &domU_hvc_ops; + xencons_irq = bind_evtchn_to_irq(xen_start_info->console.domU.evtchn); + } + if (xencons_irq < 0) xencons_irq = 0; /* NO_IRQ */ - hp = hvc_alloc(HVC_COOKIE, xencons_irq, &hvc_ops, 256); + hp = hvc_alloc(HVC_COOKIE, xencons_irq, ops, 256); if (IS_ERR(hp)) return PTR_ERR(hp); @@ -135,7 +169,7 @@ rebind_evtchn_irq(xen_start_info->console.domU.evtchn, xencons_irq); } -static void __exit xen_fini(void) +static void __exit xen_hvc_fini(void) { if (hvc) hvc_remove(hvc); @@ -143,29 +177,24 @@ static int xen_cons_init(void) { + struct hv_ops *ops; + if (!xen_pv_domain()) return 0; - hvc_instantiate(HVC_COOKIE, 0, &hvc_ops); + ops = &domU_hvc_ops; + if (xen_initial_domain()) + ops = &dom0_hvc_ops; + + hvc_instantiate(HVC_COOKIE, 0, ops); + return 0; } -module_init(xen_init); -module_exit(xen_fini); +module_init(xen_hvc_init); +module_exit(xen_hvc_fini); console_initcall(xen_cons_init); -static void raw_console_write(const char *str, int len) -{ - while(len > 0) { - int rc = HYPERVISOR_console_io(CONSOLEIO_write, len, (char *)str); - if (rc <= 0) - break; - - str += rc; - len -= rc; - } -} - #ifdef CONFIG_EARLY_PRINTK static void xenboot_write_console(struct console *console, const char *string, unsigned len) @@ -173,19 +202,22 @@ unsigned int linelen, off = 0; const char *pos; - raw_console_write(string, len); + dom0_write_console(0, string, len); - write_console(0, "(early) ", 8); + if (xen_initial_domain()) + return; + + 
domU_write_console(0, "(early) ", 8); while (off < len && NULL != (pos = strchr(string+off, ''\n''))) { linelen = pos-string+off; if (off + linelen > len) break; - write_console(0, string+off, linelen); - write_console(0, "\r\n", 2); + domU_write_console(0, string+off, linelen); + domU_write_console(0, "\r\n", 2); off += linelen + 1; } if (off < len) - write_console(0, string+off, len-off); + domU_write_console(0, string+off, len-off); } struct console xenboot_console = { @@ -197,7 +229,7 @@ void xen_raw_console_write(const char *str) { - raw_console_write(str, strlen(str)); + dom0_write_console(0, str, strlen(str)); } void xen_raw_printk(const char *fmt, ...) diff --git a/drivers/xen/events.c b/drivers/xen/events.c --- a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -307,7 +307,7 @@ } -static int bind_virq_to_irq(unsigned int virq, unsigned int cpu) +int bind_virq_to_irq(unsigned int virq, unsigned int cpu) { struct evtchn_bind_virq bind_virq; int evtchn, irq; diff --git a/include/xen/events.h b/include/xen/events.h --- a/include/xen/events.h +++ b/include/xen/events.h @@ -12,6 +12,8 @@ irq_handler_t handler, unsigned long irqflags, const char *devname, void *dev_id); +int bind_virq_to_irq(unsigned int virq, unsigned int cpu); + int bind_virq_to_irqhandler(unsigned int virq, unsigned int cpu, irq_handler_t handler, unsigned long irqflags, const char *devname, _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
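The console patch boils down to choosing one of two operation tables at init time, with the dom0 write path treating any hypercall failure as "nothing written". A user-space model of that selection might look like the following; everything named `demo_*` is hypothetical, `demo_console_io` is a stub standing in for the console hypercall, and the domU backend is reduced to a function that pretends the ring accepted all bytes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down version of an hv_ops-style table. */
struct demo_hv_ops {
	const char *name;
	int (*put_chars)(const char *data, int len);
};

/* Test knob: result the fake "hypercall" should return. */
static int demo_console_io_rc;

/* Stub standing in for the console-write hypercall. */
static int demo_console_io(const char *data, int len)
{
	(void)data;
	(void)len;
	return demo_console_io_rc;
}

/* dom0 path: all-or-nothing, as in dom0_write_console(). */
static int demo_dom0_put_chars(const char *data, int len)
{
	int rc;

	assert(len >= 0);
	rc = demo_console_io(data, len);
	if (rc < 0)
		return 0;

	return len;
}

/* domU path placeholder: pretend the shared ring took everything. */
static int demo_domU_put_chars(const char *data, int len)
{
	(void)data;
	assert(len >= 0);
	return len;
}

static const struct demo_hv_ops demo_dom0_ops = {
	.name = "dom0",
	.put_chars = demo_dom0_put_chars,
};

static const struct demo_hv_ops demo_domU_ops = {
	.name = "domU",
	.put_chars = demo_domU_put_chars,
};

/* Pick the table once, based on the kind of domain. */
static const struct demo_hv_ops *demo_select_ops(bool initial_domain)
{
	return initial_domain ? &demo_dom0_ops : &demo_domU_ops;
}
```

Keeping the choice in one selector means the rest of the driver only ever sees an ops pointer, which matches how the patch passes `ops` to hvc_alloc() and hvc_instantiate().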
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 21 of 38] xen dom0: Initialize xenbus for dom0
From: Juan Quintela <quintela@redhat.com>

Do initial xenbus/xenstore setup in dom0. In dom0 we need to actually allocate the xenstore resources, rather than being given them from outside.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 drivers/xen/xenbus/xenbus_probe.c |   30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -810,6 +810,7 @@
 static int __init xenbus_probe_init(void)
 {
 	int err = 0;
+	unsigned long page = 0;
 
 	DPRINTK("");
 
@@ -830,7 +831,31 @@
 	 * Domain0 doesn't have a store_evtchn or store_mfn yet.
 	 */
 	if (xen_initial_domain()) {
-		/* dom0 not yet supported */
+		struct evtchn_alloc_unbound alloc_unbound;
+
+		/* Allocate Xenstore page */
+		page = get_zeroed_page(GFP_KERNEL);
+		if (!page)
+			return -ENOMEM;
+
+		xen_store_mfn = xen_start_info->store_mfn =
+			pfn_to_mfn(virt_to_phys((void *)page) >>
+				   PAGE_SHIFT);
+
+		/* Next allocate a local port which xenstored can bind to */
+		alloc_unbound.dom        = DOMID_SELF;
+		alloc_unbound.remote_dom = 0;
+
+		err = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound,
+						  &alloc_unbound);
+		if (err == -ENOSYS)
+			goto out_unreg_front;
+
+		BUG_ON(err);
+		xen_store_evtchn = xen_start_info->store_evtchn =
+			alloc_unbound.port;
+
+		xen_store_interface = mfn_to_virt(xen_store_mfn);
 	} else {
 		xenstored_ready = 1;
 		xen_store_evtchn = xen_start_info->store_evtchn;
@@ -858,6 +883,9 @@
 	bus_unregister(&xenbus_frontend.bus);
 
   out_error:
+	if (page != 0)
+		free_page(page);
+
 	return err;
 }
 
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 22 of 38] xen dom0: Set up basic IO permissions for dom0
From: Juan Quintela <quintela@redhat.com>

Add the direct mapping area for ISA bus access, and enable IO space access for the guest when running as dom0.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 arch/x86/xen/enlighten.c |   32 ++++++++++++++++++++++++++++++++
 arch/x86/xen/setup.c     |    6 +++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1437,6 +1437,7 @@
 	return __ka(m2p(maddr));
 }
 
+/* Set the page permissions on an identity-mapped pages */
 static void set_page_prot(void *addr, pgprot_t prot)
 {
 	unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
@@ -1492,6 +1493,29 @@
 	set_page_prot(pmd, PAGE_KERNEL_RO);
 }
 
+static __init void xen_ident_map_ISA(void)
+{
+	unsigned long pa;
+
+	/*
+	 * If we're dom0, then linear map the ISA machine addresses into
+	 * the kernel's address space.
+	 */
+	if (!xen_initial_domain())
+		return;
+
+	xen_raw_printk("Xen: setup ISA identity maps\n");
+
+	for (pa = ISA_START_ADDRESS; pa < ISA_END_ADDRESS; pa += PAGE_SIZE) {
+		pte_t pte = mfn_pte(PFN_DOWN(pa), PAGE_KERNEL_IO);
+
+		if (HYPERVISOR_update_va_mapping(PAGE_OFFSET + pa, pte, 0))
+			BUG();
+	}
+
+	xen_flush_tlb();
+}
+
 #ifdef CONFIG_X86_64
 static void convert_pfn_mfn(void *v)
 {
@@ -1674,6 +1698,7 @@
 
 	xen_raw_console_write("mapping kernel into physical memory\n");
 	pgd = xen_setup_kernel_pagetable(pgd, xen_start_info->nr_pages);
+	xen_ident_map_ISA();
 
 	init_mm.pgd = pgd;
 
@@ -1683,6 +1708,13 @@
 	if (xen_feature(XENFEAT_supervisor_mode_kernel))
 		pv_info.kernel_rpl = 0;
 
+	if (xen_initial_domain()) {
+		struct physdev_set_iopl set_iopl;
+		set_iopl.iopl = 1;
+		if (HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl) == -1)
+			BUG();
+	}
+
 	/* set the limit of our address space */
 	xen_reserve_top();
 
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -51,6 +51,9 @@
 	 * Even though this is normal, usable memory under Xen, reserve
 	 * ISA memory anyway because too many things think they can poke
 	 * about in there.
+	 *
+	 * In a dom0 kernel, this region is identity mapped with the
+	 * hardware ISA area, so it really is out of bounds.
 	 */
 	e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
 			E820_RESERVED);
@@ -188,7 +191,8 @@
 
 	pm_idle = xen_idle;
 
-	paravirt_disable_iospace();
+	if (!xen_initial_domain())
+		paravirt_disable_iospace();
 
 	fiddle_vdso();
 }
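The identity-mapping loop in xen_ident_map_ISA() can be walked through in user space to see how many pages it touches and which machine frame each virtual address receives. The constants below are assumptions on my part: 0xa0000 and 0x100000 are the ISA window bounds as I understand the x86 headers of that era, 4096 is the usual page size, and 0xC0000000 is a typical 32-bit PAGE_OFFSET. The callback stands in for the HYPERVISOR_update_va_mapping() call, and every `demo_*`/`DEMO_*` name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; see the lead-in for where they come from. */
#define DEMO_PAGE_SIZE   4096UL
#define DEMO_PAGE_OFFSET 0xC0000000UL
#define DEMO_ISA_START   0xa0000UL
#define DEMO_ISA_END     0x100000UL

/* Stand-in for the mapping hypercall: returns 0 on success. */
typedef int (*demo_map_fn)(unsigned long va, unsigned long mfn);

/*
 * Walk the ISA window one page at a time and map virtual address
 * PAGE_OFFSET + pa to machine frame pa / PAGE_SIZE, i.e. a 1:1
 * mapping onto the real hardware addresses. Returns the number of
 * pages mapped before the first failure.
 */
static unsigned long demo_ident_map_isa(demo_map_fn map)
{
	unsigned long pa, mapped = 0;

	assert(map != NULL);
	for (pa = DEMO_ISA_START; pa < DEMO_ISA_END; pa += DEMO_PAGE_SIZE) {
		if (map(DEMO_PAGE_OFFSET + pa, pa / DEMO_PAGE_SIZE) != 0)
			break;
		mapped++;
	}
	return mapped;
}

/* Recording callback used to inspect the last mapping made. */
static unsigned long demo_last_va, demo_last_mfn;

static int demo_record_mapping(unsigned long va, unsigned long mfn)
{
	demo_last_va = va;
	demo_last_mfn = mfn;
	return 0;
}
```

Under those assumed constants the window is 0x60000 bytes, so the loop issues 96 mapping calls, the last one for frame 0xff.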
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 23 of 38] xen-dom0: only selectively disable cpu features
Dom0 kernels actually want most of the CPU features to be enabled. Some, like MCA/MCE, are still handled by Xen itself.

We leave APIC enabled even though we don't really have a functional local apic so that the ACPI code will parse the corresponding tables properly.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/enlighten.c |   21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -205,18 +205,23 @@
 static void xen_cpuid(unsigned int *ax, unsigned int *bx,
 		      unsigned int *cx, unsigned int *dx)
 {
-	unsigned maskedx = ~0;
+	unsigned maskedx = 0;
 
 	/*
 	 * Mask out inconvenient features, to try and disable as many
 	 * unsupported kernel subsystems as possible.
 	 */
-	if (*ax == 1)
-		maskedx = ~((1 << X86_FEATURE_APIC) |  /* disable APIC */
-			    (1 << X86_FEATURE_ACPI) |  /* disable ACPI */
-			    (1 << X86_FEATURE_MCE)  |  /* disable MCE */
-			    (1 << X86_FEATURE_MCA)  |  /* disable MCA */
-			    (1 << X86_FEATURE_ACC));   /* thermal monitoring */
+	if (*ax == 1) {
+		maskedx =
+			(1 << X86_FEATURE_MCE)  |  /* disable MCE */
+			(1 << X86_FEATURE_MCA)  |  /* disable MCA */
+			(1 << X86_FEATURE_ACC);    /* thermal monitoring */
+
+		if (!xen_initial_domain())
+			maskedx |=
+				(1 << X86_FEATURE_APIC) |  /* disable local APIC */
+				(1 << X86_FEATURE_ACPI);   /* disable ACPI */
+	}
 
 	asm(XEN_EMULATE_PREFIX "cpuid"
 		: "=a" (*ax),
@@ -224,7 +229,7 @@
 		  "=c" (*cx),
 		  "=d" (*dx)
 		: "0" (*ax), "2" (*cx));
-	*dx &= maskedx;
+	*dx &= ~maskedx;
 }
 
 static void xen_set_debugreg(int reg, unsigned long val)
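The mask logic is easy to check in isolation. The sketch below reimplements just the EDX filtering as a pure function; it does not execute CPUID. The bit positions are the architectural CPUID leaf 1 EDX positions as I understand them (MCE 7, APIC 9, MCA 14, ACPI 22, thermal/clock control 29), and `demo_mask_edx` along with the `DEMO_BIT_*` names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed CPUID.1:EDX bit positions; see the lead-in. */
#define DEMO_BIT_MCE  7
#define DEMO_BIT_APIC 9
#define DEMO_BIT_MCA  14
#define DEMO_BIT_ACPI 22
#define DEMO_BIT_ACC  29

/*
 * Return edx with the "inconvenient" feature bits cleared.
 * maskedx holds the bits to hide, so the result is edx & ~maskedx,
 * matching the polarity the patch switches to.
 */
static uint32_t demo_mask_edx(uint32_t leaf, uint32_t edx, bool initial_domain)
{
	uint32_t maskedx = 0;

	if (leaf == 1) {
		/* Always hidden: these stay under the hypervisor's control. */
		maskedx = (1u << DEMO_BIT_MCE) |
			  (1u << DEMO_BIT_MCA) |
			  (1u << DEMO_BIT_ACC);

		/* Only unprivileged domains also lose APIC and ACPI. */
		if (!initial_domain)
			maskedx |= (1u << DEMO_BIT_APIC) |
				   (1u << DEMO_BIT_ACPI);
	}

	/* Invariant: the initial domain always keeps the APIC bit. */
	assert(!initial_domain || !(maskedx & (1u << DEMO_BIT_APIC)));

	return edx & ~maskedx;
}
```

Starting from an empty mask and OR-ing bits in makes the conditional part additive, which is presumably why the patch flips `maskedx` from "bits to keep" to "bits to hide".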
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 24 of 38] xen dom0: Add support for the platform_ops hypercall
From: Stephen Tweedie <sct@redhat.com> Minimal changes to get platform ops (renamed dom0_ops on pv_ops) working on pv_ops builds. Pulls in upstream linux-2.6.18-xen.hg's platform.h Signed-off-by: Stephen Tweedie <sct@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/include/asm/xen/hypercall.h | 8 + include/xen/interface/platform.h | 232 ++++++++++++++++++++++++++++++++++ include/xen/interface/xen.h | 2 3 files changed, 242 insertions(+) diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h --- a/arch/x86/include/asm/xen/hypercall.h +++ b/arch/x86/include/asm/xen/hypercall.h @@ -45,6 +45,7 @@ #include <xen/interface/xen.h> #include <xen/interface/sched.h> #include <xen/interface/physdev.h> +#include <xen/interface/platform.h> /* * The hypercall asms have to meet several constraints: @@ -282,6 +283,13 @@ } static inline int +HYPERVISOR_dom0_op(struct xen_platform_op *platform_op) +{ + platform_op->interface_version = XENPF_INTERFACE_VERSION; + return _hypercall1(int, dom0_op, platform_op); +} + +static inline int HYPERVISOR_set_debugreg(int reg, unsigned long value) { return _hypercall2(int, set_debugreg, reg, value); diff --git a/include/xen/interface/platform.h b/include/xen/interface/platform.h new file mode 100644 --- /dev/null +++ b/include/xen/interface/platform.h @@ -0,0 +1,232 @@ +/****************************************************************************** + * platform.h + * + * Hardware platform operations. Intended for use by domain-0 kernel. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2002-2006, K Fraser + */ + +#ifndef __XEN_PUBLIC_PLATFORM_H__ +#define __XEN_PUBLIC_PLATFORM_H__ + +#include "xen.h" + +#define XENPF_INTERFACE_VERSION 0x03000001 + +/* + * Set clock such that it would read <secs,nsecs> after 00:00:00 UTC, + * 1 January, 1970 if the current system time was <system_time>. + */ +#define XENPF_settime 17 +struct xenpf_settime { + /* IN variables. */ + uint32_t secs; + uint32_t nsecs; + uint64_t system_time; +}; +typedef struct xenpf_settime xenpf_settime_t; +DEFINE_GUEST_HANDLE_STRUCT(xenpf_settime_t); + +/* + * Request memory range (@mfn, @mfn+@nr_mfns-1) to have type @type. + * On x86, @type is an architecture-defined MTRR memory type. + * On success, returns the MTRR that was used (@reg) and a handle that can + * be passed to XENPF_DEL_MEMTYPE to accurately tear down the new setting. + * (x86-specific). 
+ */ +#define XENPF_add_memtype 31 +struct xenpf_add_memtype { + /* IN variables. */ + unsigned long mfn; + uint64_t nr_mfns; + uint32_t type; + /* OUT variables. */ + uint32_t handle; + uint32_t reg; +}; +typedef struct xenpf_add_memtype xenpf_add_memtype_t; +DEFINE_GUEST_HANDLE_STRUCT(xenpf_add_memtype_t); + +/* + * Tear down an existing memory-range type. If @handle is remembered then it + * should be passed in to accurately tear down the correct setting (in case + * of overlapping memory regions with differing types). If it is not known + * then @handle should be set to zero. In all cases @reg must be set. + * (x86-specific). + */ +#define XENPF_del_memtype 32 +struct xenpf_del_memtype { + /* IN variables. */ + uint32_t handle; + uint32_t reg; +}; +typedef struct xenpf_del_memtype xenpf_del_memtype_t; +DEFINE_GUEST_HANDLE_STRUCT(xenpf_del_memtype_t); + +/* Read current type of an MTRR (x86-specific). */ +#define XENPF_read_memtype 33 +struct xenpf_read_memtype { + /* IN variables. */ + uint32_t reg; + /* OUT variables. */ + unsigned long mfn; + uint64_t nr_mfns; + uint32_t type; +}; +typedef struct xenpf_read_memtype xenpf_read_memtype_t; +DEFINE_GUEST_HANDLE_STRUCT(xenpf_read_memtype_t); + +#define XENPF_microcode_update 35 +struct xenpf_microcode_update { + /* IN variables. */ + GUEST_HANDLE(void) data; /* Pointer to microcode data */ + uint32_t length; /* Length of microcode data. */ +}; +typedef struct xenpf_microcode_update xenpf_microcode_update_t; +DEFINE_GUEST_HANDLE_STRUCT(xenpf_microcode_update_t); + +#define XENPF_platform_quirk 39 +#define QUIRK_NOIRQBALANCING 1 /* Do not restrict IO-APIC RTE targets */ +#define QUIRK_IOAPIC_BAD_REGSEL 2 /* IO-APIC REGSEL forgets its value */ +#define QUIRK_IOAPIC_GOOD_REGSEL 3 /* IO-APIC REGSEL behaves properly */ +struct xenpf_platform_quirk { + /* IN variables. 
*/ + uint32_t quirk_id; +}; +typedef struct xenpf_platform_quirk xenpf_platform_quirk_t; +DEFINE_GUEST_HANDLE_STRUCT(xenpf_platform_quirk_t); + +#define XENPF_firmware_info 50 +#define XEN_FW_DISK_INFO 1 /* from int 13 AH=08/41/48 */ +#define XEN_FW_DISK_MBR_SIGNATURE 2 /* from MBR offset 0x1b8 */ +#define XEN_FW_VBEDDC_INFO 3 /* from int 10 AX=4f15 */ +struct xenpf_firmware_info { + /* IN variables. */ + uint32_t type; + uint32_t index; + /* OUT variables. */ + union { + struct { + /* Int13, Fn48: Check Extensions Present. */ + uint8_t device; /* %dl: bios device number */ + uint8_t version; /* %ah: major version */ + uint16_t interface_support; /* %cx: support bitmap */ + /* Int13, Fn08: Legacy Get Device Parameters. */ + uint16_t legacy_max_cylinder; /* %cl[7:6]:%ch: max cyl # */ + uint8_t legacy_max_head; /* %dh: max head # */ + uint8_t legacy_sectors_per_track; /* %cl[5:0]: max sector # */ + /* Int13, Fn41: Get Device Parameters (as filled into %ds:%esi). */ + /* NB. First uint16_t of buffer must be set to buffer size. */ + GUEST_HANDLE(void) edd_params; + } disk_info; /* XEN_FW_DISK_INFO */ + struct { + uint8_t device; /* bios device number */ + uint32_t mbr_signature; /* offset 0x1b8 in mbr */ + } disk_mbr_signature; /* XEN_FW_DISK_MBR_SIGNATURE */ + struct { + /* Int10, AX=4F15: Get EDID info. */ + uint8_t capabilities; + uint8_t edid_transfer_time; + /* must refer to 128-byte buffer */ + GUEST_HANDLE(uchar) edid; + } vbeddc_info; /* XEN_FW_VBEDDC_INFO */ + } u; +}; +typedef struct xenpf_firmware_info xenpf_firmware_info_t; +DEFINE_GUEST_HANDLE_STRUCT(xenpf_firmware_info_t); + +#define XENPF_enter_acpi_sleep 51 +struct xenpf_enter_acpi_sleep { + /* IN variables */ + uint16_t pm1a_cnt_val; /* PM1a control value. */ + uint16_t pm1b_cnt_val; /* PM1b control value. */ + uint32_t sleep_state; /* Which state to enter (Sn). */ + uint32_t flags; /* Must be zero. 
*/ +}; +typedef struct xenpf_enter_acpi_sleep xenpf_enter_acpi_sleep_t; +DEFINE_GUEST_HANDLE_STRUCT(xenpf_enter_acpi_sleep_t); + +#define XENPF_change_freq 52 +struct xenpf_change_freq { + /* IN variables */ + uint32_t flags; /* Must be zero. */ + uint32_t cpu; /* Physical cpu. */ + uint64_t freq; /* New frequency (Hz). */ +}; +typedef struct xenpf_change_freq xenpf_change_freq_t; +DEFINE_GUEST_HANDLE_STRUCT(xenpf_change_freq_t); + +/* + * Get idle times (nanoseconds since boot) for physical CPUs specified in the + * @cpumap_bitmap with range [0..@cpumap_nr_cpus-1]. The @idletime array is + * indexed by CPU number; only entries with the corresponding @cpumap_bitmap + * bit set are written to. On return, @cpumap_bitmap is modified so that any + * non-existent CPUs are cleared. Such CPUs have their @idletime array entry + * cleared. + */ +#define XENPF_getidletime 53 +struct xenpf_getidletime { + /* IN/OUT variables */ + /* IN: CPUs to interrogate; OUT: subset of IN which are present */ + GUEST_HANDLE(uchar) cpumap_bitmap; + /* IN variables */ + /* Size of cpumap bitmap. */ + uint32_t cpumap_nr_cpus; + /* Must be indexable for every cpu in cpumap_bitmap. */ + GUEST_HANDLE(uint64_t) idletime; + /* OUT variables */ + /* System time when the idletime snapshots were taken. 
*/ + uint64_t now; +}; +typedef struct xenpf_getidletime xenpf_getidletime_t; +DEFINE_GUEST_HANDLE_STRUCT(xenpf_getidletime_t); + +struct xen_platform_op { + uint32_t cmd; + uint32_t interface_version; /* XENPF_INTERFACE_VERSION */ + union { + struct xenpf_settime settime; + struct xenpf_add_memtype add_memtype; + struct xenpf_del_memtype del_memtype; + struct xenpf_read_memtype read_memtype; + struct xenpf_microcode_update microcode; + struct xenpf_platform_quirk platform_quirk; + struct xenpf_firmware_info firmware_info; + struct xenpf_enter_acpi_sleep enter_acpi_sleep; + struct xenpf_change_freq change_freq; + struct xenpf_getidletime getidletime; + uint8_t pad[128]; + } u; +}; +typedef struct xen_platform_op xen_platform_op_t; +DEFINE_GUEST_HANDLE_STRUCT(xen_platform_op_t); + +#endif /* __XEN_PUBLIC_PLATFORM_H__ */ + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h --- a/include/xen/interface/xen.h +++ b/include/xen/interface/xen.h @@ -461,6 +461,8 @@ #define __mk_unsigned_long(x) x ## UL #define mk_unsigned_long(x) __mk_unsigned_long(x) +DEFINE_GUEST_HANDLE(uint64_t); + #else /* __ASSEMBLY__ */ /* In assembly code we cannot use C numeric constant suffixes. */
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 25 of 38] xen mtrr: Add mtrr_ops support for Xen mtrr
From: Stephen Tweedie <sct@redhat.com> Add a Xen mtrr type, and reorganise mtrr initialisation slightly to allow the mtrr driver to set up num_var_ranges (Xen needs to do this by querying the hypervisor itself.) Only the boot path is handled for now: we set up a xen-specific mtrr_if and set up the mtrr tables based on hypervisor information, but we don't yet handle mtrr entry add/delete. Signed-off-by: Stephen Tweedie <sct@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/kernel/cpu/mtrr/Makefile | 1 arch/x86/kernel/cpu/mtrr/amd.c | 1 arch/x86/kernel/cpu/mtrr/centaur.c | 1 arch/x86/kernel/cpu/mtrr/cyrix.c | 1 arch/x86/kernel/cpu/mtrr/generic.c | 1 arch/x86/kernel/cpu/mtrr/main.c | 11 ++++-- arch/x86/kernel/cpu/mtrr/mtrr.h | 5 +++ arch/x86/kernel/cpu/mtrr/xen.c | 59 ++++++++++++++++++++++++++++++++++++ 8 files changed, 77 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/cpu/mtrr/Makefile b/arch/x86/kernel/cpu/mtrr/Makefile --- a/arch/x86/kernel/cpu/mtrr/Makefile +++ b/arch/x86/kernel/cpu/mtrr/Makefile @@ -1,3 +1,4 @@ obj-y := main.o if.o generic.o state.o obj-$(CONFIG_X86_32) += amd.o cyrix.o centaur.o +obj-$(CONFIG_XEN_DOM0) += xen.o diff --git a/arch/x86/kernel/cpu/mtrr/amd.c b/arch/x86/kernel/cpu/mtrr/amd.c --- a/arch/x86/kernel/cpu/mtrr/amd.c +++ b/arch/x86/kernel/cpu/mtrr/amd.c @@ -108,6 +108,7 @@ .get_free_region = generic_get_free_region, .validate_add_page = amd_validate_add_page, .have_wrcomb = positive_have_wrcomb, + .num_var_ranges = common_num_var_ranges, }; int __init amd_init_mtrr(void) diff --git a/arch/x86/kernel/cpu/mtrr/centaur.c b/arch/x86/kernel/cpu/mtrr/centaur.c --- a/arch/x86/kernel/cpu/mtrr/centaur.c +++ b/arch/x86/kernel/cpu/mtrr/centaur.c @@ -213,6 +213,7 @@ .get_free_region = centaur_get_free_region, .validate_add_page = centaur_validate_add_page, .have_wrcomb = positive_have_wrcomb, + .num_var_ranges = common_num_var_ranges, }; int __init centaur_init_mtrr(void) diff --git 
a/arch/x86/kernel/cpu/mtrr/cyrix.c b/arch/x86/kernel/cpu/mtrr/cyrix.c --- a/arch/x86/kernel/cpu/mtrr/cyrix.c +++ b/arch/x86/kernel/cpu/mtrr/cyrix.c @@ -263,6 +263,7 @@ .get_free_region = cyrix_get_free_region, .validate_add_page = generic_validate_add_page, .have_wrcomb = positive_have_wrcomb, + .num_var_ranges = common_num_var_ranges, }; int __init cyrix_init_mtrr(void) diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c --- a/arch/x86/kernel/cpu/mtrr/generic.c +++ b/arch/x86/kernel/cpu/mtrr/generic.c @@ -667,4 +667,5 @@ .set = generic_set_mtrr, .validate_add_page = generic_validate_add_page, .have_wrcomb = generic_have_wrcomb, + .num_var_ranges = common_num_var_ranges, }; diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c --- a/arch/x86/kernel/cpu/mtrr/main.c +++ b/arch/x86/kernel/cpu/mtrr/main.c @@ -99,7 +99,7 @@ } /* This function returns the number of variable MTRRs */ -static void __init set_num_var_ranges(void) +int __init common_num_var_ranges(void) { unsigned long config = 0, dummy; @@ -109,7 +109,7 @@ config = 2; else if (is_cpu(CYRIX) || is_cpu(CENTAUR)) config = 8; - num_var_ranges = config & 0xff; + return config & 0xff; } static void __init init_table(void) @@ -1676,12 +1676,17 @@ void __init mtrr_bp_init(void) { u32 phys_addr; + init_ifs(); phys_addr = 32; if (cpu_has_mtrr) { mtrr_if = &generic_mtrr_ops; +#ifdef CONFIG_XEN_DOM0 + xen_init_mtrr(); +#endif + size_or_mask = 0xff000000; /* 36 bits */ size_and_mask = 0x00f00000; phys_addr = 36; @@ -1739,7 +1744,7 @@ } if (mtrr_if) { - set_num_var_ranges(); + num_var_ranges = mtrr_if->num_var_ranges(); init_table(); if (use_intel()) { get_mtrr_state(); diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h --- a/arch/x86/kernel/cpu/mtrr/mtrr.h +++ b/arch/x86/kernel/cpu/mtrr/mtrr.h @@ -50,6 +50,8 @@ int (*validate_add_page)(unsigned long base, unsigned long size, unsigned int type); int (*have_wrcomb)(void); + + int 
(*num_var_ranges)(void); }; extern int generic_get_free_region(unsigned long base, unsigned long size, @@ -61,6 +63,8 @@ extern int positive_have_wrcomb(void); +extern int __init common_num_var_ranges(void); + /* library functions for processor-specific routines */ struct set_mtrr_context { unsigned long flags; @@ -104,3 +108,4 @@ int amd_init_mtrr(void); int cyrix_init_mtrr(void); int centaur_init_mtrr(void); +void xen_init_mtrr(void); diff --git a/arch/x86/kernel/cpu/mtrr/xen.c b/arch/x86/kernel/cpu/mtrr/xen.c new file mode 100644 --- /dev/null +++ b/arch/x86/kernel/cpu/mtrr/xen.c @@ -0,0 +1,59 @@ +#include <linux/init.h> +#include <linux/proc_fs.h> +#include <linux/ctype.h> +#include <linux/module.h> +#include <linux/seq_file.h> +#include <asm/uaccess.h> +#include <linux/mutex.h> + +#include <asm/mtrr.h> +#include "mtrr.h" + +#include <xen/interface/platform.h> +#include <asm/xen/hypervisor.h> +#include <asm/xen/hypercall.h> + +static int __init xen_num_var_ranges(void); + +/* DOM0 TODO: Need to fill in the remaining mtrr methods to have full + * working userland mtrr support. 
*/ +static struct mtrr_ops xen_mtrr_ops = { + .vendor = X86_VENDOR_UNKNOWN, +// .set = xen_set_mtrr, +// .get = xen_get_mtrr, + .get_free_region = generic_get_free_region, +// .validate_add_page = xen_validate_add_page, + .have_wrcomb = positive_have_wrcomb, + .use_intel_if = 0, + .num_var_ranges = xen_num_var_ranges, +}; + +static int __init xen_num_var_ranges(void) +{ + int ranges; + struct xen_platform_op op; + + for (ranges = 0; ; ranges++) { + op.cmd = XENPF_read_memtype; + op.u.read_memtype.reg = ranges; + if (HYPERVISOR_dom0_op(&op) != 0) + break; + } + return ranges; +} + +void __init xen_init_mtrr(void) +{ + struct cpuinfo_x86 *c = &boot_cpu_data; + + if (!xen_initial_domain()) + return; + + if ((!cpu_has(c, X86_FEATURE_MTRR)) && + (!cpu_has(c, X86_FEATURE_K6_MTRR)) && + (!cpu_has(c, X86_FEATURE_CYRIX_ARR)) && + (!cpu_has(c, X86_FEATURE_CENTAUR_MCR))) + return; + + mtrr_if = &xen_mtrr_ops; +}
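The probing approach used by xen_num_var_ranges() above (ask for register 0, 1, 2, ... until the hypervisor reports an error) can be sketched outside the kernel as follows. This is only an illustration: the read_memtype_fn callback and fake_probe_8() are hypothetical stand-ins for HYPERVISOR_dom0_op() with XENPF_read_memtype, and the MAX_PROBE bound is an added assumption, since the loop in the patch has no upper limit.

```c
#include <assert.h>

#define MAX_PROBE 256   /* assumed safety bound; the patch's loop is unbounded */

/* Hypothetical probe: returns 0 if register `reg` exists, nonzero otherwise. */
typedef int (*read_memtype_fn)(unsigned int reg);

/* Count variable ranges by probing successive registers until one fails. */
static int count_var_ranges(read_memtype_fn probe)
{
    int ranges;

    for (ranges = 0; ranges < MAX_PROBE; ranges++) {
        if (probe((unsigned int)ranges) != 0)
            break;
    }
    return ranges;
}

/* Stand-in "hypervisor" that exposes eight variable ranges. */
static int fake_probe_8(unsigned int reg)
{
    return reg < 8 ? 0 : -1;
}
```

With the stand-in probe, count_var_ranges(fake_probe_8) yields 8. The count is only as reliable as the assumption that valid registers are numbered contiguously from zero.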
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 26 of 38] xen: forcibly disable PAT support
From: Ian Campbell <ian.campbell@citrix.com> Xen imposes a particular PAT layout on all paravirtual guests which does not match the layout Linux would like to use. Force PAT to be disabled until this is resolved. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> --- arch/x86/include/asm/pat.h | 4 ++-- arch/x86/xen/enlighten.c | 3 +++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h --- a/arch/x86/include/asm/pat.h +++ b/arch/x86/include/asm/pat.h @@ -6,9 +6,11 @@ #ifdef CONFIG_X86_PAT extern int pat_enabled; extern void validate_pat_support(struct cpuinfo_x86 *c); +extern void pat_disable(char *reason); #else static const int pat_enabled; static inline void validate_pat_support(struct cpuinfo_x86 *c) { } +static inline void pat_disable(char *reason) { } #endif extern void pat_init(void); @@ -17,6 +19,4 @@ unsigned long req_type, unsigned long *ret_type); extern int free_memtype(u64 start, u64 end); -extern void pat_disable(char *reason); - #endif /* _ASM_X86_PAT_H */ diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -48,6 +48,7 @@ #include <asm/pgtable.h> #include <asm/tlbflush.h> #include <asm/reboot.h> +#include <asm/pat.h> #include "xen-ops.h" #include "mmu.h" @@ -1743,6 +1744,8 @@ add_preferred_console("hvc", 0, NULL); } + pat_disable("PAT disabled on Xen"); + xen_raw_console_write("about to get started...\n"); /* Start the world */
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 27 of 38] xen/dom0: use _PAGE_IOMAP in ioremap to do machine mappings
In a Xen domain, ioremap operates on machine addresses, not pseudo-physical addresses. We use _PAGE_IOMAP to determine whether a mapping is intended for machine addresses. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/include/asm/xen/page.h | 8 +---- arch/x86/xen/enlighten.c | 14 +++++++-- arch/x86/xen/mmu.c | 60 ++++++++++++++++++++++++++++++++++++++- 3 files changed, 73 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -112,13 +112,9 @@ */ static inline unsigned long mfn_to_local_pfn(unsigned long mfn) { - extern unsigned long max_mapnr; unsigned long pfn = mfn_to_pfn(mfn); - if ((pfn < max_mapnr) - && !xen_feature(XENFEAT_auto_translated_physmap) - && (get_phys_to_machine(pfn) != mfn)) - return max_mapnr; /* force !pfn_valid() */ - /* XXX fixme; not true with sparsemem */ + if (get_phys_to_machine(pfn) != mfn) + return -1; /* force !pfn_valid() */ return pfn; } diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1162,11 +1162,19 @@ #ifdef CONFIG_X86_LOCAL_APIC case FIX_APIC_BASE: /* maps dummy local APIC */ #endif + /* All local page mappings */ pte = pfn_pte(phys, prot); break; + case FIX_PARAVIRT_BOOTMAP: + /* This is an MFN, but it isn't an IO mapping from the + IO domain */ + pte = mfn_pte(phys, prot); + break; + default: - pte = mfn_pte(phys, prot); + /* By default, set_fixmap is used for hardware mappings */ + pte = mfn_pte(phys, __pgprot(pgprot_val(prot) | _PAGE_IOMAP)); break; } @@ -1695,7 +1703,9 @@ /* Prevent unwanted bits from being set in PTEs. 
*/ __supported_pte_mask &= ~_PAGE_GLOBAL; - if (!xen_initial_domain()) + if (xen_initial_domain()) + __supported_pte_mask |= _PAGE_IOMAP; + else __supported_pte_mask &= ~(_PAGE_PWT | _PAGE_PCD); /* Don't do the full vcpu_info placement stuff until we have a diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -302,6 +302,28 @@ return PagePinned(page); } +static bool xen_iomap_pte(pte_t pte) +{ + return xen_initial_domain() && (pte_flags(pte) & _PAGE_IOMAP); +} + +static void xen_set_iomap_pte(pte_t *ptep, pte_t pteval) +{ + struct multicall_space mcs; + struct mmu_update *u; + + mcs = xen_mc_entry(sizeof(*u)); + u = mcs.args; + + /* ptep might be kmapped when using 32-bit HIGHPTE */ + u->ptr = arbitrary_virt_to_machine(ptep).maddr; + u->val = pte_val_ma(pteval); + + MULTI_mmu_update(mcs.mc, mcs.args, 1, NULL, DOMID_IO); + + xen_mc_issue(PARAVIRT_LAZY_MMU); +} + static void xen_extend_mmu_update(const struct mmu_update *update) { struct multicall_space mcs; @@ -382,6 +404,11 @@ if (mm == &init_mm) preempt_disable(); + if (xen_iomap_pte(pteval)) { + xen_set_iomap_pte(ptep, pteval); + goto out; + } + ADD_STATS(set_pte_at, 1); // ADD_STATS(set_pte_at_pinned, xen_page_pinned(ptep)); ADD_STATS(set_pte_at_current, mm == current->mm); @@ -454,8 +481,25 @@ return val; } +static pteval_t iomap_pte(pteval_t val) +{ + if (val & _PAGE_PRESENT) { + unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT; + pteval_t flags = val & PTE_FLAGS_MASK; + + /* We assume the pte frame number is a MFN, so + just use it as-is. 
*/ + val = ((pteval_t)pfn << PAGE_SHIFT) | flags; + } + + return val; +} + pteval_t xen_pte_val(pte_t pte) { + if (xen_initial_domain() && (pte.pte & _PAGE_IOMAP)) + return pte.pte; + return pte_mfn_to_pfn(pte.pte); } @@ -466,7 +510,11 @@ pte_t xen_make_pte(pteval_t pte) { - pte = pte_pfn_to_mfn(pte); + if (unlikely(xen_initial_domain() && (pte & _PAGE_IOMAP))) + pte = iomap_pte(pte); + else + pte = pte_pfn_to_mfn(pte); + return native_make_pte(pte); } @@ -519,6 +567,11 @@ void xen_set_pte(pte_t *ptep, pte_t pte) { + if (xen_iomap_pte(pte)) { + xen_set_iomap_pte(ptep, pte); + return; + } + ADD_STATS(pte_update, 1); // ADD_STATS(pte_update_pinned, xen_page_pinned(ptep)); ADD_STATS(pte_update_batched, paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU); @@ -535,6 +588,11 @@ #ifdef CONFIG_X86_PAE void xen_set_pte_atomic(pte_t *ptep, pte_t pte) { + if (xen_iomap_pte(pte)) { + xen_set_iomap_pte(ptep, pte); + return; + } + set_64bit((u64 *)ptep, native_pte_val(pte)); }
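To make the xen_make_pte() decision above concrete, here is a small userspace sketch of the same idea: a pte value flagged as an I/O mapping in the initial domain keeps its frame number unchanged (it is already a machine frame), while every other present pte has its pseudo-physical frame translated through a pfn-to-mfn table. All names here (make_pte, p2m, PAGE_IOMAP and its bit position) are illustrative assumptions rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

#define PAGE_SHIFT    12
#define PAGE_PRESENT  (1ull << 0)
#define PAGE_IOMAP    (1ull << 10)            /* assumed bit; stands in for _PAGE_IOMAP */
#define PFN_MASK      0x000ffffffffff000ull   /* frame-number bits of a pte */
#define P2M_ENTRIES   4

/* Toy pseudo-physical -> machine frame table for pfns 0..3. */
static const uint64_t p2m[P2M_ENTRIES] = { 0x100, 0x2a7, 0x033, 0x1ff };

/* Translate the frame number of a present pte from pfn to mfn. */
static pteval_t pfn_to_mfn_pte(pteval_t val)
{
    if (val & PAGE_PRESENT) {
        uint64_t pfn = (val & PFN_MASK) >> PAGE_SHIFT;
        pteval_t flags = val & ~PFN_MASK;

        if (pfn >= P2M_ENTRIES)
            return 0;   /* outside the toy table: produce a non-present pte */
        val = (p2m[pfn] << PAGE_SHIFT) | flags;
    }
    return val;
}

/* I/O mappings in the initial domain already hold a machine frame. */
static pteval_t make_pte(pteval_t val, bool is_initial_domain)
{
    if (is_initial_domain && (val & PAGE_IOMAP))
        return val;
    return pfn_to_mfn_pte(val);
}
```

In an unprivileged domain the flag is ignored and the frame is translated as usual, which mirrors the xen_initial_domain() guard in the patch.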
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 28 of 38] paravirt/xen: add pvop for page_is_ram
A guest domain may have external pages mapped into its address space, in order to share memory with other domains. These shared pages are more akin to io mappings than real RAM, and should not pass the page_is_ram test. Add a paravirt op for this so that a hypervisor backend can validate whether a page should be considered ram or not. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/include/asm/page.h | 9 ++++++++- arch/x86/include/asm/paravirt.h | 7 +++++++ arch/x86/kernel/paravirt.c | 1 + arch/x86/mm/ioremap.c | 2 +- arch/x86/xen/enlighten.c | 11 +++++++++++ 5 files changed, 28 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h --- a/arch/x86/include/asm/page.h +++ b/arch/x86/include/asm/page.h @@ -56,7 +56,14 @@ typedef struct { pgdval_t pgd; } pgd_t; typedef struct { pgprotval_t pgprot; } pgprot_t; -extern int page_is_ram(unsigned long pagenr); +extern int native_page_is_ram(unsigned long pagenr); +#ifndef CONFIG_PARAVIRT +static inline int page_is_ram(unsigned long pagenr) +{ + return native_page_is_ram(pagenr); +} +#endif + extern int pagerange_is_ram(unsigned long start, unsigned long end); extern int devmem_is_allowed(unsigned long pagenr); extern void map_devmem(unsigned long pfn, unsigned long size, diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -321,6 +321,8 @@ an mfn. We can tell which is which from the index. 
*/ void (*set_fixmap)(unsigned /* enum fixed_addresses */ idx, unsigned long phys, pgprot_t flags); + + int (*page_is_ram)(unsigned long pfn); }; struct raw_spinlock; @@ -1386,6 +1388,11 @@ pv_mmu_ops.set_fixmap(idx, phys, flags); } +static inline int page_is_ram(unsigned long pfn) +{ + return PVOP_CALL1(int, pv_mmu_ops.page_is_ram, pfn); +} + void _paravirt_nop(void); #define paravirt_nop ((void *)_paravirt_nop) diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -451,6 +451,7 @@ }, .set_fixmap = native_set_fixmap, + .page_is_ram = native_page_is_ram, }; EXPORT_SYMBOL_GPL(pv_time_ops); diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -97,7 +97,7 @@ #endif -int page_is_ram(unsigned long pagenr) +int native_page_is_ram(unsigned long pagenr) { resource_size_t addr, end; int i; diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1190,6 +1190,16 @@ #endif } +static int xen_page_is_ram(unsigned long pfn) +{ + /* Granted pages are not RAM. They will not have a proper + identity pfn<->mfn translation. */ + if (mfn_to_local_pfn(pfn_to_mfn(pfn)) != pfn) + return 0; + + return native_page_is_ram(pfn); +} + static const struct pv_info xen_info __initdata = { .paravirt_enabled = 1, .shared_kernel_pmd = 0, @@ -1364,6 +1374,7 @@ }, .set_fixmap = xen_set_fixmap, + .page_is_ram = xen_page_is_ram, }; static void xen_reboot(int reason)
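The round-trip test in xen_page_is_ram() above can be illustrated with two toy translation tables. In this sketch, which assumes nothing about the real p2m/m2p layout, a page counts as local RAM only if translating pfn to mfn and back returns the original pfn; a granted page fails because the machine frame is recorded as belonging to a different pfn. The tables and page_is_local_ram() are hypothetical, and the final call to native_page_is_ram() in the patch is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PFNS  4
#define NR_MFNS  8
#define INVALID  (~0ul)

/* pfn -> mfn as seen by the guest. */
static const unsigned long p2m[NR_PFNS] = { 5, 2, 7, 3 };

/* mfn -> pfn as recorded globally; mfn 3 belongs to some other domain's pfn 9. */
static const unsigned long m2p[NR_MFNS] = {
    INVALID, INVALID, 1, 9, INVALID, 0, INVALID, 2
};

/* A page is local RAM only if pfn -> mfn -> pfn is the identity. */
static bool page_is_local_ram(unsigned long pfn)
{
    unsigned long mfn;

    if (pfn >= NR_PFNS)
        return false;

    mfn = p2m[pfn];
    if (mfn >= NR_MFNS)
        return false;

    return m2p[mfn] == pfn;
}
```

Here pfns 0, 1 and 2 round-trip cleanly, while pfn 3 maps a foreign frame and is rejected.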
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 29 of 38] xen: create dummy ioapic mapping
We don't allow direct access to the IO apic, so make sure that any request to map it just "maps" non-present pages. We should see any attempts at direct access explode nicely. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/xen/enlighten.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1166,6 +1166,16 @@ pte = pfn_pte(phys, prot); break; +#ifdef CONFIG_X86_IO_APIC + case FIX_IO_APIC_BASE_0 ... FIX_IO_APIC_BASE_END: + /* + * We just don't map the IO APIC - all access is via + * hypercalls. Keep the address in the pte for reference. + */ + pte = pfn_pte(phys, PAGE_NONE); + break; +#endif + case FIX_PARAVIRT_BOOTMAP: /* This is an MFN, but it isn't an IO mapping from the IO domain */
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 30 of 38] xen: implement io_apic_ops
Writes to the IO APIC are paravirtualized via hypercalls, so implement the appropriate operations. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/xen/Makefile | 3 +- arch/x86/xen/apic.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++ arch/x86/xen/enlighten.c | 2 + arch/x86/xen/xen-ops.h | 2 + 4 files changed, 72 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -9,4 +9,5 @@ time.o xen-asm_$(BITS).o grant-table.o suspend.o obj-$(CONFIG_SMP) += smp.o spinlock.o -obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o \ No newline at end of file +obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o +obj-$(CONFIG_XEN_DOM0) += apic.o diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c new file mode 100644 --- /dev/null +++ b/arch/x86/xen/apic.c @@ -0,0 +1,66 @@ +#include <linux/kernel.h> +#include <linux/threads.h> +#include <linux/bitmap.h> + +#include <asm/io_apic.h> +#include <asm/acpi.h> + +#include <asm/xen/hypervisor.h> +#include <asm/xen/hypercall.h> + +#include <xen/interface/xen.h> +#include <xen/interface/physdev.h> + +static void __init xen_io_apic_init(void) +{ + printk("xen apic init\n"); + dump_stack(); +} + +static unsigned int xen_io_apic_read(unsigned apic, unsigned reg) +{ + struct physdev_apic apic_op; + int ret; + + apic_op.apic_physbase = mp_ioapics[apic].mp_apicaddr; + apic_op.reg = reg; + ret = HYPERVISOR_physdev_op(PHYSDEVOP_apic_read, &apic_op); + if (ret) + BUG(); + return apic_op.value; +} + + +static void xen_io_apic_write(unsigned int apic, unsigned int reg, unsigned int value) +{ + struct physdev_apic apic_op; + + apic_op.apic_physbase = mp_ioapics[apic].mp_apicaddr; + apic_op.reg = reg; + apic_op.value = value; + if (HYPERVISOR_physdev_op(PHYSDEVOP_apic_write, &apic_op)) + BUG(); +} + +static struct io_apic_ops __initdata xen_ioapic_ops = { + .init = xen_io_apic_init, + .read = xen_io_apic_read, + .write = xen_io_apic_write, + 
.modify = xen_io_apic_write, +}; + +void xen_init_apic(void) +{ + if (!xen_initial_domain()) + return; + + set_io_apic_ops(&xen_ioapic_ops); + +#ifdef CONFIG_ACPI + /* + * Pretend ACPI found our lapic even though we've disabled it, + * to prevent MP tables from setting up lapics. + */ + acpi_lapic = 1; +#endif +} diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1750,6 +1750,8 @@ set_iopl.iopl = 1; if (HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl) == -1) BUG(); + + xen_init_apic(); } /* set the limit of our address space */ diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -64,6 +64,8 @@ #endif +void xen_init_apic(void); + /* Declare an asm function, along with symbols needed to make it inlineable */ #define DECL_ASM(ret, name, ...) \
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 31 of 38] xen: set irq_chip disable
By default, the irq_chip.disable operation is a no-op. Explicitly set it to disable the Xen event channel. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- drivers/xen/events.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/xen/events.c b/drivers/xen/events.c --- a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -806,8 +806,11 @@ static struct irq_chip xen_dynamic_chip __read_mostly = { .name = "xen-dyn", + + .disable = disable_dynirq, .mask = disable_dynirq, .unmask = enable_dynirq, + .ack = ack_dynirq, .set_affinity = set_affinity_irq, .retrigger = retrigger_dynirq,
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 32 of 38] xen: use our own eventchannel->irq path
Rather than overloading vectors for event channels, take full
responsibility for mapping an event channel to irq directly. With
this patch Xen has its own irq allocator.

When the kernel gets an event channel upcall, it maps the event
channel number to an irq and injects it into the normal interrupt
path.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/xen/events.h |    6 ------
 arch/x86/xen/irq.c                |   17 +----------------
 drivers/xen/events.c              |   22 ++++++++++++++++++++--
 3 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/xen/events.h b/arch/x86/include/asm/xen/events.h
--- a/arch/x86/include/asm/xen/events.h
+++ b/arch/x86/include/asm/xen/events.h
@@ -15,10 +15,4 @@
 	return raw_irqs_disabled_flags(regs->flags);
 }
 
-static inline void xen_do_IRQ(int irq, struct pt_regs *regs)
-{
-	regs->orig_ax = ~irq;
-	do_IRQ(regs);
-}
-
 #endif /* _ASM_X86_XEN_EVENTS_H */
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -19,21 +19,6 @@
 	(void)HYPERVISOR_xen_version(0, NULL);
 }
 
-static void __init __xen_init_IRQ(void)
-{
-	int i;
-
-	/* Create identity vector->irq map */
-	for(i = 0; i < NR_VECTORS; i++) {
-		int cpu;
-
-		for_each_possible_cpu(cpu)
-			per_cpu(vector_irq, cpu)[i] = i;
-	}
-
-	xen_init_IRQ();
-}
-
 static unsigned long xen_save_fl(void)
 {
 	struct vcpu_info *vcpu;
@@ -123,7 +108,7 @@
 }
 
 static const struct pv_irq_ops xen_irq_ops __initdata = {
-	.init_IRQ = __xen_init_IRQ,
+	.init_IRQ = xen_init_IRQ,
 	.save_fl = xen_save_fl,
 	.restore_fl = xen_restore_fl,
 	.irq_disable = xen_irq_disable,
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -29,6 +29,7 @@
 
 #include <asm/ptrace.h>
 #include <asm/irq.h>
+#include <asm/idle.h>
 #include <asm/sync_bitops.h>
 #include <asm/xen/hypercall.h>
 #include <asm/xen/hypervisor.h>
@@ -503,6 +504,24 @@
 }
 
 
+static void xen_do_irq(unsigned irq, struct pt_regs *regs)
+{
+	struct pt_regs *old_regs = set_irq_regs(regs);
+
+	if (WARN_ON(irq == -1))
+		return;
+
+	exit_idle();
+	irq_enter();
+
+	//printk("cpu %d handling irq %d\n", smp_processor_id(), info->irq);
+	handle_irq(irq, regs);
+
+	irq_exit();
+
+	set_irq_regs(old_regs);
+}
+
 /*
  * Search the CPUs pending events bitmasks. For each one found, map
  * the event number to an irq, and feed it into do_IRQ() for
@@ -543,8 +562,7 @@
 			int port = (word_idx * BITS_PER_LONG) + bit_idx;
 			int irq = evtchn_to_irq[port];
 
-			if (irq != -1)
-				xen_do_IRQ(irq, regs);
+			xen_do_irq(irq, regs);
 		}
 	}
 
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
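Editorial sketch: the dispatch path described in the patch above (scan the pending bitmap, map each event channel port to an irq, hand it to the handler) can be modelled in user space roughly as follows. This is a simplified illustration, not the kernel code; every `demo_*` name and `DEMO_*` constant is hypothetical, and per-cpu masks, locking and nesting are deliberately left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_NR_EVTCHN     128
#define DEMO_NR_IRQS       32
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NR_WORDS      (DEMO_NR_EVTCHN / DEMO_BITS_PER_LONG)

/* port -> irq table; -1 means "no irq bound to this port" */
static int demo_evtchn_to_irq[DEMO_NR_EVTCHN];
/* one pending bit per event channel port */
static unsigned long demo_pending[DEMO_NR_WORDS];
/* stand-in for the generic irq layer: counts deliveries per irq */
static unsigned demo_irq_count[DEMO_NR_IRQS];

static void demo_init(void)
{
	for (size_t i = 0; i < DEMO_NR_EVTCHN; i++)
		demo_evtchn_to_irq[i] = -1;
	for (size_t i = 0; i < DEMO_NR_WORDS; i++)
		demo_pending[i] = 0;
	for (size_t i = 0; i < DEMO_NR_IRQS; i++)
		demo_irq_count[i] = 0;
}

/* Record that events on 'port' should be delivered as 'irq'. */
static void demo_bind(unsigned port, int irq)
{
	demo_evtchn_to_irq[port] = irq;
}

/* Mark an event channel as pending, as the hypervisor would. */
static void demo_raise(unsigned port)
{
	demo_pending[port / DEMO_BITS_PER_LONG] |=
		1UL << (port % DEMO_BITS_PER_LONG);
}

static void demo_handle_irq(int irq)
{
	demo_irq_count[irq]++;
}

/* Scan all pending bits; returns how many events reached an irq. */
static unsigned demo_do_upcall(void)
{
	unsigned handled = 0;

	for (size_t word = 0; word < DEMO_NR_WORDS; word++) {
		for (size_t bit = 0; bit < DEMO_BITS_PER_LONG; bit++) {
			unsigned long mask = 1UL << bit;

			if (!(demo_pending[word] & mask))
				continue;
			demo_pending[word] &= ~mask;

			size_t port = word * DEMO_BITS_PER_LONG + bit;
			int irq = demo_evtchn_to_irq[port];

			/* unbound ports are dropped, not delivered */
			if (irq != -1) {
				demo_handle_irq(irq);
				handled++;
			}
		}
	}
	return handled;
}
```

The point of the design is that the irq number is looked up from the port directly, so no vector table has to be kept in step with the event channel bindings.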
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 33 of 38] xen: pack all irq-related info together
Put all irq info into one struct. Also, use a union to keep event
channel type-specific information, rather than overloading the index
field.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/events.c |  184 ++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 135 insertions(+), 49 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -51,18 +51,8 @@
 /* IRQ <-> IPI mapping */
 static DEFINE_PER_CPU(int, ipi_to_irq[XEN_NR_IPIS]) = {[0 ... XEN_NR_IPIS-1] = -1};
 
-/* Packed IRQ information: binding type, sub-type index, and event channel. */
-struct packed_irq
-{
-	unsigned short evtchn;
-	unsigned char index;
-	unsigned char type;
-};
-
-static struct packed_irq irq_info[NR_IRQS];
-
-/* Binding types. */
-enum {
+/* Interrupt types. */
+enum xen_irq_type {
 	IRQT_UNBOUND,
 	IRQT_PIRQ,
 	IRQT_VIRQ,
@@ -70,14 +60,39 @@
 	IRQT_EVTCHN
 };
 
-/* Convenient shorthand for packed representation of an unbound IRQ. */
-#define IRQ_UNBOUND	mk_irq_info(IRQT_UNBOUND, 0, 0)
+/*
+ * Packed IRQ information:
+ * type - enum xen_irq_type
+ * event channel - irq->event channel mapping
+ * cpu - cpu this event channel is bound to
+ * index - type-specific information:
+ *    PIRQ - vector, with MSB being "needs EOI"
+ *    VIRQ - virq number
+ *    IPI - IPI vector
+ *    EVTCHN -
+ */
+struct irq_info
+{
+	enum xen_irq_type type;	/* type */
+	unsigned short evtchn;	/* event channel */
+	unsigned short cpu;	/* cpu bound */
+
+	union {
+		unsigned short virq;
+		enum ipi_vector ipi;
+		struct {
+			unsigned short gsi;
+			unsigned short vector;
+		} pirq;
+	} u;
+};
+
+static struct irq_info irq_info[NR_IRQS];
 
 static int evtchn_to_irq[NR_EVENT_CHANNELS] = {
 	[0 ... NR_EVENT_CHANNELS-1] = -1
 };
 static unsigned long cpu_evtchn_mask[NR_CPUS][NR_EVENT_CHANNELS/BITS_PER_LONG];
-static u8 cpu_evtchn[NR_EVENT_CHANNELS];
 
 /* Reference counts for bindings to IRQs. */
 static int irq_bindcount[NR_IRQS];
@@ -88,27 +103,107 @@
 static struct irq_chip xen_dynamic_chip;
 
 /* Constructor for packed IRQ information. */
-static inline struct packed_irq mk_irq_info(u32 type, u32 index, u32 evtchn)
+static struct irq_info mk_unbound_info(void)
 {
-	return (struct packed_irq) { evtchn, index, type };
+	return (struct irq_info) { .type = IRQT_UNBOUND };
+}
+
+static struct irq_info mk_evtchn_info(unsigned short evtchn)
+{
+	return (struct irq_info) { .type = IRQT_EVTCHN, .evtchn = evtchn };
+}
+
+static struct irq_info mk_ipi_info(unsigned short evtchn, enum ipi_vector ipi)
+{
+	return (struct irq_info) { .type = IRQT_IPI, .evtchn = evtchn,
+			.u.ipi = ipi };
+}
+
+static struct irq_info mk_virq_info(unsigned short evtchn, unsigned short virq)
+{
+	return (struct irq_info) { .type = IRQT_VIRQ, .evtchn = evtchn,
+			.u.virq = virq };
+}
+
+static struct irq_info mk_pirq_info(unsigned short evtchn,
+				    unsigned short gsi, unsigned short vector)
+{
+	return (struct irq_info) { .type = IRQT_PIRQ, .evtchn = evtchn,
+			.u.pirq = { .gsi = gsi, .vector = vector } };
 }
 
 /*
  * Accessors for packed IRQ information.
  */
-static inline unsigned int evtchn_from_irq(int irq)
+static struct irq_info *info_for_irq(unsigned irq)
 {
-	return irq_info[irq].evtchn;
+	return &irq_info[irq];
 }
 
-static inline unsigned int index_from_irq(int irq)
+static unsigned int evtchn_from_irq(unsigned irq)
 {
-	return irq_info[irq].index;
+	return info_for_irq(irq)->evtchn;
 }
 
-static inline unsigned int type_from_irq(int irq)
+static enum ipi_vector ipi_from_irq(unsigned irq)
 {
-	return irq_info[irq].type;
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info == NULL);
+	BUG_ON(info->type != IRQT_IPI);
+
+	return info->u.ipi;
+}
+
+static unsigned virq_from_irq(unsigned irq)
+{
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info == NULL);
+	BUG_ON(info->type != IRQT_VIRQ);
+
+	return info->u.virq;
+}
+
+static unsigned gsi_from_irq(unsigned irq)
+{
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info == NULL);
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	return info->u.pirq.gsi;
+}
+
+static unsigned vector_from_irq(unsigned irq)
+{
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info == NULL);
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	return info->u.pirq.vector;
+}
+
+static enum xen_irq_type type_from_irq(unsigned irq)
+{
+	return info_for_irq(irq)->type;
+}
+
+static unsigned cpu_from_irq(unsigned irq)
+{
+	return info_for_irq(irq)->cpu;
+}
+
+static unsigned int cpu_from_evtchn(unsigned int evtchn)
+{
+	int irq = evtchn_to_irq[evtchn];
+	unsigned ret = 0;
+
+	if (irq != -1)
+		ret = cpu_from_irq(irq);
+
+	return ret;
 }
 
 static inline unsigned long active_evtchns(unsigned int cpu,
@@ -129,10 +224,10 @@
 	irq_to_desc(irq)->affinity = cpumask_of_cpu(cpu);
 #endif
 
-	__clear_bit(chn, cpu_evtchn_mask[cpu_evtchn[chn]]);
+	__clear_bit(chn, cpu_evtchn_mask[cpu_from_irq(irq)]);
 	__set_bit(chn, cpu_evtchn_mask[cpu]);
 
-	cpu_evtchn[chn] = cpu;
+	irq_info[irq].cpu = cpu;
 }
 
 static void init_evtchn_cpu_bindings(void)
@@ -146,15 +241,9 @@
 		desc->affinity = cpumask_of_cpu(0);
 #endif
 
-	memset(cpu_evtchn, 0, sizeof(cpu_evtchn));
 	memset(cpu_evtchn_mask[0], ~0, sizeof(cpu_evtchn_mask[0]));
 }
 
-static inline unsigned int cpu_from_evtchn(unsigned int evtchn)
-{
-	return cpu_evtchn[evtchn];
-}
-
 static inline void clear_evtchn(int port)
 {
 	struct shared_info *s = HYPERVISOR_shared_info;
@@ -239,6 +328,8 @@
 	if (irq == nr_irqs)
 		panic("No available IRQ to bind to: increase nr_irqs!\n");
 
+	dynamic_irq_init(irq);
+
 	return irq;
 }
 
@@ -253,12 +344,11 @@
 	if (irq == -1) {
 		irq = find_unbound_irq();
 
-		dynamic_irq_init(irq);
 		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
 					      handle_level_irq, "event");
 
 		evtchn_to_irq[evtchn] = irq;
-		irq_info[irq] = mk_irq_info(IRQT_EVTCHN, 0, evtchn);
+		irq_info[irq] = mk_evtchn_info(evtchn);
 	}
 
 	irq_bindcount[irq]++;
@@ -282,7 +372,6 @@
 		if (irq < 0)
 			goto out;
 
-		dynamic_irq_init(irq);
 		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
 					      handle_level_irq, "ipi");
 
@@ -293,7 +382,7 @@
 		evtchn = bind_ipi.port;
 
 		evtchn_to_irq[evtchn] = irq;
-		irq_info[irq] = mk_irq_info(IRQT_IPI, ipi, evtchn);
+		irq_info[irq] = mk_ipi_info(evtchn, ipi);
 
 		per_cpu(ipi_to_irq, cpu)[ipi] = irq;
 
@@ -327,12 +416,11 @@
 
 		irq = find_unbound_irq();
 
-		dynamic_irq_init(irq);
 		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
 					      handle_level_irq, "virq");
 
 		evtchn_to_irq[evtchn] = irq;
-		irq_info[irq] = mk_irq_info(IRQT_VIRQ, virq, evtchn);
+		irq_info[irq] = mk_virq_info(evtchn, virq);
 
 		per_cpu(virq_to_irq, cpu)[virq] = irq;
 
@@ -361,11 +449,11 @@
 		switch (type_from_irq(irq)) {
 		case IRQT_VIRQ:
 			per_cpu(virq_to_irq, cpu_from_evtchn(evtchn))
-				[index_from_irq(irq)] = -1;
+				[virq_from_irq(irq)] = -1;
 			break;
 		case IRQT_IPI:
 			per_cpu(ipi_to_irq, cpu_from_evtchn(evtchn))
-				[index_from_irq(irq)] = -1;
+				[ipi_from_irq(irq)] = -1;
 			break;
 		default:
 			break;
@@ -375,7 +463,7 @@
 		bind_evtchn_to_cpu(evtchn, 0);
 
 		evtchn_to_irq[evtchn] = -1;
-		irq_info[irq] = IRQ_UNBOUND;
+		irq_info[irq] = mk_unbound_info();
 
 		dynamic_irq_cleanup(irq);
 	}
@@ -493,8 +581,8 @@
 	for(i = 0; i < NR_EVENT_CHANNELS; i++) {
 		if (sync_test_bit(i, sh->evtchn_pending)) {
 			printk(" %d: event %d -> irq %d\n",
-			       cpu_evtchn[i], i,
-			       evtchn_to_irq[i]);
+			       cpu_from_evtchn(i), i,
+			       evtchn_to_irq[i]);
 		}
 	}
 
@@ -592,7 +680,7 @@
 	BUG_ON(irq_bindcount[irq] == 0);
 
 	evtchn_to_irq[evtchn] = irq;
-	irq_info[irq] = mk_irq_info(IRQT_EVTCHN, 0, evtchn);
+	irq_info[irq] = mk_evtchn_info(evtchn);
 
 	spin_unlock(&irq_mapping_update_lock);
 
@@ -702,8 +790,7 @@
 		if ((irq = per_cpu(virq_to_irq, cpu)[virq]) == -1)
 			continue;
 
-		BUG_ON(irq_info[irq].type != IRQT_VIRQ);
-		BUG_ON(irq_info[irq].index != virq);
+		BUG_ON(virq_from_irq(irq) != virq);
 
 		/* Get a new binding from Xen. */
 		bind_virq.virq = virq;
@@ -715,7 +802,7 @@
 
 		/* Record the new mapping. */
 		evtchn_to_irq[evtchn] = irq;
-		irq_info[irq] = mk_irq_info(IRQT_VIRQ, virq, evtchn);
+		irq_info[irq] = mk_virq_info(evtchn, virq);
 		bind_evtchn_to_cpu(evtchn, cpu);
 
 		/* Ready for use. */
@@ -732,8 +819,7 @@
 		if ((irq = per_cpu(ipi_to_irq, cpu)[ipi]) == -1)
 			continue;
 
-		BUG_ON(irq_info[irq].type != IRQT_IPI);
-		BUG_ON(irq_info[irq].index != ipi);
+		BUG_ON(ipi_from_irq(irq) != ipi);
 
 		/* Get a new binding from Xen. */
 		bind_ipi.vcpu = cpu;
@@ -744,7 +830,7 @@
 
 		/* Record the new mapping. */
 		evtchn_to_irq[evtchn] = irq;
-		irq_info[irq] = mk_irq_info(IRQT_IPI, ipi, evtchn);
+		irq_info[irq] = mk_ipi_info(evtchn, ipi);
 		bind_evtchn_to_cpu(evtchn, cpu);
 
 		/* Ready for use. */
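Editorial sketch: the "type tag plus union, with type-checked accessors" layout introduced by the patch above can be illustrated in isolation like this. It is a stand-alone user-space approximation, not the kernel structure; the `demo_*` names are hypothetical, and where the kernel accessors use BUG_ON() on a type mismatch, this version returns -1 so the behaviour can be observed.

```c
#include <assert.h>

enum demo_irq_type {
	DEMO_UNBOUND = 0,
	DEMO_PIRQ,
	DEMO_VIRQ,
	DEMO_IPI,
	DEMO_EVTCHN
};

struct demo_irq_info {
	enum demo_irq_type type;	/* selects the valid union member */
	unsigned short evtchn;		/* event channel bound to this irq */
	unsigned short cpu;		/* cpu the event channel is bound to */

	union {
		unsigned short virq;
		unsigned short ipi;
		struct {
			unsigned short gsi;
			unsigned short vector;
		} pirq;
	} u;
};

/* One constructor per type, so the tag and the union member always agree. */
static struct demo_irq_info demo_mk_virq_info(unsigned short evtchn,
					      unsigned short virq)
{
	return (struct demo_irq_info) { .type = DEMO_VIRQ, .evtchn = evtchn,
			.u.virq = virq };
}

static struct demo_irq_info demo_mk_pirq_info(unsigned short evtchn,
					      unsigned short gsi,
					      unsigned short vector)
{
	return (struct demo_irq_info) { .type = DEMO_PIRQ, .evtchn = evtchn,
			.u.pirq = { .gsi = gsi, .vector = vector } };
}

/* Accessors check the tag before touching the union; -1 on mismatch. */
static int demo_virq_from_info(const struct demo_irq_info *info)
{
	if (info->type != DEMO_VIRQ)
		return -1;
	return info->u.virq;
}

static int demo_gsi_from_info(const struct demo_irq_info *info)
{
	if (info->type != DEMO_PIRQ)
		return -1;
	return info->u.pirq.gsi;
}
```

Compared with a single overloaded `index` field, a reader of the wrong kind is caught at the accessor instead of silently interpreting a virq number as an IPI vector.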
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 34 of 38] xen: remove irq bindcount
There should be no need for us to maintain our own bind count for
irqs, since the surrounding irq system should keep track of shared
irqs for us.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/events.c |   26 +++++++-------------------
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -53,7 +53,7 @@
 
 /* Interrupt types. */
 enum xen_irq_type {
-	IRQT_UNBOUND,
+	IRQT_UNBOUND = 0,
 	IRQT_PIRQ,
 	IRQT_VIRQ,
 	IRQT_IPI,
@@ -94,9 +94,6 @@
 };
 static unsigned long cpu_evtchn_mask[NR_CPUS][NR_EVENT_CHANNELS/BITS_PER_LONG];
 
-/* Reference counts for bindings to IRQs. */
-static int irq_bindcount[NR_IRQS];
-
 /* Xen will never allocate port zero for any purpose. */
 #define VALID_EVTCHN(chn)	((chn) != 0)
 
@@ -320,9 +317,8 @@
 {
 	int irq;
 
-	/* Only allocate from dynirq range */
 	for_each_irq_nr(irq)
-		if (irq_bindcount[irq] == 0)
+		if (irq_info[irq].type == IRQT_UNBOUND)
 			break;
 
 	if (irq == nr_irqs)
@@ -351,8 +347,6 @@
 		irq_info[irq] = mk_evtchn_info(evtchn);
 	}
 
-	irq_bindcount[irq]++;
-
 	spin_unlock(&irq_mapping_update_lock);
 
 	return irq;
@@ -389,8 +383,6 @@
 		bind_evtchn_to_cpu(evtchn, cpu);
 	}
 
-	irq_bindcount[irq]++;
-
  out:
 	spin_unlock(&irq_mapping_update_lock);
 	return irq;
@@ -427,8 +419,6 @@
 		bind_evtchn_to_cpu(evtchn, cpu);
 	}
 
-	irq_bindcount[irq]++;
-
 	spin_unlock(&irq_mapping_update_lock);
 
 	return irq;
@@ -441,7 +431,7 @@
 
 	spin_lock(&irq_mapping_update_lock);
 
-	if ((--irq_bindcount[irq] == 0) && VALID_EVTCHN(evtchn)) {
+	if (VALID_EVTCHN(evtchn)) {
 		close.port = evtchn;
 		if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
 			BUG();
@@ -667,6 +657,8 @@
 /* Rebind a new event channel to an existing irq. */
 void rebind_evtchn_irq(int evtchn, int irq)
 {
+	struct irq_info *info = info_for_irq(irq);
+
 	/* Make sure the irq is masked, since the new event channel
 	   will also be masked. */
 	disable_irq(irq);
@@ -676,8 +668,8 @@
 	/* After resume the irq<->evtchn mappings are all cleared out */
 	BUG_ON(evtchn_to_irq[evtchn] != -1);
 	/* Expect irq to have been bound before,
-	   so the bindcount should be non-0 */
-	BUG_ON(irq_bindcount[irq] == 0);
+	   so there should be a proper type */
+	BUG_ON(info->type == IRQT_UNBOUND);
 
 	evtchn_to_irq[evtchn] = irq;
 	irq_info[irq] = mk_evtchn_info(evtchn);
@@ -930,9 +922,5 @@
 	for (i = 0; i < NR_EVENT_CHANNELS; i++)
 		mask_evtchn(i);
 
-	/* Dynamic IRQ space is currently unbound. Zero the refcnts. */
-	for_each_irq_nr(i)
-		irq_bindcount[i] = 0;
-
 	irq_ctx_init(smp_processor_id());
 }
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 35 of 38] xen: implement pirq type event channels
A privileged PV Xen domain can get direct access to hardware. In
order for this to be useful, it must be able to get hardware
interrupts.

Being a PV Xen domain, all interrupts are delivered as event channels.
PIRQ event channels are bound to a pirq number and an interrupt
vector. When an IO APIC raises a hardware interrupt on that vector, it
is delivered as an event channel, which we can deliver to the
appropriate device driver(s).

This patch simply implements the infrastructure for dealing with pirq
event channels.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/events.c |  250 +++++++++++++++++++++++++++++++++++++++++++++++++-
 include/xen/events.h |   11 ++
 2 files changed, 258 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -16,7 +16,7 @@
  * (typically dom0).
  * 2. VIRQs, typically used for timers. These are per-cpu events.
  * 3. IPIs.
- * 4. Hardware interrupts. Not supported at present.
+ * 4. PIRQs - Hardware interrupts.
  *
  * Jeremy Fitzhardinge <jeremy@xensource.com>, XenSource Inc, 2007
  */
@@ -39,6 +39,9 @@
 #include <xen/interface/xen.h>
 #include <xen/interface/event_channel.h>
 
+/* Leave low irqs free for identity mapping */
+#define LEGACY_IRQS	16
+
 /*
  * This lock protects updates to the following mapping and reference-count
  * arrays. The lock does not need to be acquired to read the mapping tables.
@@ -82,10 +85,12 @@
 		enum ipi_vector ipi;
 		struct {
 			unsigned short gsi;
-			unsigned short vector;
+			unsigned char vector;
+			unsigned char flags;
 		} pirq;
 	} u;
 };
+#define PIRQ_NEEDS_EOI	(1 << 0)
 
 static struct irq_info irq_info[NR_IRQS];
 
@@ -98,6 +103,7 @@
 #define VALID_EVTCHN(chn)	((chn) != 0)
 
 static struct irq_chip xen_dynamic_chip;
+static struct irq_chip xen_pirq_chip;
 
 /* Constructor for packed IRQ information. */
 static struct irq_info mk_unbound_info(void)
@@ -203,6 +209,15 @@
 	return ret;
 }
 
+static bool pirq_needs_eoi(unsigned irq)
+{
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	return info->u.pirq.flags & PIRQ_NEEDS_EOI;
+}
+
 static inline unsigned long active_evtchns(unsigned int cpu,
 					   struct shared_info *sh,
 					   unsigned int idx)
@@ -317,9 +332,12 @@
 {
 	int irq;
 
-	for_each_irq_nr(irq)
+	for_each_irq_nr(irq) {
+		if (irq < LEGACY_IRQS)
+			continue;
 		if (irq_info[irq].type == IRQT_UNBOUND)
 			break;
+	}
 
 	if (irq == nr_irqs)
 		panic("No available IRQ to bind to: increase nr_irqs!\n");
@@ -329,6 +347,212 @@
 	return irq;
 }
 
+static bool identity_mapped_irq(unsigned irq)
+{
+	/* only identity map legacy irqs */
+	return irq < LEGACY_IRQS;
+}
+
+static void pirq_unmask_notify(int irq)
+{
+	struct physdev_eoi eoi = { .irq = irq };
+
+	if (unlikely(pirq_needs_eoi(irq))) {
+		int rc = HYPERVISOR_physdev_op(PHYSDEVOP_eoi, &eoi);
+		WARN_ON(rc);
+	}
+}
+
+static void pirq_query_unmask(int irq)
+{
+	struct physdev_irq_status_query irq_status;
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	irq_status.irq = irq;
+	if (HYPERVISOR_physdev_op(PHYSDEVOP_irq_status_query, &irq_status))
+		irq_status.flags = 0;
+
+	info->u.pirq.flags &= ~PIRQ_NEEDS_EOI;
+	if (irq_status.flags & XENIRQSTAT_needs_eoi)
+		info->u.pirq.flags |= PIRQ_NEEDS_EOI;
+}
+
+static bool probing_irq(int irq)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+
+	return desc && desc->action == NULL;
+}
+
+static unsigned int startup_pirq(unsigned int irq)
+{
+	struct evtchn_bind_pirq bind_pirq;
+	struct irq_info *info = info_for_irq(irq);
+	int evtchn = evtchn_from_irq(irq);
+
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	if (VALID_EVTCHN(evtchn))
+		goto out;
+
+	bind_pirq.pirq = irq;
+	/* NB. We are happy to share unless we are probing. */
+	bind_pirq.flags = probing_irq(irq) ? 0 : BIND_PIRQ__WILL_SHARE;
+	if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_pirq, &bind_pirq) != 0) {
+		if (!probing_irq(irq))
+			printk(KERN_INFO "Failed to obtain physical IRQ %d\n",
+			       irq);
+		return 0;
+	}
+	evtchn = bind_pirq.port;
+
+	pirq_query_unmask(irq);
+
+	evtchn_to_irq[evtchn] = irq;
+	bind_evtchn_to_cpu(evtchn, 0);
+	info->evtchn = evtchn;
+
+ out:
+	unmask_evtchn(evtchn);
+	pirq_unmask_notify(irq);
+
+	return 0;
+}
+
+static void shutdown_pirq(unsigned int irq)
+{
+	struct evtchn_close close;
+	struct irq_info *info = info_for_irq(irq);
+	int evtchn = evtchn_from_irq(irq);
+
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	if (!VALID_EVTCHN(evtchn))
+		return;
+
+	mask_evtchn(evtchn);
+
+	close.port = evtchn;
+	if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
+		BUG();
+
+	bind_evtchn_to_cpu(evtchn, 0);
+	evtchn_to_irq[evtchn] = -1;
+	info->evtchn = 0;
+}
+
+static void enable_pirq(unsigned int irq)
+{
+	startup_pirq(irq);
+}
+
+static void disable_pirq(unsigned int irq)
+{
+}
+
+static void ack_pirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	move_native_irq(irq);
+
+	if (VALID_EVTCHN(evtchn)) {
+		mask_evtchn(evtchn);
+		clear_evtchn(evtchn);
+	}
+}
+
+static void end_pirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+	struct irq_desc *desc = irq_to_desc(irq);
+
+	if (WARN_ON(!desc))
+		return;
+
+	if ((desc->status & (IRQ_DISABLED|IRQ_PENDING)) ==
+	    (IRQ_DISABLED|IRQ_PENDING)) {
+		shutdown_pirq(irq);
+	} else if (VALID_EVTCHN(evtchn)) {
+		unmask_evtchn(evtchn);
+		pirq_unmask_notify(irq);
+	}
+}
+
+static int find_irq_by_gsi(unsigned gsi)
+{
+	int irq;
+
+	for(irq = 0; irq < NR_IRQS; irq++) {
+		struct irq_info *info = info_for_irq(irq);
+
+		if (info == NULL || info->type != IRQT_PIRQ)
+			continue;
+
+		if (gsi_from_irq(irq) == gsi)
+			return irq;
+	}
+
+	return -1;
+}
+
+/*
+ * Allocate a physical irq, along with a vector. We don't assign an
+ * event channel until the irq actually started up. Return an
+ * existing irq if we've already got one for the gsi.
+ */
+int xen_allocate_pirq(unsigned gsi)
+{
+	int irq;
+	struct physdev_irq irq_op;
+
+	spin_lock(&irq_mapping_update_lock);
+
+	irq = find_irq_by_gsi(gsi);
+	if (irq != -1) {
+		printk(KERN_INFO "xen_allocate_pirq: returning irq %d for gsi %u\n",
+		       irq, gsi);
+		goto out;	/* XXX need refcount? */
+	}
+
+	if (identity_mapped_irq(gsi)) {
+		irq = gsi;
+		dynamic_irq_init(irq);
+	} else
+		irq = find_unbound_irq();
+
+	spin_unlock(&irq_mapping_update_lock);
+
+	set_irq_chip_and_handler_name(irq, &xen_pirq_chip,
+				      handle_level_irq, "pirq");
+
+	irq_op.irq = irq;
+	if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) {
+		dynamic_irq_cleanup(irq);
+		irq = -ENOSPC;
+		goto out;
+	}
+
+	irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector);
+
+out:
+	spin_unlock(&irq_mapping_update_lock);
+
+	return irq;
+}
+
+int xen_vector_from_irq(unsigned irq)
+{
+	return vector_from_irq(irq);
+}
+
+int xen_gsi_from_irq(unsigned irq)
+{
+	return gsi_from_irq(irq);
+}
+
 int bind_evtchn_to_irq(unsigned int evtchn)
 {
 	int irq;
@@ -912,6 +1136,26 @@
 	.retrigger = retrigger_dynirq,
 };
 
+static struct irq_chip xen_pirq_chip __read_mostly = {
+	.name = "xen-pirq",
+
+	.startup = startup_pirq,
+	.shutdown = shutdown_pirq,
+
+	.enable = enable_pirq,
+	.unmask = enable_pirq,
+
+	.disable = disable_pirq,
+	.mask = disable_pirq,
+
+	.ack = ack_pirq,
+	.end = end_pirq,
+
+	.set_affinity = set_affinity_irq,
+
+	.retrigger = retrigger_dynirq,
+};
+
 void __init xen_init_IRQ(void)
 {
 	int i;
diff --git a/include/xen/events.h b/include/xen/events.h
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -55,4 +55,15 @@
    irq will be disabled so it won't deliver an interrupt. */
 void xen_poll_irq(int irq);
 
+/* Allocate an irq for a physical interrupt, given a gsi. "Legacy"
+   GSIs are identity mapped; others are dynamically allocated as
+   usual. */
+int xen_allocate_pirq(unsigned gsi);
+
+/* Return vector allocated to pirq */
+int xen_vector_from_irq(unsigned pirq);
+
+/* Return gsi allocated to pirq */
+int xen_gsi_from_irq(unsigned pirq);
+
 #endif /* _XEN_EVENTS_H */
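Editorial sketch: the allocation policy of xen_allocate_pirq() in the patch above (reuse an existing irq for a known gsi, identity-map legacy gsis, otherwise take the first free irq above the legacy range) can be approximated without any hypercalls as below. This is only a model under those assumptions; the `demo_*` names and table size are hypothetical, and vector allocation, locking and irq_chip setup are omitted.

```c
#include <assert.h>

#define DEMO_NR_IRQS      64
#define DEMO_LEGACY_IRQS  16	/* low irqs reserved for identity mapping */

struct demo_pirq {
	int bound;		/* non-zero once the irq is in use */
	unsigned gsi;		/* gsi this irq was allocated for */
};

static struct demo_pirq demo_irqs[DEMO_NR_IRQS];

/* Return the irq already allocated for 'gsi', or -1 if there is none. */
static int demo_find_irq_by_gsi(unsigned gsi)
{
	for (int irq = 0; irq < DEMO_NR_IRQS; irq++)
		if (demo_irqs[irq].bound && demo_irqs[irq].gsi == gsi)
			return irq;
	return -1;
}

/* First free irq above the legacy range, or -1 if the table is full. */
static int demo_find_unbound_irq(void)
{
	for (int irq = DEMO_LEGACY_IRQS; irq < DEMO_NR_IRQS; irq++)
		if (!demo_irqs[irq].bound)
			return irq;
	return -1;
}

static int demo_allocate_pirq(unsigned gsi)
{
	int irq = demo_find_irq_by_gsi(gsi);

	if (irq != -1)
		return irq;		/* re-registering returns the same irq */

	if (gsi < DEMO_LEGACY_IRQS)
		irq = (int)gsi;		/* legacy gsis are identity mapped */
	else
		irq = demo_find_unbound_irq();

	if (irq < 0)
		return -1;

	demo_irqs[irq].bound = 1;
	demo_irqs[irq].gsi = gsi;
	return irq;
}
```

For example, under this model gsi 9 maps to irq 9, while gsi 20 and gsi 40 land on irqs 16 and 17 in allocation order, which is why the irq number is "generally not 1:1 with the gsi".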
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 36 of 38] xen: route hardware irqs via Xen
This patch puts the hooks into place so that when the interrupt
subsystem registers an irq, it gets routed via Xen (if we're running
under Xen).

The first step is to get a gsi for a particular device+pin. We use
the normal acpi interrupt routing to do the mapping.

Normally the gsi number is used directly as the irq number. We can't
do that since we also have irqs for non-hardware event channels, and
so we must share the irq space between them. A given gsi is only
allocated a single irq, so re-registering a gsi will simply return the
same irq.

We therefore allocate an irq for a given gsi, and return that. As a
special case, we reserve the first 16 irqs for identity-mapping legacy
irqs, since there's a fair amount of code which assumes that.

Having allocated an irq, we ask Xen to allocate a vector, and then
bind that pirq/vector to an event channel. When the hardware raises
an interrupt on a vector, Xen signals us on the corresponding event
channel, which gets routed to the irq and delivered to the appropriate
device driver.

This patch does everything except set up the IO APIC pin routing to
the vector.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/kernel/acpi/boot.c |    8 +++
 arch/x86/pci/legacy.c       |    4 +
 arch/x86/xen/Makefile       |    1
 arch/x86/xen/pci.c          |   98 +++++++++++++++++++++++++++++++++++++++++++
 arch/x86/xen/xen-ops.h      |    1
 drivers/xen/events.c        |    9 ++-
 include/asm-x86/xen/pci.h   |    7 +++
 include/xen/events.h        |    8 +++
 8 files changed, 132 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -42,6 +42,8 @@
 #include <asm/mpspec.h>
 #include <asm/smp.h>
 
+#include <asm/xen/pci.h>
+
 #include <asm/xen/hypervisor.h>
 
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -501,6 +503,12 @@
 	unsigned int irq;
 	unsigned int plat_gsi = gsi;
 
+#ifdef CONFIG_XEN_PCI
+	irq = xen_register_gsi(gsi, triggering, polarity);
+	if ((int)irq >= 0)
+		return irq;
+#endif
+
 #ifdef CONFIG_PCI
 	/*
 	 * Make sure all (legacy) PCI IRQs are set as level-triggered.
diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -3,6 +3,7 @@
  */
 #include <linux/init.h>
 #include <linux/pci.h>
+#include <asm/xen/pci.h>
 #include "pci.h"
 
 /*
@@ -66,6 +67,9 @@
 #ifdef CONFIG_X86_VISWS
 	pci_visws_init();
 #endif
+#ifdef CONFIG_XEN_PCI
+	xen_pci_init();
+#endif
 	pci_legacy_init();
 	pcibios_irq_init();
 	pcibios_init();
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -11,3 +11,4 @@
 obj-$(CONFIG_SMP)		+= smp.o spinlock.o
 obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
 obj-$(CONFIG_XEN_DOM0)		+= apic.o
+obj-$(CONFIG_XEN_PCI)		+= pci.o
\ No newline at end of file
diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
new file mode 100644
--- /dev/null
+++ b/arch/x86/xen/pci.c
@@ -0,0 +1,98 @@
+#include <linux/kernel.h>
+#include <linux/acpi.h>
+#include <linux/pci.h>
+
+#include <asm/xen/hypervisor.h>
+
+#include <xen/interface/xen.h>
+#include <xen/events.h>
+
+#include "../pci/pci.h"
+
+#include "xen-ops.h"
+
+int xen_register_gsi(u32 gsi, int triggering, int polarity)
+{
+	int irq;
+
+	if (!xen_domain())
+		return -1;
+
+	printk(KERN_DEBUG "xen: registering gsi %u triggering %d polarity %d\n",
+	       gsi, triggering, polarity);
+
+	irq = xen_allocate_pirq(gsi);
+
+	printk(KERN_DEBUG "xen: --> irq=%d\n", irq);
+
+	return irq;
+}
+
+static int xen_pci_pirq_enable(struct pci_dev *dev)
+{
+	int rc;
+
+	printk(KERN_DEBUG "xen: enabling pci device %s pin %d\n",
+	       pci_name(dev), dev->pin);
+
+	if (dev->pin == 0)
+		return 0;	/* no pin, nothing to do */
+
+	rc = acpi_pci_irq_enable(dev);
+
+	if (rc >= 0 && dev->irq != 0) {
+		int irq = dev->irq;
+
+		printk(KERN_INFO "xen: PCI device %s pin %d -> irq %d\n",
+		       pci_name(dev), dev->pin, irq);
+
+		/* install vector in ioapic? */
+	} else
+		printk(KERN_INFO "xen: irq enable for %s failed: rc=%d pin=%d irq=%d\n",
+		       pci_name(dev), rc, dev->pin, dev->irq);
+
+	return rc;
+}
+
+static void xen_pci_pirq_disable(struct pci_dev *dev)
+{
+	printk(KERN_INFO "xen: disable pci device %s\n",
+	       pci_name(dev));
+
+	dump_stack();
+}
+
+void __init xen_pci_init(void)
+{
+	if (!xen_domain())
+		return;
+
+	/*
+	 * In either dom0 or when using pcifront we need to take
+	 * control of physical interrupts from pci devices.
+	 * Overriding these two put us in charge of interrupt routing
+	 * akin to ACPI.
+	 *
+	 * This overrides any previous settings.
+	 */
+	pcibios_enable_irq = xen_pci_pirq_enable;
+	pcibios_disable_irq = xen_pci_pirq_disable;
+}
+
+void __init xen_setup_pirqs(void)
+{
+#ifdef CONFIG_ACPI
+	int irq;
+
+	/*
+	 * Set up acpi interrupt in acpi_gbl_FADT.sci_interrupt.
+	 */
+	irq = xen_allocate_pirq(acpi_gbl_FADT.sci_interrupt);
+
+	printk(KERN_INFO "xen: allocated irq %d for acpi %d\n",
+	       irq, acpi_gbl_FADT.sci_interrupt);
+
+	/* Blerk. */
+	acpi_gbl_FADT.sci_interrupt = irq;
+#endif
+}
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -63,7 +63,6 @@
 static inline void xen_smp_init(void) {}
 #endif
 
-
 void xen_init_apic(void);
 
 /* Declare an asm function, along with symbols needed to make it
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -391,6 +391,7 @@
 	struct evtchn_bind_pirq bind_pirq;
 	struct irq_info *info = info_for_irq(irq);
 	int evtchn = evtchn_from_irq(irq);
+	int rc;
 
 	BUG_ON(info->type != IRQT_PIRQ);
 
@@ -400,10 +401,12 @@
 	bind_pirq.pirq = irq;
 	/* NB. We are happy to share unless we are probing. */
 	bind_pirq.flags = probing_irq(irq) ? 0 : BIND_PIRQ__WILL_SHARE;
-	if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_pirq, &bind_pirq) != 0) {
+	rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_pirq, &bind_pirq);
+	if (rc != 0) {
 		if (!probing_irq(irq))
 			printk(KERN_INFO "Failed to obtain physical IRQ %d\n",
 			       irq);
+		dump_stack();
 		return 0;
 	}
 	evtchn = bind_pirq.port;
@@ -523,8 +526,6 @@
 	} else
 		irq = find_unbound_irq();
 
-	spin_unlock(&irq_mapping_update_lock);
-
 	set_irq_chip_and_handler_name(irq, &xen_pirq_chip,
 				      handle_level_irq, "pirq");
 
@@ -1167,4 +1168,6 @@
 		mask_evtchn(i);
 
 	irq_ctx_init(smp_processor_id());
+
+	xen_setup_pirqs();
 }
diff --git a/include/asm-x86/xen/pci.h b/include/asm-x86/xen/pci.h
new file mode 100644
--- /dev/null
+++ b/include/asm-x86/xen/pci.h
@@ -0,0 +1,7 @@
+#ifndef _ASM_X86_XEN_PCI_H
+#define _ASM_X86_XEN_PCI_H
+
+void xen_pci_init(void);
+int xen_register_gsi(u32 gsi, int triggering, int polarity);
+
+#endif /* _ASM_X86_XEN_PCI_H */
diff --git a/include/xen/events.h b/include/xen/events.h
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -66,4 +66,12 @@
 /* Return gsi allocated to pirq */
 int xen_gsi_from_irq(unsigned pirq);
 
+#ifdef CONFIG_XEN_PCI
+void xen_setup_pirqs(void);
+#else
+static inline void xen_setup_pirqs(void)
+{
+}
+#endif
+
 #endif /* _XEN_EVENTS_H */
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 37 of 38] xen: bind pirq to vector and event channel
Having converted a dev+pin to a gsi, and that gsi to an irq, and
allocated a vector for the irq, we must program the IO APIC to deliver
an interrupt on a pin to the vector, so Xen can deliver it as an event
channel.

Given the pirq, we can get the gsi and vector. We map the gsi to a
specific IO APIC's pin, and set the routing entry.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/apic.c |    4 ++--
 arch/x86/xen/pci.c  |   34 +++++++++++++++++++++++++++++++---
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -4,6 +4,7 @@
 
 #include <asm/io_apic.h>
 #include <asm/acpi.h>
+#include <asm/hw_irq.h>
 
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
@@ -13,8 +14,7 @@
 
 static void __init xen_io_apic_init(void)
 {
-	printk("xen apic init\n");
-	dump_stack();
+	enable_IO_APIC();
 }
 
 static unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -2,8 +2,10 @@
 #include <linux/acpi.h>
 #include <linux/pci.h>
 
+#include <asm/mpspec.h>
+#include <asm/io_apic.h>
+
 #include <asm/xen/hypervisor.h>
-
 #include <xen/interface/xen.h>
 #include <xen/events.h>
 
@@ -11,6 +13,31 @@
 
 #include "xen-ops.h"
 
+static void xen_set_io_apic_routing(int irq, int trigger, int polarity)
+{
+	int ioapic, ioapic_pin;
+	int vector, gsi;
+	struct IO_APIC_route_entry entry;
+
+	gsi = xen_gsi_from_irq(irq);
+	vector = xen_vector_from_irq(irq);
+
+	ioapic = mp_find_ioapic(gsi);
+	if (ioapic == -1) {
+		printk(KERN_WARNING "xen_set_ioapic_routing: irq %d gsi %d ioapic %d\n",
+		       irq, gsi, ioapic);
+		return;
+	}
+
+	ioapic_pin = mp_find_ioapic_pin(ioapic, gsi);
+
+	printk(KERN_INFO "xen_set_ioapic_routing: irq %d gsi %d vector %d ioapic %d pin %d triggering %d polarity %d\n",
+	       irq, gsi, vector, ioapic, ioapic_pin, trigger, polarity);
+
+	setup_ioapic_entry(ioapic, -1, &entry, ~0, trigger, polarity, vector);
+	ioapic_write_entry(ioapic, ioapic_pin, entry);
+}
+
 int xen_register_gsi(u32 gsi, int triggering, int polarity)
 {
 	int irq;
@@ -25,6 +52,9 @@
 
 	printk(KERN_DEBUG "xen: --> irq=%d\n", irq);
 
+	if (irq > 0)
+		xen_set_io_apic_routing(irq, triggering, polarity);
+
 	return irq;
 }
 
@@ -45,8 +75,6 @@
 
 		printk(KERN_INFO "xen: PCI device %s pin %d -> irq %d\n",
 		       pci_name(dev), dev->pin, irq);
-
-		/* install vector in ioapic? */
 	} else
 		printk(KERN_INFO "xen: irq enable for %s failed: rc=%d pin=%d irq=%d\n",
 		       pci_name(dev), rc, dev->pin, dev->irq);
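Editorial sketch: the "map the gsi to a specific IO APIC's pin" step in the patch above can be pictured as a range lookup. The model below assumes each IO APIC covers one contiguous gsi range and that the pin is the offset from the start of that range, which is my reading of what mp_find_ioapic() and mp_find_ioapic_pin() do; the table contents and all `demo_*` names are hypothetical and this is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one IO APIC's gsi window (inclusive). */
struct demo_ioapic {
	unsigned gsi_base;
	unsigned gsi_end;
};

/* Example layout: two IO APICs with 24 pins each. */
static const struct demo_ioapic demo_ioapics[] = {
	{ .gsi_base = 0,  .gsi_end = 23 },
	{ .gsi_base = 24, .gsi_end = 47 },
};

#define DEMO_NR_IOAPICS (sizeof(demo_ioapics) / sizeof(demo_ioapics[0]))

/* Index of the IO APIC whose window contains 'gsi', or -1 if none does. */
static int demo_find_ioapic(unsigned gsi)
{
	for (size_t i = 0; i < DEMO_NR_IOAPICS; i++)
		if (gsi >= demo_ioapics[i].gsi_base &&
		    gsi <= demo_ioapics[i].gsi_end)
			return (int)i;
	return -1;
}

/* Pin number of 'gsi' on the given IO APIC: offset from the window base. */
static int demo_find_ioapic_pin(int ioapic, unsigned gsi)
{
	return (int)(gsi - demo_ioapics[ioapic].gsi_base);
}
```

With this example table, gsi 30 resolves to IO APIC 1, pin 6; a gsi outside every window yields -1, which corresponds to the warning-and-return path in xen_set_io_apic_routing().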
Jeremy Fitzhardinge
2008-Nov-13 19:10 UTC
[Xen-devel] [PATCH 38 of 38] xen: make sure that softirqs get handled at the end of event processing
Make sure that irq_enter()/irq_exit() wrap the entire event processing
loop, rather than each individual event invocation. This makes sure
that softirq processing is deferred until the end of event processing,
rather than in the middle with interrupts disabled.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/events.c |   29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -806,25 +806,6 @@
 	return IRQ_HANDLED;
 }
 
-
-static void xen_do_irq(unsigned irq, struct pt_regs *regs)
-{
-	struct pt_regs *old_regs = set_irq_regs(regs);
-
-	if (WARN_ON(irq == -1))
-		return;
-
-	exit_idle();
-	irq_enter();
-
-	//printk("cpu %d handling irq %d\n", smp_processor_id(), info->irq);
-	handle_irq(irq, regs);
-
-	irq_exit();
-
-	set_irq_regs(old_regs);
-}
-
 /*
  * Search the CPUs pending events bitmasks. For each one found, map
  * the event number to an irq, and feed it into do_IRQ() for
@@ -837,11 +818,15 @@
 void xen_evtchn_do_upcall(struct pt_regs *regs)
 {
 	int cpu = get_cpu();
+	struct pt_regs *old_regs = set_irq_regs(regs);
 	struct shared_info *s = HYPERVISOR_shared_info;
 	struct vcpu_info *vcpu_info = __get_cpu_var(xen_vcpu);
 	static DEFINE_PER_CPU(unsigned, nesting_count);
 	unsigned count;
 
+	exit_idle();
+	irq_enter();
+
 	do {
 		unsigned long pending_words;
 
@@ -865,7 +850,8 @@
 			int port = (word_idx * BITS_PER_LONG) + bit_idx;
 			int irq = evtchn_to_irq[port];
 
-			xen_do_irq(irq, regs);
+			if (irq != -1)
+				handle_irq(irq, regs);
 		}
 	}
 
@@ -876,6 +862,9 @@
 	} while(count != 1);
 
 out:
+	irq_exit();
+	set_irq_regs(old_regs);
+
 	put_cpu();
 }
 
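Editorial sketch: the effect of moving irq_enter()/irq_exit() outside the loop can be shown with a toy model in which pending softirq work runs whenever the outermost "irq exit" is reached. That is a deliberate simplification of what the real irq_exit() does, and every `demo_*` name here is hypothetical; the sketch only illustrates why per-event wrapping runs softirqs between events while loop-level wrapping defers them to the end.

```c
#include <assert.h>

static int demo_depth;			/* nesting level of "hard irq" context */
static int demo_softirq_pending;	/* set by handlers that raise work */
static int demo_softirq_runs;		/* how often softirq work was run */

static void demo_irq_enter(void)
{
	demo_depth++;
}

/* Leaving the outermost level runs any pending softirq work once. */
static void demo_irq_exit(void)
{
	if (--demo_depth == 0 && demo_softirq_pending) {
		demo_softirq_pending = 0;
		demo_softirq_runs++;
	}
}

/* Stand-in for an event handler that raises softirq work. */
static void demo_handle_event(void)
{
	demo_softirq_pending = 1;
}

/* Old scheme: enter/exit around every single event. */
static int demo_upcall_per_event(int nr_events)
{
	demo_softirq_runs = 0;
	for (int i = 0; i < nr_events; i++) {
		demo_irq_enter();
		demo_handle_event();
		demo_irq_exit();	/* softirqs run mid-loop */
	}
	return demo_softirq_runs;
}

/* New scheme: one enter/exit pair around the whole loop. */
static int demo_upcall_wrapped(int nr_events)
{
	demo_softirq_runs = 0;
	demo_irq_enter();
	for (int i = 0; i < nr_events; i++)
		demo_handle_event();
	demo_irq_exit();		/* softirqs run once, at the end */
	return demo_softirq_runs;
}
```

In this model three events cause three softirq passes under the old scheme and a single pass, after all events have been handled, under the new one.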
Mark McLoughlin
2008-Nov-13 20:28 UTC
[Xen-devel] Re: [PATCH 25 of 38] xen mtrr: Add mtrr_ops support for Xen mtrr
On Thu, 2008-11-13 at 11:10 -0800, Jeremy Fitzhardinge wrote:> diff --git a/arch/x86/kernel/cpu/mtrr/xen.c b/arch/x86/kernel/cpu/mtrr/xen.c > new file mode 100644 > --- /dev/null > +++ b/arch/x86/kernel/cpu/mtrr/xen.c > @@ -0,0 +1,59 @@...> + > +/* DOM0 TODO: Need to fill in the remaining mtrr methods to have full > + * working userland mtrr support. */ > +static struct mtrr_ops xen_mtrr_ops = { > + .vendor = X86_VENDOR_UNKNOWN, > +// .set = xen_set_mtrr, > +// .get = xen_get_mtrr, > + .get_free_region = generic_get_free_region, > +// .validate_add_page = xen_validate_add_page, > + .have_wrcomb = positive_have_wrcomb, > + .use_intel_if = 0, > + .num_var_ranges = xen_num_var_ranges, > +};... I''m vague on the details now, but looking back at the dom0 patch set here: http://git.et.redhat.com/?p=linux-2.6-dom0-pvops.git;a=shortlog;h=55abc194080b5cf31cd66f5e35e8e5c5af2aa927 I see we did have a bunch more mtrr work e.g. fixing the TODO above: http://git.et.redhat.com/?p=linux-2.6-dom0-pvops.git;a=commitdiff;h=93f779bf3d79f28d0933bfbc53f7b8c5b6496081 Cheers, Mark. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Nov-14 00:35 UTC
[Xen-devel] Re: [PATCH 25 of 38] xen mtrr: Add mtrr_ops support for Xen mtrr
Mark McLoughlin wrote:> On Thu, 2008-11-13 at 11:10 -0800, Jeremy Fitzhardinge wrote: > > >> diff --git a/arch/x86/kernel/cpu/mtrr/xen.c b/arch/x86/kernel/cpu/mtrr/xen.c >> new file mode 100644 >> --- /dev/null >> +++ b/arch/x86/kernel/cpu/mtrr/xen.c >> @@ -0,0 +1,59 @@ >> > ... > >> + >> +/* DOM0 TODO: Need to fill in the remaining mtrr methods to have full >> + * working userland mtrr support. */ >> +static struct mtrr_ops xen_mtrr_ops = { >> + .vendor = X86_VENDOR_UNKNOWN, >> +// .set = xen_set_mtrr, >> +// .get = xen_get_mtrr, >> + .get_free_region = generic_get_free_region, >> +// .validate_add_page = xen_validate_add_page, >> + .have_wrcomb = positive_have_wrcomb, >> + .use_intel_if = 0, >> + .num_var_ranges = xen_num_var_ranges, >> +}; >> > > ... > > I''m vague on the details now, but looking back at the dom0 patch set > here: > > http://git.et.redhat.com/?p=linux-2.6-dom0-pvops.git;a=shortlog;h=55abc194080b5cf31cd66f5e35e8e5c5af2aa927 > > I see we did have a bunch more mtrr work e.g. fixing the TODO above: > > http://git.et.redhat.com/?p=linux-2.6-dom0-pvops.git;a=commitdiff;h=93f779bf3d79f28d0933bfbc53f7b8c5b6496081 >Yes, the mtrr changes are incomplete. I started on them as much as necessary to get things booting, and then left the rest to revisit. It''s not a particularly pretty part of the kernel, and so I was hoping some magic beautification fairy would visit it before I needed to touch it more... J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2008-Nov-14 09:01 UTC
Re: [Xen-devel] [PATCH 03 of 38] swiotlb: allow architectures to override swiotlb pool allocation
Not directly related to this patch alone, but to the combined set of changes to swiotlb: I don''t see any handling of CONFIG_HIGHMEM here (or at least a note that this a known limitation needing work). I mention this because this was the largest part of the changes I had posted long ago to make lib/swiotlb.c Xen-ready, and which got rejected due to their ugliness. While perhaps less intrusive to take care of, I also didn''t see an equivalent of the range_straddles_page_boundary() logic, without which I can''t see how this would work in the common case. Jan>>> Jeremy Fitzhardinge <jeremy@goop.org> 13.11.08 20:10 >>>Architectures may need to allocate memory specially for use with the swiotlb. Create the weak function swiotlb_alloc_boot() and swiotlb_alloc() defaulting to the current behaviour. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Jan Beulich <jbeulich@novell.com> Cc: Tony Luck <tony.luck@intel.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> --- include/linux/swiotlb.h | 3 +++ lib/swiotlb.c | 16 +++++++++++++--- 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -10,6 +10,9 @@ extern void swiotlb_init(void); +extern void *swiotlb_alloc_boot(size_t bytes, unsigned long nslabs); +extern void *swiotlb_alloc(unsigned order, unsigned long nslabs); + extern void *swiotlb_alloc_coherent(struct device *hwdev, size_t size, dma_addr_t *dma_handle, gfp_t flags); diff --git a/lib/swiotlb.c b/lib/swiotlb.c --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -21,6 +21,7 @@ #include <linux/mm.h> #include <linux/module.h> #include <linux/spinlock.h> +#include <linux/swiotlb.h> #include <linux/string.h> #include <linux/types.h> #include <linux/ctype.h> @@ -126,6 +127,16 @@ __setup("swiotlb=", setup_io_tlb_npages); /* make 
io_tlb_overflow tunable too? */ +void * __weak swiotlb_alloc_boot(size_t size, unsigned long nslabs) +{ + return alloc_bootmem_low_pages(size); +} + +void * __weak swiotlb_alloc(unsigned order, unsigned long nslabs) +{ + return (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN, order); +} + /* * Statically reserve bounce buffer space and initialize bounce buffer data * structures for the software IO TLB used to implement the DMA API. @@ -145,7 +156,7 @@ /* * Get IO TLB memory from the low pages */ - io_tlb_start = alloc_bootmem_low_pages(bytes); + io_tlb_start = swiotlb_alloc_boot(bytes, io_tlb_nslabs); if (!io_tlb_start) panic("Cannot allocate SWIOTLB buffer"); io_tlb_end = io_tlb_start + bytes; @@ -202,8 +213,7 @@ bytes = io_tlb_nslabs << IO_TLB_SHIFT; while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) { - io_tlb_start = (char *)__get_free_pages(GFP_DMA | __GFP_NOWARN, - order); + io_tlb_start = swiotlb_alloc(order, io_tlb_nslabs); if (io_tlb_start) break; order--; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Nov-14 19:33 UTC
Re: [Xen-devel] [PATCH 03 of 38] swiotlb: allow architectures to override swiotlb pool allocation
Jan Beulich wrote:
> Not directly related to this patch alone, but to the combined set of changes
> to swiotlb: I don't see any handling of CONFIG_HIGHMEM here (or at least
> a note that this a known limitation needing work). I mention this because
> this was the largest part of the changes I had posted long ago to make
> lib/swiotlb.c Xen-ready, and which got rejected due to their ugliness.

Was that Andi's objection on the grounds that he didn't think that Xen should need swiotlb at all? I have to admit I didn't follow that thread very closely (or threads, as I seem to remember). Do you have a pointer to the pertinent bits?

> While perhaps less intrusive to take care of, I also didn't see an equivalent
> of the range_straddles_page_boundary() logic, without which I can't see
> how this would work in the common case.

Could you be more specific? The swiotlb allocation should be machine contiguous and so there's no straddling required, but I think I'm missing your point.

J
Chris Lalancette
2008-Nov-17 08:04 UTC
Re: [Xen-devel] [PATCH 03 of 38] swiotlb: allow architectures to override swiotlb pool allocation
Jeremy Fitzhardinge wrote:
>> While perhaps less intrusive to take care of, I also didn't see an equivalent
>> of the range_straddles_page_boundary() logic, without which I can't see
>> how this would work in the common case.
>
> Could you be more specific? The swiotlb allocation should be machine
> contiguous and so there's no straddling required, but I think I'm missing
> your point.

In general, I think you are right; swiotlb should be machine contiguous, so it works in the normal case. The range_straddles_page_boundary function takes care of a corner case, where you can run into swiotlb exhaustion when you really shouldn't. As I understand it, it comes about because it is possible to get a swiotlb request with two pages that just happen to be machine contiguous, but were *not* allocated through xen_create_contiguous_region (and hence weren't marked in the contiguous_bitmap as such). In this case, you split the request into two separate requests, and this can more easily lead to exhaustion. range_straddles_page_boundary works around this by checking whether any two pages coming through the swiotlb layer are machine contiguous, and if they are, not splitting the request.

-- Chris Lalancette
Keir Fraser
2008-Nov-17 08:44 UTC
Re: [Xen-devel] [PATCH 03 of 38] swiotlb: allow architectures to override swiotlb pool allocation
On 17/11/08 08:04, "Chris Lalancette" <clalance@redhat.com> wrote:
>> Could you be more specific? The swiotlb allocation should be machine
>> contiguous and so there's no straddling required, but I think I'm missing
>> your point.
>
> In general, I think you are right; swiotlb should be machine contiguous, so it
> works in the normal case. The range_straddles_page_boundary function takes care
> of a corner case, where you can run into swiotlb exhaustion when you really
> shouldn't. As I understand it, it comes about because it is possible to get a
> swiotlb request with two pages that just happen to be machine contiguous, but
> were *not* allocated through xen_create_contiguous_region (and hence weren't
> marked in the contiguous_bitmap as such). In this case, you split the request
> into two separate requests, and this can more easily lead to exhaustion.
> range_straddles_page_boundary works around this by checking whether any two
> pages coming through the swiotlb layer are machine contiguous, and if they are,
> not splitting the request.

A more specific problem solved by range_straddles_page_boundary() in our 2.6.18 kernel was that the block layer would do bio merging because it checked that pages really were contiguous, and then swiotlb (without r_s_p_b) would decide that the pages weren't contiguous (because the contiguity was random luck rather than explicitly requested) and hence do bounce buffering. The result was that sufficiently aggressive I/O would exhaust swiotlb resources and crash the kernel.

In the 2.6.18 port we actually got rid of contiguous_bitmap[] entirely.

-- Keir
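The check discussed above can be sketched in user space as follows. This is a rough illustration, not the code from any Xen tree: the sketch_* names, the toy pfn-to-mfn table, and the 4 KiB page size are all assumptions made for the example. The idea is that a buffer that fits in one page never needs bouncing for contiguity reasons, and a multi-page buffer needs it only if the underlying machine frames are not consecutive.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Toy pseudo-physical to machine frame table (hypothetical values). */
static const unsigned long sketch_p2m[] = { 100, 101, 205, 206, 207, 50 };

static unsigned long sketch_pfn_to_mfn(unsigned long pfn)
{
    return sketch_p2m[pfn];  /* caller must stay within the toy table */
}

/* Return 1 if every page touched by the buffer maps to consecutive machine frames. */
int sketch_pages_machine_contiguous(unsigned long pfn, unsigned int offset,
                                    size_t size)
{
    unsigned long next_mfn = sketch_pfn_to_mfn(pfn);
    unsigned long nr_pages =
        (offset + size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;

    for (unsigned long i = 1; i < nr_pages; i++) {
        if (sketch_pfn_to_mfn(++pfn) != ++next_mfn)
            return 0;
    }
    return 1;
}

/* Return 1 only if the buffer crosses a page boundary onto a non-adjacent frame. */
int sketch_range_straddles_page_boundary(unsigned long pfn, unsigned int offset,
                                         size_t size)
{
    if (offset + size <= SKETCH_PAGE_SIZE)
        return 0;            /* fits entirely within one page */
    if (sketch_pages_machine_contiguous(pfn, offset, size))
        return 0;            /* crosses pages, but the frames are adjacent */
    return 1;
}
```

With the toy table, a two-page buffer starting at pfn 0 is left alone (frames 100 and 101 are adjacent), while the same buffer starting at pfn 1 would be bounced (frames 101 and 205 are not).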
Jan Beulich
2008-Nov-17 09:15 UTC
Re: [Xen-devel] [PATCH 03 of 38] swiotlb: allow architectures to override swiotlb pool allocation
>>> Jeremy Fitzhardinge <jeremy@goop.org> 14.11.08 20:33 >>> >Jan Beulich wrote: >> Not directly related to this patch alone, but to the combined set of changes >> to swiotlb: I don''t see any handling of CONFIG_HIGHMEM here (or at least >> a note that this a known limitation needing work). I mention this because >> this was the largest part of the changes I had posted long ago to make >> lib/swiotlb.c Xen-ready, and which got rejected due to their ugliness. >> > >Was that Andi''s objection on the grounds that he didn''t think that Xen >should need swiotlb at all?No, Tony Luck actually merged it, but someone else (I don''t recall who it was) requested it to be reverted again.>I have to admit I didn''t follow that thread very closely (or threads, as >I seem to remember). Do you have a pointer to the pertinent bits?http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=51099005ab8e09d68a13fea8d55bc739c1040ca6>> While perhaps less intrusive to take care of, I also didn''t see an equivalent >> of the range_straddles_page_boundary() logic, without which I can''t see >> how this would work in the common case. >> >Could you be more specific? The swiotlb allocation should be machine >contiguous and so there''s no stradding required, but I think I''m missing >your point.The question is whether a multi-page piece of memory must be funneled through the swiotlb in the first place. In native code, checking whether the first/last byte satisfies the address_needs_mapping() check is sufficient, but in Xen you also need to check whether the known to be physically contiguous pages are also machine-contiguous. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2008-Nov-17 11:52 UTC
[Xen-devel] Re: [PATCH 04 of 38] swiotlb: move some definitions to header
On Mon, 2008-11-17 at 12:48 +0900, FUJITA Tomonori wrote:> Why do we need to export IO_TLB_SEGSIZE and IO_TLB_SHIFT to everyone > in include/linux?A subsequent Xen patch needs to make use of them, although I can''t see it in the patchset Jeremy posted so here it is (not fully baked yet) Subject: xen swiotlb: fixup swiotlb is chunks smaller than MAX_CONTIG_ORDER From: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> --- arch/x86/kernel/pci-swiotlb_64.c | 7 +------ drivers/pci/xen-iommu.c | 30 ++++++++++++++++++++---------- 2 files changed, 21 insertions(+), 16 deletions(-) ==================================================================--- a/arch/x86/kernel/pci-swiotlb_64.c +++ b/arch/x86/kernel/pci-swiotlb_64.c @@ -28,12 +28,7 @@ void *swiotlb_alloc(unsigned order, unsigned long nslabs) { - void *ret = (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN, order); - - if (ret && xen_pv_domain()) - xen_swiotlb_fixup(ret, 1u << order, nslabs); - - return ret; + BUG(); } static dma_addr_t ==================================================================--- a/drivers/pci/xen-iommu.c +++ b/drivers/pci/xen-iommu.c @@ -6,6 +6,7 @@ #include <linux/version.h> #include <linux/scatterlist.h> #include <linux/bio.h> +#include <linux/swiotlb.h> #include <linux/io.h> #include <linux/bug.h> @@ -43,19 +44,27 @@ unsigned long *bitmap; }; +static int max_dma_bits = 32; + void xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs) { - unsigned order = get_order(size); + int i, rc; + int dma_bits; - printk(KERN_DEBUG "xen_swiotlb_fixup: buf=%p size=%zu order=%u\n", - buf, size, order); + printk(KERN_DEBUG "xen_swiotlb_fixup: buf=%p size=%zu\n", + buf, size); - if (WARN_ON(size != (PAGE_SIZE << order))) - return; - - if (xen_create_contiguous_region((unsigned long)buf, - order, 0xffffffff)) - printk(KERN_ERR "xen_create_contiguous_region failed\n"); + dma_bits = 
get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT) + PAGE_SHIFT; + for (i = 0; i < nslabs; i += IO_TLB_SEGSIZE) { + do { + rc = xen_create_contiguous_region( + (unsigned long)buf + (i << IO_TLB_SHIFT), + get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT), + dma_bits); + } while (rc && dma_bits++ < max_dma_bits); + if (rc) + panic(KERN_ERR "xen_create_contiguous_region failed\n"); + } } static inline int address_needs_mapping(struct device *hwdev, @@ -117,7 +126,8 @@ if (check_pages_physically_contiguous(pfn, offset, size)) return 0; - printk("range_straddles_page_boundary: p=%Lx size=%d pfn=%lx\n", + printk(KERN_WARNING "range_straddles_page_boundary: " + "p=%Lx size=%zd pfn=%lx\n", p, size, pfn); return 1; } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2008-Nov-17 16:16 UTC
[Xen-devel] Re: [PATCH 18 of 38] x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
On Mon, 2008-11-17 at 12:48 +0900, FUJITA Tomonori wrote:> On Thu, 13 Nov 2008 11:10:16 -0800 > Jeremy Fitzhardinge <jeremy@goop.org> wrote: > > > swiotlb on 32 bit will be used by Xen domain 0 support. > > If you want swiotlb on 32 bit, you need more modifications, I think.Possibly. It currently "Works For Me(tm)", but I should check it over.> For example, the following code assumes that the mask needs to be > 64 bits.The use of unsigned long for the mask is throughout the API and not simply limited to swiotlb.c. All the callers of dma_set_seg_boundary (PCI and SCSI subsys it seems) do not use a value >4G anywhere I can see. Presumably if something was we would see "warning: overflow in implicit constant conversion" somewhere along the line. If no value is set then the default is 0xffffffff which is safe on 32 bit. I suspect that even with PAE addresses above 4G aren''t seen very often due to pre-existing subsystem specific bounce buffers or other existing limitations (like network buffers being in lowmem). Perhaps dma_addr_t should be used though? Ian.> > static void * > map_single(struct device *hwdev, char *buffer, size_t size, int dir) > { > unsigned long flags; > char *dma_addr; > unsigned int nslots, stride, index, wrap; > int i; > unsigned long start_dma_addr; > unsigned long mask; > unsigned long offset_slots; > unsigned long max_slots; > > mask = dma_get_seg_boundary(hwdev); > start_dma_addr = virt_to_bus(io_tlb_start) & mask; > > offset_slots = ALIGN(start_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; > max_slots = mask + 1 > ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT > : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT); >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2008-Nov-19 13:48 UTC
[Xen-devel] Re: [PATCH 18 of 38] x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
On Wed, 2008-11-19 at 11:19 +0900, FUJITA Tomonori wrote:
> On Mon, 17 Nov 2008 16:16:06 +0000
> Ian Campbell <Ian.Campbell@citrix.com> wrote:
>
> > > For example, the following code assumes that the mask needs to be
> > > 64 bits.
> >
> > The use of unsigned long for the mask is throughout the API and not
> > simply limited to swiotlb.c. All the callers of dma_set_seg_boundary
> > (PCI and SCSI subsys it seems) do not use a value >4G anywhere I can
> > see.
>
> 32bit is large enough for dma segment boundary mask, I think.
>
> The problem that I talked about in the previous mail:
>
> > max_slots = mask + 1
> > ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
> > : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
>
> Since the popular value of the mask is 0xffffffff. So the above code
> (mask + 1 ?) works wrongly if the size of mask is 32bit (well,
> accidentally the result of max_slots is identical though).

Ah, I hadn't spotted this, you are right it probably works but just by chance. Thanks for pointing it out.

> > Presumably if something was we would see "warning: overflow in
> > implicit constant conversion" somewhere along the line. If no value is
> > set then the default is 0xffffffff which is safe on 32 bit.
> >
> > I suspect that even with PAE addresses above 4G aren't seen very often
> > due to pre-existing subsystem specific bounce buffers or other existing
> > limitations (like network buffers being in lowmem).
>
> I guess that you talk about the dma_mask (and coherent_dma_mask) in
> struct device. The dma segment boundary mask represents the different
> dma limitation of a device.

I was talking about the segment_boundary_mask in struct device_dma_parameters which is the source of the "mask" value in the code you quoted.

> > Perhaps dma_addr_t should be used though?
>
> I think that 'unsigned long' is better for the dma segment boundary
> mask since it represents the hardware limitation. The size of the
> value are not related with kernel configurations at all.

Right, it's just that on occasion we have to cope with slightly larger values while manipulating things.

Ian.
Simon Horman
2008-Nov-20 08:33 UTC
[Xen-devel] Re: [PATCH 30 of 38] xen: implement io_apic_ops
Hi, it seems that if CONFIG_XEN is set but CONFIG_XEN_DOM0 is not set, then the call to xen_init_apic() in xen_start_kernel() causes the build to fail. One possible solution to this is to provide a dummy version of xen_init_apic() in the !CONFIG_XEN_DOM0 case. Another possible solution would be to add #ifdef CONFIG_XEN_DOM0 inside xen_start_kernel() #make gcc --version gcc (Debian 4.3.2-1) 4.3.2 Copyright (C) 2008 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. # make [snip] UPD include/linux/compile.h CC init/version.o LD init/built-in.o LD .tmp_vmlinux1 arch/x86/xen/built-in.o: In function `xen_start_kernel': (.init.text+0x8ef): undefined reference to `xen_init_apic' Index: linux-2.6/arch/x86/xen/xen-ops.h ================================================================== --- linux-2.6.orig/arch/x86/xen/xen-ops.h 2008-11-20 17:23:14.000000000 +0900 +++ linux-2.6/arch/x86/xen/xen-ops.h 2008-11-20 17:24:28.000000000 +0900 @@ -64,7 +64,11 @@ static inline void xen_smp_init(void) {} #endif +#ifdef CONFIG_XEN_DOM0 void xen_init_apic(void); +#else +static inline void xen_init_apic(void) { ; } +#endif /* Declare an asm function, along with symbols needed to make it inlineable */
Ingo Molnar
2008-Nov-20 09:31 UTC
[Xen-devel] Re: [PATCH 36 of 38] xen: route hardware irqs via Xen
> +#ifdef CONFIG_XEN_PCI > + irq = xen_register_gsi(gsi, triggering, polarity); > + if ((int)irq >= 0) > + return irq; > +#endifwhy not change irq to ''int'' and avoid the cast? also, please eliminate the #ifdef by turning xen_register_gsi() into a ''return -1'' inline on !CONFIG_XEN_PCI.> +#ifdef CONFIG_XEN_PCI > + xen_pci_init(); > +#endifhide the #ifdef in a header please. (like you already properly do for xen_setup_pirqs())> + if (rc != 0) { > if (!probing_irq(irq)) > printk(KERN_INFO "Failed to obtain physical IRQ %d\n", > irq); > + dump_stack();generally it''s better to use WARN() or WARN_ONCE() to get good debug feedback and stackdumps. (they also document the reason for the dump)> @@ -523,8 +526,6 @@ > } else > irq = find_unbound_irq(); > > - spin_unlock(&irq_mapping_update_lock); > - > set_irq_chip_and_handler_name(irq, &xen_pirq_chip, > handle_level_irq, "pirq");hm, looks like a stray bugfix? Ingo _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ingo Molnar
2008-Nov-20 09:35 UTC
[Xen-devel] Re: [PATCH 30 of 38] xen: implement io_apic_ops
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:> Writes to the IO APIC are paravirtualized via hypercalls, so implement > the appropriate operations. > > Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > --- > arch/x86/xen/Makefile | 3 +- > arch/x86/xen/apic.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++ > arch/x86/xen/enlighten.c | 2 + > arch/x86/xen/xen-ops.h | 2 + > 4 files changed, 72 insertions(+), 1 deletion(-)hm, why is the ioapic used as the API here, and not an irqchip? Ingo _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Nov-20 17:00 UTC
[Xen-devel] Re: [PATCH 30 of 38] xen: implement io_apic_ops
Ingo Molnar wrote:> * Jeremy Fitzhardinge <jeremy@goop.org> wrote: > > >> Writes to the IO APIC are paravirtualized via hypercalls, so implement >> the appropriate operations. >> >> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> >> --- >> arch/x86/xen/Makefile | 3 +- >> arch/x86/xen/apic.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++ >> arch/x86/xen/enlighten.c | 2 + >> arch/x86/xen/xen-ops.h | 2 + >> 4 files changed, 72 insertions(+), 1 deletion(-) >> > > hm, why is the ioapic used as the API here, and not an irqchip? >In essence, the purpose of the series is to break the 1:1 relationship between Linux irqs and hardware GSIs. This allows me to have my own irq allocator, which in turn allows me to intermix "physical" irqs (ie, a Linux irq number bound to a real hardware interrupt source) with the various software/virtual irqs the Xen system needs. Once a physical irq has been mapped onto a gsi interrupt source, the mechanisms for handing the ioapic side of things are more or less the same. There''s the same procedure of finding the ioapic/pin for a gsi and programming the appropriate vector. (Presumably once I implement MSI support, all references to "gsi" will become "gsi/msi/etc".) So, there''s an awkward tradeoff. I could just completely duplicate the whole irq/vector/ioapic management code and hide it under my own irqchip, but it would end up duplicating a lot of the existing code. My alternative was to try to open out the existing code into something like a thin ioapic library, which I can call into as needed. The only low-level difference is that the Xen ioapics need to be programmed via a hypercall rather than register writes. If the x86 interrupt layer in general decouples irqs from GSIs, then I can probably make use of that to clean things up. A general irq allocator along with some way of attaching interrupt-source-specific information to each irq would get me a long way, I think. 
I''d still need hooks to paravirtualize the actual ioapic writes, but at least I wouldn''t need to have quite so much delicate hooking. Thanks, J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
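A toy model may help picture what an irq allocator decoupled from GSI numbers looks like. Everything here is hypothetical (the sk_* names, the table size, the two irq kinds); it is not the code from the series, only a sketch of the idea that hardware-bound irqs and purely virtual irqs share one number space, with the GSI kept as per-irq data rather than being the irq number itself.

```c
#define SK_NR_IRQS 16
#define SK_NO_GSI  (-1)

enum sk_irq_type { SK_IRQ_UNBOUND = 0, SK_IRQ_PIRQ, SK_IRQ_EVTCHN };

struct sk_irq_info {
    enum sk_irq_type type;   /* what this irq number is bound to */
    int gsi;                 /* meaningful only for SK_IRQ_PIRQ */
};

/* Zero-initialised, so every entry starts out SK_IRQ_UNBOUND. */
static struct sk_irq_info sk_irq_table[SK_NR_IRQS];

/* Return the lowest free irq number, or -1 if the table is full. */
static int sk_find_unbound_irq(void)
{
    for (int irq = 0; irq < SK_NR_IRQS; irq++)
        if (sk_irq_table[irq].type == SK_IRQ_UNBOUND)
            return irq;
    return -1;
}

/* Allocate an irq for a software source (timer, IPI, inter-domain channel). */
int sk_bind_virtual_irq(void)
{
    int irq = sk_find_unbound_irq();

    if (irq >= 0) {
        sk_irq_table[irq].type = SK_IRQ_EVTCHN;
        sk_irq_table[irq].gsi = SK_NO_GSI;
    }
    return irq;
}

/* Map a GSI to an irq, reusing the existing irq if the GSI is already bound. */
int sk_register_gsi(int gsi)
{
    int irq;

    for (irq = 0; irq < SK_NR_IRQS; irq++)
        if (sk_irq_table[irq].type == SK_IRQ_PIRQ &&
            sk_irq_table[irq].gsi == gsi)
            return irq;      /* shared interrupt line: same irq */

    irq = sk_find_unbound_irq();
    if (irq >= 0) {
        sk_irq_table[irq].type = SK_IRQ_PIRQ;
        sk_irq_table[irq].gsi = gsi;
    }
    return irq;
}

/* Reverse lookup needed when programming an IO APIC pin for an irq. */
int sk_irq_to_gsi(int irq)
{
    if (irq < 0 || irq >= SK_NR_IRQS || sk_irq_table[irq].type != SK_IRQ_PIRQ)
        return SK_NO_GSI;
    return sk_irq_table[irq].gsi;
}
```

In this model a virtual irq allocated first takes number 0, so GSI 9 ends up on irq 1; registering GSI 9 again returns irq 1, which corresponds to the shared-interrupt case.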
Jeremy Fitzhardinge
2008-Nov-20 17:06 UTC
[Xen-devel] Re: [PATCH 36 of 38] xen: route hardware irqs via Xen
Ingo Molnar wrote:
>> +#ifdef CONFIG_XEN_PCI
>> + irq = xen_register_gsi(gsi, triggering, polarity);
>> + if ((int)irq >= 0)
>> + return irq;
>> +#endif
>
> why not change irq to 'int' and avoid the cast?

Yeah, OK.

> also, please eliminate the #ifdef by turning xen_register_gsi() into a
> 'return -1' inline on !CONFIG_XEN_PCI.

OK.

>> +#ifdef CONFIG_XEN_PCI
>> + xen_pci_init();
>> +#endif
>
> hide the #ifdef in a header please. (like you already properly do for
> xen_setup_pirqs())

OK.

>> + if (rc != 0) {
>> if (!probing_irq(irq))
>> printk(KERN_INFO "Failed to obtain physical IRQ %d\n",
>> irq);
>> + dump_stack();
>
> generally it's better to use WARN() or WARN_ONCE() to get good debug
> feedback and stackdumps. (they also document the reason for the dump)

That was really just a temp debug hack; it shouldn't be dumping stack for probes anyway.

>> @@ -523,8 +526,6 @@
>> } else
>> irq = find_unbound_irq();
>>
>> - spin_unlock(&irq_mapping_update_lock);
>> -
>> set_irq_chip_and_handler_name(irq, &xen_pirq_chip,
>> handle_level_irq, "pirq");
>
> hm, looks like a stray bugfix?

Erm, yep.

J
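The header-stub pattern requested in the review could look roughly like the following. This is a sketch under assumptions: the declarations are written as they might appear in a header, sketch_register_gsi() and sketch_native_register_gsi() are hypothetical callers invented for the example, and the identity mapping in the native fallback is a placeholder. When CONFIG_XEN_PCI is not defined, the inline stub returns -1 and the caller falls through to the native path with no #ifdef at the call site.

```c
#include <stdint.h>

/* Header part: the only place that knows about CONFIG_XEN_PCI. */
#ifdef CONFIG_XEN_PCI
int xen_register_gsi(uint32_t gsi, int triggering, int polarity);
void xen_pci_init(void);
#else
static inline int xen_register_gsi(uint32_t gsi, int triggering, int polarity)
{
    (void)gsi;
    (void)triggering;
    (void)polarity;
    return -1;               /* "not handled by Xen": caller uses native path */
}

static inline void xen_pci_init(void)
{
}
#endif

/* Hypothetical native fallback; identity mapping used only for the example. */
static int sketch_native_register_gsi(uint32_t gsi)
{
    return (int)gsi;
}

/* Caller: irq is a plain int, so no cast and no #ifdef are needed here. */
int sketch_register_gsi(uint32_t gsi, int triggering, int polarity)
{
    int irq = xen_register_gsi(gsi, triggering, polarity);

    if (irq >= 0)
        return irq;

    return sketch_native_register_gsi(gsi);
}
```

Built without CONFIG_XEN_PCI, the stub compiles away and sketch_register_gsi() simply returns the native result.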
Jeremy Fitzhardinge
2008-Nov-20 17:07 UTC
[Xen-devel] Re: [PATCH 30 of 38] xen: implement io_apic_ops
Simon Horman wrote:
> Hi,
>
> it seems that if CONFIG_XEN is set but CONFIG_XEN_DOM0 is not set,
> then the call to xen_init_apic() in xen_start_kernel() causes the
> build to fail.
>
> One possible solution to this is to provide a dummy
> version of xen_init_apic() in the !CONFIG_XEN_DOM0 case.
>
> Another possible solution would be to add #ifdef CONFIG_XEN_DOM0
> inside xen_start_kernel()

Ah, OK, will fix that up.

J
Ingo Molnar
2008-Nov-20 19:22 UTC
[Xen-devel] Re: [PATCH 30 of 38] xen: implement io_apic_ops
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:> Ingo Molnar wrote: >> * Jeremy Fitzhardinge <jeremy@goop.org> wrote: >> >> >>> Writes to the IO APIC are paravirtualized via hypercalls, so implement >>> the appropriate operations. >>> >>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> >>> --- >>> arch/x86/xen/Makefile | 3 +- >>> arch/x86/xen/apic.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++ >>> arch/x86/xen/enlighten.c | 2 + >>> arch/x86/xen/xen-ops.h | 2 + >>> 4 files changed, 72 insertions(+), 1 deletion(-) >>> >> >> hm, why is the ioapic used as the API here, and not an irqchip? >> > > In essence, the purpose of the series is to break the 1:1 > relationship between Linux irqs and hardware GSIs. This allows me > to have my own irq allocator, which in turn allows me to intermix > "physical" irqs (ie, a Linux irq number bound to a real hardware > interrupt source) with the various software/virtual irqs the Xen > system needs. > > Once a physical irq has been mapped onto a gsi interrupt source, the > mechanisms for handing the ioapic side of things are more or less > the same. There''s the same procedure of finding the ioapic/pin for > a gsi and programming the appropriate vector. > > (Presumably once I implement MSI support, all references to "gsi" > will become "gsi/msi/etc".) > > So, there''s an awkward tradeoff. I could just completely duplicate > the whole irq/vector/ioapic management code and hide it under my own > irqchip, but it would end up duplicating a lot of the existing code. > My alternative was to try to open out the existing code into > something like a thin ioapic library, which I can call into as > needed. The only low-level difference is that the Xen ioapics need > to be programmed via a hypercall rather than register writes. > > If the x86 interrupt layer in general decouples irqs from GSIs, then > I can probably make use of that to clean things up. 
A general irq > allocator along with some way of attaching interrupt-source-specific > information to each irq would get me a long way, I think. I''d still > need hooks to paravirtualize the actual ioapic writes, but at least > I wouldn''t need to have quite so much delicate hooking.it certainly looks thin enough to me although i''m really not sure we want to virtualize at the IO-APIC level. Peter, what''s your opinion/preference? Ingo _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2008-Nov-20 19:38 UTC
[Xen-devel] Re: [PATCH 30 of 38] xen: implement io_apic_ops
Ingo Molnar wrote:
> it certainly looks thin enough to me although i'm really not sure we
> want to virtualize at the IO-APIC level. Peter, what's your
> opinion/preference?

Given that Xen's requirements here are pretty Xen-specific (I don't imagine that any other virtualization system would work in the same way), I didn't bother trying to come up with a general "virtualization" layer at this level - that's why I was pretty blunt about putting xen_* calls in without any indirection.

But the code could certainly be restructured in a way which would make it simpler to hook Xen as a side-effect, so long as it was achieving some other goal as well (general cleanup, big iron architecture support, better msi handling, whatever...).

J
Jeremy Fitzhardinge
2008-Nov-20 19:39 UTC
[Xen-devel] Re: [PATCH 30 of 38] xen: implement io_apic_ops
Yinghai Lu wrote:
> Ingo Molnar wrote:
>
>> it certainly looks thin enough to me although i'm really not sure we
>> want to virtualize at the IO-APIC level. Peter, what's your
>> opinion/preference?
>
> relative fixed mapping, always make it simple and tracked easily.
> like irq nr == gsi nr...

That was a bit terse for me to understand your point. The code already has irq==gsi. Are you proposing that it stay that way, or something else?

J
Jeremy Fitzhardinge
2008-Nov-20 21:32 UTC
[Xen-devel] Re: [PATCH 30 of 38] xen: implement io_apic_ops
Eric W. Biederman wrote:
> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>> Ingo Molnar wrote:
>>> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>>> Writes to the IO APIC are paravirtualized via hypercalls, so implement
>>>> the appropriate operations.
>>>>
>>>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>>> ---
>>>> arch/x86/xen/Makefile | 3 +-
>>>> arch/x86/xen/apic.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++
>>>> arch/x86/xen/enlighten.c | 2 +
>>>> arch/x86/xen/xen-ops.h | 2 +
>>>> 4 files changed, 72 insertions(+), 1 deletion(-)
>>>
>>> hm, why is the ioapic used as the API here, and not an irqchip?
>>
>> In essence, the purpose of the series is to break the 1:1 relationship between
>> Linux irqs and hardware GSIs.
>
> Bad idea (I think). We have a 1:1 relationship between the linux irq number and
> the GSI because it makes the code dramatically simpler, and it took significant
> work to get there. The concept of an intermediate mapping layer sounds nasty.
> But I haven't yet read the patch.

The changes are spread over a number of patches, but the meat of it is in "xen: route hardware irqs via Xen". It turns out fairly simply, but perhaps it's because I've made a number of simplifying assumptions: interrupts are always IOAPIC based, only using ACPI for routing, no MSI support yet.

But it seems to me that the only time you really care that the irq isn't a gsi is when programming a vector into the ioapics - you need to do an irq -> ioapic/pin mapping anyway, so adding an irq -> gsi -> ioapic/pin map isn't all that complex.

And conversely, when probing devices you need to map gsi->irq to see whether the interrupt is shared, though you could do that on a pure gsi level anyway.

And of course the current code isn't purely irq == gsi anyway, since msis are allocated irqs as well, and there's no underlying gsi. In a sense you can think of the other Xen interrupt sources as being a bit like MSI, at least in as much as they're not sourced from a GSI (but they go further and are not sourced from an IOAPIC at all).

J
Jeremy Fitzhardinge
2008-Nov-21 01:16 UTC
[Xen-devel] Re: [PATCH 30 of 38] xen: implement io_apic_ops
Eric W. Biederman wrote:
> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>> The changes are spread over a number of patches, but the meat of it is in "xen:
>> route hardware irqs via Xen". It turns out fairly simply, but perhaps it's
>> because I've made a number of simplifying assumptions: interrupts are always
>> IOAPIC based, only using ACPI for routing, no MSI support yet.
>>
>> But it seems to me that the only time you really care that the irq isn't a gsi
>> is when programming a vector into the ioapics - you need to do a irq ->
>> ioapic/pin mapping anyway, so adding a irq -> gsi -> ioapic/pin map isn't all
>> that complex.
>
> It is hideous. Been there and ripped out hundreds of lines of useless and problem
> causing code to get here. It is especially bad when you do not identity map the first
> 16 gsi to linux irqs (the legacy isa irqs).

Yes. I made that concession too, and just reserved them as identity mapped legacy irqs.

> Yep. And but the numbers we you should be beyond the range of the gsi's so there
> is no conflict. Think of it an extension of how we identitly make the low 16 linux
> irqs.

Yes, I suppose we can statically partition the irq space. In fact the original 2.6.18-xen dom0 kernel does precisely that, but runs into limitations because of the compile-time limit on NR_IRQS in that kernel. If we move to a purely dynamically allocated irq space, then having a sparse allocation of irqs becomes reasonable again, for msis and vectorless Xen interrupts.

>> In a sense you can think
>> of the other Xen interrupt sources as being a bit like MSI, at least in as much
>> as they're not sourced from a GSI (but they go further and are not sourced from
>> an IOAPIC at all).
>
> MSI isn't sourced from an IOAPIC either.

Right.

> The difference is that the xen sources are not delivered using vectors. The cpu
> vector numbers we do hide and treat as an implementation detail. And I am totally
> happy not going through the vector allocation path.

Right. And in the physical irq event channel case, the vector space is managed by Xen, so we need to use Xen to allocate the vector, then program that into the appropriate place in the ioapic.

> My gut feel says that you just want to use a different set of irq operations when
> doing Xen native and working with hardware interrupts. I haven't seen the code so
> I don't know how you interact there. Except in dom0 this is not a consideration so
> I don't how it is handled.

Yeah. In the domU case, where there are no physical interrupts, the Xen code completely avoids the ioapic/vector stuff, and directly converts an event channel into an irq. Indeed, physical irq delivery is handled the same way; it's just that the setup requires touching the ioapics to program the appropriate vector and bind it to an event channel.

J
Simon Horman
2008-Nov-21 07:17 UTC
[Xen-devel] Re: [PATCH 36 of 38] xen: route hardware irqs via Xen
On Thu, Nov 13, 2008 at 11:10:34AM -0800, Jeremy Fitzhardinge wrote:
> This patch puts the hooks into place so that when the interrupt
> subsystem registers an irq, it gets routed via Xen (if we're running
> under Xen).
>
> The first step is to get a gsi for a particular device+pin. We use
> the normal acpi interrupt routing to do the mapping.
>
> Normally the gsi number is used directly as the irq number. We can't
> do that since we also have irqs for non-hardware event channels, and
> so we must share the irq space between them. A given gsi is only
> allocated a single irq, so re-registering a gsi will simply return the
> same irq.
>
> We therefore allocate an irq for a given gsi, and return that. As a
> special case, we reserve the first 16 irqs for identity-mapping legacy
> irqs, since there's a fair amount of code which assumes that.
>
> Having allocated an irq, we ask Xen to allocate a vector, and then
> bind that pirq/vector to an event channel. When the hardware raises
> an interrupt on a vector, Xen signals us on the corresponding event
> channel, which gets routed to the irq and delivered to the appropriate
> device driver.
>
> This patch does everything except set up the IO APIC pin routing to
> the vector.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
> arch/x86/kernel/acpi/boot.c | 8 +++
> arch/x86/pci/legacy.c | 4 +
> arch/x86/xen/Makefile | 1
> arch/x86/xen/pci.c | 98 +++++++++++++++++++++++++++++++++++++++++++
> arch/x86/xen/xen-ops.h | 1
> drivers/xen/events.c | 9 ++-
> include/asm-x86/xen/pci.h | 7 +++
> include/xen/events.h | 8 +++
> 8 files changed, 132 insertions(+), 4 deletions(-)

[snip]

> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -63,7 +63,6 @@
>  static inline void xen_smp_init(void) {}
>  #endif
>
> -
>  void xen_init_apic(void);
>
>  /* Declare an asm function, along with symbols needed to make it

Hi Jeremy,

This seems like a spurious whitespace change that could be merged into "[PATCH 30 of 38] xen: implement io_apic_ops"

[snip]

--
Simon Horman
VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
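The allocation policy in the patch description quoted above (first 16 irqs identity-mapped, one irq per gsi, irq space shared with non-hardware event channels) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code from arch/x86/xen/pci.c or drivers/xen/events.c: the names `irq_model_init`, `irq_for_gsi`, `irq_for_event_channel` and the table size `NR_IRQS_MODEL` are hypothetical, and the real code additionally asks Xen for a vector and binds an event channel.

```c
#include <assert.h>

#define NR_LEGACY_IRQS	16	/* legacy ISA irqs, always identity-mapped */
#define NR_IRQS_MODEL	64	/* hypothetical size of the modelled irq space */
#define IRQ_FREE	(-1)	/* slot not yet allocated */
#define IRQ_NON_GSI	(-2)	/* slot used by a non-hardware event channel */

/* irq_to_gsi[irq] holds the gsi bound to irq, or one of the markers above. */
static int irq_to_gsi[NR_IRQS_MODEL];

static void irq_model_init(void)
{
	for (int irq = 0; irq < NR_IRQS_MODEL; irq++)
		irq_to_gsi[irq] = irq < NR_LEGACY_IRQS ? irq : IRQ_FREE;
}

/* Claim the first free irq above the legacy range; -1 if the space is full. */
static int alloc_free_irq(int owner)
{
	for (int irq = NR_LEGACY_IRQS; irq < NR_IRQS_MODEL; irq++) {
		if (irq_to_gsi[irq] == IRQ_FREE) {
			irq_to_gsi[irq] = owner;
			return irq;
		}
	}
	return -1;
}

/* Timers, IPIs and inter-domain channels take irqs from the same space. */
static int irq_for_event_channel(void)
{
	return alloc_free_irq(IRQ_NON_GSI);
}

/* Return the irq for a gsi, allocating one on first registration. */
static int irq_for_gsi(int gsi)
{
	if (gsi < 0)
		return -1;
	if (gsi < NR_LEGACY_IRQS)
		return gsi;		/* legacy irqs stay 1:1 */

	for (int irq = NR_LEGACY_IRQS; irq < NR_IRQS_MODEL; irq++)
		if (irq_to_gsi[irq] == gsi)
			return irq;	/* re-registration returns the same irq */

	return alloc_free_irq(gsi);
}
```

In this model a gsi above 15 generally does not get an irq of the same number once any event-channel irq has been allocated first, which is exactly the loss of irq == gsi that the thread on patch 30 debates.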
Ian Campbell
2008-Nov-21 14:21 UTC
[Xen-devel] Re: [PATCH 18 of 38] x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
On Wed, 2008-11-19 at 11:19 +0900, FUJITA Tomonori wrote:
> The problem that I talked about in the previous mail:
>
> > max_slots = mask + 1
> > ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
> > : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
>
> Since the popular value of the mask is 0xffffffff. So the above code
> (mask + 1 ?) works wrongly if the size of mask is 32bit (well,
> accidentally the result of max_slots is identical though).

I've just been looking at this again and I don't think it is an accident that this evaluates to the correct value when mask + 1 == 0.

The patch which adds the "mask + 1 ? ... : 1UL << ..." stuff is:

    commit b15a3891c916f32a29832886a053a48be2741d4d
    Author: Jan Beulich <jbeulich@novell.com>
    Date: Thu Mar 13 09:13:30 2008 +0000

        avoid endless loops in lib/swiotlb.c

        Commit 681cc5cd3efbeafca6386114070e0bfb5012e249 ("iommu sg merging:
        swiotlb: respect the segment boundary limits") introduced two
        possibilities for entering an endless loop in lib/swiotlb.c:

        - if max_slots is zero (possible if mask is ~0UL)
        [...]

I think the existing code is the nicest way to handle this corner case and it is necessary anyway to handle the ~0UL case on 64 bit.

Ian.
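To make the corner case in the message above concrete, here is a small user-space sketch of the quoted `max_slots` expression. It is an illustration rather than the kernel code: a 32-bit `unsigned long` is modelled with `uint32_t`, `IO_TLB_SHIFT` is assumed to be 11 (2 KiB slots), and the helper names `align_up32`, `align_up64`, `max_slots_32` and `max_slots_wide` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 KiB swiotlb slots, i.e. IO_TLB_SHIFT == 11. */
#define IO_TLB_SHIFT 11

/* Round x up to a multiple of a, where a is a power of two. */
static uint32_t align_up32(uint32_t x, uint32_t a)
{
	return (x + (a - 1)) & ~(a - 1);
}

static uint64_t align_up64(uint64_t x, uint64_t a)
{
	return (x + (a - 1)) & ~(a - 1);
}

/* The quoted expression with a 32-bit "unsigned long" (BITS_PER_LONG == 32). */
static uint32_t max_slots_32(uint32_t mask)
{
	uint32_t next = mask + 1;	/* wraps to 0 when mask == 0xffffffff */

	return next ? align_up32(next, UINT32_C(1) << IO_TLB_SHIFT) >> IO_TLB_SHIFT
		    : UINT32_C(1) << (32 - IO_TLB_SHIFT);
}

/* Reference value: the same quantity computed in 64 bits, where mask + 1
 * cannot wrap for a 32-bit mask. */
static uint64_t max_slots_wide(uint32_t mask)
{
	uint64_t next = (uint64_t)mask + 1;

	return align_up64(next, UINT64_C(1) << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
}
```

For a mask of 0xffffffff the 32-bit version takes the fallback branch and yields 1 << 21 slots, which is the same number the wide computation gives for a 4 GiB boundary (2^32 / 2^11), so the two agree even though the 32-bit sum wrapped.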
Ian Campbell
2008-Nov-24 11:41 UTC
[Xen-devel] Re: [PATCH 18 of 38] x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
On Sat, 2008-11-22 at 10:49 +0900, FUJITA Tomonori wrote:
> On Fri, 21 Nov 2008 14:21:32 +0000
> Ian Campbell <Ian.Campbell@citrix.com> wrote:
>
> > On Wed, 2008-11-19 at 11:19 +0900, FUJITA Tomonori wrote:
> > > The problem that I talked about in the previous mail:
> > >
> > > > max_slots = mask + 1
> > > > ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
> > > > : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
> > >
> > > Since the popular value of the mask is 0xffffffff. So the above code
> > > (mask + 1 ?) works wrongly if the size of mask is 32bit (well,
> > > accidentally the result of max_slots is identical though).
> >
> > I've just been looking at this again and I don't think it is an accident
> > that this evaluates to the correct value when mask + 1 == 0.
> >
> > The patch which adds the "mask + 1 ? ... : 1UL << ..." stuff is:
> >
> > commit b15a3891c916f32a29832886a053a48be2741d4d
> > Author: Jan Beulich <jbeulich@novell.com>
> > Date: Thu Mar 13 09:13:30 2008 +0000
> >
> > avoid endless loops in lib/swiotlb.c
> >
> > Commit 681cc5cd3efbeafca6386114070e0bfb5012e249 ("iommu sg merging:
> > swiotlb: respect the segment boundary limits") introduced two
> > possibilities for entering an endless loop in lib/swiotlb.c:
> >
> > - if max_slots is zero (possible if mask is ~0UL)
> > [...]
> >
> > I think the existing code is the nicest way to handle this corner case
> > and it is necessary anyway to handle the ~0UL case on 64 bit.
>
> Ah, I vaguely remember this patch. The ~0ULL mask didn't happen here
> (nobody uses it) so the possibility was false. IMHO, if we use this
> code on 32bit architectures, the mask should be u64 and the overflow
> should be handled explicitly. But as you pointed out, looks like that
> this patch takes account of the overflow.

Something like this?

Ian.

---
swiotlb: explicitly handle segment boundary mask overflow.

When swiotlb is used on 32 bit we can overflow mask + 1 in the common case where mask is 0xffffffffUL. This overflow was previously caught by the case which attempts to handle a mask of ~0UL on 64 bit.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

diff -r 5fa30e5284dd lib/swiotlb.c
--- a/lib/swiotlb.c Mon Nov 24 09:39:50 2008 +0000
+++ b/lib/swiotlb.c Mon Nov 24 11:37:39 2008 +0000
@@ -303,7 +303,7 @@
         unsigned int nslots, stride, index, wrap;
         int i;
         unsigned long start_dma_addr;
-        unsigned long mask;
+        u64 mask;
         unsigned long offset_slots;
         unsigned long max_slots;
 
@@ -314,6 +314,7 @@
         max_slots = mask + 1
                     ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
                     : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+        BUG_ON(max_slots > 1UL << (BITS_PER_LONG - IO_TLB_SHIFT));
 
         /*
          * For mappings greater than a page, we limit the stride (and
Jeremy Fitzhardinge
2008-Nov-24 19:18 UTC
[Xen-devel] Re: [PATCH 30 of 38] xen: implement io_apic_ops
Eric W. Biederman wrote:
> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>> Yes, I suppose we can statically partition the irq space. In fact the original
>> 2.6.18-xen dom0 kernel does precisely that, but runs into limitations because of
>> the compile-time limit on NR_IRQS in that kernel. If we move to a purely
>> dynamically allocated irq space, then having a sparse allocation of irqs becomes
>> reasonable again, for msis and vectorless Xen interrupts.
>>
>>> The difference is that the xen sources are not delivered using vectors. The cpu
>>> vector numbers we do hide and treat as an implementation detail. And I am totally
>>> happy not going through the vector allocation path.
>>
>> Right. And in the physical irq event channel case, the vector space is managed
>> by Xen, so we need to use Xen to allocate the vector, then program that into the
>> appropriate place in the ioapic.
>
> We should be able to share code with iommu for irqs handling, at first glance you
> are describing a pretty similar problem. Now I don't know think the interrupt
> remapping code is any kind of beauty but that seems to be roughly what you
> are doing with Xen domU. I certainly think with some careful factoring
> we can share the ioapic munging code. And the code to pick how we program
> the ioapics.

Notwithstanding the possibility that there'll be general changes to x86 interrupt handling in the future, do you have any objection to my patches as they stand? Ingo would like to see your and/or hpa's ack before accepting them. Should I repost them?

Thanks,
J
Ian Campbell
2008-Nov-26 09:36 UTC
[Xen-devel] Re: [PATCH 18 of 38] x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
On Wed, 2008-11-26 at 11:53 +0900, FUJITA Tomonori wrote:
> > + BUG_ON(max_slots > 1UL << (BITS_PER_LONG - IO_TLB_SHIFT));
>
> How can this BUG_ON happen? Using u64 for the mask is fine though.

It covers the cases where the previous code would have overflowed. It can't happen right now because although mask is 64 bits the value assigned to it is currently sizeof(unsigned long). If someone changes the type of that field then we would start seeing unexpected values.

Ian.
Ian Campbell
2008-Nov-27 17:14 UTC
[Xen-devel] Re: [PATCH 18 of 38] x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
On Thu, 2008-11-27 at 12:43 +0900, FUJITA Tomonori wrote:
> On Wed, 26 Nov 2008 09:36:49 +0000
> Ian Campbell <Ian.Campbell@citrix.com> wrote:
>
> > On Wed, 2008-11-26 at 11:53 +0900, FUJITA Tomonori wrote:
> > > > + BUG_ON(max_slots > 1UL << (BITS_PER_LONG - IO_TLB_SHIFT));
> > >
> > > How can this BUG_ON happen? Using u64 for the mask is fine though.
> >
> > It covers the cases where the previous code would have overflowed. It
> > can't happen right now because although mask is 64 bits the value
> > assigned to it is currently sizeof(unsigned long). If someone changes
> > the type of that field then we would start seeing unexpected values.
>
> If someone changes dma_get_seg_boundary to return a u64 value instead
> of unsigned long, this BUG_ON could happen on 32bit architectures. But
> you don't need to trigger BUG_ON for it. max_slots > 1UL <<
> (BITS_PER_LONG - IO_TLB_SHIFT) should be fine for
> iommu_is_span_boundary().
>
> Anyway, this is minor but would it be nice to make sure that anyone
> can easily understand the code without digging into the git log?
>
> a) dropping this patch and adding some comments how the code works
> (especially about the overflow on 32bit architectures).
>
> b) removing the BUG_ON in this patch and adding some comments.

Yes, I think adding a comment to the existing code (option a) would be best.

I actually have a small queue of other fixes which make swiotlb work properly for x86 PAE and HighMem but they are not particularly well baked at the moment. I'll include a patch to add a comment in that series.

Ian.
Jiang, Yunhong
2009-Mar-23 09:20 UTC
RE: [Xen-devel] [PATCH 00 of 38] xen: add more Xen dom0 support
We are working on the MSI support now. I'd get some guide on how the pirq is defined in pv dom0, can I assume it is something like xen interrupt controller's gsi?

Thanks
Yunhong Jiang

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jeremy Fitzhardinge
Sent: 2008-11-14 3:10
To: Ingo Molnar
Cc: the arch/x86 maintainers; Xen-devel; linux-kernel@vger.kernel.org; Ian Campbell
Subject: [Xen-devel] [PATCH 00 of 38] xen: add more Xen dom0 support

TODO: work out how to fit MSI into all this.
Jeremy Fitzhardinge
2009-Mar-23 16:27 UTC
Re: [Xen-devel] [PATCH 00 of 38] xen: add more Xen dom0 support
Jiang, Yunhong wrote:
> We are working on the MSI support now. I'd get some guide on how the pirq is defined in pv dom0, can I assume it is something like xen interrupt controller's gsi?

The pirq we pass to Xen is the Linux irq we allocate for the interrupt source. For compatibility with native, we maintain an irq == gsi identity, so the Xen pirq will also equal the gsi. However, there's no problem in allocating irqs above the highest gsi for msi use; you could either adapt xen_allocate_pirq or add an msi variant. I think the first is preferred if possible; you'd just use "gsi" to refer to msis too, if they have easily distinguished identifiers.

Are you thinking of putting a Xen hook in setup_msi_irq?

Thanks,
J
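The scheme described in the reply above (irq == gsi == pirq for IO APIC sources, with MSI irqs handed out above the highest gsi) could look roughly like the following user-space model. It is a sketch under assumptions and not the implementation of xen_allocate_pirq: the gsi count `NR_GSIS_MODEL`, the table size `NR_IRQS_MODEL`, and the functions `pirq_for_gsi` and `pirq_for_msi` are all hypothetical names chosen for illustration.

```c
#include <assert.h>

#define NR_GSIS_MODEL	24	/* hypothetical: a single 24-pin IO APIC */
#define NR_IRQS_MODEL	64	/* hypothetical size of the modelled irq space */

/* Next irq to hand out for sources that have no gsi (e.g. MSI). */
static int next_dynamic_irq = NR_GSIS_MODEL;

/* IO APIC sources keep the identity irq == gsi == pirq. */
static int pirq_for_gsi(int gsi)
{
	return (gsi >= 0 && gsi < NR_GSIS_MODEL) ? gsi : -1;
}

/* MSI sources get irqs from the range above the highest gsi. */
static int pirq_for_msi(void)
{
	if (next_dynamic_irq >= NR_IRQS_MODEL)
		return -1;	/* modelled irq space exhausted */
	return next_dynamic_irq++;
}
```

Keeping the two ranges disjoint means an MSI irq can never collide with a gsi-derived one, which is the static partitioning idea raised earlier in the thread.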
Jiang, Yunhong
2009-Mar-24 02:23 UTC
RE: [Xen-devel] [PATCH 00 of 38] xen: add more Xen dom0 support
>-----Original Message-----
>From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
>Sent: 2009-03-24 0:28
>To: Jiang, Yunhong
>Cc: Xen-devel
>Subject: Re: [Xen-devel] [PATCH 00 of 38] xen: add more Xen dom0 support
>
>Jiang, Yunhong wrote:
>> We are working on the MSI support now. I'd get some guide on how the pirq is defined in pv dom0, can I assume it is something like xen interrupt controller's gsi?
>
>The pirq we pass to Xen is the Linux irq we allocate for the interrupt
>source. For compatibility with native, we maintain an irq == gsi
>identity, so the Xen pirq will also equal the gsi. However, there's no
>problem in allocating irqs above the highest gsi for msi use; you could
>either adapt xen_allocate_pirq or add an msi variant. I think the first
>is preferred if possible; you'd just use "gsi" to refer to msis too, if
>they have easily distinguished identifiers.

I think both should be ok for dom0. Should the driver domain support be considered also now?

>Are you thinking of putting a Xen hook in setup_msi_irq?

Yes, exactly, I assume it is acceptable for changes under arch/x86/kernel/, like what changes for IOAPIC, am I right?

Thanks
Yunhong Jiang

>Thanks,
>J
Jeremy Fitzhardinge
2009-Mar-24 02:52 UTC
Re: [Xen-devel] [PATCH 00 of 38] xen: add more Xen dom0 support
Jiang, Yunhong wrote:
> I think both should be ok for dom0. Should the driver domain support be considered also now?

I've only given them secondary consideration; keep them in mind, don't do anything which precludes them from working, but no explicit effort to make it work. PCI Passthrough is probably more important to keep in mind (though there'd be a fair amount of overlap, I'd assume).

>> Are you thinking of putting a Xen hook in setup_msi_irq?
>
> Yes, exactly, I assume it is acceptable for changes under arch/x86/kernel/, like what changes for IOAPIC, am I right?

Yes, it should be OK.

J