Yuji Shimada
2008-Dec-02 06:25 UTC
[Xen-devel] [PATCH 0/3] dom0 linux: Add high MMIO area support
This series of patches adds high MMIO area support to dom0 Linux. The patches are useful when we reassign a page-aligned memory resource to a device or when we hot-add a device.

1. Use _CRS for PCI resource allocation (most of the code is backported from 2.6.26).
2. Sort PCI resources based on allocation priority.
3. Support 64-bit PREF base/limit.

Some might think backporting is not a good idea, but we have to use a dom0 Linux based on 2.6.18 for the moment. I'd like these patches to be accepted.

Thanks,
--
Yuji Shimada

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Yuji Shimada
2008-Dec-02 06:35 UTC
[Xen-devel] [PATCH 1/3] dom0 linux: Use _CRS for PCI resource allocation.
This patch adds code to use _CRS for PCI resource allocation. To use _CRS, add "pci=use_crs" to the dom0 Linux boot parameters.

Without this patch, MMIO resources are allocated from the e820 gap, but the e820 gap covers only the low MMIO area. _CRS reports the high MMIO area as well as the low MMIO area, so with this patch we can use the high MMIO area.

Most of the code is backported from 2.6.26.

Thanks,
--
Yuji Shimada

Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>

diff -r cdc6729dc702 arch/i386/pci/acpi.c
--- a/arch/i386/pci/acpi.c	Fri Nov 28 13:41:38 2008 +0000
+++ b/arch/i386/pci/acpi.c	Mon Dec 01 19:09:12 2008 +0900
@@ -5,27 +5,228 @@
 #include <asm/numa.h>
 #include "pci.h"
 
-struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_device *device, int domain, int busnum)
+/* This struct is backported from 2.6.26 kernel */
+struct pci_root_info {
+	char *name;
+	unsigned int res_num;
+	struct resource *res;
+	struct pci_bus *bus;
+	int busnum;
+};
+
+struct pci_sysdata {
+	int domain;		/* PCI domain */
+	int node;		/* NUMA node */
+};
+
+/* This function is backported from 2.6.26 kernel */
+static acpi_status __devinit
+resource_to_addr(struct acpi_resource *resource,
+			struct acpi_resource_address64 *addr)
+{
+	acpi_status status;
+
+	status = acpi_resource_to_address64(resource, addr);
+	if (ACPI_SUCCESS(status) &&
+	    (addr->resource_type == ACPI_MEMORY_RANGE ||
+	    addr->resource_type == ACPI_IO_RANGE) &&
+	    addr->address_length > 0 &&
+	    addr->producer_consumer == ACPI_PRODUCER) {
+		return AE_OK;
+	}
+	return AE_ERROR;
+}
+
+/* This function is backported from 2.6.26 kernel */
+static acpi_status __devinit
+count_resource(struct acpi_resource *acpi_res, void *data)
+{
+	struct pci_root_info *info = data;
+	struct acpi_resource_address64 addr;
+	acpi_status status;
+
+	if (info->res_num >= PCI_BUS_NUM_RESOURCES)
+		return AE_OK;
+
+	status = resource_to_addr(acpi_res, &addr);
+	if (ACPI_SUCCESS(status))
+		info->res_num++;
+
+	return AE_OK;
+}
+
+/* This function is backported from 2.6.26 kernel */
+static acpi_status __devinit
+setup_resource(struct acpi_resource *acpi_res, void *data)
+{
+	struct pci_root_info *info = data;
+	struct resource *res;
+	struct acpi_resource_address64 addr;
+	acpi_status status;
+	unsigned long flags;
+	struct resource *root;
+
+	if (info->res_num >= PCI_BUS_NUM_RESOURCES)
+		return AE_OK;
+
+	status = resource_to_addr(acpi_res, &addr);
+	if (!ACPI_SUCCESS(status)) {
+		return AE_OK;
+	}
+
+	if (addr.resource_type == ACPI_MEMORY_RANGE) {
+		root = &iomem_resource;
+		flags = IORESOURCE_MEM;
+		if (addr.info.mem.caching == ACPI_PREFETCHABLE_MEMORY)
+			flags |= IORESOURCE_PREFETCH;
+	} else if (addr.resource_type == ACPI_IO_RANGE) {
+		root = &ioport_resource;
+		flags = IORESOURCE_IO;
+	} else
+		return AE_OK;
+
+	res = &info->res[info->res_num];
+	res->name = info->name;
+	res->flags = flags;
+	res->start = addr.minimum + addr.translation_offset;
+	res->end = res->start + addr.address_length - 1;
+	res->child = NULL;
+	printk(KERN_DEBUG "PCI: ACPI resource [%llx-%llx:%lx] for %s\n",
+		(unsigned long long)res->start, (unsigned long long)res->end,
+		(unsigned long)res->flags, info->name);
+
+	if (insert_resource(root, res)) {
+		printk(KERN_ERR "PCI: Failed to allocate %llx-%llx from %s"
+			" for %s\n", (unsigned long long)res->start,
+			(unsigned long long)res->end, root->name, info->name);
+	} else {
+		info->bus->resource[info->res_num] = res;
+		info->res_num++;
+	}
+	return AE_OK;
+}
+
+/* This function is backported from 2.6.26 kernel */
+static void __devinit adjust_transparent_bridge_resources(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		int i;
+		u16 class = dev->class >> 8;
+
+		if (class == PCI_CLASS_BRIDGE_PCI && dev->transparent) {
+			for(i = 3; i < PCI_BUS_NUM_RESOURCES; i++)
+				dev->subordinate->resource[i] =
+						dev->bus->resource[i - 3];
+		}
+	}
+}
+
+/* This function is backported from 2.6.26 kernel */
+static void __devinit
+get_current_resources(struct acpi_device *device, int busnum,
+			int domain, struct pci_bus *bus)
+{
+	struct pci_root_info info;
+	size_t size;
+
+	info.bus = bus;
+	info.res_num = 0;
+	info.name = kmalloc(16, GFP_KERNEL);
+	if (!info.name)
+		goto res_alloc_fail;
+	sprintf(info.name, "PCI Bus %04x:%02x", domain, busnum);
+
+	acpi_walk_resources(device->handle, METHOD_NAME__CRS,
+				count_resource, &info);
+	if (!info.res_num)
+		return;
+
+	size = sizeof(*info.res) * info.res_num;
+	info.res = kmalloc(size, GFP_KERNEL);
+	if (!info.res) {
+		printk(KERN_ERR "PCI: Failed to allocate resource structure "
+			"for %s\n", info.name);
+		goto name_alloc_fail;
+	}
+
+	info.res_num = 0;
+	acpi_walk_resources(device->handle, METHOD_NAME__CRS,
+				setup_resource, &info);
+	if (info.res_num) {
+		adjust_transparent_bridge_resources(bus);
+	}
+
+	return;
+
+name_alloc_fail:
+	kfree(info.res);
+res_alloc_fail:
+	return;
+}
+
+/* This function is backported from 2.6.26 kernel */
+struct pci_bus * __devinit
+pci_acpi_scan_root(struct acpi_device *device, int domain, int busnum)
 {
 	struct pci_bus *bus;
+	struct pci_sysdata *sd;
+	int node;
+#ifdef CONFIG_ACPI_NUMA
+	int pxm;
+#endif
 
-	if (domain != 0) {
-		printk(KERN_WARNING "PCI: Multiple domains not supported\n");
+	node = -1;
+#ifdef CONFIG_ACPI_NUMA
+	pxm = acpi_get_pxm(device->handle);
+	if (pxm >= 0)
+		node = pxm_to_node(pxm);
+#endif
+
+	/* Allocate per-root-bus (not per bus) arch-specific data.
+	 * TODO: leak; this memory is never freed.
+	 * It's arguable whether it's worth the trouble to care.
+	 */
+	sd = kzalloc(sizeof(*sd), GFP_KERNEL);
+	if (!sd) {
+		printk(KERN_ERR "PCI: OOM, not probing PCI bus %02x\n", busnum);
 		return NULL;
 	}
 
-	bus = pcibios_scan_root(busnum);
+	sd->domain = domain;
+	sd->node = node;
+	/*
+	 * Maybe the desired pci bus has been already scanned. In such case
+	 * it is unnecessary to scan the pci bus with the given domain,busnum.
+	 */
+	bus = pci_find_bus(domain, busnum);
+	if (bus) {
+		/*
+		 * If the desired bus exists, the content of bus->sysdata will
+		 * be replaced by sd.
+		 */
+		memcpy(bus->sysdata, sd, sizeof(*sd));
+		kfree(sd);
+	} else
+		bus = pci_scan_bus_parented(NULL, busnum, &pci_root_ops, sd);
+
+	if (!bus)
+		kfree(sd);
+
 #ifdef CONFIG_ACPI_NUMA
-	if (bus != NULL) {
-		int pxm = acpi_get_pxm(device->handle);
+	if (bus) {
 		if (pxm >= 0) {
-			bus->sysdata = (void *)(unsigned long)pxm_to_node(pxm);
-			printk("bus %d -> pxm %d -> node %ld\n",
-				busnum, pxm, (long)(bus->sysdata));
+			printk(KERN_DEBUG "bus %02x -> pxm %d -> node %d\n",
+				busnum, pxm, pxm_to_node(pxm));
 		}
 	}
 #endif
-
+
+	if (bus && (pci_probe & PCI_USE__CRS)) {
+		get_current_resources(device, busnum, domain, bus);
+	}
+
 	return bus;
 }
diff -r cdc6729dc702 arch/i386/pci/common.c
--- a/arch/i386/pci/common.c	Fri Nov 28 13:41:38 2008 +0000
+++ b/arch/i386/pci/common.c	Mon Dec 01 19:09:12 2008 +0900
@@ -260,6 +260,9 @@ char * __devinit pcibios_setup(char *st
 	} else if (!strcmp(str, "assign-busses")) {
 		pci_probe |= PCI_ASSIGN_ALL_BUSSES;
 		return NULL;
+	} else if (!strcmp(str, "use_crs")) {
+		pci_probe |= PCI_USE__CRS;
+		return NULL;
 	} else if (!strcmp(str, "routeirq")) {
 		pci_routeirq = 1;
 		return NULL;
diff -r cdc6729dc702 arch/i386/pci/pci.h
--- a/arch/i386/pci/pci.h	Fri Nov 28 13:41:38 2008 +0000
+++ b/arch/i386/pci/pci.h	Mon Dec 01 19:09:12 2008 +0900
@@ -25,6 +25,7 @@
 #define PCI_ASSIGN_ROMS		0x1000
 #define PCI_BIOS_IRQ_SCAN	0x2000
 #define PCI_ASSIGN_ALL_BUSSES	0x4000
+#define PCI_USE__CRS		0x10000
 
 extern unsigned int pci_probe;
 extern unsigned long pirq_table_addr;
diff -r cdc6729dc702 include/asm-i386/pci.h
--- a/include/asm-i386/pci.h	Fri Nov 28 13:41:38 2008 +0000
+++ b/include/asm-i386/pci.h	Mon Dec 01 19:09:12 2008 +0900
@@ -4,6 +4,22 @@
 
 #ifdef __KERNEL__
 #include <linux/mm.h>		/* for struct page */
+
+struct pci_sysdata {
+	int		domain;		/* PCI domain */
+	int		node;		/* NUMA node */
+};
+
+static inline int pci_domain_nr(struct pci_bus *bus)
+{
+	struct pci_sysdata *sd = bus->sysdata;
+	return sd->domain;
+}
+
+static inline int pci_proc_domain(struct pci_bus *bus)
+{
+	return pci_domain_nr(bus);
+}
 
 /* Can be used to override the logic in pci_scan_bus for skipping
    already-configured bus numbers - to be used for buggy BIOSes
@@ -116,4 +132,14 @@ static inline void pci_dma_burst_advice(
 /* generic pci stuff */
 #include <asm-generic/pci.h>
 
+#ifdef CONFIG_NUMA
+/* Returns the node based on pci bus */
+static inline int __pcibus_to_node(struct pci_bus *bus)
+{
+	struct pci_sysdata *sd = bus->sysdata;
+
+	return sd->node;
+}
+#endif
+
 #endif /* __i386_PCI_H */
diff -r cdc6729dc702 include/asm-i386/topology.h
--- a/include/asm-i386/topology.h	Fri Nov 28 13:41:38 2008 +0000
+++ b/include/asm-i386/topology.h	Mon Dec 01 19:09:12 2008 +0900
@@ -67,7 +67,7 @@ static inline int node_to_first_cpu(int 
 	return first_cpu(mask);
 }
 
-#define pcibus_to_node(bus) ((long) (bus)->sysdata)
+#define pcibus_to_node(bus) __pcibus_to_node(bus)
 #define pcibus_to_cpumask(bus)	node_to_cpumask(pcibus_to_node(bus))
 
 /* sched_domains SD_NODE_INIT for NUMAQ machines */
Yuji Shimada
2008-Dec-02 06:35 UTC
[Xen-devel] [PATCH 2/3] dom0 linux: Sort PCI resource based on priority of allocation.
This patch adds sorting of PCI resources based on allocation priority. The priority is as follows:

1. I/O resources
2. Prefetchable high MMIO
3. Prefetchable low MMIO
4. High MMIO
5. Low MMIO

This patch takes effect only when the "pci=use_crs" boot parameter is specified.

Thanks,
--
Yuji Shimada

Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>

diff -r 13c0881b6f9d arch/i386/pci/acpi.c
--- a/arch/i386/pci/acpi.c	Mon Dec 01 21:04:16 2008 +0900
+++ b/arch/i386/pci/acpi.c	Mon Dec 01 21:05:35 2008 +0900
@@ -106,6 +106,75 @@ setup_resource(struct acpi_resource *acp
 	return AE_OK;
 }
 
+static void __devinit sort_resources(struct pci_bus *bus)
+{
+	struct resource *res, *work;
+	int i, j;
+
+	for (i=0; i<(PCI_BUS_NUM_RESOURCES-1); i++) {
+		for (j=(PCI_BUS_NUM_RESOURCES-1); j>i; j--) {
+			res = bus->resource[j];
+			work = bus->resource[j-1];
+			if (!res || !work)
+				continue;
+
+			/* res is MMIO resource and work is I/O resource
+			 * they should not be swapped */
+			if ((res->flags & IORESOURCE_MEM) &&
+			    (work->flags & IORESOURCE_IO))
+				continue;
+
+			/* both are I/O resources */
+			if ((res->flags & IORESOURCE_IO) &&
+			    (work->flags & IORESOURCE_IO))
+				/* work's size is bigger than or equal to
+				 * res's; they should not be swapped */
+				if ((work->end - work->start) >=
+				    (res->end - res->start))
+					continue;
+
+			/* both are MMIO resources */
+			if ((res->flags & IORESOURCE_MEM) &&
+			    (work->flags & IORESOURCE_MEM)) {
+				/* res isn't prefetchable and
+				 * work is prefetchable */
+				if (!(res->flags & IORESOURCE_PREFETCH) &&
+				    (work->flags & IORESOURCE_PREFETCH))
+					continue;
+
+				/* both are prefetchable or
+				 * both are not prefetchable */
+				if (((res->flags & IORESOURCE_PREFETCH) &&
+				     (work->flags & IORESOURCE_PREFETCH)) ||
+				    (!(res->flags & IORESOURCE_PREFETCH) &&
+				     !(work->flags & IORESOURCE_PREFETCH))) {
+
+					/* res is Low area and work is High area
+					 * they should not be swapped */
+					if ((res->start >> 32) == 0 &&
+					    (work->start >> 32) != 0)
+						continue;
+
+					/* both are in the same area (High or Low) */
+					if (((res->start >> 32) != 0 &&
+					     (work->start >> 32) != 0) ||
+					    ((res->start >> 32) == 0 &&
+					     (work->start >> 32) == 0))
+						/* work's size is bigger than or
+						 * equal to res's size;
+						 * they should not be swapped */
+						if ((work->end - work->start) >=
+						    (res->end - res->start))
+							continue;
+				}
+			}
+			/* swap res and work */
+			bus->resource[j-1] = res;
+			bus->resource[j] = work;
+		}
+	}
+}
+
 /* This function is backported from 2.6.26 kernel */
 static void __devinit adjust_transparent_bridge_resources(struct pci_bus *bus)
 {
@@ -155,6 +224,7 @@ get_current_resources(struct acpi_device
 	acpi_walk_resources(device->handle, METHOD_NAME__CRS,
 				setup_resource, &info);
 	if (info.res_num) {
+		sort_resources(bus);
 		adjust_transparent_bridge_resources(bus);
 	}
Yuji Shimada
2008-Dec-02 06:35 UTC
[Xen-devel] [PATCH 3/3] dom0 linux: Support 64 bit PREF base/limit.
This patch adds support for 64-bit PREF base/limit.

Thanks,
--
Yuji Shimada

Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>

diff -r f0f7aa8b1367 drivers/pci/setup-bus.c
--- a/drivers/pci/setup-bus.c	Fri Nov 28 11:50:04 2008 +0900
+++ b/drivers/pci/setup-bus.c	Fri Nov 28 15:56:02 2008 +0900
@@ -150,6 +150,7 @@ pci_setup_bridge(struct pci_bus *bus)
 	struct pci_dev *bridge = bus->self;
 	struct pci_bus_region region;
 	u32 l, io_upper16;
+	u32 base_up32, limit_up32;
 
 	DBG(KERN_INFO "PCI: Bridge: %s\n", pci_name(bridge));
 
@@ -203,17 +204,23 @@ pci_setup_bridge(struct pci_bus *bus)
 	if (bus->resource[2]->flags & IORESOURCE_PREFETCH) {
 		l = (region.start >> 16) & 0xfff0;
 		l |= region.end & 0xfff00000;
-		DBG(KERN_INFO "  PREFETCH window: %08lx-%08lx\n",
-		    region.start, region.end);
+		DBG(KERN_INFO "  PREFETCH window: %llx-%llx\n",
+		    (unsigned long long)region.start,
+		    (unsigned long long)region.end);
+		base_up32 = (region.start >> 32) & 0xffffffff;
+		limit_up32 = (region.end >> 32) & 0xffffffff;
 	}
 	else {
 		l = 0x0000fff0;
+		base_up32 = 0xffffffff;
+		limit_up32 = 0;
 		DBG(KERN_INFO "  PREFETCH window: disabled.\n");
 	}
 	pci_write_config_dword(bridge, PCI_PREF_MEMORY_BASE, l);
 
-	/* Clear out the upper 32 bits of PREF base. */
-	pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, 0);
+	/* Set up the upper 32 bits of PREF base/limit. */
+	pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, base_up32);
+	pci_write_config_dword(bridge, PCI_PREF_LIMIT_UPPER32, limit_up32);
 
 	pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, bus->bridge_ctl);
 }
Keir Fraser
2008-Dec-02 08:35 UTC
[Xen-devel] Re: [PATCH 0/3] dom0 linux: Add high MMIO area support
How about using http://xenbits.xensource.com/ext/linux-2.6.27-xen.hg for this sort of thing until pv_ops is readier?

linux-2.6.18-xen.hg would like a peaceful and stable old age. :-)

 -- Keir
Yuji Shimada
2008-Dec-03 02:41 UTC
[Xen-devel] Re: [PATCH 0/3] dom0 linux: Add high MMIO area support
I'd like to include high MMIO area support in dom0 Linux for Xen 3.4.

Which Linux will Xen 3.4 support? I expect Xen 3.4 will support both upstream Linux and linux-2.6.18-xen.hg. I think it is difficult to merge all of the functionality in linux-2.6.18-xen.hg into upstream Linux. Will Xen 3.4 support linux-2.6.27-xen.hg too?

I have another question. If we have some functionality that should be included in a future dom0 Linux, what should we do? Should we submit patches to a Linux-related mailing list?

Thanks,
--
Yuji Shimada
Keir Fraser
2008-Dec-03 09:22 UTC
[Xen-devel] Re: [PATCH 0/3] dom0 linux: Add high MMIO area support
I suppose it depends on what functionality we think the 2.6.18 tree should have. We created the 2.6.27 tree specifically to allow development of new platform features without needing large backports to old 2.6.18.

Your concern over what dom0 will be 'supported' by 3.4 is confused -- we ship no guest kernels with Xen releases. Most users will install vendor Xen kernels, all of which are newer than 2.6.18 and will not pick up platform-support backports from our 2.6.18 tree. Even if they want a Xen-specific feature present only in 2.6.18, many will be put off trying that feature out by the fact that 2.6.18 is tragically old. They'll rightly fear feature regressions and security holes (compared with newer kernels).

 -- Keir