Greetings,

The following patches add support for the SR-IOV capability to the Linux kernel. With these patches, a PCI device that has this capability can be turned into multiple devices from the software perspective, which benefits KVM and serves other purposes such as QoS and security. The Physical Function and Virtual Function drivers using the SR-IOV APIs will come soon!

Major changes from v6 to v7:
1, remove boot-time resource rebalancing support. (Greg KH)
2, emit uevent when the PF driver is loaded. (Greg KH)
3, put the SR-IOV callback function into the 'pci_driver'. (Matthew Wilcox)
4, register the SR-IOV service at the PF loading stage.
5, remove unnecessary APIs (pci_iov_enable/disable).

---

[PATCH 1/13 v7] PCI: enhance pci_ari_enabled()
[PATCH 2/13 v7] PCI: remove unnecessary arg of pci_update_resource()
[PATCH 3/13 v7] PCI: define PCI resource names in an 'enum'
[PATCH 4/13 v7] PCI: remove unnecessary condition check in pci_restore_bars()
[PATCH 5/13 v7] PCI: export __pci_read_base()
[PATCH 6/13 v7] PCI: make pci_alloc_child_bus() be able to handle NULL bridge
[PATCH 7/13 v7] PCI: add a new function to map BAR offset
[PATCH 8/13 v7] PCI: cleanup pci_bus_add_devices()
[PATCH 9/13 v7] PCI: split a new function from pci_bus_add_devices()
[PATCH 10/13 v7] PCI: support the SR-IOV capability
[PATCH 11/13 v7] PCI: reserve bus range for SR-IOV device
[PATCH 12/13 v7] PCI: document the SR-IOV sysfs entries
[PATCH 13/13 v7] PCI: document for SR-IOV user and developer

Cc: Alex Chiang <achiang at hp.com>
Cc: Bjorn Helgaas <bjorn.helgaas at hp.com>
Cc: Grant Grundler <grundler at parisc-linux.org>
Cc: Greg KH <greg at kroah.com>
Cc: Ingo Molnar <mingo at elte.hu>
Cc: Jesse Barnes <jbarnes at virtuousgeek.org>
Cc: Matthew Wilcox <matthew at wil.cx>
Cc: Randy Dunlap <randy.dunlap at oracle.com>
Cc: Roland Dreier <rdreier at cisco.com>
Cc: Simon Horman <horms at verge.net.au>
Cc: Yinghai Lu <yinghai at kernel.org>

---

The Single Root I/O Virtualization (SR-IOV) capability defined by the PCI-SIG enables multiple system software instances to share PCI hardware resources. A PCI device that supports this capability can be extended to one Physical Function plus multiple Virtual Functions. The Physical Function, which can be considered the "real" PCI device, reflects the hardware instance and manages all physical resources. Virtual Functions are associated with a Physical Function and share physical resources with it. Software can control the allocation of Virtual Functions via registers encapsulated in the capability structure.

The SR-IOV specification can be found at:
http://www.pcisig.com/members/downloads/specifications/iov/sr-iov1.0_11Sep07.pdf

Devices that support SR-IOV are available from the following vendors:
http://download.intel.com/design/network/ProdBrf/320025.pdf
http://www.netxen.com/products/chipsolutions/NX3031.html
http://www.neterion.com/products/x3100.html
Change parameter of pci_ari_enabled() from 'pci_dev' to 'pci_bus'. ARI forwarding on the bridge mostly concerns the subordinate devices rather than the bridge itself. So this change will make the function easier to use. Signed-off-by: Yu Zhao <yu.zhao at intel.com> --- drivers/pci/pci.h | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 9de87e9..1449884 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -162,13 +162,13 @@ struct pci_slot_attribute { extern void pci_enable_ari(struct pci_dev *dev); /** * pci_ari_enabled - query ARI forwarding status - * @dev: the PCI device + * @bus: the PCI bus * * Returns 1 if ARI forwarding is enabled, or 0 if not enabled; */ -static inline int pci_ari_enabled(struct pci_dev *dev) +static inline int pci_ari_enabled(struct pci_bus *bus) { - return dev->ari_enabled; + return bus->self && bus->self->ari_enabled; } #endif /* DRIVERS_PCI_H */ -- 1.5.6.4
Yu Zhao
2008-Nov-21 18:38 UTC
[PATCH 2/13 v7] PCI: remove unnecessary arg of pci_update_resource()
This cleanup removes the unnecessary argument 'struct resource *res' from pci_update_resource(), so it takes the same arguments as its companion functions (pci_assign_resource(), etc.). Signed-off-by: Yu Zhao <yu.zhao at intel.com> --- drivers/pci/pci.c | 4 ++-- drivers/pci/setup-res.c | 7 ++++--- include/linux/pci.h | 2 +- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 21f2ac6..c408be8 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -377,8 +377,8 @@ pci_restore_bars(struct pci_dev *dev) return; } - for (i = 0; i < numres; i ++) - pci_update_resource(dev, &dev->resource[i], i); + for (i = 0; i < numres; i++) + pci_update_resource(dev, i); } static struct pci_platform_pm_ops *pci_platform_pm; diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c index 2dbd96c..b7ca679 100644 --- a/drivers/pci/setup-res.c +++ b/drivers/pci/setup-res.c @@ -26,11 +26,12 @@ #include "pci.h" -void pci_update_resource(struct pci_dev *dev, struct resource *res, int resno) +void pci_update_resource(struct pci_dev *dev, int resno) { struct pci_bus_region region; u32 new, check, mask; int reg; + struct resource *res = dev->resource + resno; /* * Ignore resources for unimplemented BARs and unused resource slots @@ -162,7 +163,7 @@ int pci_assign_resource(struct pci_dev *dev, int resno) } else { res->flags &= ~IORESOURCE_STARTALIGN; if (resno < PCI_BRIDGE_RESOURCES) - pci_update_resource(dev, res, resno); + pci_update_resource(dev, resno); } return ret; @@ -197,7 +198,7 @@ int pci_assign_resource_fixed(struct pci_dev *dev, int resno) dev_err(&dev->dev, "BAR %d: can't allocate %s resource %pR\n", resno, res->flags & IORESOURCE_IO ? 
"I/O" : "mem", res); } else if (resno < PCI_BRIDGE_RESOURCES) { - pci_update_resource(dev, res, resno); + pci_update_resource(dev, resno); } return ret; diff --git a/include/linux/pci.h b/include/linux/pci.h index feb4657..7e7ff03 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -642,7 +642,7 @@ int pcie_get_readrq(struct pci_dev *dev); int pcie_set_readrq(struct pci_dev *dev, int rq); int pci_reset_function(struct pci_dev *dev); int pci_execute_reset_function(struct pci_dev *dev); -void pci_update_resource(struct pci_dev *dev, struct resource *res, int resno); +void pci_update_resource(struct pci_dev *dev, int resno); int __must_check pci_assign_resource(struct pci_dev *dev, int i); int pci_select_bars(struct pci_dev *dev, unsigned long flags); -- 1.5.6.4
This patch moves all definitions of the PCI resource names to an 'enum', and also replaces some hard-coded resource variables with symbol names. This change eases introduction of device specific resources. Signed-off-by: Yu Zhao <yu.zhao at intel.com> --- drivers/pci/pci-sysfs.c | 4 +++- drivers/pci/probe.c | 2 +- drivers/pci/proc.c | 7 ++++--- include/linux/pci.h | 37 ++++++++++++++++++++++++------------- 4 files changed, 32 insertions(+), 18 deletions(-) diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 5d72866..0d74851 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -101,11 +101,13 @@ resource_show(struct device * dev, struct device_attribute *attr, char * buf) struct pci_dev * pci_dev = to_pci_dev(dev); char * str = buf; int i; - int max = 7; + int max; resource_size_t start, end; if (pci_dev->subordinate) max = DEVICE_COUNT_RESOURCE; + else + max = PCI_BRIDGE_RESOURCES; for (i = 0; i < max; i++) { struct resource *res = &pci_dev->resource[i]; diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 003a9b3..4c5429f 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -423,7 +423,7 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent, child->subordinate = 0xff; /* Set up default resource pointers and names.. 
*/ - for (i = 0; i < 4; i++) { + for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) { child->resource[i] = &bridge->resource[PCI_BRIDGE_RESOURCES+i]; child->resource[i]->name = child->name; } diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c index e1098c3..f6f2a59 100644 --- a/drivers/pci/proc.c +++ b/drivers/pci/proc.c @@ -352,15 +352,16 @@ static int show_device(struct seq_file *m, void *v) dev->vendor, dev->device, dev->irq); - /* Here should be 7 and not PCI_NUM_RESOURCES as we need to preserve compatibility */ - for (i=0; i<7; i++) { + + /* only print standard and ROM resources to preserve compatibility */ + for (i = 0; i <= PCI_ROM_RESOURCE; i++) { resource_size_t start, end; pci_resource_to_user(dev, i, &dev->resource[i], &start, &end); seq_printf(m, "\t%16llx", (unsigned long long)(start | (dev->resource[i].flags & PCI_REGION_FLAG_MASK))); } - for (i=0; i<7; i++) { + for (i = 0; i <= PCI_ROM_RESOURCE; i++) { resource_size_t start, end; pci_resource_to_user(dev, i, &dev->resource[i], &start, &end); seq_printf(m, "\t%16llx", diff --git a/include/linux/pci.h b/include/linux/pci.h index 7e7ff03..d455ec8 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -82,7 +82,30 @@ enum pci_mmap_state { #define PCI_DMA_FROMDEVICE 2 #define PCI_DMA_NONE 3 -#define DEVICE_COUNT_RESOURCE 12 +/* + * For PCI devices, the region numbers are assigned this way: + */ +enum { + /* #0-5: standard PCI resources */ + PCI_STD_RESOURCES, + PCI_STD_RESOURCE_END = 5, + + /* #6: expansion ROM resource */ + PCI_ROM_RESOURCE, + + /* resources assigned to buses behind the bridge */ +#define PCI_BRIDGE_RESOURCE_NUM 4 + + PCI_BRIDGE_RESOURCES, + PCI_BRIDGE_RESOURCE_END = PCI_BRIDGE_RESOURCES + + PCI_BRIDGE_RESOURCE_NUM - 1, + + /* total resources associated with a PCI device */ + PCI_NUM_RESOURCES, + + /* preserve this for compatibility */ + DEVICE_COUNT_RESOURCE +}; typedef int __bitwise pci_power_t; @@ -268,18 +291,6 @@ static inline void pci_add_saved_cap(struct pci_dev *pci_dev, 
hlist_add_head(&new_cap->next, &pci_dev->saved_cap_space); } -/* - * For PCI devices, the region numbers are assigned this way: - * - * 0-5 standard PCI regions - * 6 expansion ROM - * 7-10 bridges: address space assigned to buses behind the bridge - */ - -#define PCI_ROM_RESOURCE 6 -#define PCI_BRIDGE_RESOURCES 7 -#define PCI_NUM_RESOURCES 11 - #ifndef PCI_BUS_NUM_RESOURCES #define PCI_BUS_NUM_RESOURCES 16 #endif -- 1.5.6.4
Yu Zhao
2008-Nov-21 18:40 UTC
[PATCH 4/13 v7] PCI: remove unnecessary condition check in pci_restore_bars()
Remove the unnecessary checks on the number of resources, since pci_update_resource() itself checks whether each resource is available. Signed-off-by: Yu Zhao <yu.zhao at intel.com> --- drivers/pci/pci.c | 19 ++----------------- 1 files changed, 2 insertions(+), 17 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index c408be8..9d3f793 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -360,24 +360,9 @@ pci_find_parent_resource(const struct pci_dev *dev, struct resource *res) static void pci_restore_bars(struct pci_dev *dev) { - int i, numres; - - switch (dev->hdr_type) { - case PCI_HEADER_TYPE_NORMAL: - numres = 6; - break; - case PCI_HEADER_TYPE_BRIDGE: - numres = 2; - break; - case PCI_HEADER_TYPE_CARDBUS: - numres = 1; - break; - default: - /* Should never get here, but just in case... */ - return; - } + int i; - for (i = 0; i < numres; i++) + for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) pci_update_resource(dev, i); } -- 1.5.6.4
Export __pci_read_base() so it can be used by the whole PCI subsystem. Signed-off-by: Yu Zhao <yu.zhao at intel.com> --- drivers/pci/pci.h | 9 +++++++++ drivers/pci/probe.c | 20 +++++++++----------- 2 files changed, 18 insertions(+), 11 deletions(-) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 1449884..fd0d087 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -159,6 +159,15 @@ struct pci_slot_attribute { }; #define to_pci_slot_attr(s) container_of(s, struct pci_slot_attribute, attr) +enum pci_bar_type { + pci_bar_unknown, /* Standard PCI BAR probe */ + pci_bar_io, /* An io port BAR */ + pci_bar_mem32, /* A 32-bit memory BAR */ + pci_bar_mem64, /* A 64-bit memory BAR */ +}; + +extern int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, + struct resource *res, unsigned int reg); extern void pci_enable_ari(struct pci_dev *dev); /** * pci_ari_enabled - query ARI forwarding status diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 4c5429f..ae5c7fe 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -135,13 +135,6 @@ static u64 pci_size(u64 base, u64 maxbase, u64 mask) return size; } -enum pci_bar_type { - pci_bar_unknown, /* Standard PCI BAR probe */ - pci_bar_io, /* An io port BAR */ - pci_bar_mem32, /* A 32-bit memory BAR */ - pci_bar_mem64, /* A 64-bit memory BAR */ -}; - static inline enum pci_bar_type decode_bar(struct resource *res, u32 bar) { if ((bar & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) { @@ -156,11 +149,16 @@ static inline enum pci_bar_type decode_bar(struct resource *res, u32 bar) return pci_bar_mem32; } -/* - * If the type is not unknown, we assume that the lowest bit is 'enable'. - * Returns 1 if the BAR was 64-bit and 0 if it was 32-bit. +/** + * pci_read_base - read a PCI BAR + * @dev: the PCI device + * @type: type of the BAR + * @res: resource buffer to be filled in + * @pos: BAR position in the config space + * + * Returns 1 if the BAR is 64-bit, or 0 if 32-bit. 
*/ -static int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, +int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, struct resource *res, unsigned int pos) { u32 l, sz, mask; -- 1.5.6.4
Yu Zhao
2008-Nov-21 18:41 UTC
[PATCH 6/13 v7] PCI: make pci_alloc_child_bus() be able to handle NULL bridge
Make pci_alloc_child_bus() able to allocate buses without bridge devices. Some SR-IOV devices can occupy more than one bus number, but there are no explicit bridges because these devices have an internal routing mechanism. Signed-off-by: Yu Zhao <yu.zhao at intel.com> --- drivers/pci/probe.c | 8 ++++++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index ae5c7fe..cd205fd 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -398,12 +398,10 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent, if (!child) return NULL; - child->self = bridge; child->parent = parent; child->ops = parent->ops; child->sysdata = parent->sysdata; child->bus_flags = parent->bus_flags; - child->bridge = get_device(&bridge->dev); /* initialize some portions of the bus device, but don't register it * now as the parent is not properly set up yet. This device will get @@ -420,6 +418,12 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent, child->primary = parent->secondary; child->subordinate = 0xff; + if (!bridge) + return child; + + child->self = bridge; + child->bridge = get_device(&bridge->dev); + /* Set up default resource pointers and names.. */ for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) { child->resource[i] = &bridge->resource[PCI_BRIDGE_RESOURCES+i]; -- 1.5.6.4
Add a function that maps a resource number to its corresponding register, so callers can get the offset and type of device-specific BARs. Signed-off-by: Yu Zhao <yu.zhao at intel.com> --- drivers/pci/pci.c | 22 ++++++++++++++++++++++ drivers/pci/pci.h | 2 ++ drivers/pci/setup-res.c | 13 +++++-------- 3 files changed, 29 insertions(+), 8 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 9d3f793..9382b5f 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2007,6 +2007,28 @@ int pci_select_bars(struct pci_dev *dev, unsigned long flags) return bars; } +/** + * pci_resource_bar - get position of the BAR associated with a resource + * @dev: the PCI device + * @resno: the resource number + * @type: the BAR type to be filled in + * + * Returns BAR position in config space, or 0 if the BAR is invalid. + */ +int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type) +{ + if (resno < PCI_ROM_RESOURCE) { + *type = pci_bar_unknown; + return PCI_BASE_ADDRESS_0 + 4 * resno; + } else if (resno == PCI_ROM_RESOURCE) { + *type = pci_bar_mem32; + return dev->rom_base_reg; + } + + dev_err(&dev->dev, "BAR: invalid resource #%d\n", resno); + return 0; +} + static void __devinit pci_no_domains(void) { #ifdef CONFIG_PCI_DOMAINS diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index fd0d087..3de70d7 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -168,6 +168,8 @@ enum pci_bar_type { extern int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, struct resource *res, unsigned int reg); +extern int pci_resource_bar(struct pci_dev *dev, int resno, + enum pci_bar_type *type); extern void pci_enable_ari(struct pci_dev *dev); /** * pci_ari_enabled - query ARI forwarding status diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c index b7ca679..854d43e 100644 --- a/drivers/pci/setup-res.c +++ b/drivers/pci/setup-res.c @@ -31,6 +31,7 @@ void pci_update_resource(struct pci_dev *dev, int resno) struct pci_bus_region region; u32 new, 
check, mask; int reg; + enum pci_bar_type type; struct resource *res = dev->resource + resno; /* @@ -62,17 +63,13 @@ void pci_update_resource(struct pci_dev *dev, int resno) else mask = (u32)PCI_BASE_ADDRESS_MEM_MASK; - if (resno < 6) { - reg = PCI_BASE_ADDRESS_0 + 4 * resno; - } else if (resno == PCI_ROM_RESOURCE) { + reg = pci_resource_bar(dev, resno, &type); + if (!reg) + return; + if (type != pci_bar_unknown) { if (!(res->flags & IORESOURCE_ROM_ENABLE)) return; new |= PCI_ROM_ADDRESS_ENABLE; - reg = dev->rom_base_reg; - } else { - /* Hmm, non-standard resource. */ - - return; /* kill uninitialised var warning */ } pci_write_config_dword(dev, reg, new); -- 1.5.6.4
This cleanup makes pci_bus_add_devices() easier to read. Signed-off-by: Yu Zhao <yu.zhao at intel.com> --- drivers/pci/bus.c | 55 +++++++++++++++++++++++++++-------------------------- 1 files changed, 28 insertions(+), 27 deletions(-) diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c index 999cc40..9d800cb 100644 --- a/drivers/pci/bus.c +++ b/drivers/pci/bus.c @@ -71,7 +71,7 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res, } /** - * add a single device + * pci_bus_add_device - add a single device * @dev: device to add * * This adds a single pci device to the global @@ -105,7 +105,7 @@ int pci_bus_add_device(struct pci_dev *dev) void pci_bus_add_devices(struct pci_bus *bus) { struct pci_dev *dev; - struct pci_bus *child_bus; + struct pci_bus *child; int retval; list_for_each_entry(dev, &bus->devices, bus_list) { @@ -120,39 +120,40 @@ void pci_bus_add_devices(struct pci_bus *bus) list_for_each_entry(dev, &bus->devices, bus_list) { BUG_ON(!dev->is_added); + child = dev->subordinate; /* * If there is an unattached subordinate bus, attach * it and then scan for unattached PCI devices. */ - if (dev->subordinate) { - if (list_empty(&dev->subordinate->node)) { - down_write(&pci_bus_sem); - list_add_tail(&dev->subordinate->node, - &dev->bus->children); - up_write(&pci_bus_sem); - } - pci_bus_add_devices(dev->subordinate); - - /* register the bus with sysfs as the parent is now - * properly registered. 
*/ - child_bus = dev->subordinate; - if (child_bus->is_added) - continue; - child_bus->dev.parent = child_bus->bridge; - retval = device_register(&child_bus->dev); - if (retval) - dev_err(&dev->dev, "Error registering pci_bus," - " continuing...\n"); - else { - child_bus->is_added = 1; - retval = device_create_file(&child_bus->dev, - &dev_attr_cpuaffinity); - } + if (!child) + continue; + if (list_empty(&child->node)) { + down_write(&pci_bus_sem); + list_add_tail(&child->node, &dev->bus->children); + up_write(&pci_bus_sem); + } + pci_bus_add_devices(child); + + /* + * register the bus with sysfs as the parent is now + * properly registered. + */ + if (child->is_added) + continue; + child->dev.parent = child->bridge; + retval = device_register(&child->dev); + if (retval) + dev_err(&dev->dev, "Error registering pci_bus," + " continuing...\n"); + else { + child->is_added = 1; + retval = device_create_file(&child->dev, + &dev_attr_cpuaffinity); if (retval) dev_err(&dev->dev, "Error creating cpuaffinity" " file, continuing...\n"); - retval = device_create_file(&child_bus->dev, + retval = device_create_file(&child->dev, &dev_attr_cpulistaffinity); if (retval) dev_err(&dev->dev, -- 1.5.6.4
Yu Zhao
2008-Nov-21 18:42 UTC
[PATCH 9/13 v7] PCI: split a new function from pci_bus_add_devices()
This patch splits a new function out of pci_bus_add_devices(). The new function can be used to register a PCI bus with the device core. Signed-off-by: Yu Zhao <yu.zhao at intel.com> --- drivers/pci/bus.c | 49 ++++++++++++++++++++++++++++++------------------- drivers/pci/pci.h | 1 + 2 files changed, 31 insertions(+), 19 deletions(-) diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c index 9d800cb..65f5a6f 100644 --- a/drivers/pci/bus.c +++ b/drivers/pci/bus.c @@ -91,6 +91,34 @@ int pci_bus_add_device(struct pci_dev *dev) } /** + * pci_bus_add_child - add a child bus + * @bus: bus to add + * + * This adds sysfs entries for a single bus + */ +int pci_bus_add_child(struct pci_bus *bus) +{ + int retval; + + if (bus->bridge) + bus->dev.parent = bus->bridge; + + retval = device_register(&bus->dev); + if (retval) + return retval; + + bus->is_added = 1; + + retval = device_create_file(&bus->dev, &dev_attr_cpuaffinity); + if (retval) + return retval; + + retval = device_create_file(&bus->dev, &dev_attr_cpulistaffinity); + + return retval; +} + +/** * pci_bus_add_devices - insert newly discovered PCI devices * @bus: bus to check for new devices * @@ -140,26 +168,9 @@ void pci_bus_add_devices(struct pci_bus *bus) */ if (child->is_added) continue; - child->dev.parent = child->bridge; - retval = device_register(&child->dev); + retval = pci_bus_add_child(child); if (retval) - dev_err(&dev->dev, "Error registering pci_bus," - " continuing...\n"); - else { - child->is_added = 1; - retval = device_create_file(&child->dev, - &dev_attr_cpuaffinity); - if (retval) - dev_err(&dev->dev, "Error creating cpuaffinity" - " file, continuing...\n"); - - retval = device_create_file(&child->dev, - &dev_attr_cpulistaffinity); - if (retval) - dev_err(&dev->dev, - "Error creating cpulistaffinity" - " file, continuing...\n"); - } + dev_err(&dev->dev, "Error adding bus, continuing\n"); } } diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 3de70d7..315bbe6 100644 --- a/drivers/pci/pci.h +++ 
b/drivers/pci/pci.h @@ -170,6 +170,7 @@ extern int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type, struct resource *res, unsigned int reg); extern int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type); +extern int pci_bus_add_child(struct pci_bus *bus); extern void pci_enable_ari(struct pci_dev *dev); /** * pci_ari_enabled - query ARI forwarding status -- 1.5.6.4
Support Single Root I/O Virtualization (SR-IOV) capability. Signed-off-by: Yu Zhao <yu.zhao at intel.com> --- drivers/pci/Kconfig | 13 ++ drivers/pci/Makefile | 3 + drivers/pci/iov.c | 491 ++++++++++++++++++++++++++++++++++++++++++++++ drivers/pci/pci-driver.c | 12 +- drivers/pci/pci.c | 8 + drivers/pci/pci.h | 51 +++++ drivers/pci/probe.c | 4 + include/linux/pci.h | 9 + include/linux/pci_regs.h | 21 ++ 9 files changed, 610 insertions(+), 2 deletions(-) create mode 100644 drivers/pci/iov.c diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig index e1ca425..493233e 100644 --- a/drivers/pci/Kconfig +++ b/drivers/pci/Kconfig @@ -50,3 +50,16 @@ config HT_IRQ This allows native hypertransport devices to use interrupts. If unsure say Y. + +config PCI_IOV + bool "PCI IOV support" + depends on PCI + select PCI_MSI + default n + help + PCI-SIG I/O Virtualization (IOV) Specifications support. + Single Root IOV: allows the Physical Function device driver + to enable the hardware capability, so the Virtual Function + is accessible via the PCI configuration space using its own + Bus, Device and Function Number. Each Virtual Function also + has PCI Memory Space to map its own register set. diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile index af3bfe2..8c7c12d 100644 --- a/drivers/pci/Makefile +++ b/drivers/pci/Makefile @@ -29,6 +29,9 @@ obj-$(CONFIG_DMAR) += dmar.o iova.o intel-iommu.o obj-$(CONFIG_INTR_REMAP) += dmar.o intr_remapping.o +# PCI IOV support +obj-$(CONFIG_PCI_IOV) += iov.o + # # Some architectures use the generic PCI setup functions # diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c new file mode 100644 index 0000000..03f62ca --- /dev/null +++ b/drivers/pci/iov.c @@ -0,0 +1,491 @@ +/* + * drivers/pci/iov.c + * + * Copyright (C) 2008 Intel Corporation + * + * PCI Express I/O Virtualization (IOV) support. 
+ * Single Root IOV 1.0 + */ + +#include <linux/ctype.h> +#include <linux/string.h> +#include <linux/pci.h> +#include <linux/delay.h> +#include <asm/page.h> +#include "pci.h" + + +#define pci_iov_attr(field) \ +static ssize_t iov_##field##_show(struct device *dev, \ + struct device_attribute *attr, char *buf) \ +{ \ + struct pci_dev *pdev = to_pci_dev(dev); \ + return sprintf(buf, "%d\n", pdev->iov->field); \ +} + +pci_iov_attr(total); +pci_iov_attr(initial); +pci_iov_attr(nr_virtfn); + +static inline void virtfn_bdf(struct pci_dev *dev, int id, u8 *busnr, u8 *devfn) +{ + u16 bdf; + + bdf = (dev->bus->number << 8) + dev->devfn + + dev->iov->offset + dev->iov->stride * id; + *busnr = bdf >> 8; + *devfn = bdf & 0xff; +} + +static int virtfn_add(struct pci_dev *dev, int id) +{ + int i; + int rc; + u8 busnr, devfn; + struct pci_dev *virtfn; + struct resource *res; + resource_size_t size; + + virtfn_bdf(dev, id, &busnr, &devfn); + + virtfn = alloc_pci_dev(); + if (!virtfn) + return -ENOMEM; + + virtfn->bus = pci_find_bus(pci_domain_nr(dev->bus), busnr); + BUG_ON(!virtfn->bus); + virtfn->sysdata = dev->bus->sysdata; + virtfn->dev.parent = dev->dev.parent; + virtfn->dev.bus = dev->dev.bus; + virtfn->devfn = devfn; + virtfn->hdr_type = PCI_HEADER_TYPE_NORMAL; + virtfn->multifunction = 0; + virtfn->vendor = dev->vendor; + pci_read_config_word(dev, dev->iov->cap + PCI_IOV_VF_DID, + &virtfn->device); + virtfn->cfg_size = PCI_CFG_SPACE_EXP_SIZE; + virtfn->error_state = pci_channel_io_normal; + virtfn->is_pcie = 1; + virtfn->pcie_type = PCI_EXP_TYPE_ENDPOINT; + virtfn->dma_mask = 0xffffffff; + + dev_set_name(&virtfn->dev, "%04x:%02x:%02x.%d", + pci_domain_nr(virtfn->bus), busnr, + PCI_SLOT(devfn), PCI_FUNC(devfn)); + + pci_read_config_byte(virtfn, PCI_REVISION_ID, &virtfn->revision); + virtfn->class = dev->class; + virtfn->current_state = PCI_UNKNOWN; + virtfn->irq = 0; + + for (i = 0; i < PCI_IOV_NUM_BAR; i++) { + res = dev->resource + PCI_IOV_RESOURCES + i; + if 
(!res->parent) + continue; + virtfn->resource[i].name = pci_name(virtfn); + virtfn->resource[i].flags = res->flags; + size = resource_size(res); + do_div(size, dev->iov->total); + virtfn->resource[i].start = res->start + size * id; + virtfn->resource[i].end = virtfn->resource[i].start + size - 1; + rc = request_resource(res, &virtfn->resource[i]); + BUG_ON(rc); + } + + virtfn->subsystem_vendor = dev->subsystem_vendor; + pci_read_config_word(virtfn, PCI_SUBSYSTEM_ID, + &virtfn->subsystem_device); + + pci_device_add(virtfn, virtfn->bus); + rc = pci_bus_add_device(virtfn); + + return rc; +} + +static void virtfn_remove(struct pci_dev *dev, int id) +{ + u8 busnr, devfn; + struct pci_bus *bus; + struct pci_dev *virtfn; + + virtfn_bdf(dev, id, &busnr, &devfn); + + bus = pci_find_bus(pci_domain_nr(dev->bus), busnr); + BUG_ON(!bus); + virtfn = pci_get_slot(bus, devfn); + BUG_ON(!virtfn); + pci_dev_put(virtfn); + pci_remove_bus_device(virtfn); +} + +static int iov_add_bus(struct pci_bus *bus, int busnr) +{ + int i; + int rc; + struct pci_bus *child; + + for (i = bus->number + 1; i <= busnr; i++) { + child = pci_find_bus(pci_domain_nr(bus), i); + if (child) + continue; + child = pci_add_new_bus(bus, NULL, i); + if (!child) + return -ENOMEM; + + child->subordinate = i; + child->dev.parent = bus->bridge; + rc = pci_bus_add_child(child); + if (rc) + return rc; + } + + return 0; +} + +static void iov_remove_bus(struct pci_bus *bus, int busnr) +{ + int i; + struct pci_bus *child; + + for (i = bus->number + 1; i <= busnr; i++) { + child = pci_find_bus(pci_domain_nr(bus), i); + BUG_ON(!child); + if (list_empty(&child->devices)) + pci_remove_bus(child); + } +} + +static int iov_enable(struct pci_dev *dev, int nr_virtfn) +{ + int i, j; + int rc; + u8 busnr, devfn; + u16 ctrl, offset, stride; + + pci_write_config_word(dev, dev->iov->cap + PCI_IOV_NUM_VF, nr_virtfn); + pci_read_config_word(dev, dev->iov->cap + PCI_IOV_VF_OFFSET, &offset); + pci_read_config_word(dev, dev->iov->cap + 
PCI_IOV_VF_STRIDE, &stride); + + if (!offset || (nr_virtfn > 1 && !stride)) + return -EIO; + + dev->iov->offset = offset; + dev->iov->stride = stride; + + virtfn_bdf(dev, nr_virtfn - 1, &busnr, &devfn); + if (busnr > dev->bus->subordinate) + return -EIO; + + rc = dev->driver->virtual(dev, nr_virtfn); + if (rc) + return rc; + + pci_read_config_word(dev, dev->iov->cap + PCI_IOV_CTRL, &ctrl); + ctrl |= PCI_IOV_CTRL_VFE | PCI_IOV_CTRL_MSE; + pci_write_config_word(dev, dev->iov->cap + PCI_IOV_CTRL, ctrl); + ssleep(1); + + iov_add_bus(dev->bus, busnr); + for (i = 0; i < nr_virtfn; i++) { + rc = virtfn_add(dev, i); + if (rc) + goto failed; + } + + dev->iov->nr_virtfn = nr_virtfn; + + return 0; + +failed: + for (j = 0; j < i; j++) + virtfn_remove(dev, j); + + iov_remove_bus(dev->bus, busnr); + + ctrl &= ~(PCI_IOV_CTRL_VFE | PCI_IOV_CTRL_MSE); + pci_write_config_word(dev, dev->iov->cap + PCI_IOV_CTRL, ctrl); + ssleep(1); + + return rc; +} + +static void iov_disable(struct pci_dev *dev) +{ + int i; + int rc; + u16 ctrl; + u8 busnr, devfn; + + if (!dev->iov->nr_virtfn) + return; + + rc = dev->driver->virtual(dev, 0); + if (rc) + return; + + for (i = 0; i < dev->iov->nr_virtfn; i++) + virtfn_remove(dev, i); + + virtfn_bdf(dev, dev->iov->nr_virtfn - 1, &busnr, &devfn); + iov_remove_bus(dev->bus, busnr); + + pci_read_config_word(dev, dev->iov->cap + PCI_IOV_CTRL, &ctrl); + ctrl &= ~(PCI_IOV_CTRL_VFE | PCI_IOV_CTRL_MSE); + pci_write_config_word(dev, dev->iov->cap + PCI_IOV_CTRL, ctrl); + ssleep(1); + + dev->iov->nr_virtfn = 0; +} + +static ssize_t iov_set_nr_virtfn(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + int rc; + long nr_virtfn; + struct pci_dev *pdev = to_pci_dev(dev); + + rc = strict_strtol(buf, 0, &nr_virtfn); + if (rc) + return rc; + + if (nr_virtfn < 0 || nr_virtfn > pdev->iov->initial) + return -EINVAL; + + if (nr_virtfn == pdev->iov->nr_virtfn) + return count; + + mutex_lock(&pdev->iov->physfn->iov->lock); + 
iov_disable(pdev); + + if (nr_virtfn) + rc = iov_enable(pdev, nr_virtfn); + mutex_unlock(&pdev->iov->physfn->iov->lock); + + return rc ? rc : count; +} + +static DEVICE_ATTR(total_virtfn, S_IRUGO, iov_total_show, NULL); +static DEVICE_ATTR(initial_virtfn, S_IRUGO, iov_initial_show, NULL); +static DEVICE_ATTR(nr_virtfn, S_IWUSR | S_IRUGO, + iov_nr_virtfn_show, iov_set_nr_virtfn); + +static struct attribute *iov_attrs[] = { + &dev_attr_total_virtfn.attr, + &dev_attr_initial_virtfn.attr, + &dev_attr_nr_virtfn.attr, + NULL +}; + +static struct attribute_group iov_attr_group = { + .attrs = iov_attrs, + .name = "iov", +}; + +/** + * pci_iov_init - initialize device's SR-IOV capability + * @dev: the PCI device + * + * Returns 0 on success, or negative on failure. + * + * The major differences between Virtual Function and PCI device are: + * 1) the device with multiple bus numbers uses internal routing, so + * there is no explicit bridge device in this case. + * 2) Virtual Function memory spaces are designated by BARs encapsulated + * in the capability structure, and the BARs in Virtual Function PCI + * configuration space are read-only zero. 
+ */ +int pci_iov_init(struct pci_dev *dev) +{ + int i; + int pos; + u32 pgsz; + u16 ctrl, total, initial, offset, stride; + struct pci_iov *iov; + struct resource *res; + struct pci_dev *physfn; + + if (!dev->is_pcie) + return -ENODEV; + + if (dev->pcie_type != PCI_EXP_TYPE_RC_END && + dev->pcie_type != PCI_EXP_TYPE_ENDPOINT) + return -ENODEV; + + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_IOV); + if (!pos) + return -ENODEV; + + pci_read_config_word(dev, pos + PCI_IOV_CTRL, &ctrl); + if (ctrl & PCI_IOV_CTRL_VFE) { + pci_write_config_word(dev, pos + PCI_IOV_CTRL, 0); + ssleep(1); + } + + physfn = NULL; + if (!list_empty(&dev->bus->devices)) + list_for_each_entry(physfn, &dev->bus->devices, bus_list) + if (physfn->iov) + break; + + ctrl = 0; + if (!(physfn && physfn->iov) && pci_ari_enabled(dev->bus)) + ctrl |= PCI_IOV_CTRL_ARI; + + pci_write_config_word(dev, pos + PCI_IOV_CTRL, ctrl); + pci_read_config_word(dev, pos + PCI_IOV_TOTAL_VF, &total); + pci_read_config_word(dev, pos + PCI_IOV_INITIAL_VF, &initial); + pci_write_config_word(dev, pos + PCI_IOV_NUM_VF, initial); + pci_read_config_word(dev, pos + PCI_IOV_VF_OFFSET, &offset); + pci_read_config_word(dev, pos + PCI_IOV_VF_STRIDE, &stride); + + if (!total || initial > total || (initial && !offset) || + (initial > 1 && !stride)) + return -EIO; + + pci_read_config_dword(dev, pos + PCI_IOV_SUP_PGSIZE, &pgsz); + i = PAGE_SHIFT > 12 ? 
PAGE_SHIFT - 12 : 0; + pgsz &= ~((1 << i) - 1); + if (!pgsz) + return -EIO; + + pgsz &= ~(pgsz - 1); + pci_write_config_dword(dev, pos + PCI_IOV_SYS_PGSIZE, pgsz); + + iov = kzalloc(sizeof(*iov), GFP_KERNEL); + if (!iov) + return -ENOMEM; + + iov->cap = pos; + iov->total = total; + iov->initial = initial; + iov->offset = offset; + iov->stride = stride; + iov->pgsz = pgsz; + + for (i = 0; i < PCI_IOV_NUM_BAR; i++) { + res = dev->resource + PCI_IOV_RESOURCES + i; + pos = iov->cap + PCI_IOV_BAR_0 + i * 4; + i += __pci_read_base(dev, pci_bar_unknown, res, pos); + if (!res->flags) + continue; + res->end = res->start + resource_size(res) * total - 1; + } + + if (physfn && physfn->iov) { + pci_dev_get(physfn); + iov->physfn = physfn; + } else { + mutex_init(&iov->lock); + iov->physfn = dev; + } + + dev->iov = iov; + + return 0; +} + +/** + * pci_iov_release - release resources used by the SR-IOV capability + * @dev: the PCI device + */ +void pci_iov_release(struct pci_dev *dev) +{ + if (!dev->iov) + return; + + if (dev == dev->iov->physfn) + mutex_destroy(&dev->iov->lock); + else + pci_dev_put(dev->iov->physfn); + + kfree(dev->iov); +} + +/** + * pci_iov_resource_bar - get position of the SR-IOV BAR + * @dev: the PCI device + * @resno: the resource number + * @type: the BAR type to be filled in + * + * Returns position of the BAR encapsulated in the SR-IOV capability. 
+ */ +int pci_iov_resource_bar(struct pci_dev *dev, int resno, + enum pci_bar_type *type) +{ + if (resno < PCI_IOV_RESOURCES || resno > PCI_IOV_RESOURCE_END) + return 0; + + BUG_ON(!dev->iov); + + *type = pci_bar_unknown; + return dev->iov->cap + PCI_IOV_BAR_0 + + 4 * (resno - PCI_IOV_RESOURCES); +} + +/** + * pci_restore_iov_state - restore the state of the SR-IOV capability + * @dev: the PCI device + */ +void pci_restore_iov_state(struct pci_dev *dev) +{ + u16 ctrl; + + if (!dev->iov) + return; + + pci_read_config_word(dev, dev->iov->cap + PCI_IOV_CTRL, &ctrl); + if (ctrl & PCI_IOV_CTRL_VFE) + return; + + pci_write_config_dword(dev, dev->iov->cap + PCI_IOV_SYS_PGSIZE, + dev->iov->pgsz); + ctrl = 0; + if (dev == dev->iov->physfn && pci_ari_enabled(dev->bus)) + ctrl |= PCI_IOV_CTRL_ARI; + pci_write_config_word(dev, dev->iov->cap + PCI_IOV_CTRL, ctrl); + + if (!dev->iov->nr_virtfn) + return; + + pci_write_config_word(dev, dev->iov->cap + PCI_IOV_NUM_VF, + dev->iov->nr_virtfn); + ctrl |= PCI_IOV_CTRL_VFE | PCI_IOV_CTRL_MSE; + pci_write_config_word(dev, dev->iov->cap + PCI_IOV_CTRL, ctrl); + + ssleep(1); +} + +/** + * pci_iov_register - register the SR-IOV capability + * @dev: the PCI device + */ +int pci_iov_register(struct pci_dev *dev) +{ + int rc; + + if (!dev->iov) + return -ENODEV; + + rc = sysfs_create_group(&dev->dev.kobj, &iov_attr_group); + if (rc) + return rc; + + rc = kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE); + + return rc; +} + +/** + * pci_iov_unregister - unregister the SR-IOV capability + * @dev: the PCI device + */ +void pci_iov_unregister(struct pci_dev *dev) +{ + if (!dev->iov) + return; + + sysfs_remove_group(&dev->dev.kobj, &iov_attr_group); + iov_disable(dev); + kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE); +} diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index b4cdd69..3d5f3a3 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -234,6 +234,8 @@ __pci_device_probe(struct pci_driver *drv, struct pci_dev 
*pci_dev) error = pci_call_probe(drv, pci_dev, id); if (error >= 0) { pci_dev->driver = drv; + if (drv->virtual) + pci_iov_register(pci_dev); error = 0; } } @@ -262,6 +264,8 @@ static int pci_device_remove(struct device * dev) struct pci_driver * drv = pci_dev->driver; if (drv) { + if (drv->virtual) + pci_iov_unregister(pci_dev); if (drv->remove) drv->remove(pci_dev); pci_dev->driver = NULL; @@ -292,8 +296,12 @@ static void pci_device_shutdown(struct device *dev) struct pci_dev *pci_dev = to_pci_dev(dev); struct pci_driver *drv = pci_dev->driver; - if (drv && drv->shutdown) - drv->shutdown(pci_dev); + if (drv) { + if (drv->virtual) + pci_iov_unregister(pci_dev); + if (drv->shutdown) + drv->shutdown(pci_dev); + } pci_msi_shutdown(pci_dev); pci_msix_shutdown(pci_dev); } diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 9382b5f..ca26e53 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -763,6 +763,7 @@ pci_restore_state(struct pci_dev *dev) } pci_restore_pcix_state(dev); pci_restore_msi_state(dev); + pci_restore_iov_state(dev); return 0; } @@ -2017,12 +2018,19 @@ int pci_select_bars(struct pci_dev *dev, unsigned long flags) */ int pci_resource_bar(struct pci_dev *dev, int resno, enum pci_bar_type *type) { + int reg; + if (resno < PCI_ROM_RESOURCE) { *type = pci_bar_unknown; return PCI_BASE_ADDRESS_0 + 4 * resno; } else if (resno == PCI_ROM_RESOURCE) { *type = pci_bar_mem32; return dev->rom_base_reg; + } else if (resno < PCI_BRIDGE_RESOURCES) { + /* device specific resource */ + reg = pci_iov_resource_bar(dev, resno, type); + if (reg) + return reg; } dev_err(&dev->dev, "BAR: invalid resource #%d\n", resno); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 315bbe6..3113d11 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -183,4 +183,55 @@ static inline int pci_ari_enabled(struct pci_bus *bus) return bus->self && bus->self->ari_enabled; } +/* Single Root I/O Virtualization */ +struct pci_iov { + int cap; /* capability position */ + int 
status;		/* status of SR-IOV */
+	u16 total;	/* total VFs associated with the PF */
+	u16 initial;	/* initial VFs associated with the PF */
+	u16 nr_virtfn;	/* number of VFs available */
+	u16 offset;	/* first VF Routing ID offset */
+	u16 stride;	/* following VF stride */
+	u32 pgsz;	/* page size for BAR alignment */
+	struct pci_dev *physfn;	/* lowest numbered PF */
+	struct mutex lock;	/* lock for VF bus */
+};
+
+#ifdef CONFIG_PCI_IOV
+extern int pci_iov_init(struct pci_dev *dev);
+extern void pci_iov_release(struct pci_dev *dev);
+extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
+				enum pci_bar_type *type);
+extern int pci_iov_register(struct pci_dev *dev);
+extern void pci_iov_unregister(struct pci_dev *dev);
+extern void pci_restore_iov_state(struct pci_dev *dev);
+#else
+static inline int pci_iov_init(struct pci_dev *dev)
+{
+	return -EIO;
+}
+
+static inline void pci_iov_release(struct pci_dev *dev)
+{
+}
+
+static inline int pci_iov_resource_bar(struct pci_dev *dev, int resno,
+				       enum pci_bar_type *type)
+{
+	return 0;
+}
+
+static inline int pci_iov_register(struct pci_dev *dev)
+{
+	return -ENODEV;
+}
+
+static inline void pci_iov_unregister(struct pci_dev *dev)
+{
+}
+
+static inline void pci_restore_iov_state(struct pci_dev *dev)
+{
+}
+#endif /* CONFIG_PCI_IOV */
+
 #endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index cd205fd..cb26e64 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -785,6 +785,7 @@ static int pci_setup_device(struct pci_dev * dev)
 static void pci_release_capabilities(struct pci_dev *dev)
 {
 	pci_vpd_release(dev);
+	pci_iov_release(dev);
 }

 /**
@@ -968,6 +969,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	/* Alternative Routing-ID Forwarding */
 	pci_enable_ari(dev);
+
+	/* Single Root I/O Virtualization */
+	pci_iov_init(dev);
 }

 void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d455ec8..c9046a3 100644
---
a/include/linux/pci.h +++ b/include/linux/pci.h @@ -93,6 +93,12 @@ enum { /* #6: expansion ROM resource */ PCI_ROM_RESOURCE, + /* device specific resources */ +#ifdef CONFIG_PCI_IOV + PCI_IOV_RESOURCES, + PCI_IOV_RESOURCE_END = PCI_IOV_RESOURCES + PCI_IOV_NUM_BAR - 1, +#endif + /* resources assigned to buses behind the bridge */ #define PCI_BRIDGE_RESOURCE_NUM 4 @@ -171,6 +177,7 @@ struct pci_cap_saved_state { struct pcie_link_state; struct pci_vpd; +struct pci_iov; /* * The pci_dev structure is used to describe PCI devices. @@ -259,6 +266,7 @@ struct pci_dev { struct list_head msi_list; #endif struct pci_vpd *vpd; + struct pci_iov *iov; }; extern struct pci_dev *alloc_pci_dev(void); @@ -426,6 +434,7 @@ struct pci_driver { int (*resume_early) (struct pci_dev *dev); int (*resume) (struct pci_dev *dev); /* Device woken up */ void (*shutdown) (struct pci_dev *dev); + int (*virtual) (struct pci_dev *dev, int nr_virtfn); struct pm_ext_ops *pm; struct pci_error_handlers *err_handler; struct device_driver driver; diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h index e5effd4..1d1ade2 100644 --- a/include/linux/pci_regs.h +++ b/include/linux/pci_regs.h @@ -363,6 +363,7 @@ #define PCI_EXP_TYPE_UPSTREAM 0x5 /* Upstream Port */ #define PCI_EXP_TYPE_DOWNSTREAM 0x6 /* Downstream Port */ #define PCI_EXP_TYPE_PCI_BRIDGE 0x7 /* PCI/PCI-X Bridge */ +#define PCI_EXP_TYPE_RC_END 0x9 /* Root Complex Integrated Endpoint */ #define PCI_EXP_FLAGS_SLOT 0x0100 /* Slot implemented */ #define PCI_EXP_FLAGS_IRQ 0x3e00 /* Interrupt message number */ #define PCI_EXP_DEVCAP 4 /* Device capabilities */ @@ -436,6 +437,7 @@ #define PCI_EXT_CAP_ID_DSN 3 #define PCI_EXT_CAP_ID_PWR 4 #define PCI_EXT_CAP_ID_ARI 14 +#define PCI_EXT_CAP_ID_IOV 16 /* Advanced Error Reporting */ #define PCI_ERR_UNCOR_STATUS 4 /* Uncorrectable Error Status */ @@ -553,4 +555,23 @@ #define PCI_ARI_CTRL_ACS 0x0002 /* ACS Function Groups Enable */ #define PCI_ARI_CTRL_FG(x) (((x) >> 4) & 7) /* Function Group */ 
+/* Single Root I/O Virtualization */
+#define PCI_IOV_CAP		0x04	/* SR-IOV Capabilities */
+#define PCI_IOV_CTRL		0x08	/* SR-IOV Control */
+#define PCI_IOV_CTRL_VFE	0x01	/* VF Enable */
+#define PCI_IOV_CTRL_MSE	0x08	/* VF Memory Space Enable */
+#define PCI_IOV_CTRL_ARI	0x10	/* ARI Capable Hierarchy */
+#define PCI_IOV_STATUS		0x0a	/* SR-IOV Status */
+#define PCI_IOV_INITIAL_VF	0x0c	/* Initial VFs */
+#define PCI_IOV_TOTAL_VF	0x0e	/* Total VFs */
+#define PCI_IOV_NUM_VF		0x10	/* Number of VFs */
+#define PCI_IOV_FUNC_LINK	0x12	/* Function Dependency Link */
+#define PCI_IOV_VF_OFFSET	0x14	/* First VF Offset */
+#define PCI_IOV_VF_STRIDE	0x16	/* Following VF Stride */
+#define PCI_IOV_VF_DID		0x1a	/* VF Device ID */
+#define PCI_IOV_SUP_PGSIZE	0x1c	/* Supported Page Sizes */
+#define PCI_IOV_SYS_PGSIZE	0x20	/* System Page Size */
+#define PCI_IOV_BAR_0		0x24	/* VF BAR0 */
+#define PCI_IOV_NUM_BAR		6	/* Number of VF BARs */
+
 #endif /* LINUX_PCI_REGS_H */
--
1.5.6.4
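The Supported Page Sizes register defined above is a bitmap in which bit n set means the device supports 2^(n+12)-byte pages. pci_iov_init() in patch 10 masks off sizes smaller than the CPU page size and keeps the lowest remaining bit when programming PCI_IOV_SYS_PGSIZE. A standalone sketch of that selection (the helper name and sample values are ours, not part of the patch; `page_shift` stands in for PAGE_SHIFT):

```c
#include <stdint.h>

/* Pick the value written to PCI_IOV_SYS_PGSIZE, mirroring the logic
 * in pci_iov_init(): bit n of the Supported Page Sizes register means
 * 2^(n + 12) bytes is supported.  Drop every size below the CPU page
 * size, then keep the smallest size that remains.  Returns 0 when no
 * supported size is large enough (pci_iov_init() fails with -EIO). */
uint32_t iov_sys_pgsize(uint32_t supported, int page_shift)
{
	int i = page_shift > 12 ? page_shift - 12 : 0;

	supported &= ~((1u << i) - 1);		/* sizes below CPU page */
	return supported & ~(supported - 1);	/* isolate lowest set bit */
}
```

With a 4 KB CPU page every supported size qualifies and the 4 KB bit wins; with a 64 KB CPU page the low bits are masked off first.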
Reserve bus range for SR-IOV at the device scanning stage when the
kernel boot parameter 'pci=assign-busses' is used.

Signed-off-by: Yu Zhao <yu.zhao at intel.com>
---
 drivers/pci/iov.c   |   24 ++++++++++++++++++++++++
 drivers/pci/pci.h   |    6 ++++++
 drivers/pci/probe.c |    3 +++
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 03f62ca..c1a3cea 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -489,3 +489,27 @@ void pci_iov_unregister(struct pci_dev *dev)
 	iov_disable(dev);
 	kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
 }
+
+/**
+ * pci_iov_bus_range - find bus range used by Virtual Functions
+ * @bus: the PCI bus
+ *
+ * Returns the max number of buses (excluding the current one) used by
+ * Virtual Functions.
+ */
+int pci_iov_bus_range(struct pci_bus *bus)
+{
+	int max = 0;
+	u8 busnr, devfn;
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		if (!dev->iov)
+			continue;
+		virtfn_bdf(dev, dev->iov->total - 1, &busnr, &devfn);
+		if (busnr > max)
+			max = busnr;
+	}
+
+	return max ?
max - bus->number : 0;
+}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 3113d11..574bbc7 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -205,6 +205,7 @@ extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
 extern int pci_iov_register(struct pci_dev *dev);
 extern void pci_iov_unregister(struct pci_dev *dev);
 extern void pci_restore_iov_state(struct pci_dev *dev);
+extern int pci_iov_bus_range(struct pci_bus *bus);
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
 {
@@ -232,6 +233,11 @@ static inline void pci_iov_unregister(struct pci_dev *dev)
 static inline void pci_restore_iov_state(struct pci_dev *dev)
 {
 }
+
+static inline int pci_iov_bus_range(struct pci_bus *bus)
+{
+	return 0;
+}
 #endif /* CONFIG_PCI_IOV */

 #endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index cb26e64..7b591e5 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1074,6 +1074,9 @@ unsigned int __devinit pci_scan_child_bus(struct pci_bus *bus)
 	for (devfn = 0; devfn < 0x100; devfn += 8)
 		pci_scan_slot(bus, devfn);

+	/* Reserve buses for SR-IOV capability. */
+	max += pci_iov_bus_range(bus);
+
 	/*
	 * After performing arch-dependent fixup of the bus, look behind
	 * all PCI-to-PCI bridges on this bus.
--
1.5.6.4
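The bus range reserved above depends on where the last VF's Routing ID lands. virtfn_bdf() is introduced in patch 10; per the SR-IOV spec, the n-th VF (counting from 0) has a Routing ID equal to the PF's Routing ID plus First VF Offset plus n times VF Stride, where a Routing ID is (bus << 8) | devfn. A minimal sketch of that arithmetic, with function names and sample numbers that are ours for illustration only:

```c
#include <stdint.h>

/* Bus number of the n-th VF (0-based) per the SR-IOV spec:
 *   VF RID = PF RID + First VF Offset + n * VF Stride
 * where RID = (bus << 8) | devfn.  When the sum carries past devfn
 * 0xff, the VF lands on a higher bus number, which is why the scan
 * stage has to reserve extra buses behind the PF's bus. */
uint8_t vf_busnr(uint8_t pf_bus, uint8_t pf_devfn,
		 uint16_t offset, uint16_t stride, uint16_t n)
{
	uint16_t rid = ((uint16_t)pf_bus << 8) | pf_devfn;

	return (uint8_t)((uint16_t)(rid + offset + stride * n) >> 8);
}

/* Extra buses needed beyond the PF's own, as pci_iov_bus_range()
 * computes it: the last VF's bus number minus the PF's bus number. */
int extra_buses(uint8_t pf_bus, uint8_t pf_devfn,
		uint16_t offset, uint16_t stride, uint16_t total)
{
	return vf_busnr(pf_bus, pf_devfn, offset, stride, total - 1) - pf_bus;
}
```

For a PF at 05:00.0 with offset 1 and stride 1, seven VFs fit on bus 5, while several hundred VFs spill onto the next bus.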
Document the SR-IOV sysfs entries.

Signed-off-by: Yu Zhao <yu.zhao at intel.com>
---
 Documentation/ABI/testing/sysfs-bus-pci |   26 ++++++++++++++++++++++++++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index ceddcff..d66d63d 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -9,3 +9,29 @@ Description:
 		that some devices may have malformatted data.  If the
 		underlying VPD has a writable section then the
 		corresponding section of this file will be writable.
+
+What:		/sys/bus/pci/devices/.../iov/total_virtfn
+Date:		November 2008
+Contact:	Yu Zhao <yu.zhao at intel.com>
+Description:
+		This file appears when a device has the SR-IOV capability
+		and the device driver (PF driver) supports this operation.
+		It holds the total number of Virtual Functions (read-only).
+
+What:		/sys/bus/pci/devices/.../iov/initial_virtfn
+Date:		November 2008
+Contact:	Yu Zhao <yu.zhao at intel.com>
+Description:
+		This file appears when a device has the SR-IOV capability
+		and the device driver (PF driver) supports this operation.
+		It holds the initial number of Virtual Functions (read-only).
+
+What:		/sys/bus/pci/devices/.../iov/nr_virtfn
+Date:		November 2008
+Contact:	Yu Zhao <yu.zhao at intel.com>
+Description:
+		This file appears when a device has the SR-IOV capability
+		and the device driver (PF driver) supports this operation.
+		It holds the number of available Virtual Functions, and
+		can be written (0 to InitialVFs) to change the number of
+		Virtual Functions.
--
1.5.6.4
Yu Zhao
2008-Nov-21 18:44 UTC
[PATCH 13/13 v7] PCI: document for SR-IOV user and developer
Create a how-to for the SR-IOV user and driver developer.

Signed-off-by: Yu Zhao <yu.zhao at intel.com>
---
 Documentation/DocBook/kernel-api.tmpl |    1 +
 Documentation/PCI/pci-iov-howto.txt   |  138 +++++++++++++++++++++++++++++++++
 2 files changed, 139 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/PCI/pci-iov-howto.txt

diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 5818ff7..506e611 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -251,6 +251,7 @@ X!Edrivers/pci/hotplug.c
 -->
 !Edrivers/pci/probe.c
 !Edrivers/pci/rom.c
+!Edrivers/pci/iov.c
 </sect1>
 <sect1><title>PCI Hotplug Support Library</title>
 !Edrivers/pci/hotplug/pci_hotplug_core.c
diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
new file mode 100644
index 0000000..216cecc
--- /dev/null
+++ b/Documentation/PCI/pci-iov-howto.txt
@@ -0,0 +1,138 @@
+		PCI Express I/O Virtualization Howto
+		Copyright (C) 2008 Intel Corporation
+
+
+1. Overview
+
+1.1 What is SR-IOV
+
+Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
+capability which makes one physical device appear as multiple virtual
+devices. The physical device is referred to as a Physical Function (PF)
+while the virtual devices are referred to as Virtual Functions (VF).
+Allocation of the VFs can be dynamically controlled by the PF via
+registers encapsulated in the capability. By default, this feature is
+not enabled and the PF behaves as a traditional PCIe device. Once it's
+turned on, each VF's PCI configuration space can be accessed by its own
+Bus, Device and Function Number (Routing ID). Each VF also has PCI
+Memory Space, which is used to map its register set. The VF device
+driver operates on the register set so it can be functional and appear
+as a real PCI device.
+
+2. User Guide
+
+2.1 How can I manage the SR-IOV
+
+If a device has the SR-IOV capability and the device driver (PF driver)
+supports this operation, then there should be some entries under the
+PF's sysfs directory:
+	- /sys/bus/pci/devices/NNNN:BB:DD.F/iov/
+	  (NNNN:BB:DD.F is the domain, bus, device and function numbers)
+
+To change the number of Virtual Functions:
+	- /sys/bus/pci/devices/NNNN:BB:DD.F/iov/nr_virtfn
+	  (writing a positive integer to this file changes the number of
+	  VFs; writing 0 disables the capability)
+
+The total and initial numbers of VFs can be read from:
+	- /sys/bus/pci/devices/NNNN:BB:DD.F/iov/total_virtfn
+	- /sys/bus/pci/devices/NNNN:BB:DD.F/iov/initial_virtfn
+
+2.2 How can I use the Virtual Functions
+
+VFs are treated as hot-plugged PCI devices in the kernel, so they
+should be able to work in the same way as real PCI devices. A VF also
+requires a device driver, just as a normal PCI device does.
+
+3. Developer Guide
+
+3.1 SR-IOV APIs
+
+To use the SR-IOV service, the Physical Function driver needs to declare
+a callback function in its 'struct pci_driver':
+
+	static struct pci_driver dev_driver = {
+		...
+		.virtual = dev_virtual,
+		...
+	};
+
+	'dev_virtual' is a callback function that the SR-IOV service
+	invokes when the number of VFs is changed by the user.
+	The first argument of this callback is the PF itself ('struct
+	pci_dev'), and the second argument is the number of VFs
+	requested. The callback should return 0 if the requested number
+	of VFs is supported and all necessary resources are granted to
+	these VFs; otherwise it should return a negative value
+	indicating the error.
+
+3.2 Usage example
+
+The following piece of code illustrates the usage of the APIs above.
+
+static int __devinit dev_probe(struct pci_dev *dev,
+			       const struct pci_device_id *id)
+{
+	...
+
+	return 0;
+}
+
+static void __devexit dev_remove(struct pci_dev *dev)
+{
+	...
+}
+
+#ifdef CONFIG_PM
+static int dev_suspend(struct pci_dev *dev, pm_message_t state)
+{
+	...
+
+	return 0;
+}
+
+static int dev_resume(struct pci_dev *dev)
+{
+	pci_restore_state(dev);
+
+	...
+
+	return 0;
+}
+#endif
+
+static void dev_shutdown(struct pci_dev *dev)
+{
+	...
+}
+
+static int dev_virtual(struct pci_dev *dev, int nr_virtfn)
+{
+	if (nr_virtfn) {
+		/*
+		 * allocate device internal resources for the VFs.
+		 * these resources are device-specific (e.g. rx/tx
+		 * queues in a NIC) and necessary to make the VFs
+		 * functional.
+		 */
+	} else {
+		/*
+		 * reclaim the VF related resources, if any.
+		 */
+	}
+
+	return 0;
+}
+
+static struct pci_driver dev_driver = {
+	.name = "SR-IOV Physical Function driver",
+	.id_table = dev_id_table,
+	.probe = dev_probe,
+	.remove = __devexit_p(dev_remove),
+#ifdef CONFIG_PM
+	.suspend = dev_suspend,
+	.resume = dev_resume,
+#endif
+	.shutdown = dev_shutdown,
+	.virtual = dev_virtual
+};
--
1.5.6.4
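Section 1.1 notes that the BARs in a VF's own configuration space read as zero; a VF's memory space actually lives inside the PF's VF BAR apertures, which pci_iov_init() in patch 10 sizes to hold TotalVFs copies. A minimal sketch of that layout (the function name and the numbers are illustrative, following the SR-IOV spec's VF BAR rules rather than any code in this series):

```c
#include <stdint.h>

/* Start address of VF 'vf_index''s copy of BAR n.  The PF exposes one
 * aperture per VF BAR, sized for TotalVFs identical slots; VF i's slot
 * begins i * (single VF BAR size) into the aperture.  All values here
 * are made-up example inputs, not real device addresses. */
uint64_t vf_bar_start(uint64_t aperture_start,
		      uint64_t single_vf_size, uint16_t vf_index)
{
	return aperture_start + (uint64_t)vf_index * single_vf_size;
}
```

So with a 16 KB per-VF BAR, VF 3's registers sit 48 KB into the PF's aperture.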
On Sat, Nov 22, 2008 at 02:36:05AM +0800, Yu Zhao wrote:
> Greetings,
>
> Following patches are intended to support SR-IOV capability in the
> Linux kernel. With these patches, people can turn a PCI device with
> the capability into multiple ones from software perspective, which
> will benefit KVM and achieve other purposes such as QoS, security,
> and etc.
>
> The Physical Function and Virtual Function drivers using the SR-IOV
> APIs will come soon!

Thanks for respinning these patches, but I think we really need to see
a driver using this in order to get an idea of how it will be used.

Also, the Xen and KVM people need to agree on the userspace interface
here, perhaps also getting some libvirt involvement as well, as they
are going to be the ones having to use this all the time.

thanks,

greg k-h
SR-IOV drivers for the Intel 82576 NIC are available. There are two
parts to the drivers: the Physical Function driver and the Virtual
Function driver. The PF driver is based on the IGB driver and is used
to control the PF, allocate hardware-specific resources, and interface
with the SR-IOV core. The VF driver is a new NIC driver that is the
same as a traditional PCI device driver. It works in both the host and
the guest (Xen and KVM) environments.

These two drivers are testing versions and they are *only* intended to
show how to use the SR-IOV API.

The Intel 82576 NIC specification can be found at:
http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf

[SR-IOV driver example 1/3] PF driver: allocate hardware specific resource
[SR-IOV driver example 2/3] PF driver: integrate with SR-IOV core
[SR-IOV driver example 3/3] VF driver tar ball
Yu Zhao
2008-Nov-26 14:21 UTC
[SR-IOV driver example 2/3] PF driver: integrate with SR-IOV core
This patch integrates the IGB driver with the SR-IOV core. It shows how
the SR-IOV API is used to support the capability. People do not need to
put much effort into integrating a PF driver with the SR-IOV core: all
standard SR-IOV operations are handled by the SR-IOV core, and the PF
driver only takes care of device-specific resource allocation and
deallocation once it gets the necessary information (i.e. the number of
Virtual Functions) from the callback function.

---
 drivers/net/igb/igb_main.c |   30 ++++++++++++++++++++++++++++++
 1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index bc063d4..b8c7dc6 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -139,6 +139,7 @@ void igb_set_mc_list_pools(struct igb_adapter *, struct e1000_hw *, int, u16);
 static int igb_vmm_control(struct igb_adapter *, bool);
 static int igb_set_vf_mac(struct net_device *, int, u8*);
 static void igb_mbox_handler(struct igb_adapter *);
+static int igb_virtual(struct pci_dev *, int);
 #endif

 static int igb_suspend(struct pci_dev *, pm_message_t);
@@ -184,6 +185,9 @@ static struct pci_driver igb_driver = {
 #endif
 	.shutdown = igb_shutdown,
 	.err_handler = &igb_err_handler,
+#ifdef CONFIG_PCI_IOV
+	.virtual = igb_virtual
+#endif
 };

 static int global_quad_port_a;	/* global quad port a indication */
@@ -5107,6 +5111,32 @@ void igb_set_mc_list_pools(struct igb_adapter *adapter,
 	reg_data |= (1 << 25);
 	wr32(E1000_VMOLR(pool), reg_data);
 }
+
+static int
+igb_virtual(struct pci_dev *pdev, int nr_virtfn)
+{
+	unsigned char my_mac_addr[6] = {0x00, 0xDE, 0xAD, 0xBE, 0xEF, 0xFF};
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct igb_adapter *adapter = netdev_priv(netdev);
+	int i;
+
+	if (nr_virtfn > 7)
+		return -EINVAL;
+
+	if (nr_virtfn) {
+		for (i = 0; i < nr_virtfn; i++) {
+			printk(KERN_INFO "SR-IOV: VF %d is enabled\n", i);
+			my_mac_addr[5] = (unsigned char)i;
+			igb_set_vf_mac(netdev, i, my_mac_addr);
+			igb_set_vf_vmolr(adapter, i);
+		}
+	} else
+		printk(KERN_INFO "SR-IOV is disabled\n");
+
+	adapter->vfs_allocated_count = nr_virtfn;
+
+	return 0;
+}
 #endif /* igb_main.c */
--
1.5.4.4
The attachment is the VF driver for the Intel 82576 NIC. Since the VF
appears as a normal PCI device, this VF driver is no different from
other PCI NIC drivers. It handles interrupts, DMA operations, etc. to
perform packet reception and transmission.

The design of the VF internals is up to the hardware vendor. So the VF
may have a totally different register set from the PF, which means the
VF driver may have its own logic (rather than deriving from the PF
driver) to handle the hardware-specific stuff.

(Attachment: igbvf-0.5.2.tar.gz, 141423 bytes -
http://lists.linux-foundation.org/pipermail/virtualization/attachments/20081126/054c3b1d/attachment-0001.bin)
Yu Zhao wrote:> SR-IOV drivers of Intel 82576 NIC are available. There are two parts > of the drivers: Physical Function driver and Virtual Function driver. > The PF driver is based on the IGB driver and is used to control PF to > allocate hardware specific resources and interface with the SR-IOV core. > The VF driver is a new NIC driver that is same as the traditional PCI > device driver. It works in both the host and the guest (Xen and KVM) > environment. > > These two drivers are testing versions and they are *only* intended to > show how to use SR-IOV API. > > Intel 82576 NIC specification can be found at: > http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf > > [SR-IOV driver example 1/3] PF driver: allocate hardware specific resource > [SR-IOV driver example 2/3] PF driver: integrate with SR-IOV core > [SR-IOV driver example 3/3] VF driver tar ballPlease copy netdev at vger.kernel.org on all network-related patches. This is where the network developers live, and all patches on this list are automatically archived for review and handling at http://patchwork.ozlabs.org/project/netdev/list/ Jeff
On Thu, Nov 27, 2008 at 04:14:48AM +0800, Jeff Garzik wrote:> Yu Zhao wrote: > > SR-IOV drivers of Intel 82576 NIC are available. There are two parts > > of the drivers: Physical Function driver and Virtual Function driver. > > The PF driver is based on the IGB driver and is used to control PF to > > allocate hardware specific resources and interface with the SR-IOV core. > > The VF driver is a new NIC driver that is same as the traditional PCI > > device driver. It works in both the host and the guest (Xen and KVM) > > environment. > > > > These two drivers are testing versions and they are *only* intended to > > show how to use SR-IOV API. > > > > Intel 82576 NIC specification can be found at: > > http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf > > > > [SR-IOV driver example 1/3] PF driver: allocate hardware specific resource > > [SR-IOV driver example 2/3] PF driver: integrate with SR-IOV core > > [SR-IOV driver example 3/3] VF driver tar ball > > Please copy netdev at vger.kernel.org on all network-related patches. This > is where the network developers live, and all patches on this list are > automatically archived for review and handling at > http://patchwork.ozlabs.org/project/netdev/list/Will do. Thanks, Yu
SR-IOV drivers for the Intel 82576 NIC are available. There are two
parts to the drivers: the Physical Function driver and the Virtual
Function driver. The PF driver is based on the IGB driver and is used
to control the PF, allocate hardware-specific resources, and interface
with the SR-IOV core. The VF driver is a new NIC driver that is the
same as a traditional PCI device driver. It works in both the host and
the guest (Xen and KVM) environments.

These two drivers are testing versions and they are *only* intended to
show how to use the SR-IOV API.

The Intel 82576 NIC specification can be found at:
http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf

[SR-IOV driver example 0/3 resend] introduction
[SR-IOV driver example 1/3 resend] PF driver: hardware specific operations
[SR-IOV driver example 2/3 resend] PF driver: integrate with SR-IOV core
[SR-IOV driver example 3/3 resend] VF driver: an independent PCI NIC driver
On Tue, Dec 2, 2008 at 1:27 AM, Yu Zhao <yu.zhao at intel.com> wrote:> SR-IOV drivers of Intel 82576 NIC are available. There are two parts > of the drivers: Physical Function driver and Virtual Function driver. > The PF driver is based on the IGB driver and is used to control PF to > allocate hardware specific resources and interface with the SR-IOV core. > The VF driver is a new NIC driver that is same as the traditional PCI > device driver. It works in both the host and the guest (Xen and KVM) > environment. > > These two drivers are testing versions and they are *only* intended to > show how to use SR-IOV API. > > Intel 82576 NIC specification can be found at: > http://download.intel.com/design/network/datashts/82576_Datasheet_v2p1.pdf > > [SR-IOV driver example 0/3 resend] introduction > [SR-IOV driver example 1/3 resend] PF driver: hardware specific operations > [SR-IOV driver example 2/3 resend] PF driver: integrate with SR-IOV core > [SR-IOV driver example 3/3 resend] VF driver: an independent PCI NIC driver > -- >First of all, we (e1000-devel) do support the SR-IOV API. With that said, NAK on the driver changes. We were not involved in these changes and are currently working on a version of the drivers that will make them acceptable for kernel inclusion. -- Cheers, Jeff
On Friday, November 21, 2008 10:36 am Yu Zhao wrote:> Greetings, > > Following patches are intended to support SR-IOV capability in the > Linux kernel. With these patches, people can turn a PCI device with > the capability into multiple ones from software perspective, which > will benefit KVM and achieve other purposes such as QoS, security, > and etc. > > The Physical Function and Virtual Function drivers using the SR-IOV > APIs will come soon! > > Major changes from v6 to v7: > 1, remove boot-time resource rebalancing support. (Greg KH) > 2, emit uevent upon the PF driver is loaded. (Greg KH) > 3, put SR-IOV callback function into the 'pci_driver'. (Matthew Wilcox) > 4, register SR-IOV service at the PF loading stage. > 5, remove unnecessary APIs (pci_iov_enable/disable).Thanks for your patience with this, Yu, I know it's been a long haul. :) I applied 1-9 to my linux-next branch; and at least patch #10 needs a respin, so can you re-do 10-13 as a new patch set? On re-reading the last thread, there was a lot of smoke, but very little fire afaict. The main questions I saw were: 1) do we need SR-IOV at all? why not just make each subsystem export devices to guests? This is a bit of a red herring. Nothing about SR-IOV prevents us from making subsystems more v12n friendly. And since SR-IOV is a hardware feature supported by devices these days, we should make Linux support it. 2) should the PF/VF drivers be the same or not? Again, the SR-IOV patchset and PCI spec don't dictate this. We're free to do what we want here. 3) should VF devices be represented by pci_dev structs? Yes. (This is an easy one :) 4) can VF devices be used on the host? Yet again, SR-IOV doesn't dictate this. Developers can make PF/VF combo drivers or split them, and export the resulting devices however they want. Some subsystem work may be needed to make this efficient, but SR-IOV itself is agnostic about it. 
So overall I didn't see many objections to the actual code in the last post, and the issues above certainly don't merit a NAK IMO... Given a respin of 10-13 I think it's reasonable to merge this into 2.6.29, but I'd be much happier about it if we got some driver code along with it, so as not to have an unused interface sitting around for who knows how many releases. Is that reasonable? Do you know if any of the corresponding PF/VF driver bits are ready yet? Thanks, -- Jesse Barnes, Intel Open Source Technology Center