Greetings,

The following commits from the upstream kernel have been backported to the Domain0 kernel as the first step of enabling Xen SR-IOV. The SR-IOV core patches for the upstream kernel will hopefully be applied to the PCI subsystem tree in the next several days; I will backport them as soon as they are in-tree, so the Xen 3.4 release can have SR-IOV support.

  PCI: define PCI resource names in an 'enum'
  PCI: remove unnecessary condition check in pci_restore_bars()
  PCI: add a new function to map BAR offsets
  PCI: rewrite PCI BAR reading code
  PCI: handle 64-bit resources better on 32-bit machines
  PCI: fix 64-vbit prefetchable memory resource BARs
  PCI: export __pci_read_base()
  PCI: support PCIe ARI capability
  PCI: fix ARI code to be compatible with mixed ARI/non-ARI systems
  PCI: enhance pci_ari_enabled()
  PCI: allow pci_alloc_child_bus() to handle a NULL bridge
  PCI: remove unnecessary arg of pci_update_resource()

 drivers/pci/pci-sysfs.c  |    4
 drivers/pci/pci.c        |  104 +++++++++++++------
 drivers/pci/pci.h        |   29 ++++-
 drivers/pci/probe.c      |  252 ++++++++++++++++++++++++++++-------------------
 drivers/pci/proc.c       |    7 -
 drivers/pci/quirks.c     |    2
 drivers/pci/setup-res.c  |   20 +--
 include/linux/pci.h      |   40 ++++---
 include/linux/pci_regs.h |   15 ++
 9 files changed, 310 insertions(+), 163 deletions(-)

---

The SR-IOV specification (requires membership) can be found at:
  * http://www.pcisig.com/members/downloads/specifications/iov/sr-iov1.0_11Sep07.pdf

The latest SR-IOV core patches for the upstream kernel are:

  PCI: initialize and release SR-IOV capability
  * http://patchwork.kernel.org/patch/11062/
  PCI: restore saved SR-IOV state
  * http://patchwork.kernel.org/patch/11063/
  PCI: reserve bus range for SR-IOV device
  * http://patchwork.kernel.org/patch/11064/
  PCI: centralize device setup code
  * http://patchwork.kernel.org/patch/11065/
  PCI: add SR-IOV API for Physical Function driver
  * http://patchwork.kernel.org/patch/11070/
  PCI: handle SR-IOV Virtual Function Migration
  * http://patchwork.kernel.org/patch/11066/
  PCI: document SR-IOV sysfs entries
  * http://patchwork.kernel.org/patch/11068/
  PCI: manual for SR-IOV user and driver developer
  * http://patchwork.kernel.org/patch/11069/

The Intel 82576 Gigabit Ethernet Controller datasheet is available at:
  * http://download.intel.com/design/network/datashts/82576_Datasheet.pdf

The patches to enable the SR-IOV capability of the Intel 82576 NIC (a.k.a. the Physical Function driver) are available at:
  * http://patchwork.kernel.org/patch/8063/
  * http://patchwork.kernel.org/patch/8064/
  * http://patchwork.kernel.org/patch/8065/
  * http://patchwork.kernel.org/patch/8066/

And the driver for the Intel 82576 Virtual Function is available at:
  * http://patchwork.kernel.org/patch/11029/
  * http://patchwork.kernel.org/patch/11028/
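To give a flavor of the first backport in the list above, 'PCI: define PCI resource names in an enum' replaces bare numeric resource indices in include/linux/pci.h with named constants. A minimal sketch of the resulting enum (member names follow the upstream commit; verify the exact values against the tree):

    enum {
            /* #0-5: standard PCI resources (the six BARs) */
            PCI_STD_RESOURCES,
            PCI_STD_RESOURCE_END = 5,

            /* #6: expansion ROM resource */
            PCI_ROM_RESOURCE,

            /* resources assigned to buses behind a bridge */
            PCI_BRIDGE_RESOURCES,
            PCI_BRIDGE_RESOURCE_END = PCI_BRIDGE_RESOURCES + 3,

            /* total resources associated with a PCI device */
            PCI_NUM_RESOURCES,

            /* preserve this many slots for compatibility */
            DEVICE_COUNT_RESOURCE
    };

This matters for SR-IOV because the core patches later add the VF BAR resources to the same per-device resource array, and named indices keep that extension from colliding with the standard BARs.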
Greetings,

The latest SR-IOV core patches for the native Linux kernel have been backported to Xen/Dom0. I had planned to do this after they were applied to the native kernel tree; however, Xen 3.4 is about to freeze, so I asked Keir to take the SR-IOV patches for 3.4 rather than disappoint the many people who are expecting SR-IOV support in the 3.4 release.

Since the SR-IOV patches for the native kernel have been reviewed by key kernel developers over a long development cycle, the SR-IOV Physical Function driver API is stable and will not cost any extra porting effort for people who want to develop a Physical Function driver for both Xen/Dom0 and native Linux.

The SR-IOV core patches backported to Xen/Dom0 are:

  PCI: initialize and release SR-IOV capability (hg c/s: 822)
  PCI: restore saved SR-IOV state (hg c/s: 823)
  PCI: reserve bus range for SR-IOV device (hg c/s: 824)
  PCI: centralize device setup code (hg c/s: 825)
  PCI: add SR-IOV API for Physical Function driver (hg c/s: 826)

The following patches are not backported:

  PCI: handle SR-IOV Virtual Function Migration
  * This is for using SR-IOV in conjunction with MR-IOV. It won't be
    needed until MR-IOV is fully supported.
  PCI: document SR-IOV sysfs entries
  * The sysfs ABI document, which has a gap between the upstream and
    Dom0 kernels. Please use the latest upstream kernel ABI document
    instead until Dom0 switches to the pv_ops kernel.
  PCI: manual for SR-IOV user and driver developer
  * The PCI subsystem document; same as above.

 drivers/pci/Kconfig      |    9
 drivers/pci/Makefile     |    2
 drivers/pci/iov.c        |  537 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.c        |    9
 drivers/pci/pci.h        |   49 ++++
 drivers/pci/probe.c      |   49 ++--
 include/linux/pci.h      |   33 ++
 include/linux/pci_regs.h |   33 ++
 8 files changed, 699 insertions(+), 22 deletions(-)
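The stability claim about the Physical Function driver API is worth making concrete. A minimal sketch of how a PF driver uses it: the entry points pci_enable_sriov() and pci_disable_sriov() are the API as it was merged upstream, while the probe/remove skeleton, NR_VFS, and the error handling are illustrative:

    #include <linux/pci.h>

    #define NR_VFS 7    /* illustrative: how many VFs to bring up */

    static int pf_probe(struct pci_dev *dev, const struct pci_device_id *id)
    {
            int err;

            err = pci_enable_device(dev);
            if (err)
                    return err;

            /* Ask the SR-IOV core to size, map, and enable NR_VFS
             * Virtual Functions; each VF then shows up as an ordinary
             * PCI device that can be passed through to a guest. */
            err = pci_enable_sriov(dev, NR_VFS);
            if (err)
                    dev_err(&dev->dev, "SR-IOV enable failed: %d\n", err);
            return err;
    }

    static void pf_remove(struct pci_dev *dev)
    {
            /* VFs must be torn down before the PF itself goes away. */
            pci_disable_sriov(dev);
            pci_disable_device(dev);
    }

Because the same two entry points exist in the backport and upstream, a PF driver written against one tree should build against the other unchanged.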
Yu Zhao
2009-Mar-19 08:38 UTC
[Xen-devel] PCIe 2.0, VT-d and Intel 82576 enhancement for Xen SR-IOV
Greetings,

The following patches enhance the VT-d and PCI code and add a quirk for the Intel 82576 SR-IOV NIC, so that they work better with the SR-IOV core.

  PCI: pass ARI and SR-IOV device information to the hypervisor
  PCI: add a SR-IOV quirk for Intel 82576 NIC
  PCI: save and restore PCIe 2.0 registers

 drivers/pci/pci.c               |   11 +++++++
 drivers/pci/quirks.c            |   57 ++++++++++++++++++++++++++++++++++++++++
 drivers/xen/core/pci.c          |   27 ++++++++++++++++--
 include/linux/pci_regs.h        |    2 +
 include/xen/interface/physdev.h |   16 +++++++++++
 5 files changed, 109 insertions(+), 4 deletions(-)

  Xen: use proper device ID to search VT-d unit for ARI and SR-IOV device

 arch/ia64/xen/hypercall.c          |   22 +++++++++++++++++++
 arch/x86/physdev.c                 |   22 +++++++++++++++++++
 drivers/passthrough/pci.c          |   41 +++++++++++++++++++++++++++++++++++++
 drivers/passthrough/vtd/dmar.c     |   14 +++++++++++-
 drivers/passthrough/vtd/dmar.h     |    2 -
 drivers/passthrough/vtd/intremap.c |    4 +--
 drivers/passthrough/vtd/iommu.c    |   14 +++++++++---
 include/public/physdev.h           |   16 ++++++++++++++
 include/xen/pci.h                  |   11 +++++++++
 9 files changed, 138 insertions(+), 8 deletions(-)
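A sketch of the kind of payload the 'pass ARI and SR-IOV device information to the hypervisor' patch adds to the physdev interface. The operation name and field layout below are modeled on the extended "manage pci" call as it appears in Xen's public physdev.h; treat them as an approximation and check include/public/physdev.h in the tree:

    /* Dom0 uses this instead of the plain PCI-add operation when the
     * function is an ARI Extended Function or an SR-IOV Virtual
     * Function, so Xen can associate it with the proper VT-d unit. */
    #define PHYSDEVOP_manage_pci_add_ext     20

    struct physdev_manage_pci_ext {
        /* IN */
        uint8_t bus;
        uint8_t devfn;
        unsigned is_extfn;      /* ARI Extended Function (fn number > 7)? */
        unsigned is_virtfn;     /* SR-IOV Virtual Function? */
        struct {
            uint8_t bus;        /* BDF of the owning Physical Function; */
            uint8_t devfn;      /* meaningful only when is_virtfn is set */
        } physfn;
    };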
How-to for using the SR-IOV device with VT-d.
Espen Skoglund
2009-Mar-19 12:05 UTC
Re: [Xen-devel] PCIe 2.0, VT-d and Intel 82576 enhancement for Xen SR-IOV
Why put all this logic into Xen itself? Finding the matching DRHD unit will often "just work" anyway, by the way the DRHD scope matching is performed.

That said, yes, I get that it might sometimes be better to actually map the VFs and ARI functions back to the PF before matching. However, why not extend the current device_add function and hypercall with a master BDF pointing back to the BDF "owning" the function? This leaves all of the matching-up logic out of the hypervisor. A zero or -1 master BDF could indicate that there is no owner (i.e., it owns itself).

	eSk

[Yu Zhao]
> PCIe Alternative Routing-ID Interpretation (ARI) ECN defines the Extended
> Function -- a function whose function number is greater than 7 within an
> ARI Device. Intel VT-d spec 1.2 section 8.3.2 specifies that the Extended
> Function is under the scope of the same remapping unit as the traditional
> function. The hypervisor needs to know if a function is an Extended
> Function so it can find the proper DMAR for it.
>
> And section 8.3.3 specifies that the SR-IOV Virtual Function is under the
> scope of the same remapping unit as the Physical Function. The hypervisor
> also needs to know if a function is a Virtual Function and which Physical
> Function it's associated with, for the same reason.
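To make the proposal concrete, a purely illustrative sketch of the extended hypercall argument being described here -- this struct does not exist in any tree; it just spells out the "master BDF" idea:

    /* Hypothetical device_add extension: dom0 names the function that
     * "owns" the one being added, and Xen resolves the DRHD unit from
     * the master instead of decoding ARI/SR-IOV semantics itself. */
    struct physdev_device_add_master {
        uint8_t  bus;          /* BDF of the function being added */
        uint8_t  devfn;
        uint16_t master_bdf;   /* BDF of the owning function; 0 or
                                * 0xffff means "owns itself" */
    };

For a VF, dom0 would fill in the PF's BDF; for an ARI Extended Function, function 0 of the device; for everything else, the function's own BDF.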
Yu Zhao
2009-Mar-19 15:35 UTC
Re: [Xen-devel] PCIe 2.0, VT-d and Intel 82576 enhancement for Xen SR-IOV
Yes, using the master BDF can move the current logic into Dom0 and make the hypervisor cleaner. And it does work for VT-d spec 1.2.

But if VT-d spec 1.3 (or the AMD/IBM/Sun IOMMU specs) says that the ARI device and the Virtual Function have their own remapping unit, or something like this, rather than using their masters', how could we support that with a master BDF? Things evolve fast; we would need to add another hypercall to extend the master BDF one after it's in 3.4. It would be like when device_add was added: the VT-d spec didn't have such a requirement then, but now we have to add device_add_ext because of the compatibility requirement.

Passing this device-specific information down and doing the IOMMU-specific work inside the hypervisor comes inherently with the current passthrough architecture. Having chosen to put all IOMMU things (both the high-level remapping data structures and logic, and the low-level hardware drivers) into the hypervisor, we lost the flexibility to split out the matching-up logic and move it back to the Dom0 kernel.

Thanks,
Yu

On Thu, Mar 19, 2009 at 08:05:55PM +0800, Espen Skoglund wrote:
> Why put all this logic into Xen itself? Finding the matching DRHD
> unit will often "just work" anyway by the way the DRHD scope matching
> is performed.
>
> That said, yes, I get it that it might sometimes be better to actually
> map the VFs and ARI functions back to the PF before matching.
> However, why not extend the current device_add function and hypercall
> with a master BDF pointing back to the BDF "owning" the function?
> This leaves all of the matching up logic out of the hypervisor. A
> zero or -1 master BDF could indicate that there is no owner (i.e., it
> owns itself).
>
> eSk
>
> [Yu Zhao]
> > PCIe Alternative Routing-ID Interpretation (ARI) ECN defines the Extended
> > Function -- a function whose function number is greater than 7 within an
> > ARI Device. Intel VT-d spec 1.2 section 8.3.2 specifies that the Extended
> > Function is under the scope of the same remapping unit as the traditional
> > function. The hypervisor needs to know if a function is an Extended
> > Function so it can find the proper DMAR for it.
> >
> > And section 8.3.3 specifies that the SR-IOV Virtual Function is under the
> > scope of the same remapping unit as the Physical Function. The hypervisor
> > also needs to know if a function is a Virtual Function and which Physical
> > Function it's associated with, for the same reason.
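The logic under debate is small. A minimal sketch of the hypervisor-side matching under the current architecture (names are illustrative, not taken from the Xen tree): before the ordinary DRHD scope search, the lookup substitutes the source-id that VT-d 1.2 says defines the scope.

    /* Illustrative: choose the BDF that VT-d 1.2 says defines the
     * remapping scope, then run the normal DRHD scope search on it. */
    static uint16_t scope_bdf(const struct pci_dev_info *dev)
    {
        if (dev->is_virtfn)
            return dev->physfn_bdf;   /* 8.3.3: a VF follows its PF */
        if (dev->is_extfn)
            return dev->bdf & 0xff00; /* 8.3.2: an ARI Extended Function
                                       * follows function 0 (ARI devices
                                       * use device number 0) */
        return dev->bdf;
    }

Espen's proposal would have Dom0 compute this and pass only the result down; Yu's objection is that future IOMMUs may need the raw is_virtfn/is_extfn facts, not just the collapsed BDF.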
Espen Skoglund
2009-Mar-19 16:10 UTC
Re: [Xen-devel] PCIe 2.0, VT-d and Intel 82576 enhancement for Xen SR-IOV
[Yu Zhao]
> Yes, using the master BDF can move current logic into Dom0 and makes
> hypervisor cleaner. And it does work for VT-d spec 1.2.
>
> But if VT-d spec 1.3 (or AMD/IBM/Sun IOMMU specs) says that the ARI
> device and the Virtual Function have their own remapping unit or
> something like this, rather than use their masters', how could we
> support it using the master BDF?

If this happens, the dom0 kernel will detect it and pass a different master BDF to the hypervisor. This was the whole point of my comment: the hypervisor need not know what type of device function it is dealing with. The logic for handling this should, if possible, be kept out of the hypervisor (and if these kinds of changes came along, you would still need dom0 support for handling them anyway).

> Things evolve fast, we would need
> to add another hypercall to enhance the master BDF one after it's in
> 3.4 -- it would be like when the device_add was added, the VT-d spec
> didn't have such requirement, but now we have to add device_add_ext
> because the compatibility requirement.
>
> Passing these device specific information down and doing the IOMMU
> specific work inside the hypervisor hereditarily come with current
> passthrough architecture. After choosing putting all IOMMU things
> (both high level remapping data structures and logics, and low level
> hardware drivers) into hypervisor, we lost the flexibility to split
> the matching up logic and move it back to the Dom0 kernel.

I don't buy this argument. You seem to be indicating that the mechanism for configuring a given setup cannot be separated from the mechanism which actually enforces that configuration. This is not true. It's all a matter of finding the right abstraction for the configuration interface. Flexibility need not be sacrificed. I guess the main problem here is that there was never much thought put into how best to express the interfaces and abstractions for dealing with IOMMUs, and as newer generations of IOMMU and PCIe hardware came along, the lack of flexibility in the original abstractions has come back to bite us.

	eSk

> Thanks,
> Yu
>
> On Thu, Mar 19, 2009 at 08:05:55PM +0800, Espen Skoglund wrote:
> > Why put all this logic into Xen itself? Finding the matching DRHD
> > unit will often "just work" anyway by the way the DRHD scope matching
> > is performed.
> >
> > That said, yes, I get it that it might sometimes be better to actually
> > map the VFs and ARI functions back to the PF before matching.
> > However, why not extend the current device_add function and hypercall
> > with a master BDF pointing back to the BDF "owning" the function?
> > This leaves all of the matching up logic out of the hypervisor. A
> > zero or -1 master BDF could indicate that there is no owner (i.e., it
> > owns itself).
> >
> > eSk
> >
> > [Yu Zhao]
> > > PCIe Alternative Routing-ID Interpretation (ARI) ECN defines the Extended
> > > Function -- a function whose function number is greater than 7 within an
> > > ARI Device. Intel VT-d spec 1.2 section 8.3.2 specifies that the Extended
> > > Function is under the scope of the same remapping unit as the traditional
> > > function. The hypervisor needs to know if a function is an Extended
> > > Function so it can find the proper DMAR for it.
> > >
> > > And section 8.3.3 specifies that the SR-IOV Virtual Function is under the
> > > scope of the same remapping unit as the Physical Function. The hypervisor
> > > also needs to know if a function is a Virtual Function and which Physical
> > > Function it's associated with, for the same reason.
Yu Zhao
2009-Mar-19 16:53 UTC
Re: [Xen-devel] PCIe 2.0, VT-d and Intel 82576 enhancement for Xen SR-IOV
On Fri, Mar 20, 2009 at 12:10:54AM +0800, Espen Skoglund wrote:
> [Yu Zhao]
> > Yes, using the master BDF can move current logic into Dom0 and makes
> > hypervisor cleaner. And it does work for VT-d spec 1.2.
> >
> > But if VT-d spec 1.3 (or AMD/IBM/Sun IOMMU specs) says that the ARI
> > device and the Virtual Function have their own remapping unit or
> > something like this, rather than use their masters', how could we
> > support it using the master BDF?
>
> If this happens the dom0 kernel will detect it and pass a different
> master BDF to the hypervisor. This was the whole point of my comment;
> the hypervisor need not know what type of device function it is
> dealing with. The logic for handling this should if possible be kept
> out of the hypervisor (and if these kind of changes came along you
> would still need dom0 support for handling it anyway).

Yes, I understand your point, but I didn't make myself clear:

1) We can't extend the device_add that already exists in the 3.3 release, for compatibility reasons.

2) The master BDF only covers the current VT-d 1.2 case -- IOMMUs from other vendors may require the ARI Extended Function or the Virtual Function to use a separate remapping unit that cannot be indicated by the master BDF. They may use a simple algorithm such as:

    if (is_ari_extfn)
        use IOMMU_1;
    else if (is_sriov_virtfn)
        use IOMMU_2;
    else
        use the BDF to find a proper IOMMU;

or something like this, which doesn't have the master BDF concept at all.

And Virtual Function ATS has the following requirement (PCI SR-IOV 1.0, section 3.7.4):

    However, all VFs associated with a PF share a single input queue in
    the PF. To implement Invalidation flow control, the TA must ensure
    that the total number of outstanding Invalidate Requests to the
    shared PF queue (targeted to the PF and its associated VFs) does
    not exceed the value in the PF Invalidate Queue Depth field.

This means that if we want to enable ATS for a Virtual Function, we must first know it is a Virtual Function, and then find its associated Physical Function. Knowing only its master BDF doesn't give the IOMMU enough of a hint to set up the Invalidation Queue (the IOMMU can't figure out the function type behind the master BDF).

Eventually we still need to pass the function type to the hypervisor and let the IOMMU code do something extra, even if we have found the master BDF for DRHD unit matching in Dom0. This makes me feel there is no difference between putting a small part of this kind of logic in Dom0 while leaving most of it in the hypervisor, and putting all of it in the hypervisor.

> > Things evolve fast, we would need
> > to add another hypercall to enhance the master BDF one after it's in
> > 3.4 -- it would be like when the device_add was added, the VT-d spec
> > didn't have such requirement, but now we have to add device_add_ext
> > because the compatibility requirement.
> >
> > Passing these device specific information down and doing the IOMMU
> > specific work inside the hypervisor hereditarily come with current
> > passthrough architecture. After choosing putting all IOMMU things
> > (both high level remapping data structures and logics, and low level
> > hardware drivers) into hypervisor, we lost the flexibility to split
> > the matching up logic and move it back to the Dom0 kernel.
>
> I don't buy this argument. You seem to be indicating that the
> mechanism for configuring a given setup can not be separated from the
> mechanism which actually enforces that configuration. This is not
> true. It's all a matter of finding the right abstraction for the
> configuration interface. Flexibility need not be sacrificed. I guess
> the main problem here is that there was never much thought put into
> how to best express the interfaces and abstractions for dealing with
> IOMMUs, and as newer generations of IOMMU and PCIe hardware came along
> the lack of flexibility in the original abstractions has come back to
> bite us.

Yes, VT-d used not to cover the SR-IOV/ARI device, because the PCIe IOV specs appeared relatively late and hardware with these new features is rarely supported by VMMs. Any comments on improving the interfaces and abstractions are welcome.

Thanks,
Yu
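A sketch of why the ATS point forces the function type into the hypervisor. The capability ID and the Invalidate Queue Depth field location follow the published ATS capability layout; the data structure and the config-space accessors are illustrative stand-ins for whatever the IOMMU code actually uses:

    #define PCI_EXT_CAP_ID_ATS   0x0f
    #define PCI_ATS_CAP          0x04   /* ATS Capability Register */
    #define PCI_ATS_QDEP_MASK    0x1f   /* Invalidate Queue Depth, bits 4:0 */

    /* Per SR-IOV 1.0 section 3.7.4, a VF's Invalidate Requests are
     * charged against its PF's shared queue, so the depth must be read
     * from the PF's ATS capability.  That requires knowing both that
     * the function is a VF and which PF owns it -- a master BDF alone
     * does not say which case applies. */
    static int ats_queue_depth(const struct pci_dev_info *dev)
    {
        const struct pci_dev_info *owner = dev->is_virtfn ? dev->physfn : dev;
        int pos = pci_find_ext_capability(owner->bdf, PCI_EXT_CAP_ID_ATS);
        uint16_t cap;

        if (!pos)
            return -1;                  /* no ATS capability */
        cap = pci_conf_read16(owner->bdf, pos + PCI_ATS_CAP);
        return cap & PCI_ATS_QDEP_MASK; /* 0 encodes a depth of 32 */
    }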