Greetings,

The following commits from the upstream kernel have been backported to the Domain0 kernel as the first step of enabling Xen SR-IOV. The SR-IOV core patches for the upstream kernel will hopefully be applied to the PCI subsystem tree in the next several days; I will backport them as soon as they are in-tree, so the Xen 3.4 release can have SR-IOV support.

  PCI: define PCI resource names in an 'enum'
  PCI: remove unnecessary condition check in pci_restore_bars()
  PCI: add a new function to map BAR offsets
  PCI: rewrite PCI BAR reading code
  PCI: handle 64-bit resources better on 32-bit machines
  PCI: fix 64-vbit prefetchable memory resource BARs
  PCI: export __pci_read_base()
  PCI: support PCIe ARI capability
  PCI: fix ARI code to be compatible with mixed ARI/non-ARI systems
  PCI: enhance pci_ari_enabled()
  PCI: allow pci_alloc_child_bus() to handle a NULL bridge
  PCI: remove unnecessary arg of pci_update_resource()

 drivers/pci/pci-sysfs.c  |    4
 drivers/pci/pci.c        |  104 +++++++++++++------
 drivers/pci/pci.h        |   29 ++++-
 drivers/pci/probe.c      |  252 ++++++++++++++++++++++++++++-------------------
 drivers/pci/proc.c       |    7 -
 drivers/pci/quirks.c     |    2
 drivers/pci/setup-res.c  |   20 +--
 include/linux/pci.h      |   40 ++++---
 include/linux/pci_regs.h |   15 ++
 9 files changed, 310 insertions(+), 163 deletions(-)

---

The SR-IOV specification (requires membership) can be found at:
  * http://www.pcisig.com/members/downloads/specifications/iov/sr-iov1.0_11Sep07.pdf

The latest SR-IOV core patches for the upstream kernel are:

  PCI: initialize and release SR-IOV capability
  * http://patchwork.kernel.org/patch/11062/
  PCI: restore saved SR-IOV state
  * http://patchwork.kernel.org/patch/11063/
  PCI: reserve bus range for SR-IOV device
  * http://patchwork.kernel.org/patch/11064/
  PCI: centralize device setup code
  * http://patchwork.kernel.org/patch/11065/
  PCI: add SR-IOV API for Physical Function driver
  * http://patchwork.kernel.org/patch/11070/
  PCI: handle SR-IOV Virtual Function Migration
  * http://patchwork.kernel.org/patch/11066/
  PCI: document SR-IOV sysfs entries
  * http://patchwork.kernel.org/patch/11068/
  PCI: manual for SR-IOV user and driver developer
  * http://patchwork.kernel.org/patch/11069/

The Intel 82576 Gigabit Ethernet Controller datasheet is available at:
  * http://download.intel.com/design/network/datashts/82576_Datasheet.pdf

The patches to enable the SR-IOV capability of the Intel 82576 NIC (a.k.a. the Physical Function driver) are available at:
  * http://patchwork.kernel.org/patch/8063/
  * http://patchwork.kernel.org/patch/8064/
  * http://patchwork.kernel.org/patch/8065/
  * http://patchwork.kernel.org/patch/8066/

And the driver for the Intel 82576 Virtual Function is available at:
  * http://patchwork.kernel.org/patch/11029/
  * http://patchwork.kernel.org/patch/11028/
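To give a flavor of the first backport in the list above, 'PCI: define PCI resource names in an enum' replaces bare numeric resource indices in include/linux/pci.h with named constants. A minimal sketch of the resulting enum (member names follow the upstream commit; verify the exact values against the tree):

    enum {
            /* #0-5: standard PCI resources (the six BARs) */
            PCI_STD_RESOURCES,
            PCI_STD_RESOURCE_END = 5,

            /* #6: expansion ROM resource */
            PCI_ROM_RESOURCE,

            /* resources assigned to buses behind a bridge */
            PCI_BRIDGE_RESOURCES,
            PCI_BRIDGE_RESOURCE_END = PCI_BRIDGE_RESOURCES + 3,

            /* total resources associated with a PCI device */
            PCI_NUM_RESOURCES,

            /* preserve this many slots for compatibility */
            DEVICE_COUNT_RESOURCE
    };

This matters for SR-IOV because the core patches later add the VF BAR resources to the same per-device resource array, and named indices keep that extension from colliding with the standard BARs.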
Greetings,

The latest SR-IOV core patches for the native Linux kernel have been backported to Xen/Dom0. I had planned to do this after they were applied to the native kernel tree; however, Xen 3.4 is about to freeze, so I asked Keir to take the SR-IOV patches for 3.4 rather than disappoint the many people who are expecting SR-IOV support in the 3.4 release.

Since the SR-IOV patches for the native kernel have been reviewed by key kernel developers over a long development cycle, the SR-IOV Physical Function driver API is stable and will not cost any extra porting effort for people who want to develop a Physical Function driver for both Xen/Dom0 and native Linux.

The SR-IOV core patches backported to Xen/Dom0 are:

  PCI: initialize and release SR-IOV capability (hg c/s: 822)
  PCI: restore saved SR-IOV state (hg c/s: 823)
  PCI: reserve bus range for SR-IOV device (hg c/s: 824)
  PCI: centralize device setup code (hg c/s: 825)
  PCI: add SR-IOV API for Physical Function driver (hg c/s: 826)

The following patches are not backported:

  PCI: handle SR-IOV Virtual Function Migration
  * This is for using SR-IOV in conjunction with MR-IOV. It won't be
    needed until MR-IOV is fully supported.
  PCI: document SR-IOV sysfs entries
  * The sysfs ABI document, which has a gap between the upstream and
    Dom0 kernels. Please use the latest upstream kernel ABI document
    instead until Dom0 switches to the pv_ops kernel.
  PCI: manual for SR-IOV user and driver developer
  * The PCI subsystem document; same as above.

 drivers/pci/Kconfig      |    9
 drivers/pci/Makefile     |    2
 drivers/pci/iov.c        |  537 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci.c        |    9
 drivers/pci/pci.h        |   49 ++++
 drivers/pci/probe.c      |   49 ++--
 include/linux/pci.h      |   33 ++
 include/linux/pci_regs.h |   33 ++
 8 files changed, 699 insertions(+), 22 deletions(-)
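The stability claim about the Physical Function driver API is worth making concrete. A minimal sketch of how a PF driver uses it: the entry points pci_enable_sriov() and pci_disable_sriov() are the API as it was merged upstream, while the probe/remove skeleton, NR_VFS, and the error handling are illustrative:

    #include <linux/pci.h>

    #define NR_VFS 7    /* illustrative: how many VFs to bring up */

    static int pf_probe(struct pci_dev *dev, const struct pci_device_id *id)
    {
            int err;

            err = pci_enable_device(dev);
            if (err)
                    return err;

            /* Ask the SR-IOV core to size, map, and enable NR_VFS
             * Virtual Functions; each VF then shows up as an ordinary
             * PCI device that can be passed through to a guest. */
            err = pci_enable_sriov(dev, NR_VFS);
            if (err)
                    dev_err(&dev->dev, "SR-IOV enable failed: %d\n", err);
            return err;
    }

    static void pf_remove(struct pci_dev *dev)
    {
            /* VFs must be torn down before the PF itself goes away. */
            pci_disable_sriov(dev);
            pci_disable_device(dev);
    }

Because the same two entry points exist in the backport and upstream, a PF driver written against one tree should build against the other unchanged.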
Yu Zhao
2009-Mar-19 08:38 UTC
[Xen-devel] PCIe 2.0, VT-d and Intel 82576 enhancement for Xen SR-IOV
Greetings,

The following patches enhance the VT-d and PCI code and add a quirk for the Intel 82576 SR-IOV NIC, so that they work better with the SR-IOV core.

  PCI: pass ARI and SR-IOV device information to the hypervisor
  PCI: add a SR-IOV quirk for Intel 82576 NIC
  PCI: save and restore PCIe 2.0 registers

 drivers/pci/pci.c               |   11 +++++++
 drivers/pci/quirks.c            |   57 ++++++++++++++++++++++++++++++++++++++++
 drivers/xen/core/pci.c          |   27 ++++++++++++++++--
 include/linux/pci_regs.h        |    2 +
 include/xen/interface/physdev.h |   16 +++++++++++
 5 files changed, 109 insertions(+), 4 deletions(-)

  Xen: use proper device ID to search VT-d unit for ARI and SR-IOV device

 arch/ia64/xen/hypercall.c          |   22 +++++++++++++++++++
 arch/x86/physdev.c                 |   22 +++++++++++++++++++
 drivers/passthrough/pci.c          |   41 +++++++++++++++++++++++++++++++++++++
 drivers/passthrough/vtd/dmar.c     |   14 +++++++++++-
 drivers/passthrough/vtd/dmar.h     |    2 -
 drivers/passthrough/vtd/intremap.c |    4 +--
 drivers/passthrough/vtd/iommu.c    |   14 +++++++++---
 include/public/physdev.h           |   16 ++++++++++++++
 include/xen/pci.h                  |   11 +++++++++
 9 files changed, 138 insertions(+), 8 deletions(-)
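A sketch of the kind of payload the 'pass ARI and SR-IOV device information to the hypervisor' patch adds to the physdev interface. The operation name and field layout below are modeled on the extended "manage pci" call as it appears in Xen's public physdev.h; treat them as an approximation and check include/public/physdev.h in the tree:

    /* Dom0 uses this instead of the plain PCI-add operation when the
     * function is an ARI Extended Function or an SR-IOV Virtual
     * Function, so Xen can associate it with the proper VT-d unit. */
    #define PHYSDEVOP_manage_pci_add_ext     20

    struct physdev_manage_pci_ext {
        /* IN */
        uint8_t bus;
        uint8_t devfn;
        unsigned is_extfn;      /* ARI Extended Function (fn number > 7)? */
        unsigned is_virtfn;     /* SR-IOV Virtual Function? */
        struct {
            uint8_t bus;        /* BDF of the owning Physical Function; */
            uint8_t devfn;      /* meaningful only when is_virtfn is set */
        } physfn;
    };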
How-to for using the SR-IOV device with VT-d.
Espen Skoglund
2009-Mar-19 12:05 UTC
Re: [Xen-devel] PCIe 2.0, VT-d and Intel 82576 enhancement for Xen SR-IOV
Why put all this logic into Xen itself? Finding the matching DRHD unit will often "just work" anyway, by the way the DRHD scope matching is performed.

That said, yes, I get that it might sometimes be better to actually map the VFs and ARI functions back to the PF before matching. However, why not extend the current device_add function and hypercall with a master BDF pointing back to the BDF "owning" the function? This leaves all of the matching-up logic out of the hypervisor. A zero or -1 master BDF could indicate that there is no owner (i.e., it owns itself).

	eSk

[Yu Zhao]
> PCIe Alternative Routing-ID Interpretation (ARI) ECN defines the Extended
> Function -- a function whose function number is greater than 7 within an
> ARI Device. Intel VT-d spec 1.2 section 8.3.2 specifies that the Extended
> Function is under the scope of the same remapping unit as the traditional
> function. The hypervisor needs to know if a function is an Extended
> Function so it can find the proper DMAR for it.
>
> And section 8.3.3 specifies that the SR-IOV Virtual Function is under the
> scope of the same remapping unit as the Physical Function. The hypervisor
> also needs to know if a function is a Virtual Function and which Physical
> Function it's associated with, for the same reason.
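To make the proposal concrete, a purely illustrative sketch of the extended hypercall argument being described here -- this struct does not exist in any tree; it just spells out the "master BDF" idea:

    /* Hypothetical device_add extension: dom0 names the function that
     * "owns" the one being added, and Xen resolves the DRHD unit from
     * the master instead of decoding ARI/SR-IOV semantics itself. */
    struct physdev_device_add_master {
        uint8_t  bus;          /* BDF of the function being added */
        uint8_t  devfn;
        uint16_t master_bdf;   /* BDF of the owning function; 0 or
                                * 0xffff means "owns itself" */
    };

For a VF, dom0 would fill in the PF's BDF; for an ARI Extended Function, function 0 of the device; for everything else, the function's own BDF.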
Yu Zhao
2009-Mar-19 15:35 UTC
Re: [Xen-devel] PCIe 2.0, VT-d and Intel 82576 enhancement for Xen SR-IOV
Yes, using the master BDF can move the current logic into Dom0 and make the hypervisor cleaner. And it does work for VT-d spec 1.2.

But if VT-d spec 1.3 (or the AMD/IBM/Sun IOMMU specs) says that the ARI device and the Virtual Function have their own remapping unit, or something like this, rather than using their masters', how could we support that with a master BDF? Things evolve fast; we would need to add another hypercall to extend the master BDF one after it's in 3.4. It would be like when device_add was added: the VT-d spec didn't have such a requirement then, but now we have to add device_add_ext because of the compatibility requirement.

Passing this device-specific information down and doing the IOMMU-specific work inside the hypervisor comes inherently with the current passthrough architecture. Having chosen to put all IOMMU things (both the high-level remapping data structures and logic, and the low-level hardware drivers) into the hypervisor, we lost the flexibility to split out the matching-up logic and move it back to the Dom0 kernel.

Thanks,
Yu

On Thu, Mar 19, 2009 at 08:05:55PM +0800, Espen Skoglund wrote:
> Why put all this logic into Xen itself? Finding the matching DRHD
> unit will often "just work" anyway by the way the DRHD scope matching
> is performed.
>
> That said, yes, I get it that it might sometimes be better to actually
> map the VFs and ARI functions back to the PF before matching.
> However, why not extend the current device_add function and hypercall
> with a master BDF pointing back to the BDF "owning" the function?
> This leaves all of the matching up logic out of the hypervisor. A
> zero or -1 master BDF could indicate that there is no owner (i.e., it
> owns itself).
>
> eSk
>
> [Yu Zhao]
> > PCIe Alternative Routing-ID Interpretation (ARI) ECN defines the Extended
> > Function -- a function whose function number is greater than 7 within an
> > ARI Device. Intel VT-d spec 1.2 section 8.3.2 specifies that the Extended
> > Function is under the scope of the same remapping unit as the traditional
> > function. The hypervisor needs to know if a function is an Extended
> > Function so it can find the proper DMAR for it.
> >
> > And section 8.3.3 specifies that the SR-IOV Virtual Function is under the
> > scope of the same remapping unit as the Physical Function. The hypervisor
> > also needs to know if a function is a Virtual Function and which Physical
> > Function it's associated with, for the same reason.
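The logic under debate is small. A minimal sketch of the hypervisor-side matching under the current architecture (names are illustrative, not taken from the Xen tree): before the ordinary DRHD scope search, the lookup substitutes the source-id that VT-d 1.2 says defines the scope.

    /* Illustrative: choose the BDF that VT-d 1.2 says defines the
     * remapping scope, then run the normal DRHD scope search on it. */
    static uint16_t scope_bdf(const struct pci_dev_info *dev)
    {
        if (dev->is_virtfn)
            return dev->physfn_bdf;   /* 8.3.3: a VF follows its PF */
        if (dev->is_extfn)
            return dev->bdf & 0xff00; /* 8.3.2: an ARI Extended Function
                                       * follows function 0 (ARI devices
                                       * use device number 0) */
        return dev->bdf;
    }

Espen's proposal would have Dom0 compute this and pass only the result down; Yu's objection is that future IOMMUs may need the raw is_virtfn/is_extfn facts, not just the collapsed BDF.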
Espen Skoglund
2009-Mar-19 16:10 UTC
Re: [Xen-devel] PCIe 2.0, VT-d and Intel 82576 enhancement for Xen SR-IOV
[Yu Zhao]
> Yes, using the master BDF can move current logic into Dom0 and makes
> hypervisor cleaner. And it does work for VT-d spec 1.2.
>
> But if VT-d spec 1.3 (or AMD/IBM/Sun IOMMU specs) says that the ARI
> device and the Virtual Function have their own remapping unit or
> something like this, rather than use their masters', how could we
> support it using the master BDF?

If this happens, the dom0 kernel will detect it and pass a different master BDF to the hypervisor. This was the whole point of my comment: the hypervisor need not know what type of device function it is dealing with. The logic for handling this should, if possible, be kept out of the hypervisor (and if these kinds of changes came along, you would still need dom0 support for handling them anyway).

> Things evolve fast, we would need
> to add another hypercall to enhance the master BDF one after it's in
> 3.4 -- it would be like when the device_add was added, the VT-d spec
> didn't have such requirement, but now we have to add device_add_ext
> because the compatibility requirement.
>
> Passing these device specific information down and doing the IOMMU
> specific work inside the hypervisor hereditarily come with current
> passthrough architecture. After choosing putting all IOMMU things
> (both high level remapping data structures and logics, and low level
> hardware drivers) into hypervisor, we lost the flexibility to split
> the matching up logic and move it back to the Dom0 kernel.

I don't buy this argument. You seem to be indicating that the mechanism for configuring a given setup cannot be separated from the mechanism which actually enforces that configuration. This is not true. It's all a matter of finding the right abstraction for the configuration interface. Flexibility need not be sacrificed. I guess the main problem here is that there was never much thought put into how best to express the interfaces and abstractions for dealing with IOMMUs, and as newer generations of IOMMU and PCIe hardware came along, the lack of flexibility in the original abstractions has come back to bite us.

	eSk

> Thanks,
> Yu
>
> On Thu, Mar 19, 2009 at 08:05:55PM +0800, Espen Skoglund wrote:
> > Why put all this logic into Xen itself? Finding the matching DRHD
> > unit will often "just work" anyway by the way the DRHD scope matching
> > is performed.
> >
> > That said, yes, I get it that it might sometimes be better to actually
> > map the VFs and ARI functions back to the PF before matching.
> > However, why not extend the current device_add function and hypercall
> > with a master BDF pointing back to the BDF "owning" the function?
> > This leaves all of the matching up logic out of the hypervisor. A
> > zero or -1 master BDF could indicate that there is no owner (i.e., it
> > owns itself).
> >
> > eSk
> >
> > [Yu Zhao]
> > > PCIe Alternative Routing-ID Interpretation (ARI) ECN defines the Extended
> > > Function -- a function whose function number is greater than 7 within an
> > > ARI Device. Intel VT-d spec 1.2 section 8.3.2 specifies that the Extended
> > > Function is under the scope of the same remapping unit as the traditional
> > > function. The hypervisor needs to know if a function is an Extended
> > > Function so it can find the proper DMAR for it.
> > >
> > > And section 8.3.3 specifies that the SR-IOV Virtual Function is under the
> > > scope of the same remapping unit as the Physical Function. The hypervisor
> > > also needs to know if a function is a Virtual Function and which Physical
> > > Function it's associated with, for the same reason.
Yu Zhao
2009-Mar-19 16:53 UTC
Re: [Xen-devel] PCIe 2.0, VT-d and Intel 82576 enhancement for Xen SR-IOV
On Fri, Mar 20, 2009 at 12:10:54AM +0800, Espen Skoglund wrote:
> [Yu Zhao]
> > Yes, using the master BDF can move current logic into Dom0 and makes
> > hypervisor cleaner. And it does work for VT-d spec 1.2.
> >
> > But if VT-d spec 1.3 (or AMD/IBM/Sun IOMMU specs) says that the ARI
> > device and the Virtual Function have their own remapping unit or
> > something like this, rather than use their masters', how could we
> > support it using the master BDF?
>
> If this happens the dom0 kernel will detect it and pass a different
> master BDF to the hypervisor. This was the whole point of my comment;
> the hypervisor need not know what type of device function it is
> dealing with. The logic for handling this should if possible be kept
> out of the hypervisor (and if these kind of changes came along you
> would still need dom0 support for handling it anyway).

Yes, I understand your point, but I didn't make myself clear:

1) We can't extend the device_add that already exists in the 3.3 release, for compatibility reasons.

2) The master BDF only covers the current VT-d 1.2 case -- IOMMUs from other vendors may require the ARI Extended Function or the Virtual Function to use a separate remapping unit that cannot be indicated by the master BDF. They may use a simple algorithm such as:

    if (is_ari_extfn)
        use IOMMU_1;
    else if (is_sriov_virtfn)
        use IOMMU_2;
    else
        use the BDF to find a proper IOMMU;

or something like this, which doesn't have the master BDF concept at all.

And Virtual Function ATS has the following requirement (PCI SR-IOV 1.0, section 3.7.4):

    However, all VFs associated with a PF share a single input queue in
    the PF. To implement Invalidation flow control, the TA must ensure
    that the total number of outstanding Invalidate Requests to the
    shared PF queue (targeted to the PF and its associated VFs) does
    not exceed the value in the PF Invalidate Queue Depth field.

This means that if we want to enable ATS for a Virtual Function, we must first know it is a Virtual Function, and then find its associated Physical Function. Knowing only its master BDF doesn't give the IOMMU enough of a hint to set up the Invalidation Queue (the IOMMU can't figure out the function type behind the master BDF).

Eventually we still need to pass the function type to the hypervisor and let the IOMMU code do something extra, even if we have found the master BDF for DRHD unit matching in Dom0. This makes me feel there is no difference between putting a small part of this kind of logic in Dom0 while leaving most of it in the hypervisor, and putting all of it in the hypervisor.

> > Things evolve fast, we would need
> > to add another hypercall to enhance the master BDF one after it's in
> > 3.4 -- it would be like when the device_add was added, the VT-d spec
> > didn't have such requirement, but now we have to add device_add_ext
> > because the compatibility requirement.
> >
> > Passing these device specific information down and doing the IOMMU
> > specific work inside the hypervisor hereditarily come with current
> > passthrough architecture. After choosing putting all IOMMU things
> > (both high level remapping data structures and logics, and low level
> > hardware drivers) into hypervisor, we lost the flexibility to split
> > the matching up logic and move it back to the Dom0 kernel.
>
> I don't buy this argument. You seem to be indicating that the
> mechanism for configuring a given setup can not be separated from the
> mechanism which actually enforces that configuration. This is not
> true. It's all a matter of finding the right abstraction for the
> configuration interface. Flexibility need not be sacrificed. I guess
> the main problem here is that there was never much thought put into
> how to best express the interfaces and abstractions for dealing with
> IOMMUs, and as newer generations of IOMMU and PCIe hardware came along
> the lack of flexibility in the original abstractions has come back to
> bite us.

Yes, VT-d used not to cover the SR-IOV/ARI device, because the PCIe IOV specs appeared relatively late and hardware with these new features is rarely supported by VMMs. Any comments on improving the interfaces and abstractions are welcome.

Thanks,
Yu
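A sketch of why the ATS point forces the function type into the hypervisor. The capability ID and the Invalidate Queue Depth field location follow the published ATS capability layout; the data structure and the config-space accessors are illustrative stand-ins for whatever the IOMMU code actually uses:

    #define PCI_EXT_CAP_ID_ATS   0x0f
    #define PCI_ATS_CAP          0x04   /* ATS Capability Register */
    #define PCI_ATS_QDEP_MASK    0x1f   /* Invalidate Queue Depth, bits 4:0 */

    /* Per SR-IOV 1.0 section 3.7.4, a VF's Invalidate Requests are
     * charged against its PF's shared queue, so the depth must be read
     * from the PF's ATS capability.  That requires knowing both that
     * the function is a VF and which PF owns it -- a master BDF alone
     * does not say which case applies. */
    static int ats_queue_depth(const struct pci_dev_info *dev)
    {
        const struct pci_dev_info *owner = dev->is_virtfn ? dev->physfn : dev;
        int pos = pci_find_ext_capability(owner->bdf, PCI_EXT_CAP_ID_ATS);
        uint16_t cap;

        if (!pos)
            return -1;                  /* no ATS capability */
        cap = pci_conf_read16(owner->bdf, pos + PCI_ATS_CAP);
        return cap & PCI_ATS_QDEP_MASK; /* 0 encodes a depth of 32 */
    }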