I've seen patches for handling MSIs posted on this list. I know that they are still being worked on, but by the looks of it they are missing some functionality that could prove useful.

The proposed patches enable a device to allocate a vector to an MSI interrupt. For MSI-X a set of vectors can be allocated. When configuring MSIs, for each PCI function one configures a specific destination address and message data to be used for interrupt triggering. The message address indicates the destination for the interrupt, and the message data essentially indicates the vector to trigger on the destination.

Now, MSI also has a mode which allows up to the 5 lower bits of the message data to be set arbitrarily by the device itself. That is, a device can be configured to deliver up to 32 different, contiguous vectors aligned within an appropriate boundary.

Enabling a device to trigger 32 different vectors on a single interrupt destination may not actually be all that useful. However, with the introduction of VT-d interrupt remapping these 32 different messages can be remapped to arbitrary vectors *and* destinations---not only to a contiguous set of vectors on a single destination.

If an MSI capable device was able to make use of the above feature, the device could be set up to generate different interrupts depending on where the incoming interrupt was to be handled. For example, incoming data for a particular guest could trigger an interrupt on the processor where that guest is running. Obviously, a dom0-like backend driver would not be involved in the actual event delivery in these situations. The event would be delivered directly to the frontend.

The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends.

Does this make sense?

eSk
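To make the multi-message behaviour concrete, here is a minimal sketch (not Xen or Linux code) of how the per-message vectors follow from the single configured message data value: with N messages enabled, the device may modify the low log2(N) bits of the data, so the base vector must be aligned to N. The function name and the example base vector are purely illustrative.

    /* Illustration only: per-message vector derivation for multi-message MSI. */
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t msi_vector_for_message(uint16_t msg_data, unsigned nvec,
                                          unsigned msg_index)
    {
        /* nvec must be a power of two, 1..32; base vector aligned to nvec. */
        assert(nvec && nvec <= 32 && (nvec & (nvec - 1)) == 0);
        assert(msg_index < nvec);
        assert((msg_data & (nvec - 1)) == 0);   /* alignment requirement */

        /* The device ORs the message number into the low bits of the data. */
        return (uint8_t)((msg_data & 0xff) | msg_index);
    }

    int main(void)
    {
        /* e.g. base vector 0x40 with 8 messages enabled => vectors 0x40..0x47 */
        for (unsigned i = 0; i < 8; i++)
            printf("message %u -> vector 0x%02x\n", i,
                   msi_vector_for_message(0x40, 8, i));
        return 0;
    }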
On 6/3/08 21:07, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
> If an MSI capable device was able to make use of the above feature, the device could be set up to generate different interrupts depending on where the incoming interrupt was to be handled. For example, incoming data for a particular guest could trigger an interrupt on the processor where that guest is running. Obviously, a dom0-like backend driver would not be involved in the actual event delivery in these situations. The event would be delivered directly to the frontend.
>
> The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends.

The only tricky bit here is deciding what the interface should be to the hypervisor to specify these allocation constraints.

Another thought though: there's no good reason for Xen to scatter its irq-vector allocations across the vector space. That's a holdover from classic-Pentium-era systems, which could lose interrupts if too many got 'queued up' at any single priority level. So we could actually allocate our vectors contiguously, making it much more likely that you could successfully allocate a contiguous range even without remapping.

However, I guess you want to be able to specify different target APICs for different vectors too, so again it comes back to: what should the guest interface to irq-remapping hardware be?

 -- Keir
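A rough sketch of the kind of contiguous, naturally aligned vector allocation Keir alludes to. The usable vector range and all names below are assumptions for illustration, not Xen's actual allocator.

    #include <stdbool.h>
    #include <string.h>

    #define NR_VECTORS     256
    #define FIRST_DYNAMIC  0x20   /* assumed dynamically allocatable range */
    #define LAST_DYNAMIC   0xef

    static bool vector_in_use[NR_VECTORS];

    /* Allocate 'count' contiguous vectors (count a power of two), starting at
     * a base naturally aligned to 'count', as multi-message MSI requires.
     * Returns the base vector, or -1 if no aligned run is free. */
    static int alloc_vector_block(unsigned count)
    {
        unsigned base = (FIRST_DYNAMIC + count - 1) & ~(count - 1);

        for (; base + count - 1 <= LAST_DYNAMIC; base += count) {
            unsigned i;
            for (i = 0; i < count && !vector_in_use[base + i]; i++)
                ;
            if (i == count) {
                memset(&vector_in_use[base], 1, count);
                return (int)base;
            }
        }
        return -1;
    }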
[Keir Fraser]
> On 6/3/08 21:07, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
>> If an MSI capable device was able to make use of the above feature, the device could be set up to generate different interrupts depending on where the incoming interrupt was to be handled. For example, incoming data for a particular guest could trigger an interrupt on the processor where that guest is running. Obviously, a dom0-like backend driver would not be involved in the actual event delivery in these situations. The event would be delivered directly to the frontend.
>>
>> The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends.

> The only tricky bit here is deciding what the interface should be to the hypervisor to specify these allocation constraints.

> Another thought though: there's no good reason for Xen to scatter its irq-vector allocations across the vector space. That's a holdover from classic-Pentium-era systems, which could lose interrupts if too many got 'queued up' at any single priority level. So we could actually allocate our vectors contiguously, making it much more likely that you could successfully allocate a contiguous range even without remapping.

> However, I guess you want to be able to specify different target APICs for different vectors too, so again it comes back to: what should the guest interface to irq-remapping hardware be?

Right. The reason for bringing up this suggestion now rather than later is that MSI support has not yet found its way into mainline. Whoever decides on the interface used for registering MSI and MSI-X interrupts might want to take multi-message MSIs into account as well.

I do not think explicitly specifying the destination APIC upon allocation is the best idea. Setting the affinity upon binding the interrupt, like it's done today, seems like a better approach. This leaves us with dealing with the vectors.

My initial thought was to make use of the new msix_entries[] field in the xen_pci_op structure. This field is already used as an in/out parameter for allocating MSI-X interrupts. The pciback_enable_msi() function can then attempt to allocate multiple interrupts instead of a single one, and return the allocated vectors.

The current MSI patchset also lacks a set_affinity() function for changing the APIC destination similar to what is done for, e.g., IOAPICs. Also similar to IOAPICs, the MSI support should have something like io_apic_write_remap_rte() for rewriting the interrupt remapping table when it is enabled.

A special case must exist when setting the interrupt affinity for multiple-message enabled MSI devices. There should probably be some magic in the set_affinity() function for handling this properly. That is, setting the affinity for the whole group of MSI interrupts does not make all that much sense. It makes more sense when one can set the per-interrupt affinity through the interrupt remapping table.

It should be evident by now that my suggestions for deciding upon an interface, and an implementation of it, are rather fluffy; bordering on non-existent. This is partly because I'm talking about not-yet-existing MSI support, but mainly because I'm still new to Xen internals. Nonetheless, I believe it would be good for people working on MSI support to take multi-message MSIs and interrupt remapping into account as well.
eSk
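A rough sketch of the in/out flow Espen describes for pciback_enable_msi(): request a number of vectors, allocate them contiguously and aligned, and return each vector through the msix_entries[] array. The struct layout and the helper below are illustrative assumptions, not the actual xen_pci_op definition from the patchset; the allocator is the one sketched after Keir's mail.

    #include <stdint.h>

    #define XEN_PCI_MAX_VEC 32          /* assumed bound, for illustration */

    struct xen_msix_entry_sketch {
        uint16_t vector;                /* out: vector allocated for the message */
        uint16_t entry;                 /* in:  message/entry index              */
    };

    struct xen_pci_op_sketch {
        uint32_t cmd;
        int32_t  err;
        uint32_t bus, devfn;
        int32_t  value;                 /* in: number of vectors requested       */
        struct xen_msix_entry_sketch msix_entries[XEN_PCI_MAX_VEC];
    };

    /* e.g. the contiguous, aligned allocator sketched earlier in the thread */
    extern int alloc_vector_block(unsigned count);

    /* Hypothetical backend handler: allocate op->value contiguous, aligned
     * vectors for a multi-message MSI and hand them back to the frontend. */
    static int pciback_enable_msi_sketch(struct xen_pci_op_sketch *op)
    {
        if (op->value < 1 || op->value > XEN_PCI_MAX_VEC)
            return op->err = -1;

        int base = alloc_vector_block((unsigned)op->value);
        if (base < 0)
            return op->err = -1;

        for (int32_t i = 0; i < op->value; i++) {
            op->msix_entries[i].entry  = (uint16_t)i;
            op->msix_entries[i].vector = (uint16_t)(base + i);
        }
        return op->err = 0;
    }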
xen-devel-bounces@lists.xensource.com <> wrote:
> [Keir Fraser]
>> On 6/3/08 21:07, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
>>> If an MSI capable device was able to make use of the above feature, the device could be set up to generate different interrupts depending on where the incoming interrupt was to be handled. For example, incoming data for a particular guest could trigger an interrupt on the processor where that guest is running. Obviously, a dom0-like backend driver would not be involved in the actual event delivery in these situations. The event would be delivered directly to the frontend.
>>>
>>> The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends.
>
>> The only tricky bit here is deciding what the interface should be to the hypervisor to specify these allocation constraints.
>
>> Another thought though: there's no good reason for Xen to scatter its irq-vector allocations across the vector space. That's a holdover from classic-Pentium-era systems, which could lose interrupts if too many got 'queued up' at any single priority level. So we could actually allocate our vectors contiguously, making it much more likely that you could successfully allocate a contiguous range even without remapping.
>
>> However, I guess you want to be able to specify different target APICs for different vectors too, so again it comes back to: what should the guest interface to irq-remapping hardware be?
>
> Right. The reason for bringing up this suggestion now rather than later is that MSI support has not yet found its way into mainline. Whoever decides on the interface used for registering MSI and MSI-X interrupts might want to take multi-message MSIs into account as well.

Espen, thanks for your comments. I remember Linux has no such support, so Linux drivers will not benefit from such an implementation; after all, the driver needs to provide an ISR for the interrupts. Of course, we need this feature if any OS supports it. I didn't support this because it may require changes to various common components and needs more discussion, while Linux has no support for it. (Also, I rushed for the 3.2 cut-off at that time :$.)

> I do not think explicitly specifying the destination APIC upon allocation is the best idea. Setting the affinity upon binding the interrupt, like it's done today, seems like a better approach. This leaves us with dealing with the vectors.

But what should happen when the vcpu is migrated to another physical cpu? I'm not sure about the cost of programming the interrupt remapping table; otherwise, that is a good choice for achieving the affinity. As for vector assignment, I agree the simpler method is to change the vector assignment in Xen as Keir suggests. Also, I suspect we may need to support per-CPU vectors later, if that many vectors are requested.

> My initial thought was to make use of the new msix_entries[] field in the xen_pci_op structure. This field is already used as an in/out parameter for allocating MSI-X interrupts. The pciback_enable_msi() function can then attempt to allocate multiple interrupts instead of a single one, and return the allocated vectors.
>
> The current MSI patchset also lacks a set_affinity() function for changing the APIC destination similar to what is done for, e.g., IOAPICs.
> Also similar to IOAPICs, the MSI support should have something like io_apic_write_remap_rte() for rewriting the interrupt remapping table when it is enabled.

For set_affinity(), what do you mean by changing the APIC destination? Currently, if you set a guest pirq's affinity, it will only impact the event channel. The physical one is only set once, when the pirq is bound.

As for rewriting the interrupt remapping table like io_apic_write_remap_rte(), I think it will be added later also.

I'm also a bit confused by your statement in the previous mail: "The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends." What do you mean by different frontends?

Really, thanks for your suggestion.

> A special case must exist when setting the interrupt affinity for multiple-message enabled MSI devices. There should probably be some magic in the set_affinity() function for handling this properly. That is, setting the affinity for the whole group of MSI interrupts does not make all that much sense. It makes more sense when one can set the per-interrupt affinity through the interrupt remapping table.
>
> It should be evident by now that my suggestions for deciding upon an interface, and an implementation of it, are rather fluffy; bordering on non-existent. This is partly because I'm talking about not-yet-existing MSI support, but mainly because I'm still new to Xen internals. Nonetheless, I believe it would be good for people working on MSI support to take multi-message MSIs and interrupt remapping into account as well.
>
> eSk
[Yunhong Jiang]
>> Right. The reason for bringing up this suggestion now rather than later is that MSI support has not yet found its way into mainline. Whoever decides on the interface used for registering MSI and MSI-X interrupts might want to take multi-message MSIs into account as well.

> Espen, thanks for your comments. I remember Linux has no such support, so Linux drivers will not benefit from such an implementation; after all, the driver needs to provide an ISR for the interrupts. Of course, we need this feature if any OS supports it. I didn't support this because it may require changes to various common components and needs more discussion, while Linux has no support for it. (Also, I rushed for the 3.2 cut-off at that time :$.)

You're right in that Linux does not currently support this. You can, however, allocate multiple interrupts using MSI-X. Anyhow, I was not envisioning this feature being used directly for passthrough device access. Rather, I was considering the case where a device could be configured to communicate data directly into a VM (e.g., using multi-queue NICs) and deliver the interrupt to the appropriate VM. In this case the frontend in the guest would not need to see a multi-message MSI device; only the backend in dom0/the driver domain would need to be made aware of it.

>> I do not think explicitly specifying the destination APIC upon allocation is the best idea. Setting the affinity upon binding the interrupt, like it's done today, seems like a better approach. This leaves us with dealing with the vectors.

> But what should happen when the vcpu is migrated to another physical cpu? I'm not sure about the cost of programming the interrupt remapping table; otherwise, that is a good choice for achieving the affinity.

As you've already said, the interrupt affinity is only set when a pirq is bound. The interrupt routing is not redirected if the vcpu it's bound to migrates to another physical cpu. This can (should?) be changed in the future so that the affinity is either set implicitly when migrating the vcpu, or explicitly with a rebind call by dom0. In any case the affinity would be reset by the set_affinity method.

>> My initial thought was to make use of the new msix_entries[] field in the xen_pci_op structure. This field is already used as an in/out parameter for allocating MSI-X interrupts. The pciback_enable_msi() function can then attempt to allocate multiple interrupts instead of a single one, and return the allocated vectors.
>>
>> The current MSI patchset also lacks a set_affinity() function for changing the APIC destination similar to what is done for, e.g., IOAPICs. Also similar to IOAPICs, the MSI support should have something like io_apic_write_remap_rte() for rewriting the interrupt remapping table when it is enabled.

> For set_affinity(), what do you mean by changing the APIC destination? Currently, if you set a guest pirq's affinity, it will only impact the event channel. The physical one is only set once, when the pirq is bound.

With "changing the APIC destination" I meant changing the destination CPU of an interrupt while keeping the vector, delivery type, etc. intact.

> As for rewriting the interrupt remapping table like io_apic_write_remap_rte(), I think it will be added later also.

> I'm also a bit confused by your statement in the previous mail: "The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends."
> What do you mean by different frontends?

Different frontends here means multiple instances of frontends residing in different VMs, all served by a single backend. As alluded to above, the idea is to have a single backend that has direct access to the device, and multiple frontends that somehow share some limited direct access to the device. For example, a multi-queue capable NIC could deliver packets to the queue in the appropriate VM and raise an interrupt in that VM without involving the domain of the backend driver.

eSk
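An outline of the flow Espen is describing, purely as a sketch: the helper names below are hypothetical placeholders for whatever permission and notification interface ends up being chosen, which is exactly the open question in this thread.

    struct queue_binding {
        unsigned queue;      /* NIC queue index                          */
        unsigned pirq;       /* pirq allocated for that queue's message  */
        unsigned frontend;   /* domid of the VM that owns the queue      */
    };

    /* Hypothetical wrappers, assumed to exist only for this outline. */
    extern int grant_pirq_access(unsigned domid, unsigned pirq);
    extern int advertise_pirq_to_frontend(unsigned domid, unsigned queue,
                                          unsigned pirq);

    /* Backend/driver-domain side: hand each queue's interrupt to the VM
     * that owns the queue, so events bypass the backend domain entirely. */
    static int assign_queue_irqs(const struct queue_binding *qb, unsigned n)
    {
        for (unsigned i = 0; i < n; i++) {
            if (grant_pirq_access(qb[i].frontend, qb[i].pirq) < 0)
                return -1;
            /* The frontend then binds the pirq to a local event channel. */
            if (advertise_pirq_to_frontend(qb[i].frontend, qb[i].queue,
                                           qb[i].pirq) < 0)
                return -1;
        }
        return 0;
    }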
xen-devel-bounces@lists.xensource.com <> wrote:
> [Yunhong Jiang]
>>> Right. The reason for bringing up this suggestion now rather than later is that MSI support has not yet found its way into mainline. Whoever decides on the interface used for registering MSI and MSI-X interrupts might want to take multi-message MSIs into account as well.
>
>> Espen, thanks for your comments. I remember Linux has no such support, so Linux drivers will not benefit from such an implementation; after all, the driver needs to provide an ISR for the interrupts. Of course, we need this feature if any OS supports it. I didn't support this because it may require changes to various common components and needs more discussion, while Linux has no support for it. (Also, I rushed for the 3.2 cut-off at that time :$.)
>
> You're right in that Linux does not currently support this. You can, however, allocate multiple interrupts using MSI-X. Anyhow, I was not envisioning this feature being used directly for passthrough device access. Rather, I was considering the case where a device could be configured to communicate data directly into a VM (e.g., using multi-queue NICs) and deliver the interrupt to the appropriate VM. In this case the frontend in the guest would not need to see a multi-message MSI device; only the backend in dom0/the driver domain would need to be made aware of it.

Although I don't know if any device has such a usage model (Intel's VMDq is using MSI-X), yes, your usage model would be helpful. To achieve this, maybe we need to change the protocol between pci backend and pci frontend; in fact, maybe pci_enable_msi/pci_enable_msix can be combined, with a flag to determine whether the vectors should be contiguous or not.

One thing left is how the driver domain can bind the vector to the frontend VM. Some sanity check mechanism should be added.

BTW, can you tell me which device may use this feature? I'm a bit interested in this.

>>> I do not think explicitly specifying the destination APIC upon allocation is the best idea. Setting the affinity upon binding the interrupt, like it's done today, seems like a better approach. This leaves us with dealing with the vectors.
>
>> But what should happen when the vcpu is migrated to another physical cpu? I'm not sure about the cost of programming the interrupt remapping table; otherwise, that is a good choice for achieving the affinity.
>
> As you've already said, the interrupt affinity is only set when a pirq is bound. The interrupt routing is not redirected if the vcpu it's bound to migrates to another physical cpu. This can (should?) be changed in the future so that the affinity is either set implicitly when migrating the vcpu, or explicitly with a rebind call by dom0. In any case the affinity would be reset by the set_affinity method.

Yes, I remember Keir suggested using the interrupt remapping table in VT-d to achieve this; I'm not sure whether that is still OK.

>>> My initial thought was to make use of the new msix_entries[] field in the xen_pci_op structure. This field is already used as an in/out parameter for allocating MSI-X interrupts. The pciback_enable_msi() function can then attempt to allocate multiple interrupts instead of a single one, and return the allocated vectors.
>>>
>>> The current MSI patchset also lacks a set_affinity() function for changing the APIC destination similar to what is done for, e.g., IOAPICs.
>>> Also similar to IOAPICs, the MSI support should have something like io_apic_write_remap_rte() for rewriting the interrupt remapping table when it is enabled.

>> For set_affinity(), what do you mean by changing the APIC destination? Currently, if you set a guest pirq's affinity, it will only impact the event channel. The physical one is only set once, when the pirq is bound.

> With "changing the APIC destination" I meant changing the destination CPU of an interrupt while keeping the vector, delivery type, etc. intact.

>> As for rewriting the interrupt remapping table like io_apic_write_remap_rte(), I think it will be added later also.

>> I'm also a bit confused by your statement in the previous mail: "The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends." What do you mean by different frontends?

> Different frontends here means multiple instances of frontends residing in different VMs, all served by a single backend. As alluded to above, the idea is to have a single backend that has direct access to the device, and multiple frontends that somehow share some limited direct access to the device. For example, a multi-queue capable NIC could deliver packets to the queue in the appropriate VM and raise an interrupt in that VM without involving the domain of the backend driver.

Got it.

> eSk
[Yunhong Jiang]
> xen-devel-bounces@lists.xensource.com <> wrote:
>> You're right in that Linux does not currently support this. You can, however, allocate multiple interrupts using MSI-X. Anyhow, I was not envisioning this feature being used directly for passthrough device access. Rather, I was considering the case where a device could be configured to communicate data directly into a VM (e.g., using multi-queue NICs) and deliver the interrupt to the appropriate VM. In this case the frontend in the guest would not need to see a multi-message MSI device; only the backend in dom0/the driver domain would need to be made aware of it.

> Although I don't know if any device has such a usage model (Intel's VMDq is using MSI-X), yes, your usage model would be helpful. To achieve this, maybe we need to change the protocol between pci backend and pci frontend; in fact, maybe pci_enable_msi/pci_enable_msix can be combined, with a flag to determine whether the vectors should be contiguous or not.

This is similar to my initial idea as well. In addition to being contiguous, the multi-message MSI request would also need to allocate vectors that are properly aligned.

> One thing left is how the driver domain can bind the vector to the frontend VM. Some sanity check mechanism should be added.

Well, there exists a domctl for modifying the permissions of a pirq. This could be used to grant pirq access to a frontend domain. Not sure if this is sufficient.

Also, as discussed in my previous reply, dom0 may need the ability to reset the affinity of an irq when migrating the destination vcpu. Further, a pirq is now always bound to vcpu[0] of a domain (in evtchn_bind_pirq). There is clearly some room for improvement and more flexibility here.

Not sure what the best solution is. One option is to allow a guest to re-bind a pirq to set its affinity, and have such explicitly set affinities be automatically updated when the associated vcpu is migrated. Another option is to create unbound ports in a guest domain and let a privileged domain bind pirqs to those ports. The privileged domain should then also be allowed to later modify the destination vcpu and set the affinity of the bound pirq.

> BTW, can you tell me which device may use this feature? I'm a bit interested in this.

I must confess that I do not know of any device that currently uses this feature (perhaps Solarflare or NetXen devices have support for it), and the whole connection with VT-d interrupt remapping is as of now purely academic anyway due to the lack of chipsets with the appropriate feature.

However, the whole issue of binding multiple pirqs of a device to different guest domains remains the same even if using MSI-X. Multi-message MSI devices only/mostly add some additional restrictions upon allocating interrupt vectors.

>>>> I do not think explicitly specifying the destination APIC upon allocation is the best idea. Setting the affinity upon binding the interrupt, like it's done today, seems like a better approach. This leaves us with dealing with the vectors.

>>> But what should happen when the vcpu is migrated to another physical cpu? I'm not sure about the cost of programming the interrupt remapping table; otherwise, that is a good choice for achieving the affinity.

>> As you've already said, the interrupt affinity is only set when a pirq is bound. The interrupt routing is not redirected if the vcpu it's bound to migrates to another physical cpu. This can (should?) be changed in the future so that the affinity is either set implicitly when migrating the vcpu, or explicitly with a rebind call by dom0. In any case the affinity would be reset by the set_affinity method.
> Yes, I remember Keir suggested using the interrupt remapping table in VT-d to achieve this; I'm not sure whether that is still OK.

Relying on the VT-d interrupt remapping table would rule out any Intel chipset on the market today, and also the equivalent solution (if any) used by AMD and others.

It seems better to update the IOAPIC entry or MSI capability structure directly when redirecting the interrupt, and let io_apic_write() or the equivalent function for MSI rewrite the interrupt remapping table if VT-d is enabled. Not sure how much it would cost to rewrite the remapping table and perform the respective VT-d interrupt entry cache flush; it's difficult to measure without actually having any available hardware. However, I suspect the cost would in many cases be dwarfed by migrating the cache working set and by other associated costs of migrating a vcpu.

eSk
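As a minimal sketch of the "change the destination, keep the vector" operation discussed above, here is how an x86 MSI address word could be recomposed for a new destination APIC ID. The bit layout follows the architectural MSI address format, but the function itself is only an illustration, not the patchset's set_affinity().

    #include <stdint.h>

    #define MSI_ADDR_DEST_SHIFT  12
    #define MSI_ADDR_DEST_MASK   (0xffu << MSI_ADDR_DEST_SHIFT)

    /* Retarget an MSI to 'apic_id' while leaving every other address bit
     * (and the whole data word: vector, delivery mode, trigger) untouched.
     * With VT-d remapping enabled, the same code path would instead rewrite
     * the interrupt remapping table entry and flush the interrupt entry
     * cache, as discussed above. */
    static inline uint32_t msi_retarget_addr(uint32_t old_addr, uint8_t apic_id)
    {
        return (old_addr & ~MSI_ADDR_DEST_MASK) |
               ((uint32_t)apic_id << MSI_ADDR_DEST_SHIFT);
    }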
Espen Skoglund <mailto:espen.skoglund@netronome.com> wrote:
> [Yunhong Jiang]
>> xen-devel-bounces@lists.xensource.com <> wrote:
>>> You're right in that Linux does not currently support this. You can, however, allocate multiple interrupts using MSI-X. Anyhow, I was not envisioning this feature being used directly for passthrough device access. Rather, I was considering the case where a device could be configured to communicate data directly into a VM (e.g., using multi-queue NICs) and deliver the interrupt to the appropriate VM. In this case the frontend in the guest would not need to see a multi-message MSI device; only the backend in dom0/the driver domain would need to be made aware of it.
>
>>> Although I don't know if any device has such a usage model (Intel's VMDq is using MSI-X), yes, your usage model would be helpful. To achieve this, maybe we need to change the protocol between pci backend and pci frontend; in fact, maybe pci_enable_msi/pci_enable_msix can be combined, with a flag to determine whether the vectors should be contiguous or not.
>
>> This is similar to my initial idea as well. In addition to being contiguous, the multi-message MSI request would also need to allocate vectors that are properly aligned.

Yes, but I don't think we need to add the implementation now. We can change xen_pci_op to accommodate this requirement; otherwise, it will cause more divergence from upstream Linux. (Maybe the hypercall needs to be changed for this requirement also.) As for set_irq_affinity, I think it is a general issue, not MSI related; we can continue to follow up on it.

>> One thing left is how the driver domain can bind the vector to the frontend VM. Some sanity check mechanism should be added.
>
> Well, there exists a domctl for modifying the permissions of a pirq. This could be used to grant pirq access to a frontend domain. Not sure if this is sufficient.
>
> Also, as discussed in my previous reply, dom0 may need the ability to reset the affinity of an irq when migrating the destination vcpu. Further, a pirq is now always bound to vcpu[0] of a domain (in evtchn_bind_pirq). There is clearly some room for improvement and more flexibility here.
>
> Not sure what the best solution is. One option is to allow a guest to re-bind a pirq to set its affinity, and have such explicitly set affinities be automatically updated when the associated vcpu is migrated. Another option is to create unbound ports in a guest domain and let a privileged domain bind pirqs to those ports. The privileged domain should then also be allowed to later modify the destination vcpu and set the affinity of the bound pirq.
>
>> BTW, can you tell me which device may use this feature? I'm a bit interested in this.
>
> I must confess that I do not know of any device that currently uses this feature (perhaps Solarflare or NetXen devices have support for it), and the whole connection with VT-d interrupt remapping is as of now purely academic anyway due to the lack of chipsets with the appropriate feature.
>
> However, the whole issue of binding multiple pirqs of a device to different guest domains remains the same even if using MSI-X. Multi-message MSI devices only/mostly add some additional restrictions upon allocating interrupt vectors.
>
>>>>> I do not think explicitly specifying the destination APIC upon allocation is the best idea. Setting the affinity upon binding the interrupt, like it's done today, seems like a better approach. This leaves us with dealing with the vectors.
>>>> But what should happen when the vcpu is migrated to another physical cpu? I'm not sure about the cost of programming the interrupt remapping table; otherwise, that is a good choice for achieving the affinity.
>
>>> As you've already said, the interrupt affinity is only set when a pirq is bound. The interrupt routing is not redirected if the vcpu it's bound to migrates to another physical cpu. This can (should?) be changed in the future so that the affinity is either set implicitly when migrating the vcpu, or explicitly with a rebind call by dom0. In any case the affinity would be reset by the set_affinity method.
>
>> Yes, I remember Keir suggested using the interrupt remapping table in VT-d to achieve this; I'm not sure whether that is still OK.
>
> Relying on the VT-d interrupt remapping table would rule out any Intel chipset on the market today, and also the equivalent solution (if any) used by AMD and others.
>
> It seems better to update the IOAPIC entry or MSI capability structure directly when redirecting the interrupt, and let io_apic_write() or the equivalent function for MSI rewrite the interrupt remapping table if VT-d is enabled. Not sure how much it would cost to rewrite the remapping table and perform the respective VT-d interrupt entry cache flush; it's difficult to measure without actually having any available hardware. However, I suspect the cost would in many cases be dwarfed by migrating the cache working set and by other associated costs of migrating a vcpu.
>
> eSk
[Yunhong Jiang]
> Espen Skoglund <mailto:espen.skoglund@netronome.com> wrote:
>> [Yunhong Jiang]
>>> xen-devel-bounces@lists.xensource.com <> wrote:
>>>> You're right in that Linux does not currently support this. You can, however, allocate multiple interrupts using MSI-X. Anyhow, I was not envisioning this feature being used directly for passthrough device access. Rather, I was considering the case where a device could be configured to communicate data directly into a VM (e.g., using multi-queue NICs) and deliver the interrupt to the appropriate VM. In this case the frontend in the guest would not need to see a multi-message MSI device; only the backend in dom0/the driver domain would need to be made aware of it.
>
>>> Although I don't know if any device has such a usage model (Intel's VMDq is using MSI-X), yes, your usage model would be helpful. To achieve this, maybe we need to change the protocol between pci backend and pci frontend; in fact, maybe pci_enable_msi/pci_enable_msix can be combined, with a flag to determine whether the vectors should be contiguous or not.
>
>> This is similar to my initial idea as well. In addition to being contiguous, the multi-message MSI request would also need to allocate vectors that are properly aligned.

> Yes, but I don't think we need to add the implementation now. We can change xen_pci_op to accommodate this requirement; otherwise, it will cause more divergence from upstream Linux. (Maybe the hypercall needs to be changed for this requirement also.)

Isn't this more of a PHYSDEVOP_alloc_irq_vector thing? That is, dom0 should be able to request a region of contiguous vectors. As for upstream Linux differences, we should get away with only a few modifications to the MSI specific parts in dom0. Also, if contiguous vector regions are wanted, the vector allocation algorithm in Xen should be changed to avoid spreading the allocations all over the place.

> As for set_irq_affinity, I think it is a general issue, not MSI related; we can continue to follow up on it.

Agreed.

eSk
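A hedged sketch of the kind of extension hinted at above: a request layout through which dom0 could ask the hypervisor for an aligned block of vectors in one call. The struct and field names are hypothetical; the existing PHYSDEVOP_alloc_irq_vector allocates a single vector per irq.

    #include <stdint.h>

    /* Hypothetical request layout for a "block" variant of the vector
     * allocation physdev op.  dom0 fills in the IN fields; the hypervisor
     * would pick a contiguous run naturally aligned to 'count'. */
    struct physdev_irq_block_sketch {
        /* IN */
        uint32_t irq;      /* pirq of the first message                  */
        uint32_t count;    /* power of two, 1..32 for multi-message MSI  */
        /* OUT */
        uint32_t vector;   /* base of the aligned, contiguous block;
                              message i uses vector + i                  */
    };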
Espen Skoglund wrote:
> I must confess that I do not know of any device that currently uses this feature (perhaps Solarflare or NetXen devices have support for it), and the whole connection with VT-d interrupt remapping is as of now purely academic anyway due to the lack of chipsets with the appropriate feature.

Solarflare would be interested in the ability to pass MSI-X interrupts through to different guests for our netfront/netback accelerator plugins. Our latest chips support MSI-X with multiple queues, so they do not need to use multi-message MSIs. I don't know enough about VT-d interrupt remapping to comment on that.

Cheers,
Neil.