I've seen patches for handling MSIs posted on this list. I know that they are still being worked on, but by the looks of it they are missing some functionality that could prove useful.

The proposed patches enable a device to allocate a vector to an MSI interrupt. For MSI-X a set of vectors can be allocated. When configuring MSIs, for each PCI function one configures a specific destination address and message data to be used for interrupt triggering. The message address indicates the destination for the interrupt, and the message data essentially indicates the vector to trigger on the destination.

Now, MSI also has a mode which allows up to the 5 lower bits of the message data to be set arbitrarily by the device itself. That is, a device can be configured to deliver up to 32 different, contiguous vectors aligned within an appropriate boundary.

Enabling a device to trigger 32 different vectors on a single interrupt destination may not actually be all that useful. However, with the introduction of VT-d interrupt remapping these 32 different messages can be remapped to arbitrary vectors *and* destinations---not only to a contiguous set of vectors on a single destination.

If an MSI capable device was able to make use of the above feature, the device could be set up to generate different interrupts depending on where the incoming interrupt was to be handled. For example, incoming data for a particular guest could trigger an interrupt on the processor where that guest is running. Obviously, a dom0-like backend driver would not be involved in the actual event delivery in these situations. The event would be delivered directly to the frontend.

The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends.

Does this make sense?

eSk
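To make the multi-message behaviour concrete, here is a minimal sketch (not Xen or Linux code) of how the per-message vectors follow from the single configured message data value: with N messages enabled, the device may modify the low log2(N) bits of the data, so the base vector must be aligned to N. The function name and the example base vector are purely illustrative.

    /* Illustration only: per-message vector derivation for multi-message MSI. */
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t msi_vector_for_message(uint16_t msg_data, unsigned nvec,
                                          unsigned msg_index)
    {
        /* nvec must be a power of two, 1..32; base vector aligned to nvec. */
        assert(nvec && nvec <= 32 && (nvec & (nvec - 1)) == 0);
        assert(msg_index < nvec);
        assert((msg_data & (nvec - 1)) == 0);   /* alignment requirement */

        /* The device ORs the message number into the low bits of the data. */
        return (uint8_t)((msg_data & 0xff) | msg_index);
    }

    int main(void)
    {
        /* e.g. base vector 0x40 with 8 messages enabled => vectors 0x40..0x47 */
        for (unsigned i = 0; i < 8; i++)
            printf("message %u -> vector 0x%02x\n", i,
                   msi_vector_for_message(0x40, 8, i));
        return 0;
    }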
On 6/3/08 21:07, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
> If an MSI capable device was able to make use of the above feature, the device could be set up to generate different interrupts depending on where the incoming interrupt was to be handled. For example, incoming data for a particular guest could trigger an interrupt on the processor where that guest is running. Obviously, a dom0-like backend driver would not be involved in the actual event delivery in these situations. The event would be delivered directly to the frontend.
>
> The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends.

The only tricky bit here is deciding what the interface should be to the hypervisor to specify these allocation constraints.

Another thought though: there's no good reason for Xen to scatter its irq-vector allocations across the vector space. That's a holdover from classic-Pentium-era systems, which could lose interrupts if too many got 'queued up' at any single priority level. So we could actually allocate our vectors contiguously, making it much more likely that you could successfully allocate a contiguous range even without remapping.

However, I guess you want to be able to specify different target APICs for different vectors too, so again it comes back to: what should the guest interface to irq-remapping hardware be?

 -- Keir
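A rough sketch of the kind of contiguous, naturally aligned vector allocation Keir alludes to. The usable vector range and all names below are assumptions for illustration, not Xen's actual allocator.

    #include <stdbool.h>
    #include <string.h>

    #define NR_VECTORS     256
    #define FIRST_DYNAMIC  0x20   /* assumed dynamically allocatable range */
    #define LAST_DYNAMIC   0xef

    static bool vector_in_use[NR_VECTORS];

    /* Allocate 'count' contiguous vectors (count a power of two), starting at
     * a base naturally aligned to 'count', as multi-message MSI requires.
     * Returns the base vector, or -1 if no aligned run is free. */
    static int alloc_vector_block(unsigned count)
    {
        unsigned base = (FIRST_DYNAMIC + count - 1) & ~(count - 1);

        for (; base + count - 1 <= LAST_DYNAMIC; base += count) {
            unsigned i;
            for (i = 0; i < count && !vector_in_use[base + i]; i++)
                ;
            if (i == count) {
                memset(&vector_in_use[base], 1, count);
                return (int)base;
            }
        }
        return -1;
    }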
[Keir Fraser]
> On 6/3/08 21:07, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
>> If an MSI capable device was able to make use of the above feature, the device could be set up to generate different interrupts depending on where the incoming interrupt was to be handled. For example, incoming data for a particular guest could trigger an interrupt on the processor where that guest is running. Obviously, a dom0-like backend driver would not be involved in the actual event delivery in these situations. The event would be delivered directly to the frontend.
>>
>> The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends.

> The only tricky bit here is deciding what the interface should be to the hypervisor to specify these allocation constraints.

> Another thought though: there's no good reason for Xen to scatter its irq-vector allocations across the vector space. That's a holdover from classic-Pentium-era systems, which could lose interrupts if too many got 'queued up' at any single priority level. So we could actually allocate our vectors contiguously, making it much more likely that you could successfully allocate a contiguous range even without remapping.

> However, I guess you want to be able to specify different target APICs for different vectors too, so again it comes back to: what should the guest interface to irq-remapping hardware be?

Right. The reason for bringing up this suggestion now rather than later is that MSI support has not yet found its way into mainline. Whoever decides on the interface used for registering MSI and MSI-X interrupts might want to take multi-message MSIs into account as well.

I do not think explicitly specifying the destination APIC upon allocation is the best idea. Setting the affinity upon binding the interrupt, like it's done today, seems like a better approach. This leaves us with dealing with the vectors.

My initial thought was to make use of the new msix_entries[] field in the xen_pci_op structure. This field is already used as an in/out parameter for allocating MSI-X interrupts. The pciback_enable_msi() function can then attempt to allocate multiple interrupts instead of a single one, and return the allocated vectors.

The current MSI patchset also lacks a set_affinity() function for changing the APIC destination similar to what is done for, e.g., IOAPICs. Also similar to IOAPICs, the MSI support should have something like io_apic_write_remap_rte() for rewriting the interrupt remapping table when it is enabled.

A special case must exist when setting the interrupt affinity for multiple-message enabled MSI devices. There should probably be some magic in the set_affinity() function for handling this properly. That is, setting the affinity for the whole group of MSI interrupts does not make all that much sense. It makes more sense when one can set the per-interrupt affinity through the interrupt remapping table.

It should be evident by now that my suggestions for deciding upon an interface, and an implementation of it, are rather fluffy; bordering on non-existent. This is partly because I'm talking about not-yet-existing MSI support, but mainly because I'm still new to Xen internals. Nonetheless, I believe it would be good for people working on MSI support to take multi-message MSIs and interrupt remapping into account as well.
eSk
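A rough sketch of the in/out flow Espen describes for pciback_enable_msi(): request a number of vectors, allocate them contiguously and aligned, and return each vector through the msix_entries[] array. The struct layout and the helper below are illustrative assumptions, not the actual xen_pci_op definition from the patchset; the allocator is the one sketched after Keir's mail.

    #include <stdint.h>

    #define XEN_PCI_MAX_VEC 32          /* assumed bound, for illustration */

    struct xen_msix_entry_sketch {
        uint16_t vector;                /* out: vector allocated for the message */
        uint16_t entry;                 /* in:  message/entry index              */
    };

    struct xen_pci_op_sketch {
        uint32_t cmd;
        int32_t  err;
        uint32_t bus, devfn;
        int32_t  value;                 /* in: number of vectors requested       */
        struct xen_msix_entry_sketch msix_entries[XEN_PCI_MAX_VEC];
    };

    /* e.g. the contiguous, aligned allocator sketched earlier in the thread */
    extern int alloc_vector_block(unsigned count);

    /* Hypothetical backend handler: allocate op->value contiguous, aligned
     * vectors for a multi-message MSI and hand them back to the frontend. */
    static int pciback_enable_msi_sketch(struct xen_pci_op_sketch *op)
    {
        if (op->value < 1 || op->value > XEN_PCI_MAX_VEC)
            return op->err = -1;

        int base = alloc_vector_block((unsigned)op->value);
        if (base < 0)
            return op->err = -1;

        for (int32_t i = 0; i < op->value; i++) {
            op->msix_entries[i].entry  = (uint16_t)i;
            op->msix_entries[i].vector = (uint16_t)(base + i);
        }
        return op->err = 0;
    }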
xen-devel-bounces@lists.xensource.com <> wrote:
> [Keir Fraser]
>> On 6/3/08 21:07, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
>>> If an MSI capable device was able to make use of the above feature, the device could be set up to generate different interrupts depending on where the incoming interrupt was to be handled. For example, incoming data for a particular guest could trigger an interrupt on the processor where that guest is running. Obviously, a dom0-like backend driver would not be involved in the actual event delivery in these situations. The event would be delivered directly to the frontend.
>>>
>>> The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends.
>
>> The only tricky bit here is deciding what the interface should be to the hypervisor to specify these allocation constraints.
>
>> Another thought though: there's no good reason for Xen to scatter its irq-vector allocations across the vector space. That's a holdover from classic-Pentium-era systems, which could lose interrupts if too many got 'queued up' at any single priority level. So we could actually allocate our vectors contiguously, making it much more likely that you could successfully allocate a contiguous range even without remapping.
>
>> However, I guess you want to be able to specify different target APICs for different vectors too, so again it comes back to: what should the guest interface to irq-remapping hardware be?
>
> Right. The reason for bringing up this suggestion now rather than later is that MSI support has not yet found its way into mainline. Whoever decides on the interface used for registering MSI and MSI-X interrupts might want to take multi-message MSIs into account as well.

Espen, thanks for your comments. I remember Linux has no such support, so Linux drivers will not benefit from such an implementation; after all, the driver needs to provide an ISR for the interrupts. Of course, we need this feature if any OS supports it. I didn't support this because it may require changes to various common components and needs more discussion, while Linux has no support for it. (Also, I rushed for the 3.2 cut-off at that time :$.)

> I do not think explicitly specifying the destination APIC upon allocation is the best idea. Setting the affinity upon binding the interrupt, like it's done today, seems like a better approach. This leaves us with dealing with the vectors.

But what should happen when the vcpu is migrated to another physical cpu? I'm not sure about the cost of programming the interrupt remapping table; otherwise, that is a good choice for achieving the affinity. As for vector assignment, I agree the simpler method is to change the vector assignment in Xen as Keir suggests. Also, I suspect we may need to support per-CPU vectors later, if that many vectors are requested.

> My initial thought was to make use of the new msix_entries[] field in the xen_pci_op structure. This field is already used as an in/out parameter for allocating MSI-X interrupts. The pciback_enable_msi() function can then attempt to allocate multiple interrupts instead of a single one, and return the allocated vectors.
>
> The current MSI patchset also lacks a set_affinity() function for changing the APIC destination similar to what is done for, e.g., IOAPICs.
> Also similar to IOAPICs, the MSI support should have something like io_apic_write_remap_rte() for rewriting the interrupt remapping table when it is enabled.

For set_affinity(), what do you mean by changing the APIC destination? Currently, if you set a guest pirq's affinity, it will only impact the event channel. The physical one is only set once, when the pirq is bound.

As for rewriting the interrupt remapping table like io_apic_write_remap_rte(), I think it will be added later also.

I'm also a bit confused by your statement in the previous mail: "The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends." What do you mean by different frontends?

Really, thanks for your suggestion.

> A special case must exist when setting the interrupt affinity for multiple-message enabled MSI devices. There should probably be some magic in the set_affinity() function for handling this properly. That is, setting the affinity for the whole group of MSI interrupts does not make all that much sense. It makes more sense when one can set the per-interrupt affinity through the interrupt remapping table.
>
> It should be evident by now that my suggestions for deciding upon an interface, and an implementation of it, are rather fluffy; bordering on non-existent. This is partly because I'm talking about not-yet-existing MSI support, but mainly because I'm still new to Xen internals. Nonetheless, I believe it would be good for people working on MSI support to take multi-message MSIs and interrupt remapping into account as well.
>
> eSk
[Yunhong Jiang]
>> Right. The reason for bringing up this suggestion now rather than later is that MSI support has not yet found its way into mainline. Whoever decides on the interface used for registering MSI and MSI-X interrupts might want to take multi-message MSIs into account as well.

> Espen, thanks for your comments. I remember Linux has no such support, so Linux drivers will not benefit from such an implementation; after all, the driver needs to provide an ISR for the interrupts. Of course, we need this feature if any OS supports it. I didn't support this because it may require changes to various common components and needs more discussion, while Linux has no support for it. (Also, I rushed for the 3.2 cut-off at that time :$.)

You're right in that Linux does not currently support this. You can, however, allocate multiple interrupts using MSI-X. Anyhow, I was not envisioning this feature being used directly for passthrough device access. Rather, I was considering the case where a device could be configured to communicate data directly into a VM (e.g., using multi-queue NICs) and deliver the interrupt to the appropriate VM. In this case the frontend in the guest would not need to see a multi-message MSI device; only the backend in dom0/the driver domain would need to be made aware of it.

>> I do not think explicitly specifying the destination APIC upon allocation is the best idea. Setting the affinity upon binding the interrupt, like it's done today, seems like a better approach. This leaves us with dealing with the vectors.

> But what should happen when the vcpu is migrated to another physical cpu? I'm not sure about the cost of programming the interrupt remapping table; otherwise, that is a good choice for achieving the affinity.

As you've already said, the interrupt affinity is only set when a pirq is bound. The interrupt routing is not redirected if the vcpu it's bound to migrates to another physical cpu. This can (should?) be changed in the future so that the affinity is either set implicitly when migrating the vcpu, or explicitly with a rebind call by dom0. In any case the affinity would be reset by the set_affinity method.

>> My initial thought was to make use of the new msix_entries[] field in the xen_pci_op structure. This field is already used as an in/out parameter for allocating MSI-X interrupts. The pciback_enable_msi() function can then attempt to allocate multiple interrupts instead of a single one, and return the allocated vectors.
>>
>> The current MSI patchset also lacks a set_affinity() function for changing the APIC destination similar to what is done for, e.g., IOAPICs. Also similar to IOAPICs, the MSI support should have something like io_apic_write_remap_rte() for rewriting the interrupt remapping table when it is enabled.

> For set_affinity(), what do you mean by changing the APIC destination? Currently, if you set a guest pirq's affinity, it will only impact the event channel. The physical one is only set once, when the pirq is bound.

With "changing the APIC destination" I meant changing the destination CPU of an interrupt while keeping the vector, delivery type, etc. intact.

> As for rewriting the interrupt remapping table like io_apic_write_remap_rte(), I think it will be added later also.

> I'm also a bit confused by your statement in the previous mail: "The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends."
> What do you mean by different frontends?

Different frontends here means multiple instances of frontends residing in different VMs, all served by a single backend. As alluded to above, the idea is to have a single backend that has direct access to the device, and multiple frontends that somehow share some limited direct access to the device. For example, a multi-queue capable NIC could deliver packets to the queue in the appropriate VM and raise an interrupt in that VM without involving the domain of the backend driver.

eSk
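An outline of the flow Espen is describing, purely as a sketch: the helper names below are hypothetical placeholders for whatever permission and notification interface ends up being chosen, which is exactly the open question in this thread.

    struct queue_binding {
        unsigned queue;      /* NIC queue index                          */
        unsigned pirq;       /* pirq allocated for that queue's message  */
        unsigned frontend;   /* domid of the VM that owns the queue      */
    };

    /* Hypothetical wrappers, assumed to exist only for this outline. */
    extern int grant_pirq_access(unsigned domid, unsigned pirq);
    extern int advertise_pirq_to_frontend(unsigned domid, unsigned queue,
                                          unsigned pirq);

    /* Backend/driver-domain side: hand each queue's interrupt to the VM
     * that owns the queue, so events bypass the backend domain entirely. */
    static int assign_queue_irqs(const struct queue_binding *qb, unsigned n)
    {
        for (unsigned i = 0; i < n; i++) {
            if (grant_pirq_access(qb[i].frontend, qb[i].pirq) < 0)
                return -1;
            /* The frontend then binds the pirq to a local event channel. */
            if (advertise_pirq_to_frontend(qb[i].frontend, qb[i].queue,
                                           qb[i].pirq) < 0)
                return -1;
        }
        return 0;
    }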
xen-devel-bounces@lists.xensource.com <> wrote:
> [Yunhong Jiang]
>>> Right. The reason for bringing up this suggestion now rather than later is that MSI support has not yet found its way into mainline. Whoever decides on the interface used for registering MSI and MSI-X interrupts might want to take multi-message MSIs into account as well.
>
>> Espen, thanks for your comments. I remember Linux has no such support, so Linux drivers will not benefit from such an implementation; after all, the driver needs to provide an ISR for the interrupts. Of course, we need this feature if any OS supports it. I didn't support this because it may require changes to various common components and needs more discussion, while Linux has no support for it. (Also, I rushed for the 3.2 cut-off at that time :$.)
>
> You're right in that Linux does not currently support this. You can, however, allocate multiple interrupts using MSI-X. Anyhow, I was not envisioning this feature being used directly for passthrough device access. Rather, I was considering the case where a device could be configured to communicate data directly into a VM (e.g., using multi-queue NICs) and deliver the interrupt to the appropriate VM. In this case the frontend in the guest would not need to see a multi-message MSI device; only the backend in dom0/the driver domain would need to be made aware of it.

Although I don't know if any device has such a usage model (Intel's VMDq is using MSI-X), yes, your usage model would be helpful. To achieve this, maybe we need to change the protocol between pci backend and pci frontend; in fact, maybe pci_enable_msi/pci_enable_msix can be combined, with a flag to determine whether the vectors should be contiguous or not.

One thing left is how the driver domain can bind the vector to the frontend VM. Some sanity check mechanism should be added.

BTW, can you tell me which device may use this feature? I'm a bit interested in this.

>>> I do not think explicitly specifying the destination APIC upon allocation is the best idea. Setting the affinity upon binding the interrupt, like it's done today, seems like a better approach. This leaves us with dealing with the vectors.
>
>> But what should happen when the vcpu is migrated to another physical cpu? I'm not sure about the cost of programming the interrupt remapping table; otherwise, that is a good choice for achieving the affinity.
>
> As you've already said, the interrupt affinity is only set when a pirq is bound. The interrupt routing is not redirected if the vcpu it's bound to migrates to another physical cpu. This can (should?) be changed in the future so that the affinity is either set implicitly when migrating the vcpu, or explicitly with a rebind call by dom0. In any case the affinity would be reset by the set_affinity method.

Yes, I remember Keir suggested using the interrupt remapping table in VT-d to achieve this; I'm not sure whether that is still OK.

>>> My initial thought was to make use of the new msix_entries[] field in the xen_pci_op structure. This field is already used as an in/out parameter for allocating MSI-X interrupts. The pciback_enable_msi() function can then attempt to allocate multiple interrupts instead of a single one, and return the allocated vectors.
>>>
>>> The current MSI patchset also lacks a set_affinity() function for changing the APIC destination similar to what is done for, e.g., IOAPICs.
>>> Also similar to IOAPICs, the MSI support should have something like io_apic_write_remap_rte() for rewriting the interrupt remapping table when it is enabled.

>> For set_affinity(), what do you mean by changing the APIC destination? Currently, if you set a guest pirq's affinity, it will only impact the event channel. The physical one is only set once, when the pirq is bound.

> With "changing the APIC destination" I meant changing the destination CPU of an interrupt while keeping the vector, delivery type, etc. intact.

>> As for rewriting the interrupt remapping table like io_apic_write_remap_rte(), I think it will be added later also.

>> I'm also a bit confused by your statement in the previous mail: "The necessary changes would enable a device driver for an MSI capable device to allocate a range of pirqs and bind these to different frontends." What do you mean by different frontends?

> Different frontends here means multiple instances of frontends residing in different VMs, all served by a single backend. As alluded to above, the idea is to have a single backend that has direct access to the device, and multiple frontends that somehow share some limited direct access to the device. For example, a multi-queue capable NIC could deliver packets to the queue in the appropriate VM and raise an interrupt in that VM without involving the domain of the backend driver.

Got it.

> eSk
[Yunhong Jiang]
> xen-devel-bounces@lists.xensource.com <> wrote:
>> You're right in that Linux does not currently support this. You can, however, allocate multiple interrupts using MSI-X. Anyhow, I was not envisioning this feature being used directly for passthrough device access. Rather, I was considering the case where a device could be configured to communicate data directly into a VM (e.g., using multi-queue NICs) and deliver the interrupt to the appropriate VM. In this case the frontend in the guest would not need to see a multi-message MSI device; only the backend in dom0/the driver domain would need to be made aware of it.

> Although I don't know if any device has such a usage model (Intel's VMDq is using MSI-X), yes, your usage model would be helpful. To achieve this, maybe we need to change the protocol between pci backend and pci frontend; in fact, maybe pci_enable_msi/pci_enable_msix can be combined, with a flag to determine whether the vectors should be contiguous or not.

This is similar to my initial idea as well. In addition to being contiguous, the multi-message MSI request would also need to allocate vectors that are properly aligned.

> One thing left is how the driver domain can bind the vector to the frontend VM. Some sanity check mechanism should be added.

Well, there exists a domctl for modifying the permissions of a pirq. This could be used to grant pirq access to a frontend domain. Not sure if this is sufficient.

Also, as discussed in my previous reply, dom0 may need the ability to reset the affinity of an irq when migrating the destination vcpu. Further, a pirq is now always bound to vcpu[0] of a domain (in evtchn_bind_pirq). There is clearly some room for improvement and more flexibility here.

Not sure what the best solution is. One option is to allow a guest to re-bind a pirq to set its affinity, and have such explicitly set affinities be automatically updated when the associated vcpu is migrated. Another option is to create unbound ports in a guest domain and let a privileged domain bind pirqs to those ports. The privileged domain should then also be allowed to later modify the destination vcpu and set the affinity of the bound pirq.

> BTW, can you tell me which device may use this feature? I'm a bit interested in this.

I must confess that I do not know of any device that currently uses this feature (perhaps Solarflare or NetXen devices have support for it), and the whole connection with VT-d interrupt remapping is as of now purely academic anyway due to the lack of chipsets with the appropriate feature.

However, the whole issue of binding multiple pirqs of a device to different guest domains remains the same even if using MSI-X. Multi-message MSI devices only/mostly add some additional restrictions upon allocating interrupt vectors.

>>>> I do not think explicitly specifying the destination APIC upon allocation is the best idea. Setting the affinity upon binding the interrupt, like it's done today, seems like a better approach. This leaves us with dealing with the vectors.

>>> But what should happen when the vcpu is migrated to another physical cpu? I'm not sure about the cost of programming the interrupt remapping table; otherwise, that is a good choice for achieving the affinity.

>> As you've already said, the interrupt affinity is only set when a pirq is bound. The interrupt routing is not redirected if the vcpu it's bound to migrates to another physical cpu. This can (should?) be changed in the future so that the affinity is either set implicitly when migrating the vcpu, or explicitly with a rebind call by dom0. In any case the affinity would be reset by the set_affinity method.
> Yes, I remember Keir suggested using the interrupt remapping table in VT-d to achieve this; I'm not sure whether that is still OK.

Relying on the VT-d interrupt remapping table would rule out any Intel chipset on the market today, and also the equivalent solution (if any) used by AMD and others.

It seems better to update the IOAPIC entry or MSI capability structure directly when redirecting the interrupt, and let io_apic_write() or the equivalent function for MSI rewrite the interrupt remapping table if VT-d is enabled. Not sure how much it would cost to rewrite the remapping table and perform the respective VT-d interrupt entry cache flush; it's difficult to measure without actually having any available hardware. However, I suspect the cost would in many cases be dwarfed by migrating the cache working set and by other associated costs of migrating a vcpu.

eSk
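As a minimal sketch of the "change the destination, keep the vector" operation discussed above, here is how an x86 MSI address word could be recomposed for a new destination APIC ID. The bit layout follows the architectural MSI address format, but the function itself is only an illustration, not the patchset's set_affinity().

    #include <stdint.h>

    #define MSI_ADDR_DEST_SHIFT  12
    #define MSI_ADDR_DEST_MASK   (0xffu << MSI_ADDR_DEST_SHIFT)

    /* Retarget an MSI to 'apic_id' while leaving every other address bit
     * (and the whole data word: vector, delivery mode, trigger) untouched.
     * With VT-d remapping enabled, the same code path would instead rewrite
     * the interrupt remapping table entry and flush the interrupt entry
     * cache, as discussed above. */
    static inline uint32_t msi_retarget_addr(uint32_t old_addr, uint8_t apic_id)
    {
        return (old_addr & ~MSI_ADDR_DEST_MASK) |
               ((uint32_t)apic_id << MSI_ADDR_DEST_SHIFT);
    }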
Espen Skoglund <mailto:espen.skoglund@netronome.com> wrote:
> [Yunhong Jiang]
>> xen-devel-bounces@lists.xensource.com <> wrote:
>>> You're right in that Linux does not currently support this. You can, however, allocate multiple interrupts using MSI-X. Anyhow, I was not envisioning this feature being used directly for passthrough device access. Rather, I was considering the case where a device could be configured to communicate data directly into a VM (e.g., using multi-queue NICs) and deliver the interrupt to the appropriate VM. In this case the frontend in the guest would not need to see a multi-message MSI device; only the backend in dom0/the driver domain would need to be made aware of it.
>
>>> Although I don't know if any device has such a usage model (Intel's VMDq is using MSI-X), yes, your usage model would be helpful. To achieve this, maybe we need to change the protocol between pci backend and pci frontend; in fact, maybe pci_enable_msi/pci_enable_msix can be combined, with a flag to determine whether the vectors should be contiguous or not.
>
>> This is similar to my initial idea as well. In addition to being contiguous, the multi-message MSI request would also need to allocate vectors that are properly aligned.

Yes, but I don't think we need to add the implementation now. We can change xen_pci_op to accommodate this requirement; otherwise, it will cause more divergence from upstream Linux. (Maybe the hypercall needs to be changed for this requirement also.) As for set_irq_affinity, I think it is a general issue, not MSI related; we can continue to follow up on it.

>> One thing left is how the driver domain can bind the vector to the frontend VM. Some sanity check mechanism should be added.
>
> Well, there exists a domctl for modifying the permissions of a pirq. This could be used to grant pirq access to a frontend domain. Not sure if this is sufficient.
>
> Also, as discussed in my previous reply, dom0 may need the ability to reset the affinity of an irq when migrating the destination vcpu. Further, a pirq is now always bound to vcpu[0] of a domain (in evtchn_bind_pirq). There is clearly some room for improvement and more flexibility here.
>
> Not sure what the best solution is. One option is to allow a guest to re-bind a pirq to set its affinity, and have such explicitly set affinities be automatically updated when the associated vcpu is migrated. Another option is to create unbound ports in a guest domain and let a privileged domain bind pirqs to those ports. The privileged domain should then also be allowed to later modify the destination vcpu and set the affinity of the bound pirq.
>
>> BTW, can you tell me which device may use this feature? I'm a bit interested in this.
>
> I must confess that I do not know of any device that currently uses this feature (perhaps Solarflare or NetXen devices have support for it), and the whole connection with VT-d interrupt remapping is as of now purely academic anyway due to the lack of chipsets with the appropriate feature.
>
> However, the whole issue of binding multiple pirqs of a device to different guest domains remains the same even if using MSI-X. Multi-message MSI devices only/mostly add some additional restrictions upon allocating interrupt vectors.
>
>>>>> I do not think explicitly specifying the destination APIC upon allocation is the best idea. Setting the affinity upon binding the interrupt, like it's done today, seems like a better approach. This leaves us with dealing with the vectors.
>>>> But what should happen when the vcpu is migrated to another physical cpu? I'm not sure about the cost of programming the interrupt remapping table; otherwise, that is a good choice for achieving the affinity.
>
>>> As you've already said, the interrupt affinity is only set when a pirq is bound. The interrupt routing is not redirected if the vcpu it's bound to migrates to another physical cpu. This can (should?) be changed in the future so that the affinity is either set implicitly when migrating the vcpu, or explicitly with a rebind call by dom0. In any case the affinity would be reset by the set_affinity method.
>
>> Yes, I remember Keir suggested using the interrupt remapping table in VT-d to achieve this; I'm not sure whether that is still OK.
>
> Relying on the VT-d interrupt remapping table would rule out any Intel chipset on the market today, and also the equivalent solution (if any) used by AMD and others.
>
> It seems better to update the IOAPIC entry or MSI capability structure directly when redirecting the interrupt, and let io_apic_write() or the equivalent function for MSI rewrite the interrupt remapping table if VT-d is enabled. Not sure how much it would cost to rewrite the remapping table and perform the respective VT-d interrupt entry cache flush; it's difficult to measure without actually having any available hardware. However, I suspect the cost would in many cases be dwarfed by migrating the cache working set and by other associated costs of migrating a vcpu.
>
> eSk
[Yunhong Jiang]
> Espen Skoglund <mailto:espen.skoglund@netronome.com> wrote:
>> [Yunhong Jiang]
>>> xen-devel-bounces@lists.xensource.com <> wrote:
>>>> You're right in that Linux does not currently support this. You can, however, allocate multiple interrupts using MSI-X. Anyhow, I was not envisioning this feature being used directly for passthrough device access. Rather, I was considering the case where a device could be configured to communicate data directly into a VM (e.g., using multi-queue NICs) and deliver the interrupt to the appropriate VM. In this case the frontend in the guest would not need to see a multi-message MSI device; only the backend in dom0/the driver domain would need to be made aware of it.
>
>>> Although I don't know if any device has such a usage model (Intel's VMDq is using MSI-X), yes, your usage model would be helpful. To achieve this, maybe we need to change the protocol between pci backend and pci frontend; in fact, maybe pci_enable_msi/pci_enable_msix can be combined, with a flag to determine whether the vectors should be contiguous or not.
>
>> This is similar to my initial idea as well. In addition to being contiguous, the multi-message MSI request would also need to allocate vectors that are properly aligned.

> Yes, but I don't think we need to add the implementation now. We can change xen_pci_op to accommodate this requirement; otherwise, it will cause more divergence from upstream Linux. (Maybe the hypercall needs to be changed for this requirement also.)

Isn't this more of a PHYSDEVOP_alloc_irq_vector thing? That is, dom0 should be able to request a region of contiguous vectors. As for upstream Linux differences, we should get away with only a few modifications to the MSI specific parts in dom0. Also, if contiguous vector regions are wanted, the vector allocation algorithm in Xen should be changed to avoid spreading the allocations all over the place.

> As for set_irq_affinity, I think it is a general issue, not MSI related; we can continue to follow up on it.

Agreed.

eSk
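A hedged sketch of the kind of extension hinted at above: a request layout through which dom0 could ask the hypervisor for an aligned block of vectors in one call. The struct and field names are hypothetical; the existing PHYSDEVOP_alloc_irq_vector allocates a single vector per irq.

    #include <stdint.h>

    /* Hypothetical request layout for a "block" variant of the vector
     * allocation physdev op.  dom0 fills in the IN fields; the hypervisor
     * would pick a contiguous run naturally aligned to 'count'. */
    struct physdev_irq_block_sketch {
        /* IN */
        uint32_t irq;      /* pirq of the first message                  */
        uint32_t count;    /* power of two, 1..32 for multi-message MSI  */
        /* OUT */
        uint32_t vector;   /* base of the aligned, contiguous block;
                              message i uses vector + i                  */
    };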
Espen Skoglund wrote:
> I must confess that I do not know of any device that currently uses this feature (perhaps Solarflare or NetXen devices have support for it), and the whole connection with VT-d interrupt remapping is as of now purely academic anyway due to the lack of chipsets with the appropriate feature.

Solarflare would be interested in the ability to pass MSI-X interrupts through to different guests for our netfront/netback accelerator plugins. Our latest chips support MSI-X with multiple queues, so they do not need to use multi-message MSIs. I don't know enough about VT-d interrupt remapping to comment on that.

Cheers,
Neil.