Hi Keir,

These patches are a rebased version of Yunhong's original patches, which were sent out before Xen 3.2 was released. They enable MSI support and limited MSI-X support in Xen. Here is the original description of the patches from Yunhong's mail.

The basic idea:
1) Keep the vector as a global resource owned by Xen, while splitting pirq into per-domain information.
2) The Domain0 kernel will operate the MSI resource for domain0/domU, while QEMU will operate the MSI resource for HVM domains.
3) Xen will do the EOI for MSI interrupts.

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>

Not many changes were made compared with the original patches. But there are some issues on which we need your comments.

1> The ACK-NEW method is necessary to avoid an IRQ storm, but it causes a deadlock.
During my tests, I did find that a deadlock can occur with the patches applied. When a NIC device is assigned to an HVM domain, the scenario is: Dom0 is waiting for the IDE interrupt (vector 0x21); the HVM domain is waiting for qemu's IDE emulation and is thus blocked; the NIC interrupt (MSI vector 0x31) is waiting for injection into the HVM domain, since that domain is blocked; the IDE interrupt is waiting for the NIC interrupt, since the NIC interrupt has higher priority but has not yet been ACKed by Xen. When the IDE interrupt and the NIC interrupt are delivered to the same CPU and the guest OS is Vista, this is easy to observe.

2> Without ACK-NEW, some misbehaving NIC devices we observed cause IRQ storms. I think Yunhong can comment more on this. Basically, writing EOI without masking the source of the MSI causes an IRQ storm. Although the reason is still under investigation, Xen should handle such a bogus device anyhow, right?

3> Using ACK-OLD and masking the MSI when writing EOI could be a solution. However, Xen does not own the PCI configuration spaces.

We also tried some workarounds.
One workaround might be to use a timer to force an EOI within some time interval. This method is already implemented in the VT-d code. However, with this approach, if the timer fires and the EOI is written, it is essentially the same approach as option 2.
Another approach is to never deliver these two IRQs to the same CPU. But this is really ugly and cannot be applied to UP.
We have also considered using the VT-d 2 interrupt remapping feature. According to the spec, there is no bit in the remapping table to mask the interrupt. Therefore, this cannot be combined with option 2 to solve the issue. Masking the interrupt still requires access to the PCI configuration spaces.

We think the cleanest method may be to move ownership of the PCI configuration space from dom0 to the VMM. However, this is a big change. It should be well discussed in the community and needs your comments.

This patch series can serve as discussion material. What are your comments on the patches and the issues, Keir?

Thanks!
Haitao Shan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
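
The priority argument behind the deadlock in issue 1 can be sketched as follows. This is a simplified model of local APIC behaviour, not code from the patches: the names prio_class and can_dispatch are hypothetical, the task priority register is ignored, and only a single in-service vector is modelled. The priority class of a vector is its upper four bits, so 0x31 (class 3) outranks 0x21 (class 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Priority class of an x86 interrupt vector: the upper four bits. */
static inline unsigned int prio_class(uint8_t vector)
{
    return vector >> 4;
}

/*
 * Simplified model (assumption: TPR is zero, one in-service vector).
 * A pending vector can be dispatched only if its priority class is
 * strictly higher than that of the vector still in service, i.e.
 * delivered to the CPU but not yet EOIed.
 * in_service == 0 means nothing is in service.
 */
static bool can_dispatch(uint8_t pending, uint8_t in_service)
{
    assert(pending >= 0x10);    /* vectors 0-15 are not valid here */
    if (in_service == 0)
        return true;
    return prio_class(pending) > prio_class(in_service);
}
```

With the NIC vector 0x31 left in service until the blocked HVM guest completes it, can_dispatch(0x21, 0x31) is false, so the IDE interrupt that Dom0 needs in order to unblock the guest is never delivered on that CPU.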
Thanks, I'll have to look at the patches regarding the per-domain pirq changes. That sounds like it probably makes sense, but I seem to remember there were big changes to the irq architecture and irq naming in the hypervisor in previous iterations of these patches, which I didn't understand.

This IRQ storm issue still needs resolving properly. No one has yet explained how a message-based interrupt source can cause an irq storm. Storms are inherently a property of level-triggered sources, where ACK/EOI immediately causes re-sampling of the interrupt line and re-assertion of the interrupt at the CPU. How can anything similar happen with MSI? You (Intel) are probably uniquely placed to answer this question, since you manufacture the chipset and NIC which exhibit this problem.

 -- Keir

On 27/3/08 06:55, "Shan, Haitao" <haitao.shan@intel.com> wrote:

> The basic idea including:
> 1) Keep vector global resource owned by xen, while split pirq into per-domain
> information.
>
> 2> Without ACK-NEW, some naughty NIC devices as we observed will bring IRQ
> storms. For this phenomenon, I think Yunhong can comment more. Basically,
> writing EOI without mask the source of MSI will bring IRQ storm. Although the
> reason is under investigation, XEN should anyhow handle such bogus device,
> right?
Preventing interrupt storms by masking the interrupt in the MSI/MSI-X capability structure or MSI-X table within the interrupt handler is insane. It requires accesses over the PCI/PCIe bus and is clearly something you want to avoid on the fast path.

	eSk

[Haitao Shan]
> 3> Using ACK-OLD and masking the MSI when writing EOI can be
> solution. However, XEN does not own PCI configuration spaces.
Espen Skoglund wrote:
> Preventing interrupt storms by masking the interrupt in the MSI/MSI-X
> capability structure or MSI-X table within the interrupt handler is
> insane. It requires accesses over the PCI/PCIe bus and is clearly
> something you want to avoid on the fast path.
>
> eSk

I agree. Interrupt mitigation schemes should already be part of the host/device interface that is being assigned to the HVM guest. The HVM guest should already know how to use it.

> [Haitao Shan]
> > 2> Without ACK-NEW, some naughty NIC devices as we observed will
> > bring IRQ storms. For this phenomenon, I think Yunhong can comment more.
> > Basically, writing EOI without mask the source of MSI will bring IRQ
> > storm. Although the reason is under investigation, XEN should anyhow
> > handle such bogus device, right?

Device assignment should deliver the device to the HVM guest, with all of its warts as well as all of its features. Isn't the ultimate point to use the same driver in the HVM guest whether Xen is present or not?
We would not mask every time an interrupt happens. Instead, we would mask only when a second interrupt arrives while the previous one is still pending, something like handle_edge_irq() in upstream Linux.

-- Yunhong Jiang

Espen Skoglund <espen.skoglund@netronome.com> wrote:
> Preventing interrupt storms by masking the interrupt in the MSI/MSI-X
> capability structure or MSI-X table within the interrupt handler is
> insane. It requires accesses over the PCI/PCIe bus and is clearly
> something you want to avoid on the fast path.
>
> eSk
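
The "mask only on re-entry" idea above can be sketched roughly as below. This is an illustrative model only, not the Linux handle_edge_irq() implementation and not code from the patches: struct irq_state, edge_irq_arrive and edge_irq_done are hypothetical names, and mask_writes merely stands in for the slow mask access over the PCI bus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-IRQ state; not the Linux or Xen descriptor. */
struct irq_state {
    bool in_progress;         /* handler for an earlier instance is running */
    bool pending;             /* another instance arrived in the meantime   */
    bool masked;              /* source currently masked at the device      */
    unsigned int mask_writes; /* counts the (slow) mask accesses            */
};

/* Called on each arrival; returns true if the handler should run now. */
static bool edge_irq_arrive(struct irq_state *s)
{
    assert(s != NULL);
    if (s->in_progress) {
        /* Slow path: remember the interrupt and mask the source once. */
        s->pending = true;
        if (!s->masked) {
            s->masked = true;
            s->mask_writes++;
        }
        return false;
    }
    /* Fast path: no mask access at all. */
    s->in_progress = true;
    return true;
}

/* Called when the handler finishes; returns true if it must run again. */
static bool edge_irq_done(struct irq_state *s)
{
    s->in_progress = false;
    if (s->pending) {
        s->pending = false;
        s->masked = false;    /* unmask before replaying the interrupt */
        s->in_progress = true;
        return true;
    }
    return false;
}
```

In this model the common case (one interrupt, one handler run) never touches the mask, which is the point being made in reply to the fast-path objection.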
This requires the guest to call back into Xen to signal EOI (as we already do for legacy level-triggered interrupts). We shouldn't really need to do that for MSI, and it's rather more expensive than a couple of accesses over the PCI bus!

It's this callback into Xen, whose necessity we do not really understand, that I'm railing against. Is there some fundamental aspect of MSI we do not understand, or are we working around one brain-dead or buggy device?

 -- Keir

On 28/3/08 01:48, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> Not masking each time when interrupt happen, instead, we do that only
> when the second interrupt happen while the previous one is still
> pending, it should be something like handle_edge_irqs() in upstream
> linux.
>
> -- Yunhong Jiang
Let me describe some experiments I did after I discovered this issue.

The device was an 82575EB NIC, and the driver I used was igb 1.0.8 (search http://sourceforge.net/project/showfiles.php?group_id=42302 for it). The LSC interrupt is a link status change interrupt. It can happen physically, or it can be triggered by the driver, as in igb_open() at igb_main.c line 1496, which writes to a special register (E1000_ICS) to trigger an interrupt event.

I did some experiments on Linux 2.6.23 with this driver. I tried to
a) change handle_edge_irq() from mask-and-ack to ack-only for the case where an interrupt arrives while the previous one is still in progress (see the patch below), and
b) comment out line 1496 in the driver.

The results are:
1) With mask and ack, the interrupt happens 3 times; the last 2 are masked because they arrived while the first one was still pending for the ISR handler. The system is OK.
2) With ack and no mask, the interrupt happens continuously and the system hangs forever.
3) With ack and no mask, and with line 1496 removed (i.e. no software-triggered interrupt), the interrupt happens twice and the system is OK.

So I suppose the problem happens only if the interrupt is triggered by software. I also consulted the HW engineer but didn't get confirmation; the only answer I got is that PCI-E needs a rising edge before sending the 2nd interrupt :(

I'm not sure whether there are other BRAIN-DEAD devices like this. I only have this device to test the MSI-X function, but we may need to make sure it will not break the whole system.

The callback from the guest is there because we are using the ACK-NEW method to work around this issue. Yes, it is expensive. Also, the ACK-NEW method may cause a deadlock, as Haitao described in his mail.

But if we move the config space to the HV, then we don't need the ACK-NEW method. That should be OK, but admittedly it should be the last method we turn to, since the config space should be owned by domain0.

Thanks
-- Yunhong Jiang

The patch to ack and no-mask the MSI-X interrupt is below:

--- kernel/irq/chip.c	2008-03-28 13:23:51.000000000 -0400
+++ ../linux-2.6.23/kernel/irq/chip.c	2007-10-09 16:31:38.000000000 -0400
@@ -439,9 +439,7 @@
  * the handler was running. If all pending interrupts are handled, the
  * loop is left.
  */
-
-extern struct irq_chip msi_chip ;
-void
+void fastcall
 handle_edge_irq(unsigned int irq, struct irq_desc *desc)
 {
 	const unsigned int cpu = smp_processor_id();
@@ -457,23 +455,11 @@
 	 */
 	if (unlikely((desc->status & (IRQ_INPROGRESS | IRQ_DISABLED)) ||
 		    !desc->action)) {
-
-		if (desc->chip == &msi_chip)
-			printk("mask msi chip irq %x cpu %x desc->status %x desc->action %p tsc %lx\n", irq, cpu, desc->status, desc->action, tsc_this);
-
 		desc->status |= (IRQ_PENDING | IRQ_MASKED);
-		if (desc->chip == &msi_chip)
-		{
-			desc->chip->ack(irq);
-		}else
 		mask_ack_irq(desc, irq);
-
 		goto out_unlock;
 	}

Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> This requires the guest to call back into Xen to signal EOI (as we already
> do for legacy level-triggered interrupts). We shouldn't really need to do
> that for MSI and it's rather more expensive than a couple of accesses over
> the PCI bus!
>
> It's this callback into Xen, which we do not really understand why it's
> needed, which I'm railing against. Is there some fundamental aspect of MSI
> we do not understand, or are we working around one brain-dead or buggy
> device?
>
> -- Keir
On 28/3/08 08:40, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> The investigation result is,
> 1) if mask and ack the interrupt, the interrupt will happen 3 times, the
> last 2 is masked because they happened when the first one is still
> pending for ISR's handler, the system is ok.

How can you tell it happened three times? If the interrupt is pending in the ISR then only one further pending interrupt can become visible to software, as there is only one pending bit per vector in the IRR.

> So I suppose the problem happens only if trigger the interrupt by
> software. I consulted the HW engineer also but didn't get confirmation,
> the only answer I got is, the PCI-E need a rising edge before send the
> 2nd interrupt :(

That answer means very little to me. One interesting question to have answered would be: is this a closed-loop or open-loop interrupt storm? I.e., does the device somehow detect the APIC EOI and then trigger a re-send of the MSI (closed loop), or is this an initialisation-time-only open-loop storm where the device is spitting out the MSI regularly until some device register gets written by the interrupt service routine?

Given the circumstances, I'm inclined to think it is the latter. Especially since I think the former is impossible, as the APIC EOI is not visible outside the processor unless the interrupt came from a level-triggered IO-APIC pin, and even then the EOI would not be visible across the PCI bus!

Also it seems *very* likely that this is just an initialisation-time thing, and the device probably behaves very nicely after it is bootstrapped. In light of this I think we should treat MSI sources as ACKTYPE_NONE in Xen (i.e., require no callback from guest to hypervisor on completion of the interrupt handler). We can then handle the interrupt storm entirely within the hypervisor by detecting the storm, masking the interrupt, and only unmasking after some timeout.

In your tests, how aggressive was the IRQ storm? If you looked at the interrupted EIP on each interrupt, was it immediately after the APIC was EOIed and EFLAGS.IF set back to 1, or was it some time after? This tells us how aggressively the device is sending out MSIs, and may determine how cunning we need to be regarding interrupt storm detection.

> I'm not sure if there are any other BRAIN-DEAD device like this, I only
> have this device to test MSI-X function, but we may need make sure it
> will not break the whole system.

Yes, we have to handle this case, unfortunately.

> The call-back to guest because we are using the ACK-new method to work
> around this issue. Yes, it is expensive, Also, this ACK-new method may
> cause deadlock as Haitao suggested in the mail.

Yes, that sucks. See my previous email -- if possible it would be great to teach Xen enough about the PCI config space to be able to mask MSIs.

> But if we move the config space to HV, then we don't need this ACK-new
> method, that should be ok, but admittedly, that should be the last
> method we we turn to, since config-space should be owned by domain0.

A partial movement into the hypervisor may be the best of a choice of evils.

 -- Keir
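
The "detect the storm, mask, unmask on timeout" proposal could look roughly like the sketch below. It is only an illustration of the idea under stated assumptions: struct msi_storm, msi_storm_note, msi_storm_tick and all three constants are hypothetical and are not taken from Xen. The caller is assumed to perform the actual mask or unmask of the MSI source whenever a function returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning values, chosen only for illustration. */
#define STORM_WINDOW_NS   1000000ULL   /* 1 ms counting window          */
#define STORM_THRESHOLD   100U         /* interrupts allowed per window */
#define STORM_TIMEOUT_NS  10000000ULL  /* stay masked for 10 ms         */

struct msi_storm {
    uint64_t window_start;  /* start time of the current window  */
    unsigned int count;     /* interrupts seen in this window    */
    bool masked;            /* source is currently masked        */
    uint64_t unmask_at;     /* time at which to unmask again     */
};

/* Record one interrupt at time 'now'; true means "mask the source now". */
static bool msi_storm_note(struct msi_storm *st, uint64_t now)
{
    assert(st != NULL);
    if (st->masked)
        return false;
    if (now - st->window_start > STORM_WINDOW_NS) {
        st->window_start = now;   /* open a fresh counting window */
        st->count = 0;
    }
    if (++st->count > STORM_THRESHOLD) {
        st->masked = true;
        st->unmask_at = now + STORM_TIMEOUT_NS;
        return true;
    }
    return false;
}

/* Periodic timer tick; true means "unmask the source now". */
static bool msi_storm_tick(struct msi_storm *st, uint64_t now)
{
    if (st->masked && now >= st->unmask_at) {
        st->masked = false;
        st->window_start = now;
        st->count = 0;
        return true;
    }
    return false;
}
```

A scheme like this only pays for the mask access once a storm has been detected, but it still depends on the hypervisor being able to reach the device's mask bits, which is the config-space ownership question raised earlier in the thread.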
Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 28/3/08 08:40, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>> The investigation result is,
>> 1) if mask and ack the interrupt, the interrupt will happen 3 times, the
>> last 2 is masked because they happened when the first one is still
>> pending for ISR's handler, the system is ok.
>
> How can you tell it happened three times? If the interrupt is pending in the
> ISR then only one further pending interrupt can become visible to software
> as there is only one pending bit per vector in the IRR.

There are two types of MSI interrupt: one for receive/transmit and one for other events (the latter is the one causing the storm). I added a printk for the case where an interrupt happens while the previous one is still in progress. Then I added the printk count to the count shown in /proc/interrupts. The count in /proc/interrupts is only 1.

> Also it seems *very* likely that this is just an initialisation-time thing,
> and the device probably behaves very nicely after it is bootstrapped.

I can't tell, because this interrupt didn't happen again after the device was up. Maybe I can change the driver to do more experiments.

> In your tests, how aggressive was the IRQ storm? If you looked at the
> interrupted EIP on each interrupt, was it immediately after the APIC was
> EOIed and EFLAGS.IF set back to 1, or was it some time after?

I will try that.

> Yes, that sucks. See my previous email -- if possible it would be great to
> teach Xen enough about the PCI config space to be able to mask MSIs.

In fact, Xen already tries to access the config space, although that is still a bug at the moment. In VT-d, Xen tries to access FLR directly :)

> A partial movement into the hypervisor may be the best of a choice of evils.

Sure, we will do that!
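
Keir's point about the single pending bit per vector can be illustrated with a tiny model. This is a sketch under simplifying assumptions, not real APIC code: struct vec_state and the three functions are hypothetical names, one vector is modelled in isolation, and the merged counter exists only to make the lost arrivals visible (real hardware keeps no such count).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one vector's IRR/ISR bits in a local APIC. */
struct vec_state {
    bool irr;             /* interrupt request bit: one pending arrival */
    bool isr;             /* in-service bit: dispatched, not yet EOIed  */
    unsigned int merged;  /* arrivals that collapsed into a set IRR bit */
};

/* An edge-triggered message for this vector reaches the local APIC. */
static void vec_arrive(struct vec_state *v)
{
    assert(v != NULL);
    if (v->irr)
        v->merged++;      /* IRR already set: indistinguishable to software */
    else
        v->irr = true;
}

/* Try to dispatch to the CPU; returns true if a handler run starts. */
static bool vec_dispatch(struct vec_state *v)
{
    if (v->irr && !v->isr) {
        v->irr = false;
        v->isr = true;
        return true;
    }
    return false;
}

/* Software writes EOI for the in-service interrupt. */
static void vec_eoi(struct vec_state *v)
{
    v->isr = false;
}
```

In this model, three messages arriving while the first is still in service lead to only one further handler run after the EOI. That is why the printk-based count described above comes from software instrumentation rather than from anything the APIC itself records.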
That is true. I was quite puzzled by the requirement for the callback into Xen myself. In standard Linux, MSI interrupts are treated as edge-triggered and are just acked in the local APIC upon delivery.

	eSk

[Keir Fraser]
> This requires the guest to call back into Xen to signal EOI (as we already
> do for legacy level-triggered interrupts). We shouldn't really need to do
> that for MSI and it's rather more expensive than a couple of accesses over
> the PCI bus!

> It's this callback into Xen, which we do not really understand why it's
> needed, which I'm railing against. Is there some fundamental aspect of MSI
> we do not understand, or are we working around one brain-dead or buggy
> device?

> -- Keir
I think Linux EOIs on ->end(), not on ->ack(). Which is fine, since Linux doesn't defer or otherwise schedule ISR handlers.

 -- Keir

On 28/3/08 11:37, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:

> That is true. I was quite puzzled with the requirement of the
> callback into Xen myself. In standard Linux MSI interrupts are
> treated as edge triggered and are just acked in the local APIC upon
> delivery.
>
> eSk
Just checked this. Linux does the local APIC EOI on ->ack().

    eSk

[Keir Fraser]
> I think Linux EOIs on ->end(), not on ->ack(). Which is fine, since
> Linux doesn't defer or otherwise schedule ISR handlers.
Oh yes, that is true. They then have special logic for detecting nested delivery, and they mask/unmask in that case. Fair enough, and similar to what we should do in Xen.

 -- Keir

On 28/3/08 12:15, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:

> Just checked this. Linux does the local APIC EOI on ->ack().
>
>     eSk
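The nested-delivery logic mentioned above can be modelled roughly as follows. This is a minimal single-threaded sketch under stated assumptions, not the Linux or Xen implementation: the names edge_irq, edge_irq_deliver(), edge_irq_run() and demo_handler() are hypothetical, and the hardware accesses are empty stand-ins. The local APIC is EOIed on delivery, the source is masked only when a second edge arrives while the handler is still running, and it is unmasked again just before the handler is re-run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-IRQ state (not a real irq_desc). */
struct edge_irq {
    bool in_progress;   /* a handler is currently running for this IRQ */
    bool pending;       /* another edge arrived while the handler ran */
    bool masked;        /* source is masked at the device */
    unsigned handled;   /* handler invocations, for illustration only */
};

/* Stand-ins for hardware accesses. A real implementation would write the
 * local APIC EOI register and the MSI/MSI-X mask bit here. */
static void apic_eoi(void) { }
static void msi_mask(struct edge_irq *irq)   { irq->masked = true; }
static void msi_unmask(struct edge_irq *irq) { irq->masked = false; }

/* Called on every delivery. Returns true if the caller should now run the
 * handler loop, false if the edge was only recorded as pending. */
bool edge_irq_deliver(struct edge_irq *irq)
{
    apic_eoi();                 /* ack in the local APIC straight away */
    if (irq->in_progress) {
        /* Nested delivery: remember it and mask the source (slow path). */
        irq->pending = true;
        msi_mask(irq);
        return false;
    }
    irq->in_progress = true;
    return true;
}

/* Runs the handler, repeating while nested edges were recorded. */
void edge_irq_run(struct edge_irq *irq, void (*handler)(struct edge_irq *))
{
    assert(irq->in_progress);
    do {
        if (irq->masked)
            msi_unmask(irq);    /* re-enable before handling the replay */
        irq->pending = false;
        handler(irq);
        irq->handled++;
    } while (irq->pending);
    irq->in_progress = false;
}

/* Illustration only: simulates one nested edge during the first run. */
void demo_handler(struct edge_irq *irq)
{
    if (irq->handled == 0)
        (void)edge_irq_deliver(irq);
}
```

In this model the fast path (no nested edge) never touches the device, so the PCI access cost raised earlier in the thread is only paid when a second interrupt actually arrives during handling.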
Keir, when I tried to get the IP address today, I suddenly found that I can't reproduce it any more. Also, originally, if I removed the code that triggers the software LSC interrupt, the NIC could still work and get an IP address; but now, if I remove that code, the NIC can't work any more.

It is really strange to me. I didn't change anything on the system, and I don't know of any change in the lab environment that may have caused this. But I could reproduce it every time before.

Really frustrated to get this :-( Do you think we still need to move the config space access down? Now the only reason to move it down is that ack_edge_ioapic_irq() did the mask, and this mask can make the HV more robust.

Thanks
-- Yunhong Jiang

Jiang, Yunhong <> wrote:
> xen-devel-bounces@lists.xensource.com <> wrote:
>> On 28/3/08 08:40, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>
>>> The investigation result is:
>>> 1) If we mask and ack the interrupt, the interrupt happens 3 times. The
>>> last 2 are masked because they happened while the first one was still
>>> pending for the ISR's handler. The system is OK.
>>
>> How can you tell it happened three times? If the interrupt is pending in
>> the ISR then only one further pending interrupt can become visible to
>> software, as there is only one pending bit per vector in the IRR.
>
> There are two types of MSI interrupt, one for receive/transmit and
> one for everything else (this is the one that causes the storm). I added
> a printk for the case where an interrupt happens while the previous one
> is in progress, then added the number of prints to the count in
> /proc/interrupts. The count in /proc/interrupts is only 1.
>
>>> So I suppose the problem happens only if the interrupt is triggered by
>>> software. I consulted the HW engineer also but didn't get confirmation.
>>> The only answer I got is that PCI-E needs a rising edge before sending
>>> the 2nd interrupt :(
>>
>> That answer means very little to me. One interesting question to have
>> answered would be: is this a closed-loop or open-loop interrupt storm?
>> I.e., does the device somehow detect APIC EOI and then trigger re-send of
>> the MSI (closed loop), or is this an initialisation-time-only open-loop
>> storm where the device is spitting out the MSI regularly until some
>> device register gets written by the interrupt service routine?
>>
>> Given the circumstances, I'm inclined to think it is the latter.
>> Especially since I think the former is impossible, as APIC EOI is not
>> visible outside the processor unless the interrupt came from a
>> level-triggered IO-APIC pin, and even then the EOI would not be visible
>> across the PCI bus!
>>
>> Also it seems *very* likely that this is just an initialisation-time
>> thing, and the device probably behaves very nicely after it is
>> bootstrapped.
>
> I can't tell, because this interrupt didn't happen again
> after the device was up. Maybe I can change the driver to do more
> experiments.
>
>> In light of this I think we should treat MSI sources as ACKTYPE_NONE in
>> Xen (i.e., require no callback from guest to hypervisor on completion of
>> the interrupt handler). We can then handle the interrupt storm entirely
>> within the hypervisor by detecting the storm and masking the interrupt,
>> and only unmasking on some timeout.
>>
>> In your tests, how aggressive was the IRQ storm? If you looked at the
>> interrupted EIP on each interrupt, was it immediately after the APIC was
>> EOIed and EFLAGS.IF set back to 1, or was it some time after? This tells
>> us how aggressively the device is sending out MSIs, and may determine how
>> cunning we need to be regarding interrupt storm detection.
>
> I will try that.
>
>>> I'm not sure if there are any other BRAIN-DEAD devices like this. I only
>>> have this device to test the MSI-X function, but we may need to make
>>> sure it will not break the whole system.
>>
>> Yes, we have to handle this case, unfortunately.
>>
>>> The callback from the guest is there because we are using the ACK-NEW
>>> method to work around this issue. Yes, it is expensive. Also, this
>>> ACK-NEW method may cause deadlock, as Haitao suggested in the mail.
>>
>> Yes, that sucks. See my previous email -- if possible it would be great
>> to teach Xen enough about the PCI config space to be able to mask MSIs.
>
> In fact, Xen is currently already trying to access config
> space, although that is still a bug at present. In VT-d, Xen tries to
> access FLR directly :)
>
>>> But if we move the config space to the HV, then we don't need this
>>> ACK-NEW method. That should be OK, but admittedly it should be the last
>>> method we turn to, since config space should be owned by domain0.
>>
>> A partial movement into the hypervisor may be the best of a choice of
>> evils.
>
> Sure, we will do that!
>
>> -- Keir
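The proposal quoted above (treat MSI as needing no guest callback, detect a storm inside the hypervisor, mask, and unmask on a timeout) could look something like the sketch below. It is only an illustration under assumptions: the struct and function names are hypothetical, the thresholds are arbitrary, time is a plain nanosecond counter supplied by the caller, and the actual mask/unmask of the device is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All three values are arbitrary assumptions chosen for illustration. */
#define STORM_WINDOW_NS   1000000ULL    /* counting window: 1 ms */
#define STORM_THRESHOLD   100u          /* deliveries tolerated per window */
#define STORM_HOLDOFF_NS  10000000ULL   /* stay masked for 10 ms */

/* Hypothetical per-MSI storm detection state. */
struct msi_storm {
    uint64_t window_start;  /* start of the current counting window */
    unsigned count;         /* deliveries seen in the current window */
    bool     masked;        /* source currently masked by the detector */
    uint64_t unmask_at;     /* time at which the mask may be lifted */
};

/* Record one delivery at time 'now'. Returns true if the caller should
 * mask the MSI source now. */
bool msi_storm_on_irq(struct msi_storm *s, uint64_t now)
{
    assert(s != NULL);
    if (now - s->window_start >= STORM_WINDOW_NS) {
        s->window_start = now;      /* open a fresh window */
        s->count = 0;
    }
    if (++s->count > STORM_THRESHOLD && !s->masked) {
        s->masked = true;
        s->unmask_at = now + STORM_HOLDOFF_NS;
        return true;
    }
    return false;
}

/* Timer callback. Returns true if the caller should unmask the source. */
bool msi_storm_on_timer(struct msi_storm *s, uint64_t now)
{
    assert(s != NULL);
    if (s->masked && now >= s->unmask_at) {
        s->masked = false;
        s->count = 0;
        s->window_start = now;
        return true;
    }
    return false;
}
```

With this shape the common case costs one counter update per interrupt, and the device is only touched when the threshold is exceeded or the hold-off expires.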
On 31/3/08 14:57, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> Keir, when I tried to get the IP address today, I suddenly found that I
> can't reproduce it any more. Also, originally, if I removed the code that
> triggers the software LSC interrupt, the NIC could still work and get an
> IP address; but now, if I remove that code, the NIC can't work any more.
> It is really strange to me. I didn't change anything on the system, and
> I don't know of any change in the lab environment that may have caused
> this. But I could reproduce it every time before.
>
> Really frustrated to get this :-( Do you think we still need to move the
> config space access down? Now the only reason to move it down is that
> ack_edge_ioapic_irq() did the mask, and this mask can make the HV more
> robust.

So, if you leave the driver as it is (triggering the software LSC interrupt), do APIC EOI in Xen before executing the interrupt handler in dom0, and do not mask the MSI at all, then you no longer hang?

That's a weird change in behaviour if so!

I wonder whether there is a timing issue of some sort, and it depends on whether the NIC generates the software-triggered interrupt at a fast enough rate that the host CPU fails to make progress if it doesn't mask the MSI? You haven't changed test machine at all, or put the NIC in a different PCI slot, or anything like that?

 -- Keir
On 31/3/08 15:14, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> I wonder whether there is a timing issue of some sort, and it depends on
> whether the NIC generates the software-triggered interrupt at a fast
> enough rate that the host CPU fails to make progress if it doesn't mask
> the MSI? You haven't changed test machine at all, or put the NIC in a
> different PCI slot, or anything like that?

Also, it's got to be worth kicking your hardware guys again to find out from them exactly what happens when that software-triggered interrupt register gets written by the device driver. Their previous response didn't sound very enlightening.

 -- Keir
Keir Fraser <mailto:keir.fraser@eu.citrix.com> wrote:
> So, if you leave the driver as it is (triggering the software LSC
> interrupt), do APIC EOI in Xen before executing the interrupt handler in
> dom0, and do not mask the MSI at all, then you no longer hang?

I usually do the experiments in the Linux kernel, and it no longer hangs.

> That's a weird change in behaviour if so!
>
> I wonder whether there is a timing issue of some sort, and it depends on
> whether the NIC generates the software-triggered interrupt at a fast
> enough rate that the host CPU fails to make progress if it doesn't mask
> the MSI? You haven't changed test machine at all, or put the NIC in a
> different PCI slot, or anything like that?

I haven't changed anything at all. The machine is in the lab, which is far away from my cube, and I just stayed at home over the weekend.

> -- Keir
On 31/3/08 15:25, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> So, if you leave the driver as it is (triggering the software LSC
>> interrupt), do APIC EOI in Xen before executing the interrupt handler in
>> dom0, and do not mask the MSI at all, then you no longer hang?
>
> I usually do the experiments in the Linux kernel, and it no longer hangs.

Well, I'd be okay with an initial implementation which does not allow Xen to mask MSIs. But still I think it will be cleaner and more extensible to have Xen program the MSI registers anyway. This will hide details like interrupt vector, APIC destination mode, etc. from the MSI-capable guest, and will also make it easier to support things like changing interrupt affinity on the fly (since it will not be necessary to get dom0 involved in that).

Once you have Xen able to write the MSI registers, I suppose it is not much extra work to implement some kind of interrupt mitigation scheme involving the mask/enable bits of the MSI configuration register.

 -- Keir
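To make the "details like interrupt vector, APIC destination mode" concrete, here is a rough sketch of the values that whoever programs the MSI registers has to compose. It assumes the x86 MSI message format (address 0xFEExxxxx with the destination APIC ID in bits 19:12 and the destination mode in bit 2; vector in the low byte of the data word) and the standard PCI MSI capability layout. The function and struct names are hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* x86 MSI message address fields. */
#define MSI_ADDR_BASE           0xFEE00000u
#define MSI_ADDR_DEST_ID_SHIFT  12
#define MSI_ADDR_DEST_LOGICAL   (1u << 2)   /* destination mode bit */

/* PCI MSI capability, Message Control register bits. */
#define PCI_MSI_FLAGS_64BIT     0x0080      /* 64-bit address capable */
#define PCI_MSI_FLAGS_MASKBIT   0x0100      /* per-vector masking capable */

/* Hypothetical container for the two values written to the device. */
struct msi_msg {
    uint32_t address_lo;    /* Message Address register */
    uint32_t data;          /* Message Data register */
};

/* Compose a fixed-delivery, edge-triggered MSI message. */
struct msi_msg msi_compose(uint8_t dest_apic_id, uint8_t vector,
                           int logical_dest)
{
    struct msi_msg m;

    assert(vector >= 0x10);     /* vectors 0x00-0x0f are not valid here */
    m.address_lo = MSI_ADDR_BASE |
                   ((uint32_t)dest_apic_id << MSI_ADDR_DEST_ID_SHIFT);
    if (logical_dest)
        m.address_lo |= MSI_ADDR_DEST_LOGICAL;
    /* Delivery mode (bits 10:8) and trigger mode (bit 15) are left at 0:
     * fixed delivery, edge triggered. */
    m.data = vector;
    return m;
}

/* Offset of the Mask Bits register from the start of the MSI capability,
 * or -1 if the function does not support per-vector masking. */
int msi_mask_reg_offset(uint16_t msg_ctrl)
{
    if (!(msg_ctrl & PCI_MSI_FLAGS_MASKBIT))
        return -1;
    return (msg_ctrl & PCI_MSI_FLAGS_64BIT) ? 0x14 : 0x10;
}
```

Changing affinity then amounts to recomposing the message with a new destination ID and rewriting the address register, which is why keeping this inside Xen avoids a round trip through dom0. Note that per-vector masking is optional for plain MSI, so any mitigation scheme based on the mask bits needs a fallback for devices where msi_mask_reg_offset() returns -1.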
Hi, Keir,

I am working on that and incorporating your comments. I will post the updated patch once I have finished.
Thanks for your help!

Best Regards
Haitao Shan
I tried this patch and MSI seems to work fine with a driver in DOM0. It didn't work with MSI-X though, because pci_vector_resources returned 8 and I have 10 MSI capable devices in the machine. I've only got 6 Phys-irq interrupts listed in /proc/interrupts, so I'd expect there to be more vectors free. I applied the debugging patch below and got the following output.

diff -r 9bb373519b68 arch/i386/pci/irq-xen.c
--- a/arch/i386/pci/irq-xen.c	Tue Apr 01 14:15:23 2008 +0100
+++ b/arch/i386/pci/irq-xen.c	Wed Apr 02 13:19:05 2008 +0100
@@ -1192,6 +1192,7 @@ int pci_vector_resources(int last, int n
 	int offset = (last % 8);
 
 	while (next < FIRST_SYSTEM_VECTOR) {
+		printk("next=%d count=%d\n", next, count);
 		next += 8;
 #ifdef CONFIG_X86_64
 		if (next == IA32_SYSCALL_VECTOR)

[pci_vector_resources(176, 1) called]
next=176 count=1
next=184 count=2
next=192 count=3
next=200 count=4
next=208 count=5
next=216 count=6
next=224 count=7
next=232 count=8
[pci_vector_resources returned 8]
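The trace above is consistent with a loop that steps through the vector space 8 at a time from `last` and counts one free vector per step. The user-space model below is a guess at that behaviour, not the kernel source: only the lines visible in the diff are known, while the rest of the loop body and the three constant values are assumptions made for illustration, and model_vector_resources() is a hypothetical name.

```c
#include <assert.h>

/* Assumed values; the real ones come from the kernel headers. */
#define FIRST_DEVICE_VECTOR  0x31
#define FIRST_SYSTEM_VECTOR  0xef
#define SYSCALL_VECTOR       0x80

/* Model of the counting loop: estimates how many vectors remain, given the
 * last vector handed out and the number of vectors already released. */
int model_vector_resources(int last, int nr_released)
{
    int count = nr_released;
    int next = last;
    int offset = last % 8;

    assert(last >= 0 && nr_released >= 0);

    while (next < FIRST_SYSTEM_VECTOR) {
        next += 8;                      /* same stride as the allocator */
        if (next == SYSCALL_VECTOR)     /* reserved, never handed out */
            continue;
        count++;
        if (next >= FIRST_SYSTEM_VECTOR) {
            /* Assumed wrap-around: retry from the bottom at the next
             * offset until all 8 offsets have been used. */
            if (offset % 8) {
                next = FIRST_DEVICE_VECTOR + offset;
                offset++;
                continue;
            }
            count--;                    /* last step overshot the range */
        }
    }
    return count;
}
```

Under these assumptions, model_vector_resources(176, 1) returns 8, the same value as in the trace. The estimate depends entirely on `last` following the kernel's own allocation pattern, so it goes wrong once vectors are handed out by a different allocator.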
Hi, Neil

Thanks for trying the patches. The problem is caused by an incompatibility between Xen and the Dom0 kernel.
pci_vector_resources() calculates the number of available vectors. Xen assigns vectors starting at vector 0x20 with offset = 0, which confuses the code in pci_vector_resources().
Maybe we should replace the function with a hypercall that returns the number of available vectors.
What do you think about it, Keir?

Thanks!
Shan Haitao

-----Original Message-----
From: Neil Turton [mailto:nturton@solarflare.com]
Sent: 2 April 2008 22:56
To: Shan, Haitao
Cc: Keir Fraser; xen-devel; Tian, Kevin; Jiang, Yunhong; Li, Xin B
Subject: Re: [Xen-devel] [PATCH 0/5] Add MSI support to XEN

I tried this patch and MSI seems to work fine with a driver in DOM0. It didn't work with MSI-X though, because pci_vector_resources returned 8 and I have 10 MSI capable devices in the machine. [...]
On 3/4/08 13:11, "Shan, Haitao" <haitao.shan@intel.com> wrote:

> Thanks for trying the patches. The problem is caused by an incompatibility
> between Xen and the Dom0 kernel.
> pci_vector_resources() calculates the number of available vectors. Xen
> assigns vectors starting at vector 0x20 with offset = 0, which confuses
> the code in pci_vector_resources().
> Maybe we should replace the function with a hypercall that returns the
> number of available vectors.
> What do you think about it, Keir?

I may not understand the issue here, but in principle I do not particularly want to have anything outside Xen handling real IRQ vectors. In which case this confusion should not exist in the first place? I know the last round of patches did have dom0 poking the MSI registers, and hence it knew about real vectors, but that's being changed in the next round, right?

 -- Keir