Kay, Allen M
2007-May-30 19:05 UTC
[Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
The following 5 patches are a re-submission of the vt-d patch. This set of patches has been tested against cs# 15080 and is now much more mature, and tested against more environments, than the original patch. Specifically, we have successfully tested the patches in the following environments:

- 32/64-bit Linux HVM guest
- 32-bit Windows XP/Vista (64-bit should work but was not tested)
- 32PAE/64-bit hypervisor
- APIC and PIC interrupt mechanisms
- PCIe E1000 and PCI E100 NICs

Allen

----------------------

1) patch description:

vtd1.patch:
  - vt-d specific code
  - low risk changes in common code
vtd2.patch:
  - io port handling
vtd3.patch:
  - interrupt handling
vtd4.patch:
  - mmio handling
vtd5.patch:
  - turn on VT-d processing in ACPI table

2) how to run

- Use same syntax as PV driver domain method to "hide" and assign PCI device
  - use pciback.hide=(02:00.0) to "hide" device from dom0
  - use pci = [ '02:00.00' ] in /etc/xen/hvm.conf to assign device to HVM domain
  - set acpi and apic to 0 in hvm.conf as current patch only works with PIC
  - grub.conf: use "ioapic_ack=old" for /boot/xen.gz
    (io_apic.c contains code for avoiding global interrupt problem)

4) description of hvm PCI device assignment design:

- pci config virtualization
  - Control panel and qemu changed to pass assigned PCI devices to qemu.
  - A new file ioemu/hw/dpci.c reads the assigned device's PCI conf, constructs a new virtual device and attaches it to the guest PCI bus.
  - PCI read/write functions are similar to those of other virtual devices, except that the write function intercepts writes to the COMMAND register and does the actual hardware writes.
- interrupt virtualization
  - Currently only works for ACPI/APIC mode
  - dpci.c makes a hypercall to tell xen the device/intx on vPCI
  - In do_IRQ_guest(), when Xen determines an interrupt belongs to a device owned by an HVM domain, it injects the guest IRQ into the domain
  - Revert back to ioapic_ack=old to allow for IRQ sharing amongst guests.
  - Implemented new method for mask/unmask in io_apic.c to avoid spurious interrupt issue.
- mmio
  - When guest BIOS (i.e. hvmloader) or OS changes a PCI BAR, the PCI config write function in qemu makes a hypercall to instruct Xen to construct the p2m mapping.
  - The shadow page table fault handler has been modified to allow memory above max_pages to be mapped.
- ioport
  - Xen intercepts guest io port accesses
  - translates guest io port to machine io port
  - does machine port access on behalf of guest

5) new hypercalls

  int xc_assign_device(int xc_handle, uint32_t domain_id, uint32_t machine_bdf);
  int xc_domain_ioport_mapping(int xc_handle, uint32_t domid, uint32_t first_gport, uint32_t first_mport, uint32_t nr_ports, uint32_t add_mapping);
  int xc_irq_mapping(int xc_handle, uint32_t domain_id, uint32_t method, uint32_t machine_irq, uint32_t device, uint32_t intx, uint32_t add_mapping);
  int xc_domain_memory_mapping(int xc_handle, uint32_t domid, unsigned long first_gfn, unsigned long first_mfn, unsigned long nr_mfns, uint32_t add_mapping);

6) interface to common code:

  int iommu_setup(void);
  int iommu_domain_init(struct domain *d);
  int assign_device(struct domain *d, u8 bus, u8 devfn);
  int release_devices(struct vcpu *v);
  int hvm_do_IRQ_dpci(struct domain *d, unsigned int irq);
  int dpci_ioport_intercept(ioreq_t *p, int type);
  int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn);
  int iommu_unmap_page(struct domain *d, unsigned long gfn);
  void iommu_flush(struct domain *d, unsigned long gfn, u64 *p2m_entry);
  void iommu_set_pgd(struct domain *d);

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
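[Editor's sketch] The ioport path in the design notes (intercept, translate guest port to machine port, access on behalf of the guest) can be pictured as a small range-table lookup. This is only a minimal sketch under assumptions, not the code in vtd2.patch: the names ioport_map and gport_to_mport are hypothetical, and the fields merely mirror the first_gport/first_mport/nr_ports arguments of xc_domain_ioport_mapping().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping record. The fields mirror the arguments of
 * xc_domain_ioport_mapping(); the layout itself is an assumption. */
struct ioport_map {
    uint32_t first_gport;  /* first guest port in the range   */
    uint32_t first_mport;  /* first machine port in the range */
    uint32_t nr_ports;     /* number of consecutive ports     */
};

/* Translate a guest port to a machine port.
 * Returns 0 and stores the result in *mport on success, or -1 if no
 * range covers gport (the access is then not a passed-through one). */
int gport_to_mport(const struct ioport_map *maps, size_t n,
                   uint32_t gport, uint32_t *mport)
{
    for (size_t i = 0; i < n; i++) {
        const struct ioport_map *m = &maps[i];

        /* The subtraction is only evaluated once gport >= first_gport. */
        if (gport >= m->first_gport &&
            gport - m->first_gport < m->nr_ports) {
            *mport = m->first_mport + (gport - m->first_gport);
            return 0;
        }
    }
    return -1;
}
```

A real intercept would then perform the in/out on the machine port and hand the result back to the guest; that part is omitted here.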
Keir Fraser
2007-May-30 19:55 UTC
Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
On 30/5/07 20:05, "Kay, Allen M" <allen.m.kay@intel.com> wrote:

> - grub.conf: use "ioapic_ack=old" for /boot/xen.gz
>   (io_apic.c contains code for avoiding global interrupt problem)

How does this new scheme work? Can it supplant the ioapic_ack=new method? Clearly requiring the use of a hacky command-line option to make use of a new core feature is not very nice, to say the least.

It looks like the interrupt gets EOIed by writing to the IOSAPIC EOI register. I thought that x86 IOAPICs don't have that register?

> - Revert back to ioapic_ack=old to allow for IRQ sharing amongst
>   guests.

I would expect it to work (by design at least) even with ioapic_ack=new.

Actually I also know there are some other patches coming down the pipeline to do pci passthrough to HVM guests without need for hardware support (of course it is not so general; in particular it will only work for one special hvm guest). However, they deal with this interrupt issue quite cunningly, by inverting the interrupt polarity so that they get interrupts on both +ve and -ve edges of the INTx line. This allows the virtual interrupt wire to be 'wiggled' precisely according to the behaviour of the physical interrupt wire. Which is rather nice, although of course it does double the interrupt rate, which is not so great but perhaps acceptable for the kind of low interrupt rate devices that most people would want to hand off to a hvm guest.

 -- Keir
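[Editor's sketch] The polarity-inversion idea described above can be modelled in a few lines. This is a toy model under assumptions, not code from any of the patches: the struct intx_line and both functions are hypothetical names, and a real implementation would program the IOAPIC redirection entry rather than flip a bool.

```c
#include <stdbool.h>

/* Hypothetical model of one level-triggered INTx line. */
struct intx_line {
    bool phys_high;     /* current level of the physical wire         */
    bool active_high;   /* polarity currently programmed in the RTE   */
    bool virt_high;     /* level of the guest's virtual wire          */
    unsigned int irqs;  /* physical interrupts taken so far           */
};

/* A level-triggered entry fires whenever the wire sits at its
 * programmed active level. */
static bool irq_pending(const struct intx_line *l)
{
    return l->phys_high == l->active_high;
}

/* Service a pending interrupt: copy the physical level onto the
 * virtual wire, then invert the polarity so that the next interrupt
 * marks the opposite transition. Returns true if one was handled. */
bool service_line(struct intx_line *l)
{
    if (!irq_pending(l))
        return false;

    l->virt_high = l->phys_high;
    l->active_high = !l->active_high;
    l->irqs++;
    return true;
}
```

In this model one assert/deassert pulse of the device costs two physical interrupts, which is the doubling of the interrupt rate mentioned above.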
Kay, Allen M
2007-May-30 22:02 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
> How does this new scheme work? Can it supplant the
> ioapic_ack=new method?
> Clearly requiring the use of a hacky command-line option to
> make use of a new core feature is not very nice to say the least.

Basically, the calls to mask_IO_APIC_irq()/unmask_IO_APIC_irq() in arch/x86/io_apic.c were replaced with write_fake_IO_APIC_vector()/restore_real_IO_APIC_vector(), where the fake vector is an unused vector.

Since the global interrupt from the chipset only occurs during masked interrupts, this avoids the chipset bug that caused you to switch to ioapic_ack=new last year. I believe it can supplant the ioapic_ack=new method.

> I would expect it to work (by design at least) even with
> ioapic_ack=new.

We have based our enabling effort on ioapic_ack=old so far. In theory, it should work with ioapic_ack=new. I have tried ioapic_ack=new but it is not working right now. We will be looking into this.

Allen
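[Editor's sketch] One way to read the "fake vector" replacement for mask/unmask is shown below. It is only an illustration of the description above, not the patch: the struct layout, the FAKE_VECTOR value and the function names fake_mask/fake_unmask are all assumptions, and the real write_fake_IO_APIC_vector()/restore_real_IO_APIC_vector() signatures are not given in this thread.

```c
#include <stdint.h>

#define FAKE_VECTOR 0xEFu  /* assumption: some vector with no handler */

/* Simplified redirection table entry; real entries have more fields. */
struct rte {
    uint8_t vector;  /* vector delivered to the CPU              */
    uint8_t mask;    /* 1 = masked; never set in this scheme     */
};

struct irq_state {
    struct rte rte;
    uint8_t real_vector;  /* vector the irq was originally given */
};

/* Stand-in for masking: leave the mask bit clear and redirect the
 * entry to an unused vector instead. */
void fake_mask(struct irq_state *s)
{
    s->rte.vector = FAKE_VECTOR;
}

/* Stand-in for unmasking: put the real vector back. */
void fake_unmask(struct irq_state *s)
{
    s->rte.vector = s->real_vector;
}

/* Would an interrupt on this entry reach the real handler? */
int delivers_to_handler(const struct irq_state *s)
{
    return !s->rte.mask && s->rte.vector == s->real_vector;
}
```

The point of the design, as described, is that the entry is never in the masked state, so the chipset behaviour tied to masked interrupts is not triggered.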
Guy Zana
2007-May-31 06:05 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
> Sent: Wednesday, May 30, 2007 10:56 PM
> To: Kay, Allen M; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>
> Actually I also know there are some other patches coming down
> the pipeline to do pci passthrough to HVM guests without need
> for hardware support (of course it is not so general; in
> particular it will only work for one special hvm guest).
> However, they deal with this interrupt issue quite cunningly,
> by inverting the interrupt polarity so that they get
> interrupts on both +ve and -ve edges of the INTx line. This
> allows the virtual interrupt wire to be 'wiggled' precisely
> according to the behaviour of the physical interrupt wire.
> Which is rather nice, although of course it does double the
> interrupt rate, which is not so great but perhaps acceptable
> for the kind of low interrupt rate devices that most people
> would want to hand off to a hvm guest.

Just FYI.

Neocleus' pass-through patches perform the "change polarity" trick. In changing the polarity, our motivation was to reflect the allocated device's assertion state to the HVM AS IS.

Regarding performance: using a USB 2.0 storage device (working with DMA), we compared a huge file copy in pass-through against the same copy running natively (on the same OS). The time differences were negligible, so I'm not sure yet about the impact of doubling the number of interrupts. The advantage of changing the polarity is its simplicity.

Anyway, we'll release some patches during the day so you can give your comments.

Thanks,
Guy.
Keir Fraser
2007-May-31 06:49 UTC
Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
On 30/5/07 23:02, "Kay, Allen M" <allen.m.kay@intel.com> wrote:

>> How does this new scheme work? Can it supplant the
>> ioapic_ack=new method?
>> Clearly requiring the use of a hacky command-line option to
>> make use of a new core feature is not very nice to say the least.
>
> Basically, the calls to mask_IO_APIC_irq()/unMask_IO_APIC_irq()
> in arch/x86/io_apic.c were replaced with
> write_fake_IO_APIC_vector()/restore_real_IO_APIC_vector
> - where the fake vector is an unused vector.
>
> Since the global interrupt from chipset only occurs during
> masked interrupts, this avoids that chipset bug that cause
> you to switch to ioapic_ack=new last year. I believe it can
> supplant ioapic_ack=new method.

I'm not against removing the 'new' ioapic-ack method entirely if this is better. I just don't fully understand this replacement method yet. I'll need a walkthrough of the new mask/unmask replacements, most likely!

 -- Keir
Tian, Kevin
2007-May-31 13:20 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
It is indeed a 'cunning' approach, which however seems to apply only to an interrupt instance with both a rising edge and a falling edge, like:

       _____________      ______________
   ___|      1      |____|      2       |____________

I'm just curious whether the following cases can be addressed, where one edge is missing.

[Case one]
Is it possible for one device to keep the line 'high' for two successive instances, like:

       _________________________________________
   ___|      1          |      2  ...

When the driver requests the device to clear its interrupt assertion at the end of handling the 1st, it's possible that the device keeps the assertion if the interrupt condition still matches in the 2nd. In that case, no further interrupt will happen when EOI is written to the IOAPIC, due to the polarity inversion.

[Case two]
Similar to case one, two PCI devices share one interrupt pin:

PCI-A
       _______
   ___|   1   |_______________________
PCI-B
                  _____________________________
   ______________|   2
PIN
       _______     _____________________________
   ___|       |___|         ^EOI

If:
- Guest finishes invoking all irq actions hooked to that pin before PCI-B asserts.
- EOI to IOAPIC happens after PCI-B asserts.

The net effect is that the line status stays 'high' after EOI, and the polarity inversion means no interrupt is raised again.

Maybe I didn't get the exact detail of your so-called 'change polarity' idea; if so, I'd appreciate your elaboration here. :-)

Thanks,
Kevin
Keir Fraser
2007-May-31 13:37 UTC
Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
On 31/5/07 14:20, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> When driver requests device to clear interrupt assertion at end
> of handling 1st, it's possible that device keeps assertion if interrupt
> condition still matches in 2nd. In that case, no interrupt will happen
> any more when EOI is written to IOAPIC due to polarity inversion.

This is absolutely fine. The virtual wire status will remain HIGH in this case, which is correct since the 'runt' LOW pulse on the physical wire can be ignored. What we are looking for is to track the physical wire status in the long run; pulses one way or the other do not matter.

Remember we are talking about *level-triggered* interrupt lines, not edge-triggered. This polarity-change trick would not be used, and would not be necessary, for edge-triggered interrupts. We would EOI the physical APIC early, before running the ISR, just as usual for edge-triggered interrupts.

Because we are talking about level-triggered interrupts, if the line continues to be HIGH after the ISR runs then of course we'll just deliver another interrupt straight to the relevant VCPU. That's how level-triggered interrupts work. :-)

 -- Keir
Tian, Kevin
2007-May-31 13:59 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: May 31, 2007 21:37
>
>On 31/5/07 14:20, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> When driver requests device to clear interrupt assertion at end
>> of handling 1st, it's possible that device keeps assertion if interrupt
>> condition still matches in 2nd. In that case, no interrupt will happen
>> any more when EOI is written to IOAPIC due to polarity inversion.
>
>This is absolutely fine. The virtual wire status will remain HIGH in this
>case, which is correct since the 'runt' LOW pulse on the physical wire can
>be ignored. What we are looking for is to track the physical wire status in
>the long run; pulses one way or the other do not matter.
>
>Remember we are talking about *level-triggered* interrupt lines, not
>edge-triggered. This polarity-change trick would not be used, and would not
>be necessary, for edge-triggered interrupts. We would EOI the physical APIC
>early, before running the ISR, just as usual for edge-triggered interrupts.
>
>Because we are talking about level-triggered interrupts, if the line
>continues to be HIGH after the ISR runs then of course we'll just deliver
>another interrupt straight to the relevant VCPU. That's how level-triggered
>interrupts work. :-)
>
> -- Keir

Ha, you're exactly right. I forgot the virtual wire status, which will result in a new virtual interrupt anyway if no new hardware interrupt occurs as a de-assert signal.

But... still one question: it seems that current Xen doesn't allow multiple end() methods to be called for one physical interrupt instance, while a new physical interrupt will happen only as a result of end() (EOI for ioapic_new, and unmask RTE for ioapic_old). See the case below:

- 1st interrupt is injected and polarity is inverted
- HVM finishes handling and writes EOI to vIOAPIC
- 1st is deasserted
- 2nd instance happens
- that EOI is converted into an invocation of the end() method
- either EOI or unmask RTE is issued to the physical IOAPIC
- No physical interrupt is triggered due to the inverted polarity
- a new virtual interrupt is injected at next resume to HVM
- HVM finishes handling and writes EOI to vIOAPIC
- 2nd instance is deasserted
- EOI to vIOAPIC leads to end() again

Then it's up to Xen to decide whether to allow one more end(), isn't it? I think this part may need some change for this 'change polarity' approach, like a check on pirq_mask. :-)

BTW, how about the alternative of taking the guest EOI to vIOAPIC as the deassertion hint for the assigned device?

Thanks,
Kevin
Guy Zana
2007-May-31 14:08 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: Thursday, May 31, 2007 4:21 PM
> To: Guy Zana; Keir Fraser; Kay, Allen M; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>
> It does be a 'cunning' approach, which however seems to only
> apply to interrupt instance with both rising edge and falling
> edge like:
>
>        _____________      ______________
>    ___|      1      |____|      2       |____________
>
> Just be curious whether following cases can be addressed,
> when one edge is missing.
>
> [Case one]
> Is it possible for one device to keep line 'high' for
> two successive instances, like:
>        _________________________________________
>    ___|      1          |      2  ...
>
> When driver requests device to clear interrupt assertion
> at end of handling 1st, it's possible that device keeps
> assertion if interrupt condition still matches in 2nd. In
> that case, no interrupt will happen any more when EOI is
> written to IOAPIC due to polarity inversion.

Since the HVM's assertion state is kept "asserted" until the _external_ line falls, the HVM itself will keep getting interrupts on VMENTRYs until the _external_ line is deasserted (by the external device). This reflects the real behavior of the external line. If this interrupt behavior were treated differently, you'd get redundant interrupts :)

> [Case two]
> Similar to case one, two PCI devices share one interrupt pin:
> PCI-A
>        _______
>    ___|   1   |_______________________
> PCI-B
>                   _____________________________
>    ______________|   2
> PIN
>        _______     _____________________________
>    ___|       |___|         ^EOI
>
> If:
> - Guest finishes invocation to all irq actions hooked
>   to that pin before PCI-B does assertion.
> - EOI to IOAPIC happens after PCI-B does assertion
>
> The net effect is that line status keeps 'high' after EOI and
> polarity inverse makes no interrupt again.

If both devices work in pass-through, it should work, since it is the ORed line that is reflected. We can add functionality for sharing such devices between dom0 and a guest, by changing the way dom0 handles level-triggered interrupts.

Thanks,
Guy.

> Maybe I didn't get the exact detail of your named 'change polarity'
> idea, and if yes, appreciate your elaboration here. :-)
>
> Thanks,
> Kevin
Keir Fraser
2007-May-31 15:03 UTC
Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
On 31/5/07 14:59, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> But... still one question, seems that current Xen doesn't allow
> multiple end() methods called for one physical interrupt instance,
> while a new physical interrupt will happen only as result of end()
> (EOI for ioapic_new, and unmask RTE for ioapic_old). See below
> case:

Yeah, well I haven't looked at how the Neocleus patches actually deal with this, but I expect they might steal hvm-bound physical irqs completely, hook them off early in do_IRQ() and have bespoke code to deal with them. Possibly it could be integrated with existing ->ack and ->end methods, actually. ->end() would be no-op while ->ack() would switch polarity in the IOAPIC and then EOI the LAPIC. Then the handler function called by do_IRQ() would toggle virtual HVM wires.

Anyhow, integrating with existing Xen IRQ handling subsystem clearly isn't a rocket-science problem. :-)

 -- Keir
Tian, Kevin
2007-May-31 15:10 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: May 31, 2007 23:03
>
>On 31/5/07 14:59, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> But... still one question, seems that current Xen doesn't allow
>> multiple end() methods called for one physical interrupt instance,
>> while a new physical interrupt will happen only as result of end()
>> (EOI for ioapic_new, and unmask RTE for ioapic_old). See below
>> case:
>
>Yeah, well I haven't looked at how the Neocleus patches actually deal with
>this, but I expect they might steal hvm-bound physical irqs completely, hook
>them off early in do_IRQ() and have bespoke code to deal with them. Possibly
>it could be integrated with existing ->ack and ->end methods, actually.
>->end() would be no-op while ->ack() would switch polarity in the IOAPIC and
>then EOI the LAPIC. Then the handler function called by do_IRQ() would
>toggle virtual HVM wires.
>
>Anyhow, integrating with existing Xen IRQ handling subsystem clearly isn't a
>rocket-science problem. :-)

Sure. :-) But I'm still thinking about the effect of using the virtual EOI as the de-assertion signal, which doesn't require changing the polarity of the line frequently. We can just add a flag per gsi to indicate whether a physical irq has been injected on this line. When intercepting the HVM EOI, invoke deassert and also jump to ->end() if the flag is on. Does that basically work?

Thanks,
Kevin
Keir Fraser
2007-May-31 15:14 UTC
Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
On 31/5/07 16:10, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Sure. :-) But I'm still thinking the effect to use virtual EOI as de-assertion
> signal, which doesn't require to change polarity in the line frequently. We
> can just add a flag per gsi to indicate whether a physical irq is injected
> on this line. When intercepting HVM EOI, invoke deassert and also jump
> to ->end() if flag is on. Does it work basically?

I didn't realise you were suggesting another mechanism. It's not clear to me how it works from the very brief description you give above. Could you provide an example or two for how your method would work (e.g., one which avoids switching polarity, and another where you do end up switching polarity)?

 -- Keir
Tian, Kevin
2007-May-31 15:30 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: May 31, 2007 23:15
>
>On 31/5/07 16:10, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> Sure. :-) But I'm still thinking the effect to use virtual EOI as de-assertion
>> signal, which doesn't require to change polarity in the line frequently. We
>> can just add a flag per gsi to indicate whether a physical irq is injected
>> on this line. When intercepting HVM EOI, invoke deassert and also jump
>> to ->end() if flag is on. Does it work basically?
>
>I didn't realise you were suggesting another mechanism. It's not clear to me
>how it works from the very brief description you give above. Could you
>provide an example or two for how your method would work (e.g., one which
>avoids switching polarity, and another where you do end up switching
>polarity)?
>
> -- Keir

OK, my rough thought is as below:

The reason to change polarity, IMO, is to capture the de-assert edge on the physical wire and then reflect the de-assertion onto the virtual wire. That allows the statistics in gsi_assert_count to be updated correctly when the line is shared with virtual devices in Qemu.

My proposal is to take the virtual EOI as the de-assertion hint, without any change to physical RTE properties like polarity. For example, the flow could be as follows, keeping, say, a hw_assert_status array for all virtual GSIs (taking vioapic as the example):

- physical interrupt happens, and ->ack()
- assert into virtual wire with assert count incremented, and also set hw_assert_status[gsi]
- HVM handles the interrupt, and writes EOI to vlapic, and then vioapic
- vioapic_update_EOI then:
  - checks whether hw_assert_status[gsi] is set, and if yes:
    - invokes __hvm_pci_intx_deassert to decrement the count
    - ->end()
  - checks whether to inject a new instance based on the gsi count (original logic)

->end() may trigger another physical interrupt if the physical wire stays 'high' for any reason.

Of course, some code changes may be required to allow the hvm_irq logic and the vioapic/vpic logic to call each other, e.g. around locking.

Thanks,
Kevin
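[Editor's sketch] The flow proposed above (virtual EOI as the de-assertion hint) can be modelled per GSI roughly as follows. This is a minimal sketch under assumptions and not Xen code: struct vgsi, phys_irq_arrived() and guest_eoi() are hypothetical names, hw_assert_status is reduced from an array to one flag, the ->end() call is represented only by a counter, and the early-return guard is an addition of this sketch.

```c
#include <stdbool.h>

/* Hypothetical per-GSI state. */
struct vgsi {
    unsigned int assert_count;  /* sources currently asserting the GSI  */
    bool hw_assert_status;      /* assigned device's irq is in flight   */
    unsigned int end_calls;     /* how often ->end() would have run     */
};

/* Physical interrupt taken (after ->ack()): assert the virtual wire. */
void phys_irq_arrived(struct vgsi *g)
{
    if (g->hw_assert_status)
        return;  /* defensive guard (sketch only): already in flight */

    g->hw_assert_status = true;
    g->assert_count++;
}

/* Guest wrote EOI for this GSI. Returns true if the GSI is still
 * asserted by someone and must be injected again. */
bool guest_eoi(struct vgsi *g)
{
    if (g->hw_assert_status) {
        g->hw_assert_status = false;
        g->assert_count--;  /* tentative de-assert of the hw source  */
        g->end_calls++;     /* stands in for ->end() on the phys irq */
    }
    return g->assert_count > 0;
}
```

If the physical wire is still high after ->end(), a new physical interrupt arrives and phys_irq_arrived() re-asserts the virtual wire, so no polarity change is needed in this model.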
Keir Fraser
2007-May-31 15:40 UTC
Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
On 31/5/07 16:30, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> OK, my rough thought is as below:
>
> The reason to change polarity, IMO, is to capture the de-assert
> edge in the physical wire and then reflect de-assertion into the virtual
> wire. Then allow the statistics on gsi_assert_count to be updated
> correctly, when shared with virtual devices in Qemu.
>
> My proposal is to take virtual EOI as the de-assertion hint, without
> any change on physical RTE property like polarity. For example, the
> flow could be following by keeping a saying hw_assert_status array for
> all virtual GSIs: (take vioapic for example)

Ah, okay, so no polarity switching at all. Basically use VIOAPIC EOI as a hint to tentatively drop the virtual wire to LOW, and only then ->end the physical interrupt. I guess this is pretty much what you already implement in your VT-d patches?

It'd be interesting to know how these two approaches compare performance-wise. I suppose yours should win, really, due to fewer physical interrupts.

If this is how your current VT-d patches handle interrupts then I don't see why ioapic_ack=new is not working for you. That's a bit weird. I guess I could read the patches some more. ;-)

 -- Keir
Tian, Kevin
2007-May-31 15:51 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: May 31, 2007 23:40
>
>On 31/5/07 16:30, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> OK, my rough thought is as below:
>>
>> The reason to change polarity, IMO, is to capture the de-assert
>> edge in the physical wire and then reflect de-assertion into the virtual
>> wire. Then allow the statistics on gsi_assert_count to be updated
>> correctly, when shared with virtual devices in Qemu.
>>
>> My proposal is to take virtual EOI as the de-assertion hint, without
>> any change on physical RTE property like polarity. For example, the
>> flow could be following by keeping a saying hw_assert_status array for
>> all virtual GSIs: (take vioapic for example)
>
>Ah, okay, so no polarity switching at all. Basically use VIOAPIC EOI as a
>hint to tentatively drop the virtual wire to LOW, and only then ->end the
>physical interrupt. I guess this is pretty much what you already implement
>in your VT-d patches?
>
>It'd be interesting to know how these two approaches compare
>performance-wise. I suppose yours should win, really, due to fewer physical
>interrupts.
>
>If this is how your current VT-d patches handle interrupts then I don't see
>why ioapic_ack=new is not working for you. That's a bit weird. I guess I
>could read the patches some more. ;-)
>
> -- Keir

Oh, I'm not the author of the VT-d patches; the credit for those goes to Allen and Xiaohui. :-) I just formed this concrete idea during the discussion with you, and will talk to them for confirmation tomorrow. I guess the "ioapic_ack=new" problem is just some simple bug, since assigning a single NIC shouldn't produce the shared-interrupt case yet.

Thanks,
Kevin
Keir Fraser
2007-May-31 15:52 UTC
Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
On 31/5/07 16:40, "Keir Fraser" <keir@xensource.com> wrote:

> It'd be interesting to know how these two approaches compare
> performance-wise. I suppose yours should win, really, due to fewer physical
> interrupts.

One thing is that the polarity-switching approach is a slightly better fit with the HVM interrupt logic. Currently interrupt sources and VIOAPIC are not tightly bound together; they only interact by one waggling the virtual intx wires and the other sampling that wire periodically (or synchronously on +ve edges). Your approach requires a 'back channel' from the VIOAPIC code back to physical interrupt code to call ->end(). It's kind of ugly. On the other hand I suspect the polarity-switching code adds more stuff to the physical interrupt subsystem, and your approach can certainly be supported, probably by adding a bit more state (maybe just a single bit) per virtual intx wire. Really we need to look at and measure each implementation...

 -- Keir
Tian, Kevin
2007-May-31 15:59 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: May 31, 2007 23:52
>
>On 31/5/07 16:40, "Keir Fraser" <keir@xensource.com> wrote:
>
>> It'd be interesting to know how these two approaches compare
>> performance-wise. I suppose yours should win, really, due to fewer
>> physical interrupts.
>
>One thing is that the polarity-switching approach is a slightly better fit
>with the HVM interrupt logic. Currently interrupt sources and VIOAPIC are
>not tightly bound together; they only interact by one waggling the virtual
>intx wires and the other sampling that wire periodically (or synchronously
>on +ve edges). Your approach requires a 'back channel' from the VIOAPIC code
>back to physical interrupt code to call ->end(). It's kind of ugly. On the
>other hand I suspect the polarity-switching code adds more stuff to the
>physical interrupt subsystem, and your approach can certainly be supported,
>probably by adding a bit more state (maybe just a single bit) per virtual
>intx wire. Really we need to look at and measure each implementation...
>
> -- Keir

Agreed on supporting both with a common infrastructure. But I suspect that the polarity-switching code would also have to use such an ->end call in the virtual EOI path, since you need an unmask or EOI signal to the physical ioapic anyway. Otherwise, how would the 2nd interrupt be triggered at the falling edge?

Thanks,
Kevin
Tian, Kevin
2007-May-31 16:03 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>From: Tian, Kevin
>Sent: May 31, 2007 23:59
>
>Agree to support both with a common infrastructure. But I suspect the
>polarity-switching code would also need such an ->end() call in the
>virtual EOI path, since you need an unmask or EOI signal to the physical
>IOAPIC anyway. Otherwise, how would the second interrupt be triggered at
>the falling edge?

Oh, forgive my ignorance. That can be done in ->ack() by changing the polarity and then issuing the EOI, as you said before. :-)

Thanks,
Kevin
Guy Zana
2007-May-31 16:07 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: Thursday, May 31, 2007 7:04 PM
> To: Tian, Kevin; Keir Fraser; Guy Zana; Kay, Allen M; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>
> Oh, forgive my ignorance. That can be done in ->ack() by
> changing the polarity and then issuing the EOI, as you said before. :-)

We did it by replacing the end() callback of the hw_interrupt_type: the new handler performs a change_vector_polarity and then calls the original ->end() of the level-triggered hw_interrupt_type. All the other callbacks stay the same.

I hope to get the patch ready today, but it will be for c/s 15011.

Thanks,
Guy.
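The callback replacement described above can be illustrated with a small toy model. This is only a sketch in Python, not the Neocleus patch or Xen code: the class `LevelLine` and the functions `original_end`, `polarity_end` and `run` are hypothetical names, and the pin is reduced to a single level plus the level that currently triggers an interrupt.

```python
class LevelLine:
    """Toy model of a level-triggered interrupt pin with programmable polarity."""

    def __init__(self):
        self.level = 0         # physical INTx wire: 1 = asserted
        self.active_level = 1  # wire level that currently raises an interrupt

    def pending(self):
        return self.level == self.active_level

    def change_polarity(self):
        self.active_level ^= 1


def original_end(line, log):
    # Stand-in for the unmodified level-triggered ->end() callback.
    log.append("eoi")


def polarity_end(line, log):
    # Replacement ->end(): flip the polarity, then call the original ->end().
    line.change_polarity()
    original_end(line, log)


def run(wire_levels):
    """Feed successive physical wire levels; return the mirrored virtual wire."""
    line, log, virtual = LevelLine(), [], []
    vwire = 0
    for level in wire_levels:
        line.level = level
        if line.pending():            # fires on the currently active level
            vwire = line.level        # reflect the physical state to the guest
            polarity_end(line, log)   # from now on wait for the opposite level
        virtual.append(vwire)
    return virtual, log
```

In this model the virtual wire follows the physical wire exactly, at the cost of one interrupt per edge rather than one per assertion.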
Keir Fraser
2007-May-31 16:28 UTC
Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
On 31/5/07 16:59, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Agree to support both with a common infrastructure.

We don't need both! Let's look at which is cleanest and/or fastest and make a judgment call.

 -- Keir
Kay, Allen M
2007-May-31 17:43 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>Just FYI.
>
>Neocleus' pass-through patches perform the "change polarity" trick.
>With changing the polarity, our motivation was to reflect the
>allocated device's assertion state to the HVM AS IS.
>
>Regarding the performance, using a USB 2.0 storage device
>(working with DMA), a huge file copy was compared when working
>in pass-through, and when working in native (on the same OS),
>the time differences were negligible so I'm not sure yet about
>the impact of doubling the number of interrupts. The advantage
>of changing the polarity is the simplicity.
>
>Anyways, we'll release some patches during the day so you
>could give your comments.
>
>Thanks,
>Guy.

How do you handle DMA buffers without hardware support? Did you modify the device driver in the HVM guest to get the machine physical address?

It sounds like the conflict is limited to the vt-d interrupt patch (vtd3.patch), which is a relatively small part of the vt-d patch set.

Once your patch is released, I will take a look at it.

Allen
Guy Zana
2007-May-31 18:00 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
> -----Original Message-----
> From: Kay, Allen M [mailto:allen.m.kay@intel.com]
> Sent: Thursday, May 31, 2007 8:44 PM
> To: Guy Zana; Keir Fraser; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>
> How do you handle DMA buffers without hardware support? Did
> you modify the device driver in HVM to get the machine
> physical address?

We actually launch an HVM domain with its P2M table populated in a 1:1 fashion (where gpfn == mfn). We gave a lecture at the last Xen Summit; you can see it at:
http://www.xensource.com/files/xensummit_4/Neocleus_HVM_PCI_Pass-through_Zana.pdf

The 1:1 layout is still not as robust as we would like it to be; it doesn't support domain recreation, for instance.

> Once your patch is released, I will take a look at it.

Sure.

Thanks,
Guy.
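To see why a 1:1 P2M removes the need for address translation on DMA, here is a minimal sketch. It assumes 4 KiB pages and represents the P2M table as a plain dictionary; `build_identity_p2m` and `guest_to_machine` are hypothetical helper names, not Xen functions.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def build_identity_p2m(first_mfn, nr_pages):
    # gpfn -> mfn with gpfn == mfn for every page given to the guest
    return {mfn: mfn for mfn in range(first_mfn, first_mfn + nr_pages)}


def guest_to_machine(p2m, gpa):
    # Translate a guest physical address through the P2M table.
    gpfn = gpa >> PAGE_SHIFT
    offset = gpa & ((1 << PAGE_SHIFT) - 1)
    return (p2m[gpfn] << PAGE_SHIFT) | offset
```

With the identity table, the address an unmodified guest driver programs into a device is already a valid machine address, so the device can DMA to it directly. With an ordinary (non-identity) table the two differ, which is the case that needs either driver changes or an IOMMU.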
Kay, Allen M
2007-May-31 18:42 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>> Once your patch is released, I will take a look at it.
>
>Sure.
>
>Thanks,
>Guy.

If possible, can you package the interrupt part of the patch separately so that it can be easily tried out? Thanks.

Allen
Tian, Kevin
2007-Jun-01 02:57 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
Two more minor comments:

- For the polarity-switching approach, I'm now inclined to applaud it if it can help handle the 'boot interrupt' issue as fuse of ioapic_ack_new. That brings more value than simply assisting with virtual wire de-assertion.

- For ->end() in the VIOAPIC code, I don't think that's ugly, since it is similar to pirq_guest_eoi used in do_physdev_op, which also comes from the end() method of pirq_type in dom0.

Thanks,
Kevin
Kay, Allen M
2007-Jun-03 08:29 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
Based on my understanding of Neocleus' passthrough patch, it seems all devices sharing that interrupt will get double the number of interrupts. This means that if an interrupt is shared between a NIC used by an HVM guest and a SATA device used by dom0, the SATA driver in dom0 will also get twice the number of interrupts. Am I correct?

Allen

>-----Original Message-----
>From: Guy Zana [mailto:guy@neocleus.com]
>Sent: Wednesday, May 30, 2007 11:05 PM
>To: Keir Fraser; Kay, Allen M; xen-devel@lists.xensource.com
>Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xensource.com
>> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
>> Sent: Wednesday, May 30, 2007 10:56 PM
>>
>> Actually I also know there are some other patches coming down
>> the pipeline to do pci passthrough to HVM guests without need
>> for hardware support (of course it is not so general; in
>> particular it will only work for one special hvm guest).
>> However, they deal with this interrupt issue quite cunningly,
>> by inverting the interrupt polarity so that they get
>> interrupts on both +ve and -ve edges of the INTx line. This
>> allows the virtual interrupt wire to be 'wiggled' precisely
>> according to the behaviour of the physical interrupt wire.
>> Which is rather nice, although of course it does double the
>> interrupt rate, which is not so great but perhaps acceptable
>> for the kind of low interrupt rate devices that most people
>> would want to hand off to a hvm guest.
>
>Just FYI.
>
>Neocleus' pass-through patches perform the "change polarity" trick.
>With changing the polarity, our motivation was to reflect the
>allocated device's assertion state to the HVM AS IS.
Keir Fraser
2007-Jun-03 08:37 UTC
Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
On 3/6/07 09:29, "Kay, Allen M" <allen.m.kay@intel.com> wrote:

> Based on my understanding of Neocleus' passthrough patch, it seems
> all devices sharing that interrupt will get double the number of
> interrupts. This means that if an interrupt is shared between a NIC
> used by an HVM guest and a SATA device used by dom0, the SATA driver in
> dom0 will also get twice the number of interrupts. Am I correct?

No, it should be the case that the device ISRs are only invoked about as often as in your scheme. The extra interrupts are only visible to Xen, and tell Xen to deassert the virtual INTx wire for any hvm-attached devices which share that physical interrupt.

 -- Keir
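The distinction drawn here, between interrupts Xen takes and interrupts that reach a device ISR, can be made concrete with a short counting sketch. It is a toy model under the assumption that Xen sees one interrupt per edge of the wire and forwards only assertions; `count_events` is a hypothetical name.

```python
def count_events(wire_levels):
    """Return (interrupts seen by Xen, interrupts that invoke a device ISR)."""
    xen_interrupts = 0
    isr_invocations = 0
    prev = 0
    for level in wire_levels:
        if level != prev:
            xen_interrupts += 1       # polarity switching reports both edges
            if level == 1:
                isr_invocations += 1  # only an assertion is handed to an ISR
        prev = level
    return xen_interrupts, isr_invocations
```

For a wire that is asserted and released twice, Xen handles four interrupts while the ISR runs twice; the falling edges only serve to lower the virtual wire.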
Guy Zana
2007-Jun-03 09:59 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
Sort of... Our method might double the number of interrupts if both devices are connected to the same pin, but since all devices are OR-wired, you might even "save" *physical* interrupts from happening. I guess that we'll get a decisive answer only after performing some profiling.

Our method will not work "out of the box" if you're trying to use it when sharing a pin between dom0 and an HVM. Consider the following scenario:

HVM:
          _____________________
     ____|                     |___________________

Dom0:
                ____________________________________
     __________|

Phys Line:
          __________________________________________
     ____|

     A B C D

At point B you changed the polarity. At points C and D you won't be getting any interrupts because of the polarity change, and the device that is allocated to dom0 will keep its line asserted until the dom0 driver handles the interrupt, but it won't get a chance to do so. Moreover, the HVM vline will still be kept asserted.

We are currently modeling the problem. It seems that it's a complicated concept, regardless of changing polarity. For instance, an HVM with a Linux OS will die if 99,900 interrupts out of 100,000 are not handled.

From a logical POV, the aforementioned race is solved like this: we can hold a virtual assertion line for _dom0_ (which will be updated by the arrival of interrupts resulting from the polarity change) and concatenate the HVM's ISR chain with dom0's ISR chain. Dom0 must be the first to try to handle the interrupt (because of the 99,900:100,000 problem). I guess that pass-through shared interrupts should probably be handled as the last (default) function in dom0's ISR chain.

How do you plan to provide interrupt sharing with your method exactly? Please provide your thoughts.

Thanks,
Guy.

> -----Original Message-----
> From: Kay, Allen M [mailto:allen.m.kay@intel.com]
> Sent: Sunday, June 03, 2007 11:29 AM
> To: Guy Zana; Keir Fraser; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>
> Based on my understanding of Neocleus' passthrough patch,
> it seems all devices sharing that interrupt will get double
> the number of interrupts. This means that if an interrupt is
> shared between a NIC used by an HVM guest and a SATA
> device used by dom0, the SATA driver in dom0 will also get
> twice the number of interrupts. Am I correct?
>
> Allen
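The shared-pin race in the scenario above can be replayed with a small simulation. This is a sketch under simplifying assumptions: two devices OR-wired onto one pin, and Xen flipping the polarity every time it takes an interrupt. The function `simulate` and the device labels are hypothetical, and the order of events is my reading of the diagram rather than something the message states step by step.

```python
def simulate(events):
    """events: list of (device, level). Returns, per event, whether Xen got an interrupt."""
    dev = {"hvm": 0, "dom0": 0}
    active_level = 1  # wire level Xen is currently waiting for
    fired = []
    for name, level in events:
        dev[name] = level
        wire = dev["hvm"] | dev["dom0"]  # devices are OR-wired onto one pin
        if wire == active_level:
            active_level ^= 1            # polarity is flipped when the interrupt is taken
            fired.append(True)
        else:
            fired.append(False)
    return fired
```

Replaying "HVM device asserts, dom0 device asserts, HVM device deasserts" yields an interrupt only for the first event: the wire never leaves the asserted level, so Xen learns neither that the dom0 device needs service nor that the HVM device has released the line.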
Tian, Kevin
2007-Jun-03 13:29 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
The sequence of interrupt injection doesn't actually matter, since you can't wait and inject into the next domain only after the previous one in the chain fails to handle the interrupt; that would be very inefficient.

To me the unhandled irq issue (99,900 out of 100,000) is inevitable. Say an irq is shared between 2 HVM domains, with one assigned a high-rate PCI device like a NIC and the other assigned a low-rate PCI device like UHCI. It's likely to have over 100,000 interrupts from the NIC with the UHCI silent in a given period. Since, from Xen's point of view, there's no way to know which HVM guest owns a given interrupt instance, the same number of interrupts will be injected into both HVM domains.

We may force "noirqdebug", however that may not apply to all Linux versions or to other OSes on the HVM side.

Actually there are more tricky things to consider for irq sharing among domains. For example:

- A driver in one HVM domain may leave its device in the interrupt-asserted state while having the related virtual wire always masked (like an unclean driver unload).

- When the OS first masks the PIC entry and then unmasks the IOAPIC entry, an interrupt may occur in between (and the IOAPIC doesn't pend when masked), so the pending indicator in the PIC is missed.

Such rare cases, once they unfortunately occur, can block the other domain sharing the same irq. This heavily breaks the isolation between domains, and it is a common issue whatever approach we use to share irqs.

Maybe a better way is to use MSI instead, and then avoid the above irq sharing issue from the management tool side. For example, avoid sharing devices with the same irq among domains when MSI cannot be used...

Thanks,
Kevin
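The "99,900 out of 100,000" rule mentioned above can be sketched as a counter. The model below is loosely based on the unhandled-interrupt accounting in the Linux kernel and uses the thresholds quoted in the thread; the class `IrqDesc` and this Python `note_interrupt` are illustrative stand-ins, not the kernel's code, and the `noirqdebug` argument only mimics the effect of the boot option.

```python
class IrqDesc:
    def __init__(self):
        self.irq_count = 0
        self.irqs_unhandled = 0
        self.disabled = False


def note_interrupt(desc, handled, noirqdebug=False):
    """Account one interrupt; disable the line if almost none were handled."""
    if noirqdebug or desc.disabled:
        return
    if not handled:
        desc.irqs_unhandled += 1
    desc.irq_count += 1
    if desc.irq_count < 100000:
        return
    # One window of 100,000 interrupts is complete: evaluate and reset it.
    desc.irq_count = 0
    if desc.irqs_unhandled > 99900:
        desc.disabled = True
    desc.irqs_unhandled = 0
```

A guest whose device stays silent while a busy device on the same line generates 100,000 interrupts would report every one of them as unhandled and end up with the line disabled, which is the failure described in the message.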
Keir Fraser
2007-Jun-03 13:35 UTC
Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
On 3/6/07 14:29, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Maybe a better way is to use MSI instead, and then avoid the above irq
> sharing issue from the management tool side. For example, avoid
> sharing devices with the same irq among domains when MSI cannot be
> used...

Yes, any sharing of legacy INTx lines is inherently unrobust. There's not much we can do about that, but many devices, chipsets and OSes do support MSI these days.

 -- Keir
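The management-tool check suggested in this exchange could look roughly like the sketch below. It is only an illustration: the tuple layout, the function name `find_irq_conflicts` and the sample device addresses are assumptions, and a real tool would have to obtain IRQ and MSI information from the platform.

```python
from collections import defaultdict


def find_irq_conflicts(assignments):
    """assignments: iterable of (bdf, domain, irq, uses_msi).

    Returns {irq: [domains]} for every legacy INTx line shared across domains.
    """
    owners = defaultdict(set)
    for bdf, domain, irq, uses_msi in assignments:
        if not uses_msi:  # devices using MSI do not occupy a shared INTx line
            owners[irq].add(domain)
    return {irq: sorted(doms) for irq, doms in owners.items() if len(doms) > 1}
```

A tool could refuse an assignment, or at least warn, whenever the returned dictionary is non-empty.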
Guy Zana
2007-Jun-03 14:35 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: Sunday, June 03, 2007 4:30 PM
> To: Guy Zana; Kay, Allen M; Keir Fraser; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>
> The sequence of interrupt injection doesn't actually matter,
> since you can't wait and inject into the next domain only after
> the previous one in the chain fails to handle the interrupt;
> that would be very inefficient.
>
> To me the unhandled irq issue (99,900 out of 100,000) is inevitable.
> Say an irq is shared between 2 HVM domains, with one assigned a high-rate
> PCI device like a NIC and the other assigned a low-rate PCI device
> like UHCI. It's likely to have over 100,000 interrupts from the NIC
> with the UHCI silent in a given period. Since, from Xen's point of view,
> there's no way to know which HVM guest owns a given interrupt instance,
> the same number of interrupts will be injected into both HVM domains.

Sharing an irq between two HVMs is surely not something we would want to handle right now.

> We may force "noirqdebug", however that may not apply to all
> Linux versions or to other OSes on the HVM side.

Maybe we could add a PV dummy driver to the HVM that registers on that IRQ and solves the 99,900:100,000 problem? I think it is more robust to set the HVM assert/deassert state only after giving dom0 a chance to handle the IRQ.

> Actually there are more tricky things to consider for irq
> sharing among domains. For example:
>
> - A driver in one HVM domain may leave its device in the
>   interrupt-asserted state while having the related virtual wire
>   always masked (like an unclean driver unload).
>
> - When the OS first masks the PIC entry and then unmasks the IOAPIC
>   entry, an interrupt may occur in between (and the IOAPIC
>   doesn't pend when masked), so the pending indicator in the PIC
>   is missed.
>
> Such rare cases, once they unfortunately occur, can block the other
> domain sharing the same irq. This heavily breaks the isolation
> between domains, and it is a common issue whatever approach we use
> to share irqs.
>
> Maybe a better way is to use MSI instead, and then avoid the above
> irq sharing issue from the management tool side. For example, avoid
> sharing devices with the same irq among domains when MSI cannot be
> used...

We can also disable the driver in dom0 :-)

> Thanks,
> Kevin
Kay, Allen M
2007-Jun-04 22:56 UTC
RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
>How do you plan to provide interrupt sharing with your method exactly?
>Please provide your thoughts.
>
>Thanks,
>Guy.

I agree with the others' comments that MSI would be the ultimate solution.

In the short term, a simple stopgap measure might be to check the interrupt status bit in the device's PCI config space before delivering the interrupt to the guest.

I found that interrupt sharing with a low-interrupt-rate device such as non-storage USB is not much of a problem. It is a problem when sharing the interrupt with high-frequency interrupt devices such as SATA. The problem I encountered was that Xen tried to deliver interrupts to the guest before the guest was fully ready to accept them.

Allen
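The stopgap mentioned above, checking the device's interrupt status bit before forwarding, can be sketched as follows. The offsets used are the standard ones from the PCI configuration header (Status register at 0x06, Interrupt Status in bit 3); note that this bit only exists on devices following PCI 2.3 or later, so the check is not universal. The function name `intx_asserted` and the idea of passing in a raw config-space buffer are assumptions made for illustration.

```python
import struct

PCI_STATUS = 0x06            # offset of the 16-bit Status register
PCI_STATUS_INTERRUPT = 0x08  # bit 3: Interrupt Status (PCI 2.3 and later)


def intx_asserted(config_space):
    """Return True if the device reports a pending INTx in its Status register."""
    (status,) = struct.unpack_from("<H", config_space, PCI_STATUS)
    return bool(status & PCI_STATUS_INTERRUPT)
```

A hypervisor using such a check would forward a shared interrupt to a guest only when that guest's device reports it as pending, which avoids injecting interrupts raised by another device on the same line.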