thr3ads.net - Xen devel - [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d [May 2007]

Kay, Allen M

2007-May-30 19:05 UTC

[Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

The following 5 patches are a re-submission of the vt-d patch.
This set of patches has been tested against cs# 15080 and is
now much more mature and tested against more environments than
the original patch.  Specifically, we have successfully tested
the patch with the following environments:

    - 32/64-bit Linux HVM guest
    - 32-bit Windows XP/Vista (64-bit should work but has not been tested)
    - 32PAE/64-bit hypervisor
    - APIC and PIC interrupt mechanisms
    - PCIe E1000 and PCI E100 NICs

Allen

----------------------

1) patch description:

vtd1.patch:
    - vt-d specific code 
    - low risk changes in common code
vtd2.patch:
    - io port handling
vtd3.patch:
    - interrupt handling
vtd4.patch:
    - mmio handling
vtd5.patch:
    - turn on VT-d processing in ACPI table

2) how to run

- Use the same syntax as the PV driver domain method to "hide" and assign a
  PCI device
    - use pciback.hide=(02:00.0) to "hide" the device from dom0
    - use pci = [ '02:00.0' ] in /etc/xen/hvm.conf to assign the device to
      the HVM domain
    - set acpi and apic to 0 in hvm.conf, as the current patch only works
      with PIC
    - grub.conf: use "ioapic_ack=old" for /boot/xen.gz
      (io_apic.c contains code for avoiding the global interrupt problem)

4) description of hvm PCI device assignment design:

- pci config virtualization
  - The control panel and qemu were changed to pass assigned PCI devices to qemu.
  - A new file, ioemu/hw/dpci.c, reads the assigned device's PCI config space,
    constructs a new virtual device, and attaches it to the guest PCI bus.
  - PCI read/write functions are similar to those of other virtual devices,
    except that the write function intercepts writes to the COMMAND register
    and does the actual hardware writes.
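A minimal sketch of that read/write split follows. It assumes a 256-byte shadow of the config space. `struct vpci_dev`, `vpci_read()`, `vpci_write()` and `hw_write_command()` are hypothetical names, not the functions in dpci.c. Only offset 0x04, the standard PCI Command register, is forwarded to hardware.

```c
#include <stdint.h>

#define PCI_COMMAND 0x04 /* offset of the 16-bit PCI Command register */

/* Hypothetical model of an assigned device as qemu sees it. */
struct vpci_dev {
    uint8_t  shadow[256]; /* virtual copy of the config space */
    uint16_t hw_command;  /* stands in for the physical Command register */
    int      hw_writes;   /* number of writes forwarded to hardware */
};

/* Stand-in for a real config-space write to the physical device. */
static void hw_write_command(struct vpci_dev *d, uint16_t val)
{
    d->hw_command = val;
    d->hw_writes++;
}

/* Reads are served from the virtual copy, like any emulated device.
 * len is assumed to be 1, 2 or 4. */
uint32_t vpci_read(const struct vpci_dev *d, unsigned off, unsigned len)
{
    uint32_t v = 0;
    for (unsigned i = 0; i < len && off + i < sizeof d->shadow; i++)
        v |= (uint32_t)d->shadow[off + i] << (8 * i);
    return v;
}

/* Writes update the virtual copy; any write overlapping the Command
 * register is also pushed to the real device. */
void vpci_write(struct vpci_dev *d, unsigned off, unsigned len, uint32_t val)
{
    for (unsigned i = 0; i < len && off + i < sizeof d->shadow; i++)
        d->shadow[off + i] = (uint8_t)(val >> (8 * i));

    if (off < PCI_COMMAND + 2 && off + len > PCI_COMMAND)
        hw_write_command(d, (uint16_t)vpci_read(d, PCI_COMMAND, 2));
}
```

With this split, a BAR write stays purely virtual, while a Command write (for example, enabling memory decode and bus mastering) reaches the device.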

- interrupt virtualization
  - Currently only works for ACPI/APIC mode
  - dpci.c makes a hypercall to tell Xen the device/INTx on the vPCI bus
  - In do_IRQ_guest(), when Xen determines an interrupt belongs to a device
    owned by an HVM domain, it injects the guest IRQ into that domain
  - Revert to ioapic_ack=old to allow for IRQ sharing amongst guests.
  - Implemented a new method for mask/unmask in io_apic.c to avoid the
    spurious interrupt issue.
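The routing decision can be sketched as a lookup from machine IRQ to the owning guest's device/INTx. This assumes a small fixed-size table filled in when the IRQ mapping hypercall runs. `struct dpci_route`, `add_route()` and `lookup_route()` are hypothetical names, not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_ROUTES 8

/* Hypothetical record created by the IRQ mapping hypercall: which
 * guest device/INTx a machine IRQ belongs to. */
struct dpci_route {
    int      in_use;
    uint32_t machine_irq;
    uint32_t domid;
    uint32_t device;   /* slot on the guest's virtual PCI bus */
    uint32_t intx;     /* 0..3 for INTA#..INTD# */
};

static struct dpci_route routes[NR_ROUTES];

/* Returns 0 on success, -1 if the table is full. */
int add_route(uint32_t irq, uint32_t domid, uint32_t dev, uint32_t intx)
{
    for (int i = 0; i < NR_ROUTES; i++) {
        if (!routes[i].in_use) {
            routes[i] = (struct dpci_route){1, irq, domid, dev, intx};
            return 0;
        }
    }
    return -1;
}

/* Called from the physical IRQ path. Returns the route if the IRQ is
 * owned by an HVM guest (the caller then injects device/intx into that
 * guest), or NULL if Xen should handle the IRQ as before. */
const struct dpci_route *lookup_route(uint32_t irq)
{
    for (int i = 0; i < NR_ROUTES; i++)
        if (routes[i].in_use && routes[i].machine_irq == irq)
            return &routes[i];
    return NULL;
}
```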

- mmio
  - When the guest BIOS (i.e. hvmloader) or OS changes a PCI BAR, the PCI
    config write function in qemu makes a hypercall to instruct Xen to
    construct the p2m mapping.
  - The shadow page table fault handler has been modified to allow memory
    above max_pages to be mapped.
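Below is a sketch of the qemu-side BAR handling. It assumes 4 KiB pages, a page-aligned memory BAR, and that a zero base means the BAR has not yet been programmed. `xc_domain_memory_mapping()` here is a local stub with the prototype from section 5 that only records its arguments; it is not the real libxc call. `on_bar_write()` is a hypothetical helper.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Arguments recorded by the stub below, for illustration only. */
struct map_call { unsigned long gfn, mfn, nr; uint32_t add; };
static struct map_call calls[2];
static int ncalls;

/* Local stub with the prototype listed in section 5. */
int xc_domain_memory_mapping(int xc_handle, uint32_t domid,
                             unsigned long first_gfn, unsigned long first_mfn,
                             unsigned long nr_mfns, uint32_t add_mapping)
{
    (void)xc_handle;
    (void)domid;
    if (ncalls < 2)
        calls[ncalls] = (struct map_call){first_gfn, first_mfn, nr_mfns,
                                          add_mapping};
    ncalls++;
    return 0;
}

/* Hypothetical handler run when the guest writes a new base address
 * into a memory BAR of bar_size bytes backed by machine address
 * host_base. The old guest mapping is torn down before the new one is
 * installed; old_base == 0 is taken to mean "not mapped yet". */
int on_bar_write(int xch, uint32_t domid, uint64_t old_base,
                 uint64_t new_base, uint64_t host_base, uint64_t bar_size)
{
    unsigned long nr  = (unsigned long)(bar_size >> PAGE_SHIFT);
    unsigned long mfn = (unsigned long)(host_base >> PAGE_SHIFT);

    if (old_base != 0 &&
        xc_domain_memory_mapping(xch, domid,
                                 (unsigned long)(old_base >> PAGE_SHIFT),
                                 mfn, nr, 0) != 0)
        return -1;
    return xc_domain_memory_mapping(xch, domid,
                                    (unsigned long)(new_base >> PAGE_SHIFT),
                                    mfn, nr, 1);
}
```

Moving a 128 KiB BAR from 0xf0000000 to 0xe0000000 therefore produces one remove call and one add call, each covering 32 pages.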

- ioport
  - Xen intercepts guest io port accesses
  - translates guest io port to machine io port
  - does machine port access on behalf of guest
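The port translation step can be sketched as a range lookup over mappings with the same shape as the `xc_domain_ioport_mapping()` arguments in section 5. `struct ioport_map` and `translate_ioport()` are hypothetical. The actual machine port access (the in/out instructions) is left out.

```c
#include <stdint.h>

/* Hypothetical record matching the arguments of
 * xc_domain_ioport_mapping(first_gport, first_mport, nr_ports). */
struct ioport_map {
    uint32_t first_gport;
    uint32_t first_mport;
    uint32_t nr_ports;
};

/* Translate a guest port to a machine port. Returns 1 and stores the
 * machine port if the access falls inside an assigned range; returns 0
 * otherwise, and the access is left to the normal device model. */
int translate_ioport(const struct ioport_map *maps, int nmaps,
                     uint32_t gport, uint32_t *mport)
{
    for (int i = 0; i < nmaps; i++) {
        const struct ioport_map *m = &maps[i];
        if (gport >= m->first_gport && gport - m->first_gport < m->nr_ports) {
            *mport = m->first_mport + (gport - m->first_gport);
            return 1;
        }
    }
    return 0;
}
```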

5) new hypercalls

int xc_assign_device(int xc_handle,
                     uint32_t domain_id,
                     uint32_t machine_bdf);

int xc_domain_ioport_mapping(int xc_handle,
                             uint32_t domid,
                             uint32_t first_gport,
                             uint32_t first_mport,
                             uint32_t nr_ports,
                             uint32_t add_mapping);

int xc_irq_mapping(int xc_handle,
                   uint32_t domain_id,
                   uint32_t method,
                   uint32_t machine_irq,
                   uint32_t device,
                   uint32_t intx,
                   uint32_t add_mapping);

int xc_domain_memory_mapping(int xc_handle,
                             uint32_t domid,
                             unsigned long first_gfn,
                             unsigned long first_mfn,
                             unsigned long nr_mfns,
                             uint32_t add_mapping);

6) interface to common code: 

int iommu_setup(void);
int iommu_domain_init(struct domain *d);
int assign_device(struct domain *d, u8 bus, u8 devfn);
int release_devices(struct vcpu *v);
int hvm_do_IRQ_dpci(struct domain *d, unsigned int irq);
int dpci_ioport_intercept(ioreq_t *p, int type);
int iommu_map_page(struct domain *d,
        unsigned long gfn, unsigned long mfn);
int iommu_unmap_page(
    struct domain *d, unsigned long gfn);
void iommu_flush(struct domain *d, unsigned long gfn, u64 *p2m_entry);
void iommu_set_pgd(struct domain *d);

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2007-May-30 19:55 UTC

Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

On 30/5/07 20:05, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>     - grub.conf: use "ioapic_ack=old" for /boot/xen.gz
>       (io_apic.c contains code for avoiding global interrupt problem)
How does this new scheme work? Can it supplant the ioapic_ack=new method?
Clearly requiring the use of a hacky command-line option to make use of a
new core feature is not very nice to say the least.

It looks like the interrupt gets EOIed by writing to the IOSAPIC EOI
register. I thought that x86 IOAPICs don't have that register?
>   - Revert back to ioapic_ack=old to allow for IRQ sharing amongst
> guests.
I would expect it to work (by design at least) even with ioapic_ack=new.

Actually I also know there are some other patches coming down the pipeline
to do pci passthrough to HVM guests without need for hardware support (of
course it is not so general; in particular it will only work for one special
hvm guest). However, they deal with this interrupt issue quite cunningly, by
inverting the interrupt polarity so that they get interrupts on both +ve and
-ve edges of the INTx line. This allows the virtual interrupt wire to be
'wiggled' precisely according to the behaviour of the physical interrupt
wire. Which is rather nice, although of course it does double the interrupt
rate, which is not so great but perhaps acceptable for the kind of low
interrupt rate devices that most people would want to hand off to a hvm
guest.
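Here is a toy simulation of the polarity trick on one level-triggered line. It assumes the line is sampled once per time step and that the pin fires whenever the sampled level matches the programmed polarity. `simulate_polarity_switch()` is hypothetical and not taken from the Neocleus patches. It shows the virtual wire tracking the physical one, at the cost of one interrupt per edge.

```c
#include <stddef.h>

/* line[t] is the physical INTx level at step t (1 = asserted). The pin
 * fires whenever the level matches the programmed polarity; each
 * interrupt flips the polarity, so the next interrupt arrives on the
 * opposite edge. vwire[t] records the virtual wire driven into the
 * guest. Returns the number of physical interrupts taken. */
int simulate_polarity_switch(const int *line, int *vwire, size_t n)
{
    int polarity = 1;   /* start by firing while the line is high */
    int virt = 0;
    int irqs = 0;

    for (size_t t = 0; t < n; t++) {
        if (line[t] == polarity) {
            irqs++;
            virt = polarity;        /* mirror the new level into the guest */
            polarity = !polarity;   /* re-arm for the opposite edge */
        }
        vwire[t] = virt;
    }
    return irqs;
}
```

With two separate assertions, for example `{0, 1, 1, 0, 0, 1, 0}`, the function returns 4: one interrupt on each rising edge and one on each falling edge. This is the doubling of the interrupt rate mentioned above.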

 -- Keir



Kay, Allen M

2007-May-30 22:02 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

>
>How does this new scheme work? Can it supplant the ioapic_ack=new method?
>Clearly requiring the use of a hacky command-line option to make use of a
>new core feature is not very nice to say the least.
>
Basically, the calls to mask_IO_APIC_irq()/unmask_IO_APIC_irq()
in arch/x86/io_apic.c were replaced with
write_fake_IO_APIC_vector()/restore_real_IO_APIC_vector(),
where the fake vector is an unused vector.

Since the global interrupt from the chipset only occurs during
masked interrupts, this avoids the chipset bug that caused
you to switch to ioapic_ack=new last year.  I believe it can
supplant the ioapic_ack=new method.
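Below is a rough model of the difference between masking an entry and pointing it at a fake vector. It assumes the standard IOAPIC redirection-entry layout (vector in bits 0-7, mask in bit 16) and that vector 0xef is unused; the patch's actual choice of fake vector is not stated here. `struct rte_model`, `write_fake_vector()` and `restore_real_vector()` are hypothetical stand-ins for the functions named above. Real RTE accesses go through the IOAPIC index/data registers, and an interrupt that still arrives on the fake vector would have to be ignored by its handler.

```c
#include <stdint.h>

#define RTE_VECTOR_MASK 0xffu        /* bits 0-7: vector */
#define RTE_MASKED      (1u << 16)   /* bit 16: mask */
#define FAKE_VECTOR     0xefu        /* assumption: a vector nobody uses */

/* Hypothetical model of one redirection entry (low dword) plus the
 * real vector saved while the fake one is installed. */
struct rte_model {
    uint32_t low;
    uint8_t  saved_vector;
};

/* Classic masking: set the mask bit (the state the chipset bug needs). */
void mask_rte(struct rte_model *r)   { r->low |= RTE_MASKED; }
void unmask_rte(struct rte_model *r) { r->low &= ~RTE_MASKED; }

/* Replacement: leave the entry unmasked but steer it to an unused
 * vector, so the pin is never in the masked state. */
void write_fake_vector(struct rte_model *r)
{
    r->saved_vector = (uint8_t)(r->low & RTE_VECTOR_MASK);
    r->low = (r->low & ~RTE_VECTOR_MASK) | FAKE_VECTOR;
}

void restore_real_vector(struct rte_model *r)
{
    r->low = (r->low & ~RTE_VECTOR_MASK) | r->saved_vector;
}
```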
>
>I would expect it to work (by design at least) even with ioapic_ack=new.
>
We have based our enabling effort on ioapic_ack=old so far.
In theory, it should work with ioapic_ack=new.  I have tried
ioapic_ack=new but it is not working right now.  We will
be looking into this.

Allen

Guy Zana

2007-May-31 06:05 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com 
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of 
> Keir Fraser
> Sent: Wednesday, May 30, 2007 10:56 PM
> To: Kay, Allen M; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [VTD][patch 0/5] HVM device 
> assignment using vt-d
> 
> 
> Actually I also know there are some other patches coming down 
> the pipeline to do pci passthrough to HVM guests without need 
> for hardware support (of course it is not so general; in 
> particular it will only work for one special hvm guest). 
> However, they deal with this interrupt issue quite cunningly, 
> by inverting the interrupt polarity so that they get 
> interrupts on both +ve and -ve edges of the INTx line. This 
> allows the virtual interrupt wire to be 'wiggled' precisely
> according to the behaviour of the physical interrupt wire. 
> Which is rather nice, although of course it does double the 
> interrupt rate, which is not so great but perhaps acceptable 
> for the kind of low interrupt rate devices that most people 
> would want to hand off to a hvm guest.
> 
Just FYI.

Neocleus' pass-through patches perform the "change polarity" trick.
With changing the polarity, our motivation was to reflect the allocated
device's assertion state to the HVM guest AS IS.

Regarding performance: using a USB 2.0 storage device (working with DMA),
we compared copying a huge file in pass-through and in native mode (on the
same OS), and the time differences were negligible, so I'm not sure yet
about the impact of doubling the number of interrupts. The advantage of
changing the polarity is its simplicity.

Anyway, we'll release some patches during the day so you can give
your comments.

Thanks,
Guy.

Keir Fraser

2007-May-31 06:49 UTC

Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

On 30/5/07 23:02, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
>> How does this new scheme work? Can it supplant the
>> ioapic_ack=new method?
>> Clearly requiring the use of a hacky command-line option to
>> make use of a
>> new core feature is not very nice to say the least.
> 
> Basically, the calls to mask_IO_APIC_irq()/unmask_IO_APIC_irq()
> in arch/x86/io_apic.c were replaced with
> write_fake_IO_APIC_vector()/restore_real_IO_APIC_vector(),
> where the fake vector is an unused vector.
> 
> Since the global interrupt from the chipset only occurs during
> masked interrupts, this avoids the chipset bug that caused
> you to switch to ioapic_ack=new last year.  I believe it can
> supplant the ioapic_ack=new method.
I'm not against removing the 'new' ioapic-ack method entirely if this is
better. I just don't fully understand this replacement method yet. I'll need
a walkthrough of the new mask/unmask replacements, most likely!

 -- Keir



Tian, Kevin

2007-May-31 13:20 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

It is indeed a 'cunning' approach, which, however, seems to apply only
to interrupt instances with both a rising edge and a falling edge, like:

    _____________      ______________
___|      1        |____|       2        |____________

Just curious whether the following cases can be handled when
one edge is missing.

[Case one]
	Is it possible for one device to keep the line 'high' for two successive
instances, like:
    _________________________________________
___|      1        |       2                      ...

    When the driver asks the device to clear the interrupt assertion at the end
of handling the 1st instance, the device may keep asserting if the interrupt
condition still holds for the 2nd. In that case, no further interrupt will
happen when EOI is written to the IOAPIC, due to the polarity inversion.

[Case two]
	Similar to case one, two PCI devices share one interrupt pin:
PCI-A
    _______
___|     1  |_______________________
PCI-B
                _____________________________
______________|      2       
PIN
    _______    _____________________________
___|        |___|    ^EOI

If:
	- The guest finishes invoking all irq actions hooked to that pin
before PCI-B asserts.
	- The EOI to the IOAPIC happens after PCI-B asserts.

The net effect is that the line stays 'high' after the EOI, and the
polarity inversion means no further interrupt occurs.

Maybe I didn't get the exact details of your 'change polarity' idea;
if so, I'd appreciate your elaboration here. :-)

Thanks,
Kevin
>From: Guy Zana
>Sent: May 31, 2007 14:05
>
>
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xensource.com
>> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of
>> Keir Fraser
>> Sent: Wednesday, May 30, 2007 10:56 PM
>> To: Kay, Allen M; xen-devel@lists.xensource.com
>> Subject: Re: [Xen-devel] [VTD][patch 0/5] HVM device
>> assignment using vt-d
>>
>
>>
>> Actually I also know there are some other patches coming down
>> the pipeline to do pci passthrough to HVM guests without need
>> for hardware support (of course it is not so general; in
>> particular it will only work for one special hvm guest).
>> However, they deal with this interrupt issue quite cunningly,
>> by inverting the interrupt polarity so that they get
>> interrupts on both +ve and -ve edges of the INTx line. This
>> allows the virtual interrupt wire to be 'wiggled' precisely
>> according to the behaviour of the physical interrupt wire.
>> Which is rather nice, although of course it does double the
>> interrupt rate, which is not so great but perhaps acceptable
>> for the kind of low interrupt rate devices that most people
>> would want to hand off to a hvm guest.
>>
>
>Just FYI.
>
>Neocleus' pass-through patches perform the "change polarity" trick.
>With changing the polarity, our motivation was to reflect the allocated
>device's assertion state to the HVM guest AS IS.
>
>Regarding performance: using a USB 2.0 storage device (working with DMA),
>we compared copying a huge file in pass-through and in native mode (on the
>same OS), and the time differences were negligible, so I'm not sure yet
>about the impact of doubling the number of interrupts. The advantage of
>changing the polarity is its simplicity.
>
>Anyway, we'll release some patches during the day so you can give
>your comments.
>
>Thanks,
>Guy.

Keir Fraser

2007-May-31 13:37 UTC

Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

On 31/5/07 14:20, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>     When the driver asks the device to clear the interrupt assertion at the end
> of handling the 1st instance, the device may keep asserting if the interrupt
> condition still holds for the 2nd. In that case, no further interrupt will
> happen when EOI is written to the IOAPIC, due to the polarity inversion.
This is absolutely fine. The virtual wire status will remain HIGH in this
case, which is correct since the 'runt' LOW pulse on the physical wire can
be ignored. What we are looking for is to track the physical wire status in
the long run; pulses one way or the other do not matter.

Remember we are talking about *level-triggered* interrupt lines, not
edge-triggered. This polarity-change trick would not be used, and would not
be necessary, for edge-triggered interrupts. We would EOI the physical APIC
early, before running the ISR, just as usual for edge-triggered interrupts.

Because we are talking about level-triggered interrupts, if the line
continues to be HIGH after the ISR runs then of course we'll just deliver
another interrupt straight to the relevant VCPU. That's how level-triggered
interrupts work. :-)
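Keir's point can be illustrated with a minimal model of one virtual level-triggered pin. The remote-IRR flag mirrors real IOAPIC behaviour: it is set on delivery and cleared by EOI. `struct vpin` and its helpers are hypothetical names, not Xen's vIOAPIC code.

```c
/* Hypothetical model of one level-triggered vIOAPIC pin. */
struct vpin {
    int wire;        /* current virtual wire level */
    int remote_irr;  /* delivered and waiting for the guest's EOI */
    int delivered;   /* interrupts injected so far */
};

/* Deliver if the wire is high and nothing is in service. */
static void vpin_sample(struct vpin *p)
{
    if (p->wire && !p->remote_irr) {
        p->remote_irr = 1;
        p->delivered++;
    }
}

void vpin_set_wire(struct vpin *p, int level)
{
    p->wire = level;
    vpin_sample(p);
}

/* Guest EOI: if the wire is still high, the interrupt is delivered
 * again, which is exactly what level-triggered semantics require. */
void vpin_eoi(struct vpin *p)
{
    p->remote_irr = 0;
    vpin_sample(p);
}
```

A 'runt' LOW pulse that never reaches the virtual wire therefore costs nothing. The guest simply sees the line still asserted after its EOI and is interrupted again.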

 -- Keir

Tian, Kevin

2007-May-31 13:59 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: May 31, 2007 21:37
>
>On 31/5/07 14:20, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>>     When the driver asks the device to clear the interrupt assertion at the end
>> of handling the 1st instance, the device may keep asserting if the interrupt
>> condition still holds for the 2nd. In that case, no further interrupt will
>> happen when EOI is written to the IOAPIC, due to the polarity inversion.
>
>This is absolutely fine. The virtual wire status will remain HIGH in this
>case, which is correct since the 'runt' LOW pulse on the physical wire can
>be ignored. What we are looking for is to track the physical wire status in
>the long run; pulses one way or the other do not matter.
>
>Remember we are talking about *level-triggered* interrupt lines, not
>edge-triggered. This polarity-change trick would not be used, and would not
>be necessary, for edge-triggered interrupts. We would EOI the physical APIC
>early, before running the ISR, just as usual for edge-triggered interrupts.
>
>Because we are talking about level-triggered interrupts, if the line
>continues to be HIGH after the ISR runs then of course we'll just deliver
>another interrupt straight to the relevant VCPU. That's how level-triggered
>interrupts work. :-)
>
> -- Keir
Ha, you're exactly right. I forgot about the virtual wire status, which will
result in a new virtual interrupt anyway if no new hardware interrupt
arrives to signal the de-assertion.

But... one more question: it seems that current Xen doesn't allow
multiple end() calls for one physical interrupt instance,
while a new physical interrupt will happen only as a result of end()
(EOI for ioapic_ack=new, and RTE unmask for ioapic_ack=old). See the
case below:
	- 1st interrupt is injected and polarity is inverted
	- HVM guest finishes handling it and writes EOI to the vIOAPIC
		- 1st is deasserted
	- 2nd instance happens
	- that EOI is converted into an invocation of the end() method
	- either EOI or RTE unmask is issued to the physical IOAPIC
	- no physical interrupt is triggered, due to the inverted polarity
	- a new virtual interrupt is injected at the next resume to the HVM guest
	- HVM guest finishes handling it and writes EOI to the vIOAPIC
		- 2nd instance is deasserted
	- the EOI to the vIOAPIC leads to end() again

Then it's up to Xen to decide whether to allow one more end(), isn't it?
I think this part may need some changes for this 'change polarity'
approach, like a check on pirq_mask. :-)
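One way to enforce "at most one physical end() per physical interrupt" is a per-IRQ pending flag. This is roughly the kind of check suggested above with pirq_mask. The sketch is hypothetical and does not reproduce Xen's pirq_mask logic; `struct pirq_state` and both functions are invented for illustration.

```c
/* Hypothetical per-IRQ state: exactly one physical end() (EOI or RTE
 * unmask) is owed for each physical interrupt taken. */
struct pirq_state {
    int end_pending;   /* a physical interrupt was acked but not ended */
    int phys_ends;     /* end() calls actually forwarded to hardware */
};

void pirq_on_physical_irq(struct pirq_state *s)
{
    s->end_pending = 1;
}

/* Called on every guest EOI for this line. Forwards at most one end()
 * per physical instance; returns 1 if hardware was touched. */
int pirq_guest_eoi(struct pirq_state *s)
{
    if (!s->end_pending)
        return 0;
    s->end_pending = 0;
    s->phys_ends++;
    return 1;
}
```

In the sequence above, the EOI for the 1st instance forwards end(). The EOI for the purely virtual 2nd instance finds nothing pending and is suppressed.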

BTW, what about the alternative of taking the guest EOI to the vIOAPIC as
the de-assertion hint for the assigned device?

Thanks,
Kevin

Guy Zana

2007-May-31 14:08 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com] 
> Sent: Thursday, May 31, 2007 4:21 PM
> To: Guy Zana; Keir Fraser; Kay, Allen M; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device 
> assignment using vt-d
> 
> It is indeed a 'cunning' approach, which, however, seems to apply
> only to interrupt instances with both a rising edge and a falling
> edge, like:
> 
>     _____________      ______________
> ___|      1        |____|       2        |____________
> 
> Just curious whether the following cases can be handled
> when one edge is missing.
> 
> [Case one]
> 	Is it possible for one device to keep the line 'high' for 
> two successive instances, like:
>     _________________________________________
> ___|      1        |       2                      ...
> 
>     When the driver asks the device to clear the interrupt assertion 
> at the end of handling the 1st instance, the device may keep 
> asserting if the interrupt condition still holds for the 2nd. In 
> that case, no further interrupt will happen when EOI is 
> written to the IOAPIC, due to the polarity inversion.
Since the HVM guest's assertion state is kept "asserted" until the
_external_ line falls, the HVM guest itself will keep getting
interrupts on VMENTRYs until the _external_ line is deasserted (by the external
device). This reflects the real behavior of the external line.

If this kind of interrupt is treated differently, you'll get
redundant interrupts :)

> 
> [Case two]
> 	Similar to case one, two PCI devices share one interrupt pin:
> PCI-A
>     _______
> ___|     1  |_______________________
> PCI-B
>                 _____________________________
> ______________|      2       
> PIN
>     _______    _____________________________
> ___|        |___|    ^EOI
> 
> If:
> 	- The guest finishes invoking all irq actions hooked 
> to that pin before PCI-B asserts.
> 	- The EOI to the IOAPIC happens after PCI-B asserts.
> 
> The net effect is that the line stays 'high' after the EOI, and
> the polarity inversion means no further interrupt occurs.
If both devices work in pass-through mode, it should work, since it is the
ORed line that is reflected.
We can add functionality for sharing such devices between dom0 and a guest by
changing the way dom0 handles level-triggered interrupts.
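The point about the ORed line can be illustrated in a few lines. This assumes both device outputs are sampled once per step and the line starts low. `reflect_shared_line()` is hypothetical. It also counts level changes of the combined line, which equals the number of physical interrupts the polarity-switching scheme would take.

```c
#include <stddef.h>

/* When two passed-through devices share a pin, the level seen at the
 * IOAPIC is the wired-OR of both outputs, and that combined level is
 * what gets mirrored into the guest. Returns the number of level
 * changes of the combined line (assumed to start low). */
int reflect_shared_line(const int *dev_a, const int *dev_b,
                        int *vwire, size_t n)
{
    int prev = 0, edges = 0;

    for (size_t t = 0; t < n; t++) {
        int level = dev_a[t] || dev_b[t];
        if (level != prev)
            edges++;
        vwire[t] = level;
        prev = level;
    }
    return edges;
}
```

Take case two with overlapping assertions, e.g. A = `{0, 1, 1, 0, 0, 0}` and B = `{0, 0, 1, 1, 1, 1}`. The combined line has a single rising edge, and the virtual wire stays high until B deasserts, so the guest keeps being interrupted rather than losing B's assertion.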

Thanks,
Guy.
> 
> Maybe I didn't get the exact details of your 'change polarity'
> idea; if so, I'd appreciate your elaboration here. :-)
> 
> Thanks,
> Kevin

Keir Fraser

2007-May-31 15:03 UTC

Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

On 31/5/07 14:59, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> But... one more question: it seems that current Xen doesn't allow
> multiple end() calls for one physical interrupt instance,
> while a new physical interrupt will happen only as a result of end()
> (EOI for ioapic_ack=new, and RTE unmask for ioapic_ack=old). See the
> case below:
Yeah, well, I haven't looked at how the Neocleus patches actually deal with
this, but I expect they might steal hvm-bound physical irqs completely, hook
them off early in do_IRQ() and have bespoke code to deal with them. Possibly
it could be integrated with the existing ->ack and ->end methods, actually:
->end() would be a no-op while ->ack() would switch polarity in the IOAPIC and
then EOI the LAPIC. Then the handler function called by do_IRQ() would
toggle virtual HVM wires.
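The suggested split between ->ack and ->end can be sketched with a small ops table. The RTE polarity bit (bit 13, set for active low) follows the standard IOAPIC layout. `struct hvm_pirq`, `struct pirq_ops` and the functions are hypothetical and do not reproduce Xen's actual interrupt-controller structures.

```c
#include <stdint.h>

#define RTE_POLARITY_LOW (1u << 13)  /* RTE bit 13: 1 = active low */

/* Hypothetical per-pin state for an HVM-owned level-triggered IRQ. */
struct hvm_pirq {
    uint32_t rte_low;    /* model of the redirection entry low dword */
    int      lapic_eois; /* EOIs written to the local APIC */
    int      vwire;      /* virtual INTx level seen by the guest */
};

/* Hypothetical ops table in the style of ->ack / ->end hooks. */
struct pirq_ops {
    void (*ack)(struct hvm_pirq *);
    void (*end)(struct hvm_pirq *);
    void (*handler)(struct hvm_pirq *);
};

/* ack: invert the polarity so the opposite edge raises the next
 * interrupt, then EOI the local APIC straight away. */
static void polarity_ack(struct hvm_pirq *p)
{
    p->rte_low ^= RTE_POLARITY_LOW;
    p->lapic_eois++;
}

/* end: nothing left to do; the pin was re-armed in ack. */
static void polarity_end(struct hvm_pirq *p) { (void)p; }

/* handler: after ack flipped the polarity, "now active low" means the
 * interrupt was taken on a rising edge, i.e. the line is high. */
static void toggle_vwire(struct hvm_pirq *p)
{
    p->vwire = (p->rte_low & RTE_POLARITY_LOW) ? 1 : 0;
}

const struct pirq_ops polarity_switch_ops = {
    .ack = polarity_ack,
    .end = polarity_end,
    .handler = toggle_vwire,
};

/* Mimics the order in which do_IRQ() would invoke the hooks. */
void take_irq(const struct pirq_ops *ops, struct hvm_pirq *p)
{
    ops->ack(p);
    ops->handler(p);
    ops->end(p);
}
```

Starting from an active-high entry, the first interrupt (rising edge) raises the virtual wire and the second (falling edge) drops it. Each interrupt costs one LAPIC EOI and no work in ->end().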

Anyhow, integrating with the existing Xen IRQ handling subsystem clearly
isn't a rocket-science problem. :-)

 -- Keir


Tian, Kevin

2007-May-31 15:10 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: May 31, 2007 23:03
>
>On 31/5/07 14:59, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> But... one more question: it seems that current Xen doesn't allow
>> multiple end() calls for one physical interrupt instance,
>> while a new physical interrupt will happen only as a result of end()
>> (EOI for ioapic_ack=new, and RTE unmask for ioapic_ack=old). See the
>> case below:
>
>Yeah, well, I haven't looked at how the Neocleus patches actually deal with
>this, but I expect they might steal hvm-bound physical irqs completely, hook
>them off early in do_IRQ() and have bespoke code to deal with them. Possibly
>it could be integrated with the existing ->ack and ->end methods, actually:
>->end() would be a no-op while ->ack() would switch polarity in the IOAPIC and
>then EOI the LAPIC. Then the handler function called by do_IRQ() would
>toggle virtual HVM wires.
>
>Anyhow, integrating with the existing Xen IRQ handling subsystem clearly
>isn't a rocket-science problem. :-)
>
Sure. :-) But I'm still considering using the virtual EOI as the
de-assertion signal, which doesn't require frequently changing the
polarity of the line. We can just add a flag per GSI to indicate whether
a physical irq has been injected on this line. When intercepting the HVM
EOI, invoke deassert and also call ->end() if the flag is set. Does that
basically work?

Thanks,
Kevin

Keir Fraser

2007-May-31 15:14 UTC

Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

On 31/5/07 16:10, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> Sure. :-) But I'm still considering using the virtual EOI as the
> de-assertion signal, which doesn't require frequently changing the
> polarity of the line. We can just add a flag per GSI to indicate whether
> a physical irq has been injected on this line. When intercepting the HVM
> EOI, invoke deassert and also call ->end() if the flag is set. Does that
> basically work?
I didn't realise you were suggesting another mechanism. It's not clear to me
how it works from the very brief description you give above. Could you
provide an example or two for how your method would work (e.g., one which
avoids switching polarity, and another where you do end up switching
polarity)?

 -- Keir


Tian, Kevin

2007-May-31 15:30 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: May 31, 2007 23:15
>
>On 31/5/07 16:10, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> Sure. :-) But I'm still considering using the virtual EOI as the
>> de-assertion signal, which doesn't require frequently changing the
>> polarity of the line. We can just add a flag per GSI to indicate whether
>> a physical irq has been injected on this line. When intercepting the HVM
>> EOI, invoke deassert and also call ->end() if the flag is set. Does that
>> basically work?
>
>I didn't realise you were suggesting another mechanism. It's not clear to me
>how it works from the very brief description you give above. Could you
>provide an example or two for how your method would work (e.g., one which
>avoids switching polarity, and another where you do end up switching
>polarity)?
>
> -- Keir
OK, my rough thought is as below:

	The reason to change polarity, IMO, is to capture the de-assert
edge on the physical wire and then reflect the de-assertion into the virtual
wire. That allows the gsi_assert_count statistics to be updated
correctly when the line is shared with virtual devices in Qemu.

	My proposal is to take the virtual EOI as the de-assertion hint, without
any change to physical RTE properties like polarity. For example, the
flow could be as follows, keeping, say, an hw_assert_status array for
all virtual GSIs (taking the vIOAPIC as an example):

	- physical interrupt happens, and ->ack()
	- assert into the virtual wire with the assert count incremented, and
also set hw_assert_status[gsi]
	- HVM guest handles the interrupt, and writes EOI to the vLAPIC, and
then the vIOAPIC
	- vioapic_update_EOI then:
		- checks whether hw_assert_status[gsi] is set; if yes:
			- invokes __hvm_pci_intx_deassert to decrement the count
			- ->end()
		- checks whether to inject a new instance based on the gsi count (original logic)

	->end() may trigger another physical interrupt if the physical wire
stays 'high' for any reason. Of course, some code changes may be
required to allow the hvm_irq logic and the vioapic/vpic logic to call each
other, e.g. because of locking issues.
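A minimal sketch of this flow for one GSI follows. It assumes a single passed-through source per GSI and models the physical line as a plain variable. `hw_assert_status` and `gsi_assert_count` follow the names used above. `struct hvm_irq_model`, `phys_irq()`, `guest_eoi()` and the in-service flag are hypothetical, and locking is ignored.

```c
#include <stdint.h>

#define NR_GSIS 48

/* Hypothetical model of the proposal; not Xen code. */
struct hvm_irq_model {
    uint8_t gsi_assert_count[NR_GSIS]; /* sources asserting each GSI */
    uint8_t hw_assert_status[NR_GSIS]; /* 1 if a physical IRQ is asserted */
    uint8_t in_service[NR_GSIS];       /* injected, waiting for guest EOI */
    int     phys_line[NR_GSIS];        /* simulated physical wire level */
    int     injected;                  /* virtual interrupts injected */
    int     phys_irqs;                 /* physical interrupts taken */
};

/* Inject if the virtual wire is asserted and nothing is in service. */
static void maybe_inject(struct hvm_irq_model *h, unsigned gsi)
{
    if (h->gsi_assert_count[gsi] > 0 && !h->in_service[gsi]) {
        h->in_service[gsi] = 1;
        h->injected++;
    }
}

/* Physical interrupt arrives (after ->ack()): assert the virtual wire
 * and remember that the assertion came from hardware. */
void phys_irq(struct hvm_irq_model *h, unsigned gsi)
{
    h->phys_irqs++;
    h->gsi_assert_count[gsi]++;
    h->hw_assert_status[gsi] = 1;
    maybe_inject(h, gsi);
}

/* ->end(): re-arm the pin; a level-triggered line that is still high
 * raises a new physical interrupt straight away. */
static void phys_end(struct hvm_irq_model *h, unsigned gsi)
{
    if (h->phys_line[gsi])
        phys_irq(h, gsi);
}

/* Guest EOI reaching the vIOAPIC (the vioapic_update_EOI step). */
void guest_eoi(struct hvm_irq_model *h, unsigned gsi)
{
    h->in_service[gsi] = 0;
    if (h->hw_assert_status[gsi]) {
        h->hw_assert_status[gsi] = 0;
        h->gsi_assert_count[gsi]--;  /* tentative de-assert */
        phys_end(h, gsi);            /* may re-assert via phys_irq() */
    }
    maybe_inject(h, gsi);            /* original logic: other sources */
}
```

If the line is still high at the first EOI, ->end() immediately yields a second physical interrupt. Once the line drops, the next EOI leaves the count at zero and nothing is re-injected. For a single clean pulse this takes one physical interrupt, whereas polarity switching takes two.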

Thanks,
Kevin

Keir Fraser

2007-May-31 15:40 UTC

Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

On 31/5/07 16:30, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> OK, my rough thought is as below:
> 
> The reason to change polarity, IMO, is to capture the de-assert
> edge on the physical wire and then reflect the de-assertion into the virtual
> wire. That allows the gsi_assert_count statistics to be updated
> correctly when the line is shared with virtual devices in Qemu.
> 
> My proposal is to take the virtual EOI as the de-assertion hint, without
> any change to physical RTE properties like polarity. For example, the
> flow could be as follows, keeping, say, an hw_assert_status array for
> all virtual GSIs (taking the vIOAPIC as an example):
Ah, okay, so no polarity switching at all. Basically use VIOAPIC EOI as a
hint to tentatively drop the virtual wire to LOW, and only then ->end the
physical interrupt. I guess this is pretty much what you already implement
in your VT-d patches?

It'd be interesting to know how these two approaches compare
performance-wise. I suppose yours should win, really, due to fewer physical
interrupts.

If this is how your current VT-d patches handle interrupts then I don't see
why ioapic_ack=new is not working for you. That's a bit weird. I guess I
could read the patches some more. ;-)

 -- Keir


Tian, Kevin

2007-May-31 15:51 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: May 31, 2007 23:40
>
>On 31/5/07 16:30, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> OK, my rough thought is as below:
>>
>> The reason to change polarity, IMO, is to capture the de-assert
>> edge on the physical wire and then reflect the de-assertion into the virtual
>> wire. That allows the gsi_assert_count statistics to be updated
>> correctly when the line is shared with virtual devices in Qemu.
>>
>> My proposal is to take the virtual EOI as the de-assertion hint, without
>> any change to physical RTE properties like polarity. For example, the
>> flow could be as follows, keeping, say, an hw_assert_status array for
>> all virtual GSIs (taking the vIOAPIC as an example):
>
>Ah, okay, so no polarity switching at all. Basically use VIOAPIC EOI as a
>hint to tentatively drop the virtual wire to LOW, and only then ->end the
>physical interrupt. I guess this is pretty much what you already implement
>in your VT-d patches?
>
>It'd be interesting to know how these two approaches compare
>performance-wise. I suppose yours should win, really, due to fewer physical
>interrupts.
>
>If this is how your current VT-d patches handle interrupts then I don't see
>why ioapic_ack=new is not working for you. That's a bit weird. I guess I
>could read the patches some more. ;-)
>
> -- Keir

Oh, I'm not the author of the VT-d patches; the credit goes to Allen and
Xiaohui. :-) I just worked out the concrete idea during this discussion
with you, and will talk to them tomorrow for confirmation. I guess the
"ioapic_ack=new" failure is just a simple bug, since assigning a single
NIC shouldn't produce a shared-interrupt case yet.

Thanks,
Kevin

Keir Fraser

2007-May-31 15:52 UTC

Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

On 31/5/07 16:40, "Keir Fraser" <keir@xensource.com> wrote:
> It'd be interesting to know how these two approaches compare
> performance-wise. I suppose yours should win, really, due to fewer physical
> interrupts.
One thing is that the polarity-switching approach is a slightly better fit
with the HVM interrupt logic. Currently interrupt sources and VIOAPIC are
not tightly bound together; they only interact by one waggling the virtual
intx wires and the other sampling that wire periodically (or synchronously
on +ve edges). Your approach requires a 'back channel' from the VIOAPIC code
back to physical interrupt code to call ->end(). It's kind of ugly. On the
other hand I suspect the polarity-switching code adds more stuff to the
physical interrupt subsystem, and your approach can certainly be supported,
probably by adding a bit more state (maybe just a single bit) per virtual
intx wire. Really we need to look at and measure each implementation...

 -- Keir


Tian, Kevin

2007-May-31 15:59 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: May 31, 2007 23:52
>
>On 31/5/07 16:40, "Keir Fraser" <keir@xensource.com> wrote:
>
>> It'd be interesting to know how these two approaches compare
>> performance-wise. I suppose yours should win, really, due to fewer
>> physical interrupts.
>
>One thing is that the polarity-switching approach is a slightly better fit
>with the HVM interrupt logic. Currently interrupt sources and VIOAPIC are
>not tightly bound together; they only interact by one waggling the virtual
>intx wires and the other sampling that wire periodically (or synchronously
>on +ve edges). Your approach requires a 'back channel' from the VIOAPIC code
>back to physical interrupt code to call ->end(). It's kind of ugly. On the
>other hand I suspect the polarity-switching code adds more stuff to the
>physical interrupt subsystem, and your approach can certainly be supported,
>probably by adding a bit more state (maybe just a single bit) per virtual
>intx wire. Really we need to look at and measure each implementation...
>
> -- Keir
Agreed on supporting both with a common infrastructure. But I suspect that
the polarity-switching code would also need such an ->end() call in the
virtual EOI path, since you need an unmask or EOI signal to the physical
ioapic anyway. Otherwise, how would the 2nd interrupt be triggered at the
falling edge?

Thanks,
Kevin


Tian, Kevin

2007-May-31 16:03 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

>From: Tian, Kevin
>Sent: May 31, 2007 23:59
>
>Agreed on supporting both with a common infrastructure. But I suspect that
>the polarity-switching code would also need such an ->end() call in the
>virtual EOI path, since you need an unmask or EOI signal to the physical
>ioapic anyway. Otherwise, how would the 2nd interrupt be triggered at the
>falling edge?
>
>Thanks,
>Kevin
Oh, forgive my ignorance. That can be done in ->ack() by changing the
polarity and then doing the EOI, as you said before. :-)

Thanks,
Kevin


Guy Zana

2007-May-31 16:07 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com] 
> Sent: Thursday, May 31, 2007 7:04 PM
> To: Tian, Kevin; Keir Fraser; Guy Zana; Kay, Allen M; 
> xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device 
> assignment using vt-d
> 
> Oh, forgive my ignorance. That can be done in ->ack() by changing the
> polarity and then doing the EOI, as you said before. :-)
> 
We did it by replacing the end() callback of the hw_interrupt_type: the new
handler performs a change_vector_polarity and then calls the original ->end()
of the level-triggered hw_interrupt_type. All the other callbacks stay the
same.

I hope to get the patch ready today, but it will be for c/s 15011.

Thanks,
Guy.
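The approach described above amounts to wrapping one callback: copy the level-triggered callback table and override only end(). The Python model below is a hedged sketch of that pattern; HwInterruptType, IoApicModel and polarity_switching_type are hypothetical stand-ins for Xen's C hw_interrupt_type and the patch's change_vector_polarity, whose real signatures do not appear in the thread.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class HwInterruptType:
    """Hypothetical stand-in for a C table of per-controller IRQ callbacks."""
    name: str
    ack: Callable[[int], None]
    end: Callable[[int], None]


class IoApicModel:
    """Toy IOAPIC that records callback order and per-IRQ polarity."""

    def __init__(self):
        self.active_high = {}  # irq -> bool (True = active-high)
        self.log = []

    def level_ack(self, irq):
        self.log.append(("ack", irq))

    def level_end(self, irq):
        self.log.append(("end", irq))

    def change_vector_polarity(self, irq):
        self.active_high[irq] = not self.active_high.get(irq, True)
        self.log.append(("flip", irq))


def polarity_switching_type(base, apic):
    """Return a copy of `base` whose end() flips polarity, then calls the original end()."""
    def end(irq):
        apic.change_vector_polarity(irq)  # next interrupt fires on the opposite level
        base.end(irq)                     # then the unchanged level-triggered end()
    # Every other callback (here only ack) is kept as-is.
    return replace(base, name=base.name + "+polarity", end=end)


apic = IoApicModel()
level_type = HwInterruptType("IO-APIC-level", apic.level_ack, apic.level_end)
pt_type = polarity_switching_type(level_type, apic)

pt_type.ack(10)
pt_type.end(10)
# apic.log is now [("ack", 10), ("flip", 10), ("end", 10)], and
# apic.active_high[10] is False: the pin now waits for de-assertion.
```

Keeping the wrapper this thin is what lets "all the rest of the callbacks" stay untouched: only the table entry for end() differs from the stock level-triggered type.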

Keir Fraser

2007-May-31 16:28 UTC

Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

On 31/5/07 16:59, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> Agreed on supporting both with a common infrastructure.
We don't need both! Let's look at which is cleanest and/or fastest and make
a judgment call.

 -- Keir



Kay, Allen M

2007-May-31 17:43 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

>
>Just FYI.
>
>Neocleus' pass-through patches perform the "change polarity" trick.
>With changing the polarity, our motivation was to reflect the 
>allocated device's assertion state to the HVM AS IS.
>
>Regarding the performance, using a USB 2.0 storage device 
>(working with DMA), we compared a huge file copy in pass-through 
>mode and in native mode (on the same OS); the time differences 
>were negligible, so I'm not sure yet about the impact of doubling 
>the number of interrupts. The advantage of changing the polarity 
>is the simplicity.
>
>Anyway, we'll release some patches during the day so you 
>can give your comments.
>
>Thanks,
>Guy.
>
How do you handle DMA buffers without hardware support?  Did you
modify the device driver in HVM to get the machine physical address?

It sounds like the conflict is limited to the VT-d interrupt patch
(vtd3.patch), which is a relatively small part of the VT-d patch set.

Once your patch is released, I will take a look at it.

Allen


Guy Zana

2007-May-31 18:00 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

> -----Original Message-----
> From: Kay, Allen M [mailto:allen.m.kay@intel.com] 
> Sent: Thursday, May 31, 2007 8:44 PM
> To: Guy Zana; Keir Fraser; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device 
> assignment using vt-d
> 
> How do you handle DMA buffers without hardware support?  Did 
> you modify the device driver in HVM to get the machine 
> physical address?
We actually launch an HVM domain with its P2M table populated in a 1:1 fashion
(where gpfn == mfn).
We gave a talk at the last Xen Summit; you can see it at:
http://www.xensource.com/files/xensummit_4/Neocleus_HVM_PCI_Pass-through_Zana.pdf

The 1:1 layout is still not as robust as we would like it to be; for
instance, it doesn't support domain recreation.
> 
> Once your patch is released, I will take a look at it.
Sure.

Thanks,
Guy.
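The answer to the DMA question rests on a simple property: if the guest's P2M is populated 1:1, the guest-physical addresses a driver hands to its device are already machine addresses, so DMA works without an IOMMU. The sketch below illustrates only that translation; the dictionary-based P2M, the 4 KiB page size and both helper names are assumptions for illustration, not the Neocleus implementation.

```python
PAGE_SHIFT = 12  # assumes 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def build_identity_p2m(num_pages, first_pfn=0):
    """Map every guest frame number (gpfn) to the machine frame with the same number."""
    return {gpfn: gpfn for gpfn in range(first_pfn, first_pfn + num_pages)}


def guest_to_machine(p2m, gpa):
    """Translate a guest-physical address to a machine address through the P2M."""
    gpfn, offset = gpa >> PAGE_SHIFT, gpa & PAGE_MASK
    return (p2m[gpfn] << PAGE_SHIFT) | offset


identity = build_identity_p2m(num_pages=0x100)
# With gpfn == mfn, the address an unmodified guest driver programs into the
# device's DMA registers is already the right machine address.
dma_gpa = 0x42010
assert guest_to_machine(identity, dma_gpa) == dma_gpa

# With an ordinary (non-identity) P2M, the same kind of guest address points
# elsewhere, which is why DMA otherwise needs an IOMMU such as VT-d or a
# modified driver.
scattered = {0: 0x500, 1: 0x123}
assert guest_to_machine(scattered, 0x1008) == 0x123008
```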


Kay, Allen M

2007-May-31 18:42 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

>> 
>> Once your patch is released, I will take a look at it.
>
>Sure.
>
>Thanks,
>Guy.
>
If possible, can you package the interrupt part of the patch separately
so that it can be easily tried out?  Thanks.

Allen


Tian, Kevin

2007-Jun-01 02:57 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

Two more minor comments:
	- For the polarity-switching approach, I'm now inclined to applaud it if it
can help handle the 'boot interrupt' issue that triggers the ioapic_ack_new
problem. That brings more value than simply assisting with virtual wire
de-assertion.

	- For ->end() in the VIOAPIC code, I don't think that's ugly, since it is
similar to pirq_guest_eoi used in do_physdev_op, which also comes from the
end() method of pirq_type in dom0.

Thanks,
Kevin

Kay, Allen M

2007-Jun-03 08:29 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

Based on my understanding of the Neocleus' passthrough patch, it seems
all devices sharing that interrupt will get double the number of
interrupts.  This means that if an interrupt is shared between a NIC device
used by an HVM guest and a SATA device used by dom0, the SATA driver in
dom0 will also get twice the number of interrupts.  Am I correct?

Allen 

Keir Fraser

2007-Jun-03 08:37 UTC

Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

On 3/6/07 09:29, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
> Based on my understanding of the Neocleus' passthrough patch, it seems
> all devices sharing that interrupt will get double the number of
> interrupts.  This means that if an interrupt is shared between a NIC device
> used by an HVM guest and a SATA device used by dom0, the SATA driver in
> dom0 will also get twice the number of interrupts.  Am I correct?
No, it should be the case that the device ISRs are only invoked about as
often as in your scheme. The extra interrupts are only visible to Xen, and
tell Xen to deassert the virtual INTx wire for any hvm-attached devices
which share that physical interrupt.

 -- Keir
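This reply can be checked with a tiny model: on a single level-triggered pin, the handler flips polarity after each interrupt, runs the device ISRs only on the assertion interrupt, and on the de-assertion interrupt merely lowers the virtual INTx wire. This is a sketch assuming one device per pin and a line sampled at discrete ticks; the function name is hypothetical.

```python
def polarity_switching_dispatch(levels):
    """
    Hypothetical model of the polarity-switching handler on one physical pin.

    levels: physical INTx line level per tick (True = asserted).
    Returns (physical_interrupts, isr_invocations, virtual_wire_per_tick).
    """
    waiting_for_assert = True  # start active-high: next interrupt is an assertion
    vwire = False
    phys_irqs = 0
    isr_calls = 0
    trace = []
    for level in levels:
        if waiting_for_assert and level:
            phys_irqs += 1
            isr_calls += 1              # device ISRs / guest injection run only here
            vwire = True                # raise the virtual INTx wire
            waiting_for_assert = False  # flip polarity: next interrupt is de-assertion
        elif not waiting_for_assert and not level:
            phys_irqs += 1              # extra interrupt, visible only to Xen
            vwire = False               # just lower the virtual INTx wire
            waiting_for_assert = True   # flip back to active-high
        trace.append(vwire)
    return phys_irqs, isr_calls, trace


line = [False, True, True, False, False, True, False]
phys_irqs, isr_calls, trace = polarity_switching_dispatch(line)
# phys_irqs == 4, isr_calls == 2, and trace == line: the virtual wire follows the pin.
```

Physical interrupts double (4 for 2 assertions), while ISR invocations stay at one per assertion.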




Guy Zana

2007-Jun-03 09:59 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

Sort of... Our method might double the number of interrupts if both devices are
connected to the same pin, but since all devices are wire-ORed, you might even
"save" *physical* interrupts from happening. I guess we'll get a decisive
answer only after doing some profiling.

Our method will not work "out of the box" if you're trying to
use it when sharing a pin between dom0 and an HVM.
Consider the following scenario:

HVM:
             ____________________________
        ____|                            |__________

Dom0:
                        ____________________________
        _______________|

Phys Line:
             _______________________________________
        ____|


            A    B     C                 D


At point B you change the polarity. At points C and D you won't get any
interrupts because of the polarity change, and the device allocated to dom0
will keep its line asserted until the dom0 driver handles the interrupt,
which it never gets a chance to do. Moreover, the HVM vline will still be
kept asserted.
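The timing diagram can be replayed with a small model, assuming the pin is the logical OR of the two devices and that ticks 1, 3 and 6 stand for points A, C and D (B being the polarity flip right after A). The function name and tick values are illustrative assumptions.

```python
def simulate_shared_pin(hvm, dom0):
    """
    Hypothetical model of the scenario: an HVM-assigned device and a dom0
    device wire-ORed onto one pin, handled with polarity switching.

    hvm, dom0: per-tick assertion levels (1 = asserted) of each device.
    Returns (list of (tick, kind) physical interrupts, final HVM vline level).
    """
    waiting_for_assert = True
    hvm_vline = False
    events = []
    for tick, (h, d) in enumerate(zip(hvm, dom0)):
        line = h or d  # the pin is the OR of every device sharing it
        if waiting_for_assert and line:
            events.append((tick, "assert"))
            hvm_vline = True
            waiting_for_assert = False  # point B: polarity flipped
        elif not waiting_for_assert and not line:
            events.append((tick, "deassert"))
            hvm_vline = False
            waiting_for_assert = True
    return events, hvm_vline


# A = tick 1, C = tick 3, D = tick 6
hvm = [0, 1, 1, 1, 1, 1, 0, 0, 0]
dom0 = [0, 0, 0, 1, 1, 1, 1, 1, 1]
events, hvm_vline = simulate_shared_pin(hvm, dom0)
# events == [(1, "assert")]: nothing fires when dom0's device asserts (C)
# or when the HVM device de-asserts (D), and hvm_vline stays True.
```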

We are currently modeling the problem; it seems to be a complicated concept,
regardless of polarity changing. For instance, an HVM guest running Linux
will die if 99,900 out of 100,000 interrupts are not handled.
From a logical POV, the aforementioned race can be solved like this: we can
hold a virtual assertion line for _dom0_ (updated by the interrupts that
arrive as a result of the polarity changes) and concatenate the HVM's ISR
chain with dom0's ISR chain. dom0 must be the first to try to handle the
interrupt (because of the 99,900-out-of-100,000 problem), so I guess
pass-through shared interrupts should probably be handled as the last
(default) function in dom0's ISR chain.
How exactly do you plan to provide interrupt sharing with your method?
Please share your thoughts.

Thanks,
Guy.
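One possible reading of the dom0-first ISR chain proposed above, sketched under assumptions: dom0's ISRs run first and report whether they claimed the interrupt (as Linux ISRs do with IRQ_HANDLED/IRQ_NONE), and a pass-through stub sits at the end of the chain as the default handler that forwards anything unclaimed to the HVM guest. make_shared_irq_handler, FakeSataDevice and the forwarding list are hypothetical.

```python
from typing import Callable, List

Isr = Callable[[int], bool]  # returns True if the ISR claimed the interrupt


def make_shared_irq_handler(dom0_isrs: List[Isr], passthrough: Callable[[int], None]):
    """Hypothetical dom0-first dispatch for a pin shared by dom0 and an HVM guest."""

    def handle(irq: int) -> str:
        # dom0's own drivers get the first chance to claim each interrupt.
        for isr in dom0_isrs:
            if isr(irq):
                return "dom0"
        # Nobody in dom0 claimed it: the pass-through stub is the last
        # (default) entry and forwards it to the HVM's virtual INTx wire.
        passthrough(irq)
        return "hvm"

    return handle


class FakeSataDevice:
    """Toy dom0 device whose ISR only claims the IRQ when it is pending."""

    def __init__(self):
        self.pending = False

    def isr(self, irq: int) -> bool:
        if self.pending:
            self.pending = False
            return True
        return False


sata = FakeSataDevice()
forwarded: List[int] = []
handler = make_shared_irq_handler([sata.isr], forwarded.append)
```

For example, handler(17) returns "hvm" and appends 17 to forwarded while the SATA device is idle; after setting sata.pending = True, the next handler(17) returns "dom0" and forwards nothing. Because the stub claims every otherwise-unclaimed interrupt, dom0 itself does not accumulate unhandled interrupts on that line.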

Tian, Kevin

2007-Jun-03 13:29 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

The order of interrupt injection doesn't actually matter, since you can't
wait to inject into the next domain until the previous one in the chain has
failed to handle it; that would be very inefficient.

To me the unhandled-irq issue (99,900 out of 100,000) is inevitable.
Say an irq is shared between 2 HVM domains, one assigned a high-rate
PCI device like a NIC and the other a low-rate PCI device like UHCI.
It's likely to see over 100,000 interrupts from the NIC while the UHCI
is silent in a given period. Since, from Xen's point of view, there's no
way to know which HVM guest owns a given interrupt instance, the same
number of interrupts will be injected into both HVM domains.

We could force "noirqdebug", but that may not apply to all Linux
versions, or to other OSes on the HVM side.
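The "99,900 out of 100,000" figure comes from Linux's spurious-interrupt detection, which, unless the kernel is booted with noirqdebug, disables an IRQ line when almost none of the interrupts in a 100,000-interrupt window were handled. The class below is a simplified model of that check, loosely following note_interrupt() in kernel/irq/spurious.c (exact details vary by kernel version), applied to the NIC/UHCI example. The class name is hypothetical.

```python
class SpuriousIrqModel:
    """Simplified model of Linux's 'nobody cared' check on one IRQ line."""

    WINDOW = 100_000          # interrupts per evaluation window
    UNHANDLED_LIMIT = 99_900  # more unhandled than this disables the line

    def __init__(self):
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled: bool) -> None:
        if self.disabled:
            return
        if not handled:
            self.unhandled += 1
        self.count += 1
        if self.count < self.WINDOW:
            return
        # End of a window: judge it, then start counting afresh.
        self.count = 0
        if self.unhandled > self.UNHANDLED_LIMIT:
            self.disabled = True  # the kernel would disable the IRQ here
        self.unhandled = 0


# Every NIC interrupt is also injected into the guest that owns the UHCI.
nic_guest = SpuriousIrqModel()
uhci_guest = SpuriousIrqModel()
for _ in range(100_000):
    nic_guest.note_interrupt(handled=True)    # the NIC driver claims each one
    uhci_guest.note_interrupt(handled=False)  # the silent UHCI driver claims none
# nic_guest.disabled is False; uhci_guest.disabled is True
```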

Actually there are more tricky things to consider for irq sharing among 
domains. For example:
	- A driver in one HVM domain may leave its device in the interrupt-asserted
state while the related virtual wire stays masked (e.g. after an unclean
driver unload).

	- When the OS first masks the PIC entry and then unmasks the IOAPIC entry,
an interrupt may occur in between (and the IOAPIC doesn't latch it as pending
while masked), so the pending indication in the PIC is missed.

	If such a rare case does occur, it can block the other domain sharing the
same irq. This seriously breaks the isolation between domains, and it is a
common issue whatever approach we use to share irqs.

	Maybe a better way is to use MSI instead, and to avoid the above irq-sharing
issues from the management tool side: for example, avoid giving devices with
the same irq to different domains when MSI can't be used...

Thanks,
Kevin
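The management-tool suggestion above could look roughly like the following check, run before assigning devices. Everything here is a hypothetical sketch: PciDevice, its msi_capable flag (a simplification, since usable MSI also depends on the chipset and the guest OS) and conflicting_gsis are not part of xm or any other Xen tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PciDevice:
    bdf: str           # bus:device.function, e.g. "02:00.0"
    gsi: int           # legacy INTx line the device is routed to
    msi_capable: bool  # simplification: MSI usable end to end for this device


def conflicting_gsis(assignment: Dict[PciDevice, str]) -> List[int]:
    """
    Return the GSIs whose INTx-only devices would end up in more than one
    domain. MSI-capable devices are skipped because they need not share a line.
    """
    owners = defaultdict(set)
    for dev, domain in assignment.items():
        if not dev.msi_capable:
            owners[dev.gsi].add(domain)
    return sorted(gsi for gsi, domains in owners.items() if len(domains) > 1)


plan = {
    PciDevice("02:00.0", gsi=16, msi_capable=False): "hvm1",  # NIC
    PciDevice("00:1f.2", gsi=16, msi_capable=False): "dom0",  # SATA
    PciDevice("00:1d.0", gsi=17, msi_capable=False): "hvm2",  # UHCI
    PciDevice("03:00.0", gsi=17, msi_capable=True): "dom0",   # MSI-capable NIC
}
# conflicting_gsis(plan) == [16]: refuse or re-plan before assigning.
```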

Keir Fraser

2007-Jun-03 13:35 UTC

Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

On 3/6/07 14:29, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> Maybe a better way is to use MSI instead, and to avoid the above irq-sharing
> issues from the management tool side: for example, avoid giving devices
> with the same irq to different domains when MSI can't be used...
Yes, any sharing of legacy INTx lines is inherently unrobust. There's not
much we can do about that, but many devices, chipsets and OSes do support
MSI these days.

 -- Keir




Guy Zana

2007-Jun-03 14:35 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com] 
> Sent: Sunday, June 03, 2007 4:30 PM
> To: Guy Zana; Kay, Allen M; Keir Fraser; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device 
> assignment using vt-d
> 
> The order of interrupt injection doesn't actually matter, since
> you can't wait to inject into the next domain until the previous
> one in the chain has failed to handle it; that would be very
> inefficient.
> 
> To me the unhandled-irq issue (99,900 out of 100,000) is inevitable.
> Say an irq is shared between 2 HVM domains, one assigned a
> high-rate PCI device like a NIC and the other a low-rate PCI
> device like UHCI. It's likely to see over 100,000 interrupts from
> the NIC while the UHCI is silent in a given period. Since, from
> Xen's point of view, there's no way to know which HVM guest owns
> a given interrupt instance, the same number of interrupts will be
> injected into both HVM domains.
Sharing an irq between two HVMs is surely not something we would want to handle right now.
> 
> We could force "noirqdebug", but that may not apply to all
> Linux versions, or to other OSes on the HVM side.
> 
Maybe we could add a PV dummy driver to the HVM guest that registers on that
IRQ and solves the 99,900:100,000 problem?
I think it is more robust to set the HVM assert/deassert state only after
giving dom0 a chance to handle the IRQ.

> Actually there are more tricky things to consider for irq 
> sharing among domains. For example:
> 	- A driver in one HVM domain may leave its device in the
> interrupt-asserted state while the related virtual wire stays
> masked (e.g. after an unclean driver unload).
> 
> 	- When the OS first masks the PIC entry and then unmasks the
> IOAPIC entry, an interrupt may occur in between (and the IOAPIC
> doesn't latch it as pending while masked), so the pending
> indication in the PIC is missed.
> 
> 	If such a rare case does occur, it can block the other domain
> sharing the same irq. This seriously breaks the isolation between
> domains, and it is a common issue whatever approach we use to
> share irqs.
> 
> 	Maybe a better way is to use MSI instead, and to avoid the
> above irq-sharing issues from the management tool side: for
> example, avoid giving devices with the same irq to different
> domains when MSI can't be used...
We can also disable the driver in dom0 :-)
> 
> Thanks,
> Kevin
> 
> >-----Original Message-----
> >From: Guy Zana [mailto:guy@neocleus.com]
> >Sent: 2007年6月3日 17:59
> >To: Kay, Allen M; Keir Fraser; xen-devel@lists.xensource.com
> >Cc: Tian, Kevin
> >Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device 
> assignment using 
> >vt-d
> >
> >Sort of... Our method might doubles the number of interrupts if both 
> >devices are connected to the same pin, but since all devices are OR 
> >wired, you might even "save" *physical* interrupts from 
> happening -> I 
> >guess that we''ll get a decisive answer only after performing
some
> >profiling.
> >
> >Our method will not work "out of the box" if you''re
trying to use it
> >when sharing a pin between dom0 and an HVM.
> >Consider the following scenario:
> >
> >HVM:
> >               _____________________
> >        ____|
> >|___________________
> >
> >Dom0:
> >
> >____________________________________
> >        __________|
> >
> >Phys Line:
> >                __________________________________________
> >        ____|
> >
> >
> >        A    B         C                       D
> >
> >
> >At point B you changed the polarity. At points C and D you won't get
> >any interrupts because of the polarity change, and the device that is
> >allocated to dom0 will keep its line asserted until the dom0 driver
> >handles the interrupt, but the driver won't get a chance to do so.
> >Moreover, the HVM vline will still be kept asserted.
> >
> >We are currently modeling the problem; it seems to be a complicated
> >one, regardless of the polarity change. For instance, an HVM guest
> >running Linux will die if 99,900 out of 100,000 interrupts are not
> >handled.
> >
> >From a logical point of view, the race above can be solved like this:
> >we can hold a virtual assertion line for _dom0_ (updated by the
> >interrupts that arrive as a result of the polarity changes) and
> >concatenate the HVM's ISR chain with dom0's ISR chain. Dom0 must be
> >the first to try to handle the interrupt (because of the 99,900 out
> >of 100,000 problem), so I guess that pass-through shared interrupts
> >should be handled as the last (default) function in dom0's ISR chain.
> >
> >How do you plan to provide interrupt sharing with your method exactly?
> >Please provide your thoughts.
> >
> >Thanks,
> >Guy.
> >
> >> -----Original Message-----
> >> From: Kay, Allen M [mailto:allen.m.kay@intel.com]
> >> Sent: Sunday, June 03, 2007 11:29 AM
> >> To: Guy Zana; Keir Fraser; xen-devel@lists.xensource.com
> >> Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
> >>
> >> Based on my understanding of the Neocleus' passthrough patch, it seems
> >> all devices sharing that interrupt will get double the number of
> >> interrupts. This means that if an interrupt is shared between a NIC
> >> used by an HVM guest and a SATA device used by dom0, the SATA driver
> >> in dom0 will also get twice the number of interrupts. Am I correct?
> >>
> >> Allen
> >>
> >> >-----Original Message-----
> >> >From: Guy Zana [mailto:guy@neocleus.com]
> >> >Sent: Wednesday, May 30, 2007 11:05 PM
> >> >To: Keir Fraser; Kay, Allen M; xen-devel@lists.xensource.com
> >> >Subject: RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: xen-devel-bounces@lists.xensource.com
> >> >> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
> >> >> Sent: Wednesday, May 30, 2007 10:56 PM
> >> >> To: Kay, Allen M; xen-devel@lists.xensource.com
> >> >> Subject: Re: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d
> >> >>
> >> >
> >> >>
> >> >> Actually, I also know there are some other patches coming down the
> >> >> pipeline to do PCI passthrough to HVM guests without the need for
> >> >> hardware support (of course it is not so general; in particular it
> >> >> will only work for one special HVM guest).
> >> >> However, they deal with this interrupt issue quite cunningly, by
> >> >> inverting the interrupt polarity so that they get interrupts on both
> >> >> +ve and -ve edges of the INTx line. This allows the virtual interrupt
> >> >> wire to be 'wiggled' precisely according to the behaviour of the
> >> >> physical interrupt wire.
> >> >> This is rather nice, although of course it does double the interrupt
> >> >> rate, which is not so great but perhaps acceptable for the kind of
> >> >> low-interrupt-rate devices that most people would want to hand off
> >> >> to an HVM guest.
> >> >>
> >> >
> >> >Just FYI.
> >> >
> >> >Neocleus' pass-through patches perform the "change polarity" trick.
> >> >In changing the polarity, our motivation was to reflect the allocated
> >> >device's assertion state to the HVM AS IS.
> >> >
> >> >Regarding performance: using a USB 2.0 storage device (working with
> >> >DMA), we compared a huge file copy in pass-through mode and in native
> >> >mode (on the same OS). The time differences were negligible, so I'm
> >> >not sure yet about the impact of doubling the number of interrupts.
> >> >The advantage of changing the polarity is its simplicity.
> >> >
> >> >Anyway, we'll release some patches during the day so you can give
> >> >your comments.
> >> >
> >> >Thanks,
> >> >Guy.
> >> >
> >>
> 
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Kay, Allen M

2007-Jun-04 22:56 UTC

RE: [Xen-devel] [VTD][patch 0/5] HVM device assignment using vt-d

>
>How do you plan to provide interrupt sharing with your method exactly?
>Please provide your thoughts.
>
>Thanks,
>Guy.
>
I agree with others' comments that MSI would be the ultimate solution.
In the short term, a simple stopgap measure might be to check the
interrupt status bit in the device's PCI config space before delivering
the interrupt to the guest. I found that interrupt sharing with a
low-interrupt-rate device, such as a non-storage USB device, is not much
of a problem.

It is a problem when sharing the interrupt with high-frequency interrupt
devices such as SATA. The problem I encountered was that Xen tried to
deliver interrupts to the guest before the guest was fully ready to
accept them.

Allen
