thr3ads.net - Xen devel - [Xen-devel] [PATCH 0/5] VT-d support for PV guests [May 2008]

Espen Skoglund

2008-May-19 20:27 UTC

[Xen-devel] [PATCH 0/5] VT-d support for PV guests

Hi,

I've added some preliminary support for VT-d for paravirtualized
guests.  This must be enabled using an 'iommu_pv' boot parameter
(disabled by default).

I've added some Python bindings to allow xend to assign PCI devices to
IOMMU for PV guests.  For HVM guests this is handled in ioemu.  Not
sure if it makes sense to handle both cases in one place.

The changes currently hook into get_page_type() in xen/arch/x86/mm.c
to map/unmap IOMMU pages when the page types change.  This might
not be the appropriate place to hook these calls.

The patches I've added are as follows:

   xen-vtd-unmap.patch --- Make the VT-d iommu_unmap_page() code
       actually do something close to useful.

   xen-ptab-dump.patch --- There's no point in using 'current' when an
       IOMMU page fault is raised.  Also, add some page type
       statistics for DomPage debug output.

   xen-iommu-pv.patch --- Add support for iommu_pv_enable boot
       parameter and IOMMU assignment of PCI devices to guests.

   xen-iommu-pv-mappings.patch --- Hook iommu_{un}map_page() calls
       into various Xen locations.

   xen-pv-assign.patch --- Allow PCI devices to be assigned to IOMMU
       PV guests from xend.


While adding the PV VT-d support I also noticed that there is a lot of
duplicated and unused code dealing with VT-d.  This really needs to be
cleaned up properly.  I don't know if the original VT-d submitters
already have made plans for doing this.


	eSk

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Kay, Allen M

2008-May-19 21:50 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

>
>While adding the PV VT-d support I also noticed that there is a lot of
>duplicated and unused code dealing with VT-d.  This really needs to be
>cleaned up properly.  I don't know if the original VT-d submitters
>already have made plans for doing this.
>
>
Other than the hardware wait loops, which are something we need to clean
up, can you give some examples of duplicated/unused code in the vt-d
directory?

Allen

Keir Fraser

2008-May-20 07:39 UTC

Re: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

On 19/5/08 21:27, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
> I've added some preliminary support for VT-d for paravirtualized
> guests.  This must be enabled using an 'iommu_pv' boot parameter
> (disabled by default).
> 
> I've added some Python bindings to allow xend to assign PCI devices to
> IOMMU for PV guests.  For HVM guests this is handled in ioemu.  Not
> sure if it makes sense to handle both cases in one place.
> 
> The changes currently hook into get_page_type() in xen/arch/x86/mm.c
> to map/unmap IOMMU pages when the page types change.  This might
> not be the appropriate place to hook these calls.

What functionality does this patchset enable, Espen? Is this a security
enhancement (isolation/containment) for PV guests with direct hardware
access? For example: can access all its own memory except that which has
pagetable/GDT type, and only foreign memory which is granted to it?

Is there a good reason to hide this behind a boot option?

 Thanks,
 Keir



Yang, Xiaowei

2008-May-20 07:52 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

>
>I've added some preliminary support for VT-d for paravirtualized
>guests.  This must be enabled using an 'iommu_pv' boot parameter
>(disabled by default).
>
>I've added some Python bindings to allow xend to assign PCI devices to
>IOMMU for PV guests.  For HVM guests this is handled in ioemu.  Not
>sure if it makes sense to handle both cases in one place.
>
>The changes currently hook into get_page_type() in xen/arch/x86/mm.c
>to map/unmap IOMMU pages when the page types change.  This might
>not be the appropriate place to hook these calls.
>
>The patches I've added are as follows:
>
>   xen-vtd-unmap.patch --- Make the VT-d iommu_unmap_page() code
>       actually do something close to useful.
>
>   xen-ptab-dump.patch --- There's no point in using 'current' when an
>       IOMMU page fault is raised.  Also, add some page type
>       statistics for DomPage debug output.
>
>   xen-iommu-pv.patch --- Add support for iommu_pv_enable boot
>       parameter and IOMMU assignment of PCI devices to guests.
>
>   xen-iommu-pv-mappings.patch --- Hook iommu_{un}map_page() calls
>       into various Xen locations.
>

Espen,
The patches look good to me, with some comments:

- For the occasions when the P2M is changed, the iommu_{un}map_page()
hooks can be added more cleanly. Only the hooks inside
guest_physmap_add/remove_page() are necessary. The hooks in
populate_physmap() and memory_exchange() can be omitted with some small
code rearrangement, such as removing the if(paging_mode_translate(d))
check before calling guest_physmap_add_page().

- gnttab_map/unmap_grant_ref() need to be hooked as well. There are no
P2M changes at that time, since the guest PT is updated directly. The
mapped pages can also be used for DMA by backend drivers.

- dom0 could be treated the same as other PV domains with regard to
VT-d PT updating. Unfortunately, it needs some special care. All
devices are assigned to it by default and it usually owns most of
them. It could call iommu_{un}map_page() very frequently while it
serves other domains' I/O requests, which would bring a performance
penalty and CPU overhead.

Thanks,
Xiaowei

Yang, Xiaowei

2008-May-20 07:58 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

>-----Original Message-----
>From: Yang, Xiaowei
>Sent: Tuesday, May 20, 2008 3:54 PM
>To: Yang, Xiaowei
>Subject: FW: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
>
>
>
>Thanks,
>Xiaowei
>________________________________________
>From: Yang Xiaowei [mailto:xiaowei.yang@gmail.com]
>Sent: Tuesday, May 20, 2008 3:53 PM
>To: Yang, Xiaowei
>Subject: Fwd: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
>
>
>---------- Forwarded message ----------
>From: Keir Fraser <keir.fraser@eu.citrix.com>
>Date: Tue, May 20, 2008 at 3:39 PM
>Subject: Re: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
>To: Espen Skoglund <espen.skoglund@netronome.com>,
>xen-devel@lists.xensource.com
>
>On 19/5/08 21:27, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
>
>> I've added some preliminary support for VT-d for paravirtualized
>> guests.  This must be enabled using an 'iommu_pv' boot parameter
>> (disabled by default).
>>
>> I've added some Python bindings to allow xend to assign PCI devices to
>> IOMMU for PV guests.  For HVM guests this is handled in ioemu.  Not
>> sure if it makes sense to handle both cases in one place.
>>
>> The changes currently hook into get_page_type() in xen/arch/x86/mm.c
>> to map/unmap IOMMU pages when the page types change.  This might
>> not be the appropriate place to hook these calls.
>
>What functionality does this patchset enable, Espen? Is this a security
>enhancement (isolation/containment) for PV guests with direct hardware
>access? For example: can access all its own memory except that which has
>pagetable/GDT type, and only foreign memory which is granted to it?

Yes, as I understand it. VT-d support for PV guests can prevent one domain
from accessing other domains' pages without permission.

Thanks,
Xiaowei



Espen Skoglund

2008-May-20 10:16 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

[Allen M Kay]
>> While adding the PV VT-d support I also noticed that there is a lot
>> of duplicated and unused code dealing with VT-d.  This really needs
>> to be cleaned up properly.  I don't know if the original VT-d
>> submitters already have made plans for doing this.

> Other than the hardware wait loops which is something we need to
> clean up, can you give some example of duplicated/unused code in
> vt-d directory?

From the top of my head:

    iommu_set/free_pgd()  in xen/drivers/passthrough/vtd/x86/vtd.c
    iommu_page_unmapping() in xen/drivers/passthrough/vtd/iommu.c

There's probably more that I can't think of right now.

	eSk

Espen Skoglund

2008-May-20 10:43 UTC

Re: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

[Keir Fraser]
> On 19/5/08 21:27, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
>> I've added some preliminary support for VT-d for paravirtualized
>> guests.  This must be enabled using an 'iommu_pv' boot parameter
>> (disabled by default).
>> 
>> I've added some Python bindings to allow xend to assign PCI devices to
>> IOMMU for PV guests.  For HVM guests this is handled in ioemu.  Not
>> sure if it makes sense to handle both cases in one place.
>> 
>> The changes currently hook into get_page_type() in xen/arch/x86/mm.c
>> to map/unmap IOMMU pages when the page types change.  This might
>> not be the appropriate place to hook these calls.

> What functionality does this patchset enable, Espen? Is this a
> security enhancement (isolation/containment) for PV guests with
> direct hardware access? For example: can access all its own memory
> except that which has pagetable/GDT type, and only foreign memory
> which is granted to it?

> Is there a good reason to hide this behind a boot option?

The patchset does, as you guessed, enable isolation for PV guests with
direct hardware access.  If you assign a PCI device to a guest you are
guaranteed that the assigned device can't access the memory of other
guests or Xen itself.  The patchset allows the device to access all of
the guest's own memory to which the guest has write access, and memory
which is granted to the guest.

The only reason for making it a boot option was to allow for the old
behaviour (i.e., complete access) to be the default behaviour until
people get more confident with the patches.

	eSk



Keir Fraser

2008-May-20 11:11 UTC

Re: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

On 20/5/08 11:43, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
>> Is there a good reason to hide this behind a boot option?
> 
> The patchset does, as you guessed, enable isolation for PV guests with
> direct hardware access.  If you assign a PCI device to a guest you are
> guaranteed that the assigned device can't access the memory of other
> guests or Xen itself.  The patchset allows the device to access all
> its own memory which it has write access to, and memory which is
> granted to it.
> 
> The only reason for making it a boot option was to allow for the old
> behaviour (i.e., complete access) to be the default behaviour until
> people get more confident with the patches.

Okay. Well it seems that there have been some comments to be dealt with in
another iteration of these patches. But apart from that I'm happy in
principle to apply these patches.

 -- Keir




Han, Weidong

2008-May-20 12:50 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

Espen Skoglund wrote:
> [Allen M Kay]
>>> While adding the PV VT-d support I also noticed that there is a lot
>>> of duplicated and unused code dealing with VT-d.  This really needs
>>> to be cleaned up properly.  I don't know if the original VT-d
>>> submitters already have made plans for doing this.
> 
>> Other than the hardware wait loops which is something we need to
>> clean up, can you give some example of duplicated/unused code in
>> vt-d directory?
> 
> From the top of my head:
> 
>     iommu_set/free_pgd()  in xen/drivers/passthrough/vtd/x86/vtd.c
>     iommu_page_unmapping() in xen/drivers/passthrough/vtd/iommu.c
> 

This code was used at one point during VT-d development. It is indeed
unused now. We will clean it up later. Thanks!

Randy (Weidong)

> There's probably more that I can't think of right now.
> 
> 	eSk

Ian Pratt

2008-May-20 13:21 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

> The patchset does, as you guessed, enable isolation for PV guests with
> direct hardware access.  If you assign a PCI device to a guest you are
> guaranteed that the assigned device can't access the memory of other
> guests or Xen itself.  The patchset allows the device to access all
> its own memory which it has write access to, and memory which is
> granted to it.

Not that I particularly think it matters, but does the patch configure
the IOMMU to distinguish between read-only and read-write access to a
guest's own memory or granted memory? If not, we should at least clearly
document that we're being a little more permissive.

Have you got any with-and-without performance results with a decent
high-throughput device (e.g. an HBA or 10Gb/s NIC)?

It would be good if you could provide a bit more detail on when the
patch populates IOMMU entries, and how it keeps them in sync. For
example, does the IOMMU map all the guest's memory, or just that which
will soon be the subject of a DMA? How synchronous is the patch in
removing mappings, e.g. due to page type changes (pagetable pages,
balloon driver) or due to unmapping grants?

There's been a lot of discussion at various Xen summits about different
IOMMU optimizations (e.g. for IBM Summit, Power etc) and I'd like to
understand exactly what tradeoffs your implementation makes. Anyhow,
good stuff, thanks!

Thanks,
Ian
 
 



Espen Skoglund

2008-May-20 14:10 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

[Ian Pratt]
>> The patchset does, as you guessed, enable isolation for PV guests
>> with direct hardware access.  If you assign a PCI device to a guest
>> you are guaranteed that the assigned device can't access the memory
>> of other guests or Xen itself.  The patchset allows the device to
>> access all its own memory which it has write access to, and memory
>> which is granted to it.

> Not that I particularly think it matters, but does the patch
> configure the IOMMU to distinguish between read-only and read-write
> access to a guest's own memory or granted memory? If not, we should
> at least clearly document that we're being a little more permissive.

The idea was indeed to distinguish between these properly.  However,
the current VT-d code only handles read-write or no-access.  For PV
guests I've made it so that page tables and such are mapped with no
access in the IOMMU.  This is a bit more restrictive than necessary,
but it shouldn't really matter for the common usage scenarios.

Anyhow, read-only access can indeed be supported for VT-d.  I just
wanted to get basic PV guest support in there first.  Also, I'm not
familiar with AMD's IOMMU, but I would guess that it also supports
read-only access.


> Have you got any with-and-without performance results with a decent
> high-throughput device (e.g. a HBA or 10Gb/s NIC)?

I don't have a 10GbE NIC in my VT-d enabled machine right now, so I
can't test it.  We have however tried with a 10GbE NIC running in dom0
with VT-d enabled, and there was as far as I remember no performance
hit.  Of course, any performance degradation will largely depend on
the networking memory footprint and the size of the IOTLB.


> It would be good if you could provide a bit more detail on when the
> patch populates IOMMU entries, and how it keeps them in sync. For
> example, does the IOMMU map all the guest's memory, or just that
> which will soon be the subject of a DMA? How synchronous is the
> patch in removing mappings, e.g. due to page type changes (pagetable
> pages, balloon driver) or due to unmapping grants?

All writable memory is initially mapped in the IOMMU.  Page type
changes are also reflected there.  In general all maps and unmaps to a
domain are synced with the IOMMU.  According to the feedback I got I
apparently missed some places, though.  Will look into this and fix
it.
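
The policy described above (writable frames are mapped, and a page type change updates the IOMMU in step) can be illustrated with a small toy model. This is only a sketch: `ToyDomain`, `PageType` and `set_page_type` are hypothetical names made up for illustration, not code from the patches, and the real hooks live in Xen's C code around get_page_type().

```python
from enum import Enum, auto


class PageType(Enum):
    NONE = auto()
    WRITABLE = auto()
    PAGETABLE = auto()  # stands in for page tables, GDT frames and the like


class ToyDomain:
    """Hypothetical model: tracks page types and the frames a device may DMA to."""

    def __init__(self, frames):
        self.page_type = {}
        self.iommu_mapped = set()
        # All writable memory is mapped in the IOMMU initially.
        for frame in frames:
            self.set_page_type(frame, PageType.WRITABLE)

    def set_page_type(self, frame, new_type):
        old_type = self.page_type.get(frame, PageType.NONE)
        self.page_type[frame] = new_type
        if new_type is PageType.WRITABLE and old_type is not PageType.WRITABLE:
            self.iommu_mapped.add(frame)      # models iommu_map_page()
        elif old_type is PageType.WRITABLE and new_type is not PageType.WRITABLE:
            self.iommu_mapped.discard(frame)  # models iommu_unmap_page()


dom = ToyDomain(frames=range(4))
dom.set_page_type(2, PageType.PAGETABLE)  # frame 2 becomes a page table
print(sorted(dom.iommu_mapped))           # prints [0, 1, 3]
```

The model deliberately ignores read-only mappings, matching the read-write or no-access limitation mentioned earlier in this message.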

It's clear that performance will pretty much suck if you do frequent
updates in grant tables, but the whole idea of having passthrough
access for NICs is to avoid this netfront/netback data plane scheme
altogether.  This leaves you with grant table updates for block device
access.  I don't know what the expected update frequency is for that
one.

It must be noted that reflecting grant table updates in the IOMMU is
required for correctness.  The alternative --- which is indeed
possible --- is to catch DMA faults to such memory regions and somehow
notify the driver to, e.g., drop packets or retry the DMA transaction
once the IOMMU mapping has been established.


> There's been a lot of discussion at various xen summits about
> different IOMMU optimizations (e.g. for IBM Summit, Power etc) and
> I'd like to understand exactly what tradeoffs your implementation
> makes. Anyhow, good stuff, thanks!

I can't say I know much about those other IOMMUs, but as far as I know
they are quite limited in that they only support a fixed number of
mappings and cannot differentiate between different DMA sources
(i.e., PCI devices).  Someone please correct me if I'm wrong here.  In
short, "my" implementation doesn't actually make many tradeoffs.  It's
simply based on the VT-d implementation by the Intel folks.  It
assumes a more fully fledged IOMMU that can have different mappings
for different devices.

	eSk



Espen Skoglund

2008-May-20 14:36 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

[Xiaowei Yang]
> Espen,
> The patches look good to me with some comments:

> - For the occasions when P2M is changed, the hooks of
> iommu_{un}map_page() can be added cleaner. Only the hooks inside
> guest_physmap_add/remove_page() are necessary. The hooks in
> populate_physmap() and memory_exchange() can be omitted by some
> small code rearrangement like removing if(paging_mode_translate(d))
> before calling guest_physmap_add_page().

Yes.  I considered this as an option as well, but ended up with the
current approach.  Your suggestion is probably cleaner, so I'll switch
over to doing that.

> - gnttab_map/unmap_grant_ref() need to be hooked also. There are no
> P2M changes at that time while the guest PT is updated directly. The
> mapped pages can also be used for DMA by backend drivers.

Thanks.  Overlooked that one.  Only caught the gnttab_transfer().

> - dom0 can be treated as the same as other PV domains with regard to
> VTd PT updating. Unfortunately, it need some special care. All of
> devices are assigned to it by default and usually it owns the most
> of devices.  iommu_{un}map_page() could be called very frequently by
> it while it serves other domains IO requests. It will bring
> performance penalty and CPU overhead.

dom0 should not need to do any VT-d page table updating once it has
been set up, so marking it as need_iommu() should be unnecessary.
Also, if passthrough mode is supported in VT-d then dom0 does not need
to have VT-d page tables at all.  I think setting its VT-d tables up
to have complete access at startup and leaving them that way is
perfectly fine.


Thanks for feedback.  Will repost once I've incorporated all the
comments.

	eSk




Ian Pratt

2008-May-20 15:33 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

> > Not that I particularly think it matters, but does the patch
> > configure the IOMMU to distinguish between read-only and read-write
> > access to a guest's own memory or granted memory? If not, we should
> > at least clearly document that we're being a little more permissive.
> 
> The idea was indeed to distinguish between this properly.  However,
> the current VT-d code only handles read-write or no-access.  For PV
> guests I've made it so that page tables and such are mapped with no
> access in the IOMMU.  This is a bit more restrictive than necessary,
> but it shouldn't really matter for the common usage scenarios.
> 
> Anyhow, read-only access can indeed be supported for VT-d.  I just
> wanted to get basic PV guest support in there first.  Also, I'm not
> familiar with AMD's IOMMU, but I would guess that it also supports
> read-only access.

OK, hopefully someone can fill in the missing bits in the VTd support.

> > Have you got any with-and-without performance results with a decent
> > high-throughput device (e.g. a HBA or 10Gb/s NIC)?
> 
> I don't have a 10GbE NIC in my VT-d enabled machine right now, so I
> can't test it.  We have however tried with a 10GbE NIC running in dom0
> with VT-d enabled, and there was as far as I remember no performance
> hit.  Of course, any performance degradation will largely depend on
> the networking memory footprint and the size of the IOTLB.

Indeed, that's why I'd like to see some measurements, both for dom0 IO,
and also when dom0 is doing IO on behalf of another domain.

I'm also interested to understand what the overhead of page type change
/ balloon operations is. Do you synchronously invalidate the entries in
the IOMMU? How slow is that?

> > It would be good if you could provide a bit more detail on when the
> > patch populates IOMMU entries, and how it keeps them in sync. For
> > example, does the IOMMU map all the guest's memory, or just that
> > which will soon be the subject of a DMA? How synchronous is the
> > patch in removing mappings, e.g. due to page type changes (pagetable
> > pages, balloon driver) or due to unmapping grants?
> 
> All writable memory is initially mapped in the IOMMU.  Page type
> changes are also reflected there.  In general all maps and unmaps to a
> domain are synced with the IOMMU.  According to the feedback I got I
> apparently missed some places, though.  Will look into this and fix
> it.

Is "demotion" of access handled synchronously, or do you have some
tricks to mitigate the synchronization?

> It's clear that performance will pretty much suck if you do frequent
> updates in grant tables, but the whole idea of having passthrough
> access for NICs is to avoid this netfront/netback data plane scheme
> altogether.  This leaves you with grant table updates for block device
> access.  I don't know what the expected update frequency is for that
> one.

I don't entirely buy this -- I think we need to make grant map/unmaps
fast too. We've discussed schemes to make this more efficient by doing
the IOMMU operations at grant map time (where they can be easily
batched) rather than at dma_map time. We've talked about using a
kmap-style area of physical address space to cycle the mappings through
to avoid having to do so many synchronous invalidates (at the expense of
allowing a driver domain to be able to DMA to a page for a little longer
than it strictly ought to).


Ian


Tian, Kevin

2008-May-21 00:36 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

>From: Ian Pratt
>Sent: May 20, 2008 23:34
>> > It would be good if you could provide a bit more detail on when the
>> > patch populates IOMMU entries, and how it keeps them in sync. For
>> > example, does the IOMMU map all the guest's memory, or just that
>> > which will soon be the subject of a DMA? How synchronous is the
>> > patch in removing mappings, e.g. due to page type changes (pagetable
>> > pages, balloon driver) or due to unmapping grants?
>> 
>> All writable memory is initially mapped in the IOMMU.  Page type
>> changes are also reflected there.  In general all maps and unmaps to a
>> domain are synced with the IOMMU.  According to the feedback I got I
>> apparently missed some places, though.  Will look into this and fix
>> it.
>
>Is "demotion" of access handled synchronously, or do you have some
>tricks to mitigate the synchronization?

All changes need to be handled synchronously, because a DMA request is
not restartable: a VT-d fault is only delivered as an asynchronous event
notification. The hardware is designed such that all expected
permissions have to be in place before the device actually issues an
access request.
>
>> It's clear that performance will pretty much suck if you do frequent
>> updates in grant tables, but the whole idea of having passthrough
>> access for NICs is to avoid this netfront/netback data plane scheme
>> altogether.  This leaves you with grant table updates for block device
>> access.  I don't know what the expected update frequency is for that
>> one.
>
>I don't entirely buy this -- I think we need to make grant map/unmaps
>fast too. We've discussed schemes to make this more efficient by doing
>the IOMMU operations at grant map time (where they can be easily
>batched) rather than at dma_map time. We've talked about using a

Agreed.

>kmap-style area of physical address space to cycle the mappings through
>to avoid having to do so many synchronous invalidates (at the expense of
>allowing a driver domain to be able to DMA to a page for a little longer
>than it strictly ought to).

Could you elaborate a bit on how a kmap-style area helps here? The key
point is whether the frequency of p2m mapping can be reduced...

Thanks,
Kevin 

Ian Pratt

2008-May-21 05:05 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

> >Is "demotion" of access handled synchronously, or do you have some
> >tricks to mitigate the synchronization?
> 
> All changes need be handled synchronously, as DMA request is not
> restartable with VT-d fault as async event notification. Hardware bits
> are designed in such way that all expected permission controls have
> to exist before device actually issues access request.

You are talking about 'promotion' (adding more permissions). Demotion
requires flushing entries in the TLB and is typically more expensive,
hence the desire to 'batch' the synchronization.

> >kmap-style area of physical address space to cycle the mappings through
> >to avoid having to do so many synchronous invalidates (at the expense of
> >allowing a driver domain to be able to DMA to a page for a little longer
> >than it strictly ought to).
> 
> Could you elaborate a bit how kmap-style area helps here? The key
> point is whether frequency of p2m mapping can be reduced...

A window of guest physical address space is created that is used to
create mappings to granted pages. The next available free slot is used
when creating a mapping. When the end of the window is reached, flush
the IOMMU. (Actually, you can do better by issuing a flush at the
halfway point and then synchronizing against the flush at the end of the
window, effectively double buffering).

This typically avoids the churn that would happen with grant mappings
where the same guest physical address region is used to map different
pages (requiring a synchronous flush).
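
A minimal sketch of the window scheme described above, under simplifying assumptions: the class and method names (`GrantWindow`, `map_grant`, `unmap_grant`, `flush_iotlb`) are hypothetical, the flush is just a counter standing in for a real IOTLB invalidation, and the double-buffered halfway flush is left out. The point is only that slots are handed out sequentially and the flush happens once per trip around the window rather than once per unmap.

```python
class GrantWindow:
    """Toy model of a kmap-style window of guest-physical slots."""

    def __init__(self, base_gfn, nr_slots):
        self.base_gfn = base_gfn
        self.slots = [None] * nr_slots  # slot index -> machine frame, or None
        self.next_slot = 0
        self.flushes = 0

    def flush_iotlb(self):
        # Stand-in for a real (expensive) IOTLB invalidation.
        self.flushes += 1

    def map_grant(self, mfn):
        if self.next_slot == len(self.slots):
            # End of window: stale translations for the slots about to be
            # reused may still be cached, so flush once and wrap around.
            self.flush_iotlb()
            self.next_slot = 0
        slot = self.next_slot
        if self.slots[slot] is not None:
            # Simplification: a real scheme would skip or wait for busy slots.
            raise RuntimeError("slot still in use; window too small")
        self.slots[slot] = mfn
        self.next_slot += 1
        return self.base_gfn + slot

    def unmap_grant(self, gfn):
        # Clear the entry but do not flush: the slot is not reused until
        # the window wraps, and the wrap flushes first.
        self.slots[gfn - self.base_gfn] = None


window = GrantWindow(base_gfn=0x1000, nr_slots=4)
for mfn in range(10):
    gfn = window.map_grant(mfn)
    window.unmap_grant(gfn)
print(window.flushes)  # prints 2, versus 10 with a flush on every unmap
```

As noted in the message, the cost of this design is that a stale translation can stay usable by the device until the next flush.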

Ian



Tian, Kevin

2008-May-21 05:16 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

>From: Ian Pratt [mailto:Ian.Pratt@eu.citrix.com]
>Sent: May 21, 2008 13:05
>> >Is "demotion" of access handled synchronously, or do you have some
>> >tricks to mitigate the synchronization?
>> 
>> All changes need be handled synchronously, as DMA request is not
>> restartable with VT-d fault as async event notification. Hardware bits
>> are designed in such way that all expected permission controls have
>> to exist before device actually issues access request.
>
>You are talking about 'promotion' (adding more permissions). Demotion
>requires flushing entries in the TLB and is typically more expensive,
>hence the desire to 'batch' the synchronization.

So you're talking about the TLB inside the CPU? IMO, both demotion and
promotion require an IOTLB flush. Even for promotion, it's not like an
instruction access, where a fault can be triggered for Xen to promote
lazily and then restart the execution...

Batching IOTLB flushes is a good direction, but it requires some
cooperation from the guest, e.g. the guest driver must not attempt to
use the set of frames that are being changed. So to me it's more like
a change on the guest side to batch grant/m2p requests together.
Otherwise Xen itself doesn't know when a changed mapping will be used
by the guest and thus has to force a flush for each change before
resuming the guest.
>
>> >kmap-style area of physical address space to cycle the mappings through
>> >to avoid having to do so many synchronous invalidates (at the expense of
>> >allowing a driver domain to be able to DMA to a page for a little longer
>> >than it strictly ought to).
>> 
>> Could you elaborate a bit how kmap-style area helps here? The key
>> point is whether frequency of p2m mapping can be reduced...
>
>A window of guest physical address space is created that is used to
>create mappings to granted pages. The next available free slot is used
>when creating a mapping. When the end of the window is reached, flush
>the IOMMU. (Actually, you can do better by issuing a flush at the
>halfway point and then synchronizing against the flush at the end of the
>window, effectively double buffering).
>
>This typically avoids the churn that would happen with grant mappings
>where the same guest physical address region is used to map different
>pages (requiring a synchronous flush).
>

Yep, that's a neat way to keep unflushed guest frames from being used
until the batch flush is done later. :-)

Thanks,
Kevin

Ian Pratt

2008-May-21 05:32 UTC

RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

> >You are talking about 'promotion' (adding more permissions). Demotion
> >requires flushing entries in the TLB and is typically more expensive,
> >hence the desire to 'batch' the synchronization.
> 
> So you're talking about TLB inside CPU? IMO, both demotion/promotion
> requires IOTLB flush. Even for promotion, it's not like instruction
> access to trigger fault for Xen to promote lazily and then restart the
> execution...

The IOMMU TLB doesn't cache not_present entries, hence you don't need to
do a flush when transitioning an entry from not_present to present. Some
IOMMUs will also re-walk the pagetable if they find a TLB entry that is
read-only and the operation is a write, but I can't recall whether VTd
is like this.
> ''batch'' IOTLB flush is good direction, which requires
some cooperation
> from guest, e.g. as long as guest driver doesn''t attempt to use
set of
> frames in changing. So to me it''s more like some change in guest
side
> to batch grant/m2p request together. Or else Xen itself doesn''t
know
> when one changed mapping will be used by guest and thus has to
> force flush for each change before resuming back to guest
For ballooned frames, Xen should be able to put them on a
''pending'' list
awaiting completion of a flush (hence the flush is typically not
synchronous).

Frames that transition to pagetable frames are more problematic. It's
probably better to modify the guest to create a separate quicklist for
pagetable frames, so they get recycled and remain out of the IOMMU until
they return to the free list.

Ian

Yang, Xiaowei

2008-May-21 05:42 UTC


RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

>-----Original Message-----
>From: Espen Skoglund [mailto:espen.skoglund@netronome.com]
>Sent: Tuesday, May 20, 2008 10:36 PM
>To: Yang, Xiaowei
>Cc: Espen Skoglund; xen-devel@lists.xensource.com
>Subject: RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
>
>[Xiaowei Yang]
>> Espen,
>> The patches look good to me with some comments:
>
>> - For the occasions when P2M is changed, the hooks of
>> iommu_{un}map_page() can be added more cleanly. Only the hooks inside
>> guest_physmap_add/remove_page() are necessary. The hooks in
>> populate_physmap() and memory_exchange() can be omitted by some
>> small code rearrangement like removing if(paging_mode_translate(d))
>> before calling guest_physmap_add_page().
>
>Yes.  I considered this as an option as well, but ended up with the
>current approach.  Your suggestion is probably cleaner, so I'll switch
>over to doing that.
>
>
>> - gnttab_map/unmap_grant_ref() need to be hooked also. There are no
>> P2M changes at that time while the guest PT is updated directly. The
>> mapped pages can also be used for DMA by backend drivers.
>
>Thanks.  Overlooked that one.  Only caught the gnttab_transfer().
>
>
>> - dom0 can be treated the same as other PV domains with regard to
>> VT-d PT updating. Unfortunately, it needs some special care. All
>> devices are assigned to it by default and it usually owns most of
>> them.  iommu_{un}map_page() could be called very frequently by it
>> while it serves other domains' I/O requests. That brings a
>> performance penalty and CPU overhead.
>
>dom0 should not need to do any VT-d page table updating once it has
>been set up, so marking it as need_iommu() should be unnecessary.
>Also, if passthrough mode is supported in VT-d then dom0 does not need
>to have VT-d page tables at all.  I think setting its VT-d tables up
>to have complete access at startup and leaving it that way is
>perfectly fine.
>
Currently, [0, max_page] is 1:1 mapped in dom0's VT-d page table, which
gives dom0 the ability to DMA to the whole range of memory, including
critical regions such as the Xen hypervisor. It's a security hole, and
it's still there with passthrough mode. A dynamic VT-d page table for
dom0 can fix it. Hopefully, the overhead will be acceptable with gnttab
map/unmap and other optimizations.

Thanks,
Xiaowei
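
To make the concern concrete, here is a toy comparison of the two approaches. It is only a sketch under stated assumptions: the `owner` dictionary, the "xen" marker for hypervisor frames, and the DynamicDom0Map class with its methods are all hypothetical and do not correspond to Xen data structures. The identity map lets a dom0 device reach every frame, while the dynamic table only contains frames dom0 owns plus those currently granted to it.

```python
def identity_map(max_page):
    """1:1 map of every frame in [0, max_page], as in the current setup."""
    return {frame: frame for frame in range(max_page + 1)}


class DynamicDom0Map:
    """Toy dynamic I/O page table for dom0 (hypothetical model)."""

    def __init__(self, owner, domid=0):
        # owner: dict mapping frame number -> owning domain id; the string
        # "xen" marks hypervisor-owned frames (an assumption of this sketch).
        self.table = {f: f for f, dom in owner.items() if dom == domid}

    def grant_map(self, frame):
        # A foreign frame becomes DMA-reachable only while it is granted.
        self.table[frame] = frame

    def grant_unmap(self, frame):
        # Real code would also need an IOTLB flush before the frame can be
        # considered unreachable; that step is not modelled here.
        self.table.pop(frame, None)

    def can_dma(self, frame):
        return frame in self.table
```

For example, with `owner = {0: "xen", 1: "xen", 2: 0, 3: 0, 4: 1}`, the identity map contains frames 0 and 1, whereas `DynamicDom0Map(owner).can_dma(0)` is false and frame 4 is reachable only between `grant_map(4)` and `grant_unmap(4)`.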

Tian, Kevin

2008-May-21 05:56 UTC


RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

>From: Ian Pratt [mailto:Ian.Pratt@eu.citrix.com] 
>Sent: May 21, 2008 13:32
>
>The IOMMU TLB doesn't cache not_present entries, hence you don't need
>to do a flush when transitioning an entry from not_present to present.
>Some IOMMUs will also re-walk the pagetable if they find a TLB entry
>that is read-only and the operation is a write, but I can't recall
>whether VT-d is like this.

You're right, though the VT-d spec defines a capability bit to indicate
whether VT-d may cache non-present entries or not. In reality it
doesn't make sense to do that.

For promotion from non-present to present, no flush is required.
But for promotion from read-only to read-write, I guess a flush has to
be forced. Not sure whether any grant entry currently switches RO/RW
permission on demand.
>
>> 'batch' IOTLB flush is good direction, which requires some cooperation
>> from guest, e.g. as long as guest driver doesn't attempt to use set of
>> frames in changing. So to me it's more like some change in guest side
>> to batch grant/m2p request together. Or else Xen itself doesn't know
>> when one changed mapping will be used by guest and thus has to
>> force flush for each change before resuming back to guest
>
>For ballooned frames, Xen should be able to put them on a 'pending'
>list awaiting completion of a flush (hence the flush is typically not
>synchronous).

It seems so. Just one security concern: it's possible to have decreased
frames allocated to another VM before completion of the async flush. In
this case, the IOMMU still caches the old mapping and thus leaves a
window for a malicious domain/device to touch those frames...

But we may force a delayed completion check when a free page is
allocated to a new VM or inserted into another p2m table. Then I agree
that, depending on the page's specific usage, async flush is possible. :-)
>
>Frames that transition to pagetable frames are more problematic. It's
>probably better to modify the guest to create a separate quicklist for
>pagetable frames, so they get recycled and remain out of the IOMMU
>until they return to the free list.
>
So you're already talking about one more step beyond the current
implementation: selectively inserting mappings as the device really
requires. Creating a PV IOMMU interface may serve this purpose more
accurately.

Thanks,
Kevin
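
The flush rules discussed in this exchange can be summarized as a small decision function. This is a sketch of the reasoning only, not of any real VT-d or Xen interface: the Perm enum, the function name and both keyword flags are made up for illustration. The two flags stand for the hardware properties mentioned above (whether non-present entries may be cached, and whether the IOMMU re-walks the page table on a write that hits a read-only cached entry), and the defaults are the conservative guesses from the discussion.

```python
from enum import IntEnum


class Perm(IntEnum):
    NOT_PRESENT = 0
    READ_ONLY = 1
    READ_WRITE = 2


def needs_iotlb_flush(old, new, caches_not_present=False,
                      rewalks_on_write_fault=False):
    """Return True if changing an entry from `old` to `new` needs a flush."""
    if old == new:
        return False
    if old == Perm.NOT_PRESENT:
        # Nothing is cached for a non-present entry unless the hardware
        # says it may cache such entries.
        return caches_not_present
    if new > old:
        # Read-only -> read-write: a stale read-only entry would make
        # writes fault unless the hardware re-walks the page table.
        return not rewalks_on_write_fault
    # Demotion (including removal): the stale entry grants too much access.
    return True
```

With the defaults, only the non-present to present transition is free; every demotion needs a flush regardless of the flags, which is why batching those flushes matters.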

Tian, Kevin

2008-May-21 05:57 UTC


RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

>From: Tian, Kevin
>Sent: May 21, 2008 13:56
>
>>From: Ian Pratt [mailto:Ian.Pratt@eu.citrix.com]
>>Sent: May 21, 2008 13:32
>>
>>The IOMMU TLB doesn't cache not_present entries, hence you don't need
>>to do a flush when transitioning an entry from not_present to present.
>>Some IOMMUs will also re-walk the pagetable if they find a TLB entry
>>that is read-only and the operation is a write, but I can't recall
>>whether VT-d is like this.
>
>You're right, though the VT-d spec defines a capability bit to indicate
>whether VT-d may cache non-present entries or not. In reality it
>doesn't make sense to do that.
>
>For promotion from non-present to present, no flush is required.
>But for promotion from read-only to read-write, I guess a flush has to
>be forced. Not sure whether any grant entry currently switches RO/RW
>permission on demand.

Forget the above; I'm just saying exactly the same thing you already
pointed out. :-P

Thanks,
Kevin

Ian Pratt

2008-May-21 10:26 UTC


RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

> >For ballooned frames, Xen should be able to put them on a 'pending'
> >list awaiting completion of a flush (hence the flush is typically not
> >synchronous).
> 
> It seems so. Just one security concern: it's possible to have decreased
> frames allocated to another VM before completion of the async flush. In
> this case, the IOMMU still caches the old mapping and thus leaves a
> window for a malicious domain/device to touch those frames...
> 
> But we may force a delayed completion check when a free page is
> allocated to a new VM or inserted into another p2m table. Then I agree
> that, depending on the page's specific usage, async flush is possible. :-)

Yep, that's the purpose of the 'pending' list(s): pages don't graduate
to the free list until synchronized against flush completion.

> >Frames that transition to pagetable frames are more problematic. It's
> >probably better to modify the guest to create a separate quicklist for
> >pagetable frames, so they get recycled and remain out of the IOMMU
> >until they return to the free list.
> >
> 
> So you're already talking about one more step beyond the current
> implementation: selectively inserting mappings as the device really
> requires. Creating a PV IOMMU interface may serve this purpose more
> accurately.

Linux 2.4 had a concept of a quicklist that was used to recycle
pagetable pages rather than just returning them to the free list. It got
removed in 2.6, but reinstating something similar would help Xen,
particularly in the case with an IOMMU.

Ian 
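
A minimal model of the 'pending' list idea, for illustration only: FrameAllocator and its methods are hypothetical names, and flushes are represented by generation counters rather than real hardware operations. A released frame records which flush generation must complete before it is safe; it graduates to the free list only when that flush is reported complete, and the allocator falls back to a forced, synchronous flush only when the free list is empty.

```python
from collections import deque


class FrameAllocator:
    """Toy allocator with a pending list gated on IOTLB flush completion."""

    def __init__(self):
        self.free = deque()
        self.pending = deque()  # (frame, flush generation that makes it safe)
        self.flush_issued = 0
        self.flush_done = 0

    def release(self, frame):
        # The frame's IOMMU entry has been cleared but may still be cached,
        # so it is only safe after a flush issued from now on completes.
        self.pending.append((frame, self.flush_issued + 1))

    def issue_flush(self):
        # Start an asynchronous flush and return its generation number.
        self.flush_issued += 1
        return self.flush_issued

    def flush_completed(self, generation):
        # Called when the hardware reports that flush `generation` is done.
        self.flush_done = max(self.flush_done, generation)
        while self.pending and self.pending[0][1] <= self.flush_done:
            self.free.append(self.pending.popleft()[0])

    def allocate(self):
        if not self.free and self.pending:
            # Forced completion check: issue a flush and treat it as
            # completed immediately, modelling a synchronous wait.
            self.flush_completed(self.issue_flush())
        return self.free.popleft() if self.free else None
```

The forced path in allocate() corresponds to the delayed completion check discussed above: a frame is never handed to a new owner before a flush covering it has completed, but in the common case the flush has long finished by the time the frame is needed.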




Muli Ben-Yehuda

2008-May-26 09:21 UTC


Re: [Xen-devel] [PATCH 0/5] VT-d support for PV guests

On Tue, May 20, 2008 at 03:10:38PM +0100, Espen Skoglund wrote:
> Anyhow, read-only access can indeed be supported for VT-d.  I just
> wanted to get basic PV guest support in there first.  Also, I'm not
> familiar with AMD's IOMMU, but I would guess that it also supports
> read-only access.

It does.

> > It would be good if you could provide a bit more detail on when
> > the patch populates IOMMU entries, and how it keeps them in
> > sync. For example, does the IOMMU map all the guest's memory, or
> > just that which will soon be the subject of a DMA? How synchronous
> > is the patch in removing mappings, e.g. due to page type changes
> > (pagetable pages, balloon driver) or due to unmapping grants?
> 
> All writable memory is initially mapped in the IOMMU.  Page type
> changes are also reflected there.  In general all maps and unmaps to
> a domain are synced with the IOMMU.  According to the feedback I got
> I apparently missed some places, though.  Will look into this and
> fix it.
> 
> It's clear that performance will pretty much suck if you do frequent
> updates in grant tables, but the whole idea of having passthrough
> access for NICs is to avoid this netfront/netback data plane scheme
> altogether.  This leaves you with grant table updates for block
> device access.  I don't know what the expected update frequency is
> for that one.
> 
> It must be noted that reflecting grant table updates in the IOMMU is
> required for correctness.  The alternative --- which is indeed
> possible --- is to catch DMA faults to such memory regions and
> somehow notify the driver to, e.g., drop packets or retry the DMA
> transaction once the IOMMU mapping has been established.

That would assume that the device can retry failed DMAs or otherwise
deal with them. The majority of devices can't.

> > There's been a lot of discussion at various xen summits about
> > different IOMMU optimizations (e.g. for IBM Summit, Power etc) and
> > I'd like to understand exactly what tradeoffs your implementation
> > makes. Anyhow, good stuff, thanks!

I think the major difference is that with Espen's patches all of the
PV guest's memory is exposed to the device (i.e., it provides
inter-guest protection, but no intra-guest protection). Our patches
aimed at providing both inter-guest and intra-guest protection, and
incurred a substantial performance hit (cf. our OLS '07 paper on
IOMMU performance). There will be a paper at USENIX '08 by Willmann et
al. on different IOMMU mapping strategies which provide varying
levels of inter/intra-guest protection and performance hit.

> I can't say I know much about those other IOMMUs, but as far as I
> know they are quite limited in that they only support a fixed number
> of mappings

The IBM Calgary/CalIOC2 family of IOMMUs support 4GB address spaces.

> and can not differentiate between different DMA sources (i.e., PCI
> devices).

Calgary/CalIOC2 have a per-bus translation table. In practice most
devices are on their own bus in these systems, so you get effectively
per-device translation tables.

Cheers,
Muli
