Hi,

I've added some preliminary support for VT-d for paravirtualized
guests. This must be enabled using an 'iommu_pv' boot parameter
(disabled by default).

I've added some Python bindings to allow xend to assign PCI devices to
the IOMMU for PV guests. For HVM guests this is handled in ioemu. I'm
not sure whether it makes sense to handle both cases in one place.

The changes currently hook into get_page_type() in xen/arch/x86/mm.c
to map/unmap IOMMU pages when the page types change. This might
not be the appropriate place to hook these calls.

The patches I've added are as follows:

  xen-vtd-unmap.patch --- Make the VT-d iommu_unmap_page() code
      actually do something close to useful.

  xen-ptab-dump.patch --- There's no point in using 'current' when an
      IOMMU page fault is raised. Also, add some page type
      statistics for DomPage debug output.

  xen-iommu-pv.patch --- Add support for iommu_pv_enable boot
      parameter and IOMMU assignment of PCI devices to guests.

  xen-iommu-pv-mappings.patch --- Hook iommu_{un}map_page() calls
      into various Xen locations.

  xen-pv-assign.patch --- Allow PCI devices to be assigned to IOMMU
      PV guests from xend.

While adding the PV VT-d support I also noticed that there is a lot of
duplicated and unused code dealing with VT-d. This really needs to be
cleaned up properly. I don't know if the original VT-d submitters
have already made plans for doing this.

	eSk

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
> While adding the PV VT-d support I also noticed that there is a lot of
> duplicated and unused code dealing with VT-d. This really needs to be
> cleaned up properly. I don't know if the original VT-d submitters
> already have made plans for doing this.

Other than the hardware wait loops, which are something we need to
clean up, can you give some examples of duplicated/unused code in the
vt-d directory?

Allen
On 19/5/08 21:27, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:

> I've added some preliminary support for VT-d for paravirtualized
> guests. This must be enabled using an 'iommu_pv' boot parameter
> (disabled by default).
>
> I've added some python bindings to allow xend to assign PCI devices to
> IOMMU for PV guests. For HVM guests this is handled in ioemu. Not
> sure if it makes sense to handle both cases in one place.
>
> The changes currently hook into get_page_type() in xen/arch/x86/mm.c
> to map/unmap IOMMU pages when the page types change. This might
> not be the appropriate place to hook these calls.

What functionality does this patchset enable, Espen? Is this a security
enhancement (isolation/containment) for PV guests with direct hardware
access? For example: can it access all its own memory except that which
has pagetable/GDT type, and only foreign memory which is granted to it?

Is there a good reason to hide this behind a boot option?

 Thanks,
 Keir
Yang, Xiaowei
2008-May-20 07:52 UTC
RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
>I've added some preliminary support for VT-d for paravirtualized
>guests. This must be enabled using an 'iommu_pv' boot parameter
>(disabled by default).
>
>I've added some python bindings to allow xend to assign PCI devices to
>IOMMU for PV guests. For HVM guests this is handled in ioemu. Not
>sure if it makes sense to handle both cases in one place.
>
>The changes currently hook into get_page_type() in xen/arch/x86/mm.c
>to map/unmap IOMMU pages when the page types change. This might
>not be the appropriate place to hook these calls.
>
>The patches I've added are as follows:
>
>  xen-vtd-unmap.patch --- Make the VT-d iommu_unmap_page() code
>      actually do something close to useful.
>
>  xen-ptab-dump.patch --- There's no point in using 'current' when an
>      IOMMU page fault is raised. Also, add some page type
>      statistics for DomPage debug output.
>
>  xen-iommu-pv.patch --- Add support for iommu_pv_enable boot
>      parameter and IOMMU assignment of PCI devices to guests.
>
>  xen-iommu-pv-mappings.patch --- Hook iommu_{un}map_page() calls
>      into various Xen locations.
>

Espen,

The patches look good to me, with some comments:

- For the occasions when the P2M is changed, the iommu_{un}map_page()
  hooks can be added more cleanly. Only the hooks inside
  guest_physmap_add/remove_page() are necessary. The hooks in
  populate_physmap() and memory_exchange() can be omitted with some
  small code rearrangement, such as removing the
  if(paging_mode_translate(d)) check before calling
  guest_physmap_add_page().

- gnttab_map/unmap_grant_ref() need to be hooked as well. There are no
  P2M changes at that time, because the guest PT is updated directly.
  The mapped pages can also be used for DMA by backend drivers.

- dom0 can be treated the same as other PV domains with regard to VT-d
  PT updating. Unfortunately, it needs some special care. All devices
  are assigned to it by default, and it usually owns most of the
  devices. iommu_{un}map_page() could be called very frequently by it
  while it serves other domains' I/O requests. That will bring a
  performance penalty and CPU overhead.

Thanks,
Xiaowei
Yang, Xiaowei
2008-May-20 07:58 UTC
RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
>From: Keir Fraser <keir.fraser@eu.citrix.com>
>Date: Tue, May 20, 2008 at 3:39 PM
>Subject: Re: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
>To: Espen Skoglund <espen.skoglund@netronome.com>,
>xen-devel@lists.xensource.com
>
>On 19/5/08 21:27, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
>
>> I've added some preliminary support for VT-d for paravirtualized
>> guests. This must be enabled using an 'iommu_pv' boot parameter
>> (disabled by default).
>>
>> I've added some python bindings to allow xend to assign PCI devices to
>> IOMMU for PV guests. For HVM guests this is handled in ioemu. Not
>> sure if it makes sense to handle both cases in one place.
>>
>> The changes currently hook into get_page_type() in xen/arch/x86/mm.c
>> to map/unmap IOMMU pages when the page types change. This might
>> not be the appropriate place to hook these calls.
>
>What functionality does this patchset enable, Espen? Is this a security
>enhancement (isolation/containment) for PV guests with direct hardware
>access? For example: can access all its own memory except that which has
>pagetable/GDT type, and only foreign memory which is granted to it?
>

Yes, as far as I can see. VT-d support for PV guests can prevent one
domain from accessing other domains' pages without permission.

Thanks,
Xiaowei
Espen Skoglund
2008-May-20 10:16 UTC
RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
[Allen M Kay]
>> While adding the PV VT-d support I also noticed that there is a lot
>> of duplicated and unused code dealing with VT-d. This really needs
>> to be cleaned up properly. I don't know if the original VT-d
>> submitters already have made plans for doing this.

> Other than the hardware wait loops which is something we need to
> clean up, can you give some example of duplicated/unused code in
> vt-d directory?

Off the top of my head:

  iommu_set/free_pgd() in xen/drivers/passthrough/vtd/x86/vtd.c
  iommu_page_unmapping() in xen/drivers/passthrough/vtd/iommu.c

There's probably more that I can't think of right now.

	eSk
Espen Skoglund
2008-May-20 10:43 UTC
Re: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
[Keir Fraser]
> On 19/5/08 21:27, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
>> I've added some preliminary support for VT-d for paravirtualized
>> guests. This must be enabled using an 'iommu_pv' boot parameter
>> (disabled by default).
>>
>> I've added some python bindings to allow xend to assign PCI devices to
>> IOMMU for PV guests. For HVM guests this is handled in ioemu. Not
>> sure if it makes sense to handle both cases in one place.
>>
>> The changes currently hook into get_page_type() in xen/arch/x86/mm.c
>> to map/unmap IOMMU pages when the page types change. This might
>> not be the appropriate place to hook these calls.

> What functionality does this patchset enable, Espen? Is this a
> security enhancement (isolation/containment) for PV guests with
> direct hardware access? For example: can access all its own memory
> except that which has pagetable/GDT type, and only foreign memory
> which is granted to it?

> Is there a good reason to hide this behind a boot option?

The patchset does, as you guessed, enable isolation for PV guests with
direct hardware access. If you assign a PCI device to a guest you are
guaranteed that the assigned device can't access the memory of other
guests or Xen itself. The patchset allows the device to access all of
the guest's own memory to which the guest has write access, and memory
which is granted to the guest.

The only reason for making it a boot option was to allow the old
behaviour (i.e., complete access) to remain the default until
people get more confident with the patches.

	eSk
On 20/5/08 11:43, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:

>> Is there a good reason to hide this behind a boot option?
>
> The patchset does, as you guessed, enable isolation for PV guests with
> direct hardware access. If you assign a PCI device to a guest you are
> guaranteed that the assigned device can't access the memory of other
> guests or Xen itself. The patchset allows the device to access all
> its own memory which it has write access to, and memory which is
> granted to it.
>
> The only reason for making it a boot option was to allow for the old
> behaviour (i.e., complete access) to be the default behaviour until
> people get more confident with the patches.

Okay. Well, it seems there have been some comments that need to be
dealt with in another iteration of these patches. But apart from that
I'm happy in principle to apply these patches.

 -- Keir
Espen Skoglund wrote:
> [Allen M Kay]
>>> While adding the PV VT-d support I also noticed that there is a lot
>>> of duplicated and unused code dealing with VT-d. This really needs
>>> to be cleaned up properly. I don't know if the original VT-d
>>> submitters already have made plans for doing this.
>
>> Other than the hardware wait loops which is something we need to
>> clean up, can you give some example of duplicated/unused code in
>> vt-d directory?
>
> From the top of my head:
>
>   iommu_set/free_pgd() in xen/drivers/passthrough/vtd/x86/vtd.c
>   iommu_page_unmapping() in xen/drivers/passthrough/vtd/iommu.c
>

These functions were used at one point during VT-d development. They
are indeed unused now. We will clean them up later. Thanks!

Randy (Weidong)

> There's probably more that I can't think of right now.
>
> 	eSk
> The patchset does, as you guessed, enable isolation for PV guests with
> direct hardware access. If you assign a PCI device to a guest you are
> guaranteed that the assigned device can't access the memory of other
> guests or Xen itself. The patchset allows the device to access all
> its own memory which it has write access to, and memory which is
> granted to it.

Not that I particularly think it matters, but does the patch configure
the IOMMU to distinguish between read-only and read-write access to a
guest's own memory or granted memory? If not, we should at least
clearly document that we're being a little more permissive.

Have you got any with-and-without performance results with a decent
high-throughput device (e.g. an HBA or 10Gb/s NIC)?

It would be good if you could provide a bit more detail on when the
patch populates IOMMU entries, and how it keeps them in sync. For
example, does the IOMMU map all the guest's memory, or just that which
will soon be the subject of a DMA? How synchronous is the patch in
removing mappings, e.g. due to page type changes (pagetable pages,
balloon driver) or due to unmapping grants?

There's been a lot of discussion at various Xen summits about different
IOMMU optimizations (e.g. for IBM Summit, Power etc.) and I'd like to
understand exactly what tradeoffs your implementation makes. Anyhow,
good stuff, thanks!

Thanks,
Ian
Espen Skoglund
2008-May-20 14:10 UTC
RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
[Ian Pratt]
>> The patchset does, as you guessed, enable isolation for PV guests
>> with direct hardware access. If you assign a PCI device to a guest
>> you are guaranteed that the assigned device can't access the memory
>> of other guests or Xen itself. The patchset allows the device to
>> access all its own memory which it has write access to, and memory
>> which is granted to it.

> Not that I particularly think it matters, but does the patch
> configure the IOMMU to distinguish between read-only and read-write
> access to a guest's own memory or granted memory? If not, we should
> at least clearly document that we're being a little more permissive.

The idea was indeed to distinguish between these properly. However,
the current VT-d code only handles read-write or no-access. For PV
guests I've made it so that page tables and such are mapped with no
access in the IOMMU. This is a bit more restrictive than necessary,
but it shouldn't really matter for the common usage scenarios.

Anyhow, read-only access can indeed be supported for VT-d. I just
wanted to get basic PV guest support in there first. Also, I'm not
familiar with AMD's IOMMU, but I would guess that it also supports
read-only access.

> Have you got any with-and-without performance results with a decent
> high-throughput device (e.g. a HBA or 10Gb/s NIC)?

I don't have a 10GbE NIC in my VT-d enabled machine right now, so I
can't test it. We have, however, tried a 10GbE NIC running in dom0
with VT-d enabled, and as far as I remember there was no performance
hit. Of course, any performance degradation will largely depend on
the networking memory footprint and the size of the IOTLB.

> It would be good if you could provide a bit more detail on when the
> patch populates IOMMU entries, and how it keeps them in sync. For
> example, does the IOMMU map all the guest's memory, or just that
> which will soon be the subject of a DMA? How synchronous is the
> patch in removing mappings, e.g. due to page type changes (pagetable
> pages, balloon driver) or due to unmapping grants?

All writable memory is initially mapped in the IOMMU. Page type
changes are also reflected there. In general, all maps and unmaps to a
domain are synced with the IOMMU. According to the feedback I got I
apparently missed some places, though. I will look into this and fix
it.

It's clear that performance will pretty much suck if you do frequent
updates in grant tables, but the whole idea of having passthrough
access for NICs is to avoid this netfront/netback data plane scheme
altogether. This leaves you with grant table updates for block device
access. I don't know what the expected update frequency is for that
one.

It must be noted that reflecting grant table updates in the IOMMU is
required for correctness. The alternative --- which is indeed
possible --- is to catch DMA faults to such memory regions and somehow
notify the driver to, e.g., drop packets or retry the DMA transaction
once the IOMMU mapping has been established.

> There's been a lot of discussion at various xen summits about
> different IOMMU optimizations (e.g. for IBM Summit, Power etc) and
> I'd like to understand exactly what tradeoffs your implementation
> makes. Anyhow, good stuff, thanks!

I can't say I know much about those other IOMMUs, but as far as I know
they are quite limited in that they only support a fixed number of
mappings and cannot differentiate between different DMA sources
(i.e., PCI devices). Someone please correct me if I'm wrong here.

In short, "my" implementation doesn't actually make many tradeoffs.
It's simply based on the VT-d implementation by the Intel folks. It
assumes a more fully fledged IOMMU that can have different mappings
for different devices.

	eSk
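The mapping policy described in the message above (writable pages are
mapped read-write in the IOMMU, page tables and similar pages get no
access, and page type changes are mirrored) can be sketched as a toy
Python model. This is only an illustration under stated assumptions, not
the code in the patches: the page type names, the ToyIommu class and
on_page_type_change() are all hypothetical stand-ins for Xen's page
types, iommu_{un}map_page() and the get_page_type() hook.

```python
# Toy model only. The type names below are hypothetical stand-ins, not
# Xen's real page type constants.
WRITABLE = "writable"
PAGETABLE = "pagetable"
DESCRIPTOR = "descriptor"  # e.g. GDT/LDT pages


class ToyIommu:
    """Records which frames a device may DMA to (read-write or nothing)."""

    def __init__(self):
        self.mapped = set()
        self.log = []

    def map_page(self, frame):
        if frame not in self.mapped:
            self.mapped.add(frame)
            self.log.append(("map", frame))

    def unmap_page(self, frame):
        if frame in self.mapped:
            self.mapped.remove(frame)
            self.log.append(("unmap", frame))


def on_page_type_change(iommu, frame, old_type, new_type):
    """Mirror a page type change into the IOMMU.

    Only writable pages are visible to the device; every other type is
    treated as no-access, matching the read-write/no-access limitation
    mentioned in the thread.
    """
    was_writable = old_type == WRITABLE
    is_writable = new_type == WRITABLE
    if is_writable and not was_writable:
        iommu.map_page(frame)
    elif was_writable and not is_writable:
        iommu.unmap_page(frame)


iommu = ToyIommu()
on_page_type_change(iommu, 5, None, WRITABLE)         # frame becomes writable
on_page_type_change(iommu, 5, WRITABLE, PAGETABLE)    # promoted to a page table
on_page_type_change(iommu, 5, PAGETABLE, DESCRIPTOR)  # no IOMMU change needed
print(iommu.log)
```

The model leaves out IOTLB invalidation on unmap, which is where the
cost discussed in the thread comes from.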
Espen Skoglund
2008-May-20 14:36 UTC
RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
[Xiaowei Yang]
> Espen,
> The patches look good to me with some comments:

> - For the occasions when P2M is changed, the hooks of
>   iommu_{un}map_page() can be added cleaner. Only the hooks inside
>   guest_physmap_add/remove_page() are necessary. The hooks in
>   populate_physmap() and memory_exchange() can be omitted by some
>   small code rearrangement like removing if(paging_mode_translate(d))
>   before calling guest_physmap_add_page().

Yes. I considered this as an option as well, but ended up with the
current approach. Your suggestion is probably cleaner, so I'll switch
over to doing that.

> - gnttab_map/unmap_grant_ref() need to be hooked also. There are no
>   P2M changes at that time while the guest PT is updated directly. The
>   mapped pages can also be used for DMA by backend drivers.

Thanks. I overlooked that one and only caught gnttab_transfer().

> - dom0 can be treated as the same as other PV domains with regard to
>   VTd PT updating. Unfortunately, it need some special care. All of
>   devices are assigned to it by default and usually it owns the most
>   of devices. iommu_{un}map_page() could be called very frequently by
>   it while it serves other domains IO requests. It will bring
>   performance penalty and CPU overhead.

dom0 should not need to do any VT-d page table updating once it has
been set up, so marking it as need_iommu() should be unnecessary.
Also, if passthrough mode is supported in VT-d then dom0 does not need
to have VT-d page tables at all. I think setting its VT-d tables up
to have complete access at startup and leaving them that way is
perfectly fine.

Thanks for the feedback. I will repost once I've incorporated all the
comments.

	eSk
> > Not that I particularly think it matters, but does the patch
> > configure the IOMMU to distinguish between read-only and read-write
> > access to a guest's own memory or granted memory? If not, we should
> > at least clearly document that we're being a little more permissive.
>
> The idea was indeed to distinguish between this properly. However,
> the current VT-d code only handles read-write or no-access. For PV
> guests I've made it so that page tables and such are mapped with no
> access in the IOMMU. This is a bit more restrictive than necessary,
> but it shouldn't really matter for the common usage scenarios.
>
> Anyhow, read-only access can indeed be supported for VT-d. I just
> wanted to get basic PV guest support in there first. Also, I'm not
> familiar with AMD's IOMMU, but I would guess that it also supports
> read-only access.

OK, hopefully someone can fill in the missing bits in the VT-d support.

> > Have you got any with-and-without performance results with a decent
> > high-throughput device (e.g. a HBA or 10Gb/s NIC)?
>
> I don't have a 10GbE NIC in my VT-d enabled machine right now, so I
> can't test it. We have however tried with a 10GbE NIC running in dom0
> with VT-d enabled, and there was as far as I remember no performance
> hit. Of course, any performance degradation will largely depend on
> the networking memory footprint and the size of the IOTLB.

Indeed, that's why I'd like to see some measurements, both for dom0 I/O
and for the case where dom0 is doing I/O on behalf of another domain.

I'm also interested to understand what the overhead of page type
change / balloon operations is. Do you synchronously invalidate the
entries in the IOMMU? How slow is that?

> > It would be good if you could provide a bit more detail on when the
> > patch populates IOMMU entries, and how it keeps them in sync. For
> > example, does the IOMMU map all the guest's memory, or just that
> > which will soon be the subject of a DMA? How synchronous is the
> > patch in removing mappings, e.g. due to page type changes (pagetable
> > pages, balloon driver) or due to unmapping grants?
>
> All writable memory is initially mapped in the IOMMU. Page type
> changes are also reflected there. In general all maps and unmaps to a
> domain are synced with the IOMMU. According to the feedback I got I
> apparently missed some places, though. Will look into this and fix
> it.

Is "demotion" of access handled synchronously, or do you have some
tricks to mitigate the synchronization?

> It's clear that performance will pretty much suck if you do frequent
> updates in grant tables, but the whole idea of having passthrough
> access for NICs is to avoid this netfront/netback data plane scheme
> altogether. This leaves you with grant table updates for block device
> access. I don't know what the expected update frequency is for that
> one.

I don't entirely buy this -- I think we need to make grant map/unmaps
fast too. We've discussed schemes to make this more efficient by doing
the IOMMU operations at grant map time (where they can be easily
batched) rather than at dma_map time. We've talked about using a
kmap-style area of physical address space to cycle the mappings through
to avoid having to do so many synchronous invalidates (at the expense of
allowing a driver domain to be able to DMA to a page for a little longer
than it strictly ought to).

Ian
>From: Ian Pratt
>Sent: May 20, 2008 23:34
>
>> > It would be good if you could provide a bit more detail on when the
>> > patch populates IOMMU entries, and how it keeps them in sync. For
>> > example, does the IOMMU map all the guest's memory, or just that
>> > which will soon be the subject of a DMA? How synchronous is the
>> > patch in removing mappings, e.g. due to page type changes
>> > (pagetable pages, balloon driver) or due to unmapping grants?
>>
>> All writable memory is initially mapped in the IOMMU. Page type
>> changes are also reflected there. In general all maps and unmaps to a
>> domain are synced with the IOMMU. According to the feedback I got I
>> apparently missed some places, though. Will look into this and fix
>> it.
>
>Is "demotion" of access handled synchronously, or do you have some
>tricks to mitigate the synchronization?

All changes need to be handled synchronously, because a DMA request is
not restartable: VT-d reports faults as asynchronous event
notifications. The hardware is designed in such a way that all expected
permissions have to be in place before the device actually issues an
access request.

>> It's clear that performance will pretty much suck if you do frequent
>> updates in grant tables, but the whole idea of having passthrough
>> access for NICs is to avoid this netfront/netback data plane scheme
>> altogether. This leaves you with grant table updates for block device
>> access. I don't know what the expected update frequency is for that
>> one.
>
>I don't entirely buy this -- I think we need to make grant map/unmaps
>fast too. We've discussed schemes to make this more efficient by doing
>the IOMMU operations at grant map time (where they can be easily
>batched) rather than at dma_map time.

Agree.

>We've talked about using a kmap-style area of physical address space
>to cycle the mappings through to avoid having to do so many
>synchronous invalidates (at the expense of allowing a driver domain to
>be able to DMA to a page for a little longer than it strictly ought
>to).

Could you elaborate a bit on how a kmap-style area helps here? The key
point is whether the frequency of p2m mapping changes can be reduced...

Thanks,
Kevin
> >Is "demotion" of access handled synchronously, or do you have some
> >tricks to mitigate the synchronization?
>
> All changes need be handled synchronously, as DMA request is not
> restartable with VT-d fault as async event notification. Hardware bits
> are designed in such way that all expected permission controls have
> to exist before device actually issues access request.

You are talking about 'promotion' (adding more permissions). Demotion
requires flushing entries in the TLB and is typically more expensive,
hence the desire to 'batch' the synchronization.

> >kmap-style area of physical address space to cycle the mappings
> >through to avoid having to do so many synchronous invalidates (at
> >the expense of allowing a driver domain to be able to DMA to a page
> >for a little longer than it strictly ought to).
>
> Could you elaborate a bit how kmap-style area helps here? The key
> point is whether frequency of p2m mapping can be reduced...

A window of guest physical address space is created that is used to
create mappings to granted pages. The next available free slot is used
when creating a mapping. When the end of the window is reached, flush
the IOMMU. (Actually, you can do better by issuing a flush at the
halfway point and then synchronizing against the flush at the end of
the window, effectively double buffering.)

This typically avoids the churn that would happen with grant mappings
where the same guest physical address region is used to map different
pages (requiring a synchronous flush).

Ian
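The window scheme described above can be sketched as a small Python
model that simply counts flushes. It is an illustration under stated
assumptions, not an implementation from the thread: GrantWindow and its
methods are hypothetical names, a "flush" is just a counter standing in
for an IOTLB invalidation, and only the simple variant (flush when the
window is exhausted) is modelled, not the halfway-point double-buffered
one.

```python
class GrantWindow:
    """Toy model of a window of guest-physical slots for grant mappings.

    An unmapped slot is not reused until the next IOTLB flush, so one
    flush covers a whole pass over the window instead of one per unmap.
    Until that flush, the device may still reach the old page through a
    cached translation (the tradeoff mentioned in the thread).
    """

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.slots = [None] * num_slots  # slot -> currently mapped page
        self.stale = set()               # unmapped but not yet flushed
        self.cursor = 0                  # where to search for the next slot
        self.flushes = 0

    def _next_free(self):
        for slot in range(self.cursor, self.num_slots):
            if self.slots[slot] is None and slot not in self.stale:
                return slot
        return None

    def _flush_iotlb(self):
        # Stand-in for a real IOTLB invalidation.
        self.flushes += 1
        self.stale.clear()

    def map(self, page):
        slot = self._next_free()
        if slot is None:
            if not self.stale:
                raise RuntimeError("window is full of live mappings")
            # End of the window: one flush makes all stale slots reusable.
            self._flush_iotlb()
            self.cursor = 0
            slot = self._next_free()
        self.slots[slot] = page
        self.cursor = slot + 1
        return slot

    def unmap(self, slot):
        # Clear the mapping but defer the flush; the slot stays unusable.
        self.slots[slot] = None
        self.stale.add(slot)


window = GrantWindow(num_slots=4)
used = []
for page in range(10):
    slot = window.map(page)
    used.append(slot)
    window.unmap(slot)
print(used, window.flushes)
```

In this run ten map/unmap pairs cost two flushes, where flushing on
every unmap would cost ten; a larger window lowers the flush rate
further at the price of a longer stale period.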
>From: Ian Pratt [mailto:Ian.Pratt@eu.citrix.com]
>Sent: May 21, 2008 13:05
>
>> >Is "demotion" of access handled synchronously, or do you have some
>> >tricks to mitigate the synchronization?
>>
>> All changes need be handled synchronously, as DMA request is not
>> restartable with VT-d fault as async event notification. Hardware
>> bits are designed in such way that all expected permission controls
>> have to exist before device actually issues access request.
>
>You are talking about 'promotion' (adding more permissions). Demotion
>required flushing entries in the TLB and is typically more expensive,
>hence the desire to 'batch' the synchronization.

So you're talking about the TLB inside the CPU? IMO, both demotion and
promotion require an IOTLB flush. Even for promotion, it's not like an
instruction access, where a fault can be triggered for Xen to promote
lazily and then restart the execution...

Batching IOTLB flushes is a good direction, but it requires some
cooperation from the guest, e.g. the guest driver must not attempt to
use the set of frames whose mappings are being changed. So to me it
looks more like a change on the guest side to batch grant/m2p requests
together. Otherwise Xen itself doesn't know when a changed mapping will
be used by the guest, and thus has to force a flush for each change
before resuming the guest.

>> >kmap-style area of physical address space to cycle the mappings
>> >through to avoid having to do so many synchronous invalidates (at
>> >the expense of allowing a driver domain to be able to DMA to a page
>> >for a little longer than it strictly ought to).
>>
>> Could you elaborate a bit how kmap-style area helps here? The key
>> point is whether frequency of p2m mapping can be reduced...
>
>A window of guest physical address space is created that is used to
>create mappings to granted pages. The next available free slot is used
>when creating a mapping. When the end of the window is reached, flush
>the IOMMU. (Actually, you can do better by issuing a flush at the
>halfway point and then synchronizing against the flush at the
>end of the window, effectively double buffering).
>
>This typically avoids the churn that would happen with grant mappings
>where the same guest physical address region is used to map different
>pages (requiring a synchronous flush).
>

Yep, that's a neat way to keep unflushed guest frames from being reused
until the batched flush is done later. :-)

Thanks,
Kevin
> >You are talking about 'promotion' (adding more permissions). Demotion
> >required flushing entries in the TLB and is typically more expensive,
> >hence the desire to 'batch' the synchronization.
>
> So you're talking about TLB inside CPU? IMO, both demotion/promotion
> requires IOTLB flush. Even for promotion, it's not like instruction
> access to trigger fault for Xen to promote lazily and then restart the
> execution...

The IOMMU TLB doesn't cache not_present entries, hence you don't need to
do a flush when transitioning an entry from not_present to present. Some
IOMMUs will also re-walk the pagetable if they find a TLB entry that is
read-only and the operation is a write, but I can't recall whether VT-d
is like this.

> 'batch' IOTLB flush is good direction, which requires some cooperation
> from guest, e.g. as long as guest driver doesn't attempt to use set of
> frames in changing. So to me it's more like some change in guest side
> to batch grant/m2p request together. Or else Xen itself doesn't know
> when one changed mapping will be used by guest and thus has to
> force flush for each change before resuming back to guest

For ballooned frames, Xen should be able to put them on a 'pending' list
awaiting completion of a flush (hence the flush is typically not
synchronous).

Frames that transition to pagetable frames are more problematic. It's
probably better to modify the guest to create a separate quicklist for
pagetable frames, so they get recycled and remain out of the IOMMU until
they return to the free list.

Ian
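The 'pending' list idea for ballooned frames can be sketched as a toy
Python model. It is an illustration under assumptions, not code from
Xen or the patches: FramePool and its methods are hypothetical, and the
asynchronous flush is modelled as two explicit steps (issue, then
completion) driven by the caller.

```python
from collections import deque


class FramePool:
    """Toy model: released frames wait for an IOTLB flush before reuse."""

    def __init__(self):
        self.free = deque()   # safe to hand out
        self.pending = []     # released, no flush issued yet
        self.in_flight = []   # covered by a flush that has not completed

    def release(self, frame):
        # The frame has left the guest, but stale IOTLB entries may remain.
        self.pending.append(frame)

    def start_flush(self):
        # Issue an asynchronous flush covering everything released so far.
        self.in_flight.extend(self.pending)
        self.pending.clear()

    def complete_flush(self):
        # The flush has finished: frames it covered can no longer be
        # reached by the device through a cached translation.
        self.free.extend(self.in_flight)
        self.in_flight.clear()

    def alloc(self):
        # Never hand out a frame that a device might still reach.
        return self.free.popleft() if self.free else None


pool = FramePool()
pool.release(7)
pool.start_flush()
pool.release(8)          # released after the flush was issued
print(pool.alloc())      # flush not complete yet, nothing is available
pool.complete_flush()
print(pool.alloc())      # frame 7 is now safe to reuse
print(pool.alloc())      # frame 8 still waits for a later flush
```

The point of the split is that release() never blocks on the hardware;
only an allocator that finds the free list empty has to wait for a
flush to complete.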
Yang, Xiaowei
2008-May-21 05:42 UTC
RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
>-----Original Message-----
>From: Espen Skoglund [mailto:espen.skoglund@netronome.com]
>Sent: Tuesday, May 20, 2008 10:36 PM
>To: Yang, Xiaowei
>Cc: Espen Skoglund; xen-devel@lists.xensource.com
>Subject: RE: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
>
>[Xiaowei Yang]
>> Espen,
>> The patches look good to me with some comments:
>
>> - For the occasions when P2M is changed, the hooks of
>>   iommu_{un}map_page() can be added cleaner. Only the hooks inside
>>   guest_physmap_add/remove_page() are necessary. The hooks in
>>   populate_physmap() and memory_exchange() can be omitted by some
>>   small code rearrangement like removing if(paging_mode_translate(d))
>>   before calling guest_physmap_add_page().
>
>Yes. I considered this as an option as well, but ended up with the
>current approach. Your suggestion is probably cleaner, so I'll switch
>over to doing that.
>
>> - gnttab_map/unmap_grant_ref() need to be hooked also. There are no
>>   P2M changes at that time while the guest PT is updated directly. The
>>   mapped pages can also be used for DMA by backend drivers.
>
>Thanks. Overlooked that one. Only caught the gnttab_transfer().
>
>> - dom0 can be treated as the same as other PV domains with regard to
>>   VTd PT updating. Unfortunately, it need some special care. All of
>>   devices are assigned to it by default and usually it owns the most
>>   of devices. iommu_{un}map_page() could be called very frequently by
>>   it while it serves other domains IO requests. It will bring
>>   performance penalty and CPU overhead.
>
>dom0 should not need to do any VT-d page table updating once it has
>been set up, so marking it as need_iommu() should be unnecessary.
>Also, if passthrough mode is supported in VT-d then dom0 does not need
>to have VT-d page tables at all. I think setting its VT-d tables up
>to have complete access at startup and leave it that way is perfectly
>fine.
>

Currently, [0, max_page] is 1:1 mapped in dom0's VT-d page table, which
gives dom0 the ability to DMA to the whole range of memory, including
critical regions such as the Xen hypervisor. That is a security hole,
and it is still there with passthrough mode. A dynamic VT-d page table
for dom0 can fix it. Hopefully the cost will be acceptable with gnttab
map/unmap and other optimizations.

Thanks,
Xiaowei
>From: Ian Pratt [mailto:Ian.Pratt@eu.citrix.com]
>Sent: May 21, 2008 13:32
>
>The IOMMU TLB doesn't cache not_present entries, hence you don't need
>to do a flush when transitioning an entry from not_present to present.
>Some IOMMUs will also re-walk the pagetable if they find a TLB entry
>that is read-only and the operation is a write, but I can't recall
>whether VTd is like this.

You're right, though the VT-d spec defines a capability bit to indicate
whether VT-d may cache non-present entries or not. In reality it doesn't
make sense to do that.

For promotion from non-present to present, no flush is required. But
for promotion from read-only to read-write, I guess a flush has to be
forced. Not sure whether any grant entry currently exists that switches
RO/RW permission on demand.

>> 'batch' IOTLB flush is good direction, which requires some
>> cooperation from guest, e.g. as long as guest driver doesn't attempt
>> to use set of frames in changing. So to me it's more like some change
>> in guest side to batch grant/m2p request together. Or else Xen itself
>> doesn't know when one changed mapping will be used by guest and thus
>> has to force flush for each change before resuming back to guest
>
>For ballooned frames, Xen should be able to put them on a 'pending'
>list awaiting completion of a flush (hence the flush is typically not
>synchronous).

It seems so. Just one security concern: it's possible to have decreased
frames allocated to another VM before completion of the async flush. In
this case, the IOMMU still caches the old mapping and thus it leaves a
window for a malicious domain/device to touch those frames...

But we may force a delayed completion check when a free page is
allocated to a new VM or inserted into another p2m table. Then I agree
that, depending on the page's specific usage, an async flush is
possible. :-)

>Frames that transition to pagetable frames are more problematic. It's
>probably better to modify the guest to create a separate quicklist for
>pagetable frames, so they get recycled and remain out of the IOMMU
>until they return to the free list.

So you're already talking about one more step beyond the current
implementation: selectively inserting mappings as the device really
requires them. Creating a PV IOMMU interface may serve this purpose more
accurately.

Thanks,
Kevin
>From: Tian, Kevin
>Sent: May 21, 2008 13:56
>
>>From: Ian Pratt [mailto:Ian.Pratt@eu.citrix.com]
>>Sent: May 21, 2008 13:32
>>
>>The IOMMU TLB doesn't cache not_present entries, hence you don't need
>>to do a flush when transitioning an entry from not_present to present.
>>Some IOMMUs will also re-walk the pagetable if they find a TLB entry
>>that is read-only and the operation is a write, but I can't recall
>>whether VTd is like this.
>
>You're right, though VT-d spec defines a capability bit to indicate
>whether VT-d may cache non-present entry or not. In reality it
>doesn't make sense to do that.
>
>For promotion from non-present to present, then no flush is required.
>But for promotion from read-only to read-write, I guess flush has to
>be forced. Not sure whether currently such grant entry exists to
>switch RO/RW permission on demand.

Forget the above; I was just saying exactly the same thing you already
pointed out. :-P

Thanks,
Kevin
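The flush rule the two messages above converge on can be written down as
a small predicate. This is only a sketch under stated assumptions:
struct iommu_caps and its two fields are hypothetical names for the
hardware properties mentioned in the thread (whether non-present entries
may be cached, and whether a write hitting a cached read-only entry
causes a re-walk), not fields from the VT-d code in Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum iommu_perm { PERM_NONE, PERM_RO, PERM_RW };

/* Hypothetical description of what a given IOMMU does. */
struct iommu_caps {
    bool caches_not_present;      /* may cache non-present entries */
    bool rewalks_on_ro_write;     /* re-walks when a write hits a cached RO entry */
};

/* Return true if changing a mapping from old_perm to new_perm needs an
 * IOTLB flush before the device can be trusted to see the new state. */
static bool iotlb_flush_needed(const struct iommu_caps *caps,
                               enum iommu_perm old_perm,
                               enum iommu_perm new_perm)
{
    assert(caps != NULL);

    if (old_perm == new_perm)
        return false;                         /* nothing changed */

    if (old_perm == PERM_NONE)                /* not-present -> present */
        return caps->caches_not_present;

    if (new_perm == PERM_RW)                  /* read-only -> read-write */
        return !caps->rewalks_on_ro_write;

    return true;                              /* demotion: always flush */
}
```

Only the demotion case is a protection issue. A stale entry after a
promotion makes a legitimate DMA fault, whereas a stale entry after a
demotion lets the device keep access it should have lost.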
> > For ballooned frames, Xen should be able to put them on a 'pending'
> > list awaiting completion of a flush (hence the flush is typically
> > not synchronous).
>
> It seems so. Just one security concern: it's possible to have
> decreased frames allocated to another VM before completion of async
> flush. In this case, the IOMMU still caches old mapping and thus it
> leaks a window for malicious domain/device to touch those frames...
>
> But we may force a delayed completion check when a free page is
> allocated to a new VM or inserted to other p2m table. Then I agree
> upon page's specific usage, async flush is possible. :-)

Yep, that's the purpose of the 'pending' list(s): pages don't graduate
to the free list until synchronized against flush completion.

> > Frames that transition to pagetable frames are more problematic.
> > It's probably better to modify the guest to create a separate
> > quicklist for pagetable frames, so they get recycled and remain out
> > of the IOMMU until they return to the free list.
>
> So you're already talking about one more step from current
> implementation, to selectively insert mapping as device really
> requires. Create a PV iommu interface may serve this purpose more
> accurately.

Linux 2.4 had a concept of a quicklist that was used to recycle
pagetable pages rather than just returning them to the free list. It got
removed in 2.6, but reinstating something similar would help Xen,
particularly in the case with an IOMMU.

Ian
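A minimal sketch of the 'pending' list described above, in C. The names
(struct frame, release_frame(), flush_done(), alloc_frame()) and the
generation counters are assumptions made for illustration and do not
correspond to existing Xen code. A real implementation would queue an
asynchronous IOTLB invalidation where this sketch just bumps a counter.

```c
#include <assert.h>
#include <stddef.h>

struct frame {
    struct frame *next;
    unsigned long flush_gen;   /* flush that must finish before reuse */
};

static unsigned long flush_issued;     /* last flush generation requested */
static unsigned long flush_completed;  /* last generation known complete */

static struct frame *pending_list;     /* unmapped, maybe still in the IOTLB */
static struct frame *free_list;        /* safe to hand to any domain */

/* Called after the frame's IOMMU mapping has been cleared. */
static void release_frame(struct frame *f)
{
    assert(f != NULL);
    f->flush_gen = ++flush_issued;     /* stand-in for queuing an async flush */
    f->next = pending_list;
    pending_list = f;
}

/* Called when the hardware reports that flush generation gen finished.
 * Frames covered by a completed flush graduate to the free list. */
static void flush_done(unsigned long gen)
{
    struct frame **pp = &pending_list;

    if (gen > flush_completed)
        flush_completed = gen;

    while (*pp) {
        struct frame *f = *pp;

        if (f->flush_gen <= flush_completed) {
            *pp = f->next;
            f->next = free_list;
            free_list = f;
        } else {
            pp = &f->next;
        }
    }
}

/* Allocate a frame. If only pending frames are left, wait for the
 * outstanding flushes first (modelled here as immediate completion),
 * which is the forced completion check raised in the thread. */
static struct frame *alloc_frame(void)
{
    struct frame *f;

    if (!free_list && pending_list)
        flush_done(flush_issued);

    f = free_list;
    if (f) {
        free_list = f->next;
        f->next = NULL;
    }
    return f;
}
```

The point of the design is that no frame reaches free_list, and hence
another domain, while a stale IOTLB entry for it could still exist. The
flush itself stays off the common path.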
Muli Ben-Yehuda
2008-May-26 09:21 UTC
Re: [Xen-devel] [PATCH 0/5] VT-d support for PV guests
On Tue, May 20, 2008 at 03:10:38PM +0100, Espen Skoglund wrote:

> Anyhow, read-only access can indeed be supported for VT-d. I just
> wanted to get basic PV guest support in there first. Also, I'm not
> familiar with AMD's IOMMU, but I would guess that it also supports
> read-only access.

It does.

> > It would be good if you could provide a bit more detail on when
> > the patch populates IOMMU entries, and how it keeps them in
> > sync. For example, does the IOMMU map all the guest's memory, or
> > just that which will soon be the subject of a DMA? How synchronous
> > is the patch in removing mappings, e.g. due to page type changes
> > (pagetable pages, balloon driver) or due to unmapping grants?
>
> All writable memory is initially mapped in the IOMMU. Page type
> changes are also reflected there. In general all maps and unmaps to
> a domain are synced with the IOMMU. According to the feedback I got
> I apparently missed some places, though. Will look into this and
> fix it.
>
> It's clear that performance will pretty much suck if you do frequent
> updates in grant tables, but the whole idea of having passthrough
> access for NICs is to avoid this netfront/netback data plane scheme
> altogether. This leaves you with grant table updates for block
> device access. I don't know what the expected update frequency is
> for that one.
>
> It must be noted that reflecting grant table updates in the IOMMU is
> required for correctness. The alternative --- which is indeed
> possible --- is to catch DMA faults to such memory regions and
> somehow notify the driver to, e.g., drop packets or retry the DMA
> transaction once the IOMMU mapping has been established.

That would assume that the device can retry failed DMAs or otherwise
deal with them. The majority of devices can't.

> > There's been a lot of discussion at various xen summits about
> > different IOMMU optimizations (e.g. for IBM Summit, Power etc) and
> > I'd like to understand exactly what tradeoffs your implementation
> > makes. Anyhow, good stuff, thanks!

I think the major difference is that with Espen's patches all of the PV
guest's memory is exposed to the device (i.e., it provides inter-guest
protection, but no intra-guest protection). Our patches aimed at
providing both inter-guest and intra-guest protection, and incurred a
substantial performance hit (cf. our OLS '07 paper on IOMMU
performance). There will be a paper at USENIX '08 by Willmann et al. on
different IOMMU mapping strategies which provide varying levels of
inter/intra-guest protection and performance hit.

> I can't say I know much about those other IOMMUs, but as far as I
> know they are quite limited in that they only support a fixed number
> of mappings

The IBM Calgary/CalIOC2 family of IOMMUs supports 4GB address spaces.

> and can not differentiate between different DMA sources (i.e., PCI
> devices).

Calgary/CalIOC2 have a per-bus translation table. In practice most
devices are on their own bus in these systems, so you effectively get
per-device translation tables.

Cheers,
Muli