Hi,

Some IOMMU DMA remapping engines can take a long time to flush their IOTLBs. On Ibex Peak, for instance, a single iommu_map_page can take on the order of milliseconds. The Intel IOMMU spec says you don't need to flush if the PTE was previously not present, so domain creation is fine: nothing was mapped yet, so there is nothing to flush. The problem appears when we move memory around. Here is some code from hvmloader, pci.c:190 on xen-unstable:

    while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
    {
        struct xen_add_to_physmap xatp;

        if ( hvm_info->high_mem_pgend == 0 )
            hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
        xatp.domid = DOMID_SELF;
        xatp.space = XENMAPSPACE_gmfn;
        xatp.idx   = --hvm_info->low_mem_pgend;
        xatp.gpfn  = hvm_info->high_mem_pgend++;
        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
            BUG();
    }

This code is triggered when the PCI hole has grown so large that it overlaps the allocated RAM, so the overlapping section has to be relocated to high memory. If we follow the code down into Xen, add_to_physmap calls set_p2m_entry, which uses either p2m_set_entry or ept_set_entry with an order of 0 (yes, we only move one page at a time). Both implementations update the IOMMU page table with iommu_map_page. So we end up doing one iommu_map_page per relocated page, driven by this loop in hvmloader.

The IOMMU DMA remapping engine of the Intel GPU is really, really slow to flush. So when we try to create a domain with Intel GPU pass-through and enough memory to force relocation of the top of the below-4G RAM, the domain can take minutes to start!

There are multiple approaches we could take to fix this problem, but before I start working on a patch I would like to get the list's point of view.

Plan A:
- Add a new XENMEM add_to_physmap_range operation that relocates a gfn range to a new gfn.
- Add a flag to the IOMMU API to delay the IOTLB flush.
- Add a new API call to flush the IOTLB manually once the whole range has been relocated.

Plan B:
- Add a new XENMEM add_to_physmap_range operation that relocates a gfn range to a new gfn.
- Add a new set_p2m_entry function that understands batches of gfns and mfns.
- Implement the batch operation for shadow and HAP.
- Add a new IOMMU API to support batch operations.

(A) isn't very nice but has the benefit of not touching too much code; (B) would be the right thing to do but would be quite disruptive in terms of code and API changes. Rough sketches of the interfaces I have in mind follow below; all names in them are illustrative, not final.
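Both plans start from the same new hypercall. A minimal sketch of what its argument could look like, assuming a gmfn-range-style map space and a size field (the layout and the XENMAPSPACE_gmfn_range name are hypothetical, not a final ABI):

    /*
     * Sketch only: possible argument for the proposed XENMEM
     * add_to_physmap_range operation.  Field names and the
     * XENMAPSPACE_gmfn_range space are hypothetical.
     */
    struct xen_add_to_physmap_range {
        domid_t      domid;   /* domain to act on, e.g. DOMID_SELF      */
        unsigned int space;   /* XENMAPSPACE_gmfn_range (hypothetical)  */
        xen_ulong_t  idx;     /* first source gfn of the range          */
        xen_pfn_t    gpfn;    /* first destination gfn of the range     */
        xen_ulong_t  size;    /* number of frames to relocate           */
    };

With something like this, the hvmloader loop above becomes a single hypercall per relocated region rather than one per page, and Xen gets to see the whole range at once.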
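For Plan A, the flow inside Xen could then look roughly like this. IOMMU_MAP_NO_FLUSH and iommu_iotlb_flush() are made-up names for the proposed flag and flush call; only iommu_map_page and the IOMMUF_* flags exist today:

    /*
     * Plan A, sketched: update the IOMMU page table for the whole
     * range with the IOTLB flush deferred, then flush once at the
     * end.  IOMMU_MAP_NO_FLUSH and iommu_iotlb_flush() are
     * hypothetical names for the proposed additions.
     */
    for ( i = 0; i < nr; i++ )
        iommu_map_page(d, dgfn + i, mfns[i],
                       IOMMUF_readable | IOMMUF_writable |
                       IOMMU_MAP_NO_FLUSH);
    iommu_iotlb_flush(d, dgfn, nr);  /* one IOTLB flush for the range */

That keeps the per-page map calls exactly as they are but pays the expensive flush only once per range instead of once per page.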
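For Plan B, the batched p2m entry point might look like this; the signature is only a guess at what would fit next to the current set_p2m_entry:

    /*
     * Plan B, sketched: a batched version of set_p2m_entry
     * (hypothetical signature).  The shadow and HAP/EPT
     * implementations behind it would write all the entries first
     * and then issue a single IOMMU flush for the whole batch.
     */
    int set_p2m_entries(struct p2m_domain *p2m,
                        const unsigned long *gfns, const mfn_t *mfns,
                        unsigned int nr, p2m_type_t p2mt);

Let me know what you think,
Jean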
Kay, Allen M
2011-Nov-01 06:00 UTC
[Xen-devel] RE: [VTD] Intel iommu IOTLB flush really slow
Hi Jean,

I agree plan B is the better solution. Having batch capability in shadow/HAP might be useful for other use cases.

Allen

-----Original Message-----
From: Jean Guyader [mailto:jean.guyader@eu.citrix.com]
Sent: Monday, October 31, 2011 9:38 AM
To: xen-devel@lists.xensource.com
Cc: Kay, Allen M
Subject: [VTD] Intel iommu IOTLB flush really slow

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel