Hi,

Some IOMMU DMA remapping engines can take a long time to flush their IOTLBs. On Ibex Peak, for instance, a single iommu_map_page can take on the order of milliseconds. The Intel IOMMU spec says you don't need to flush if the PTE was previously not present, so domain creation is fine: nothing was mapped yet, so there is nothing to flush. The problem appears when we move memory around. Here is some code from hvmloader, pci.c:190 on xen-unstable:

    while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
    {
        struct xen_add_to_physmap xatp;

        if ( hvm_info->high_mem_pgend == 0 )
            hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
        xatp.domid = DOMID_SELF;
        xatp.space = XENMAPSPACE_gmfn;
        xatp.idx   = --hvm_info->low_mem_pgend;
        xatp.gpfn  = hvm_info->high_mem_pgend++;
        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
            BUG();
    }

This code is triggered when the PCI hole has grown so large that it overlaps the allocated RAM, so the overlapping section has to be relocated to high memory. If we follow the code down into Xen, add_to_physmap calls set_p2m_entry, which uses either p2m_set_entry or ept_set_entry with an order of 0 (yes, we only move one page at a time). Both implementations update the IOMMU page table with iommu_map_page. So we end up doing one iommu_map_page per relocated page, driven by this loop in hvmloader.

The IOMMU DMA remapping engine of the Intel GPU is really, really slow to flush. So when we try to create a domain with Intel GPU pass-through and enough memory to force relocation of the top of the below-4G RAM, the domain can take minutes to start!

There are multiple approaches we could take to fix this problem, but before I start working on a patch I would like to get the list's point of view.

Plan A:
- Add a new XENMEM add_to_physmap_range operation that relocates a gfn range to a new gfn.
- Add a flag to the IOMMU API to delay the IOTLB flush.
- Add a new API call to flush the IOTLB manually once the whole range has been relocated.

Plan B:
- Add a new XENMEM add_to_physmap_range operation that relocates a gfn range to a new gfn.
- Add a new set_p2m_entry function that understands batches of gfns and mfns.
- Implement the batch operation for shadow and HAP.
- Add a new IOMMU API to support batch operations.

(A) isn't very nice but has the benefit of not touching too much code; (B) would be the right thing to do but would be quite disruptive in terms of code and API changes. Rough sketches of the interfaces I have in mind follow below; all names in them are illustrative, not final.
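Both plans start from the same new hypercall. A minimal sketch of what its argument could look like, assuming a gmfn-range-style map space and a size field (the layout and the XENMAPSPACE_gmfn_range name are hypothetical, not a final ABI):

    /*
     * Sketch only: possible argument for the proposed XENMEM
     * add_to_physmap_range operation.  Field names and the
     * XENMAPSPACE_gmfn_range space are hypothetical.
     */
    struct xen_add_to_physmap_range {
        domid_t      domid;   /* domain to act on, e.g. DOMID_SELF      */
        unsigned int space;   /* XENMAPSPACE_gmfn_range (hypothetical)  */
        xen_ulong_t  idx;     /* first source gfn of the range          */
        xen_pfn_t    gpfn;    /* first destination gfn of the range     */
        xen_ulong_t  size;    /* number of frames to relocate           */
    };

With something like this, the hvmloader loop above becomes a single hypercall per relocated region rather than one per page, and Xen gets to see the whole range at once.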
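For Plan A, the flow inside Xen could then look roughly like this. IOMMU_MAP_NO_FLUSH and iommu_iotlb_flush() are made-up names for the proposed flag and flush call; only iommu_map_page and the IOMMUF_* flags exist today:

    /*
     * Plan A, sketched: update the IOMMU page table for the whole
     * range with the IOTLB flush deferred, then flush once at the
     * end.  IOMMU_MAP_NO_FLUSH and iommu_iotlb_flush() are
     * hypothetical names for the proposed additions.
     */
    for ( i = 0; i < nr; i++ )
        iommu_map_page(d, dgfn + i, mfns[i],
                       IOMMUF_readable | IOMMUF_writable |
                       IOMMU_MAP_NO_FLUSH);
    iommu_iotlb_flush(d, dgfn, nr);  /* one IOTLB flush for the range */

That keeps the per-page map calls exactly as they are but pays the expensive flush only once per range instead of once per page.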
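For Plan B, the batched p2m entry point might look like this; the signature is only a guess at what would fit next to the current set_p2m_entry:

    /*
     * Plan B, sketched: a batched version of set_p2m_entry
     * (hypothetical signature).  The shadow and HAP/EPT
     * implementations behind it would write all the entries first
     * and then issue a single IOMMU flush for the whole batch.
     */
    int set_p2m_entries(struct p2m_domain *p2m,
                        const unsigned long *gfns, const mfn_t *mfns,
                        unsigned int nr, p2m_type_t p2mt);

Let me know what you think,
Jean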
Kay, Allen M
2011-Nov-01 06:00 UTC
[Xen-devel] RE: [VTD] Intel iommu IOTLB flush really slow
Hi Jean,

I agree plan B is the better solution. Having batch capability in shadow/HAP might be useful for other use cases.

Allen

-----Original Message-----
From: Jean Guyader [mailto:jean.guyader@eu.citrix.com]
Sent: Monday, October 31, 2011 9:38 AM
To: xen-devel@lists.xensource.com
Cc: Kay, Allen M
Subject: [VTD] Intel iommu IOTLB flush really slow

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel