Hello,

I'm passing through a graphics card to a guest that has more than 3G of RAM (4G to be precise in my case).

What happens is that VM creation gets stuck, so I put some tracing in the Xen code to see what was taking the time. I discovered that the guest was stuck in hvmloader inside this loop:

    while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
    {
        struct xen_add_to_physmap xatp;
        if ( hvm_info->high_mem_pgend == 0 )
            hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
        xatp.domid = DOMID_SELF;
        xatp.space = XENMAPSPACE_gmfn;
        xatp.idx   = --hvm_info->low_mem_pgend;
        xatp.gpfn  = hvm_info->high_mem_pgend++;
        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
            BUG();
    }

This loop relocates the RAM to the top to leave some space for the PCI BARs. It loops over each page, so in my case it's quite a big loop because the GPU has a BAR of 256M.

The interesting part is that the function add_to_physmap takes most of the time. I believe that most of it is the IOMMU IOTLB flush that comes with iommu_map_page or iommu_unmap_page, which are called when we manipulate the p2m table.

In my case the IOMMU flush takes a very long time (because of the Intel GPU?), about 10 milliseconds. So if I'm patient enough my domain will start, after about 10 minutes.

One way to go would be to create a range interface to iommu_map_page and iommu_unmap_page, since IOMMU flushes are so expensive. Then some work would need to be done to add a range interface to all the functions between add_to_physmap and p2m_set_entry, which would be a big patch.

I hope there is another way out of this problem.

Thanks,
Jean

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
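As a rough check on the figures in the message above: a 256M BAR corresponds to 65536 pages of 4K, and at about 10 ms per page the loop would need roughly 655 seconds, which is in line with the "about 10 minutes" reported. The sketch below only does that arithmetic. The function names are made up for illustration and are not Xen code, and it assumes that the whole 256M is relocated and that each relocated page costs exactly one flush.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper, not part of Xen: number of pages in a region
 * of `bytes` bytes when a page is (1 << page_shift) bytes. */
uint64_t pages_in_region(uint64_t bytes, unsigned int page_shift)
{
    assert(page_shift < 64);   /* shifting by 64 or more is undefined */
    return bytes >> page_shift;
}

/* Illustrative helper, not part of Xen: total time in milliseconds
 * if every page pays one flush of `flush_ms` milliseconds. */
uint64_t total_flush_ms(uint64_t pages, uint64_t flush_ms)
{
    return pages * flush_ms;
}
```

Calling pages_in_region(256ull << 20, 12) gives the page count for a 256M region with 4K pages, and total_flush_ms() on that count with 10 ms per flush gives the estimated loop time in milliseconds.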
Jean,

Do you see any boot time difference between passing through integrated graphics for the very first time and the subsequent times? Which platform are you using?

Allen

-----Original Message-----
From: Jean Guyader [mailto:jean.guyader@gmail.com]
Sent: Wednesday, November 10, 2010 1:50 PM
To: xen-devel@lists.xensource.com
Cc: Kay, Allen M
Subject: Intel GPU pass-through with > 3G
It's consistent. The reason is that the VT-d mapping already happened once when Xen allocated the guest memory, so the relocation of the pages for the PCI hole ends up being a second mapping.

Jean

On 11 November 2010 00:04, Kay, Allen M <allen.m.kay@intel.com> wrote:
> Jean,
>
> Do you see any boot time difference between passing through integrated graphics for the very first time and the subsequent times? Which platform are you using?
>
> Allen
Daniel De Graaf
2010-Nov-12 14:18 UTC
Re: [Xen-devel] RE: Intel GPU pass-through with > 3G
I have also noticed this issue (9ms IOMMU flush), although not during domain creation. The path in which I observed it is page remapping when using map_grant_ref. I haven't tested a DomU with over 3G of memory, however; the delay may also be present in that case on my platform.

I have done some work to try to add an 'order' parameter to iommu_map_page, but it isn't stable yet; if this is the only way to get around the slow flush, I will look at finishing it.

Would it be possible to add a flag to delay IOMMU flushing until after a batch update is finished? A single flush at the end, even if expensive, would be faster than 10ms per page on mappings of a significant size. This is also likely to be a less intrusive patch.

In case you're interested, my platform is a Dell Optiplex 755, 4G RAM:

# lspci
00:00.0 Host bridge: Intel Corporation 82Q35 Express DRAM Controller (rev 02)
00:01.0 PCI bridge: Intel Corporation 82Q35 Express PCI Express Root Port (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82Q35 Express Integrated Graphics Controller (rev 02)
00:02.1 Display controller: Intel Corporation 82Q35 Express Integrated Graphics Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82566DM-2 Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IO (ICH9DO) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)

--
Daniel De Graaf
National Security Agency
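To make the effect of an 'order' parameter concrete, here is a small counting model. It is only a sketch under stated assumptions: flushes_needed() is a hypothetical name, not the interface of the unfinished patch mentioned above, and it assumes each call maps 2^order contiguous pages and pays exactly one IOTLB flush. It ignores alignment and contiguity requirements that a real implementation would have to handle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not Xen code: number of IOTLB flushes needed to
 * map `pages` pages when each call covers 2^order pages and flushes
 * once. The final, possibly partial, chunk still costs one flush. */
uint64_t flushes_needed(uint64_t pages, unsigned int order)
{
    assert(order < 64);                    /* keep the shift well defined */
    uint64_t chunk = 1ull << order;        /* pages covered per call */
    return (pages + chunk - 1) / chunk;    /* round up */
}
```

With order 0 this degenerates to one flush per page, which is the behaviour described in the thread; larger orders divide the number of flushes by the chunk size.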
I've done a patch like this where I set a flag d->iommu_dont_flush at the beginning of my batched function, then make an explicit call to flush the IOTLB at the end.

It's not a very nice way of solving this problem; maybe it would be better to have a range/batch interface at the p2m_set_entry level.

Jean

On 12 November 2010 14:18, Daniel De Graaf <dgdegra@tycho.nsa.gov> wrote:
> Would it be possible to add a flag to delay IOMMU flushing until after a
> batch update is finished? A single flush at the end, even if expensive,
> would be faster than 10ms per page on mappings of a significant size. This
> is also likely to be a less intrusive patch.
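The deferred-flush idea can be modelled in a few lines. This is a toy sketch, not the patch itself and not Xen code: only the flag name iommu_dont_flush comes from the thread, while struct model_domain, the counters, and the model_* functions are invented for illustration. It is single-threaded and does not address concurrent updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for per-domain state (hypothetical, not struct domain). */
struct model_domain {
    bool     iommu_dont_flush;  /* set while a batch is in progress */
    uint64_t flush_count;       /* simulated IOTLB flushes performed */
    uint64_t mapped_pages;      /* simulated page mappings performed */
};

/* Simulated flush: just counts how often it would have run. */
void model_iotlb_flush(struct model_domain *d)
{
    d->flush_count++;
}

/* Simulated single-page map: flushes unless a batch has deferred it. */
void model_map_page(struct model_domain *d)
{
    d->mapped_pages++;
    if (!d->iommu_dont_flush)
        model_iotlb_flush(d);
}

/* Batched map: defer flushing, map every page, then flush once. */
void model_map_batch(struct model_domain *d, uint64_t pages)
{
    assert(!d->iommu_dont_flush);   /* no nested batches in this sketch */
    d->iommu_dont_flush = true;
    for (uint64_t i = 0; i < pages; i++)
        model_map_page(d);
    d->iommu_dont_flush = false;
    model_iotlb_flush(d);           /* the single explicit flush at the end */
}
```

In this model a batch of any size costs one flush, while a page mapped outside a batch still flushes immediately, so existing single-page callers keep their behaviour.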
Another thing to try is to add a flag to the XENMEM_add_to_physmap hypercall (or create another hypercall) to set a no_iotlb_flag. hvmloader would then make this hypercall with the flag set to 1. When the flag is set, intel_iommu_map_page will not do an iotlb_flush.

Given that this mapping is done at domain creation time, there is really no need to do IOTLB flushes, especially when it is doing page mapping.

Allen

-----Original Message-----
From: Jean Guyader [mailto:jean.guyader@gmail.com]
Sent: Friday, November 12, 2010 6:23 AM
To: Daniel De Graaf
Cc: Kay, Allen M; xen-devel@lists.xensource.com; Tim.Deegan@citrix.com
Subject: Re: [Xen-devel] RE: Intel GPU pass-through with > 3G
At 14:23 +0000 on 12 Nov (1289571802), Jean Guyader wrote:
> I've done a patch like this where I set a flag d->iommu_dont_flush at
> the beginning of my batched function then
> an explicit call to flush the iotlb at the end.

Is that safe against concurrent updates? (e.g. do you hold an appropriate lock?) If so, please send us your patch! We can always put some syntactic sugar on it.

Cheers,

Tim.

> It's not a very nice way of solving this problem maybe it would be
> better to have a range/batch interface at the p2m_set_entry level.
>
> Jean

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)