Tao Shen
2007-May-28 09:53 UTC
[Xen-users] continued question on Xen 3d virtualization with IOMMU
Hi, Xen-users group:

I am planning a new system to run 4 VMs within Xen, hopefully with 3D (OpenGL, Direct3D) working in Xen or Windows.

I have some questions after extensively googling for it:

Mats, you said that you don't need Xen-aware drivers in DomU if the system has an IOMMU.

Now, on the subject of hardware support: currently we have VT-x and AMD-V, and that's hardware-assisted CPU virtualization. What's coming up in Penryn is the Extended Page Table (EPT), and in AMD's Barcelona the Nested Page Table (NPT), to help with hardware-assisted memory virtualization, as far as I understand it.

Question #1: EPT and NPT should only help the performance of the VM; they don't help with 3D, right? What I understand is that you need an IOMMU instead, which is a chipset feature rather than a CPU feature.

On the subject of IOMMU support: the Bearlake Q35 chipset will come with Intel VT-d (Intel's version of I/O virtualization), expected in a few months; Bearlake P35 is already out. On the AMD side, I have heard that current chipsets already have IOMMU support built in (probably not the AMD IOMMU spec 1.2 just released, but at least 1.0).

Question #2: which AMD chipsets (there are a bunch of nForce and ATI chipsets) do Xen developers know of that have a working IOMMU? (I have heard that the GART and DEV together make a fully functional IOMMU unit.) And if I were to get an Athlon X2 AM2 chip with a motherboard using that chipset, technically I could get 3D working, right? But without the benefits of NPT, which comes later with Barcelona (which is also AM2 socket compatible).

Question #3: you said that Xen-aware GPU drivers can help 3D acceleration in domU VMs if the GPU driver is open source. Intel's GPU drivers are all open source now; when can users expect Xen to work with Intel's embedded GPUs like the GMA950 and X3100?

Question #4 (not that important): how much performance benefit do you think you can get from the addition of NPT and EPT? VMware argues that first-generation VT-x and AMD-V sometimes made VMs slower. If EPT doesn't add much and AMD has an IOMMU already working, there is no reason for me to wait for Penryn, IMHO.

Thanks for your time, and thank you in advance for helping me with these questions.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Petersson, Mats
2007-May-29 14:59 UTC
RE: [Xen-users] continued question on Xen 3d virtualization with IOMMU
> -----Original Message-----
> From: xen-users-bounces@lists.xensource.com
> [mailto:xen-users-bounces@lists.xensource.com] On Behalf Of Tao Shen
> Sent: 28 May 2007 10:53
> To: xen-users@lists.xensource.com
> Subject: [Xen-users] continued question on Xen 3d
> virtualization with IOMMU
>
> Hi , Xen-User group:
>
> I am planning on a new system to run 4 VMs within Xen,
> hopefully after 3d(openGL, Direct3d) working in Xen or windows.
>
> I have some questions after extensively googling for it:
>
> Mats, you said that you don't need Xen-aware drivers in DomU
> if the system has IOMMU.

Yes.

> now on the subject of hardware support, currently we have
> Vt-x and AMD-v and that's hardware assisted CPU
> virtualization. what's coming up in Penryn is the Extended
> Page Table(EPT) and AMD Barcelona's Nested Page Table(NPT)
> for help with hardware assisted memory virtualization as far
> as I understand it.

Yes.

> Now question #1, EPT and NPT should only help performance of
> the VM, it doesn't help with 3d right? What I understand is
> that you need IOMMU instead which is a chipset feature
> instead of a CPU feature.

That is correct. {N,E}PT only addresses the fact that the guest's view of the memory layout differs from the physical memory that is ACTUALLY used by the guest. It takes over all the work otherwise done by the shadow-paging code for the guest.

> on the subject of IOMMU support: The Bearlake Q35 chipset
> will come with Intel VT-d(intel's version of IO virt),
> expected in a few months, Bearlake P35 is already out.
> On the AMD side, I have heard that current chipsets already have
> IOMMU support built in.(probably not AMD IOMMU spec 1.2 just
> released, but at least 1.0)

As far as I'm aware, there are no AMD chipsets with IOMMU available - I could be wrong, but that's my understanding.

> Now question #2, which AMD chipsets(there is a bunch of
> Nforce, and ATI chipsets) that Xen developers know of that
> has IOMMU working?(I have heard that the GART and DEV
> together is a fully functional IOMMU unit) and if I were to
> get an Athlon X2 AM2 chip with that chipset mobo,
> technically, I can get the 3d working right? but without the
> benefits of NPT which later comes with Barcelona(which is
> also AM2 socket compatible)

GART will support re-mapping of device memory accesses, but it only supports one map for the entire system, which may be insufficient for anything but the most minimal setup. Also, there's currently no software to support GART at all in Xen, although this is about to change for the purpose of using GART to map para-virtual memory. Thus far I've heard of no plans to use this to support fully virtualized guests.

DEV will prevent one guest from accessing another guest's memory (which is another piece of functionality that the IOMMU provides - making sure that the PCI device doesn't access anywhere OUTSIDE its own memory).

> Question #3: you said that Xen aware GPU drivers can help 3d
> acceleration in domU VMs if the GPU driver is open source.
> Intel's GPUs are all open source now, when can users expect
> to have Xen work with Intel's embedded GPUs like GMA950 and X3100s?

Just to clarify: unless we start making really big changes to the driver architecture, we have to use a modified driver in the guest. I'm not 100% sure that the entire interface necessary to perform this task is there in the para-virtual driver interface [it can probably be ADDED, but that further complicates the task].
If the driver is open source, you have some chance of actually modifying it. But these drivers are quite clearly non-trivial, so it's not just a case of "recompiling for Xen". It is a case of wading through the code and modifying every place where a reference to memory is given to the graphics card, such that the new code takes into account the fact that memory in the guest isn't actually the REAL physical memory layout. I'm sure this CAN be done, but it's a lot of hard work to find all the relevant places [also, you have to be aware that memory the guest thinks is contiguous may not actually be contiguous in the ACTUAL physical memory map, which means that some process of re-mapping it to a MACHINE PHYSICAL contiguous memory region would be necessary].

I also don't think the GPU drivers for Windows are Open Source, and since the vast majority of "requests" for 3D graphics in guests are about using Windows to do 3D graphics, this is clearly where the effort would have to be put in.

> Now question #4: not that important, but how much performance
> benefits do you think you can get from the addition of NPT
> and EPT?, VMware argues that the first gen VT-x and AMD-V
> sometimes made the VMs slower. If EPT doesn't add much and
> AMD's got IOMMU already working, there is no reason for me to
> wait for Penryn IMHO.

It very much depends on the "application" you're using. Nested paging requires much less interaction with the hypervisor, which is why it's there. The shadow-paging code in Xen is not trivial: it interprets instructions that are "trapped" by the hypervisor. Each update will take many thousand cycles, guaranteed!

On the other hand, nested paging adds overhead reads of the "host-pagetable", which in a worst-case scenario is 4 per page-table level, so a maximum of 20 reads for one complete page-table fill [four guest levels plus the final address, each needing its own host walk].
This is unlikely to happen very often (the highest-level page-table usually only has two entries, one for kernel and one for "user-code", so at least these should be cached in the TLB - the next levels depend on the application).

So, in a test-case like "kernel compile" (which does lots of page-table updates), the benefit will definitely be noticeable [if not in the speed at which the compile scrolls past, at least you will be able to measure it with a regular wrist-watch with a second hand, rather than a chronograph].

At the other "extreme", you'll have the case where you have a HUGE array (many gigabytes) and use a random number to index that array - then you'll have few updates to the page-table, and many TLB-miss operations where the whole chain of memory reads has to take place.

In between come benchmarks such as CPU-intensive calculations, where the amount of memory accessed is relatively small and there are not many page-table updates, so there's no big difference in either direction.

Just like the x86-64 vs. x86-32 performance difference, one isn't necessarily better than the other in individual cases, and it may even be that the "new" one is slower. But averaged over some reasonably different benchmarks, the overall win is with the "new" technology.

I can't give any direct benchmarks, simply because I don't have any.

--
Mats

> Thanks for your time and thank you in advance for helping me
> with those questions,
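
[Editorial sketch, not part of the original thread.] The worst-case figure of 20 host page-table reads quoted in the reply can be reproduced with a little arithmetic. The Python sketch below assumes 4-level page tables on both the guest and the host side and no caching of intermediate translations; the function name and its parameters are hypothetical, chosen only for illustration.

```python
def nested_walk_reads(guest_levels: int = 4, host_levels: int = 4) -> dict:
    """Count worst-case memory reads for one TLB fill under nested paging.

    Assumes nothing is cached: every guest-physical address met during the
    guest page-table walk needs a full walk of the host page table.
    """
    # Each guest page-table level is located by a guest-physical address, and
    # the final data address is guest-physical too, so guest_levels + 1
    # addresses have to be translated through the host page table.
    translations = guest_levels + 1
    host_reads = translations * host_levels  # the "overhead" reads
    guest_reads = guest_levels               # reads of the guest tables themselves
    return {
        "host_reads": host_reads,
        "guest_reads": guest_reads,
        "total": host_reads + guest_reads,
    }


print(nested_walk_reads())
# {'host_reads': 20, 'guest_reads': 4, 'total': 24}
```

Under these assumptions the 20 overhead reads come on top of the 4 reads of the guest's own page tables, which is why a TLB-miss-heavy workload (the huge randomly indexed array) is the unfavourable case for nested paging, while a workload dominated by page-table updates is the unfavourable case for shadow paging.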