Andre Przywara
2010-Mar-05 14:36 UTC
[Xen-devel] [PATCH] x86/hvm: accelerate IO intercept handling
Hi,

currently we go through the emulator every time an HVM guest does an I/O port access (in/out). This is unnecessary most of the time, as both VMX and SVM already provide all the necessary information in the VMCS/VMCB. String instructions are not covered by this shortcut, but they are quite rare and we would need to access the guest memory anyway.

This patch decodes the information from VMCB/VMCS and calls a simple handle_mmio wrapper. In handle_mmio() itself the emulation part is simply skipped; this approach avoids code duplication. Since the vendor-specific part is quite trivial, I implemented both the VMX and SVM parts. Please check the VMX part for sanity.

I simply boot tested both versions and ran some simple benchmarks. A micro benchmark (hammering an I/O port in a tight loop) shows a significant performance improvement (down to 66% of the time needed to handle the intercept on a K8, measured in the guest with TSC). Even with reading a 1GB file from an emulated IDE harddisk (Dom0 cached) I could get a 4-5% improvement. We found some guests (e.g. the TCP stack in some Windows version) which exercise the PM-Timer I/O port (0x1F48) very often (multiple 10,000 times per second), these workloads also benefit from this patch.

Since this is not a regression, but only a performance improvement, I'd suggest applying it to the post-4.0 tree.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12

----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
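For readers without the patch at hand, the decode step described in the message above can be sketched roughly as follows. This is not the code from the patch: the struct io_access type and both function names are hypothetical, made up for illustration. Only the bit positions are taken from the vendor manuals (EXITINFO1 of the SVM IOIO intercept, and the VMX exit qualification for I/O instructions). Both helpers return false for string instructions, which the message says are left to the emulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, vendor-neutral description of one port access. */
struct io_access {
    uint16_t port;    /* I/O port number */
    uint8_t  bytes;   /* operand size in bytes: 1, 2 or 4 */
    bool     is_in;   /* true for IN, false for OUT */
    bool     string;  /* INS/OUTS, not handled by the shortcut */
};

/* SVM: decode EXITINFO1 of an IOIO intercept (hypothetical helper). */
static bool svm_decode_ioio(uint64_t exitinfo1, struct io_access *out)
{
    assert(out != NULL);
    out->is_in  = exitinfo1 & 1;             /* bit 0: TYPE, 1 = IN */
    out->string = (exitinfo1 >> 2) & 1;      /* bit 2: STR */
    out->bytes  = (exitinfo1 >> 4) & 7;      /* bits 6:4: SZ8/SZ16/SZ32, one-hot */
    out->port   = (exitinfo1 >> 16) & 0xffff; /* bits 31:16: port */
    return !out->string;                     /* false: fall back to the emulator */
}

/* VMX: decode the exit qualification of an I/O exit (hypothetical helper). */
static bool vmx_decode_io(uint64_t qual, struct io_access *out)
{
    assert(out != NULL);
    out->bytes  = (qual & 7) + 1;            /* bits 2:0: access size minus one */
    out->is_in  = (qual >> 3) & 1;           /* bit 3: direction, 1 = IN */
    out->string = (qual >> 4) & 1;           /* bit 4: string instruction */
    out->port   = (qual >> 16) & 0xffff;     /* bits 31:16: port */
    return !out->string;                     /* false: fall back to the emulator */
}
```

When either helper returns true, everything needed to dispatch the access (port, size, direction) is already known from the exit information, so nothing has to be fetched or decoded from guest memory.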
Keir Fraser
2010-Mar-05 14:52 UTC
[Xen-devel] Re: [PATCH] x86/hvm: accelerate IO intercept handling
On 05/03/2010 14:36, "Andre Przywara" <andre.przywara@amd.com> wrote:

> I simply boot tested both versions and ran some simple benchmarks.
> A micro benchmark (hammering an I/O port in a tight loop) shows a
> significant performance improvement (down to 66% of the time needed to
> handle the intercept on a K8, measured in the guest with TSC).
> Even with reading a 1GB file from an emulated IDE harddisk (Dom0 cached)
> I could get a 4-5% improvement.
> We found some guests (e.g. the TCP stack in some Windows version) which
> exercise the PM-Timer I/O port (0x1F48) very often (multiple 10,000
> times per second), these workloads also benefit from this patch.

By how much? I mean, the microbenchmark and 5% speedup on our poor-man's IO path are not very interesting. Unless the speedup on the only possibly-interesting workload you mention is significant, this whole optimisation seems unnecessary.

 -- Keir
Andre Przywara
2010-Mar-05 15:49 UTC
[Xen-devel] Re: [PATCH] x86/hvm: accelerate IO intercept handling
Keir Fraser wrote:
> On 05/03/2010 14:36, "Andre Przywara" <andre.przywara@amd.com> wrote:
>
>> I simply boot tested both versions and ran some simple benchmarks.
>> A micro benchmark (hammering an I/O port in a tight loop) shows a
>> significant performance improvement (down to 66% of the time needed to
>> handle the intercept on a K8, measured in the guest with TSC).
>> Even with reading a 1GB file from an emulated IDE harddisk (Dom0 cached)
>> I could get a 4-5% improvement.
>> We found some guests (e.g. the TCP stack in some Windows version) which
>> exercise the PM-Timer I/O port (0x1F48) very often (multiple 10,000
>> times per second), these workloads also benefit from this patch.
>
> By how much? I mean, the microbenchmark and 5% speedup on our poor-man's IO
> path are not very interesting.

Educated estimation: Sysmark productivity should give about 0.5%, Passmark TCP localhost transfer on Windows 2008R2 should improve about 5%.

> Unless the speedup on the only
> possibly-interesting workload you mention is significant, this whole
> optimisation seems unnecessary.

Actually it is missing enablement. What is the purpose of going through the emulator (mapping and walking guest page tables, reading guest instruction memory, decoding x86 code) when you don't have to? KVM has been implementing this for quite some time now. And, after all, low hanging fruits are growing higher nowadays, so I'd consider even a modest performance improvement for _every_ machine by just a software patch a valuable thing.

Regards,
Andre.
Keir Fraser
2010-Mar-05 15:55 UTC
[Xen-devel] Re: [PATCH] x86/hvm: accelerate IO intercept handling
On 05/03/2010 15:49, "Andre Przywara" <andre.przywara@amd.com> wrote:

>> Unless the speedup on the only
>> possibly-interesting workload you mention is significant, this whole
>> optimisation seems unnecessary.
> Actually it is missing enablement.

Yes it's very important to use *every* CPU feature. ;-)

Well, the patch looks okay, and it may speed up things we care about a percent or two I guess. I will look at it properly after 4.0 and I'm sure we can get it in the tree.

 -- Keir