Not sure whether anyone has seen this:
http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization

Some of the comments are interesting, but not really as negative as they used
to be. In any case, it may make sense to have a quick look.

Lars
Thanos Makatos
2013-Jul-09 15:40 UTC
Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
> Not sure whether anyone has seen this:
> http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
>
> Some of the comments are interesting, but not really as negative as
> they used to be. In any case, it may make sense to have a quick look.
>
> Lars

They use PostMark for their disk I/O tests, which is an ancient benchmark.
----- Original Message -----
> From: Thanos Makatos <thanos.makatos@citrix.com>
> Sent: Tuesday, 9 July 2013, 16:40
> Subject: Re: [Xen-devel] Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
>
> They use PostMark for their disk I/O tests, which is an ancient benchmark.

Is that a good or a bad thing? If so, why?
On Tue, 09 Jul 2013 16:27:31 +0100, Lars Kurth <lars.kurth@xen.org> wrote:
> Not sure whether anyone has seen this:
> http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
>
> Some of the comments are interesting, but not really as negative as
> they used to be. In any case, it may make sense to have a quick look.

Relative figures, at least in terms of ordering, are similar to what I found
last time I did a similar test:
http://www.altechnative.net/2012/08/04/virtual-performance-part-1-vmware/

My test was harsher, though, because it exposed more of the context switching
and inter-core (and worse, inter-die, since I tested on a C2Q) migration
overheads. The process migration overheads are _expensive_: I found that on
bare metal, pinning CPU/RAM-intensive processes to cores made a ~20%
difference to overall throughput on a C2Q-class CPU (no shared caches between
the two dies made it worse). I expect 4.3.x will be a substantial improvement
with the NUMA-awareness improvements to the scheduler (looking forward to
trying it this weekend).

Shame Phoronix didn't test PV performance; in my tests that made a huge
difference and put Xen firmly ahead of the competition.

Gordan
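For reference, the pinning experiment described above is straightforward to
script on Linux; a minimal sketch in Python, where the core list and the
busy-loop workload are just placeholders for whatever CPU/RAM-intensive job
is being measured:

    import multiprocessing
    import os

    def cpu_bound_worker(core, iterations=50_000_000):
        # Pin this worker to a single core before starting the hot loop,
        # so the scheduler cannot migrate it between cores/dies mid-run.
        os.sched_setaffinity(0, {core})
        total = 0
        for i in range(iterations):
            total += i * i
        return total

    if __name__ == "__main__":
        cores = [0, 1, 2, 3]  # placeholder: one worker per physical core
        with multiprocessing.Pool(len(cores)) as pool:
            pool.map(cpu_bound_worker, cores)

Comparing wall-clock time with and without the os.sched_setaffinity() call
gives a rough measure of the migration overhead on a given box.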
Thanos Makatos
2013-Jul-09 15:56 UTC
Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
> > They use PostMark for their disk I/O tests, which is an ancient benchmark.
>
> Is that a good or a bad thing? If so, why?

IMO it's a bad thing because it's far from a representative benchmark, which
can lead to wrong conclusions when evaluating I/O performance.
On Tue, 9 Jul 2013 15:56:51 +0000, Thanos Makatos <thanos.makatos@citrix.com> wrote:
> > They use PostMark for their disk I/O tests, which is an ancient benchmark.
> >
> > Is that a good or a bad thing? If so, why?
>
> IMO it's a bad thing because it's far from a representative benchmark,
> which can lead to wrong conclusions when evaluating I/O performance.

Ancient doesn't mean non-representative. A good file-system benchmark is a
tricky one to come up with because most FS-es are good at some things and bad
at others. If you really want to test the virtualization overhead on FS I/O,
the only sane way to test it is by putting the FS on the host's RAM disk and
testing from there. That should expose the full extent of the overhead,
subject to the same caveat about different FS-es being better at different
load types.

Personally I'm in favour of redneck benchmarks that easily push the whole
stack to saturation point (e.g. a highly parallel kernel compile), since
those cannot be cheated. But generally speaking, the only way to get a
worthwhile measure is to create a custom benchmark that tests your specific
application to saturation point. Any generic/synthetic benchmark will provide
results that are almost certainly going to be misleading for any specific
real-world load you are planning to run on your system.

For example, on a read-only MySQL load (read-only because it simplified
testing: no need to rebuild huge data sets between runs, just drop all the
caches), in a custom application performance test that I carried out for a
client, ESX showed a ~40% throughput degradation over bare metal (8
cores/server, 16 SQL threads cat-ing select-filtered general-log extracts,
load generator running in the same VM). And the test machines (both physical
and virtual) had enough RAM in them that they were both only disk I/O bound
for the first 2-3 minutes of the test (which took the best part of an hour to
complete), which goes to show that disk I/O bottlenecks are good at covering
up overheads elsewhere.

Gordan
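To illustrate the RAM-disk approach, here is a rough sketch of a parallel
file I/O exerciser; it assumes a tmpfs is already mounted at /mnt/ramdisk
(the path, file size and counts are placeholders), so the numbers reflect FS
and virtualization overhead rather than disk speed:

    import multiprocessing
    import os

    RAMDISK = "/mnt/ramdisk"   # assumed tmpfs mount point; adjust to taste
    FILE_SIZE = 1 << 20        # 1 MiB per file
    FILES_PER_WORKER = 500

    def fs_worker(worker_id):
        # Create/read/delete cycles against the RAM-backed filesystem.
        data = os.urandom(FILE_SIZE)
        for i in range(FILES_PER_WORKER):
            path = os.path.join(RAMDISK, "w%d-%d.dat" % (worker_id, i))
            with open(path, "wb") as f:
                f.write(data)
            with open(path, "rb") as f:
                f.read()
            os.unlink(path)

    if __name__ == "__main__":
        workers = 8            # e.g. one per core
        with multiprocessing.Pool(workers) as pool:
            pool.map(fs_worker, range(workers))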
Thanos Makatos
2013-Jul-09 16:21 UTC
Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
> > IMO it's a bad thing because it's far from a representative benchmark,
> > which can lead to wrong conclusions when evaluating I/O performance.
>
> Ancient doesn't mean non-representative. A good file-system benchmark

In this particular case it is: PostMark is a single-threaded application that
performs read and write operations on a fixed set of files, at an
unrealistically low directory depth; modern I/O workloads exhibit much more
complicated behaviour than this.
On Tue, 9 Jul 2013 16:21:52 +0000, Thanos Makatos <thanos.makatos@citrix.com> wrote:
> > Ancient doesn't mean non-representative. A good file-system benchmark
>
> In this particular case it is: PostMark is a single-threaded application
> that performs read and write operations on a fixed set of files, at an
> unrealistically low directory depth; modern I/O workloads exhibit much
> more complicated behaviour than this.

Unless you are running a mail server. Granted, running multiple PostMark
instances in parallel might be a better test on today's many-core servers,
but it'd likely make little or no difference on a disk-I/O-bound test.

Gordan
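If someone did want to try that, launching several PostMark instances side by
side is easy enough to script; a sketch, assuming a postmark binary on the
PATH that reads its usual commands from stdin (the command names below are
from memory and may need checking against your PostMark version):

    import subprocess
    import tempfile

    N_INSTANCES = 8   # e.g. one per core

    def postmark_commands(workdir):
        # Typical PostMark command script; verify against your version.
        return "\n".join([
            "set location %s" % workdir,
            "set number 10000",
            "set transactions 20000",
            "run",
            "quit",
            "",
        ])

    if __name__ == "__main__":
        procs = []
        for i in range(N_INSTANCES):
            # Give each instance its own working directory.
            workdir = tempfile.mkdtemp(prefix="pm%d-" % i)
            p = subprocess.Popen(["postmark"], stdin=subprocess.PIPE, text=True)
            p.stdin.write(postmark_commands(workdir))
            p.stdin.close()
            procs.append(p)
        for p in procs:
            p.wait()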
--On 9 July 2013 16:27:31 +0100 Lars Kurth <lars.kurth@xen.org> wrote:
> Not sure whether anyone has seen this:
> http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
>
> Some of the comments are interesting, but not really as negative as they
> used to be. In any case, it may make sense to have a quick look.

Last time I looked at the Phoronix benchmarks, they were using the default
disk caching with Xen and QEMU, and these were not identical. From memory,
KVM was using writethrough and Xen was using no caching.

This one says "Xen and KVM virtualization were setup through virt-manager".
I don't know whether that evens things out, as I don't use it.

--
Alex Bligh
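One way to check what virt-manager actually configured is to dump each
domain's libvirt XML (e.g. virsh dumpxml <domain>) and look at the cache
attribute on the disk driver elements; a quick sketch, with the file name as
a placeholder:

    import xml.etree.ElementTree as ET

    # Parse a saved libvirt domain definition,
    # e.g. produced with `virsh dumpxml mydomain > domain.xml`.
    tree = ET.parse("domain.xml")

    for disk in tree.findall(".//devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        dev = target.get("dev") if target is not None else "?"
        cache = driver.get("cache") if driver is not None else None
        # When the attribute is absent, the hypervisor default applies.
        print("%s: cache=%s" % (dev, cache or "unspecified (hypervisor default)"))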
Dario Faggioli
2013-Jul-11 10:53 UTC
Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
On mar, 2013-07-09 at 16:54 +0100, Gordan Bobic wrote:
> The process migration overheads are _expensive_

Indeed!

> - I found that on bare metal, pinning CPU/RAM-intensive processes to cores
> made a ~20% difference to overall throughput on a C2Q-class CPU (no shared
> caches between the two dies made it worse). I expect 4.3.x will be a
> substantial improvement with the NUMA-awareness improvements to the
> scheduler (looking forward to trying it this weekend).

Well, yes, something good could be expected, although the actual improvement
will depend on the number of involved VMs, their sizes, the workload they're
running, etc.

When I tried to use a kernel compile as a benchmark for the NUMA effects, it
did not turn out that useful to me (and that's why I switched to SpecJBB),
but perhaps it was me who was doing something wrong...

Anyway, if you do anything like this, please do let us know here (and,
please, Cc me :-P).

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
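As an aside, for anyone repeating this kind of experiment it helps to know
the host's NUMA layout before deciding on pinning or placement; a small
sketch that reads the topology Linux exposes under sysfs:

    import glob
    import os

    # Print the CPU list for each NUMA node as exposed by the Linux kernel.
    for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        with open(os.path.join(node, "cpulist")) as f:
            print("%s: cpus %s" % (os.path.basename(node), f.read().strip()))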
On Thu, Jul 11, 2013 at 11:53 AM, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> When I tried to use a kernel compile as a benchmark for the NUMA effects,
> it did not turn out that useful to me (and that's why I switched to
> SpecJBB), but perhaps it was me who was doing something wrong...

In my experience, kernel-build has excellent memory locality. One effect is
that the effect of nested paging on TLB time is almost nil; I'm not surprised
that the caches make the effect of NUMA almost nil as well.

 -George
Dario Faggioli
2013-Jul-11 16:27 UTC
Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
On gio, 2013-07-11 at 17:23 +0100, George Dunlap wrote:
> In my experience, kernel-build has excellent memory locality. One effect
> is that the effect of nested paging on TLB time is almost nil; I'm not
> surprised that the caches make the effect of NUMA almost nil as well.

Not to mention I/O, unless you set up a ramfs-backed build environment.
Again, when I tried, that was my intention, but perhaps I failed right at
that... Gordan, what about you?

Dario
On 07/11/2013 05:27 PM, Dario Faggioli wrote:
> Not to mention I/O, unless you set up a ramfs-backed build environment.
> Again, when I tried, that was my intention, but perhaps I failed right at
> that... Gordan, what about you?

IIRC, in my tests the disk I/O was relatively minimal. If you read the
details here:
http://www.altechnative.net/2012/08/04/virtual-performance-part-1-vmware/
you may notice that I actually primed the test by catting everything to
/dev/null, so all the reads should have been coming from the page cache. I
didn't have enough RAM in the machine (only 8GB) to fit all the produced
binaries in tmpfs at the time. I don't think this had a large impact, though:
the iowait time was about 0% all the time, because there were plenty of
threads that had productive compiling work to do while some were waiting to
commit to disk.

Since this was on a C2Q, there was no NUMA in play, so if I had to guess at
the major cause of performance degradation, it would be related to context
switching; having said that, I didn't get around to doing any in-depth
profiling to be able to tell for sure. (Speaking of which, how would one go
about profiling things at the bare-metal hypervisor level?)

I will re-run the test on a new machine at some point and see how it
compares, and this time I will have enough RAM for the whole lot to fit.

Gordan
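For what it's worth, that priming step (reading the whole tree once so later
reads come from the page cache) is trivial to script as well; a sketch, with
the source tree path as a placeholder:

    import os

    SRC_TREE = "/usr/src/linux"   # placeholder: the tree about to be compiled

    def prime_page_cache(root):
        # Read every regular file once so subsequent accesses hit the page
        # cache -- effectively what catting everything to /dev/null does.
        primed = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "rb") as f:
                        while f.read(1 << 20):
                            pass
                    primed += 1
                except OSError:
                    pass   # skip unreadable files
        return primed

    if __name__ == "__main__":
        print("primed %d files" % prime_page_cache(SRC_TREE))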