Not sure whether anyone has seen this:
http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization

Some of the comments are interesting, but not really as negative as they used
to be. In any case, it may make sense to have a quick look.

Lars
Thanos Makatos
2013-Jul-09 15:40 UTC
Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
> Not sure whether anyone has seen this:
> http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
>
> Some of the comments are interesting, but not really as negative as
> they used to be. In any case, it may make sense to have a quick look.
>
> Lars

They use PostMark for their disk I/O tests, which is an ancient benchmark.
----- Original Message -----
> From: Thanos Makatos <thanos.makatos@citrix.com>
> Sent: Tuesday, 9 July 2013, 16:40
> Subject: Re: [Xen-devel] Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
>
> They use PostMark for their disk I/O tests, which is an ancient benchmark.

Is that a good or a bad thing? If so, why?
On Tue, 09 Jul 2013 16:27:31 +0100, Lars Kurth <lars.kurth@xen.org> wrote:
> Not sure whether anyone has seen this:
> http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
>
> Some of the comments are interesting, but not really as negative as
> they used to be. In any case, it may make sense to have a quick look.

Relative figures, at least in terms of ordering, are similar to what I found
last time I did a similar test:
http://www.altechnative.net/2012/08/04/virtual-performance-part-1-vmware/

My test was harsher, though, because it exposed more of the context switching
and inter-core (and worse, inter-die, since I tested on a C2Q) migration
overheads. The process migration overheads are _expensive_: I found that on
bare metal, pinning CPU/RAM-intensive processes to cores made a ~20%
difference to overall throughput on a C2Q-class CPU (no shared caches between
the two dies made it worse). I expect 4.3.x will be a substantial improvement
with the NUMA-awareness improvements to the scheduler (looking forward to
trying it this weekend).

Shame Phoronix didn't test PV performance; in my tests that made a huge
difference and put Xen firmly ahead of the competition.

Gordan
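For reference, the pinning experiment described above is straightforward to
script on Linux; a minimal sketch in Python, where the core list and the
busy-loop workload are just placeholders for whatever CPU/RAM-intensive job
is being measured:

    import multiprocessing
    import os

    def cpu_bound_worker(core, iterations=50_000_000):
        # Pin this worker to a single core before starting the hot loop,
        # so the scheduler cannot migrate it between cores/dies mid-run.
        os.sched_setaffinity(0, {core})
        total = 0
        for i in range(iterations):
            total += i * i
        return total

    if __name__ == "__main__":
        cores = [0, 1, 2, 3]  # placeholder: one worker per physical core
        with multiprocessing.Pool(len(cores)) as pool:
            pool.map(cpu_bound_worker, cores)

Comparing wall-clock time with and without the os.sched_setaffinity() call
gives a rough measure of the migration overhead on a given box.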
Thanos Makatos
2013-Jul-09 15:56 UTC
Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
> > They use PostMark for their disk I/O tests, which is an ancient benchmark.
>
> Is that a good or a bad thing? If so, why?

IMO it's a bad thing because it's far from a representative benchmark, which
can lead to wrong conclusions when evaluating I/O performance.
On Tue, 9 Jul 2013 15:56:51 +0000, Thanos Makatos <thanos.makatos@citrix.com> wrote:
> > They use PostMark for their disk I/O tests, which is an ancient benchmark.
> >
> > Is that a good or a bad thing? If so, why?
>
> IMO it's a bad thing because it's far from a representative benchmark,
> which can lead to wrong conclusions when evaluating I/O performance.

Ancient doesn't mean non-representative. A good file-system benchmark is a
tricky one to come up with because most FS-es are good at some things and bad
at others. If you really want to test the virtualization overhead on FS I/O,
the only sane way to test it is by putting the FS on the host's RAM disk and
testing from there. That should expose the full extent of the overhead,
subject to the same caveat about different FS-es being better at different
load types.

Personally I'm in favour of redneck benchmarks that easily push the whole
stack to saturation point (e.g. a highly parallel kernel compile), since
those cannot be cheated. But generally speaking, the only way to get a
worthwhile measure is to create a custom benchmark that tests your specific
application to saturation point. Any generic/synthetic benchmark will provide
results that are almost certainly going to be misleading for any specific
real-world load you are planning to run on your system.

For example, on a read-only MySQL load (read-only because it simplified
testing: no need to rebuild huge data sets between runs, just drop all the
caches), in a custom application performance test that I carried out for a
client, ESX showed a ~40% throughput degradation over bare metal (8
cores/server, 16 SQL threads cat-ing select-filtered general-log extracts,
load generator running in the same VM). And the test machines (both physical
and virtual) had enough RAM in them that they were both only disk I/O bound
for the first 2-3 minutes of the test (which took the best part of an hour to
complete), which goes to show that disk I/O bottlenecks are good at covering
up overheads elsewhere.

Gordan
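To illustrate the RAM-disk approach, here is a rough sketch of a parallel
file I/O exerciser; it assumes a tmpfs is already mounted at /mnt/ramdisk
(the path, file size and counts are placeholders), so the numbers reflect FS
and virtualization overhead rather than disk speed:

    import multiprocessing
    import os

    RAMDISK = "/mnt/ramdisk"   # assumed tmpfs mount point; adjust to taste
    FILE_SIZE = 1 << 20        # 1 MiB per file
    FILES_PER_WORKER = 500

    def fs_worker(worker_id):
        # Create/read/delete cycles against the RAM-backed filesystem.
        data = os.urandom(FILE_SIZE)
        for i in range(FILES_PER_WORKER):
            path = os.path.join(RAMDISK, "w%d-%d.dat" % (worker_id, i))
            with open(path, "wb") as f:
                f.write(data)
            with open(path, "rb") as f:
                f.read()
            os.unlink(path)

    if __name__ == "__main__":
        workers = 8            # e.g. one per core
        with multiprocessing.Pool(workers) as pool:
            pool.map(fs_worker, range(workers))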
Thanos Makatos
2013-Jul-09 16:21 UTC
Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
> > IMO it's a bad thing because it's far from a representative benchmark,
> > which can lead to wrong conclusions when evaluating I/O performance.
>
> Ancient doesn't mean non-representative. A good file-system benchmark

In this particular case it is: PostMark is a single-threaded application that
performs read and write operations on a fixed set of files, at an
unrealistically low directory depth; modern I/O workloads exhibit much more
complicated behaviour than this.
On Tue, 9 Jul 2013 16:21:52 +0000, Thanos Makatos <thanos.makatos@citrix.com> wrote:
> > Ancient doesn't mean non-representative. A good file-system benchmark
>
> In this particular case it is: PostMark is a single-threaded application
> that performs read and write operations on a fixed set of files, at an
> unrealistically low directory depth; modern I/O workloads exhibit much
> more complicated behaviour than this.

Unless you are running a mail server. Granted, running multiple PostMark
instances in parallel might be a better test on today's many-core servers,
but it'd likely make little or no difference on a disk-I/O-bound test.

Gordan
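If someone did want to try that, launching several PostMark instances side by
side is easy enough to script; a sketch, assuming a postmark binary on the
PATH that reads its usual commands from stdin (the command names below are
from memory and may need checking against your PostMark version):

    import subprocess
    import tempfile

    N_INSTANCES = 8   # e.g. one per core

    def postmark_commands(workdir):
        # Typical PostMark command script; verify against your version.
        return "\n".join([
            "set location %s" % workdir,
            "set number 10000",
            "set transactions 20000",
            "run",
            "quit",
            "",
        ])

    if __name__ == "__main__":
        procs = []
        for i in range(N_INSTANCES):
            # Give each instance its own working directory.
            workdir = tempfile.mkdtemp(prefix="pm%d-" % i)
            p = subprocess.Popen(["postmark"], stdin=subprocess.PIPE, text=True)
            p.stdin.write(postmark_commands(workdir))
            p.stdin.close()
            procs.append(p)
        for p in procs:
            p.wait()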
--On 9 July 2013 16:27:31 +0100 Lars Kurth <lars.kurth@xen.org> wrote:
> Not sure whether anyone has seen this:
> http://www.phoronix.com/scan.php?page=article&item=intel_haswell_virtualization
>
> Some of the comments are interesting, but not really as negative as they
> used to be. In any case, it may make sense to have a quick look.

Last time I looked at the Phoronix benchmarks, they were using the default
disk caching with Xen and QEMU, and these were not identical. From memory,
KVM was using writethrough and Xen was using no caching.

This one says "Xen and KVM virtualization were setup through virt-manager".
I don't know whether that evens things out, as I don't use it.

--
Alex Bligh
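One way to check what virt-manager actually configured is to dump each
domain's libvirt XML (e.g. virsh dumpxml <domain>) and look at the cache
attribute on the disk driver elements; a quick sketch, with the file name as
a placeholder:

    import xml.etree.ElementTree as ET

    # Parse a saved libvirt domain definition,
    # e.g. produced with `virsh dumpxml mydomain > domain.xml`.
    tree = ET.parse("domain.xml")

    for disk in tree.findall(".//devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        dev = target.get("dev") if target is not None else "?"
        cache = driver.get("cache") if driver is not None else None
        # When the attribute is absent, the hypervisor default applies.
        print("%s: cache=%s" % (dev, cache or "unspecified (hypervisor default)"))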
Dario Faggioli
2013-Jul-11 10:53 UTC
Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
On mar, 2013-07-09 at 16:54 +0100, Gordan Bobic wrote:
> The process migration overheads are _expensive_

Indeed!

> - I found that on bare metal, pinning CPU/RAM-intensive processes to cores
> made a ~20% difference to overall throughput on a C2Q-class CPU (no shared
> caches between the two dies made it worse). I expect 4.3.x will be a
> substantial improvement with the NUMA-awareness improvements to the
> scheduler (looking forward to trying it this weekend).

Well, yes, something good could be expected, although the actual improvement
will depend on the number of involved VMs, their sizes, the workload they're
running, etc.

When I tried to use a kernel compile as a benchmark for the NUMA effects, it
did not turn out that useful to me (and that's why I switched to SpecJBB),
but perhaps it was me who was doing something wrong...

Anyway, if you do anything like this, please do let us know here (and,
please, Cc me :-P).

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
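As an aside, for anyone repeating this kind of experiment it helps to know
the host's NUMA layout before deciding on pinning or placement; a small
sketch that reads the topology Linux exposes under sysfs:

    import glob
    import os

    # Print the CPU list for each NUMA node as exposed by the Linux kernel.
    for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        with open(os.path.join(node, "cpulist")) as f:
            print("%s: cpus %s" % (os.path.basename(node), f.read().strip()))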
On Thu, Jul 11, 2013 at 11:53 AM, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> When I tried to use a kernel compile as a benchmark for the NUMA effects,
> it did not turn out that useful to me (and that's why I switched to
> SpecJBB), but perhaps it was me who was doing something wrong...

In my experience, kernel-build has excellent memory locality. One effect is
that the effect of nested paging on TLB time is almost nil; I'm not surprised
that the caches make the effect of NUMA almost nil as well.

 -George
Dario Faggioli
2013-Jul-11 16:27 UTC
Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
On gio, 2013-07-11 at 17:23 +0100, George Dunlap wrote:
> In my experience, kernel-build has excellent memory locality. One effect
> is that the effect of nested paging on TLB time is almost nil; I'm not
> surprised that the caches make the effect of NUMA almost nil as well.

Not to mention I/O, unless you set up a ramfs-backed build environment.
Again, when I tried, that was my intention, but perhaps I failed right at
that... Gordan, what about you?

Dario
On 07/11/2013 05:27 PM, Dario Faggioli wrote:
> Not to mention I/O, unless you set up a ramfs-backed build environment.
> Again, when I tried, that was my intention, but perhaps I failed right at
> that... Gordan, what about you?

IIRC, in my tests the disk I/O was relatively minimal. If you read the
details here:
http://www.altechnative.net/2012/08/04/virtual-performance-part-1-vmware/
you may notice that I actually primed the test by catting everything to
/dev/null, so all the reads should have been coming from the page cache. I
didn't have enough RAM in the machine (only 8GB) to fit all the produced
binaries in tmpfs at the time. I don't think this had a large impact, though:
the iowait time was about 0% all the time, because there were plenty of
threads that had productive compiling work to do while some were waiting to
commit to disk.

Since this was on a C2Q, there was no NUMA in play, so if I had to guess at
the major cause of performance degradation, it would be related to context
switching; having said that, I didn't get around to doing any in-depth
profiling to be able to tell for sure. (Speaking of which, how would one go
about profiling things at the bare-metal hypervisor level?)

I will re-run the test on a new machine at some point and see how it
compares, and this time I will have enough RAM for the whole lot to fit.

Gordan
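For what it's worth, that priming step (reading the whole tree once so later
reads come from the page cache) is trivial to script as well; a sketch, with
the source tree path as a placeholder:

    import os

    SRC_TREE = "/usr/src/linux"   # placeholder: the tree about to be compiled

    def prime_page_cache(root):
        # Read every regular file once so subsequent accesses hit the page
        # cache -- effectively what catting everything to /dev/null does.
        primed = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "rb") as f:
                        while f.read(1 << 20):
                            pass
                    primed += 1
                except OSError:
                    pass   # skip unreadable files
        return primed

    if __name__ == "__main__":
        print("primed %d files" % prime_page_cache(SRC_TREE))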