Hi.

I've been running a couple of benchmarks on a Xen-3.0 installation lately. Part of those compared SMP and CMP configurations on a 2x2 Intel Woodcrest (i.e. two sockets). All tests were performed between a UP dom0 (on core 0) and a UP domU, with the domU VCPU pinned to either core 1 (processor 0) or core 3 (processor 1).

Switching from SMP to CMP, netperf -t TCP_STREAM gets me 1686.64 vs. 2673.20 Mbit/s. Lower IPI latency, shared caches: all as one would expect, I believe.

Now, trying the same for block I/O may sound strange, but it can be done: I created a 3 GB ramdisk in dom0 and fed that to domU. Peak throughput with 'hdparm -t' is 759.37 MB/s on SMP.

The fun part (for me; fun is probably a personal thing) is that throughput is higher than with TCP. That may be due to the block layer being much thinner than TCP/IP networking, or to the fact that transfers use the whole 4 KB page size for sequential reads. Possibly some of both; I didn't try. This is not my question.

What strikes me is that for the blkdev interface, the CMP setup is 13% *slower* than SMP, at 661.99 MB/s.

Now, any ideas? I'm mildly familiar with both netback and blkback, and I'd never expected something like that. Any hint appreciated.

Thanks,
Daniel

--
Daniel Stodden
LRR - Lehrstuhl für Rechnertechnik und Rechnerorganisation
Institut für Informatik der TU München
D-85748 Garching
http://www.lrr.in.tum.de/~stodden    mailto:stodden@cs.tum.edu
PGP Fingerprint: F5A4 1575 4C56 E26A 0B33 3D80 457E 82AE B0D8 735B
> The fun part (for me; fun is probably a personal thing) is that
> throughput is higher than with TCP. That may be due to the block layer
> being much thinner than TCP/IP networking, or to the fact that transfers
> use the whole 4 KB page size for sequential reads. Possibly some of
> both; I didn't try.

The big thing is that on network RX it is currently dom0 that does the copy. In the CMP case this leaves the data in the shared cache, ready to be accessed by the guest. In the SMP case it doesn't help at all. In netchannel2 we're moving the copy to the guest CPU, and trying to eliminate it with smart hardware.

Block IO doesn't require a copy at all.

> This is not my question. What strikes me is that for the blkdev
> interface, the CMP setup is 13% *slower* than SMP, at 661.99 MB/s.
>
> Now, any ideas? I'm mildly familiar with both netback and blkback, and
> I'd never expected something like that. Any hint appreciated.

How stable are your results with hdparm? I've never really trusted it as a benchmarking tool.

The ramdisk isn't going to be able to DMA data into the domU's buffer on a read, so it will have to copy it. The hdparm running in domU probably doesn't actually look at any of the data it requests, so the data stays local to the dom0 CPU's cache (unlike with a real app). Doing all that copying in dom0 is going to beat up the domU in the shared cache in the CMP case, but won't affect it as much in the SMP case.

Ian
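The cache effect Ian describes can be sketched outside Xen. The following is a minimal user-space illustration, not Xen code; the CPU numbers assume the Woodcrest layout from the original post (cores 0 and 1 share an L2, core 3 sits on the other socket). A "dom0" thread pinned to core 0 performs the copy, then a "domU" thread pinned elsewhere reads the result; the consumer's pass should be noticeably cheaper in the shared-cache case.

/* cache_locality.c: user-space sketch of the shared-cache argument,
 * not Xen code.  A "dom0" thread pinned to CPU_COPIER memcpy()s a
 * buffer (the RX copy); a "domU" thread pinned to CPU_CONSUMER then
 * reads it.  With the Woodcrest numbering assumed here, CPU_CONSUMER=1
 * shares the L2 with CPU 0 (the CMP case), CPU_CONSUMER=3 is on the
 * other socket (the SMP case).  Build: gcc -O2 -pthread (add -lrt on
 * older glibc).
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE   (1 << 20)   /* 1 MB, comfortably inside a 4 MB L2 */
#define CPU_COPIER   0
#define CPU_CONSUMER 1         /* set to 3 to model the SMP case */

static char src[BUF_SIZE], dst[BUF_SIZE];

static void pin_self(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *copier(void *arg)
{
    pin_self(CPU_COPIER);
    memcpy(dst, src, BUF_SIZE);          /* the "dom0 does the copy" step */
    return NULL;
}

int main(void)
{
    pthread_t t;
    struct timespec t0, t1;
    volatile uint64_t sum = 0;

    memset(src, 0xab, BUF_SIZE);

    pthread_create(&t, NULL, copier, NULL);
    pthread_join(t, NULL);               /* copy has now happened on CPU_COPIER */

    pin_self(CPU_CONSUMER);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < BUF_SIZE; i += 64)   /* touch one byte per cache line */
        sum += dst[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L +
              (t1.tv_nsec - t0.tv_nsec);
    printf("consumer pass: %ld ns (checksum %llu)\n",
           ns, (unsigned long long)sum);
    return 0;
}

Only the consumer's pass is timed, mirroring the RX path: whether the copy lands in a cache the consumer shares is what separates the CMP numbers from the SMP ones.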
On Sun, 2008-03-09 at 20:07 +0000, Ian Pratt wrote:

> > The fun part (for me; fun is probably a personal thing) is that
> > throughput is higher than with TCP. That may be due to the block layer
> > being much thinner than TCP/IP networking, or to the fact that
> > transfers use the whole 4 KB page size for sequential reads. Possibly
> > some of both; I didn't try.
>
> The big thing is that on network RX it is currently dom0 that does the
> copy. In the CMP case this leaves the data in the shared cache, ready to
> be accessed by the guest. In the SMP case it doesn't help at all. In
> netchannel2 we're moving the copy to the guest CPU, and trying to
> eliminate it with smart hardware.
>
> Block IO doesn't require a copy at all.

Well, not in blkback by itself, but certainly from the in-memory disk image. Unless I misunderstood Keir's recent post, page flipping is basically dead code, so I thought the numbers should at least point in roughly the same direction.

> > This is not my question. What strikes me is that for the blkdev
> > interface, the CMP setup is 13% *slower* than SMP, at 661.99 MB/s.
> >
> > Now, any ideas? I'm mildly familiar with both netback and blkback, and
> > I'd never expected something like that. Any hint appreciated.
>
> How stable are your results with hdparm? I've never really trusted it as
> a benchmarking tool.

So far, all the experiments I've done look fairly reasonable. The standard deviation is low, and since I've been tracing netback reads I'm fairly confident that the volume hadn't been left in domU memory somewhere.

I'm not so much interested in bio or physical disk performance, but in the relative performance: how much can be squeezed through the buffer ring before and after applying some changes. It's hardly a physical disk benchmark, but it's simple, and for the purpose given it seems okay.

> The ramdisk isn't going to be able to DMA data into the domU's buffer on
> a read, so it will have to copy it.

Right...

> The hdparm running in domU probably doesn't actually look at any of the
> data it requests, so the data stays local to the dom0 CPU's cache
> (unlike with a real app).

hdparm performs sequential 2 MB read()s over a 3 s period. It's not calling the block layer directly or anything like that. That'll certainly hit domU caches?

> Doing all that copying in dom0 is going to beat up the domU in the
> shared cache in the CMP case, but won't affect it as much in the SMP
> case.

Well, I could live with blaming L2 footprint. I just wanted to hear whether someone has a different explanation. And I would expect similar results on net RX then, but I may be mistaken.

Furthermore, I need to apologize: I failed to use netperf correctly and managed to report the TX path in my original post :P. The real numbers are 885.43 (SMP) vs. 1295.46 (CMP) Mbit/s, but the difference compared to blk reads stays the same.

regards,
daniel
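What "sequential 2 MB read()s over a 3 s period" boils down to is roughly the following. This is a rough stand-in for 'hdparm -t', not hdparm itself; the device path is an assumption, and unlike hdparm it does not flush the buffer cache (hdparm issues a flush before timing), so it only approximates the numbers discussed above.

/* read_tput.c: crude approximation of 'hdparm -t'.
 * Sequential 2 MB read()s from a block device for ~3 seconds,
 * reporting MB/s.  Usage: ./read_tput /dev/xvdb   (device name is an
 * assumption; use whatever the ramdisk-backed VBD appears as in domU).
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define CHUNK   (2 * 1024 * 1024)
#define SECONDS 3.0

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <block-device>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(CHUNK);
    if (!buf) { perror("malloc"); return 1; }

    struct timespec t0, now;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long long total = 0;
    double elapsed;

    do {
        ssize_t n = read(fd, buf, CHUNK);
        if (n <= 0)                      /* end of device: wrap around */
            lseek(fd, 0, SEEK_SET);
        else
            total += n;
        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed = (now.tv_sec - t0.tv_sec) +
                  (now.tv_nsec - t0.tv_nsec) / 1e9;
    } while (elapsed < SECONDS);

    printf("%.2f MB/s\n", total / elapsed / (1024.0 * 1024.0));
    free(buf);
    close(fd);
    return 0;
}

Because the data comes back through an ordinary read() into a user buffer, it is at least copied into domU memory on the way, which is the point Daniel is making; whether the benchmark then looks at that data is a separate question.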
> > The big thing is that on network RX it is currently dom0 that does the
> > copy. In the CMP case this leaves the data in the shared cache, ready
> > to be accessed by the guest. In the SMP case it doesn't help at all. In
> > netchannel2 we're moving the copy to the guest CPU, and trying to
> > eliminate it with smart hardware.
> >
> > Block IO doesn't require a copy at all.
>
> Well, not in blkback by itself, but certainly from the in-memory disk
> image. Unless I misunderstood Keir's recent post, page flipping is
> basically dead code, so I thought the numbers should at least point in
> roughly the same direction.

Blkback has always DMA-ed directly into guest memory when reading data from the disk drive (the normal use case), in which case there's no copy - I think that was Ian's point. In contrast, the netback driver has to do a copy in the normal case.

If you're using a ramdisk then there must be a copy somewhere, although I'm not sure exactly where it happens!

Cheers,
Mark

--
Push Me Pull You - Distributed SCM tool (http://www.cl.cam.ac.uk/~maw48/pmpu/)
On Sun, 2008-03-16 at 21:15 +0000, Mark Williamson wrote:

> > > Block IO doesn't require a copy at all.
> >
> > Well, not in blkback by itself, but certainly from the in-memory disk
> > image. Unless I misunderstood Keir's recent post, page flipping is
> > basically dead code, so I thought the numbers should at least point in
> > roughly the same direction.
>
> Blkback has always DMA-ed directly into guest memory when reading data
> from the disk drive (the normal use case), in which case there's no copy
> - I think that was Ian's point. In contrast, the netback driver has to
> do a copy in the normal case.
>
> If you're using a ramdisk then there must be a copy somewhere, although
> I'm not sure exactly where it happens!

I checked; this is comparatively easy to find. Since DMA-or-not is ultimately up to the driver, it's that single memcpy() in rd.c. Looks rather straightforward.

In theory, such a pseudo-device could make use of the host DMA engine embedded in newer Intel chipsets to save a few cycles. But looking at the source, that does not seem to be the case (and typical usage scenarios for the ramdisk driver (4 MB default size, iirc) would hardly justify the effort).

Blkdev peak throughput at 500 MB/s is certainly not a usability issue :) I just asked because I hoped someone who has spent more time on the PV drivers than I have might have experienced (or even profiled) similar effects already, and could explain them.

Thanks and greetings,
Daniel
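For illustration, here is a toy user-space analogue of that read path. It is not the actual rd.c code, only its shape under the assumption that the backing store is plain memory: the "disk" is a buffer, so servicing a read request bottoms out in a single memcpy() into the requester's buffer, which is the copy (and the cache footprint it drags along) being discussed above.

/* ramdisk_copy.c: toy analogue of a RAM-backed block device's read
 * path, not the real rd.c.  There is no device to DMA from, so the
 * only way to complete a read is to memcpy() from the in-memory image
 * into the requester's buffer.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR_SIZE 512
#define DISK_SECTORS (8 * 1024)          /* a 4 MB toy disk */

struct ramdisk {
    unsigned char *data;                 /* the in-memory disk image */
    size_t sectors;
};

/* The whole "I/O" path: no DMA possible, just a CPU copy. */
static int ramdisk_read(const struct ramdisk *rd, size_t sector,
                        size_t nsectors, void *buf)
{
    if (sector + nsectors > rd->sectors)
        return -1;
    memcpy(buf, rd->data + sector * SECTOR_SIZE, nsectors * SECTOR_SIZE);
    return 0;
}

int main(void)
{
    struct ramdisk rd;
    rd.sectors = DISK_SECTORS;
    rd.data = calloc(rd.sectors, SECTOR_SIZE);
    if (!rd.data) { perror("calloc"); return 1; }

    unsigned char buf[8 * SECTOR_SIZE];  /* one 4 KB request */
    if (ramdisk_read(&rd, 0, 8, buf) == 0)
        printf("read 4 KB, first byte %u\n", buf[0]);

    free(rd.data);
    return 0;
}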