Richard W.M. Jones
2022-Jan-10 15:52 UTC
[Libguestfs] Virt-v2v performance benchmarking part 3
For the raw format local disk to local disk conversion, it's possible
to regain most of the performance by adding
--request-size=$(( 16 * 1024 * 1024 )) to the nbdcopy command.  The
patch below is not suitable for going upstream but it can be used for
testing:

diff --git a/v2v/v2v.ml b/v2v/v2v.ml
index 47e6e937..ece3b7d9 100644
--- a/v2v/v2v.ml
+++ b/v2v/v2v.ml
@@ -613,6 +613,7 @@ and nbdcopy output_alloc input_uri output_uri
   let cmd = ref [] in
   List.push_back_list cmd [ "nbdcopy"; input_uri; output_uri ];
   List.push_back cmd "--flush";
+  List.push_back cmd "--request-size=16777216";
   (*List.push_back cmd "--verbose";*)
   if not (quiet ()) then List.push_back cmd "--progress";
   if output_alloc = Types.Preallocated then List.push_back cmd "--allocated";

The problem, of course, is that this is a pessimisation for other
conversions.  It's known to make at least qcow2-to-qcow2 conversions,
and all VDDK conversions, worse.  So we'd have to make it conditional
on doing a raw format local conversion, which is a pretty ugly hack.
Even worse, the exact size (16M) varies for me when I test this on
different machines and on HDDs vs SSDs.  On my very fast AMD machine
with an SSD, the nbdcopy default request size (256K) is fastest and
larger sizes are very slightly slower.

I can imagine an "adaptive nbdcopy" which adjusts these parameters
while copying in order to find the best performance.  A little bit
hard to implement ...

I'm also still wondering exactly why a larger request size is better
in this case.  You can easily reproduce the effect using the attached
test script and adjusting --request-size.  You'll need to build the
standard test guest, see part 1.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.sh
Type: application/x-sh
Size: 578 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20220110/d5851b99/attachment.sh>
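Since the attached test.sh was scrubbed by the archive, the following
is only a rough sketch of how the raw-to-raw local copy might be
reproduced with an adjustable --request-size.  The guest image path,
the use of nbdkit's file plugin on both sides, and the timing approach
are assumptions, not the contents of the original script:

#!/bin/bash
# Reproduction sketch (NOT the original test.sh): copy a raw guest
# image between two local files over NBD, the way modular virt-v2v
# does, and time nbdcopy with a configurable --request-size.
set -e

src=/var/tmp/fedora-35.img    # standard test guest from part 1 (path assumed)
dst=/var/tmp/out.img
request_size=${1:-16777216}   # pass e.g. 262144 to compare with the default

rm -f "$dst"
truncate -s "$(stat -c %s "$src")" "$dst"

# nbdkit serves both sides; nbdcopy copies between the two NBD endpoints.
time nbdcopy --flush --request-size="$request_size" \
     [ nbdkit --exit-with-parent -r file "$src" ] \
     [ nbdkit --exit-with-parent file "$dst" ]

Running it once with 262144 and once with 16777216 should reproduce
the gap described above, at least on HDDs.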
On 01/10/22 16:52, Richard W.M. Jones wrote:
> For the raw format local disk to local disk conversion, it's possible
> to regain most of the performance by adding
> --request-size=$(( 16 * 1024 * 1024 )) to the nbdcopy command.
[...]
> I'm also still wondering exactly why a larger request size is better
> in this case.  You can easily reproduce the effect using the attached
> test script and adjusting --request-size.  You'll need to build the
> standard test guest, see part 1.

(The following thought occurred to me last evening.)

In modular v2v, we use multi-threaded nbdkit instances and
multi-threaded nbdcopy instances (IIUC).  I think that should result
in quite a bit of thrashing, on both the source and destination disks,
no?  That should be especially visible on HDDs, but perhaps also on
SSDs (depending on request size, as you mention above).  The worst
case is likely when both nbdcopy processes operate on the same
physical HDD (i.e. spinning rust).

qemu-img is single-threaded, so even if it reads from and writes to
the same physical hard disk, it generates only two "parallel" request
streams, which both the disk and the kernel's IO scheduler can cope
with more easily.  According to the nbdcopy manual, the default thread
count is the "number of processor cores available"; with a high thread
count, the "sliding window of requests" is likely indistinguishable
from truly random access.

I also vaguely recall that nbdcopy bypasses the page cache (or does it
only sync automatically at the end?  I don't remember).  If the page
cache is avoided, then it has no chance to mitigate the thrashing,
especially on HDDs -- but even on SSDs, if the drive's internal cache
is not large enough (considering the individual request size and the
number of random requests in flight in parallel), the degradation
should be visible.

Can you tweak (i.e., lower) the thread count of both nbdcopy
processes, let's say to 1, for starters?

Thanks!
Laszlo
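For reference, nbdcopy has a --threads=N option, so the experiment
suggested above can be run against the same hypothetical setup
sketched earlier in this thread, for example:

# Same copy as in the earlier sketch ($src and $dst as defined there),
# but restricted to one nbdcopy worker thread, to see whether the
# highly parallel request stream is what hurts the HDD case.
time nbdcopy --flush --threads=1 --request-size=262144 \
     [ nbdkit --exit-with-parent -r file "$src" ] \
     [ nbdkit --exit-with-parent file "$dst" ]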