On 01/11/22 08:00, Laszlo Ersek wrote:
> On 01/10/22 16:52, Richard W.M. Jones wrote:
>>
>> For the raw format local disk to local disk conversion, it's possible
>> to regain most of the performance by adding
>> --request-size=$(( 16 * 1024 * 1024 )) to the nbdcopy command.  The
>> patch below is not suitable for going upstream but it can be used for
>> testing:
>>
>> diff --git a/v2v/v2v.ml b/v2v/v2v.ml
>> index 47e6e937..ece3b7d9 100644
>> --- a/v2v/v2v.ml
>> +++ b/v2v/v2v.ml
>> @@ -613,6 +613,7 @@ and nbdcopy output_alloc input_uri output_uri
>>    let cmd = ref [] in
>>    List.push_back_list cmd [ "nbdcopy"; input_uri; output_uri ];
>>    List.push_back cmd "--flush";
>> +  List.push_back cmd "--request-size=16777216";
>>    (*List.push_back cmd "--verbose";*)
>>    if not (quiet ()) then List.push_back cmd "--progress";
>>    if output_alloc = Types.Preallocated then List.push_back cmd "--allocated";
>>
>> The problem of course is that this is a pessimisation for other
>> conversions.  It's known to make at least qcow2-to-qcow2 and all VDDK
>> conversions worse.  So we'd have to make it conditional on doing a
>> raw format local conversion, which is a pretty ugly hack.  Even worse,
>> the exact size (16M) varies for me when I test this on different
>> machines and on HDDs vs SSDs.  On my very fast AMD machine with an
>> SSD, the nbdcopy default request size (256K) is fastest and larger
>> sizes are very slightly slower.
>>
>> I can imagine an "adaptive nbdcopy" which adjusts these parameters
>> while copying in order to find the best performance.  A little bit
>> hard to implement ...
>>
>> I'm also still wondering exactly why a larger request size is better
>> in this case.  You can easily reproduce the effect using the attached
>> test script and adjusting --request-size.  You'll need to build the
>> standard test guest; see part 1.
>
> (The following thought occurred to me last evening.)
>
> In modular v2v, we use multi-threaded nbdkit instances and
> multi-threaded nbdcopy instances (IIUC).  I think that should result
> in quite a bit of thrashing, on both the source and destination
> disks, no?  That should be especially visible on HDDs, but perhaps
> also on SSDs (depending on request size, as you mention above).
>
> The worst case is likely when both nbdcopy processes operate on the
> same physical HDD (i.e., spinning rust).
>
> qemu-img is single-threaded,

hmmmm, not necessarily; according to the manual, "qemu-img convert"
uses (by default) 8 co-routines.  There's also the -W flag ("out of
order writes"), which I don't know if the original virt-v2v used.

Laszlo

> so even if it reads from and writes to the same physical hard disk,
> it kind of generates two "parallel" request streams, which both the
> disk and the kernel's IO scheduler can cope with more easily.
> According to the nbdcopy manual, the default thread count is the
> "number of processor cores available"; with a high thread count, the
> "sliding window of requests" is likely indistinguishable from true
> random access.
>
> Also, I (vaguely?) gather that nbdcopy bypasses the page cache (or
> does it only sync automatically at the end?  I don't remember).  If
> the page cache is avoided, then it has no chance to mitigate the
> thrashing, especially on HDDs -- but even on SSDs, if the drive's
> internal cache is not large enough (considering the individual
> request size and the number of random requests in flight in
> parallel), the degradation should be visible.
>
> Can you tweak (i.e., lower) the thread count of both nbdcopy
> processes; let's say to "1", for starters?
>
> Thanks!
> Laszlo
>
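[Editor's note: to make the two experiments above easy to reproduce --
the conditional 16M request size and the --threads=1 suggestion --
here is a minimal, self-contained OCaml sketch.  This is not the
virt-v2v code: nbdcopy's --request-size and --threads options are
real, but the raw_local and threads parameters are invented for
illustration.]

    (* Sketch only: build an nbdcopy command line where the 16M request
       size is applied just to raw-format local conversions, and the
       worker thread count can be capped for benchmarking. *)
    let nbdcopy_cmd ?(raw_local = false) ?(threads = 0) input_uri output_uri =
      let cmd = [ "nbdcopy"; input_uri; output_uri; "--flush"; "--progress" ] in
      let cmd =
        if raw_local then
          (* 16777216 = 16 * 1024 * 1024, as in the patch above. *)
          cmd @ [ "--request-size=" ^ string_of_int (16 * 1024 * 1024) ]
        else cmd
      in
      (* threads = 0 means: leave nbdcopy at its default
         ("number of processor cores available"). *)
      if threads > 0 then cmd @ [ "--threads=" ^ string_of_int threads ]
      else cmd

    let () =
      nbdcopy_cmd ~raw_local:true ~threads:1 "nbd://localhost" "/var/tmp/out.img"
      |> String.concat " "
      |> print_endline

[Running the printed command with --threads=1 serializes the sliding
window of requests, which should reduce the seek thrashing described
above on spinning disks.]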
Richard W.M. Jones
2022-Jan-11 10:56 UTC
[Libguestfs] Virt-v2v performance benchmarking part 3
On Tue, Jan 11, 2022 at 08:07:39AM +0100, Laszlo Ersek wrote:
> hmmmm, not necessarily; according to the manual, "qemu-img convert"
> uses (by default) 8 co-routines.  There's also the -W flag ("out of
> order writes"), which I don't know if the original virt-v2v used.

I'm never sure how qemu coroutines map to threads.  I assume it's not
1-1, and that it's somehow connected to the iothread setting?

The -W flag was only used for -o rhv-upload and not for any other
input or output method.  See write_out_of_order in the 1.44 code.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org
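[Editor's note: for context on the flags discussed above, here is a
hedged OCaml sketch of the behaviour described -- -W appended only
when out-of-order writes are wanted, as for the rhv-upload output.
It mirrors the description, not the actual 1.44 virt-v2v source;
-W ("out of order writes") and -m (coroutine count, default 8) are
real qemu-img convert options, while this function and its parameters
are invented for illustration.]

    (* Sketch only: qemu-img convert with out-of-order writes enabled
       conditionally, in the spirit of write_out_of_order in 1.44. *)
    let qemu_img_convert ?(coroutines = 8) ~out_of_order ~src ~dst () =
      [ "qemu-img"; "convert"; "-m"; string_of_int coroutines ]
      @ (if out_of_order then [ "-W" ] else [])
      @ [ src; dst ]

    let () =
      qemu_img_convert ~out_of_order:true
        ~src:"/var/tmp/in.qcow2" ~dst:"/var/tmp/out.img" ()
      |> String.concat " "
      |> print_endline

[The qemu-img manual notes that -W improves performance but is mainly
recommended for preallocated or raw block destinations, which may be
part of why it was limited to the rhv-upload output.]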