Nikolaus Rath

2022-Jun-13 09:33 UTC

[Libguestfs] How to speed up Kernel Client - S3 plugin use-case

Hello,

I am trying to improve performance of the scenario where the kernel's
NBD client talks to NBDKit's S3 plugin.

For me, the main bottleneck is currently due to the fact that the kernel
aligns requests to only 512 B, no matter the blocksize reported by
nbdkit.

Using a 512 B object size is not feasible (due to latency and request
overhead). However, with a larger object size there are two conflicting
objectives:

1. To maximize parallelism (which is important to reduce the effects of
connection latency), it's best to limit the size of the kernel's NBD
requests to the object size.

2. To minimize un-aligned writes, it's best to allow arbitrarily large
NBD requests, because the larger the requests the larger the amount of
full blocks that are written. Unfortunately this means that all objects
touched by the request are written sequentially.
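
The trade-off between the two objectives can be made concrete with a
little arithmetic. The following is only a sketch: the function name
split_request and the 64 KiB object size in the usage note are
assumptions for illustration, not taken from the S3 plugin. It splits
one NBD request into the per-object pieces it touches and flags which
pieces cover a whole object (and so need no read-modify-write).

```python
def split_request(offset, length, object_size):
    """Yield (object_index, start_in_object, nbytes, is_full) for each
    object touched by a request of `length` bytes at `offset`.

    Hypothetical helper for illustration only.
    """
    if object_size <= 0 or offset < 0 or length < 0:
        raise ValueError("object_size must be > 0; offset and length >= 0")
    pos, end = offset, offset + length
    while pos < end:
        # Which object does `pos` fall in, and where inside that object?
        index, start = divmod(pos, object_size)
        # Take bytes up to the end of this object or the end of the request.
        nbytes = min(object_size - start, end - pos)
        yield index, start, nbytes, nbytes == object_size
        pos += nbytes
```

For example, with an assumed 64 KiB object size, a 192 KiB request that
starts at offset 512 touches four objects, and only the two middle ones
are written in full. A request limited to the object size but aligned
to only 512 B would almost never produce a full-object write.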

I see a number of ways to address that:

1. Change the kernel's NBD code to honor the blocksize reported by the
   NBD server. This would be ideal, but I don't feel up to making this
   happen. Theoretical solution only.

2. Change the S3 plugin to use multiple threads, so that it can upload
   multiple objects in parallel even when they're part of the same NBD
   request. The disadvantage is that this adds a second "layer" of
   threads, in addition to those started by NBDkit itself.

3. Change NBDkit itself to split up requests *and* distribute them to
   multiple threads. I believe this means changes to the core code
   because the blocksize filter can't dispatch requests to multiple
   threads. 
   
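Option 2 could look roughly like the sketch below. It is not the S3
plugin's code: pwrite_parallel, pieces, and the write_piece callback
(which would upload one object, doing a read-modify-write for partial
pieces) are hypothetical names, and the worker count is an arbitrary
assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def pieces(offset, length, object_size):
    """Yield (object_index, start_in_object, buffer_offset, nbytes)."""
    pos, end = offset, offset + length
    while pos < end:
        index, start = divmod(pos, object_size)
        nbytes = min(object_size - start, end - pos)
        yield index, start, pos - offset, nbytes
        pos += nbytes


def pwrite_parallel(buf, offset, object_size, write_piece, max_workers=8):
    """Write `buf` at `offset`, handling each touched object in its own task.

    `write_piece(index, start, data)` is a hypothetical callback that
    stores `data` at byte `start` of object `index`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(write_piece, index, start, buf[b:b + n])
            for index, start, b, n in pieces(offset, len(buf), object_size)
        ]
        # Wait for every upload; result() re-raises the first failure so
        # the whole NBD request can be reported as failed.
        for future in futures:
            future.result()
```

Each piece belongs to a different object, so the tasks never write to
the same object concurrently within one request. The pool is the second
"layer" of threads mentioned above.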

What do people think is the best way to proceed? Is there a fourth
option that I might be missing?


Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             "Time flies like an arrow, fruit flies like a Banana."

Richard W.M. Jones

2022-Jun-13 10:24 UTC

[Libguestfs] How to speed up Kernel Client - S3 plugin use-case

On Mon, Jun 13, 2022 at 10:33:58AM +0100, Nikolaus Rath wrote:
> Hello,
> 
> I am trying to improve performance of the scenario where the kernel's
> NBD client talks to NBDKit's S3 plugin.
> 
> For me, the main bottleneck is currently due to the fact that the kernel
> aligns requests to only 512 B, no matter the blocksize reported by
> nbdkit.
> 
> Using a 512 B object size is not feasible (due to latency and request
> overhead). However, with a larger object size there are two conflicting
> objectives:
> 
> 1. To maximize parallelism (which is important to reduce the effects of
> connection latency), it's best to limit the size of the kernel's NBD
> requests to the object size.
> 
> 2. To minimize un-aligned writes, it's best to allow arbitrarily large
> NBD requests, because the larger the requests the larger the amount of
> full blocks that are written. Unfortunately this means that all objects
> touched by the request are written sequentially.
> 
> I see a number of ways to address that:
> 
> 1. Change the kernel's NBD code to honor the blocksize reported by the
>    NBD server. This would be ideal, but I don't feel up to making this
>    happen. Theoretical solution only.

This would be the ideal solution.  I wonder how technically
complicated it would be actually?

AIUI you'd have to modify nbd-client to query the block limits from
the server, which is the hardest part of this, but it's all userspace
code.  Then you'd pass those down to the kernel via the ioctl (see
drivers/block/nbd.c:__nbd_ioctl).  Then inside the kernel you'd call
blk_queue_io_min & blk_queue_io_opt with the values (I'm not sure how
you set the max request size, or if that's possible).  See
block/blk-settings.c for details of these functions.

As a quick test you could try calling blk_queue_io_* in the kernel
driver with hard-coded values, to see if that modifies the requests
that are seen by nbdkit.  Should give you some confidence before
making the full change.
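
One way to check the quick test from user space, besides watching the
requests nbdkit receives, is to read the queue limits the kernel
exports in sysfs. This is a minimal sketch: the device name nbd0 and
the helper name read_queue_limits are assumptions. The sysfs files
minimum_io_size and optimal_io_size report the io_min and io_opt
values in bytes; max_sectors_kb is the current maximum request size in
KiB.

```python
from pathlib import Path

# Queue attributes exported under /sys/block/<device>/queue/.
QUEUE_ATTRS = ("minimum_io_size", "optimal_io_size", "max_sectors_kb")


def read_queue_limits(device="nbd0", sysfs_root="/sys/block"):
    """Return the selected queue limits of a block device as integers.

    `device` defaults to nbd0, which is an assumption; use whichever
    NBD device is connected to nbdkit.
    """
    queue = Path(sysfs_root) / device / "queue"
    return {name: int((queue / name).read_text().strip())
            for name in QUEUE_ATTRS}
```

If the hard-coded values took effect, minimum_io_size and
optimal_io_size should show them after the device is connected.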

BTW I notice that the kernel NBD driver always reports that it's a
non-rotational device, ignoring the server setting ...

> 2. Change the S3 plugin to use multiple threads, so that it can upload
>    multiple objects in parallel even when they're part of the same NBD
>    request. The disadvantage is that this adds a second "layer" of
>    threads, in addition to those started by NBDkit itself.

There are existing plugins which do this (see VDDK plugin).

> 3. Change NBDkit itself to split up requests *and* distribute them to
>    multiple threads. I believe this means changes to the core code
>    because the blocksize filter can't dispatch requests to multiple
>    threads.

This would be a major change to nbdkit that would likely have
unexpected side-effects everywhere.

> What do people think is the best way to proceed? Is there a fourth
> option that I might be missing?
>
> Best,
> -Nikolaus

Personally I think option (1) is the best here.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
