Richard W.M. Jones
2022-Jun-13 10:24 UTC
[Libguestfs] How to speed up Kernel Client - S3 plugin use-case
On Mon, Jun 13, 2022 at 10:33:58AM +0100, Nikolaus Rath wrote:
> Hello,
>
> I am trying to improve performance of the scenario where the kernel's
> NBD client talks to nbdkit's S3 plugin.
>
> For me, the main bottleneck is currently due to the fact that the kernel
> aligns requests to only 512 B, no matter the blocksize reported by
> nbdkit.
>
> Using a 512 B object size is not feasible (due to latency and request
> overhead). However, with a larger object size there are two conflicting
> objectives:
>
> 1. To maximize parallelism (which is important to reduce the effects of
> connection latency), it's best to limit the size of the kernel's NBD
> requests to the object size.
>
> 2. To minimize un-aligned writes, it's best to allow arbitrarily large
> NBD requests, because the larger the requests, the larger the number of
> full blocks that are written. Unfortunately this means that all objects
> touched by the request are written sequentially.
>
> I see a number of ways to address that:
>
> 1. Change the kernel's NBD code to honor the blocksize reported by the
> NBD server. This would be ideal, but I don't feel up to making this
> happen. Theoretical solution only.

This would be the ideal solution.  I wonder how technically
complicated it would actually be?

AIUI you'd have to modify nbd-client to query the block limits from
the server, which is the hardest part of this, but it's all userspace
code.  Then you'd pass those down to the kernel via the ioctl (see
drivers/block/nbd.c:__nbd_ioctl).  Then inside the kernel you'd call
blk_queue_io_min & blk_queue_io_opt with the values (I'm not sure how
you set the max request size, or if that's possible).  See
block/blk-settings.c for details of these functions.

As a quick test you could try calling blk_queue_io_* in the kernel
driver with hard-coded values, to see if that modifies the requests
that are seen by nbdkit.  That should give you some confidence before
making the full change.

BTW I notice that the kernel NBD driver always reports that it's a
non-rotational device, ignoring the server setting ...

> 2. Change the S3 plugin to use multiple threads, so that it can upload
> multiple objects in parallel even when they're part of the same NBD
> request. The disadvantage is that this adds a second "layer" of
> threads, in addition to those started by nbdkit itself.

There are existing plugins which do this (see the VDDK plugin).

> 3. Change nbdkit itself to split up requests *and* distribute them to
> multiple threads. I believe this means changes to the core code
> because the blocksize filter can't dispatch requests to multiple
> threads.

This would be a major change to nbdkit that would likely have
unexpected side-effects everywhere.

> What do people think is the best way to proceed? Is there a fourth
> option that I might be missing?
>
> Best,
> -Nikolaus

Personally I think option (1) is the best here.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
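The quick hard-coded test suggested above might look roughly like the
following (a minimal sketch only; the helper name, placement and values
are illustrative assumptions, not an actual patch):

/* Hard-code the I/O size hints somewhere in drivers/block/nbd.c after
 * the gendisk has been set up, then observe the request sizes that
 * arrive at nbdkit.  Sketch only.
 */
static void nbd_test_io_hints(struct nbd_device *nbd)
{
	blk_queue_io_min(nbd->disk->queue, 16384);  /* minimum I/O size hint */
	blk_queue_io_opt(nbd->disk->queue, 65536);  /* optimal I/O size hint */
}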
Nikolaus Rath
2022-Jun-13 10:58 UTC
[Libguestfs] How to speed up Kernel Client - S3 plugin use-case
On Jun 13 2022, "Richard W.M. Jones" <rjones at redhat.com> wrote:
> On Mon, Jun 13, 2022 at 10:33:58AM +0100, Nikolaus Rath wrote:
>> Hello,
>>
>> I am trying to improve performance of the scenario where the kernel's
>> NBD client talks to nbdkit's S3 plugin.
>>
>> For me, the main bottleneck is currently due to the fact that the kernel
>> aligns requests to only 512 B, no matter the blocksize reported by
>> nbdkit.
>>
>> Using a 512 B object size is not feasible (due to latency and request
>> overhead). However, with a larger object size there are two conflicting
>> objectives:
>>
>> 1. To maximize parallelism (which is important to reduce the effects of
>> connection latency), it's best to limit the size of the kernel's NBD
>> requests to the object size.
>>
>> 2. To minimize un-aligned writes, it's best to allow arbitrarily large
>> NBD requests, because the larger the requests, the larger the number of
>> full blocks that are written. Unfortunately this means that all objects
>> touched by the request are written sequentially.
>>
>> I see a number of ways to address that:
>>
>> 1. Change the kernel's NBD code to honor the blocksize reported by the
>> NBD server. This would be ideal, but I don't feel up to making this
>> happen. Theoretical solution only.
>
> This would be the ideal solution.  I wonder how technically
> complicated it would actually be?
>
> AIUI you'd have to modify nbd-client to query the block limits from
> the server, which is the hardest part of this, but it's all userspace
> code.  Then you'd pass those down to the kernel via the ioctl (see
> drivers/block/nbd.c:__nbd_ioctl).  Then inside the kernel you'd call
> blk_queue_io_min & blk_queue_io_opt with the values (I'm not sure how
> you set the max request size, or if that's possible).  See
> block/blk-settings.c for details of these functions.

If it's only about getting the blocksize from the NBD server, then I
certainly feel up to the task.

However, nbd-client already has:

       -block-size block size, -b
              Use a blocksize of "block size".  Default is 1024; allowed
              values are either 512, 1024, 2048 or 4096.

So my worry is that more complicated in-kernel changes will be needed to
make other values work.  In particular, nbd_is_valid_blksize() (in nbd.c)
checks that the block size is less than or equal to PAGE_SIZE.

(I'm interested in 32 kB and 512 kB block sizes.)

Best,
-Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

"Time flies like an arrow, fruit flies like a Banana."
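The check Nikolaus refers to is approximately the following (paraphrased,
not quoted verbatim from the driver): the block size must be a power of
two between 512 and PAGE_SIZE, which rules out 32 kB and 512 kB on a
typical 4 kB-page system.

static bool nbd_is_valid_blksize(unsigned long blksize)
{
	return blksize && is_power_of_2(blksize) &&
	       blksize >= 512 && blksize <= PAGE_SIZE;
}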
Josef Bacik
2022-Jun-13 17:25 UTC
[Libguestfs] How to speed up Kernel Client - S3 plugin use-case
On Mon, Jun 13, 2022 at 6:24 AM Richard W.M. Jones <rjones at redhat.com> wrote:
>
> On Mon, Jun 13, 2022 at 10:33:58AM +0100, Nikolaus Rath wrote:
> > Hello,
> >
> > I am trying to improve performance of the scenario where the kernel's
> > NBD client talks to nbdkit's S3 plugin.
> >
> > For me, the main bottleneck is currently due to the fact that the kernel
> > aligns requests to only 512 B, no matter the blocksize reported by
> > nbdkit.
> >
> > Using a 512 B object size is not feasible (due to latency and request
> > overhead). However, with a larger object size there are two conflicting
> > objectives:
> >
> > 1. To maximize parallelism (which is important to reduce the effects of
> > connection latency), it's best to limit the size of the kernel's NBD
> > requests to the object size.
> >
> > 2. To minimize un-aligned writes, it's best to allow arbitrarily large
> > NBD requests, because the larger the requests, the larger the number of
> > full blocks that are written. Unfortunately this means that all objects
> > touched by the request are written sequentially.
> >
> > I see a number of ways to address that:
> >
> > 1. Change the kernel's NBD code to honor the blocksize reported by the
> > NBD server. This would be ideal, but I don't feel up to making this
> > happen. Theoretical solution only.
>
> This would be the ideal solution.  I wonder how technically
> complicated it would actually be?
>
> AIUI you'd have to modify nbd-client to query the block limits from
> the server, which is the hardest part of this, but it's all userspace
> code.  Then you'd pass those down to the kernel via the ioctl (see
> drivers/block/nbd.c:__nbd_ioctl).  Then inside the kernel you'd call
> blk_queue_io_min & blk_queue_io_opt with the values (I'm not sure how
> you set the max request size, or if that's possible).  See
> block/blk-settings.c for details of these functions.
>

Exactly this.  The kernel just does what the client tells it to do, and
the kernel can be configured for whatever blocksize.  Unfortunately
there's not a way for the server to advertise to the client what to do;
you have to configure it on the client.  Adding some code to the
userspace negotiation to pull the blocksize from the server is the right
thing to do here; then simply pass it into the configuration code in
nbd-client, which uses the appropriate netlink tag to set the blocksize.

> As a quick test you could try calling blk_queue_io_* in the kernel
> driver with hard-coded values, to see if that modifies the requests
> that are seen by nbdkit.  That should give you some confidence before
> making the full change.
>
> BTW I notice that the kernel NBD driver always reports that it's a
> non-rotational device, ignoring the server setting ...

That I can fix easily, I'll get that done.

Thanks,
Josef
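For reference, the block-size limits that such userspace negotiation
would pull from the server are carried in the NBD_INFO_BLOCK_SIZE reply
sent during the NBD_OPT_GO handshake (see the protocol document linked
in the patch below).  A sketch of the wire layout (fields are big-endian
and would be parsed individually rather than memcpy'd into a struct):

#include <stdint.h>

/* Sketch of the NBD_INFO_BLOCK_SIZE payload the client would parse. */
struct nbd_info_block_size {
	uint16_t info_type;       /* NBD_INFO_BLOCK_SIZE */
	uint32_t min_block_size;  /* minimum block size */
	uint32_t pref_block_size; /* preferred block size */
	uint32_t max_block_size;  /* maximum block size */
};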
Richard W.M. Jones
2022-Jun-13 21:20 UTC
[Libguestfs] How to speed up Kernel Client - S3 plugin use-case
Attached are the patches I'm experimenting with.  I'm not sure if I'm
missing something important.

The first patch is for the kernel.  It adds two netlink attributes,
NBD_ATTR_BLOCK_SIZE_{MIN,OPT}, to set the minimum and preferred block
size respectively.  These appear to work: the values are reflected in
/sys/devices/virtual/block/nbd0/queue/minimum_io_size,
/sys/devices/virtual/block/nbd0/queue/optimal_io_size and in blkid:

$ cat /sys/devices/virtual/block/nbd0/queue/minimum_io_size
16384
$ cat /sys/devices/virtual/block/nbd0/queue/optimal_io_size
65536
$ sudo blkid -i /dev/nbd0
/dev/nbd0: MINIMUM_IO_SIZE="16384" OPTIMAL_IO_SIZE="65536" PHYSICAL_SECTOR_SIZE="512" LOGICAL_SECTOR_SIZE="512"

The second patch is for nbd and just hard-codes some values for testing
purposes.

The problem is that the programs I've tried, including the kernel, don't
respect these values very much.  Certainly mkfs and the kernel don't seem
to have a problem sending requests to the server which are smaller than
the minimum we requested.  It's hard to tell if it really has much effect
or not.

BTW I only modified the netlink interface, not the ioctl interface.  Is
it true that we've deprecated the ioctl interface to nbd.ko and prefer
the netlink interface?

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
-------------- next part --------------
>From ad861c06ec0a35278d546dc61e687bf65090a98d Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones at redhat.com>
Date: Mon, 13 Jun 2022 20:08:08 +0100
Subject: [PATCH] nbd: Permit NBD client to set IO minimum and preferred sizes

The NBD protocol permits servers to specify a minimum, preferred and
maximum block size during the handshake:
https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#block-size-constraints

Our NBD client previously ignored this and would send requests with
smaller granularity than the server wanted (eg. 512 bytes).

Allow the userspace part of the client to negotiate minimum and
preferred block sizes with the server and send this information over
netlink to the kernel.

The maximum block size is still ignored.

Signed-off-by: Richard W.M. Jones <rjones at redhat.com>
---
 drivers/block/nbd.c              | 11 ++++++++++-
 include/uapi/linux/nbd-netlink.h |  2 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 07f3c139a3d7..441cdd96aa9d 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1876,6 +1876,8 @@ static const struct nla_policy nbd_attr_policy[NBD_ATTR_MAX + 1] = {
 	[NBD_ATTR_DEAD_CONN_TIMEOUT]	= { .type = NLA_U64 },
 	[NBD_ATTR_DEVICE_LIST]		= { .type = NLA_NESTED},
 	[NBD_ATTR_BACKEND_IDENTIFIER]	= { .type = NLA_STRING},
+	[NBD_ATTR_BLOCK_SIZE_MIN]	= { .type = NLA_U32 },
+	[NBD_ATTR_BLOCK_SIZE_OPT]	= { .type = NLA_U32 },
 };
 
 static const struct nla_policy nbd_sock_policy[NBD_SOCK_MAX + 1] = {
@@ -2031,7 +2033,14 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
 				&config->runtime_flags);
 		}
 	}
-
+	if (info->attrs[NBD_ATTR_BLOCK_SIZE_MIN]) {
+		u32 bytes = nla_get_u32(info->attrs[NBD_ATTR_BLOCK_SIZE_MIN]);
+		blk_queue_io_min(nbd->disk->queue, bytes);
+	}
+	if (info->attrs[NBD_ATTR_BLOCK_SIZE_OPT]) {
+		u32 bytes = nla_get_u32(info->attrs[NBD_ATTR_BLOCK_SIZE_OPT]);
+		blk_queue_io_opt(nbd->disk->queue, bytes);
+	}
 	if (info->attrs[NBD_ATTR_SOCKETS]) {
 		struct nlattr *attr;
 		int rem, fd;
diff --git a/include/uapi/linux/nbd-netlink.h b/include/uapi/linux/nbd-netlink.h
index 2d0b90964227..1d6621487560 100644
--- a/include/uapi/linux/nbd-netlink.h
+++ b/include/uapi/linux/nbd-netlink.h
@@ -36,6 +36,8 @@ enum {
 	NBD_ATTR_DEAD_CONN_TIMEOUT,
 	NBD_ATTR_DEVICE_LIST,
 	NBD_ATTR_BACKEND_IDENTIFIER,
+	NBD_ATTR_BLOCK_SIZE_MIN,
+	NBD_ATTR_BLOCK_SIZE_OPT,
 	__NBD_ATTR_MAX,
 };
 #define NBD_ATTR_MAX (__NBD_ATTR_MAX - 1)
--
2.35.1
-------------- next part --------------
>From d8cea729193d5607e568b3ededbcbb7c8e2e51ea Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones at redhat.com>
Date: Mon, 13 Jun 2022 21:04:34 +0100
Subject: [PATCH] Force min/opt sizes

---
 nbd-client.c  | 2 ++
 nbd-netlink.h | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/nbd-client.c b/nbd-client.c
index c187e8c..bd1587b 100644
--- a/nbd-client.c
+++ b/nbd-client.c
@@ -186,6 +186,8 @@ static void netlink_configure(int index, int *sockfds, int num_connects,
 	NLA_PUT_U32(msg, NBD_ATTR_INDEX, index);
 	NLA_PUT_U64(msg, NBD_ATTR_SIZE_BYTES, size64);
 	NLA_PUT_U64(msg, NBD_ATTR_BLOCK_SIZE_BYTES, blocksize);
+	NLA_PUT_U32(msg, NBD_ATTR_BLOCK_SIZE_MIN, 16384);
+	NLA_PUT_U32(msg, NBD_ATTR_BLOCK_SIZE_OPT, 65536);
 	NLA_PUT_U64(msg, NBD_ATTR_SERVER_FLAGS, flags);
 	if (timeout)
 		NLA_PUT_U64(msg, NBD_ATTR_TIMEOUT, timeout);
diff --git a/nbd-netlink.h b/nbd-netlink.h
index fd0f4e4..9901f1b 100644
--- a/nbd-netlink.h
+++ b/nbd-netlink.h
@@ -31,6 +31,11 @@ enum {
 	NBD_ATTR_SERVER_FLAGS,
 	NBD_ATTR_CLIENT_FLAGS,
 	NBD_ATTR_SOCKETS,
+	NBD_ATTR_DEAD_CONN_TIMEOUT,
+	NBD_ATTR_DEVICE_LIST,
+	NBD_ATTR_BACKEND_IDENTIFIER,
+	NBD_ATTR_BLOCK_SIZE_MIN,
+	NBD_ATTR_BLOCK_SIZE_OPT,
 	__NBD_ATTR_MAX,
 };
 #define NBD_ATTR_MAX (__NBD_ATTR_MAX - 1)
--
2.36.1
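One possible reason the new hints are not respected (an assumption, not a
confirmed diagnosis): blk_queue_io_min and blk_queue_io_opt only set
advisory limits that are exported through sysfs for userspace tools such
as mkfs, while the block layer itself only enforces alignment at the
logical block size.  A sketch of the related setters from
block/blk-settings.c, with made-up values:

#include <linux/blkdev.h>

static void example_queue_limits(struct request_queue *q)
{
	blk_queue_logical_block_size(q, 512);   /* enforced: smallest addressable unit */
	blk_queue_physical_block_size(q, 4096); /* hint: device's internal sector size */
	blk_queue_io_min(q, 16384);             /* hint: minimum "efficient" I/O size */
	blk_queue_io_opt(q, 65536);             /* hint: optimal I/O size */
}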