On Jun 14 2022, "Richard W.M. Jones" <rjones at redhat.com> wrote:
> This is a follow-up to this thread:
>
>   https://listman.redhat.com/archives/libguestfs/2022-June/thread.html#29210
>
> about getting the kernel client (nbd.ko) to obey block size
> constraints sent by the NBD server:
>
>   https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#block-size-constraints
>
> I was sent this very interesting design document about the original
> intent behind the kernel's I/O limits:
>
>   https://people.redhat.com/msnitzer/docs/io-limits.txt
>
> There are four or five kernel block layer settings we could usefully
> adjust, and there are three NBD block size constraints, and in my
> opinion there's not a very clear mapping between them. But I'll have
> a go at what I think we should do.
>
> - - -
>
> (1) Kernel physical_block_size & logical_block_size: The example given
> is of a hard disk with 4K physical sectors (AF) which can nevertheless
> emulate 512-byte sectors. In this case you'd set physical_block_size
> = 4K, logical_block_size = 512b.
>
> Data structures (partition tables, etc) should be aligned to
> physical_block_size to avoid unnecessary RMW cycles. But the
> fundamental unit of I/O is logical_block_size.
>
> Current behaviour of nbd.ko is that logical_block_size =
> physical_block_size == the nbd-client "-b" option (default: 512 bytes,
> contradicting the documentation).
>
> I think we should set logical_block_size == physical_block_size =
> MAX (512, NBD minimum block size constraint).

Why the lower bound of 512?

> What should happen to the nbd-client -b option?

Perhaps it should become the lower bound (instead of the hardcoded 512)?
That's assuming there is a reason for having a client-specified lower
bound.

> (2) Kernel minimum_io_size: The documentation says this is the
> "preferred minimum unit for random I/O".
>
> Current behaviour of nbd.ko is this is not set.
>
> I think the NBD's preferred block size should map to minimum_io_size.
>
>
> (3) Kernel optimal_io_size: The documentation says this is the
> "[preferred] streaming I/O [size]".
>
> Current behaviour of nbd.ko is this is not set.
>
> NBD doesn't really have the concept of streaming vs random I/O, so we
> could either ignore this or set it to the same value as
> minimum_io_size.
>
> I have a kernel patch allowing nbd-client to set both minimum_io_size
> and optimal_io_size from userspace.
>
>
> (4) Kernel blk_queue_max_hw_sectors: This is documented as: "set max
> sectors for a request ... Enables a low level driver to set a hard
> upper limit, max_hw_sectors, on the size of requests."
>
> Current behaviour of nbd.ko is that we set this to 65536 (sectors?
> blocks?), which for 512b sectors is 32M.

FWIW, on my 5.16 kernel, the default is 65 kB (according to
/sys/block/nbdX/queue/max_sectors_kb x 512b).

> I think we could set this to MIN (32M, NBD maximum block size constraint),
> converting the result to sectors.

I don't think that's right. Rather, it should be NBD's preferred block
size.

Setting this to the preferred block size means that NBD requests will be
this large whenever there are enough sequential dirty pages, and that no
requests will ever be larger than this. I think this is exactly what the
NBD server would like to have.

Setting this to the maximum block size would mean that NBD requests
will exceed the preferred size whenever there are enough sequential
dirty pages (while still obeying the maximum). This seems strictly
worse.

Unrelated to the proposed changes (all of which I think are technically
correct), I am wondering if this will have much practical benefit. As
far as I can tell, the kernel currently aligns NBD requests to the
logical/physical block size rather than the size of the NBD request. Are
there NBD servers that would benefit from the kernel honoring the
preferred blocksize if the data is not also aligned to this blocksize?

Best,
-Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

"Time flies like an arrow, fruit flies like a Banana."
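
For readers following the thread, the mapping discussed in the message above can be sketched in plain C. This is only an illustration of the arithmetic, not code from nbd.ko or from any posted patch: the struct names, the function map_constraints() and the cap_at_preferred switch are all hypothetical, and the clamp that avoids a zero-sector limit is an added assumption that nobody in the thread proposed. The switch selects between the two positions on point (4): MIN (32M, NBD maximum) versus the NBD preferred block size.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE      512u                    /* kernel sector unit */
#define NBD_REQUEST_CAP  (32u * 1024u * 1024u)   /* the 32M figure from the thread */

/* Hypothetical holder for the three NBD block size constraints (bytes). */
struct nbd_block_sizes {
    uint32_t minimum;
    uint32_t preferred;
    uint32_t maximum;
};

/* Hypothetical holder for the kernel limits discussed in the thread. */
struct queue_limits_sketch {
    uint32_t logical_block_size;   /* bytes */
    uint32_t physical_block_size;  /* bytes */
    uint32_t io_min;               /* minimum_io_size, bytes */
    uint32_t io_opt;               /* optimal_io_size, bytes */
    uint32_t max_hw_sectors;       /* 512-byte sectors */
};

static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }
static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/*
 * cap_at_preferred == 0: max_hw_sectors = MIN (32M, NBD maximum)
 * cap_at_preferred != 0: max_hw_sectors = NBD preferred block size
 */
static struct queue_limits_sketch
map_constraints(struct nbd_block_sizes c, int cap_at_preferred)
{
    struct queue_limits_sketch q;
    uint32_t cap;

    assert(c.minimum >= 1);
    assert(c.minimum <= c.preferred && c.preferred <= c.maximum);

    /* (1) logical == physical = MAX (512, NBD minimum) */
    q.logical_block_size = max_u32(SECTOR_SIZE, c.minimum);
    q.physical_block_size = q.logical_block_size;

    /* (2), (3) preferred size feeds both minimum_io_size and optimal_io_size */
    q.io_min = c.preferred;
    q.io_opt = c.preferred;

    /* (4) hard upper limit on request size, converted to sectors */
    cap = cap_at_preferred ? c.preferred
                           : min_u32(NBD_REQUEST_CAP, c.maximum);
    /* Assumption (not from the thread): never go below one logical block. */
    cap = max_u32(cap, q.logical_block_size);
    q.max_hw_sectors = cap / SECTOR_SIZE;

    return q;
}
```

With constraints of minimum 1, preferred 4096 and maximum 64M, the first variant gives a 65536-sector limit and the second gives 8 sectors, which makes the size of the disagreement concrete.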
Richard W.M. Jones
2022-Jun-15 10:09 UTC
[Libguestfs] Kernel driver I/O block size hinting
On Tue, Jun 14, 2022 at 08:30:15PM +0100, Nikolaus Rath wrote:
> On Jun 14 2022, "Richard W.M. Jones" <rjones at redhat.com> wrote:
> > I think we should set logical_block_size == physical_block_size =
> > MAX (512, NBD minimum block size constraint).
>
> Why the lower bound of 512?

I suspect the kernel can't handle sector sizes smaller than 512 bytes.
By default the NBD protocol advises advertising a minimum size of
1 byte, and I'm almost certain setting logical_block_size == 1 would
break everything.

> > What should happen to the nbd-client -b option?
>
> Perhaps it should become the lower-bound (instead of the hardcoded 512)?
> That's assuming there is a reason for having a client-specified lower
> bound.

Right, I don't think there's a reason to continue with the -b option.
I only use it to set -b 512 to work around the annoying default in
older versions (which was 1024).

> > (4) Kernel blk_queue_max_hw_sectors: This is documented as: "set max
> > sectors for a request ... Enables a low level driver to set a hard
> > upper limit, max_hw_sectors, on the size of requests."
> >
> > Current behaviour of nbd.ko is that we set this to 65536 (sectors?
> > blocks?), which for 512b sectors is 32M.
>
> FWIW, on my 5.16 kernel, the default is 65 kB (according to
> /sys/block/nbdX/queue/max_sectors_kb x 512b).

I have:

  $ cat /sys/devices/virtual/block/nbd0/queue/max_hw_sectors_kb
  32768

(ie. 32 MB) which I think comes from the nbd module setting:

  blk_queue_max_hw_sectors(disk->queue, 65536);

multiplied by 512b sectors.

> > I think we could set this to MIN (32M, NBD maximum block size constraint),
> > converting the result to sectors.
>
> I don't think that's right. Rather, it should be NBD's preferred block
> size.
>
> Setting this to the preferred block size means that NBD requests will be
> this large whenever there are enough sequential dirty pages, and that no
> requests will ever be larger than this. I think this is exactly what the
> NBD server would like to have.

This kernel setting limits the maximum request size on the queue.

In my testing, reading and writing files with the default [above], the
kernel never got anywhere near sending multi-megabyte requests. In
fact the largest request it sent was 128K, even when I did stuff like:

  # dd if=/dev/zero of=/tmp/mnt/zero bs=100M count=10

128K happens to be 2 x blk_queue_io_opt, but I need to do more testing
to see if that relationship always holds.

> Setting this to the maximum block size would mean that NBD requests
> will exceed the preferred size whenever there are enough sequential
> dirty pages (while still obeying the maximum). This seems strictly
> worse.
>
> Unrelated to the proposed changes (all of which I think are technically
> correct), I am wondering if this will have much practical benefits. As
> far as I can tell, the kernel currently aligns NBD requests to the
> logical/physical block size rather than the size of the NBD request. Are
> there NBD servers that would benefit from the kernel honoring the
> preferred blocksize if the data is not also aligned to this blocksize?

I'm not sure I parsed this. Can you give an example?

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit
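
The unit conversion behind the 65536 and 32768 figures in the message above is easy to get wrong, so here is a small C sketch of it. It assumes only what the thread states: the argument to blk_queue_max_hw_sectors is counted in 512-byte sectors, and max_hw_sectors_kb reports the same limit in units of 1024 bytes. The helper names sectors_to_kib() and bytes_to_sectors() are hypothetical and do not come from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9   /* 1 sector = 2^9 = 512 bytes */

/* Convert a limit in 512-byte sectors to KiB, as shown by *_sectors_kb. */
static uint64_t sectors_to_kib(uint64_t sectors)
{
    return (sectors << SECTOR_SHIFT) >> 10;
}

/* Convert a byte count (a multiple of 512) to 512-byte sectors. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
    assert((bytes & ((1u << SECTOR_SHIFT) - 1)) == 0);
    return bytes >> SECTOR_SHIFT;
}
```

Under these assumptions sectors_to_kib(65536) is 32768, matching the sysfs value quoted above, and the 128K requests observed in the dd test correspond to 256 sectors.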