Richard W.M. Jones
2022-Jun-14 14:38 UTC
[Libguestfs] Kernel driver I/O block size hinting
This is a follow-up to this thread:

https://listman.redhat.com/archives/libguestfs/2022-June/thread.html#29210

about getting the kernel client (nbd.ko) to obey block size
constraints sent by the NBD server:

https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#block-size-constraints

I was sent this very interesting design document about the original
intent behind the kernel's I/O limits:

https://people.redhat.com/msnitzer/docs/io-limits.txt

There are four or five kernel block layer settings we could usefully
adjust, and there are three NBD block size constraints, and in my
opinion there's not a very clear mapping between them.  But I'll have
a go at what I think we should do.

- - -

(1) Kernel physical_block_size & logical_block_size: The example given
is of a hard disk with 4K physical sectors (AF) which can nevertheless
emulate 512-byte sectors.  In this case you'd set physical_block_size
= 4K, logical_block_size = 512b.

Data structures (partition tables, etc) should be aligned to
physical_block_size to avoid unnecessary RMW cycles.  But the
fundamental unit of I/O is logical_block_size.

Current behaviour of nbd.ko is that logical_block_size ==
physical_block_size == the nbd-client "-b" option (default: 512
bytes, contradicting the documentation).

I think we should set logical_block_size == physical_block_size ==
MAX (512, NBD minimum block size constraint).

What should happen to the nbd-client -b option?


(2) Kernel minimum_io_size: The documentation says this is the
"preferred minimum unit for random I/O".

Current behaviour of nbd.ko is this is not set.

I think NBD's preferred block size should map to minimum_io_size.


(3) Kernel optimal_io_size: The documentation says this is the
"[preferred] streaming I/O [size]".

Current behaviour of nbd.ko is this is not set.

NBD doesn't really have the concept of streaming vs random I/O, so we
could either ignore this or set it to the same value as
minimum_io_size.

I have a kernel patch allowing nbd-client to set both minimum_io_size
and optimal_io_size from userspace.


(4) Kernel blk_queue_max_hw_sectors: This is documented as: "set max
sectors for a request ... Enables a low level driver to set a hard
upper limit, max_hw_sectors, on the size of requests."

Current behaviour of nbd.ko is that we set this to 65536 (sectors?
blocks?), which for 512b sectors is 32M.

I think we could set this to MIN (32M, NBD maximum block size
constraint), converting the result to sectors.

- - -

What do people think?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
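Put together, the mapping proposed above might look roughly like the
following in nbd.ko.  This is only a sketch: struct
nbd_block_constraints and its field names are hypothetical stand-ins
for the constraints negotiated with the server, while the blk_queue_*
setters are the existing block-layer API (block/blk-settings.c).

  #include <linux/blkdev.h>
  #include <linux/minmax.h>

  /* Hypothetical container for the server's NBD block size
   * constraints; nbd.ko would fill this in during negotiation. */
  struct nbd_block_constraints {
          u32 min_block;   /* NBD minimum block size */
          u32 pref_block;  /* NBD preferred block size */
          u32 max_block;   /* NBD maximum block size */
  };

  static void nbd_set_queue_limits(struct request_queue *q,
                                   const struct nbd_block_constraints *c)
  {
          u32 lbs = max_t(u32, 512, c->min_block);
          u32 max_bytes = min_t(u32, 32 << 20, c->max_block);

          /* (1) logical == physical == MAX(512, NBD minimum) */
          blk_queue_logical_block_size(q, lbs);
          blk_queue_physical_block_size(q, lbs);

          /* (2) NBD preferred block size -> minimum_io_size */
          blk_queue_io_min(q, c->pref_block);

          /* (3) NBD has no streaming/random distinction, so mirror
           * the preferred size into optimal_io_size as well */
          blk_queue_io_opt(q, c->pref_block);

          /* (4) MIN(32M, NBD maximum), converted to 512b sectors */
          blk_queue_max_hw_sectors(q, max_bytes >> 9);
  }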
On Jun 14 2022, "Richard W.M. Jones" <rjones at redhat.com> wrote:

> This is a follow-up to this thread:
>
> https://listman.redhat.com/archives/libguestfs/2022-June/thread.html#29210
>
> about getting the kernel client (nbd.ko) to obey block size
> constraints sent by the NBD server:
>
> https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#block-size-constraints
>
> I was sent this very interesting design document about the original
> intent behind the kernel's I/O limits:
>
> https://people.redhat.com/msnitzer/docs/io-limits.txt
>
> There are four or five kernel block layer settings we could usefully
> adjust, and there are three NBD block size constraints, and in my
> opinion there's not a very clear mapping between them.  But I'll have
> a go at what I think we should do.
>
> - - -
>
> (1) Kernel physical_block_size & logical_block_size: The example given
> is of a hard disk with 4K physical sectors (AF) which can nevertheless
> emulate 512-byte sectors.  In this case you'd set physical_block_size
> = 4K, logical_block_size = 512b.
>
> Data structures (partition tables, etc) should be aligned to
> physical_block_size to avoid unnecessary RMW cycles.  But the
> fundamental unit of I/O is logical_block_size.
>
> Current behaviour of nbd.ko is that logical_block_size ==
> physical_block_size == the nbd-client "-b" option (default: 512
> bytes, contradicting the documentation).
>
> I think we should set logical_block_size == physical_block_size ==
> MAX (512, NBD minimum block size constraint).

Why the lower bound of 512?

> What should happen to the nbd-client -b option?

Perhaps it should become the lower bound (instead of the hardcoded
512)?  That's assuming there is a reason for having a client-specified
lower bound.

> (2) Kernel minimum_io_size: The documentation says this is the
> "preferred minimum unit for random I/O".
>
> Current behaviour of nbd.ko is this is not set.
>
> I think NBD's preferred block size should map to minimum_io_size.
>
>
> (3) Kernel optimal_io_size: The documentation says this is the
> "[preferred] streaming I/O [size]".
>
> Current behaviour of nbd.ko is this is not set.
>
> NBD doesn't really have the concept of streaming vs random I/O, so we
> could either ignore this or set it to the same value as
> minimum_io_size.
>
> I have a kernel patch allowing nbd-client to set both minimum_io_size
> and optimal_io_size from userspace.
>
>
> (4) Kernel blk_queue_max_hw_sectors: This is documented as: "set max
> sectors for a request ... Enables a low level driver to set a hard
> upper limit, max_hw_sectors, on the size of requests."
>
> Current behaviour of nbd.ko is that we set this to 65536 (sectors?
> blocks?), which for 512b sectors is 32M.

FWIW, on my 5.16 kernel, the default is 65 kB (according to
/sys/block/nbdX/queue/max_sectors_kb x 512b).

> I think we could set this to MIN (32M, NBD maximum block size
> constraint), converting the result to sectors.

I don't think that's right.  Rather, it should be NBD's preferred
block size.

Setting this to the preferred block size means that NBD requests will
be this large whenever there are enough sequential dirty pages, and
that no requests will ever be larger than this.  I think this is
exactly what the NBD server would like to have.

Setting this to the maximum block size would mean that NBD requests
would exceed the preferred size whenever there are enough sequential
dirty pages (while still obeying the maximum).  This seems strictly
worse.
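As a concrete sketch of this alternative for (4), only the last call
in the earlier sketch would change, again using the hypothetical
pref_block field:

  /* Alternative for (4): cap requests at the server's *preferred*
   * block size, so writeback coalesces up to that size but requests
   * never exceed it. */
  blk_queue_max_hw_sectors(q, c->pref_block >> 9);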
Unrelated to the proposed changes (all of which I think are
technically correct), I am wondering if this will have much practical
benefit.  As far as I can tell, the kernel currently aligns NBD
requests to the logical/physical block size rather than the size of
the NBD request.  Are there NBD servers that would benefit from the
kernel honoring the preferred blocksize if the data is not also
aligned to this blocksize?

Best,
-Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«
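For anyone who wants to check what limits a given kernel actually
exposes for an NBD device, the values discussed in this thread are all
visible under sysfs.  A small standalone sketch follows; the sysfs
attribute names are the standard queue attributes, and only the device
name "nbd0" is an assumption:

  /* Print the block-layer limits the kernel exposes for an NBD
   * device via the standard sysfs queue attributes. */
  #include <stdio.h>

  static void show(const char *dev, const char *attr)
  {
          char path[256];
          FILE *f;
          long v = -1;

          snprintf(path, sizeof path, "/sys/block/%s/queue/%s", dev, attr);
          f = fopen(path, "r");
          if (f) {
                  if (fscanf(f, "%ld", &v) != 1)
                          v = -1;
                  fclose(f);
          }
          printf("%-22s %ld\n", attr, v);
  }

  int main(void)
  {
          const char *dev = "nbd0";  /* adjust for your device */

          show(dev, "logical_block_size");   /* bytes */
          show(dev, "physical_block_size");  /* bytes */
          show(dev, "minimum_io_size");      /* bytes */
          show(dev, "optimal_io_size");      /* bytes */
          show(dev, "max_sectors_kb");       /* KiB */
          show(dev, "max_hw_sectors_kb");    /* KiB */
          return 0;
  }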