Richard W.M. Jones
2022-Feb-13 09:40 UTC
[Libguestfs] [PATCH] v2v/v2v.ml: Use larger request size for -o rhv-upload
On Sat, Feb 12, 2022 at 10:49:42PM +0200, Nir Soffer wrote:> rhv-upload plugin is translating every NBD command to HTTP request, > translated back to NBD command on imageio server. The HTTP client and > server, and the NBD client on the imageio server side are synchronous > and implemented in python, so they have high overhead per request. To > get good performance we need to use larger request size. > > Testing shows that request size of 8MiB is best, speeding up the copy > disk phase from 14.7 seconds to 7.7 seconds (1.9x times faster).Unfortunately this will break VDDK since it cannot handle very large requests (I think 4M is about the max without reconfiguring the server). Also larger requests have adverse performance effects in other configurations, although I understand this patch tries to retrict the change to when the output mode is rhv-upload. We need to think of some other approach, but I'm not sure what it is. I'd really like to be able to talk to imageio's NBD server directly! Other relevant commits: https://github.com/libguestfs/virt-v2v/commit/7ebb2c8db9d4d297fbbef116a9828a9dde700de6 https://github.com/libguestfs/virt-v2v/commit/08e764959ec9dadd71a95d22d3d88d647a18d165 [...]> This is an ugly hack; the preferred request size should be a function of > the output module that only output_rhv_upload will override, but I don't > know how to implement this with the current code.Just add a new value to output/output.ml{,i}. There is no superclassing (this is not OO) so you'll have to add the value to every output module implementation, defaulting to None. However I'd like to think of another approach first. - Have nbdcopy split and combine requests so request size for input and output can be different? Sounds complicated but might be necessary one day to support minimum block size. - More efficient Python plugin that might combine requests? Also complicated ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Nir Soffer
2022-Feb-13 15:13 UTC
[Libguestfs] [PATCH] v2v/v2v.ml: Use larger request size for -o rhv-upload
On Sun, Feb 13, 2022 at 11:41 AM Richard W.M. Jones <rjones at redhat.com> wrote:> > On Sat, Feb 12, 2022 at 10:49:42PM +0200, Nir Soffer wrote: > > rhv-upload plugin is translating every NBD command to HTTP request, > > translated back to NBD command on imageio server. The HTTP client and > > server, and the NBD client on the imageio server side are synchronous > > and implemented in python, so they have high overhead per request. To > > get good performance we need to use larger request size. > > > > Testing shows that request size of 8MiB is best, speeding up the copy > > disk phase from 14.7 seconds to 7.7 seconds (1.9x times faster). > > Unfortunately this will break VDDK since it cannot handle very large > requests (I think 4M is about the max without reconfiguring the > server).Are you sure it will break VDDK? Request size limit is in VDDK API, not in the nbdkit plugin. When you request 8M request, the VDDK plugin should allocated a 8M buffer, and issue multiple calls to VDDK APIs, using VDDK maximum request size to fill the buffer. If the VDDK plugin does not do this, this is a bug in the plugin, since it must respect the underlying API. If 8M does break the vddk plugin, we can use 4M, it is only a little slower then 8M but still much faster than 256k.> Also larger requests have adverse performance effects in > other configurations, although I understand this patch tries to > retrict the change to when the output mode is rhv-upload.Yes, this affects only -o rhv-upload.> We need to think of some other approach, but I'm not sure what it is. > I'd really like to be able to talk to imageio's NBD server directly!We have a RFE to implement a local nbd socket, which should be easy, but the only attention we got so far was an attempt to close it ;-) Even if we have a local only nbd socket, lets's say in oVirt 4.6, it will not help existing users.> Other relevant commits: > https://github.com/libguestfs/virt-v2v/commit/7ebb2c8db9d4d297fbbef116a9828a9dde700de6 > https://github.com/libguestfs/virt-v2v/commit/08e764959ec9dadd71a95d22d3d88d647a18d165This looks like a nicer way to implement this change.> > [...] > > This is an ugly hack; the preferred request size should be a function of > > the output module that only output_rhv_upload will override, but I don't > > know how to implement this with the current code. > > Just add a new value to output/output.ml{,i}. There is no > superclassing (this is not OO) so you'll have to add the value to > every output module implementation, defaulting to None.Sounds reasonable, we have only a few outputs.> However I'd like to think of another approach first. > > - Have nbdcopy split and combine requests so request size for input > and output can be different? Sounds complicated but might be > necessary one day to support minimum block size.I think this is not needed for the case of large request size, since on nbd client/server size it is easy to handle large requests regardless of the underlying API limits.> - More efficient Python plugin that might combine requests? Also > complicated ...This will be slower and complicated. First we need to solve the issue of getting the right connection to handle the nbd command - now every command is handled by a random connection from the pool. Assuming we solved connection pooling, we can keep a buffer for the next request, and on every pwrite() fill this buffer. When the buffer is full, or when receiving a zero()/flush()/close() request, we need to send the buffer. This introduces additional copy which will slow down the transfer. Finally there is the issue of reporting errors - since we buffer nbd commands data, we cannot report errors for every commands, we need to report errors later, either in the middle of the command (buffer becomes full) or when the next command is received. So this will be slower, very hard to implement, and hard to debug. The way we solved this in imageio client - copying data between nbd and http backends, is to implement the copy loop in the http client. In imageio we have this (simplified) copy loop: for extent in extents: if zero: dst.zero(extent) else: copy(src, dst, extent) copy() is checking if the src or dst can do efficient copy, and if not it fallbacks to a generic copy: if hasattr(dst, "read_from"): dst.read_from(src, extent) elif hasattr(src, "write_to"): srt.write_to(dst, extent) else: for chunk in extent: src.read(chunk) dst.write(chunk) The http backend implements read_from like this: send put request headers for chunk in extent: src.read(chunk) socket.write(chunk) In this way we can use the most efficient request size for the nbd input (256k), for the http socket (256k is good), but minimize the number of http requests. Of course this will be harder with the async API, but it is possible. We can keep the pread requests in a FIFO queue, and send each chunk in the right order. The Go aio_copy example does this: https://gitlab.com/nbdkit/libnbd/-/blob/master/golang/examples/aio_copy/aio_copy.go#L106 This can be done in nbdcopy by adding http-ops. The pipeline in rhv-upload case will be: nbdkit (any input) -> nbdcopy (nbd-ops/http-ops) -> imageio server Another way is to drop nbdcopy for rhv-upload - if we provide nbd input, we can use imageio client to read from the nbd input and write to imageio server. The pipeline will be: nbdkit (any input) -> imageio client -> imageio server This means that the v2v output can optionally do the copy phase, if it knows how to read from the nbdkit input. Yet another way is to rewrite imageio in Go, and use libnbd async API, which will eliminate most of the overhead of the http requests. I started this work with the imageio client side. You can check what we have so far here: https://github.com/oVirt/ovirt-imageio/tree/master/ovirt-img For the long term, I think improving nbdcopy to support additional input/output backends is the right way and will make it more useful for other purposes. For now I will post a better v2. It will be nice to have this in RHEL 9.0. Nir