Nir Soffer
2018-Mar-14 18:56 UTC
Re: [Libguestfs] [PATCH v4 0/3] v2v: Add -o rhv-upload output mode.
On Mon, Mar 12, 2018 at 2:57 PM Eric Blake <eblake@redhat.com> wrote:

> On 03/12/2018 07:13 AM, Nir Soffer wrote:
> > On Mon, Mar 12, 2018 at 12:32 PM Richard W.M. Jones <rjones@redhat.com>
> > wrote:
> >
> >> On Mon, Mar 12, 2018 at 07:13:52AM +0000, Nir Soffer wrote:
> >>> On Fri, Mar 9, 2018 at 4:25 PM Richard W.M. Jones <rjones@redhat.com>
> >>> wrote:
> >>>
> >>>> It has to be said it would be really convenient to have a 'zero'
> >>>> and/or 'trim' method of some sort.
> >>>>
> >>>
> >>> 'trim' means discard?
> >>
> >> Yes. The 5 functions we could support are:
> >>
> >> * pread - done
> >> * pwrite - done
> >> * flush - does fdatasync(2) on the block device
> >>
> >
> > Currently we do fsync() on every PUT request, so flush is not very
> > useful.
> >
> >
> >> * zero - write a range of zeroes without having to send zeroes
> >> * trim - punch hole, can be emulated using zero if not possible
> >>
>
> trim is advisory in NBD, so it can also be emulated as a no-op while
> still having correct semantics. If you want to guarantee reading back
> zeroes after punching a hole, you have to use zero instead of trim.
>
> >
> >> Also (not implemented in nbdkit today, but coming soon), pwrite, zero
> >> and trim can be extended with a FUA (force unit access) flag, which
> >> would mean that the range should be persisted to disk before
> >> returning. It can be emulated by calling flush after the operation.
> >>
> >> It wasn't clear if anything in this process flushes the content to
> >> disk. Is that what transfer.finalize does?
> >>
> >
> > All PUT requests fsync() before returning. We optimize for complete image
> > transfer, not for random io.
>
> In other words, you are already implicitly behaving as if FUA is already
> set on every single request. It might be less efficient than what you
> could otherwise achieve, but it's fine if consistency is more important
> than speed.
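A minimal Python sketch of the two semantic points above, assuming a backend that simply wraps a regular file: trim is a no-op (allowed because trim is advisory), and a FUA request is emulated by flushing after the write. The `FileBackend` class and its method names are hypothetical; they mirror the operation names used in this thread, not the real nbdkit plugin API or imageio code.

```python
import os


class FileBackend:
    """Hypothetical file-backed store illustrating trim and FUA semantics."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR)

    def pread(self, count, offset):
        return os.pread(self.fd, count, offset)

    def pwrite(self, buf, offset, fua=False):
        view = memoryview(buf)
        while view:
            # os.pwrite() may write less than requested; loop until done.
            written = os.pwrite(self.fd, view, offset)
            view = view[written:]
            offset += written
        if fua:
            # Emulate FUA (force unit access) with a flush after the write.
            self.flush()

    def trim(self, count, offset):
        # Trim is advisory in NBD, so doing nothing is still correct.
        # A caller that must read back zeroes has to use zero instead.
        pass

    def flush(self):
        # fsync() persists both data and metadata before returning.
        os.fsync(self.fd)

    def close(self):
        os.close(self.fd)
```

Calling `pwrite(..., fua=True)` on every request is roughly what the current server does by running fsync() on every PUT: always consistent, at some cost in speed.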
> >
> >>> I would like to support only aligned offset and size - do you think it
> >>> should work for qemu-img?
> >>
> >> It depends a bit on what you mean by "aligned" and what the alignment
> >> is. We'd probably have to work around it in the plugin so that it can
> >> round in the request, issue a zero operation for the aligned part,
> >> and write zeroes at each end. There's no guarantee that qemu-img
> >> will be well-behaved in the future even if it is now.
>
> qemu-img in general tries to send sector-aligned data by default (it's
> unusual that qemu tries to access less than that at once). In 2.11,
> qemu-io can be made to send byte-aligned requests across any NBD
> connection; in 2.12, it's tightened so that NBD requests are
> sector-aligned unless the server advertised support for byte-aligned
> requests (nbdkit does not yet advertise this). As a client, qemu-io
> will then manually write zeroes to any unaligned portion (if there are
> any), and use the actual zero request for the aligned middle.
>
> >>
> >
> > Aligned for direct I/O (we use direct I/O for read/write). We can support
> > non-aligned ranges by doing similar emulation in the server, but I prefer
> > to do it only if we have such a requirement. If you need to do this in the
> > client, we probably need to do this in the server; otherwise all clients
> > may need to emulate this.
> >
> > I think there is no reason that qemu-img will zero unaligned ranges, but
> > I guess Eric can have a better answer.
>
> Yeah, for now, you are probably safe assuming that qemu-img will never
> send unaligned ranges.
> You're also correct that not all NBD servers
> support read-modify-write at unaligned boundaries, so well-behaved
> clients have to implement it themselves; while conversely not all
> clients are well-behaved so good NBD servers have to implement it -
> which is a duplication of effort since both sides of the equation have
> to worry about it when they want maximum cross-implementation
> portability. But that's life.
>
> And my pending patches for FUA support in nbdkit also add a
> --filter=blocksize, which at least lets nbdkit guarantee aligned data
> handed to the plugin even when the client is not well-behaved.
>

Thanks for the good input!

I posted documentation for the new API optimized for random I/O:
https://gerrit.ovirt.org/#/c/89022/

I changed POST to PATCH to match the existing /tickets API, and
this also seems to be a more standard way to do such operations.

Please check and comment if this makes sense and serves the v2v
use case or other use cases we missed.

I think we can implement all of this for 4.2.4, but:

- using simple zero loop, as in https://gerrit.ovirt.org/#/c/88793/.
  later we can make it more efficient.
- trim is a noop, maybe we will be able to support it in 4.3
- flush - may be noop now (all requests will implicitly flush).

I think we better have complete API with partial or simpler
implementation now, to minimize the hacks needed in v2v and
other clients.

Nir
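The unaligned-zero emulation Eric describes (plain zero writes for the unaligned edges, a real zero operation for the aligned middle), combined with the "simple zero loop" proposed for the server, can be sketched in Python as below. The 512-byte alignment, the 1 MiB chunk size, the function names and the `fast_zero` hook are all assumptions for illustration; this is not qemu's or imageio's actual code.

```python
import os

ALIGNMENT = 512          # assumed alignment (sector size)
CHUNK = 1024 * 1024      # assumed buffer size for the zero loop


def split_range(offset, count, align=ALIGNMENT):
    """Split [offset, offset + count) into head, middle and tail.

    Returns three (offset, count) tuples: the unaligned head, the aligned
    middle and the unaligned tail. Any of them may have count == 0.
    """
    end = offset + count
    mid_start = min(-(-offset // align) * align, end)  # round offset up
    mid_end = max(end // align * align, mid_start)     # round end down
    return ((offset, mid_start - offset),
            (mid_start, mid_end - mid_start),
            (mid_end, end - mid_end))


def zero_loop(fd, offset, count, chunk=CHUNK):
    """Write zeroes over the range by repeatedly writing a zero buffer."""
    buf = memoryview(bytes(min(chunk, count)))
    while count > 0:
        written = os.pwrite(fd, buf[:count], offset)
        offset += written
        count -= written


def zero(fd, offset, count, fast_zero=None, align=ALIGNMENT):
    """Zero a possibly unaligned range.

    Unaligned edges are always zeroed with plain writes; the aligned middle
    goes to fast_zero(fd, offset, count) if given (a hypothetical hook for
    a more efficient implementation), otherwise to the simple loop.
    """
    head, middle, tail = split_range(offset, count, align)
    for off, n in (head, tail):
        zero_loop(fd, off, n)
    if middle[1]:
        (fast_zero or zero_loop)(fd, *middle)
```

For example, `split_range(100, 1000)` yields a 412-byte head at 100, a 512-byte aligned middle at 512 and a 76-byte tail at 1024. With O_DIRECT, as imageio uses, the edge writes would additionally need a read-modify-write of the enclosing aligned block; this sketch assumes a buffered descriptor and skips that step.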
Richard W.M. Jones
2018-Mar-14 19:04 UTC
Re: [Libguestfs] [PATCH v4 0/3] v2v: Add -o rhv-upload output mode.
On Wed, Mar 14, 2018 at 06:56:19PM +0000, Nir Soffer wrote:
> I posted documentation for the new API optimized for random I/O:
> https://gerrit.ovirt.org/#/c/89022/

Wish I'd had this documentation when I started the patch :-)
Yes, it's much clearer.

> I changed POST to PATCH to match the existing /tickets API, and
> this also seems to be a more standard way to do such operations.

Assuming Python httplib will allow us to put anything in the method
argument of http.putrequest then this doesn't appear to make any
significant difference so that's fine. Also we can set the "flush"
(ie. FUA) parameter to match the NBD request.

> Please check and comment if this makes sense and serves the v2v
> use case or other use cases we missed.
>
> I think we can implement all of this for 4.2.4, but:
>
> - using simple zero loop, as in https://gerrit.ovirt.org/#/c/88793/.
>   later we can make it more efficient.
> - trim is a noop, maybe we will be able to support it in 4.3
> - flush - may be noop now (all requests will implicitly flush).

I don't think we really need trim or flush. They're only minor
optimizations. Zero is the one which is required.

FWIW NBD allows you to flush ranges or flush the whole disk, in case
that matters (your proposal only allows you to flush the whole disk).

> I think we better have complete API with partial or simpler
> implementation now, to minimize the hacks needed in v2v and
> other clients.

Agreed.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
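Rich's assumption holds for Python 3's `http.client` (named `httplib` in Python 2): `putrequest()` takes the method as a plain string, so `"PATCH"` works just like `"PUT"`. The sketch below sends a zero request that way. The JSON field names (`op`, `offset`, `size`, `flush`) and the `/images/<ticket>` path are assumptions for illustration only; the gerrit change linked above is the authoritative description of the API.

```python
import http.client
import json


def zero_body(offset, size, flush=False):
    # Assumed request format; check the imageio API documentation.
    return json.dumps({"op": "zero", "offset": offset, "size": size,
                       "flush": flush}).encode("utf-8")


def send_patch(conn, path, body):
    """Send a PATCH request using the low-level putrequest() interface.

    conn is expected to behave like http.client.HTTPConnection.
    """
    conn.putrequest("PATCH", path)
    conn.putheader("Content-Type", "application/json")
    conn.putheader("Content-Length", str(len(body)))
    conn.endheaders()
    conn.send(body)
    resp = conn.getresponse()
    resp.read()  # drain the body so the connection can be reused
    return resp.status


# Example (not executed here):
#   conn = http.client.HTTPSConnection("imageio.example.com", 54322)
#   send_patch(conn, "/images/TICKET-ID", zero_body(0, 1 << 20, flush=True))
```

Setting `flush` from the NBD request's FUA flag, as Rich suggests, would make each zero persist before the server replies.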
Nir Soffer
2018-Mar-14 21:07 UTC
Re: [Libguestfs] [PATCH v4 0/3] v2v: Add -o rhv-upload output mode.
On Wed, Mar 14, 2018 at 9:04 PM Richard W.M. Jones <rjones@redhat.com> wrote:

> On Wed, Mar 14, 2018 at 06:56:19PM +0000, Nir Soffer wrote:
> > I posted documentation for the new API optimized for random I/O:
> > https://gerrit.ovirt.org/#/c/89022/
>
> Wish I'd had this documentation when I started the patch :-)
> Yes, it's much clearer.
>
> > I changed POST to PATCH to match the existing /tickets API, and
> > this also seems to be a more standard way to do such operations.
>
> Assuming Python httplib will allow us to put anything in the method
> argument of http.putrequest then this doesn't appear to make any
> significant difference so that's fine. Also we can set the "flush"
> (ie. FUA) parameter to match the NBD request.
>
> > Please check and comment if this makes sense and serves the v2v
> > use case or other use cases we missed.
> >
> > I think we can implement all of this for 4.2.4, but:
> >
> > - using simple zero loop, as in https://gerrit.ovirt.org/#/c/88793/.
> >   later we can make it more efficient.
> > - trim is a noop, maybe we will be able to support it in 4.3
> > - flush - may be noop now (all requests will implicitly flush).
>
> I don't think we really need trim or flush. They're only minor
> optimizations. Zero is the one which is required.
>
> FWIW NBD allows you to flush ranges or flush the whole disk, in case
> that matters (your proposal only allows you to flush the whole disk).
>

What is the use case for flushing ranges? I guess we will have one or a few
flushes per image.

Looking at sync_file_range(2), it does not seem to be a safe way to flush:

    Warning
    This system call is extremely dangerous and should not be used in
    portable programs. None of these operations writes out the file's
    metadata. Therefore, unless the application is strictly performing
    overwrites of already-instantiated disk blocks, there are no
    guarantees that the data will be available after a crash. There is
    no user interface to know if a write is purely an overwrite.
    On file systems using copy-on-write semantics (e.g., btrfs) an
    overwrite of existing allocated blocks is impossible. When writing
    into preallocated space, many file systems also require calls into
    the block allocator, which this system call does not sync out to
    disk. This system call does not flush disk write caches and thus
    does not provide any data integrity on systems with volatile disk
    write caches.

I can support the same size and offset arguments, and treat them as a
hint if we can implement this safely in some future version. But I think
providing only a simple and safe way to flush is good enough for this
context.

> > I think we better have complete API with partial or simpler
> > implementation now, to minimize the hacks needed in v2v and
> > other clients.
>
> Agreed.
>
> Rich.
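Nir's idea of accepting offset and size but treating them only as a hint could look like the following Python sketch: the range is validated and then ignored, and the whole file is flushed with fsync(2), avoiding sync_file_range(2) for the reasons quoted above. The `flush` function and its signature are hypothetical.

```python
import os


def flush(fd, offset=0, size=0):
    """Flush the whole file; offset and size are accepted only as a hint.

    Hypothetical helper: the range is validated but never used to limit
    the flush, because sync_file_range(2) does not write out metadata or
    flush volatile disk write caches.
    """
    if offset < 0 or size < 0:
        raise ValueError("offset and size must be non-negative")
    # fsync() writes out both data and metadata for the whole file,
    # whatever range the caller asked for.
    os.fsync(fd)
```

If a safe ranged flush ever becomes available, only the body needs to change; clients already passing a range keep working.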
Eric Blake
2018-Mar-15 11:32 UTC
Re: [Libguestfs] [PATCH v4 0/3] v2v: Add -o rhv-upload output mode.
On 03/14/2018 02:04 PM, Richard W.M. Jones wrote:

> I don't think we really need trim or flush. They're only minor
> optimizations. Zero is the one which is required.
>
> FWIW NBD allows you to flush ranges or flush the whole disk, in case
> that matters (your proposal only allows you to flush the whole disk).

No, for now, NBD requires flush to be sent with parameters offset=0 and
length=0, which flushes the entire disk. Non-zero parameters for flushing
only a range of the disk are reserved for future expansion, if someone
actually has a use case for it. nbdkit doesn't expose ranges to the
.flush callback.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
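To make the wire-level rule concrete, this Python sketch packs an NBD transmission request with `struct` and rejects a flush that carries a range. The layout (32-bit magic 0x25609513, 16-bit command flags, 16-bit type, 64-bit handle, 64-bit offset, 32-bit length, all big-endian) and the command numbers are taken from the NBD protocol document; verify them against the current spec before relying on them. The helper names are hypothetical.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_FLUSH = 3
NBD_CMD_TRIM = 4
NBD_CMD_WRITE_ZEROES = 6
NBD_CMD_FLAG_FUA = 1 << 0

# magic, command flags, type, handle, offset, length (network byte order)
REQUEST = struct.Struct(">IHHQQI")


def pack_request(cmd, handle, offset, length, flags=0):
    """Build a 28-byte NBD request header."""
    if cmd == NBD_CMD_FLUSH and (offset != 0 or length != 0):
        # Ranged flush is reserved for future expansion: whole disk only.
        raise ValueError("NBD_CMD_FLUSH must use offset=0 and length=0")
    return REQUEST.pack(NBD_REQUEST_MAGIC, flags, cmd, handle, offset, length)


def unpack_request(data):
    """Parse a request header, returning (flags, cmd, handle, offset, length)."""
    magic, flags, cmd, handle, offset, length = REQUEST.unpack(data)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError("bad request magic")
    return flags, cmd, handle, offset, length
```

A write-zeroes request with the FUA flag set is the wire-level counterpart of a zero operation with the "flush" parameter discussed earlier in the thread.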