Nir Soffer
2018-Mar-12 12:13 UTC
Re: [Libguestfs] [PATCH v4 0/3] v2v: Add -o rhv-upload output mode.
On Mon, Mar 12, 2018 at 12:32 PM Richard W.M. Jones <rjones@redhat.com>
wrote:

> On Mon, Mar 12, 2018 at 07:13:52AM +0000, Nir Soffer wrote:
> > On Fri, Mar 9, 2018 at 4:25 PM Richard W.M. Jones <rjones@redhat.com>
> > wrote:
> >
> > > It has to be said it would be really convenient to have a 'zero'
> > > and/or 'trim' method of some sort.
> >
> > 'trim' means discard?
>
> Yes.  The 5 functions we could support are:
>
>  * pread - done
>  * pwrite - done
>  * flush - does fdatasync(2) on the block device

Currently we do fsync() on every PUT request, so flush is not very useful.

>  * zero - write a range of zeroes without having to send zeroes
>  * trim - punch hole, can be emulated using zero if not possible

Makes sense.

> Also (not implemented in nbdkit today, but coming soon), pwrite, zero
> and trim can be extended with a FUA (force unit access) flag, which
> would mean that the range should be persisted to disk before
> returning.  It can be emulated by calling flush after the operation.
>
> It wasn't clear if anything in this process flushes the content to
> disk.  Is that what transfer.finalize does?

All PUT requests fsync() before returning. We optimize for complete image
transfer, not for random I/O.

> > Currently we cannot support discard on block storage since ovirt may
> > need to wipe lvs when deleting a disk, and discarding may leave
> > unwiped user data. This may change in 4.3 if we switch to wipe on
> > creation instead of wipe after delete.
>
> I think this depends on BLKDISCARDZEROES[1] for the block device?  Of
> course if you're worried about data remanence for someone who has
> access to the physical device then that wouldn't be enough.

BLKDISCARDZEROES was never reliable, and it was removed from the kernel
recently.
Please check https://patchwork.kernel.org/patch/9903757/

We are not relying on it since ovirt 4.2, hopefully also in 4.1.

> [1]
> https://rwmj.wordpress.com/2014/03/11/blkdiscard-blkzeroout-blkdiscardzeroes-blksecdiscard/
>
> > POST /images/ticket-id
> > ...
> > {
> >     "op": "zero",
> >     "offset": X,
> >     "size": Y
> > }
> >
> > I would like to support only aligned offset and size - do you think it
> > should work for qemu-img?
>
> It depends a bit on what you mean by "aligned" and what the alignment
> is.  We'd probably have to work around it in the plugin so that it can
> round the request, issue a zero operation for the aligned part, and
> write zeroes at each end.  There's no guarantee that qemu-img will be
> well-behaved in the future even if it is now.

Aligned for direct I/O (we use direct I/O for read/write). We can support
non-aligned ranges by doing similar emulation in the server, but I prefer
to do it only if we have such a requirement. If you need to do this in the
client, we probably need to do this in the server, otherwise all clients
may need to emulate this.

I think there is no reason that qemu-img will zero unaligned ranges, but
I guess Eric has a better answer.

> Anyway this sounds do-able, is it something we can get for 4.2?

I think it is for 4.2.z.

Is zero support in the daemon and proxy enough, or do we need the other
options now?

> How will we detect that the server supports it?

Because we don't support OPTIONS yet, the only way is to send a POST
request and check for a 405 response.

Nir
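The probe described above - send a "zero" request and treat a 405
Method Not Allowed response as "server too old, fall back to writing
literal zeroes" - might look roughly like this on the client side.
This is only a sketch: the function name, connection handling, and
error policy are illustrative, not taken from the v2v plugin or the
imageio code.

```python
# Illustrative client-side sketch: try the "zero" operation on the
# imageio daemon and detect lack of support via a 405 response.
# Returns False when the caller must fall back to PUTing real zeroes.

import http.client
import json

def zero_range(connection, ticket_id, offset, size):
    body = json.dumps({"op": "zero", "offset": offset, "size": size})
    connection.request("POST", "/images/" + ticket_id, body=body)
    response = connection.getresponse()
    response.read()  # drain the response body
    if response.status == 405:
        return False  # old daemon/proxy: no zero support
    if response.status != 200:
        raise RuntimeError("zero failed: %d %s"
                           % (response.status, response.reason))
    return True
```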
Richard W.M. Jones
2018-Mar-12 12:25 UTC
Re: [Libguestfs] [PATCH v4 0/3] v2v: Add -o rhv-upload output mode.
On Mon, Mar 12, 2018 at 12:13:08PM +0000, Nir Soffer wrote:
> I think it is for 4.2.z.
>
> Is zero support in the daemon and proxy enough, or do we need the other
> options now?

Zero support is all we really need right now.  I'll turn trim requests
into zero requests.  We don't need flush or FUA support, as you've
pointed out.

> > How will we detect that the server supports it?
>
> Because we don't support OPTIONS yet, the only way is to send a POST
> request and check for a 405 response.

If it's going to be in 4.2(-z) then we can just assume it's there.

Let me know when there are packages I can test.

Rich.

--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
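Turning trim requests into zero requests, as Rich describes, amounts
to a one-line delegation in the plugin's request handlers.  The sketch
below is illustrative only: the handler names and the dict-based handle
are assumptions for the example, not the actual plugin code, which
would issue an HTTP request instead of recording the call.

```python
# Illustrative sketch of mapping trim onto zero.  The handle is a
# plain dict so the delegation is observable in this example.

def zero(handle, count, offset):
    # in the real plugin: send the "zero" request to the imageio
    # daemon; here we just record the call
    handle["calls"].append(("zero", offset, count))

def trim(handle, count, offset):
    # trim is advisory, so satisfying it with an explicit zero is
    # always correct - just possibly slower than a real discard
    zero(handle, count, offset)
```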
Eric Blake
2018-Mar-12 12:57 UTC
Re: [Libguestfs] [PATCH v4 0/3] v2v: Add -o rhv-upload output mode.
On 03/12/2018 07:13 AM, Nir Soffer wrote:
> On Mon, Mar 12, 2018 at 12:32 PM Richard W.M. Jones <rjones@redhat.com>
> wrote:
>
>> On Mon, Mar 12, 2018 at 07:13:52AM +0000, Nir Soffer wrote:
>>> On Fri, Mar 9, 2018 at 4:25 PM Richard W.M. Jones <rjones@redhat.com>
>>> wrote:
>>>
>>>> It has to be said it would be really convenient to have a 'zero'
>>>> and/or 'trim' method of some sort.
>>>
>>> 'trim' means discard?
>>
>> Yes.  The 5 functions we could support are:
>>
>>  * pread - done
>>  * pwrite - done
>>  * flush - does fdatasync(2) on the block device
>
> Currently we do fsync() on every PUT request, so flush is not very
> useful.
>
>>  * zero - write a range of zeroes without having to send zeroes
>>  * trim - punch hole, can be emulated using zero if not possible

trim is advisory in NBD, so it can also be emulated as a no-op while
still having correct semantics.  If you want to guarantee reading back
zeroes after punching a hole, you have to use zero instead of trim.

>> Also (not implemented in nbdkit today, but coming soon), pwrite, zero
>> and trim can be extended with a FUA (force unit access) flag, which
>> would mean that the range should be persisted to disk before
>> returning.  It can be emulated by calling flush after the operation.
>>
>> It wasn't clear if anything in this process flushes the content to
>> disk.  Is that what transfer.finalize does?
>
> All PUT requests fsync() before returning. We optimize for complete
> image transfer, not for random I/O.

In other words, you are already implicitly behaving as if FUA is already
set on every single request.  It might be less efficient than what you
could otherwise achieve, but it's fine if consistency is more important
than speed.

>>> I would like to support only aligned offset and size - do you think it
>>> should work for qemu-img?
>>
>> It depends a bit on what you mean by "aligned" and what the alignment
>> is.
>> We'd probably have to work around it in the plugin so that it can
>> round the request, issue a zero operation for the aligned part, and
>> write zeroes at each end.  There's no guarantee that qemu-img will
>> be well-behaved in the future even if it is now.

qemu-img in general tries to send sector-aligned data by default (it's
unusual that qemu tries to access less than that at once).  In 2.11,
qemu-io can be made to send byte-aligned requests across any NBD
connection; in 2.12, it's tightened so that NBD requests are
sector-aligned unless the server advertised support for byte-aligned
requests (nbdkit does not yet advertise this).  As a client, qemu-io
will then manually write zeroes to any unaligned portion (if there is
any), and use the actual zero request for the aligned middle.

> Aligned for direct I/O (we use direct I/O for read/write). We can
> support non-aligned ranges by doing similar emulation in the server,
> but I prefer to do it only if we have such a requirement. If you need
> to do this in the client, we probably need to do this in the server,
> otherwise all clients may need to emulate this.
>
> I think there is no reason that qemu-img will zero unaligned ranges,
> but I guess Eric has a better answer.

Yeah, for now, you are probably safe assuming that qemu-img will never
send unaligned ranges.  You're also correct that not all NBD servers
support read-modify-write at unaligned boundaries, so well-behaved
clients have to implement it themselves; while conversely not all
clients are well-behaved, so good NBD servers have to implement it -
which is a duplication of effort, since both sides of the equation have
to worry about it when they want maximum cross-implementation
portability.  But that's life.

And my pending patches for FUA support in nbdkit also add a
--filter=blocksize, which at least lets nbdkit guarantee aligned data
handed to the plugin even when the client is not well-behaved.
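The client-side handling Eric describes - literal zeroes for the
unaligned head and tail, one real zero request for the aligned middle -
reduces to a small range-splitting helper.  The function name and the
512-byte default alignment here are illustrative assumptions, not code
from qemu or nbdkit.

```python
# Illustrative only: split [offset, offset + size) into head, middle,
# and tail, where only the middle is align-aligned.  A client writes
# literal zeroes for head and tail, and sends one zero request for the
# middle.

def split_zero_range(offset, size, align=512):
    """Return ((head_off, head_len), (mid_off, mid_len),
    (tail_off, tail_len)); any part may have zero length."""
    end = offset + size
    mid_start = min(end, -(-offset // align) * align)  # round offset up
    mid_end = max(mid_start, end // align * align)     # round end down
    return ((offset, mid_start - offset),
            (mid_start, mid_end - mid_start),
            (mid_end, end - mid_end))
```

For example, zeroing 1000 bytes at offset 100 with 512-byte alignment
yields a 412-byte head, one aligned 512-byte middle, and a 76-byte
tail.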
-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Nir Soffer
2018-Mar-14 18:56 UTC
Re: [Libguestfs] [PATCH v4 0/3] v2v: Add -o rhv-upload output mode.
On Mon, Mar 12, 2018 at 2:57 PM Eric Blake <eblake@redhat.com> wrote:

> On 03/12/2018 07:13 AM, Nir Soffer wrote:
> > On Mon, Mar 12, 2018 at 12:32 PM Richard W.M. Jones <rjones@redhat.com>
> > wrote:
> >
> >> On Mon, Mar 12, 2018 at 07:13:52AM +0000, Nir Soffer wrote:
> >>> On Fri, Mar 9, 2018 at 4:25 PM Richard W.M. Jones <rjones@redhat.com>
> >>> wrote:
> >>>
> >>>> It has to be said it would be really convenient to have a 'zero'
> >>>> and/or 'trim' method of some sort.
> >>>
> >>> 'trim' means discard?
> >>
> >> Yes.  The 5 functions we could support are:
> >>
> >>  * pread - done
> >>  * pwrite - done
> >>  * flush - does fdatasync(2) on the block device
> >
> > Currently we do fsync() on every PUT request, so flush is not very
> > useful.
> >
> >>  * zero - write a range of zeroes without having to send zeroes
> >>  * trim - punch hole, can be emulated using zero if not possible
>
> trim is advisory in NBD, so it can also be emulated as a no-op while
> still having correct semantics.  If you want to guarantee reading back
> zeroes after punching a hole, you have to use zero instead of trim.
>
> >> Also (not implemented in nbdkit today, but coming soon), pwrite, zero
> >> and trim can be extended with a FUA (force unit access) flag, which
> >> would mean that the range should be persisted to disk before
> >> returning.  It can be emulated by calling flush after the operation.
> >>
> >> It wasn't clear if anything in this process flushes the content to
> >> disk.  Is that what transfer.finalize does?
> >
> > All PUT requests fsync() before returning. We optimize for complete
> > image transfer, not for random I/O.
>
> In other words, you are already implicitly behaving as if FUA is already
> set on every single request.  It might be less efficient than what you
> could otherwise achieve, but it's fine if consistency is more important
> than speed.
> >>> I would like to support only aligned offset and size - do you think
> >>> it should work for qemu-img?
> >>
> >> It depends a bit on what you mean by "aligned" and what the alignment
> >> is.  We'd probably have to work around it in the plugin so that it
> >> can round the request, issue a zero operation for the aligned part,
> >> and write zeroes at each end.  There's no guarantee that qemu-img
> >> will be well-behaved in the future even if it is now.
>
> qemu-img in general tries to send sector-aligned data by default (it's
> unusual that qemu tries to access less than that at once).  In 2.11,
> qemu-io can be made to send byte-aligned requests across any NBD
> connection; in 2.12, it's tightened so that NBD requests are
> sector-aligned unless the server advertised support for byte-aligned
> requests (nbdkit does not yet advertise this).  As a client, qemu-io
> will then manually write zeroes to any unaligned portion (if there is
> any), and use the actual zero request for the aligned middle.
>
> > Aligned for direct I/O (we use direct I/O for read/write). We can
> > support non-aligned ranges by doing similar emulation in the server,
> > but I prefer to do it only if we have such a requirement. If you need
> > to do this in the client, we probably need to do this in the server,
> > otherwise all clients may need to emulate this.
> >
> > I think there is no reason that qemu-img will zero unaligned ranges,
> > but I guess Eric has a better answer.
>
> Yeah, for now, you are probably safe assuming that qemu-img will never
> send unaligned ranges.
> You're also correct that not all NBD servers
> support read-modify-write at unaligned boundaries, so well-behaved
> clients have to implement it themselves; while conversely not all
> clients are well-behaved, so good NBD servers have to implement it -
> which is a duplication of effort, since both sides of the equation
> have to worry about it when they want maximum cross-implementation
> portability.  But that's life.
>
> And my pending patches for FUA support in nbdkit also add a
> --filter=blocksize, which at least lets nbdkit guarantee aligned data
> handed to the plugin even when the client is not well-behaved.

Thanks for the good input!

I posted documentation for the new API optimized for random I/O:
https://gerrit.ovirt.org/#/c/89022/

I changed POST to PATCH to match the existing /tickets API, and this
also seems to be a more standard way to do such operations.

Please check and comment if this makes sense and serves the v2v use
case or other use cases we missed.

I think we can implement all of this for 4.2.4, but:

- zero: using a simple zero loop, as in
  https://gerrit.ovirt.org/#/c/88793/; later we can make it more
  efficient
- trim: a no-op for now; maybe we will be able to support it in 4.3
- flush: may be a no-op for now (all requests implicitly flush)

I think we'd better have a complete API with a partial or simpler
implementation now, to minimize the hacks needed in v2v and other
clients.

Nir
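The "simple zero loop" mentioned for the first implementation could
look something like the sketch below: write a reused zeroed buffer
across the range, then fsync() so a zero request keeps the same
durability as a PUT.  Buffer size, fsync placement, and names are
assumptions for illustration, not the actual imageio patch.

```python
# Illustrative server-side zero loop; not the actual imageio code.

import os

ZERO_BUF = bytearray(1024 * 1024)  # 1 MiB of zeroes, reused

def zero_loop(fd, offset, size):
    os.lseek(fd, offset, os.SEEK_SET)
    todo = size
    while todo > 0:
        # write at most one buffer's worth per syscall
        chunk = memoryview(ZERO_BUF)[:min(todo, len(ZERO_BUF))]
        todo -= os.write(fd, chunk)
    os.fsync(fd)  # same durability as PUT requests
```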