On 6/25/19 8:14 AM, Richard W.M. Jones wrote:> On Tue, Jun 25, 2019 at 02:58:34PM +0200, Martin Kletzander wrote: >> Here are few things I found out when using libnbd that might be perfectly fine >> or maybe just an oversight, but I wanted to point them out. It's nothing major. >> >> When running a program with `nbdkit -U - --run ...`, the $nbd parameter gets >> expanded to nbd:unix:/path/to/socket. When this string is passed to >> nbd_connect_uri(), it does not return an error (even though it is not a valid >> URL), but what's more it treats it as "nbd://localhost", which might be a completely different server (that actually happened to me and it was kind of confusing). > > There's a bit of magic in nbdkit about $nbd. It also predates our > standardization effort for NBD URLs which is going on upstream. > > That needs to be fixed, but in the meantime just use $unixsocket > instead (see nbdkit-captive(1) man page for the full details).In the meantime, I don't mind working on the quick fix for rejecting an invalid URI.> >> When nbd_block_status() callback returns -1, the nbd_block_status() function >> reports: >> >> command failed: Protocol error (errno = 71) >> >> which is a bit confusing to me. I could be nicer to have that report that it >> really was the callback that caused this. > > Not sure about this one - Eric?Right now, a failed callback will cause the overall nbd_block_status() to fail with whatever errno the callback left behind, with EPROTO as the default. I'm guessing you didn't set an errno when you returned -1? Is there a better default we should use than EPROTO? Is there really a way to diagnose without an errno value that the callback failed but didn't set errno? There's also the related question - my patches to add nbd_pread_structured are set up to call the callback for as many chunks as the server sends, even if an earlier invocation of the callback fails, but to still preserve the errno of the first failed callback (again defaulting to EPROTO). Should nbd_block_status also call the callback for all chunks from the server, rather than stopping the use of further callbacks once the first callback fails, if only for consistency between the two?>> One last thing is that I could not find any definition for the flags >> of "base:allocation" metadata (NBD_STATE_HOLE and NBD_STATE_ZERO). >> It might just be things that are still missing due to an early stage >> of libnbd, or it might not belong there or it might be me misusing >> something. > > Have a look at the NBD protocol doc: > > https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.mdWe could add convenience #defines into libnbd.h if needed, which would work for the base:allocation context, but it would not help any other context (such as qemu:dirty-bitmap:FOO) which has other definitions for the returned bits. But the fact that you have to call nbd_add_meta_context(h, "base:allocation") is also awkward; if we add convenience macros for the results, we probably also want to support nbd_add_meta_context(h, LIBNBD_CONTEXT_BASE_ALLOCATION). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
On Tue, Jun 25, 2019 at 09:25:51AM -0500, Eric Blake wrote:>On 6/25/19 8:14 AM, Richard W.M. Jones wrote: >> On Tue, Jun 25, 2019 at 02:58:34PM +0200, Martin Kletzander wrote: >>> Here are few things I found out when using libnbd that might be perfectly fine >>> or maybe just an oversight, but I wanted to point them out. It's nothing major. >>> >>> When running a program with `nbdkit -U - --run ...`, the $nbd parameter gets >>> expanded to nbd:unix:/path/to/socket. When this string is passed to >>> nbd_connect_uri(), it does not return an error (even though it is not a valid >>> URL), but what's more it treats it as "nbd://localhost", which might be a completely different server (that actually happened to me and it was kind of confusing). >> >> There's a bit of magic in nbdkit about $nbd. It also predates our >> standardization effort for NBD URLs which is going on upstream. >> >> That needs to be fixed, but in the meantime just use $unixsocket >> instead (see nbdkit-captive(1) man page for the full details). >I forgot to say I solved all the issues mentioned here. I was just worried that there was no error and it lead me to a different NBD server.>In the meantime, I don't mind working on the quick fix for rejecting an >invalid URI. > >> >>> When nbd_block_status() callback returns -1, the nbd_block_status() function >>> reports: >>> >>> command failed: Protocol error (errno = 71) >>> >>> which is a bit confusing to me. I could be nicer to have that report that it >>> really was the callback that caused this. >> >> Not sure about this one - Eric? > >Right now, a failed callback will cause the overall nbd_block_status() >to fail with whatever errno the callback left behind, with EPROTO as the >default. I'm guessing you didn't set an errno when you returned -1? Is >there a better default we should use than EPROTO? Is there really a way >to diagnose without an errno value that the callback failed but didn't >set errno? >I haven't looked how the error messages are stored, but if is says that the callback returned negative value in nbd_get_error(), then I don't need anything else.>There's also the related question - my patches to add >nbd_pread_structured are set up to call the callback for as many chunks >as the server sends, even if an earlier invocation of the callback >fails, but to still preserve the errno of the first failed callback >(again defaulting to EPROTO). Should nbd_block_status also call the >callback for all chunks from the server, rather than stopping the use of >further callbacks once the first callback fails, if only for consistency >between the two? >I can't decide. I think there should be a way to say you don't want the callback to be called again. I can imagine a callback which relies on that (failed to allocate data for an internal structure and now the integrity is broken or something).>>> One last thing is that I could not find any definition for the flags >>> of "base:allocation" metadata (NBD_STATE_HOLE and NBD_STATE_ZERO). >>> It might just be things that are still missing due to an early stage >>> of libnbd, or it might not belong there or it might be me misusing >>> something. >> >> Have a look at the NBD protocol doc: >> >> https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.mdI know, I took the values from there and #define'd them in my code. But it would be nicer if this was defined in the provided header file. This leads me to another value exposed only in an error message, the maximum request sizes for each call. And that leads me to nbd_pread() saying the maximum request size is 2^32-1, but if I use that I see (in nbdkit logs) it requests with count=2^32 (probably an alignment) and then hits an assertion inside the nbdkit code: nbdkit: protocol.c:528: extents_to_block_descriptors: Assertion `e.length <= length' failed.> >We could add convenience #defines into libnbd.h if needed, which would >work for the base:allocation context, but it would not help any other >context (such as qemu:dirty-bitmap:FOO) which has other definitions for >the returned bits. But the fact that you have to call >nbd_add_meta_context(h, "base:allocation") is also awkward; if we add >convenience macros for the results, we probably also want to support >nbd_add_meta_context(h, LIBNBD_CONTEXT_BASE_ALLOCATION). >That is what I was thinking about. Standardized metadata contexts could be "supported" in a way that the values are #defined in libnbd.h.>-- >Eric Blake, Principal Software Engineer >Red Hat, Inc. +1-919-301-3226 >Virtualization: qemu.org | libvirt.org >>_______________________________________________ >Libguestfs mailing list >Libguestfs@redhat.com >https://www.redhat.com/mailman/listinfo/libguestfs
On 6/25/19 9:25 AM, Eric Blake wrote:>>> When running a program with `nbdkit -U - --run ...`, the $nbd parameter gets >>> expanded to nbd:unix:/path/to/socket. When this string is passed to >>> nbd_connect_uri(), it does not return an error (even though it is not a valid >>> URL), but what's more it treats it as "nbd://localhost", which might be a completely different server (that actually happened to me and it was kind of confusing). >> >> There's a bit of magic in nbdkit about $nbd. It also predates our >> standardization effort for NBD URLs which is going on upstream. >> >> That needs to be fixed, but in the meantime just use $unixsocket >> instead (see nbdkit-captive(1) man page for the full details). > > In the meantime, I don't mind working on the quick fix for rejecting an > invalid URI.Done.> >> >>> When nbd_block_status() callback returns -1, the nbd_block_status() function >>> reports: >>> >>> command failed: Protocol error (errno = 71) >>> >>> which is a bit confusing to me. I could be nicer to have that report that it >>> really was the callback that caused this. >> >> Not sure about this one - Eric? > > Right now, a failed callback will cause the overall nbd_block_status() > to fail with whatever errno the callback left behind, with EPROTO as the > default. I'm guessing you didn't set an errno when you returned -1? Is > there a better default we should use than EPROTO? Is there really a way > to diagnose without an errno value that the callback failed but didn't > set errno? >My pending patch to take advantage of Mutable(Int) in the callback, makes this a bit nicer, as you can then control the error status by assigning into the parameter rather than by what got left in errno.> There's also the related question - my patches to add > nbd_pread_structured are set up to call the callback for as many chunks > as the server sends, even if an earlier invocation of the callback > fails, but to still preserve the errno of the first failed callback > (again defaulting to EPROTO). Should nbd_block_status also call the > callback for all chunks from the server, rather than stopping the use of > further callbacks once the first callback fails, if only for consistency > between the two?So now I'm leaning towards making that change, particularly since I'm also working on adding a nbd_aio_FOO_notify() for each command that takes a callback that also takes an error pointer.> >>> One last thing is that I could not find any definition for the flags >>> of "base:allocation" metadata (NBD_STATE_HOLE and NBD_STATE_ZERO). >>> It might just be things that are still missing due to an early stage >>> of libnbd, or it might not belong there or it might be me misusing >>> something. >> >> Have a look at the NBD protocol doc: >> >> https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md > > We could add convenience #defines into libnbd.h if needed, which would > work for the base:allocation context, but it would not help any other > context (such as qemu:dirty-bitmap:FOO) which has other definitions for > the returned bits. But the fact that you have to call > nbd_add_meta_context(h, "base:allocation") is also awkward; if we add > convenience macros for the results, we probably also want to support > nbd_add_meta_context(h, LIBNBD_CONTEXT_BASE_ALLOCATION).Still needs to be done. Another thing I just noticed: nbd_poll returns 0 both for timeout and for when it notified the state machine about POLLIN or POLLOUT making progress. We probably want to switch it to return 0 on timeout and 1 when it made progress, so that a user can detect timeout (all our internal callers pass -1 for the timeout parameter, and thus can't time out, but external callers may care). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
On Wed, Jun 26, 2019 at 09:33:04PM -0500, Eric Blake wrote:> Another thing I just noticed: nbd_poll returns 0 both for timeout and > for when it notified the state machine about POLLIN or POLLOUT making > progress. We probably want to switch it to return 0 on timeout and 1 > when it made progress, so that a user can detect timeout (all our > internal callers pass -1 for the timeout parameter, and thus can't time > out, but external callers may care).Agreed. And another thing you said on IRC:> 04:26 < eblake> urgh - libnbd is prone to die from SIGPIPE, but as a library, I don't know if we can ignore it > 04:26 < eblake> should we be using pthread_sigmask and blocking SIGPIPE around our reads, and restoring it to the calling application's default the rest of the > time? > 04:28 < eblake> ooh - maybe we can rely on MSG_NOSIGNAL :) > 04:28 < eblake> that's a lot cleaner, even if it is Linux-specificMSG_NOSIGNAL is also available on FreeBSD and seems to make a lot of sense here, given that as you say changing the signal handler isn't a good idea from a library. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
On Wed, Jun 26, 2019 at 09:33:04PM -0500, Eric Blake wrote:>On 6/25/19 9:25 AM, Eric Blake wrote: > >>>> When running a program with `nbdkit -U - --run ...`, the $nbd parameter gets >>>> expanded to nbd:unix:/path/to/socket. When this string is passed to >>>> nbd_connect_uri(), it does not return an error (even though it is not a valid >>>> URL), but what's more it treats it as "nbd://localhost", which might be a completely different server (that actually happened to me and it was kind of confusing). >>> >>> There's a bit of magic in nbdkit about $nbd. It also predates our >>> standardization effort for NBD URLs which is going on upstream. >>> >>> That needs to be fixed, but in the meantime just use $unixsocket >>> instead (see nbdkit-captive(1) man page for the full details). >> >> In the meantime, I don't mind working on the quick fix for rejecting an >> invalid URI. > >Done. > >> >>> >>>> When nbd_block_status() callback returns -1, the nbd_block_status() function >>>> reports: >>>> >>>> command failed: Protocol error (errno = 71) >>>> >>>> which is a bit confusing to me. I could be nicer to have that report that it >>>> really was the callback that caused this. >>> >>> Not sure about this one - Eric? >> >> Right now, a failed callback will cause the overall nbd_block_status() >> to fail with whatever errno the callback left behind, with EPROTO as the >> default. I'm guessing you didn't set an errno when you returned -1? Is >> there a better default we should use than EPROTO? Is there really a way >> to diagnose without an errno value that the callback failed but didn't >> set errno? >> > >My pending patch to take advantage of Mutable(Int) in the callback, >makes this a bit nicer, as you can then control the error status by >assigning into the parameter rather than by what got left in errno. > >> There's also the related question - my patches to add >> nbd_pread_structured are set up to call the callback for as many chunks >> as the server sends, even if an earlier invocation of the callback >> fails, but to still preserve the errno of the first failed callback >> (again defaulting to EPROTO). Should nbd_block_status also call the >> callback for all chunks from the server, rather than stopping the use of >> further callbacks once the first callback fails, if only for consistency >> between the two? > >So now I'm leaning towards making that change, particularly since I'm >also working on adding a nbd_aio_FOO_notify() for each command that >takes a callback that also takes an error pointer. > >> >>>> One last thing is that I could not find any definition for the flags >>>> of "base:allocation" metadata (NBD_STATE_HOLE and NBD_STATE_ZERO). >>>> It might just be things that are still missing due to an early stage >>>> of libnbd, or it might not belong there or it might be me misusing >>>> something. >>> >>> Have a look at the NBD protocol doc: >>> >>> https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md >> >> We could add convenience #defines into libnbd.h if needed, which would >> work for the base:allocation context, but it would not help any other >> context (such as qemu:dirty-bitmap:FOO) which has other definitions for >> the returned bits. But the fact that you have to call >> nbd_add_meta_context(h, "base:allocation") is also awkward; if we add >> convenience macros for the results, we probably also want to support >> nbd_add_meta_context(h, LIBNBD_CONTEXT_BASE_ALLOCATION). > >Still needs to be done. >Oh, it it's just this, then I can add that, at least this is not too deep inside the machinery that I'd have to go through everything to understand it properly. I sent a simple patch that adds it, we can discuss that there.>Another thing I just noticed: nbd_poll returns 0 both for timeout and >for when it notified the state machine about POLLIN or POLLOUT making >progress. We probably want to switch it to return 0 on timeout and 1 >when it made progress, so that a user can detect timeout (all our >internal callers pass -1 for the timeout parameter, and thus can't time >out, but external callers may care). >I haven't gotten to using async APIs yet. But resembling poll(2) seems reasonable.>-- >Eric Blake, Principal Software Engineer >Red Hat, Inc. +1-919-301-3226 >Virtualization: qemu.org | libvirt.org >