Laszlo Ersek
2023-Mar-15 11:01 UTC
[Libguestfs] [libnbd PATCH v4 1/3] lib/utils: introduce xwritel() as a more robust and convenient write()
While nbd_internal_fork_safe_perror() must indeed call write(), and arguably justifiedly ignores the return value of write(), we can still make the write operations slightly more robust and convenient. Let's do that by introducing xwritel(): - Let the caller pass a list of NUL-terminated strings, via stdarg / ellipsis notation in the parameter list. - Handle partial writes. - Cope with EINTR and EAGAIN errors. (A busy loop around EAGAIN on a non-blocking file is not great in the general case, but it's good enough for writing out diagnostics before giving up.) - In the common case, handle an nbd_internal_fork_safe_perror() call with a single xwritel() -> writev() call chain, rather than with four separate write() calls. In practice, this tends to make the error message written to a regular file contiguous, even if other threads are writing to the same file. Multiple separate write() calls tend to interleave chunks of data from different threads. As a side bonus, remove the path in nbd_internal_fork_safe_perror() where at least one of the first two write() syscalls fails, and overwrites "errno", before we get to formatting the error string from "errno". Thanks to Eric Blake for helping me understand the scope of Austin Group bug reports. Signed-off-by: Laszlo Ersek <lersek at redhat.com> --- Notes: v4: - Rework with <stdarg.h> and writev(). - Don't split the output into chunks of SSIZE_MAX bytes. In v3, the goal of that chunking was to avoid implementation-defined behavior. However, POSIX requires writev() to fail cleanly when more than SSIZE_MAX bytes would be transfered in a single call. Hence the original goal (avoiding implementation-defined behavior) is ensured simply by switching to writev(). The SSIZE_MAX limit is not expected to be hit in practice (_POSIX_SSIZE_MAX is 32767). - As a "bonus", remove the pre-patch possibility to trample "errno" before formatting the error string. - Refresh the commit message. - The "contiguous output" from a single xwritel() -> writev() call (as opposed to the "interleaved output" from multiple xwrite() -> write() calls in v3) is easily testable in practice (on my end) with the following small patch, even though this "contiguity" is of course not guaranteed: > diff --git a/generator/states-connect-socket-activation.c b/generator/states-connect-socket-activation.c > index e4b3b565ae2e..c66c638d331f 100644 > --- a/generator/states-connect-socket-activation.c > +++ b/generator/states-connect-socket-activation.c > @@ -179,6 +179,8 @@ CONNECT_SA.START: > * socket). > */ > int flags = fcntl (s, F_GETFD, 0); > + flags = -1; > + errno = EBADF; > if (flags == -1) { > nbd_internal_fork_safe_perror ("fcntl: F_GETFD"); > _exit (126); It results in the following snippet in "tests/connect-systemd-socket-activation.log": > libnbd: debug: nbd1: nbd_connect_systemd_socket_activation: transition: CONNECT.CONNECTING -> MAGIC.START > fcntl: F_GETFD: Bad file descriptor > libnbd: debug: nbd1: nbd_connect_systemd_socket_activation: transition: MAGIC.START -> MAGIC.RECV_MAGIC Note that the child process's output is on an isolated line. - Do not pick up R-b tags from Eric and Rich due to significant changes in v4. context:-U5 lib/utils.c | 87 +++++++++++++++++++- 1 file changed, 83 insertions(+), 4 deletions(-) diff --git a/lib/utils.c b/lib/utils.c index 6df4f14ce9f4..62b4bfdda5c3 100644 --- a/lib/utils.c +++ b/lib/utils.c @@ -23,11 +23,14 @@ #include <string.h> #include <unistd.h> #include <ctype.h> #include <errno.h> #include <fcntl.h> +#include <stdarg.h> +#include <sys/uio.h> +#include "array-size.h" #include "minmax.h" #include "internal.h" void @@ -179,33 +182,109 @@ nbd_internal_fork_safe_itoa (long v, char *buf, size_t bufsize) #if defined (__GNUC__) #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wunused-result" #endif +/* "Best effort" function for writing out a list of NUL-terminated strings to a + * file descriptor (without the NUL-terminators). The list is terminated with + * (char *)NULL. Partial writes, and EINTR and EAGAIN failures are handled + * internally. No value is returned; only call this function for writing + * diagnostic data on error paths, when giving up on a higher-level action + * anyway. + * + * No more than 16 strings, excluding the NULL terminator, will be written. (As + * of POSIX Issue 7 + TC2, _XOPEN_IOV_MAX is 16.) + * + * The function is supposed to remain async-signal-safe. + * + * (The va_*() macros, while not marked async-signal-safe in Issue 7 + TC2, are + * considered such, per <https://www.austingroupbugs.net/view.php?id=711>, which + * is binding for Issue 7 implementations via the Interpretations Track. + * + * Furthermore, writev(), while also not marked async-signal-safe in Issue 7 + + * TC2, is considered such, per + * <https://www.austingroupbugs.net/view.php?id=1455>, which is slated for + * inclusion in Issue 7 TC3 (if there's going to be a TC3), and in Issue 8.) + */ +static void +xwritel (int fildes, ...) +{ + /* Open-code the current value of _XOPEN_IOV_MAX, in order to contain stack + * footprint, should _XOPEN_IOV_MAX grow in the future. + */ + struct iovec iovec[16], *filled, *end, *pos; + va_list ap; + char *arg; + + /* Translate the variable argument list to IO vectors. Note that we cast away + * const-ness intentionally. + */ + filled = iovec; + end = iovec + ARRAY_SIZE (iovec); + va_start (ap, fildes); + while (filled < end && (arg = va_arg (ap, char *)) != NULL) + *filled++ = (struct iovec){ .iov_base = arg, .iov_len = strlen (arg) }; + va_end (ap); + + /* Write out the IO vectors. */ + pos = iovec; + while (pos < filled) { + ssize_t written; + + /* Skip any empty vectors at the front. */ + if (pos->iov_len == 0) { + ++pos; + continue; + } + + /* Write out the vectors. */ + do + written = writev (fildes, pos, filled - pos); + while (written == -1 && (errno == EINTR || errno == EAGAIN)); + + if (written == -1) + return; + + /* Consume the vectors that have been written out (fully, or in part). Note + * that "written" is positive here. + */ + do { + size_t advance; + + advance = MIN (written, pos->iov_len); + /* Note that "advance" is positive here iff "pos->iov_len" is positive. */ + pos->iov_base = (char *)pos->iov_base + advance; + pos->iov_len -= advance; + written -= advance; + + /* At least one of "written" and "pos->iov_len" is zero here. */ + if (pos->iov_len == 0) + ++pos; + } while (written > 0); + } +} + /* Fork-safe version of perror. ONLY use this after fork and before * exec, the rest of the time use set_error(). */ void nbd_internal_fork_safe_perror (const char *s) { const int err = errno; const char *m = NULL; char buf[32]; - write (2, s, strlen (s)); - write (2, ": ", 2); #ifdef HAVE_STRERRORDESC_NP m = strerrordesc_np (errno); #else #if HAVE_SYS_ERRLIST /* NB Don't use #ifdef */ m = errno >= 0 && errno < sys_nerr ? sys_errlist[errno] : NULL; #endif #endif if (!m) m = nbd_internal_fork_safe_itoa ((long) errno, buf, sizeof buf); - write (2, m, strlen (m)); - write (2, "\n", 1); + xwritel (STDERR_FILENO, s, ": ", m, "\n", (char *)NULL); /* Restore original errno in case it was disturbed by the system * calls above. */ errno = err;
Eric Blake
2023-Mar-15 14:01 UTC
[Libguestfs] [libnbd PATCH v4 1/3] lib/utils: introduce xwritel() as a more robust and convenient write()
On Wed, Mar 15, 2023 at 12:01:55PM +0100, Laszlo Ersek wrote:> While nbd_internal_fork_safe_perror() must indeed call write(), and > arguably justifiedly ignores the return value of write(), we can still > make the write operations slightly more robust and convenient. Let's do > that by introducing xwritel(): > > - Let the caller pass a list of NUL-terminated strings, via stdarg / > ellipsis notation in the parameter list. > > - Handle partial writes. > > - Cope with EINTR and EAGAIN errors. (A busy loop around EAGAIN on a > non-blocking file is not great in the general case, but it's good enough > for writing out diagnostics before giving up.) > > - In the common case, handle an nbd_internal_fork_safe_perror() call with > a single xwritel() -> writev() call chain, rather than with four > separate write() calls. In practice, this tends to make the error > message written to a regular file contiguous, even if other threads are > writing to the same file. Multiple separate write() calls tend to > interleave chunks of data from different threads. > > As a side bonus, remove the path in nbd_internal_fork_safe_perror() where > at least one of the first two write() syscalls fails, and overwrites > "errno", before we get to formatting the error string from "errno".All nice benefits, even if we don't normally exercise the code.> > Thanks to Eric Blake for helping me understand the scope of Austin Group > bug reports. > > Signed-off-by: Laszlo Ersek <lersek at redhat.com> > --- > > Notes: > v4: > > - Rework with <stdarg.h> and writev(). > > - Don't split the output into chunks of SSIZE_MAX bytes. In v3, the goal > of that chunking was to avoid implementation-defined behavior. > However, POSIX requires writev() to fail cleanly when more than > SSIZE_MAX bytes would be transfered in a single call. Hence the > original goal (avoiding implementation-defined behavior) is ensured > simply by switching to writev(). The SSIZE_MAX limit is not expected > to be hit in practice (_POSIX_SSIZE_MAX is 32767).Concur.> > - As a "bonus", remove the pre-patch possibility to trample "errno" > before formatting the error string.Nice find; and one I missed in my earlier review.> > - Refresh the commit message. > > - The "contiguous output" from a single xwritel() -> writev() call (as > opposed to the "interleaved output" from multiple xwrite() -> write() > calls in v3) is easily testable in practice (on my end) with the > following small patch, even though this "contiguity" is of course not > guaranteed: > > > diff --git a/generator/states-connect-socket-activation.c b/generator/states-connect-socket-activation.c > > index e4b3b565ae2e..c66c638d331f 100644 > > --- a/generator/states-connect-socket-activation.c > > +++ b/generator/states-connect-socket-activation.c > > @@ -179,6 +179,8 @@ CONNECT_SA.START: > > * socket). > > */ > > int flags = fcntl (s, F_GETFD, 0); > > + flags = -1; > > + errno = EBADF; > > if (flags == -1) { > > nbd_internal_fork_safe_perror ("fcntl: F_GETFD"); > > _exit (126); > > It results in the following snippet in > "tests/connect-systemd-socket-activation.log": > > > libnbd: debug: nbd1: nbd_connect_systemd_socket_activation: transition: CONNECT.CONNECTING -> MAGIC.START > > fcntl: F_GETFD: Bad file descriptor > > libnbd: debug: nbd1: nbd_connect_systemd_socket_activation: transition: MAGIC.START -> MAGIC.RECV_MAGIC > > Note that the child process's output is on an isolated line.Without the patch, it had a high likelihood (but not guarantee) of interleaving. And writev() is not a bulletproof avoidance of interleaving (if we hit a short write and retry the tail, interleaving is possible) - but we are very unlikely to see that in practice.> > - Do not pick up R-b tags from Eric and Rich due to significant changes > in v4. > > context:-U5 > > lib/utils.c | 87 +++++++++++++++++++- > 1 file changed, 83 insertions(+), 4 deletions(-) > > diff --git a/lib/utils.c b/lib/utils.c > index 6df4f14ce9f4..62b4bfdda5c3 100644 > --- a/lib/utils.c > +++ b/lib/utils.c > @@ -23,11 +23,14 @@ > #include <string.h> > #include <unistd.h> > #include <ctype.h> > #include <errno.h> > #include <fcntl.h> > +#include <stdarg.h> > +#include <sys/uio.h> > > +#include "array-size.h" > #include "minmax.h" > > #include "internal.h" > > void > @@ -179,33 +182,109 @@ nbd_internal_fork_safe_itoa (long v, char *buf, size_t bufsize) > #if defined (__GNUC__) > #pragma GCC diagnostic push > #pragma GCC diagnostic ignored "-Wunused-result" > #endif > > +/* "Best effort" function for writing out a list of NUL-terminated strings to a > + * file descriptor (without the NUL-terminators). The list is terminated with > + * (char *)NULL. Partial writes, and EINTR and EAGAIN failures are handled > + * internally. No value is returned; only call this function for writing > + * diagnostic data on error paths, when giving up on a higher-level action > + * anyway. > + * > + * No more than 16 strings, excluding the NULL terminator, will be written. (As > + * of POSIX Issue 7 + TC2, _XOPEN_IOV_MAX is 16.) > + * > + * The function is supposed to remain async-signal-safe. > + * > + * (The va_*() macros, while not marked async-signal-safe in Issue 7 + TC2, are > + * considered such, per <https://www.austingroupbugs.net/view.php?id=711>, which > + * is binding for Issue 7 implementations via the Interpretations Track. > + * > + * Furthermore, writev(), while also not marked async-signal-safe in Issue 7 + > + * TC2, is considered such, per > + * <https://www.austingroupbugs.net/view.php?id=1455>, which is slated for > + * inclusion in Issue 7 TC3 (if there's going to be a TC3), and in Issue 8.)Correct summary, and matches our off-list collaboration on determining what the Austin Group guarantees (while still waiting for their release of POSIX Issue 8 later this year).> + */ > +static void > +xwritel (int fildes, ...) > +{Good candidate for __attribute__ ((sentinel)), so that -Wformat marks callers that fail to supply a trailing NULL argument.> + /* Open-code the current value of _XOPEN_IOV_MAX, in order to contain stack > + * footprint, should _XOPEN_IOV_MAX grow in the future. > + */ > + struct iovec iovec[16], *filled, *end, *pos; > + va_list ap; > + char *arg; > + > + /* Translate the variable argument list to IO vectors. Note that we cast away > + * const-ness intentionally. > + */ > + filled = iovec; > + end = iovec + ARRAY_SIZE (iovec); > + va_start (ap, fildes); > + while (filled < end && (arg = va_arg (ap, char *)) != NULL) > + *filled++ = (struct iovec){ .iov_base = arg, .iov_len = strlen (arg) }; > + va_end (ap); > + > + /* Write out the IO vectors. */ > + pos = iovec; > + while (pos < filled) { > + ssize_t written; > + > + /* Skip any empty vectors at the front. */ > + if (pos->iov_len == 0) { > + ++pos; > + continue; > + }In practice, writev() will handle 0-length iovs on all file types; but I agree with your effort to skip them since POSIX only guarantees behavior on regular files (and our typical fd of stderr is not always a regular file).> + > + /* Write out the vectors. */ > + do > + written = writev (fildes, pos, filled - pos); > + while (written == -1 && (errno == EINTR || errno == EAGAIN)); > + > + if (written == -1) > + return; > + > + /* Consume the vectors that have been written out (fully, or in part). Note > + * that "written" is positive here.The POSIX wording is a bit tricky to read in this regards, but I think you are correct that write() (and therefore writev()) will never return 0 if passed a non-zero length on input: either a short write happens because of a signal before anything is written (return is -1, errno is EINTR or EAGAIN), or the short write occurred after partial write (return must be positive); the only time return can be 0 is if length was 0 but we don't have that issue.> + */ > + do { > + size_t advance; > + > + advance = MIN (written, pos->iov_len); > + /* Note that "advance" is positive here iff "pos->iov_len" is positive. */ > + pos->iov_base = (char *)pos->iov_base + advance; > + pos->iov_len -= advance; > + written -= advance; > + > + /* At least one of "written" and "pos->iov_len" is zero here. */ > + if (pos->iov_len == 0) > + ++pos; > + } while (written > 0); > + } > +}Nice! Took me more than one pass to fully understand it, but I agree that your documented loop invariants hold, and that it does indeed do a best effort vectored write.> + > /* Fork-safe version of perror. ONLY use this after fork and before > * exec, the rest of the time use set_error(). > */ > void > nbd_internal_fork_safe_perror (const char *s) > { > const int err = errno; > const char *m = NULL; > char buf[32]; > > - write (2, s, strlen (s)); > - write (2, ": ", 2); > #ifdef HAVE_STRERRORDESC_NP > m = strerrordesc_np (errno); > #else > #if HAVE_SYS_ERRLIST /* NB Don't use #ifdef */ > m = errno >= 0 && errno < sys_nerr ? sys_errlist[errno] : NULL; > #endif > #endif > if (!m) > m = nbd_internal_fork_safe_itoa ((long) errno, buf, sizeof buf);The bonus bug you fixed here could be independently fixed by s/errno/err/ without the rest of your patch, but now that nothing else touches errno prior to assigning to m, I don't see the need to make that change now.> - write (2, m, strlen (m)); > - write (2, "\n", 1); > + xwritel (STDERR_FILENO, s, ": ", m, "\n", (char *)NULL);I also like the change of s/2/STDERR_FILENO/ that you snuck in here. The only change I recommend is the addition of the __attribute__; but with or without it, I'm happy with: Reviewed-by: Eric Blake <eblake at redhat.com> -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org