thr3ads.net - Libguestfs - [Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe() [Feb 2023]

If this information is useful, please help other people find it:
Share via:

Eric Blake

2023-Feb-15 22:28 UTC

[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()

On Wed, Feb 15, 2023 at 03:11:38PM +0100, Laszlo Ersek
wrote:> execvp() [01] is powerful:
You KNOW this is going to be a juicy commit message when the first
line contains a 2-digit footnote ;)
> 
> - it performs PATH search [02] if necessary,
> 
> - it falls back to executing the file with the shell as a shell script in
>   case the underlying execv() or execve() fails with ENOEXEC.
> 
> However, execvp() is not async-signal-safe [03], and so we shouldn't
call
> it in a child process forked from a multi-threaded parent process [04],
> which the libnbd client application may well be.
> 
> Introduce three APIs for safely emulating execvp() with execve():
> 
> - nbd_internal_execvpe_init(): to be called in the parent process, before
>   fork(). Allocate and format candidate pathnames for execution with
>   execve(), based on PATH. Allocate a buffer for the longer command line
>   that the shell script fallback needs.
> 
> - nbd_internal_fork_safe_execvpe(): to be called in the child process. Try
>   the candidates formatted previously in sequence with execve(), as long
>   as execve() fails with errors related to pathname resolution /
>   inode-level file type / execution permission. Stop iterating on fatal
>   errors; if the fatal error is ENOEXEC, activate the shell script
>   fallback. As a small added feature, take the environment from an
>   explicit parameter (effectively imitating the GNU-specific execvpe()
>   function -- cf. commit dc64ac5cdd0b, "states: Use execvp instead of
>   execvpe", 2019-11-15).
> 
> - nbd_internal_execvpe_uninit(): to be called in the parent process, after
>   fork(). Release the resources allocated with
>   nbd_internal_execvpe_init().
Makes sense.  A shame that it requires 3 functions, and that we have
to duplicate so much work that libc would normally do for us if we
don't care about async-safety, but I think the result is worth it.
> 
> The main idea with the above distribution of responsibilities is that any
> pre-scanning of PATH -- which was suggested by Eric Blake -- should not
> include activities that are performed by execve() anyway. I've
considered
> and abandoned two approaches:
> 
> - During scanning (i.e., in nbd_internal_execvpe_init()), check each
>   candidate with access(X_OK) [05].
> 
>   I dropped this because the informative APPLICATION USAGE section of the
>   same specification [06] advises against it, saying "[a]n application
>   should instead attempt the action itself and handle the [EACCES] error
>   that occurs if the file is not accessible".
> 
> - During scanning, open each candidate with O_EXEC [07] until one open()
>   succeeds. Save the sole file descriptor for
>   nbd_internal_fork_safe_execvpe(). In the latter, call fexecve() [08],
>   which is free of all error conditions that necessitate iteration over
>   candidates. Still implement the ENOEXEC (shell script) fallback.
> 
>   I dropped this because (a) fexecve() is a quite recent interface
>   (highlighted by Eric); (b) for an application to open a file for
>   execution with fexecve(), it's more robust to pass O_EXEC to open()
than
>   not to pass it, per APPLICATION USAGE [09], but on Linux/glibc, O_EXEC
>   does not seem supported, only O_PATH does [10].
> 
> Thus the chosen approach -- pre-generate filenames -- contains a  small
Accidental double space.
> TOCTTOU race (highlighted by Eric) after all, but it should be harmless.
Yeah - if you are actively changing out binaries on PATH at the same
time as anything else running in earnest on a production system,
you've got more issues than just libnbd's potential race.
> 
> Implementation-defined details:
> 
> - If PATH searching is necessary, but PATH is absent [01], then fall back
>   to confstr(_CS_PATH) [11], similarly to Linux/glibc execvpe() [12].
> 
>   If PATH is set but empty ("set to null") [02], or PATH is unset
and
>   confstr(_CS_PATH) fails or returns no information or returns the empty
>   string, we fail the initial scanning (!) with ENOENT. This is consistent
>   with bash's behavior on Linux/glibc:
> 
>   $ PATH= ls
>   bash: ls: No such file or directory
_Technically_, PATH= should behave identically to PATH=.; if you have
an executable named './ls' in your working directory, bash WILL
execute it!

$ mkdir -p /tmp/blah
$ cd /tmp/blah
$ cp /bin/ls .
$ PATH= ls -a
.  ..  ls

But whether you want to tweak this patch to actually comply with that
corner case is up to you (no one in their right mind should set PATH
to empty).  Below, I'll refer to this as [1].
> 
> Details chosen for unspecified behavior:
> 
> - Hardwire "/bin/sh" as the shell's pathname [01].
> 
> [01] https://pubs.opengroup.org/onlinepubs/9699919799/functions/execvp.html
> [02]
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03
> [03]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03
> [04] https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
> [05] https://pubs.opengroup.org/onlinepubs/9699919799/functions/access.html
> [06]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/access.html#tag_16_09_07
> [07] https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
> [08]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fexecve.html
> [09]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fexecve.html#tag_16_111_07
> [10] https://man7.org/linux/man-pages/man2/open.2.html
> [11]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/confstr.html
> [12] https://man7.org/linux/man-pages/man3/execvpe.3.html
Very useful bibliography.
> 
> Suggested-by: Eric Blake <eblake at redhat.com>
> Signed-off-by: Laszlo Ersek <lersek at redhat.com>
> ---
>  lib/internal.h |  22 ++
>  lib/utils.c    | 351 ++++++++++++++++++++
>  2 files changed, 373 insertions(+)
> 
> diff --git a/lib/internal.h b/lib/internal.h
> index 9c9021ed366e..8c4e32013e40 100644
> --- a/lib/internal.h
> +++ b/lib/internal.h
> @@ -368,6 +368,18 @@ struct command {
>    uint32_t error; /* Local errno value */
>  };
>  
> +struct execvpe {
> +  string_vector pathnames;
> +
> +  /* Note: "const_string_vector" is not a good type for
"sh_argv" below. Even if
> +   * we reserved enough space in a "const_string_vector",
> +   * const_string_vector_append() would still not be async-signal-safe,
due to
> +   * the underlying const_string_vector_insert() calling assert().
> +   */
> +  char **sh_argv;
> +  size_t num_sh_args;
> +};
> +
>  /* Test if a callback is "null" or not, and set it to null. */
>  #define CALLBACK_IS_NULL(cb)     ((cb).callback == NULL &&
(cb).free == NULL)
>  #define CALLBACK_IS_NOT_NULL(cb) (! CALLBACK_IS_NULL ((cb)))
> @@ -540,4 +552,14 @@ extern void nbd_internal_fork_safe_assert (int result,
const char *file,
>                                    __func__, #expression))
>  #endif
>  
> +extern int nbd_internal_execvpe_init (struct execvpe *ctx, const char
*file,
> +                                      size_t num_args)
> +  LIBNBD_ATTRIBUTE_NONNULL (1, 2);
> +extern void nbd_internal_execvpe_uninit (struct execvpe *ctx)
> +  LIBNBD_ATTRIBUTE_NONNULL (1);
> +extern int nbd_internal_fork_safe_execvpe (struct execvpe *ctx,
> +                                           const string_vector *argv,
> +                                           char * const *envp)
> +  LIBNBD_ATTRIBUTE_NONNULL (1, 2, 3);
The interface looks good for now (if you can't tell, I'm doing a
stream-of-consciousness review, and may have further comments about
the interface once I see the implementation and/or usage).
> +
>  #endif /* LIBNBD_INTERNAL_H */
> diff --git a/lib/utils.c b/lib/utils.c
> index 5dd00f82e073..9a96ed6ed674 100644
> --- a/lib/utils.c
> +++ b/lib/utils.c
> @@ -28,6 +28,7 @@
>  #include <limits.h>
>  
>  #include "minmax.h"
> +#include "checked-overflow.h"
>  
>  #include "internal.h"
>  
> @@ -463,3 +464,353 @@ nbd_internal_fork_safe_assert (int result, const char
*file, long line,
>    xwrite (STDERR_FILENO, "' failed.\n", 10);
>    abort ();
>  }
> +
> +/* Returns the value of the PATH environment variable -- falling back to
> + * confstr(_CS_PATH) if PATH is absent -- as a dynamically allocated
string. On
> + * failure, sets "errno" and returns NULL.
> + */
> +static char *
> +get_path (void)
> +  LIBNBD_ATTRIBUTE_ALLOC_DEALLOC (free)
> +{
> +  char *path;
> +  bool env_path_found;
> +  size_t path_size, path_size2;
> +
> +  /* FIXME: lock the environment. */
> +  path = getenv ("PATH");
> +  if ((env_path_found = (path != NULL)))
> +    path = strdup (path);
> +  /* FIXME: unlock the environment. */
Are the FIXME comments still necessary?  I know we discussed about
getenv() being technically non-async-safe in POSIX, but that in all
practical aspects, we consider that a multi-threaded app should be
smart enough to never modify the environment in parallel with calling
into a library like libnbd.  At best, you could reword the comment to
get rid of FIXME and instead just state our reasoning for why we don't
bother with locking the environment: glibc and any other high-quality
libc will not be modifying environ during getenv().
> +
> +  if (env_path_found) {
> +    /* This handles out-of-memory as well. */
> +    return path;
> +  }
Clever, and definitely worth that comment.
> +
> +  errno = 0;
> +  path_size = confstr (_CS_PATH, NULL, 0);
> +  if (path_size == 0) {
> +    /* If _CS_PATH does not have a configuration-defined value, just store
> +     * ENOENT to "errno".
> +     */
> +    if (errno == 0)
> +      errno = ENOENT;
> +
> +    return NULL;
> +  }
> +
> +  path = malloc (path_size);
> +  if (path == NULL)
> +    return NULL;
> +
> +  path_size2 = confstr (_CS_PATH, path, path_size);
> +  assert (path_size2 == path_size);
> +  return path;
> +}
Nice.
> +
> +/* nbd_internal_execvpe_init() and nbd_internal_fork_safe_execvpe()
together
> + * present an execvp() alternative that is async-signal-safe.
> + *
> + * nbd_internal_execvpe_init() may only be called before fork(), for
filling in
> + * the caller-allocated, uninitialized "ctx" structure. If
> + * nbd_internal_execvpe_init() succeeds, then fork() may be called.
> + * Subsequently, in the child process, nbd_internal_fork_safe_execvpe()
may be
> + * called with the inherited "ctx" structure, while in the
parent process,
> + * nbd_internal_execvpe_uninit() must be called to uninitialize (evacuate)
the
> + * "ctx" structure.
> + *
> + * On failure, "ctx" will not have been modified,
"errno" is set, and -1 is
> + * returned. Failures include:
> + *
> + * - Errors forwarded from underlying functions such as strdup(),
confstr(),
> + *   malloc(), string_vector_append().
> + *
> + * - ENOENT: "file" is an empty string.
> + *
> + * - ENOENT: "file" does not contain a <slash>
"/" character, the PATH
> + *           environment variable is not set, and confstr() doesn't
associate a
> + *           configuration-defined value with _CS_PATH.
> + *
> + * - ENOENT: "file" does not contain a <slash>
"/" character, and: (a) the PATH
> + *           environment variable is set to the empty string, or (b) PATH
is not
> + *           set, and confstr() outputs the empty string for _CS_PATH.
[1] This might need tweaking if you treat empty PATH or _CS_PATH as
equivalent to "." instead of an unconditional ENOENT.
> + *
> + * - EOVERFLOW: the sizes or counts of necessary objects could not be
expressed.
> + *
> + * - EINVAL: "num_args" is less than 2.
> + *
> + * On success, the "ctx" structure will have been filled in, and
0 is returned.
> + *
> + * - "pathnames" member:
> + *
> + *   - All strings pointed-to by elements of the "pathnames"
string_vector
> + *     member are owned by "pathnames".
> + *
> + *   - If "file" contains a <slash> "/"
character, then the sole entry in
> + *     "pathnames" is a copy of "file".
> + *
> + *   - If "file" does not contain a <slash> "/"
character:
> + *
> + *     Let "system path" be defined as the value of the PATH
environment
> + *     variable, if the latter exists, and as the value output by
confstr() for
> + *     _CS_PATH otherwise. Per the ENOENT specifications above,
"system path" is
> + *     a non-empty string. Let "system path" further be of the
form
> + *
> + *       <prefix_0> [n = 1]
> + *
> + *     or
> + *
> + *       <prefix_0>:<prefix_1>:...:<prefix_(n-1)> [n
>= 2]
> + *
> + *     where for each 0 <= i < n, <prefix_i> does not contain
the <colon> ":"
> + *     character. In the (n = 1) case, <prefix_0> is never empty
(see ENOENT
> + *     above), while in the (n >= 2) case, any individual
<prefix_i> may or may
> + *     not be empty.
[1] And this simplifies if empty PATH can imply searching in ".".
> + *
> + *     The "pathnames" string_vector member has n elements; for
each 0 <= i < n,
> + *     element i is of the form
> + *
> + *       suffix(curdir(<prefix_i>))file
> + *
> + *     where
> + *
> + *       curdir(x) := "."  if x = ""
> + *       curdir(x) := x    otherwise
> + *
> + *     and
> + *
> + *       suffix(x) := x      if "x" ends with a <slash>
"/"
> + *       suffix(x) := x "/"  otherwise
> + *
> + *     This transformation implements the POSIX XBD / PATH environment
variable
> + *     semantics, creating candidate pathnames for execution by
> + *     nbd_internal_fork_safe_execvpe(). nbd_internal_fork_safe_execvpe()
will
> + *     iterate over the candidate pathnames with execve() until execve()
> + *     succeeds, or fails with an error that is due to neither pathname
> + *     resolution, nor the candidate not being a regular file, nor the
candidate
> + *     lacking execution permission.
> + *
> + * - The "sh_argv" array member will have at least (num_args +
1) elements
> + *   allocated, and none populated.
> + *
> + *   (The minimum value of "num_args" is 2 -- see EINVAL above.
According to
> + *   POSIX, "[t]he argument /arg0/ should point to a filename string
that is
> + *   associated with the process being started by one of the /exec/
functions",
> + *   plus "num_args" includes the null pointer that terminates
the argument
> + *   list.)
> + *
> + *   This allocation is made in anticipation of execve() failing for a
> + *   particular candidate inside nbd_internal_fork_safe_execvpe() with
ENOEXEC
> + *   ("[t]he new process image file has the appropriate access
permission but
> + *   has an unrecognized format"). While that failure terminates the
iteration,
> + *   the failed call
> + *
> + *     execve (pathnames[i],
> + *             { argv[0], argv[1], ..., NULL }, // (num_args >= 2)
elements
> + *             { envp[0], envp[1], ..., NULL })
> + *
> + *   must be repeated as
> + *
> + *     execve (<shell-path>,
> + *             { argv[0], pathnames[i],         // ((num_args + 1) >=
3)
Are we guaranteed that if we get to this fallback, pathnames[i] does
not begin with '-' or '+'?  Or is it possible to write
PATH=/usr/bin:+my/evil/relative/path:... and/or the user to attempt to
execute '+my/evil/name' which skips the PATH search because the binary
name contains '/', such that the resulting argument to /bin/sh might
need to be protected with an extra "--" argument to avoid being
misinterpreted as options to the shell rather than the script name to
be executed?

Is the current working directory something we change between fork()
and exec()?  If so, should we canonicalize the entries in pathnames[]
to always start with '/', so that the child process can reposition
directories at will without changing the meaning of the pre-scanned
candidates?  (Note that canonicalizing names would automatically
resolve my above question about safety in the presence of an unusual
PATH or binary name beginning with [-+]). [2]
> +                 argv[1], ..., NULL },          // elements
> + *             { envp[0], envp[1], ..., NULL })
> + *
> + *   for emulating execvp(). The allocation in the "sh_argv"
member makes it
> + *   possible just to *link* the original "argv" elements and
the "pathnames[i]"
> + *   candidate into the right positions.
> + *
> + *   (POSIX leaves the shell pathname unspecified; "/bin/sh"
should be good
> + *   enough.)
> + *
> + *   The shell *binary* will see itself being executed under the name
"argv[0]",
> + *   will receive "pathnames[i]" as the pathname of the shell
*script* to read
> + *   and interpret ("command_file" in POSIX terminology), will
expose
> + *   "pathnames[i]" as the positional parameter $0 to the
script, and will
> + *   forward "argv[1]" and the rest to the script as positional
parameters $1
> + *   and onward.
> + */
> +int
> +nbd_internal_execvpe_init (struct execvpe *ctx, const char *file,
> +                           size_t num_args)
Should the interface take the proposed argv[] here pre-fork(), or are
we trying to keep this interface generic enough where the child may
alter some of the argv post-fork()?  On the other hand, I already know
we are altering the environment post-fork() (setting LISTEN_PIDcannot be done
pre-fork, for starters), so keeping the selection of
argv solely in the child rather than making the parent generate it
pre-fork seems like the right division of labor.
> +{
> +  int rc;
> +  char *sys_path;
> +  string_vector pathnames;
> +  char *pathname;
> +  size_t num_sh_args;
> +  char **sh_argv;
> +  size_t sh_argv_bytes;
> +
> +  rc = -1;
> +
> +  if (file[0] == '\0') {
> +    errno = ENOENT;
> +    return rc;
> +  }
> +
> +  /* First phase. */
> +  sys_path = NULL;
> +  pathnames = (string_vector)empty_vector;
Another point where we may want consistency on whether we do space
after cast.
> +
> +  if (strchr (file, '/') == NULL) {
> +    size_t file_len;
> +    const char *sys_path_element, *scan;
> +    bool finish;
> +
> +    sys_path = get_path ();
> +    if (sys_path == NULL)
> +      return rc;
> +
> +    if (sys_path[0] == '\0') {
> +      errno = ENOENT;
> +      goto free_sys_path;
> +    }
[1] Here is where you would code up an empty PATH as being equivalent
to ".", if we think it's worth having, probably by just dropping
this
condition and letting the do-loop below handle it.
> +
> +    pathname = NULL;
> +    file_len = strlen (file);
> +    sys_path_element = sys_path;
> +    scan = sys_path;
> +    do {
> +      assert (sys_path_element <= scan);
> +      finish = (*scan == '\0');
> +      if (finish || *scan == ':') {
> +        const char *sys_path_copy_start;
> +        size_t sys_path_copy_size;
> +        size_t sep_copy_size;
> +        size_t pathname_size;
> +        char *p;
> +
> +        if (scan == sys_path_element) {
> +          sys_path_copy_start = ".";
> +          sys_path_copy_size = 1;
> +        } else {
> +          sys_path_copy_start = sys_path_element;
> +          sys_path_copy_size = scan - sys_path_element;
> +        }
> +
> +        assert (sys_path_copy_size >= 1);
> +        sep_copy_size = (sys_path_copy_start[sys_path_copy_size - 1] !=
'/');
> +
> +        if (ADD_OVERFLOW (sys_path_copy_size, sep_copy_size,
&pathname_size) ||
> +            ADD_OVERFLOW (pathname_size, file_len, &pathname_size) ||
> +            ADD_OVERFLOW (pathname_size, 1u, &pathname_size)) {
> +          errno = EOVERFLOW;
> +          goto empty_pathnames;
> +        }
Since we are in the parent process during this _init() function, can't
we just use asprintf() instead of concatenating it all out by hand?
> +
> +        pathname = malloc (pathname_size);
> +        if (pathname == NULL)
> +          goto empty_pathnames;
> +        p = pathname;
> +
> +        memcpy (p, sys_path_copy_start, sys_path_copy_size);
> +        p += sys_path_copy_size;
[2] Here is where we would want to consider whether to
(semi-)canonicalize any relative names into a corresponding absolute
prefix, to avoid issues with a relative PATH element starting with
[-+].
> +
> +        memcpy (p, "/", sep_copy_size);
> +        p += sep_copy_size;
Clever way to avoid turning PATH=/:/usr/bin:... into an attempt to
execute "//binary" which is NOT the same as "/binary".  But
also
possible if you use
  asprintf(..., "...%.*s...", ..., (int) sep_copy_size, "/",
...)

or declare sep_copy_size as int to begin with (because we know it will
only be 0 or 1).
> +
> +        memcpy (p, file, file_len);
> +        p += file_len;
> +
> +        *p++ = '\0';
> +
> +        if (string_vector_append (&pathnames, pathname) == -1)
> +          goto empty_pathnames;
> +        /* Ownership transferred. */
> +        pathname = NULL;
> +
> +        sys_path_element = scan + 1;
> +      }
> +
> +      ++scan;
> +    } while (!finish);
> +  } else {
> +    pathname = strdup (file);
[2] and here we'd have to canonicalize 'file' itself if we are
worrying about the child changing directories and/or starting with a
character that could change the behavior of /bin/sh if mistaken as an
option instead of a script name.
> +    if (pathname == NULL)
> +      return rc;
> +
> +    if (string_vector_append (&pathnames, pathname) == -1)
> +      goto empty_pathnames;
> +    /* Ownership transferred. */
> +    pathname = NULL;
> +  }
> +
> +  /* Second phase. */
> +  if (num_args < 2) {
> +    errno = EINVAL;
> +    goto empty_pathnames;
> +  }
> +  if (ADD_OVERFLOW (num_args, 1u, &num_sh_args) ||
> +      MUL_OVERFLOW (num_sh_args, sizeof *sh_argv, &sh_argv_bytes)) {
> +    errno = EOVERFLOW;
> +    goto empty_pathnames;
> +  }
> +  sh_argv = malloc (sh_argv_bytes);
> +  if (sh_argv == NULL)
> +    goto empty_pathnames;
Should we prepopulate sh_argv[0] as a placeholder that can then be
easily swapped to ctx->pathnames[i], where the child first attempts
execve(pathnames[i], argv+1, env) with the ENOEXEC fallback to
argv[0]=pathnames[i]; execve("/bin/sh", argv, env), rather than having
to shuffle things in the child process? [3]
> +
> +  /* Commit. */
> +  ctx->pathnames = pathnames;
> +  ctx->sh_argv = sh_argv;
> +  ctx->num_sh_args = num_sh_args;
> +  rc = 0;
> +  /* Fall through, for freeing temporaries. */
> +
> +empty_pathnames:
> +  if (rc == -1) {
> +    free (pathname);
> +    string_vector_empty (&pathnames);
> +  }
> +
> +free_sys_path:
> +  free (sys_path);
> +
> +  return rc;
> +}
> +
> +void
> +nbd_internal_execvpe_uninit (struct execvpe *ctx)
> +{
> +  free (ctx->sh_argv);
> +  ctx->num_sh_args = 0;
> +  string_vector_empty (&ctx->pathnames);
> +}
> +
> +int
> +nbd_internal_fork_safe_execvpe (struct execvpe *ctx, const string_vector
*argv,
> +                                char * const *envp)
> +{
> +  size_t pathname_idx;
> +
> +  NBD_INTERNAL_FORK_SAFE_ASSERT (ctx->pathnames.len > 0);
> +
> +  pathname_idx = 0;
> +  do {
> +    (void)execve (ctx->pathnames.ptr[pathname_idx], argv->ptr,
envp);
> +    if (errno != EACCES && errno != ELOOP && errno !=
ENAMETOOLONG &&
> +        errno != ENOENT && errno != ENOTDIR)
> +      break;
> +
> +    ++pathname_idx;
> +  } while (pathname_idx < ctx->pathnames.len);
> +
> +  if (errno == ENOEXEC) {
> +    char **sh_argp;
> +    size_t argv_idx;
> +
> +    NBD_INTERNAL_FORK_SAFE_ASSERT (ctx->num_sh_args >=
argv->len);
> +    NBD_INTERNAL_FORK_SAFE_ASSERT (ctx->num_sh_args - argv->len ==
1);
> +
> +    sh_argp = ctx->sh_argv;
> +    *sh_argp++ = argv->ptr[0];
> +    *sh_argp++ = ctx->pathnames.ptr[pathname_idx];
> +    for (argv_idx = 1; argv_idx < argv->len; ++argv_idx)
> +      *sh_argp++ = argv->ptr[argv_idx];
> +
> +    (void)execve ("/bin/sh", ctx->sh_argv, envp);
[3] Oh, I see - the shuffle is still probably needed, because
ctx->sh_argv is different than user's argv which does not have a
pre-allocated filler slot that we can just swap out.  Unless we decide
that the _init() function should take argv itself rather than making
the child do the work of shifting everything from one array to
another, but then we are inconsistent for grabbing argv in the parent
and env in the child.
> +  }
> +
> +  return -1;
> +}
>
Overall looks like nice work.

Based on my review, you may decide to make further tweaks, but I
didn't see any hard reason why we couldn't use the patch as-written if
you don't think my review warrants changes.  Thus:

Reviewed-by: Eric Blake <eblake at redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Laszlo Ersek

2023-Feb-21 11:55 UTC

head link

[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()

On 2/15/23 23:28, Eric Blake wrote:> On Wed, Feb 15, 2023 at 03:11:38PM +0100, Laszlo Ersek wrote:
>> execvp() [01] is powerful:
> 
> You KNOW this is going to be a juicy commit message when the first
> line contains a 2-digit footnote ;)
Haha, yes :) Clearly I originally started the commit message with
single-digit footnotes, but I ended up needing another digit :)
> 
>>
>> - it performs PATH search [02] if necessary,
>>
>> - it falls back to executing the file with the shell as a shell script
in
>>   case the underlying execv() or execve() fails with ENOEXEC.
>>
>> However, execvp() is not async-signal-safe [03], and so we
shouldn't call
>> it in a child process forked from a multi-threaded parent process [04],
>> which the libnbd client application may well be.
>>
>> Introduce three APIs for safely emulating execvp() with execve():
>>
>> - nbd_internal_execvpe_init(): to be called in the parent process,
before
>>   fork(). Allocate and format candidate pathnames for execution with
>>   execve(), based on PATH. Allocate a buffer for the longer command
line
>>   that the shell script fallback needs.
>>
>> - nbd_internal_fork_safe_execvpe(): to be called in the child process.
Try
>>   the candidates formatted previously in sequence with execve(), as
long
>>   as execve() fails with errors related to pathname resolution /
>>   inode-level file type / execution permission. Stop iterating on fatal
>>   errors; if the fatal error is ENOEXEC, activate the shell script
>>   fallback. As a small added feature, take the environment from an
>>   explicit parameter (effectively imitating the GNU-specific execvpe()
>>   function -- cf. commit dc64ac5cdd0b, "states: Use execvp instead
of
>>   execvpe", 2019-11-15).
>>
>> - nbd_internal_execvpe_uninit(): to be called in the parent process,
after
>>   fork(). Release the resources allocated with
>>   nbd_internal_execvpe_init().
> 
> Makes sense.  A shame that it requires 3 functions, and that we have
> to duplicate so much work that libc would normally do for us if we
> don't care about async-safety, but I think the result is worth it.
> 
>>
>> The main idea with the above distribution of responsibilities is that
any
>> pre-scanning of PATH -- which was suggested by Eric Blake -- should not
>> include activities that are performed by execve() anyway. I've
considered
>> and abandoned two approaches:
>>
>> - During scanning (i.e., in nbd_internal_execvpe_init()), check each
>>   candidate with access(X_OK) [05].
>>
>>   I dropped this because the informative APPLICATION USAGE section of
the
>>   same specification [06] advises against it, saying "[a]n
application
>>   should instead attempt the action itself and handle the [EACCES]
error
>>   that occurs if the file is not accessible".
>>
>> - During scanning, open each candidate with O_EXEC [07] until one
open()
>>   succeeds. Save the sole file descriptor for
>>   nbd_internal_fork_safe_execvpe(). In the latter, call fexecve() [08],
>>   which is free of all error conditions that necessitate iteration over
>>   candidates. Still implement the ENOEXEC (shell script) fallback.
>>
>>   I dropped this because (a) fexecve() is a quite recent interface
>>   (highlighted by Eric); (b) for an application to open a file for
>>   execution with fexecve(), it's more robust to pass O_EXEC to
open() than
>>   not to pass it, per APPLICATION USAGE [09], but on Linux/glibc,
O_EXEC
>>   does not seem supported, only O_PATH does [10].
>>
>> Thus the chosen approach -- pre-generate filenames -- contains a  small
> 
> Accidental double space.
Meh, those are the bane of my existence. :) Thanks for catching it!
> 
>> TOCTTOU race (highlighted by Eric) after all, but it should be
harmless.
> 
> Yeah - if you are actively changing out binaries on PATH at the same
> time as anything else running in earnest on a production system,
> you've got more issues than just libnbd's potential race.
> 
>>
>> Implementation-defined details:
>>
>> - If PATH searching is necessary, but PATH is absent [01], then fall
back
>>   to confstr(_CS_PATH) [11], similarly to Linux/glibc execvpe() [12].
>>
>>   If PATH is set but empty ("set to null") [02], or PATH is
unset and
>>   confstr(_CS_PATH) fails or returns no information or returns the
empty
>>   string, we fail the initial scanning (!) with ENOENT. This is
consistent
>>   with bash's behavior on Linux/glibc:
>>
>>   $ PATH= ls
>>   bash: ls: No such file or directory
> 
> _Technically_, PATH= should behave identically to PATH=.; if you have
> an executable named './ls' in your working directory, bash WILL
> execute it!
> 
> $ mkdir -p /tmp/blah
> $ cd /tmp/blah
> $ cp /bin/ls .
> $ PATH= ls -a
> .  ..  ls
> 
> But whether you want to tweak this patch to actually comply with that
> corner case is up to you (no one in their right mind should set PATH
> to empty).  Below, I'll refer to this as [1].
Thanks for the observation.

I'll modify the commit message -- remove the note that this behavior is
consistent with bash/Linux/glibc. I've re-checked [02], and it's
implementation-defined what we do when PATH is set to the empty string.
POSIX does not mandate (even as "legacy") the fallback to
"." when PATH
is empty. The code was extremely hard to review for myself (much harder
than just writing it), so I'd *really* like to avoid "tweaking"
it. I'd
need to review the whole thing from the ground up, and I'm kind of "out
of steam" for that, and this is such a corner case indeed that I don't
think maintaining the above-noted consistency with linux/glibc/bash is
worth repeating the whole effort.

(Not to mention the unit tests in the next patch, which were super
difficult to *write* as well.)
> 
>>
>> Details chosen for unspecified behavior:
>>
>> - Hardwire "/bin/sh" as the shell's pathname [01].
>>
>> [01]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/execvp.html
>> [02]
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03
>> [03]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03
>> [04]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
>> [05]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/access.html
>> [06]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/access.html#tag_16_09_07
>> [07]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
>> [08]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fexecve.html
>> [09]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fexecve.html#tag_16_111_07
>> [10] https://man7.org/linux/man-pages/man2/open.2.html
>> [11]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/confstr.html
>> [12] https://man7.org/linux/man-pages/man3/execvpe.3.html
> 
> Very useful bibliography.
> 
>>
>> Suggested-by: Eric Blake <eblake at redhat.com>
>> Signed-off-by: Laszlo Ersek <lersek at redhat.com>
>> ---
>>  lib/internal.h |  22 ++
>>  lib/utils.c    | 351 ++++++++++++++++++++
>>  2 files changed, 373 insertions(+)
>>
>> diff --git a/lib/internal.h b/lib/internal.h
>> index 9c9021ed366e..8c4e32013e40 100644
>> --- a/lib/internal.h
>> +++ b/lib/internal.h
>> @@ -368,6 +368,18 @@ struct command {
>>    uint32_t error; /* Local errno value */
>>  };
>>  
>> +struct execvpe {
>> +  string_vector pathnames;
>> +
>> +  /* Note: "const_string_vector" is not a good type for
"sh_argv" below. Even if
>> +   * we reserved enough space in a "const_string_vector",
>> +   * const_string_vector_append() would still not be
async-signal-safe, due to
>> +   * the underlying const_string_vector_insert() calling assert().
>> +   */
>> +  char **sh_argv;
>> +  size_t num_sh_args;
>> +};
>> +
>>  /* Test if a callback is "null" or not, and set it to null.
*/
>>  #define CALLBACK_IS_NULL(cb)     ((cb).callback == NULL &&
(cb).free == NULL)
>>  #define CALLBACK_IS_NOT_NULL(cb) (! CALLBACK_IS_NULL ((cb)))
>> @@ -540,4 +552,14 @@ extern void nbd_internal_fork_safe_assert (int
result, const char *file,
>>                                    __func__, #expression))
>>  #endif
>>  
>> +extern int nbd_internal_execvpe_init (struct execvpe *ctx, const char
*file,
>> +                                      size_t num_args)
>> +  LIBNBD_ATTRIBUTE_NONNULL (1, 2);
>> +extern void nbd_internal_execvpe_uninit (struct execvpe *ctx)
>> +  LIBNBD_ATTRIBUTE_NONNULL (1);
>> +extern int nbd_internal_fork_safe_execvpe (struct execvpe *ctx,
>> +                                           const string_vector *argv,
>> +                                           char * const *envp)
>> +  LIBNBD_ATTRIBUTE_NONNULL (1, 2, 3);
> 
> The interface looks good for now (if you can't tell, I'm doing a
> stream-of-consciousness review, and may have further comments about
> the interface once I see the implementation and/or usage).
> 
>> +
>>  #endif /* LIBNBD_INTERNAL_H */
>> diff --git a/lib/utils.c b/lib/utils.c
>> index 5dd00f82e073..9a96ed6ed674 100644
>> --- a/lib/utils.c
>> +++ b/lib/utils.c
>> @@ -28,6 +28,7 @@
>>  #include <limits.h>
>>  
>>  #include "minmax.h"
>> +#include "checked-overflow.h"
>>  
>>  #include "internal.h"
>>  
>> @@ -463,3 +464,353 @@ nbd_internal_fork_safe_assert (int result, const
char *file, long line,
>>    xwrite (STDERR_FILENO, "' failed.\n", 10);
>>    abort ();
>>  }
>> +
>> +/* Returns the value of the PATH environment variable -- falling back
to
>> + * confstr(_CS_PATH) if PATH is absent -- as a dynamically allocated
string. On
>> + * failure, sets "errno" and returns NULL.
>> + */
>> +static char *
>> +get_path (void)
>> +  LIBNBD_ATTRIBUTE_ALLOC_DEALLOC (free)
>> +{
>> +  char *path;
>> +  bool env_path_found;
>> +  size_t path_size, path_size2;
>> +
>> +  /* FIXME: lock the environment. */
>> +  path = getenv ("PATH");
>> +  if ((env_path_found = (path != NULL)))
>> +    path = strdup (path);
>> +  /* FIXME: unlock the environment. */
> 
> Are the FIXME comments still necessary?  I know we discussed about
> getenv() being technically non-async-safe in POSIX, but that in all
> practical aspects, we consider that a multi-threaded app should be
> smart enough to never modify the environment in parallel with calling
> into a library like libnbd.  At best, you could reword the comment to
> get rid of FIXME and instead just state our reasoning for why we don't
> bother with locking the environment: glibc and any other high-quality
> libc will not be modifying environ during getenv().
Yes, I'll replace the FIXME with a NOTE -- that strictly per POSIX, we
should lock the environment even for getenv(), but "glibc and any other
high-quality libc will not be modifying environ during getenv()", and
that no sane application should modify the environment after launching
threads.
> 
>> +
>> +  if (env_path_found) {
>> +    /* This handles out-of-memory as well. */
>> +    return path;
>> +  }
> 
> Clever, and definitely worth that comment.
> 
>> +
>> +  errno = 0;
>> +  path_size = confstr (_CS_PATH, NULL, 0);
>> +  if (path_size == 0) {
>> +    /* If _CS_PATH does not have a configuration-defined value, just
store
>> +     * ENOENT to "errno".
>> +     */
>> +    if (errno == 0)
>> +      errno = ENOENT;
>> +
>> +    return NULL;
>> +  }
>> +
>> +  path = malloc (path_size);
>> +  if (path == NULL)
>> +    return NULL;
>> +
>> +  path_size2 = confstr (_CS_PATH, path, path_size);
>> +  assert (path_size2 == path_size);
>> +  return path;
>> +}
> 
> Nice.
> 
>> +
>> +/* nbd_internal_execvpe_init() and nbd_internal_fork_safe_execvpe()
together
>> + * present an execvp() alternative that is async-signal-safe.
>> + *
>> + * nbd_internal_execvpe_init() may only be called before fork(), for
filling in
>> + * the caller-allocated, uninitialized "ctx" structure. If
>> + * nbd_internal_execvpe_init() succeeds, then fork() may be called.
>> + * Subsequently, in the child process,
nbd_internal_fork_safe_execvpe() may be
>> + * called with the inherited "ctx" structure, while in the
parent process,
>> + * nbd_internal_execvpe_uninit() must be called to uninitialize
(evacuate) the
>> + * "ctx" structure.
>> + *
>> + * On failure, "ctx" will not have been modified,
"errno" is set, and -1 is
>> + * returned. Failures include:
>> + *
>> + * - Errors forwarded from underlying functions such as strdup(),
confstr(),
>> + *   malloc(), string_vector_append().
>> + *
>> + * - ENOENT: "file" is an empty string.
>> + *
>> + * - ENOENT: "file" does not contain a <slash>
"/" character, the PATH
>> + *           environment variable is not set, and confstr()
doesn't associate a
>> + *           configuration-defined value with _CS_PATH.
>> + *
>> + * - ENOENT: "file" does not contain a <slash>
"/" character, and: (a) the PATH
>> + *           environment variable is set to the empty string, or (b)
PATH is not
>> + *           set, and confstr() outputs the empty string for _CS_PATH.
> 
> [1] This might need tweaking if you treat empty PATH or _CS_PATH as
> equivalent to "." instead of an unconditional ENOENT.
> 
>> + *
>> + * - EOVERFLOW: the sizes or counts of necessary objects could not be
expressed.
>> + *
>> + * - EINVAL: "num_args" is less than 2.
>> + *
>> + * On success, the "ctx" structure will have been filled in,
and 0 is returned.
>> + *
>> + * - "pathnames" member:
>> + *
>> + *   - All strings pointed-to by elements of the "pathnames"
string_vector
>> + *     member are owned by "pathnames".
>> + *
>> + *   - If "file" contains a <slash> "/"
character, then the sole entry in
>> + *     "pathnames" is a copy of "file".
>> + *
>> + *   - If "file" does not contain a <slash>
"/" character:
>> + *
>> + *     Let "system path" be defined as the value of the PATH
environment
>> + *     variable, if the latter exists, and as the value output by
confstr() for
>> + *     _CS_PATH otherwise. Per the ENOENT specifications above,
"system path" is
>> + *     a non-empty string. Let "system path" further be of
the form
>> + *
>> + *       <prefix_0> [n = 1]
>> + *
>> + *     or
>> + *
>> + *       <prefix_0>:<prefix_1>:...:<prefix_(n-1)> [n
>= 2]
>> + *
>> + *     where for each 0 <= i < n, <prefix_i> does not
contain the <colon> ":"
>> + *     character. In the (n = 1) case, <prefix_0> is never empty
(see ENOENT
>> + *     above), while in the (n >= 2) case, any individual
<prefix_i> may or may
>> + *     not be empty.
> 
> [1] And this simplifies if empty PATH can imply searching in ".".
If we ever get practical problems with PATH="", let's update the
code
then, incrementally.
> 
>> + *
>> + *     The "pathnames" string_vector member has n elements;
for each 0 <= i < n,
>> + *     element i is of the form
>> + *
>> + *       suffix(curdir(<prefix_i>))file
>> + *
>> + *     where
>> + *
>> + *       curdir(x) := "."  if x = ""
>> + *       curdir(x) := x    otherwise
>> + *
>> + *     and
>> + *
>> + *       suffix(x) := x      if "x" ends with a
<slash> "/"
>> + *       suffix(x) := x "/"  otherwise
>> + *
>> + *     This transformation implements the POSIX XBD / PATH environment
variable
>> + *     semantics, creating candidate pathnames for execution by
>> + *     nbd_internal_fork_safe_execvpe().
nbd_internal_fork_safe_execvpe() will
>> + *     iterate over the candidate pathnames with execve() until
execve()
>> + *     succeeds, or fails with an error that is due to neither
pathname
>> + *     resolution, nor the candidate not being a regular file, nor the
candidate
>> + *     lacking execution permission.
>> + *
>> + * - The "sh_argv" array member will have at least (num_args
+ 1) elements
>> + *   allocated, and none populated.
>> + *
>> + *   (The minimum value of "num_args" is 2 -- see EINVAL
above. According to
>> + *   POSIX, "[t]he argument /arg0/ should point to a filename
string that is
>> + *   associated with the process being started by one of the /exec/
functions",
>> + *   plus "num_args" includes the null pointer that
terminates the argument
>> + *   list.)
>> + *
>> + *   This allocation is made in anticipation of execve() failing for a
>> + *   particular candidate inside nbd_internal_fork_safe_execvpe() with
ENOEXEC
>> + *   ("[t]he new process image file has the appropriate access
permission but
>> + *   has an unrecognized format"). While that failure terminates
the iteration,
>> + *   the failed call
>> + *
>> + *     execve (pathnames[i],
>> + *             { argv[0], argv[1], ..., NULL }, // (num_args >= 2)
elements
>> + *             { envp[0], envp[1], ..., NULL })
>> + *
>> + *   must be repeated as
>> + *
>> + *     execve (<shell-path>,
>> + *             { argv[0], pathnames[i],         // ((num_args + 1)
>= 3)
> 
> Are we guaranteed that if we get to this fallback, pathnames[i] does
> not begin with '-' or '+'?
No, but it's not something that POSIX requires either.
> Or is it possible to write
> PATH=/usr/bin:+my/evil/relative/path:... and/or the user to attempt to
> execute '+my/evil/name' which skips the PATH search because the
binary
> name contains '/', such that the resulting argument to /bin/sh
might
> need to be protected with an extra "--" argument to avoid being
> misinterpreted as options to the shell rather than the script name to
> be executed?
Certainly possible, but I don't think it's the job of this code to
prevent that. I don't think we expect any libnbd application to run as a
part of a setuid or setgid application (definitely not without the
application properly separating privileges), so I think the user
launching the libnbd application is responsible for setting their
environment (including PATH) correctly. Also I think libnbd application
writers are responsible for passing sane arguments to
nbd_connect_systemd_socket_activation() and nbd_connect_command().

(To begin with -- if we such privilege escalations were relevant
threats, we should be using the non-portable secure_getenv() already,
not getenv()!)
> Is the current working directory something we change between fork()
> and exec()?
No, we don't do that. While chdir() is async-signal-safe, our two
fork->exec paths have no reason for calling chdir().
> If so, should we canonicalize the entries in pathnames[]
> to always start with '/', so that the child process can reposition
> directories at will without changing the meaning of the pre-scanned
> candidates?
Again, we don't call chdir() beween fork and exec, but even if we did --
from the user's perspective (and from the application's perspective that
calls nbd_connect_systemd_socket_activation() or nbd_connect_command()),
there's no difference between:

- having a PATH with relative prefixes on it and changing the working
directory *before* fork (either in the application or in libnbd), and

- having a PATH with relative prefixes on it and changing the working
directory in libnbd between fork and exec, for something the user might
want.

Having relative prefixes on PATH means that execvpe() will be
cwd-sensitive one way or another. We have no reason for calling chdir()
ourselves -- i.e., a libnbd-internal reason -- between fork and exec.
> (Note that canonicalizing names would automatically
> resolve my above question about safety in the presence of an unusual
> PATH or binary name beginning with [-+]). [2]
> 
>> +                 argv[1], ..., NULL },          // elements
>> + *             { envp[0], envp[1], ..., NULL })
>> + *
>> + *   for emulating execvp(). The allocation in the "sh_argv"
member makes it
>> + *   possible just to *link* the original "argv" elements
and the "pathnames[i]"
>> + *   candidate into the right positions.
>> + *
>> + *   (POSIX leaves the shell pathname unspecified; "/bin/sh"
should be good
>> + *   enough.)
>> + *
>> + *   The shell *binary* will see itself being executed under the name
"argv[0]",
>> + *   will receive "pathnames[i]" as the pathname of the
shell *script* to read
>> + *   and interpret ("command_file" in POSIX terminology),
will expose
>> + *   "pathnames[i]" as the positional parameter $0 to the
script, and will
>> + *   forward "argv[1]" and the rest to the script as
positional parameters $1
>> + *   and onward.
>> + */
>> +int
>> +nbd_internal_execvpe_init (struct execvpe *ctx, const char *file,
>> +                           size_t num_args)
> 
> Should the interface take the proposed argv[] here pre-fork(), or are
> we trying to keep this interface generic enough where the child may
> alter some of the argv post-fork()?  On the other hand, I already know
> we are altering the environment post-fork() (setting LISTEN_PID> cannot
be done pre-fork, for starters), so keeping the selection of
> argv solely in the child rather than making the parent generate it
> pre-fork seems like the right division of labor.
The intent is to move the absolute minimum of work into the parent. Argv
is indeed not meant to change between _init and execvpe, but that's just
not enough reason for capturing argv in the context structure. The only
piece of information needed for _init is the element count of argv, so
that's what the function takes. If the contents of argv are modified
afterwards (excluding resizing), things would work for now, but the
intent is neither to prevent that practice programmatically nor to
encourage it.
> 
>> +{
>> +  int rc;
>> +  char *sys_path;
>> +  string_vector pathnames;
>> +  char *pathname;
>> +  size_t num_sh_args;
>> +  char **sh_argv;
>> +  size_t sh_argv_bytes;
>> +
>> +  rc = -1;
>> +
>> +  if (file[0] == '\0') {
>> +    errno = ENOENT;
>> +    return rc;
>> +  }
>> +
>> +  /* First phase. */
>> +  sys_path = NULL;
>> +  pathnames = (string_vector)empty_vector;
> 
> Another point where we may want consistency on whether we do space
> after cast.
> 
>> +
>> +  if (strchr (file, '/') == NULL) {
>> +    size_t file_len;
>> +    const char *sys_path_element, *scan;
>> +    bool finish;
>> +
>> +    sys_path = get_path ();
>> +    if (sys_path == NULL)
>> +      return rc;
>> +
>> +    if (sys_path[0] == '\0') {
>> +      errno = ENOENT;
>> +      goto free_sys_path;
>> +    }
> 
> [1] Here is where you would code up an empty PATH as being equivalent
> to ".", if we think it's worth having, probably by just
dropping this
> condition and letting the do-loop below handle it.
> 
>> +
>> +    pathname = NULL;
>> +    file_len = strlen (file);
>> +    sys_path_element = sys_path;
>> +    scan = sys_path;
>> +    do {
>> +      assert (sys_path_element <= scan);
>> +      finish = (*scan == '\0');
>> +      if (finish || *scan == ':') {
>> +        const char *sys_path_copy_start;
>> +        size_t sys_path_copy_size;
>> +        size_t sep_copy_size;
>> +        size_t pathname_size;
>> +        char *p;
>> +
>> +        if (scan == sys_path_element) {
>> +          sys_path_copy_start = ".";
>> +          sys_path_copy_size = 1;
>> +        } else {
>> +          sys_path_copy_start = sys_path_element;
>> +          sys_path_copy_size = scan - sys_path_element;
>> +        }
>> +
>> +        assert (sys_path_copy_size >= 1);
>> +        sep_copy_size = (sys_path_copy_start[sys_path_copy_size - 1]
!= '/');
>> +
>> +        if (ADD_OVERFLOW (sys_path_copy_size, sep_copy_size,
&pathname_size) ||
>> +            ADD_OVERFLOW (pathname_size, file_len, &pathname_size)
||
>> +            ADD_OVERFLOW (pathname_size, 1u, &pathname_size)) {
>> +          errno = EOVERFLOW;
>> +          goto empty_pathnames;
>> +        }
> 
> Since we are in the parent process during this _init() function, can't
> we just use asprintf() instead of concatenating it all out by hand?
I thought long and hard about asprintf(), and certainly expected this
question :)

My problem with asprintf (and to an extent with sprintf / snprintf) is
that they return the number of bytes printed (or the number of bytes
that would have been printed), without the terminating NUL, as a *signed
int*. I can't fathom how that is an acceptable interface design when
strlen() and sizeof both produce size_t, and when -- specifically
regarding asprintf() -- malloc() takes a size_t. The glibc manual is
super vague on this "int" overflow ("If memory  allocation
wasn't
possible, or some other error occurs"):

  https://man7.org/linux/man-pages/man3/asprintf.3.html

and I can't refer to the POSIX standard, because asprintf() is not POSIX.

If I simply refer to fprintf() in C99 (which has similar return value
semantics), I find (7.19.6.1p14-15):

  Returns

  The fprintf function returns the number of characters transmitted, or
  a negative value if an output or encoding error occurred.

  Environmental limits

  The number of characters that can be produced by any single conversion
  shall be at least 4095.

Ugh... thanks? Sigh.

If I refer to fprintf() in POSIX:

  https://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html

it *does* say what I want to see:

  [EOVERFLOW] [CX] [Option Start] The value to be returned is greater
                                  than {INT_MAX}. [Option End]

but it *does not* connect to asprintf(). Asprintf() is not in POSIX, and
the EOVERFLOW language is not in the linux/glibc manual. (The
linux/glibc manual on asprintf doesn't mention "errno" *at all*.)

So, writing the code like this was actually not a huge problem (it's a
relatively small complication compared to the rest), and it saved me an
*assumption* about asprintf(). (And no, I didn't want to rely on the
glibc source code happening to do what we want.)
> 
>> +
>> +        pathname = malloc (pathname_size);
>> +        if (pathname == NULL)
>> +          goto empty_pathnames;
>> +        p = pathname;
>> +
>> +        memcpy (p, sys_path_copy_start, sys_path_copy_size);
>> +        p += sys_path_copy_size;
> 
> [2] Here is where we would want to consider whether to
> (semi-)canonicalize any relative names into a corresponding absolute
> prefix, to avoid issues with a relative PATH element starting with
> [-+].
> 
>> +
>> +        memcpy (p, "/", sep_copy_size);
>> +        p += sep_copy_size;
> 
> Clever way to avoid turning PATH=/:/usr/bin:... into an attempt to
> execute "//binary" which is NOT the same as "/binary".
Well to be honest, I did this specifically because reference [02] on
PATH states:

  When a non-zero-length prefix is applied to this filename, a <slash>
  shall be inserted between the prefix and the filename if the prefix
  did not end in <slash>.

It happens to prevent "//binary" too, but that wasn't specifically
my
intent.

How does "//binary" differ from "/binary", in pathname
resolution?
> But also
> possible if you use
>   asprintf(..., "...%.*s...", ..., (int) sep_copy_size,
"/", ...)
> 
> or declare sep_copy_size as int to begin with (because we know it will
> only be 0 or 1).
Yes, I'd agree about "int" being fine for
"sep_copy_size" here; I just
wanted to avoid asprintf altogether. (Once I do the checked additions
up-front, asprintf() doesn't really save much.)
> 
>> +
>> +        memcpy (p, file, file_len);
>> +        p += file_len;
>> +
>> +        *p++ = '\0';
>> +
>> +        if (string_vector_append (&pathnames, pathname) == -1)
>> +          goto empty_pathnames;
>> +        /* Ownership transferred. */
>> +        pathname = NULL;
>> +
>> +        sys_path_element = scan + 1;
>> +      }
>> +
>> +      ++scan;
>> +    } while (!finish);
>> +  } else {
>> +    pathname = strdup (file);
> 
> [2] and here we'd have to canonicalize 'file' itself if we are
> worrying about the child changing directories and/or starting with a
> character that could change the behavior of /bin/sh if mistaken as an
> option instead of a script name.
> 
>> +    if (pathname == NULL)
>> +      return rc;
>> +
>> +    if (string_vector_append (&pathnames, pathname) == -1)
>> +      goto empty_pathnames;
>> +    /* Ownership transferred. */
>> +    pathname = NULL;
>> +  }
>> +
>> +  /* Second phase. */
>> +  if (num_args < 2) {
>> +    errno = EINVAL;
>> +    goto empty_pathnames;
>> +  }
>> +  if (ADD_OVERFLOW (num_args, 1u, &num_sh_args) ||
>> +      MUL_OVERFLOW (num_sh_args, sizeof *sh_argv, &sh_argv_bytes))
{
>> +    errno = EOVERFLOW;
>> +    goto empty_pathnames;
>> +  }
>> +  sh_argv = malloc (sh_argv_bytes);
>> +  if (sh_argv == NULL)
>> +    goto empty_pathnames;
> 
> Should we prepopulate sh_argv[0] as a placeholder that can then be
> easily swapped to ctx->pathnames[i], where the child first attempts
> execve(pathnames[i], argv+1, env) with the ENOEXEC fallback to
> argv[0]=pathnames[i]; execve("/bin/sh", argv, env), rather than
having
> to shuffle things in the child process? [3]
(commenting under [3] below)
> 
>> +
>> +  /* Commit. */
>> +  ctx->pathnames = pathnames;
>> +  ctx->sh_argv = sh_argv;
>> +  ctx->num_sh_args = num_sh_args;
>> +  rc = 0;
>> +  /* Fall through, for freeing temporaries. */
>> +
>> +empty_pathnames:
>> +  if (rc == -1) {
>> +    free (pathname);
>> +    string_vector_empty (&pathnames);
>> +  }
>> +
>> +free_sys_path:
>> +  free (sys_path);
>> +
>> +  return rc;
>> +}
>> +
>> +void
>> +nbd_internal_execvpe_uninit (struct execvpe *ctx)
>> +{
>> +  free (ctx->sh_argv);
>> +  ctx->num_sh_args = 0;
>> +  string_vector_empty (&ctx->pathnames);
>> +}
>> +
>> +int
>> +nbd_internal_fork_safe_execvpe (struct execvpe *ctx, const
string_vector *argv,
>> +                                char * const *envp)
>> +{
>> +  size_t pathname_idx;
>> +
>> +  NBD_INTERNAL_FORK_SAFE_ASSERT (ctx->pathnames.len > 0);
>> +
>> +  pathname_idx = 0;
>> +  do {
>> +    (void)execve (ctx->pathnames.ptr[pathname_idx], argv->ptr,
envp);
>> +    if (errno != EACCES && errno != ELOOP && errno !=
ENAMETOOLONG &&
>> +        errno != ENOENT && errno != ENOTDIR)
>> +      break;
>> +
>> +    ++pathname_idx;
>> +  } while (pathname_idx < ctx->pathnames.len);
>> +
>> +  if (errno == ENOEXEC) {
>> +    char **sh_argp;
>> +    size_t argv_idx;
>> +
>> +    NBD_INTERNAL_FORK_SAFE_ASSERT (ctx->num_sh_args >=
argv->len);
>> +    NBD_INTERNAL_FORK_SAFE_ASSERT (ctx->num_sh_args - argv->len
== 1);
>> +
>> +    sh_argp = ctx->sh_argv;
>> +    *sh_argp++ = argv->ptr[0];
>> +    *sh_argp++ = ctx->pathnames.ptr[pathname_idx];
>> +    for (argv_idx = 1; argv_idx < argv->len; ++argv_idx)
>> +      *sh_argp++ = argv->ptr[argv_idx];
>> +
>> +    (void)execve ("/bin/sh", ctx->sh_argv, envp);
> 
> [3] Oh, I see - the shuffle is still probably needed, because
> ctx->sh_argv is different than user's argv which does not have a
> pre-allocated filler slot that we can just swap out.  Unless we decide
> that the _init() function should take argv itself rather than making
> the child do the work of shifting everything from one array to
> another, but then we are inconsistent for grabbing argv in the parent
> and env in the child.
Indeed. Again, the intent was to move the minimum amount of work
required into the parent; in parallel, to move the minimum amount of
state into the explicit "context" structure.

Other than that, I'll make a comment of my own here. I could have used
memcpy() in place of the loop, and I actually prototyped how that would
look. (Memcpy() is async-signal-safe, and arguably faster on a
high-quality libc than the manual loop.) However, for the last argument
of memcpy(), I would have wanted to assert that the element size of
"ctx->sh_argv" and that of argv->ptr were the same. Furthermore,
I would
have wanted to make that a static assertion. But a general static
assertion would have been so much work (cf. STATIC_ASSERT_UNSIGNED_INT),
in comparison to how few elements we tend to copy here, that I decided
to go with the loop. The assignment operator in the loop is really nice
because the compiler checks for element compatibility (i.e., it enforces
that both arrays have (char*) elements).

> 
>> +  }
>> +
>> +  return -1;
>> +}
>>
> 
> Overall looks like nice work.
> 
> Based on my review, you may decide to make further tweaks, but I
> didn't see any hard reason why we couldn't use the patch as-written
if
> you don't think my review warrants changes.  Thus:
> 
> Reviewed-by: Eric Blake <eblake at redhat.com>
> 
Thanks!

I will update the commit message (fix the double space typo, and remove
the allusion to bash compatibility near PATH=''), and replace the FIXME
with a NOTE in get_path().

Thanks for the thorough review!
Laszlo

Libguestfs - Feb 2023 - [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()

[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()

[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()