Daniel P. Berrangé
2023-Feb-21 12:08 UTC
[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()
On Wed, Feb 15, 2023 at 03:11:38PM +0100, Laszlo Ersek wrote:> execvp() [01] is powerful: > > - it performs PATH search [02] if necessary, > > - it falls back to executing the file with the shell as a shell script in > case the underlying execv() or execve() fails with ENOEXEC. > > However, execvp() is not async-signal-safe [03], and so we shouldn't call > it in a child process forked from a multi-threaded parent process [04], > which the libnbd client application may well be.Is that actually a problem in the real world ? There are various things not technically listed as async-signal-safe by POSIX, which all sane impls none the less make safe, or are practically safe in any sane program using them. IIUC with execvp the risk would be that setenv makes any use of the environment potentially unsafe, and possibly execvp might use malloc which is also technically unsafe. Both of these factors would technically make your replacement unsafe too. Apps simply shouldn't ever call setenv once threads are created, and malloc() is safe in any impl that is relevant these days.> > Introduce three APIs for safely emulating execvp() with execve(): > > - nbd_internal_execvpe_init(): to be called in the parent process, before > fork(). Allocate and format candidate pathnames for execution with > execve(), based on PATH. Allocate a buffer for the longer command line > that the shell script fallback needs. > > - nbd_internal_fork_safe_execvpe(): to be called in the child process. Try > the candidates formatted previously in sequence with execve(), as long > as execve() fails with errors related to pathname resolution / > inode-level file type / execution permission. Stop iterating on fatal > errors; if the fatal error is ENOEXEC, activate the shell script > fallback. As a small added feature, take the environment from an > explicit parameter (effectively imitating the GNU-specific execvpe() > function -- cf. commit dc64ac5cdd0b, "states: Use execvp instead of > execvpe", 2019-11-15). > > - nbd_internal_execvpe_uninit(): to be called in the parent process, after > fork(). Release the resources allocated with > nbd_internal_execvpe_init(). > > The main idea with the above distribution of responsibilities is that any > pre-scanning of PATH -- which was suggested by Eric Blake -- should not > include activities that are performed by execve() anyway. I've considered > and abandoned two approaches: > > - During scanning (i.e., in nbd_internal_execvpe_init()), check each > candidate with access(X_OK) [05]. > > I dropped this because the informative APPLICATION USAGE section of the > same specification [06] advises against it, saying "[a]n application > should instead attempt the action itself and handle the [EACCES] error > that occurs if the file is not accessible". > > - During scanning, open each candidate with O_EXEC [07] until one open() > succeeds. Save the sole file descriptor for > nbd_internal_fork_safe_execvpe(). In the latter, call fexecve() [08], > which is free of all error conditions that necessitate iteration over > candidates. Still implement the ENOEXEC (shell script) fallback. > > I dropped this because (a) fexecve() is a quite recent interface > (highlighted by Eric); (b) for an application to open a file for > execution with fexecve(), it's more robust to pass O_EXEC to open() than > not to pass it, per APPLICATION USAGE [09], but on Linux/glibc, O_EXEC > does not seem supported, only O_PATH does [10]. > > Thus the chosen approach -- pre-generate filenames -- contains a small > TOCTTOU race (highlighted by Eric) after all, but it should be harmless. > > Implementation-defined details: > > - If PATH searching is necessary, but PATH is absent [01], then fall back > to confstr(_CS_PATH) [11], similarly to Linux/glibc execvpe() [12]. > > If PATH is set but empty ("set to null") [02], or PATH is unset and > confstr(_CS_PATH) fails or returns no information or returns the empty > string, we fail the initial scanning (!) with ENOENT. This is consistent > with bash's behavior on Linux/glibc: > > $ PATH= ls > bash: ls: No such file or directory > > Details chosen for unspecified behavior: > > - Hardwire "/bin/sh" as the shell's pathname [01]. > > [01] https://pubs.opengroup.org/onlinepubs/9699919799/functions/execvp.html > [02] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03 > [03] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03 > [04] https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html > [05] https://pubs.opengroup.org/onlinepubs/9699919799/functions/access.html > [06] https://pubs.opengroup.org/onlinepubs/9699919799/functions/access.html#tag_16_09_07 > [07] https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html > [08] https://pubs.opengroup.org/onlinepubs/9699919799/functions/fexecve.html > [09] https://pubs.opengroup.org/onlinepubs/9699919799/functions/fexecve.html#tag_16_111_07 > [10] https://man7.org/linux/man-pages/man2/open.2.html > [11] https://pubs.opengroup.org/onlinepubs/9699919799/functions/confstr.html > [12] https://man7.org/linux/man-pages/man3/execvpe.3.html > > Suggested-by: Eric Blake <eblake at redhat.com> > Signed-off-by: Laszlo Ersek <lersek at redhat.com> > --- > lib/internal.h | 22 ++ > lib/utils.c | 351 ++++++++++++++++++++ > 2 files changed, 373 insertions(+) > > diff --git a/lib/internal.h b/lib/internal.h > index 9c9021ed366e..8c4e32013e40 100644 > --- a/lib/internal.h > +++ b/lib/internal.h > @@ -368,6 +368,18 @@ struct command { > uint32_t error; /* Local errno value */ > }; > > +struct execvpe { > + string_vector pathnames; > + > + /* Note: "const_string_vector" is not a good type for "sh_argv" below. Even if > + * we reserved enough space in a "const_string_vector", > + * const_string_vector_append() would still not be async-signal-safe, due to > + * the underlying const_string_vector_insert() calling assert(). > + */ > + char **sh_argv; > + size_t num_sh_args; > +}; > + > /* Test if a callback is "null" or not, and set it to null. */ > #define CALLBACK_IS_NULL(cb) ((cb).callback == NULL && (cb).free == NULL) > #define CALLBACK_IS_NOT_NULL(cb) (! CALLBACK_IS_NULL ((cb))) > @@ -540,4 +552,14 @@ extern void nbd_internal_fork_safe_assert (int result, const char *file, > __func__, #expression)) > #endif > > +extern int nbd_internal_execvpe_init (struct execvpe *ctx, const char *file, > + size_t num_args) > + LIBNBD_ATTRIBUTE_NONNULL (1, 2); > +extern void nbd_internal_execvpe_uninit (struct execvpe *ctx) > + LIBNBD_ATTRIBUTE_NONNULL (1); > +extern int nbd_internal_fork_safe_execvpe (struct execvpe *ctx, > + const string_vector *argv, > + char * const *envp) > + LIBNBD_ATTRIBUTE_NONNULL (1, 2, 3); > + > #endif /* LIBNBD_INTERNAL_H */ > diff --git a/lib/utils.c b/lib/utils.c > index 5dd00f82e073..9a96ed6ed674 100644 > --- a/lib/utils.c > +++ b/lib/utils.c > @@ -28,6 +28,7 @@ > #include <limits.h> > > #include "minmax.h" > +#include "checked-overflow.h" > > #include "internal.h" > > @@ -463,3 +464,353 @@ nbd_internal_fork_safe_assert (int result, const char *file, long line, > xwrite (STDERR_FILENO, "' failed.\n", 10); > abort (); > } > + > +/* Returns the value of the PATH environment variable -- falling back to > + * confstr(_CS_PATH) if PATH is absent -- as a dynamically allocated string. On > + * failure, sets "errno" and returns NULL. > + */ > +static char * > +get_path (void) > + LIBNBD_ATTRIBUTE_ALLOC_DEALLOC (free) > +{ > + char *path; > + bool env_path_found; > + size_t path_size, path_size2; > + > + /* FIXME: lock the environment. */ > + path = getenv ("PATH"); > + if ((env_path_found = (path != NULL))) > + path = strdup (path); > + /* FIXME: unlock the environment. */ > + > + if (env_path_found) { > + /* This handles out-of-memory as well. */ > + return path; > + } > + > + errno = 0; > + path_size = confstr (_CS_PATH, NULL, 0); > + if (path_size == 0) { > + /* If _CS_PATH does not have a configuration-defined value, just store > + * ENOENT to "errno". > + */ > + if (errno == 0) > + errno = ENOENT; > + > + return NULL; > + } > + > + path = malloc (path_size); > + if (path == NULL) > + return NULL; > + > + path_size2 = confstr (_CS_PATH, path, path_size); > + assert (path_size2 == path_size); > + return path; > +} > + > +/* nbd_internal_execvpe_init() and nbd_internal_fork_safe_execvpe() together > + * present an execvp() alternative that is async-signal-safe. > + * > + * nbd_internal_execvpe_init() may only be called before fork(), for filling in > + * the caller-allocated, uninitialized "ctx" structure. If > + * nbd_internal_execvpe_init() succeeds, then fork() may be called. > + * Subsequently, in the child process, nbd_internal_fork_safe_execvpe() may be > + * called with the inherited "ctx" structure, while in the parent process, > + * nbd_internal_execvpe_uninit() must be called to uninitialize (evacuate) the > + * "ctx" structure. > + * > + * On failure, "ctx" will not have been modified, "errno" is set, and -1 is > + * returned. Failures include: > + * > + * - Errors forwarded from underlying functions such as strdup(), confstr(), > + * malloc(), string_vector_append(). > + * > + * - ENOENT: "file" is an empty string. > + * > + * - ENOENT: "file" does not contain a <slash> "/" character, the PATH > + * environment variable is not set, and confstr() doesn't associate a > + * configuration-defined value with _CS_PATH. > + * > + * - ENOENT: "file" does not contain a <slash> "/" character, and: (a) the PATH > + * environment variable is set to the empty string, or (b) PATH is not > + * set, and confstr() outputs the empty string for _CS_PATH. > + * > + * - EOVERFLOW: the sizes or counts of necessary objects could not be expressed. > + * > + * - EINVAL: "num_args" is less than 2. > + * > + * On success, the "ctx" structure will have been filled in, and 0 is returned. > + * > + * - "pathnames" member: > + * > + * - All strings pointed-to by elements of the "pathnames" string_vector > + * member are owned by "pathnames". > + * > + * - If "file" contains a <slash> "/" character, then the sole entry in > + * "pathnames" is a copy of "file". > + * > + * - If "file" does not contain a <slash> "/" character: > + * > + * Let "system path" be defined as the value of the PATH environment > + * variable, if the latter exists, and as the value output by confstr() for > + * _CS_PATH otherwise. Per the ENOENT specifications above, "system path" is > + * a non-empty string. Let "system path" further be of the form > + * > + * <prefix_0> [n = 1] > + * > + * or > + * > + * <prefix_0>:<prefix_1>:...:<prefix_(n-1)> [n >= 2] > + * > + * where for each 0 <= i < n, <prefix_i> does not contain the <colon> ":" > + * character. In the (n = 1) case, <prefix_0> is never empty (see ENOENT > + * above), while in the (n >= 2) case, any individual <prefix_i> may or may > + * not be empty. > + * > + * The "pathnames" string_vector member has n elements; for each 0 <= i < n, > + * element i is of the form > + * > + * suffix(curdir(<prefix_i>))file > + * > + * where > + * > + * curdir(x) := "." if x = "" > + * curdir(x) := x otherwise > + * > + * and > + * > + * suffix(x) := x if "x" ends with a <slash> "/" > + * suffix(x) := x "/" otherwise > + * > + * This transformation implements the POSIX XBD / PATH environment variable > + * semantics, creating candidate pathnames for execution by > + * nbd_internal_fork_safe_execvpe(). nbd_internal_fork_safe_execvpe() will > + * iterate over the candidate pathnames with execve() until execve() > + * succeeds, or fails with an error that is due to neither pathname > + * resolution, nor the candidate not being a regular file, nor the candidate > + * lacking execution permission. > + * > + * - The "sh_argv" array member will have at least (num_args + 1) elements > + * allocated, and none populated. > + * > + * (The minimum value of "num_args" is 2 -- see EINVAL above. According to > + * POSIX, "[t]he argument /arg0/ should point to a filename string that is > + * associated with the process being started by one of the /exec/ functions", > + * plus "num_args" includes the null pointer that terminates the argument > + * list.) > + * > + * This allocation is made in anticipation of execve() failing for a > + * particular candidate inside nbd_internal_fork_safe_execvpe() with ENOEXEC > + * ("[t]he new process image file has the appropriate access permission but > + * has an unrecognized format"). While that failure terminates the iteration, > + * the failed call > + * > + * execve (pathnames[i], > + * { argv[0], argv[1], ..., NULL }, // (num_args >= 2) elements > + * { envp[0], envp[1], ..., NULL }) > + * > + * must be repeated as > + * > + * execve (<shell-path>, > + * { argv[0], pathnames[i], // ((num_args + 1) >= 3) > + argv[1], ..., NULL }, // elements > + * { envp[0], envp[1], ..., NULL }) > + * > + * for emulating execvp(). The allocation in the "sh_argv" member makes it > + * possible just to *link* the original "argv" elements and the "pathnames[i]" > + * candidate into the right positions. > + * > + * (POSIX leaves the shell pathname unspecified; "/bin/sh" should be good > + * enough.) > + * > + * The shell *binary* will see itself being executed under the name "argv[0]", > + * will receive "pathnames[i]" as the pathname of the shell *script* to read > + * and interpret ("command_file" in POSIX terminology), will expose > + * "pathnames[i]" as the positional parameter $0 to the script, and will > + * forward "argv[1]" and the rest to the script as positional parameters $1 > + * and onward. > + */ > +int > +nbd_internal_execvpe_init (struct execvpe *ctx, const char *file, > + size_t num_args) > +{ > + int rc; > + char *sys_path; > + string_vector pathnames; > + char *pathname; > + size_t num_sh_args; > + char **sh_argv; > + size_t sh_argv_bytes; > + > + rc = -1; > + > + if (file[0] == '\0') { > + errno = ENOENT; > + return rc; > + } > + > + /* First phase. */ > + sys_path = NULL; > + pathnames = (string_vector)empty_vector; > + > + if (strchr (file, '/') == NULL) { > + size_t file_len; > + const char *sys_path_element, *scan; > + bool finish; > + > + sys_path = get_path (); > + if (sys_path == NULL) > + return rc; > + > + if (sys_path[0] == '\0') { > + errno = ENOENT; > + goto free_sys_path; > + } > + > + pathname = NULL; > + file_len = strlen (file); > + sys_path_element = sys_path; > + scan = sys_path; > + do { > + assert (sys_path_element <= scan); > + finish = (*scan == '\0'); > + if (finish || *scan == ':') { > + const char *sys_path_copy_start; > + size_t sys_path_copy_size; > + size_t sep_copy_size; > + size_t pathname_size; > + char *p; > + > + if (scan == sys_path_element) { > + sys_path_copy_start = "."; > + sys_path_copy_size = 1; > + } else { > + sys_path_copy_start = sys_path_element; > + sys_path_copy_size = scan - sys_path_element; > + } > + > + assert (sys_path_copy_size >= 1); > + sep_copy_size = (sys_path_copy_start[sys_path_copy_size - 1] != '/'); > + > + if (ADD_OVERFLOW (sys_path_copy_size, sep_copy_size, &pathname_size) || > + ADD_OVERFLOW (pathname_size, file_len, &pathname_size) || > + ADD_OVERFLOW (pathname_size, 1u, &pathname_size)) { > + errno = EOVERFLOW; > + goto empty_pathnames; > + } > + > + pathname = malloc (pathname_size); > + if (pathname == NULL) > + goto empty_pathnames; > + p = pathname; > + > + memcpy (p, sys_path_copy_start, sys_path_copy_size); > + p += sys_path_copy_size; > + > + memcpy (p, "/", sep_copy_size); > + p += sep_copy_size; > + > + memcpy (p, file, file_len); > + p += file_len; > + > + *p++ = '\0'; > + > + if (string_vector_append (&pathnames, pathname) == -1) > + goto empty_pathnames; > + /* Ownership transferred. */ > + pathname = NULL; > + > + sys_path_element = scan + 1; > + } > + > + ++scan; > + } while (!finish); > + } else { > + pathname = strdup (file); > + if (pathname == NULL) > + return rc; > + > + if (string_vector_append (&pathnames, pathname) == -1) > + goto empty_pathnames; > + /* Ownership transferred. */ > + pathname = NULL; > + } > + > + /* Second phase. */ > + if (num_args < 2) { > + errno = EINVAL; > + goto empty_pathnames; > + } > + if (ADD_OVERFLOW (num_args, 1u, &num_sh_args) || > + MUL_OVERFLOW (num_sh_args, sizeof *sh_argv, &sh_argv_bytes)) { > + errno = EOVERFLOW; > + goto empty_pathnames; > + } > + sh_argv = malloc (sh_argv_bytes); > + if (sh_argv == NULL) > + goto empty_pathnames; > + > + /* Commit. */ > + ctx->pathnames = pathnames; > + ctx->sh_argv = sh_argv; > + ctx->num_sh_args = num_sh_args; > + rc = 0; > + /* Fall through, for freeing temporaries. */ > + > +empty_pathnames: > + if (rc == -1) { > + free (pathname); > + string_vector_empty (&pathnames); > + } > + > +free_sys_path: > + free (sys_path); > + > + return rc; > +} > + > +void > +nbd_internal_execvpe_uninit (struct execvpe *ctx) > +{ > + free (ctx->sh_argv); > + ctx->num_sh_args = 0; > + string_vector_empty (&ctx->pathnames); > +} > + > +int > +nbd_internal_fork_safe_execvpe (struct execvpe *ctx, const string_vector *argv, > + char * const *envp) > +{ > + size_t pathname_idx; > + > + NBD_INTERNAL_FORK_SAFE_ASSERT (ctx->pathnames.len > 0); > + > + pathname_idx = 0; > + do { > + (void)execve (ctx->pathnames.ptr[pathname_idx], argv->ptr, envp); > + if (errno != EACCES && errno != ELOOP && errno != ENAMETOOLONG && > + errno != ENOENT && errno != ENOTDIR) > + break; > + > + ++pathname_idx; > + } while (pathname_idx < ctx->pathnames.len); > + > + if (errno == ENOEXEC) { > + char **sh_argp; > + size_t argv_idx; > + > + NBD_INTERNAL_FORK_SAFE_ASSERT (ctx->num_sh_args >= argv->len); > + NBD_INTERNAL_FORK_SAFE_ASSERT (ctx->num_sh_args - argv->len == 1); > + > + sh_argp = ctx->sh_argv; > + *sh_argp++ = argv->ptr[0]; > + *sh_argp++ = ctx->pathnames.ptr[pathname_idx]; > + for (argv_idx = 1; argv_idx < argv->len; ++argv_idx) > + *sh_argp++ = argv->ptr[argv_idx]; > + > + (void)execve ("/bin/sh", ctx->sh_argv, envp); > + } > + > + return -1; > +} > > _______________________________________________ > Libguestfs mailing list > Libguestfs at redhat.com > https://listman.redhat.com/mailman/listinfo/libguestfs >With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Laszlo Ersek
2023-Feb-21 16:03 UTC
[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()
On 2/21/23 13:08, Daniel P. Berrang? wrote:> On Wed, Feb 15, 2023 at 03:11:38PM +0100, Laszlo Ersek wrote: >> execvp() [01] is powerful: >> >> - it performs PATH search [02] if necessary, >> >> - it falls back to executing the file with the shell as a shell >> script in case the underlying execv() or execve() fails with >> ENOEXEC. >> >> However, execvp() is not async-signal-safe [03], and so we shouldn't >> call it in a child process forked from a multi-threaded parent >> process [04], which the libnbd client application may well be. > > Is that actually a problem in the real world ? There are various > things not technically listed as async-signal-safe by POSIX, which > all sane impls none the less make safe, or are practically safe in > any sane program using them.By my count, it's virtually impossible to implement exec*p*() and friends without calling malloc() internally. That's the case even if you can get PATH in there without any problems. In turn, malloc() relies on libc-internal mutex(es) or other synchronization mechanisms, so that multiple threads can call malloc() / free() etc simultaneously, for managing the one shared address space of the process. The problem is with calling fork() in a multi-threaded process. The child process created by fork() only has one thread -- the thread returning from fork(). All other threads simply "disappear", as viewed from the child process's perspective. This means that, if one of those "other" (disappearing) threads was in the middle of malloc(), holding for example a mutex, then in the child process, that mutex will appear as acquired, but no thread will own it -- so no thread will ever release it. Then, when the one thread returning from fork() in the child process calls exec*p*(), the malloc() call internal to exec*p*() will deadlock on that mutex. Rich mentioned that libnbd had actually encountered a bug of this kind, just not specifically in exec*p*(). So the "async-signal-safe" term is a bit misleading here; I think what POSIX actually means is a kind of general reentrancy. Anyway, the POSIX language does use "async-signal-safe" when discussing this topic of threads, in the fork() specification. Now, the exec*p*() manual at https://man7.org/linux/man-pages/man3/execvp.3.html calls execlp(), execvp(), execvpe() "MT-Safe env", so you are right about at least Linux/glibc. I've (intentionally) not looked at the glibc implementation of execvp(), but I agree that, *if* we are happy with getenv() just because the linux/glibc manual calls it "MT-Safe env", *then* we could arguably be satisfied with execvp() as well. I don't know if/how that applies to FreeBSD / OpenBSD though. https://man.freebsd.org/cgi/man.cgi?query=execvp https://man.openbsd.org/execvp.3 These man pages do not seem to contain any of the strings "mt", "async", "signal", "safe", "thread".> IIUC with execvp the risk would be that setenv makes any use of the > environment potentially unsafe,to a smaller extent, yes> and possibly execvp might use malloc which is also technically unsafe.primarily this one, yes> Both of these factors would technically make your replacement unsafe > too.No, they wouldn't. That's the whole point. Both getenv() and malloc() are restricted to the _init API, which is called in the parent process, before fork(). The necessary information is prepared in a context structure, which the child only inherits. After fork(), the child calls a second API -- the effective execvpe() replacement -- which operates on the inherited context structure. The only functions called from this second API are write() and abort() -- indirectly, in case the new async-signal-safe assert() variant fails --, and execve(). All three of these functions are async-signal-safe.> Apps simply shouldn't ever call setenv once threads are created, and > malloc() is safe in any impl that is relevant these days.FWIW, the linux/glibc manual states, in <https://man7.org/linux/man-pages/man2/fork.2.html>:> * After a fork() in a multithreaded program, the child can > safely call only async-signal-safe functions (see > signal-safety(7)) until such time as it calls execve(2).and <https://man7.org/linux/man-pages/man7/signal-safety.7.html> does not list either execvp() or malloc(). The glibc manual is more verbose <https://www.gnu.org/software/libc/manual/html_node/Creating-a-Process.html>:> The _Fork function is similar to fork, but it does not invoke any > callbacks registered with pthread_atfork, nor does it reset any > internal state or locks (such as the malloc locks). In the new > subprocess, only async-signal-safe functions may be called, such as > dup2 or execve.This implies that malloc() and fork() in glibc actually inter-operate, so that fork() -- not _Fork() -- "reset malloc locks". This confirms your point about latest glibc, but I don't know if/how the same applies to the BSDs, or to earlier versions of glibc (for example the one in RHEL7, which libnbd still cares about, IIUC). (I'm unsure if it's wise for me to dig into the glibc commit history with git-blame, for finding the earliest release that adds this extra safety beyond POSIX.) Laszlo