thr3ads.net - Libguestfs - [Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe() [Feb 2023]

If this information is useful, please help other people find it:
Share via:

Daniel P. Berrangé

2023-Feb-21 16:16 UTC

[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()

On Tue, Feb 21, 2023 at 05:03:23PM +0100, Laszlo Ersek
wrote:> On 2/21/23 13:08, Daniel P. Berrang? wrote:
> > On Wed, Feb 15, 2023 at 03:11:38PM +0100, Laszlo Ersek wrote:
> >> execvp() [01] is powerful:
> >>
> >> - it performs PATH search [02] if necessary,
> >>
> >> - it falls back to executing the file with the shell as a shell
> >>   script in case the underlying execv() or execve() fails with
> >>   ENOEXEC.
> >>
> >> However, execvp() is not async-signal-safe [03], and so we
shouldn't
> >> call it in a child process forked from a multi-threaded parent
> >> process [04], which the libnbd client application may well be.
> >
> > Is that actually a problem in the real world ?  There are various
> > things not technically listed as async-signal-safe by POSIX, which
> > all sane impls none the less make safe, or are practically safe in
> > any sane program using them.
> 
> By my count, it's virtually impossible to implement exec*p*() and
> friends without calling malloc() internally. That's the case even if
you
> can get PATH in there without any problems. In turn, malloc() relies on
> libc-internal mutex(es) or other synchronization mechanisms, so that
> multiple threads can call malloc() / free() etc simultaneously, for
> managing the one shared address space of the process.
> 
> The problem is with calling fork() in a multi-threaded process. The
> child process created by fork() only has one thread -- the thread
> returning from fork(). All other threads simply "disappear", as
viewed
> from the child process's perspective. This means that, if one of those
> "other" (disappearing) threads was in the middle of malloc(),
holding
> for example a mutex, then in the child process, that mutex will appear
> as acquired, but no thread will own it -- so no thread will ever release
> it. Then, when the one thread returning from fork() in the child process
> calls exec*p*(), the malloc() call internal to exec*p*() will deadlock
> on that mutex.
> 
> Rich mentioned that libnbd had actually encountered a bug of this kind,
> just not specifically in exec*p*().
> 
> So the "async-signal-safe" term is a bit misleading here; I think
what
> POSIX actually means is a kind of general reentrancy. Anyway, the POSIX
> language does use "async-signal-safe" when discussing this topic
of
> threads, in the fork() specification.
> 
> Now, the exec*p*() manual at
> 
>   https://man7.org/linux/man-pages/man3/execvp.3.html
> 
> calls execlp(), execvp(), execvpe() "MT-Safe env", so you are
right
> about at least Linux/glibc. I've (intentionally) not looked at the
glibc
> implementation of execvp(), but I agree that, *if* we are happy with
> getenv() just because the linux/glibc manual calls it "MT-Safe
env",
> *then* we could arguably be satisfied with execvp() as well.
> 
> I don't know if/how that applies to FreeBSD / OpenBSD though.
> 
>   https://man.freebsd.org/cgi/man.cgi?query=execvp
>   https://man.openbsd.org/execvp.3
> 
> These man pages do not seem to contain any of the strings "mt",
"async",
> "signal", "safe", "thread".
> 
> > IIUC with execvp the risk would be that setenv makes any use of the
> > environment potentially unsafe,
> 
> to a smaller extent, yes
> 
> > and possibly execvp might use malloc which is also technically unsafe.
> 
> primarily this one, yes
> 
> > Both of these factors would technically make your replacement unsafe
> > too.
> 
> No, they wouldn't. That's the whole point. Both getenv() and
malloc()
> are restricted to the _init API, which is called in the parent process,
> before fork(). The necessary information is prepared in a context
> structure, which the child only inherits. After fork(), the child calls
> a second API -- the effective execvpe() replacement -- which operates on
> the inherited context structure. The only functions called from this
> second API are write() and abort() -- indirectly, in case the new
> async-signal-safe assert() variant fails --, and execve(). All three of
> these functions are async-signal-safe.
Oh I missed that subtlety.
> 
> > Apps simply shouldn't ever call setenv once threads are created,
and
> > malloc() is safe in any impl that is relevant these days.
> 
> FWIW, the linux/glibc manual states, in
> <https://man7.org/linux/man-pages/man2/fork.2.html>:
> 
> >        *  After a fork() in a multithreaded program, the child can
> >           safely call only async-signal-safe functions (see
> >           signal-safety(7)) until such time as it calls execve(2).
> 
> and <https://man7.org/linux/man-pages/man7/signal-safety.7.html> does
> not list either execvp() or malloc().
> 
> The glibc manual is more verbose
>
<https://www.gnu.org/software/libc/manual/html_node/Creating-a-Process.html>:
Yes, the docs are written by language lawyers, but the reality is
that malloc is safe in glibc for 15+ years and that's not going to
change.
> 
> > The _Fork function is similar to fork, but it does not invoke any
> > callbacks registered with pthread_atfork, nor does it reset any
> > internal state or locks (such as the malloc locks). In the new
> > subprocess, only async-signal-safe functions may be called, such as
> > dup2 or execve.
> 
> This implies that malloc() and fork() in glibc actually inter-operate,
> so that fork() -- not _Fork() -- "reset malloc locks".
> 
> This confirms your point about latest glibc, but I don't know if/how
the
> same applies to the BSDs, or to earlier versions of glibc (for example
> the one in RHEL7, which libnbd still cares about, IIUC).
macOS, Linux glib, Linux musl, FreeBSD are all malloc safe after fork
with threads. Windows doesn't have fork.  I'm fairly certain that I
validated the OpenBSD/NetBSD malloc impl too, but it was a while ago.

Every single non-trivial program today uses threads, so you can
more or less guarantee that malloc is going to be used between
fork & exec on a app that uses threads. Any platform that doesn't
make malloc safe is going to be on the receiving end of many bug
reports.  Musl tried to be a hold out in this respect, refusing
to make their malloc safe, but eventually they just accepted the
modern reality and made it safe.

IMHO it just isn't worth spending time avoiding malloc between
fork/exec, and apps which call setenv()/unsetenv from outside
their initial main() method are almost certainly already broken,
because so many library APIs create threads behind your back.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Laszlo Ersek

2023-Feb-21 17:53 UTC

head link

[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()

On 2/21/23 17:16, Daniel P. Berrang? wrote:> On Tue, Feb 21, 2023 at 05:03:23PM +0100, Laszlo Ersek wrote:
>> On 2/21/23 13:08, Daniel P. Berrang? wrote:
>>> On Wed, Feb 15, 2023 at 03:11:38PM +0100, Laszlo Ersek wrote:
>>>> execvp() [01] is powerful:
>>>>
>>>> - it performs PATH search [02] if necessary,
>>>>
>>>> - it falls back to executing the file with the shell as a shell
>>>>   script in case the underlying execv() or execve() fails with
>>>>   ENOEXEC.
>>>>
>>>> However, execvp() is not async-signal-safe [03], and so we
shouldn't
>>>> call it in a child process forked from a multi-threaded parent
>>>> process [04], which the libnbd client application may well be.
>>>
>>> Is that actually a problem in the real world ?  There are various
>>> things not technically listed as async-signal-safe by POSIX, which
>>> all sane impls none the less make safe, or are practically safe in
>>> any sane program using them.
>>
>> By my count, it's virtually impossible to implement exec*p*() and
>> friends without calling malloc() internally. That's the case even
if you
>> can get PATH in there without any problems. In turn, malloc() relies on
>> libc-internal mutex(es) or other synchronization mechanisms, so that
>> multiple threads can call malloc() / free() etc simultaneously, for
>> managing the one shared address space of the process.
>>
>> The problem is with calling fork() in a multi-threaded process. The
>> child process created by fork() only has one thread -- the thread
>> returning from fork(). All other threads simply "disappear",
as viewed
>> from the child process's perspective. This means that, if one of
those
>> "other" (disappearing) threads was in the middle of malloc(),
holding
>> for example a mutex, then in the child process, that mutex will appear
>> as acquired, but no thread will own it -- so no thread will ever
release
>> it. Then, when the one thread returning from fork() in the child
process
>> calls exec*p*(), the malloc() call internal to exec*p*() will deadlock
>> on that mutex.
>>
>> Rich mentioned that libnbd had actually encountered a bug of this kind,
>> just not specifically in exec*p*().
>>
>> So the "async-signal-safe" term is a bit misleading here; I
think what
>> POSIX actually means is a kind of general reentrancy. Anyway, the POSIX
>> language does use "async-signal-safe" when discussing this
topic of
>> threads, in the fork() specification.
>>
>> Now, the exec*p*() manual at
>>
>>   https://man7.org/linux/man-pages/man3/execvp.3.html
>>
>> calls execlp(), execvp(), execvpe() "MT-Safe env", so you are
right
>> about at least Linux/glibc. I've (intentionally) not looked at the
glibc
>> implementation of execvp(), but I agree that, *if* we are happy with
>> getenv() just because the linux/glibc manual calls it "MT-Safe
env",
>> *then* we could arguably be satisfied with execvp() as well.
>>
>> I don't know if/how that applies to FreeBSD / OpenBSD though.
>>
>>   https://man.freebsd.org/cgi/man.cgi?query=execvp
>>   https://man.openbsd.org/execvp.3
>>
>> These man pages do not seem to contain any of the strings
"mt", "async",
>> "signal", "safe", "thread".
>>
>>> IIUC with execvp the risk would be that setenv makes any use of the
>>> environment potentially unsafe,
>>
>> to a smaller extent, yes
>>
>>> and possibly execvp might use malloc which is also technically
unsafe.
>>
>> primarily this one, yes
>>
>>> Both of these factors would technically make your replacement
unsafe
>>> too.
>>
>> No, they wouldn't. That's the whole point. Both getenv() and
malloc()
>> are restricted to the _init API, which is called in the parent process,
>> before fork(). The necessary information is prepared in a context
>> structure, which the child only inherits. After fork(), the child calls
>> a second API -- the effective execvpe() replacement -- which operates
on
>> the inherited context structure. The only functions called from this
>> second API are write() and abort() -- indirectly, in case the new
>> async-signal-safe assert() variant fails --, and execve(). All three of
>> these functions are async-signal-safe.
> 
> Oh I missed that subtlety.
> 
>>
>>> Apps simply shouldn't ever call setenv once threads are
created, and
>>> malloc() is safe in any impl that is relevant these days.
>>
>> FWIW, the linux/glibc manual states, in
>> <https://man7.org/linux/man-pages/man2/fork.2.html>:
>>
>>>        *  After a fork() in a multithreaded program, the child can
>>>           safely call only async-signal-safe functions (see
>>>           signal-safety(7)) until such time as it calls execve(2).
>>
>> and <https://man7.org/linux/man-pages/man7/signal-safety.7.html>
does
>> not list either execvp() or malloc().
>>
>> The glibc manual is more verbose
>>
<https://www.gnu.org/software/libc/manual/html_node/Creating-a-Process.html>:
> 
> Yes, the docs are written by language lawyers, but the reality is
> that malloc is safe in glibc for 15+ years and that's not going to
> change.
> 
>>
>>> The _Fork function is similar to fork, but it does not invoke any
>>> callbacks registered with pthread_atfork, nor does it reset any
>>> internal state or locks (such as the malloc locks). In the new
>>> subprocess, only async-signal-safe functions may be called, such as
>>> dup2 or execve.
>>
>> This implies that malloc() and fork() in glibc actually inter-operate,
>> so that fork() -- not _Fork() -- "reset malloc locks".
>>
>> This confirms your point about latest glibc, but I don't know
if/how the
>> same applies to the BSDs, or to earlier versions of glibc (for example
>> the one in RHEL7, which libnbd still cares about, IIUC).
> 
> macOS, Linux glib, Linux musl, FreeBSD are all malloc safe after fork
> with threads. Windows doesn't have fork.  I'm fairly certain that I
> validated the OpenBSD/NetBSD malloc impl too, but it was a while ago.
> 
> Every single non-trivial program today uses threads, so you can
> more or less guarantee that malloc is going to be used between
> fork & exec on a app that uses threads. Any platform that doesn't
> make malloc safe is going to be on the receiving end of many bug
> reports.  Musl tried to be a hold out in this respect, refusing
> to make their malloc safe, but eventually they just accepted the
> modern reality and made it safe.
> 
> IMHO it just isn't worth spending time avoiding malloc between
> fork/exec, and apps which call setenv()/unsetenv from outside
> their initial main() method are almost certainly already broken,
> because so many library APIs create threads behind your back.
I wish I had known this (i.e., about your research regarding multiple
OSes) *before* starting this work. Unfortunately, the execvpe impl in
this patch, and the unit tests in the next, required the majority of
effort in this series.

I'm not really seeing the point for the bulk of this series if we can
simply deem -- and do so justifiedly -- the existent code as "good
enough".

Now, I did anticipate that most of this work could be perceived as
needless pedantry, and I still decided to go ahead with it. So I can
accept it if we drop the series (and return to Rich's v2 for extending
the environment with LISTEN_FDNAMES). What I definitely wanted to
achieve was for this series to reach the list, and that has been
accomplished.

What I *cannot* offer though myself is to go over the series and
cherry-pick what patches make practical sense and what don't. To me,
they all matter; I consider them all pre-requisites, because I approach
the whole thing from a standards conformance background. If we drop that
("do whatever works"), that's fine, but then I can't help;
more
precisely, then there's no *need* for this series in the first place.
I'm honestly at loss as to how to implement the last few patches
*without* the cleanups -- but luckily, I don't need to figure that out,
because Rich did it already in v2. Reviewing that, I asked for a chance
to take a stab at the problem, so I'm good.

More in general, this lesson tells me that POSIX is effectively
irrelevant -- which is quite sad in itself; the bigger problem however
is that *nothing replaces it*. If the one formal standard we have for
portability does not reflect reality closely enough, and we need to rely
on personal experience with various platforms, then we're back to where
we were *before* POSIX. That is, having to check several separate
documentation sets, and testing each API on every relevant platform in
*each project* where the API is used. The idea is "ignore POSIX, care
about Linux / modern systems only", but then it turns out those modern
systems *do* differ sufficiently that extracting a common programming
base *would* be useful. It's just that POSIX is not that common base;
more precisely, there is no formalized, explicit common base. I guess
"whatever passes CI" is the common base. That's... terrible, and
it
makes me seriously question if I want to program userspace in C at all.

Laszlo

Laszlo Ersek

2023-Feb-21 21:39 UTC

head link

[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()

On 2/21/23 17:16, Daniel P. Berrang? wrote:
>>> Apps simply shouldn't ever call setenv once threads are
created, and
>>> malloc() is safe in any impl that is relevant these days.
> Yes, the docs are written by language lawyers, but the reality is
> that malloc is safe in glibc for 15+ years and that's not going to
> change.
> macOS, Linux glib, Linux musl, FreeBSD are all malloc safe after fork
> with threads. Windows doesn't have fork.  I'm fairly certain that I
> validated the OpenBSD/NetBSD malloc impl too, but it was a while ago.
Consider two patterns:

(1) an application calling setenv() in one thread and getenv() in
another thread

(2) a multi-threaded application forking, and then calling malloc() in
the child process

(Assume there are no user-provided libraries here, just the standard C
library, and applications.)

POSIX rules each practice a bug in the application.

"We" (for some uncertain definition of "we") however call
(1) a bug in
the application, and (2) a bug in the standard C library.

Put differently, glibc chooses to break under (1), and to protect iself
against (2).

If users report bugs about issue (1), the tickets are routed to the
application's bug tracker; if they report bugs about issue (2), the
tickets are routed to glibc's bug tracker.

Why? More precisely: why this inconsistency between the choices for (1)
and (2)? I'm not even asking why diverge from the published technical
standard -- my question is, why diverge *inconsistently* from the standard.

A different standard C library implementation might choose the inverse
"armoring".

I can't wrap my brain around the *arbitrariness* of this.

I guess I could entertain an argument like

  issue (1) is easy to fix in the application, while issue (2) is hard
  to fix in an application, so considering total development cost, it's
  best to fix issue (2) in a central location: glibc; i.e., it's best to
  extend POSIX just for issue (2)

*IF* your run-of-the-mill Linux distribution's *own* documentation were
not inconsistent with such an argument. The fact that we have two
divergent, conflicting documentation sets on just that *specific*
programming environment -- the linux manual pages, vs. the glibc info
pages from GNU --, where the former calls malloc() unsafe for (2) but
the latter calls it safe, turns the whole thing into a travesty. No
statement ever in either manual can be then taken seriously, everything
needs to be tested.

What justification do you see for the different treatment of (1) vs (2)?

Related tickets (both about the same issue):
- https://bugzilla.redhat.com/show_bug.cgi?id=906468
- https://sourceware.org/bugzilla/show_bug.cgi?id=19431

These were fixed in 2016-2017, so *not* 15+ years ago. malloc() may be
"exceptionally safe" in this regard, in practice, but other functions
that POSIX similarly calls unsafe, may be just "relatively safe", or
not
safe at all, in glibc. The problem is that we now can't make *any
assessment at all* based on the documentation, because POSIX is being
called "overly cautious", and because the Linux manual vs. the glibc
documentation have been proved obsolete and/or contradictory.

With this, it's effectively impossible, by way of code review and docs
checking, to determine if API usage is safe or not. What sense does code
review make like this in the first place?

Laszlo

Libguestfs - Feb 2023 - [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()

[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()

[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()

[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()