Laszlo Ersek
2023-Feb-22 00:47 UTC
[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()
On 2/21/23 20:32, Eric Blake wrote:> On Tue, Feb 21, 2023 at 07:07:38PM +0100, Laszlo Ersek wrote:>> Well, given Daniel's comments meanwhile, it seems like the original >> execvp() is something we shouldn't fret about. :/ > > glibc marks execvp() and exevpe() as 'MT-Safe env', which means it > does not modify 'environ' and presumably does not use 'malloc'. If > the only reason POSIX marks execvp() as thread-unsafe is because of > its interaction with getenv() for PATH and therefore unsafe to exec a > child in one thread while another is calling putenv(), then it is no > worse than our use of getenv() without locking on the grounds that no > sane app will be doing setenv() after spawning threads. > > But you've gone to a lot of work on this series; I'm still in favor of > including your work, even if decide our premise behind it is weaker > than intended. As I understood it, the premise is that actively > avoiding as many non-async-safe functions as possible between fork() > and exec() is always a good idea to avoid the deadlock created when a > multithreaded process fork()s in one thread while another thread holds > a mutex on the non-async-safe resource (in the child process, the > other thread no longer exists and therefore can never release the > resource). Knowing WHICH resources are liable to be locked in other > threads makes it easier to reason about which generically > non-async-safe functions can be used if we make other limiting > restrictions (such as glibc's 'MT-Safe env' designation), but that > requires more thought than blindly avoiding all non-async-safe > functions.It's killing me to see my work turn out as waste, but I don't want it to be included (and to present maintenance burden) just for "saving" the work. I kind of expected it to be unwelcome (due to it being opinionated and not fixing acute symptoms); I didn't expect it to be lost from the specification / premise up. This early paragraph in the commit message is torpedoed: However, execvp() is not async-signal-safe [03], and so we shouldn't call it in a child process forked from a multi-threaded parent process [04], which the libnbd client application may well be. That "and so" implication is busted because the glibc manual (from GNU, not the Linux manual pages!) *implies* that fork(), as opposed to _Fork(), does not restrict the child process to AS-Safe functions. Anyway, I think your characterization is right. My approach was, "avoid calling anything that's not explicitly async-signal-safe". A laxer approach is "avoid calling anything that might interfere with particular resources". Unfortunately, this laxer approach, while it may require some "further caution" in case we want to call further functions in the affected context, quite summarily obviates this particular execvp() replacement. Plus, regarding said "further caution" in general, I'm getting the impression (from RHBZ#906468) that glibc's intent is actually the opposite -- i.e., the intent seems to be that any particular API should opt out of, rather than opt in to, usability after fork(). Considering how much work is still needed to address the review feedback thus far (and the necessary v4 reviews!), I'm seriously torn if we should just drop the whole thing. (And there's a really sore lesson for me in this, regarding cleanups for standards conformance.)>> Can you please elaborate on "+s"? (I'd like to understand your point >> regardless of this patch, too.) > > https://www.austingroupbugs.net/view.php?id=1440 is where it came up > in the Austin Group. Basically, having system("+s") be _required_ to > invoke ["/path/to/sh", "-c", "+s"] is risky - "+s" is ambiguous > between being the name of a real script and being a shell optionSo what does the "+s" option do specifically? "-s" makes the shell read commands from stdin, but "+s" would only apply to such an "-s" option that were taken by the "set" special builtin -- it would *disable* that set-option (while "set -s" would enable it). However, I don't see *any* "-s" under "set", so I don't know what "set +s" would do.> (POSIX already warns that naming an executable with a leading "-" is > unwise because of potential conflicts with options, but 'sh' has the > special rules that + as well as - can introduce options for historical > reasons).So is "+s" just a synonym for "-s", meaning the shell will read commands from stdin?> Calling ["/path/to/sh", "-c", "--", "+s"] is unambiguous > for a POSIX sh that understands "--" as the end-of-options marker, but > not all historical sh did that. It is also possible to use > ["/path/to/sh", "-c", " +s"] (note the added leading space, which is > ignored by the shell),ignored why?> but that requires space for injecting the > leading space, and how do you do that concisely while still > maintaining async-safety?(right, not doing it, just learning about +s)>> (... I wanted to say that "there's no replacement for reading POSIX in >> full", but now I'm wondering if reading POSIX *at all* makes sense...) > > Alas, this rings too close to home... Standards are only useful if > they are likely to be followed, which in turn is harder if they are > hard to read."Hard to read" is a problem, but not the core issue IMO here. The problem is that for the sake of various technological advances, POSIX is intentionally abandoned, and that leaves us with *zero* common documentation that might uniformly cover even a small handful of modern OSes. And we have no alternative to individually testing out every "standard" API on each platform of interest. I have no problem with prctl() being Linux-specific; I'm very much irritated by fork() relaxing its standard restrictions without proper documentation. It's a recipe for wasting work. Anyway I'll log out now because I'm hardly coherent. Laszlo
Eric Blake
2023-Feb-22 21:50 UTC
[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()
On Wed, Feb 22, 2023 at 01:47:02AM +0100, Laszlo Ersek wrote:> On 2/21/23 20:32, Eric Blake wrote: > > On Tue, Feb 21, 2023 at 07:07:38PM +0100, Laszlo Ersek wrote: > > >> Well, given Daniel's comments meanwhile, it seems like the original > >> execvp() is something we shouldn't fret about. :/ > > > > glibc marks execvp() and exevpe() as 'MT-Safe env', which means it > > does not modify 'environ' and presumably does not use 'malloc'. If > > the only reason POSIX marks execvp() as thread-unsafe is because of > > its interaction with getenv() for PATH and therefore unsafe to exec a > > child in one thread while another is calling putenv(), then it is no > > worse than our use of getenv() without locking on the grounds that no > > sane app will be doing setenv() after spawning threads. > > > > But you've gone to a lot of work on this series; I'm still in favor of > > including your work, even if decide our premise behind it is weaker > > than intended. As I understood it, the premise is that actively > > avoiding as many non-async-safe functions as possible between fork() > > and exec() is always a good idea to avoid the deadlock created when a > > multithreaded process fork()s in one thread while another thread holds > > a mutex on the non-async-safe resource (in the child process, the > > other thread no longer exists and therefore can never release the > > resource). Knowing WHICH resources are liable to be locked in other > > threads makes it easier to reason about which generically > > non-async-safe functions can be used if we make other limiting > > restrictions (such as glibc's 'MT-Safe env' designation), but that > > requires more thought than blindly avoiding all non-async-safe > > functions. > > It's killing me to see my work turn out as waste, but I don't want it to be included (and to present maintenance burden) just for "saving" the work. I kind of expected it to be unwelcome (due to it being opinionated and not fixing acute symptoms); I didn't expect it to be lost from the specification / premise up. This early paragraph in the commit message is torpedoed:And even with your advance communication that you did not want to work on something that might be deemed overkill, I was not expecting the reaction in this thread to turn out in quite the way it has. I sincerely hope that whether or not your execvp replacement code lands, you do not consider time spent on writing/reviewing it as wasted, as we are all learning things in this thread.> > However, execvp() is not async-signal-safe [03], and so we shouldn't call > it in a child process forked from a multi-threaded parent process [04], > which the libnbd client application may well be. > > That "and so" implication is busted because the glibc manual (from GNU, not the Linux manual pages!) *implies* that fork(), as opposed to _Fork(), does not restrict the child process to AS-Safe functions.Or that glibc has taken special pains to make reliance on certain AS-unsafe functions (namely, malloc() and getenv()) "safer" in practice, despite permission from POSIX allowing glibc to throw caution to the wind and claim deadlocks to be the progammer's bug rather than glibc's. The fact that documentation does not match reality is indeed disheartening, but this may be a good time to open up bug requests to the documentation team(s) that can improve it for the next reader. Deadlock avoidance is one of the harder topics in computer science; I'm glad that glibc is going out of their way to make fork()/exec() work in more cases than promised by POSIX, but it also means a lot of programmers don't realize the complexities that glibc is solving on their behalf, and how their code may not be portable to a lesser libc.> > Anyway, I think your characterization is right. My approach was, "avoid calling anything that's not explicitly async-signal-safe". A laxer approach is "avoid calling anything that might interfere with particular resources". > > Unfortunately, this laxer approach, while it may require some "further caution" in case we want to call further functions in the affected context, quite summarily obviates this particular execvp() replacement. > > Plus, regarding said "further caution" in general, I'm getting the impression (from RHBZ#906468) that glibc's intent is actually the opposite -- i.e., the intent seems to be that any particular API should opt out of, rather than opt in to, usability after fork(). > > Considering how much work is still needed to address the review feedback thus far (and the necessary v4 reviews!), I'm seriously torn if we should just drop the whole thing. (And there's a really sore lesson for me in this, regarding cleanups for standards conformance.) > > >> Can you please elaborate on "+s"? (I'd like to understand your point > >> regardless of this patch, too.) > > > > https://www.austingroupbugs.net/view.php?id=1440 is where it came up > > in the Austin Group. Basically, having system("+s") be _required_ to > > invoke ["/path/to/sh", "-c", "+s"] is risky - "+s" is ambiguous > > between being the name of a real script and being a shell option > > So what does the "+s" option do specifically? > > "-s" makes the shell read commands from stdin, but "+s" would only apply to such an "-s" option that were taken by the "set" special builtin -- it would *disable* that set-option (while "set -s" would enable it). However, I don't see *any* "-s" under "set", so I don't know what "set +s" would do.POSIX doesn't define 'set -s', and in fact, bash treats 'set -s' and 'set +s' as errors. But sh itself DOES have a -s option (mandated by POSIX), and POSIX also mandates that shells must accept options with a leading '+'. So in practice, what happens is that both bash and dash _happen_ to parse and silently ignore '+s' as an unknown option when passed as a command line argument under the invocation name 'sh', since they already have to do similar command-line parsing for 'sh -a' vs. 'sh +a'. Maybe the problem is more readily visible when using '+a' as a script name, because that's something that POSIX does require shells to support. If I write system("+a") and have an executable by that name on my PATH, I want to execute that program, not get an error about the shell not having an argument to -c. And if I don't have it on my PATH, I want to be told that. But my initial example used "+s" instead of "+a" because that was the example discussed in the Austin Group (and I don't know why the original poster in that context picked that as their example). $ /bin/bash -c +a; echo $? /bin/bash: -c: option requires an argument 2 $ /bin/bash -c -- +a; echo $? /bin/bash: line 1: +a: command not found 127 $ /bin/dash -c +a; echo $? /bin/dash: 0: -c requires an argument 2 $ /bin/dash -c -- +a; echo $? /bin/dash: 1: +a: not found 127> > > (POSIX already warns that naming an executable with a leading "-" is > > unwise because of potential conflicts with options, but 'sh' has the > > special rules that + as well as - can introduce options for historical > > reasons). > > So is "+s" just a synonym for "-s", meaning the shell will read commands from stdin?As far as I can tell, neither bash nor dash documents +s behavior; it is probably more likely an accident of implementation than something intentional.> > > Calling ["/path/to/sh", "-c", "--", "+s"] is unambiguous > > for a POSIX sh that understands "--" as the end-of-options marker, but > > not all historical sh did that. It is also possible to use > > ["/path/to/sh", "-c", " +s"] (note the added leading space, which is > > ignored by the shell), > > ignored why?The string argument present when -c is in use parses the entire string (in this case, " +s") as a command line to run, and command lines ignore leading space characters. The problem with bare "+s" is that it is potentially ambiguous as an option name (at which point -c fails when there is no matching string), while " +s" is unambiguously not an option and therefore can only be the contents of the script that -c asked to run, at which point the leading space in the string no longer matters, having gotten us past the option-parsing ambiguity.> > > but that requires space for injecting the > > leading space, and how do you do that concisely while still > > maintaining async-safety? > > (right, not doing it, just learning about +s) > > >> (... I wanted to say that "there's no replacement for reading POSIX in > >> full", but now I'm wondering if reading POSIX *at all* makes sense...) > > > > Alas, this rings too close to home... Standards are only useful if > > they are likely to be followed, which in turn is harder if they are > > hard to read. > > "Hard to read" is a problem, but not the core issue IMO here. The problem is that for the sake of various technological advances, POSIX is intentionally abandoned, and that leaves us with *zero* common documentation that might uniformly cover even a small handful of modern OSes. And we have no alternative to individually testing out every "standard" API on each platform of interest. I have no problem with prctl() being Linux-specific; I'm very much irritated by fork() relaxing its standard restrictions without proper documentation. It's a recipe for wasting work. > > Anyway I'll log out now because I'm hardly coherent.There's no rush here; I understand that this effort has been emotionally draining on you, so please take the time you need to get back into the right mental frame when following up. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Seemingly Similar Threads
- [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()
- [libnbd PATCH v4 0/2] lib/utils: introduce async-signal-safe execvpe()
- [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()
- [libnbd PATCH v4 0/2] lib/utils: introduce async-signal-safe execvpe()
- [libnbd PATCH v4 0/2] lib/utils: introduce async-signal-safe execvpe()