Daniel P. Berrangé
2023-Feb-21 18:04 UTC
[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()
On Tue, Feb 21, 2023 at 06:53:39PM +0100, Laszlo Ersek wrote:
>
> More in general, this lesson tells me that POSIX is effectively
> irrelevant -- which is quite sad in itself; the bigger problem however
> is that *nothing replaces it*. If the one formal standard we have for
> portability does not reflect reality closely enough, and we need to rely
> on personal experience with various platforms, then we're back to where
> we were *before* POSIX. That is, having to check several separate
> documentation sets, and testing each API on every relevant platform in
> *each project* where the API is used. The idea is "ignore POSIX, care
> about Linux / modern systems only", but then it turns out those modern
> systems *do* differ sufficiently that extracting a common programming
> base *would* be useful. It's just that POSIX is not that common base;
> more precisely, there is no formalized, explicit common base. I guess
> "whatever passes CI" is the common base. That's... terrible, and it
> makes me seriously question if I want to program userspace in C at all.

FWIW, I wouldn't say that POSIX is irrelevant in general. If you
are trying to maximise portability it is worth paying attention to.

Rather I'd say that maintainers of projects may be opinionated about
which platforms they wish to support, to eliminate the burden of caring
about platforms which have few if any users in the modern world.

In libvirt and QEMU context we set explicit platform support targets:

  https://libvirt.org/platforms.html
  https://www.qemu.org/docs/master/about/build-platforms.html

which effectively limit us to only care about actively developed
OS from the last ~4 years, and even then only fairly mainstream
stuff. We don't care about a hobbyist/toy UNIX OS. The burden is
on other OS to attain compatibility with mainstream modern OS,
not for apps to adapt to obscure feature-poor platforms.

With this attitude, we don't care about compliance with countless
obsolete vendor UNIX platforms, and thus many of the edge cases
that POSIX worries about can be ignored. This frees the project
maintainers' time to focus on work that benefits a broader set of
users.

From this, libvirt/QEMU could both explicitly decide to not care
about any C compilers other than Clang/GCC. Vendor compilers and
most especially MSVC are out of scope. Clang/GCC are able to support
any of the OS platforms we target. This frees us from caring about
ISO C standards, letting us use GNU extensions.

AFAIK, libnbd/nbdkit haven't made a statement about what platforms
they aim to target. In my response I'm more or less assuming though
that you would only care about similar modern platforms to QEMU/libvirt,
and thus POSIX conformance would not be needed in all areas. Maybe
libnbd/nbdkit want to be more explicit about what they target as
platforms to make the portability requirements clear to contributors?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Laszlo Ersek
2023-Feb-21 22:59 UTC
[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()
On 2/21/23 19:04, Daniel P. Berrangé wrote:
> AFAIK, libnbd/nbdkit haven't made a statement about what platforms
> they aim to target. In my response I'm more or less assuming though
> that you would only care about similar modern platforms to QEMU/libvirt,
> and thus POSIX conformance would not be needed in all areas. Maybe
> libnbd/nbdkit want to be more explicit about what they target as
> platforms to make the portability requirements clear to contributors ?

libnbd's README.md requires

* Linux, FreeBSD or OpenBSD. Other OSes may also work but we have only
  tested these three.
* GCC or Clang
* GNU make
* bash
* [...]

nbdkit's README requires

* Linux, macOS, Windows, FreeBSD, OpenBSD or Haiku
* GCC or Clang
* bash
* GNU make
* [...]

To me, anything beyond Linux on those OS lists is entirely untestable
*locally*, hence my reliance on POSIX.

CI is a horrible way (compared to a published technical standard) to
figure out whether each individual interface works as needed everywhere,
even just across this small set of OSes. Having to look at multiple OS
manual pages is just slightly less horrible (and I consider those less
trustworthy than POSIX; see again the conflict between the Linux man
pages and the glibc documentation from GNU). The POSIX people have done
*huge work* to save us that effort. Sticking with POSIX might make us
work more (as in, write technically superfluous code), but I've always
felt fewer nasty surprises are waiting to ambush us that way.

I don't think we have documentation that describes the broadest
intersection of these OSes specifically. (We don't even have
conflict-free documentation just for Linux!)

Laszlo
Laszlo Ersek
2023-Feb-22 08:48 UTC
[Libguestfs] [libnbd PATCH v3 09/29] lib/utils: introduce async-signal-safe execvpe()
Sorry about replying for the second time. After having slept on it (not
much, but some), some thoughts are emerging / being distilled about my
own attitude.

On 2/21/23 19:04, Daniel P. Berrangé wrote:
> On Tue, Feb 21, 2023 at 06:53:39PM +0100, Laszlo Ersek wrote:
>>
>> More in general, this lesson tells me that POSIX is effectively
>> irrelevant -- which is quite sad in itself; the bigger problem however
>> is that *nothing replaces it*. If the one formal standard we have for
>> portability does not reflect reality closely enough, and we need to rely
>> on personal experience with various platforms, then we're back to where
>> we were *before* POSIX. That is, having to check several separate
>> documentation sets, and testing each API on every relevant platform in
>> *each project* where the API is used. The idea is "ignore POSIX, care
>> about Linux / modern systems only", but then it turns out those modern
>> systems *do* differ sufficiently that extracting a common programming
>> base *would* be useful. It's just that POSIX is not that common base;
>> more precisely, there is no formalized, explicit common base. I guess
>> "whatever passes CI" is the common base. That's... terrible, and it
>> makes me seriously question if I want to program userspace in C at all.
>
> FWIW, I wouldn't say that POSIX is irrelevant in general. If you
> are trying to maximise portability it is worth paying attention to.
>
> Rather I'd say that maintainers of projects may be opinionated about
> which platforms they wish to support, to eliminate the burden of caring
> about platforms which have few if any users in the modern world.
>
> In libvirt and QEMU context we set explicit platform support targets:
>
>   https://libvirt.org/platforms.html
>   https://www.qemu.org/docs/master/about/build-platforms.html
>
> which effectively limit us to only care about actively developed
> OS from the last ~4 years, and even then only fairly mainstream
> stuff. We don't care about a hobbyist/toy UNIX OS. The burden is
> on other OS to attain compatibility with mainstream modern OS,
> not for apps to adapt to obscure feature-poor platforms.
>
> With this attitude, we don't care about compliance with countless
> obsolete vendor UNIX platforms, and thus many of the edge cases
> that POSIX worries about can be ignored. This frees the project
> maintainers' time to focus on work that benefits a broader set of
> users.
>
> From this, libvirt/QEMU could both explicitly decide to not care
> about any C compilers other than Clang/GCC. Vendor compilers and
> most especially MSVC are out of scope. Clang/GCC are able to support
> any of the OS platforms we target. This frees us from caring about
> ISO C standards, letting us use GNU extensions.

The attitude you describe above and my attitude are largely driven by
the same goal: target development as narrowly as possible. Portability
is essential in both cases; the big difference is in the workflow chosen
for achieving portability.

Approach #1: A number of OSes and a number of tools (compilers etc) are
hand-picked, based on "practical" factors. This set of components taken
as a whole does not have uniform, central documentation. Therefore,
development is driven by (a) continuously consulting multiple -- often
conflicting -- sets of documentation, and by (b) trial-and-error. By
"trial-and-error" I mean that a "CI pass" is taken as strong evidence of
absence of bugs, including portability bugs. The workflow relies heavily
on CI to root out portability bugs.

The advantage of this approach is that it deduces -- with documentation
reconciliation, trial-and-error, and compiler / OS / libc source code
investigation -- such a "common denominator" that is fairly likely the
*greatest* common denominator. Therefore less/simpler code has to be
written and maintained for feature and bugfix delivery.

The disadvantage is that there is no single source of truth; the
workflow is centered on reconciling incomplete and/or conflicting
documentation sets, and "happens to work in CI" is taken as the final
argument. CI is costly in computing time (energy), developer time
(waiting, bad presentation of results), and money (minutes are
expensive), and locally testing all targeted platforms is a huge chore.
CI development/management in itself consumes immense human effort.

Approach #2: target a published technical standard, as a single source
of truth. Still employ CI, but not as a guiding tool, more like "just in
case". CI failures originating from portability issues are not expected
in general. CI success is not taken as the primary evidence of lack of
portability bugs.

The advantage of this approach is that developers can focus on a single
source of truth, for driving development -- POSIX. Patching up
portability problems may occasionally be necessary, but that should be
the exception.

The disadvantage of this approach is that POSIX, while arguably a common
denominator, is almost certainly not the *greatest* common denominator.
Therefore, more code needs to be written and maintained, plus recent
developments that "eventually" appear in all of the targeted platforms /
tools, are not consumable until they become centrally standardized.

So, here's the thing: at a personal level, I can entirely identify with
approach #2, and I'm unable to identify with approach #1, as the
development workflow that I am supposed to follow and practice.

To me, being torn to pieces between 3-4 conflicting documentation sets,
and writing code such that the *primary metric* be "let me see if this
passes CI -- let me throw code at the wall and see what sticks" is
unbearable. Having to submit several tweaks in succession and waiting
tens of minutes for CI to finish every time, rules out software
development as a profession for me. (CI remains relevant anyway, but not
for dictating or driving portability decisions.)

Having to test out interfaces manually that are supposed to be standard,
for determining and exploiting their *accidental* greatest common
denominator, again rules out software development for me as a
profession. Such work *is* valuable, but it's called standardization /
standards development, not software development. I don't mind
participating in standards development, but the *output* of that
activity needs to be a *central formal standard* that programmers can
rely upon in the future, not some implicit understanding that gets
encoded in / dispersed over a bunch of disparate applications and
libraries -- such as "we can call execvp() here because our particular
fork() version lets us" -- that merely happen to target the same
arbitrary set of platforms.

QEMU actually gets this *quite* right, with "devel/style.rst". It still
doesn't say anything about fork()/execvp() though, for example.

On the same note, I honestly think that the conflict between the Linux
manual pages and the GNU manual, regarding the child process
restrictions, is *unforgivable*.

Note that I'm not trying to assert an objective truth here. All I'm
saying is I'm personally incompatible with approach #1. To me, *how* I
work is generally more important than *what* I achieve for users.

Under that umbrella, the justification for introducing our own
async-signal-safe execvpe in this patch is simply the fact that the
official documentations (plural) available on Linux are *inconsistent*
about fork()+execvp(). The fact that it "happens" to work in practice is
just happenstance. If you will, call this my denial of practical
reality.

So: if the libnbd project can tolerate my attitude (approach #2), then
I'd like to proceed with this series (full scope), with me addressing
the v3 review feedback in v4, and so on. If not, then I'll abandon the
series, and try to make myself useful with something else -- where my
basic stance, towards whatever documentation I read, need not be
*distrust*.

Laszlo
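
For readers of this archive, the fork()+execvp() concern in this thread can be illustrated with a minimal sketch. It is not the code from the patch series. It only shows the conservative pattern: after fork(), the child calls nothing but execve() and _exit(), both of which are on POSIX's list of async-signal-safe functions, and the program path is assumed to be resolved before forking. The helper name spawn_and_wait() is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not taken from libnbd.
 *
 * Runs the program at "path" (an already resolved path; no PATH search
 * is done here) and returns its exit status, or -1 on failure to fork,
 * failure to wait, or abnormal termination of the child.
 */
static int
spawn_and_wait (const char *path, char *const argv[], char *const envp[])
{
  pid_t pid = fork ();

  if (pid == -1)
    return -1;

  if (pid == 0) {
    /* Child: restrict ourselves to async-signal-safe calls. */
    execve (path, argv, envp);
    /* execve() only returns on error; 127 follows the shell convention
     * for "command could not be executed".
     */
    _exit (127);
  }

  /* Parent: wait for the child, retrying if a signal interrupts us. */
  int status;
  while (waitpid (pid, &status, 0) == -1) {
    if (errno != EINTR)
      return -1;
  }

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

A call such as spawn_and_wait ("/bin/sh", argv, envp) with argv set to { "sh", "-c", "exit 7", NULL } would return the child's exit status. The PATH search that execvp() and execvpe() perform is exactly the part left out of this sketch, which is the part the patch under discussion sets out to provide in an async-signal-safe way.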