Eric Blake
2023-Feb-06 17:28 UTC
[Libguestfs] [PATCH libnbd] generator: Pass LISTEN_FDNAMES=nbd with systemd socket activation
On Sun, Feb 05, 2023 at 09:10:29AM +0100, Laszlo Ersek wrote:> On 2/1/23 17:17, Laszlo Ersek wrote: > > On 1/31/23 18:19, Eric Blake wrote: > > >> The alternative to relying on execvp() to scan PATH is to pre-scan > >> PATH ourselves before fork(). I wish there were a helper function in > >> glibc that would quickly return the absolute path that execvp() would > >> otherwise utilize. > > > > We use execvp() in another spot -- "generator/states-connect.c". > > > > I'll try to come up with some new nbd_internal_fork_safe_*() APIs, for > > preparing and then "running" execvp. > > There's a problem. > > getenv() is not (generally) thread-safe -- even if we called > getenv("PATH") before forking. > > https://pubs.opengroup.org/onlinepubs/9699919799/functions/getenv.html > > An application may call a libnbd interface from one of its threads, and > call getenv() -- or setenv(), or unsetenv(), or putenv() -- from another > one of its threads. Because there is no synchronization around > *existent* getenv() calls in libnbd (LOGNAME, HOME, LIBNBD_DEBUG), this > would lead to undefined behavior.You're right: In general, manipulating the environment in a multi-threaded application is a risky proposition, as some libc implementations can perform memory allocations to do things like compact or (re-)sort the list of current variables even on a getenv() action; although from the glibc documentation, it looks like the better qualtiy-of-implementation ones make getenv() non-modifying and only putenv()/setenv()/unsetenv() can cause actual modifications to the current list. It doesn't help that POSIX allows the historical practice where assignments to the global 'environ' cause libc to use the resulting new environment correctly, even if the just-assigned list is not sorted the way libc would have done had it been in charge of all environment manipulations through the function interfaces.> > We can't robustly pre-fetch these environment variables in a library > constructor funcion, like errors_init(), for two reasons: > > - Client code that manipulates these variables intentionally, would > break, even if single-threaded. Example: "tests/debug-environment.c".Indeed - the fact that we have already documented the use of an environment variable as a way to tweak debugging behavior of libnbd from within a given process means that we don't want to go breaking that behavior, in places where it is done safely. Maybe we can improve the documentation to remind users that using setenv()/putenv() to add LIBNBD_DEBUG to the environment is best done at a point where other threads in the process will not be accessing the environment, but at most it would be just documentation and not something we can add code to overcome.> > - dlopen() and dlsym() are thread-safe functions (all POSIX functions > are, unless explicitly noted otherwiseYep, libraries in general should not modify the environment. Reading the environment is a different matter, though, and these days, it is an acceptable (although probably under-documented) practice to assume that as long as the main app doesn't modify the environment in one thread while using a library in another, then the library can read (but not modify) the environment. This puts the burden of locking on the application, without needing any additional interfaces into the library. ...> > Linux/glibc is a bit more helpful: > <https://man7.org/linux/man-pages/man3/getenv.3.html> calls getenv() > "MT-Safe env", and that is explained as follows > <https://man7.org/linux/man-pages/man7/attributes.7.html>: > > > env Functions marked with env as an MT-Safety issue access > > the environment with getenv(3) or similar, without any > > guards to ensure safety in the presence of concurrent > > modifications. > > > > We do not mark these functions as MT-Unsafe, however, > > because functions that modify the environment are all > > marked with const:env and regarded as unsafe. Being > > unsafe, the latter are not to be called when multiple > > threads are running or asynchronous signals are enabled, > > and so the environment can be considered effectively > > constant in these contexts, which makes the former safe. > > IOW the argument on Linux/glibc is that "getenv() will not break > getenv(), because it does not modify anything".Yes, and any good quality of implementation libc should do the same (although I don't know for sure if the BSD-derived systems actually do that).> > It still needs to be protected from writes: setenv() is marked as > "MT-Unsafe const:env" > <https://man7.org/linux/man-pages/man3/setenv.3.html>, and see the > following excerpt about const:env", again from attributes(7):But unless we write the environment in libnbd (which we should not be doing), the burden of protecting writes is the applications job - the application should not be calling into libnbd while also modifying the environment, but libnbd doesn't need to do anything special to assist an application in implementing that level of lockout.> > I don't like it: > > - it requires the same amount of work in the source code (both libnbd > and client apps), > > - readers excluding readers is safer per POSIX, where getenv() may be > entirely MT-Unsafe, > > - the environment is not a highly contended resource, so the performance > impact of getenv() excluding getenv() is likely negligible. > > Summary: > - ignore the whole thing?This, or at most document that applications should not be modifying the environment in parallel with calling libnbd functions. That is, I'm okay if we decide to be explicit that we rely on glibc's promise of getenv() being MT-safe as long as the other 3 functions which modify the environment are not used in parallel; but I'm also okay with leaving that implicit in the .pod documentation where we can refer people back to this thread in the archives if it ever actually comes up in the future.> - update the documentation, add locking APIs, implemented with a mutex? > - update the documentation, add locking APIs, implemented with an > rwlock?Too much work for no real gain. Libnbd does not need to do either of these, at least not without someone giving us good reason why they can't implement proper locking themselves without libnbd's help. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Laszlo Ersek
2023-Feb-07 08:34 UTC
[Libguestfs] [PATCH libnbd] generator: Pass LISTEN_FDNAMES=nbd with systemd socket activation
Hi Eric, On 2/6/23 18:28, Eric Blake wrote:> On Sun, Feb 05, 2023 at 09:10:29AM +0100, Laszlo Ersek wrote: >> On 2/1/23 17:17, Laszlo Ersek wrote: >>> On 1/31/23 18:19, Eric Blake wrote: >> >>>> The alternative to relying on execvp() to scan PATH is to pre-scan >>>> PATH ourselves before fork(). I wish there were a helper function in >>>> glibc that would quickly return the absolute path that execvp() would >>>> otherwise utilize. >>> >>> We use execvp() in another spot -- "generator/states-connect.c". >>> >>> I'll try to come up with some new nbd_internal_fork_safe_*() APIs, for >>> preparing and then "running" execvp. >> >> There's a problem. >> >> getenv() is not (generally) thread-safe -- even if we called >> getenv("PATH") before forking. >> >> https://pubs.opengroup.org/onlinepubs/9699919799/functions/getenv.html >> >> An application may call a libnbd interface from one of its threads, and >> call getenv() -- or setenv(), or unsetenv(), or putenv() -- from another >> one of its threads. Because there is no synchronization around >> *existent* getenv() calls in libnbd (LOGNAME, HOME, LIBNBD_DEBUG), this >> would lead to undefined behavior. > > You're right: In general, manipulating the environment in a > multi-threaded application is a risky proposition, as some libc > implementations can perform memory allocations to do things like > compact or (re-)sort the list of current variables even on a getenv() > action; although from the glibc documentation, it looks like the > better qualtiy-of-implementation ones make getenv() non-modifying and > only putenv()/setenv()/unsetenv() can cause actual modifications to > the current list. It doesn't help that POSIX allows the historical > practice where assignments to the global 'environ' cause libc to use > the resulting new environment correctly, even if the just-assigned > list is not sorted the way libc would have done had it been in charge > of all environment manipulations through the function interfaces. > >> >> We can't robustly pre-fetch these environment variables in a library >> constructor funcion, like errors_init(), for two reasons: >> >> - Client code that manipulates these variables intentionally, would >> break, even if single-threaded. Example: "tests/debug-environment.c". > > Indeed - the fact that we have already documented the use of an > environment variable as a way to tweak debugging behavior of libnbd > from within a given process means that we don't want to go breaking > that behavior, in places where it is done safely. Maybe we can > improve the documentation to remind users that using setenv()/putenv() > to add LIBNBD_DEBUG to the environment is best done at a point where > other threads in the process will not be accessing the environment, > but at most it would be just documentation and not something we can > add code to overcome. > >> >> - dlopen() and dlsym() are thread-safe functions (all POSIX functions >> are, unless explicitly noted otherwise > > Yep, libraries in general should not modify the environment. Reading > the environment is a different matter, though, and these days, it is > an acceptable (although probably under-documented) practice to assume > that as long as the main app doesn't modify the environment in one > thread while using a library in another, then the library can read > (but not modify) the environment. This puts the burden of locking on > the application, without needing any additional interfaces into the > library. > > ... >> >> Linux/glibc is a bit more helpful: >> <https://man7.org/linux/man-pages/man3/getenv.3.html> calls getenv() >> "MT-Safe env", and that is explained as follows >> <https://man7.org/linux/man-pages/man7/attributes.7.html>: >> >>> env Functions marked with env as an MT-Safety issue access >>> the environment with getenv(3) or similar, without any >>> guards to ensure safety in the presence of concurrent >>> modifications. >>> >>> We do not mark these functions as MT-Unsafe, however, >>> because functions that modify the environment are all >>> marked with const:env and regarded as unsafe. Being >>> unsafe, the latter are not to be called when multiple >>> threads are running or asynchronous signals are enabled, >>> and so the environment can be considered effectively >>> constant in these contexts, which makes the former safe. >> >> IOW the argument on Linux/glibc is that "getenv() will not break >> getenv(), because it does not modify anything". > > Yes, and any good quality of implementation libc should do the same > (although I don't know for sure if the BSD-derived systems actually do > that). > >> >> It still needs to be protected from writes: setenv() is marked as >> "MT-Unsafe const:env" >> <https://man7.org/linux/man-pages/man3/setenv.3.html>, and see the >> following excerpt about const:env", again from attributes(7): > > But unless we write the environment in libnbd (which we should not be > doing), the burden of protecting writes is the applications job - the > application should not be calling into libnbd while also modifying the > environment, but libnbd doesn't need to do anything special to assist > an application in implementing that level of lockout. > >> >> I don't like it: >> >> - it requires the same amount of work in the source code (both libnbd >> and client apps), >> >> - readers excluding readers is safer per POSIX, where getenv() may be >> entirely MT-Unsafe, >> >> - the environment is not a highly contended resource, so the performance >> impact of getenv() excluding getenv() is likely negligible. >> >> Summary: >> - ignore the whole thing? > > This, or at most document that applications should not be modifying > the environment in parallel with calling libnbd functions. That is, > I'm okay if we decide to be explicit that we rely on glibc's promise > of getenv() being MT-safe as long as the other 3 functions which > modify the environment are not used in parallel; but I'm also okay > with leaving that implicit in the .pod documentation where we can > refer people back to this thread in the archives if it ever actually > comes up in the future. > >> - update the documentation, add locking APIs, implemented with a mutex? >> - update the documentation, add locking APIs, implemented with an >> rwlock? > > Too much work for no real gain. Libnbd does not need to do either of > these, at least not without someone giving us good reason why they > can't implement proper locking themselves without libnbd's help.Thanks for the detailed reply. I've wholly gotten on the "ignore it" wagon, just one comment: It's insufficient for the writer to lock, and the reader *not* to lock. There are no visibility and consistency guarantees that way. If one thread in the application modifies the environment and protects it with some mutex or lock (that libnbd does not know about), or even performs the modification "atomically", but then another thread running libnbd consumes the environment some ten minutes later, without a proper, matching synchronization mechanism, then the libnbd code executing on the latter thread may not see the modified environment at all, or may see garbage. C11 goes into extreme depths about this in "5.1.2.4 Multi-threaded executions and data races" and "7.26 Threads <threads.h>". Before C11, there is no formal memory model to speak of, but in practice, in portable code, the proper use of those pthreads APIs that must have inspired the parallel C11 APIs is both necessary and sufficient for avoiding data races. Where the glibc manual states, at https://man7.org/linux/man-pages/man7/attributes.7.html Conditionally safe features const that consumers (readers) must use read locks, that's not only in order to prevent that they consume the object as it's being modified. It's also there so readers *do* consume the "latest" (updated) object once the other thread is done updating, and so that whatever readers see is *complete*. In a publish-subscribe pattern, there is no subscription without actually subscribing. (Here's an example. If the application modifies LIBNBD_DEBUG in main(), then creates a thread with pthread_create(), and uses libnbd in the latter thread, that's perfectly safe. That's because pthread_create() is a proper synchronization point for *both* parties considered. (In C11, this appears as: "The completion of the thrd_create function synchronizes with the beginning of the execution of the new thread".) If the application first calls pthread_create(), then sets LIBNBD_DEBUG in the original thread, and then the other thread fetches the environment much-much later, it's unsafe, no matter "how much later" "later" is.) Anyway, to summarize: I'm OK if we just ignore this! My point is that *if* we do anything about it, ever, then it cannot be one-sided. Both sides must participate then. Sorry about the amount of text... Laszlo