thr3ads.net - Libguestfs - [Libguestfs] [PATCH libnbd] generator: Pass LISTEN_FDNAMES=nbd with systemd socket activation [Feb 2023]

If this information is useful, please help other people find it:
Share via:

Laszlo Ersek

2023-Feb-01 16:17 UTC

[Libguestfs] [PATCH libnbd] generator: Pass LISTEN_FDNAMES=nbd with systemd socket activation

On 1/31/23 18:19, Eric Blake wrote:> On Tue, Jan 31, 2023 at 01:07:53PM +0000, Richard W.M. Jones wrote:
>>>
>>> I really didn't want to obsess about this -- I spent like 10+
minutes on
>>> curbing my enthusiasm! :) --, but I believe there's a semantic
bug in
>>> the patch, one that's directly related to my "hidden"
thoughts.
>>>
>>> (1) In the last hunk, strlen() is applied to
"LISTEN_FDS=". However, the
>>> zero-index element of the env array holds
>>> "LISTEN_PID=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx". In other
words, the patch
>>> only gets lucky because "PID" and "FDS" are
both three characters long.
>>
>> Yup that is totally wrong :-(
>>
>>> Relatedly, my hidden thought was that we shouldn't use so many
naked
>>> string literals all over the code.
>>>
>>> May I take a stab at rewriting this? I feel that's the easiest
way for
>>> me to express what I'd propose. Basically I'd propose:
>>>
>>> - an enum for listing the "keys" we need,
>>>
>>> - a static array of structure elements, for expressing the
environment
>>>   variables (name=value), *and* the prefix lengths,
>>>
>>> - a macro for populating an element of the array -- use
"sizeof" for
>>>   grabbing the prefix length
>>
>> Sure, go for it please.
> 
> More structure will pay off if we have more variables to handle in the
> long run.  For just 3 (instead of 2), it's still a toss-up in my mind
> if the extra structure is worth it, but I'm not opposed to seeing an
> attempt at a patch along these lines.
I'm torn like that myself. For me the divide is usually between 2 and 3.
What pushed me over the edge in this case is the LISTEN_FDS= typo in the
last hunk of the patch; that's sort of evidence that it's time to
centralize the variables.
> 
>>
>>> (2) Pre-patch, at commit 5a02c7d2cc6a, the error handling tail of
>>> prepare_socket_activation_environment() is less than ideal, IMO.
Namely,
>>> we have (excerpt)
>>>
>>>>  err:
>>>>   set_error (errno, "malloc");
>>>>   string_vector_iter (env, (void *) free);
>>>>   free (env->ptr);
>>>>   return -1;
>>>
>>> (2a) we free the vector's pointer field, but don't NULL it,
nor do we
>>> zero the len or cap fields.
>>>
>>> We should call string_vector_reset() instead.
>>
>> Yup.
> 
> The lone caller doesn't utilize env after error, but I agree that we
> are better off not leaving this function to expose an incomplete
'env'
> back to the caller in case we start using this function in more
> places.
Thanks -- I agree (and had realized) nothing depended on the
"residual"
status of "env", it's just that we start this function with

  assert (env->len == 0);

and that pushed me towards maintaining consistency at all times, even
though resetting all the fields is technically a waste of cycles.
> 
>>
>>> (2b) Casting the address of the free() function to (void*) makes me
>>> uncomfortable. It is undefined behavior by ISO C.
>>>
>>> Now, I seem to remember that POSIX says in various places that
pointers
>>> to functions and pointers to void have identical representation,
and
>>> also that pointers to void and pointers to structures have
identical
>>> representation. One of those locations is the dlsym() spec
>>>
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html>.
>>> The other locations elude me, unfortunately. I think at least one
of
>>> those "other" locations may be in one of the Conformance
sections; Eric
>>> will know better.
> 
> I recall it being discussed in the Austin Group that POSIX explicitly
> requires void* and function pointers to be cross-assignable, but only
> _because_ of dlsym().  After searching for variations of 'void *',
> 'void pointer', 'function pointer' and 'pointer to
function', I
> couldn't locate anything besides the dlsym() section that elaborates
> on the requirement that a POSIX compiler must allow conversion of a
> function pointer into void* and back.
Thank you very much for checking :)

It's quite possible that I'm incorrectly recalling a different memory,
about those "other locations". What I remember is *something* about
pointers-to-void, and furthermore that the location in the spec was
totally unexpected. I was just casually browsing one of the Conformance
sections or something like that, and was stunned to find such a
statement there. Perhaps it was about the all-zeroes bit pattern
constituting a NULL pointer; not sure. (I did find such a passage now,
too, but it's not a requirement. Either way it's too bad I've not
had
enough "quality time" with the standard in the last years.)

> 
>>>
>>> Regardless, casting "free" to a pointer-to-object, just
because
>>> string_vector_iter() takes a (void(*)(char*)), and not a
>>> (void(*)(void*)), is questionable style, IMO.
>>>
>>> I've grepped the tree for this pattern:
>>>
>>>   git grep -E '\(void ?\*\) ?free'
>>>
>>> and there are eleven hits.
>>>
>>> Furthermore, there are *no other* _vector_iter() calls -- and not
just
>>> string_vector_iter() calls, but in general, _vector_iter() ones! --
than
>>> these eleven.
>>>
>>> I think it's time we designed either a general freeing iterator
API for
>>> vector, or at least added a trivial (stop-gap) wrapper function
like
>>> this:
>>>
>>>> diff --git a/common/utils/string-vector.h
b/common/utils/string-vector.h
>>>> index 80d7311debfb..5221c70e3f78 100644
>>>> --- a/common/utils/string-vector.h
>>>> +++ b/common/utils/string-vector.h
>>>> @@ -39,4 +39,10 @@
>>>>
>>>>  DEFINE_VECTOR_TYPE(string_vector, char *);
>>>>
>>>> +static inline void
>>>> +string_free (char *string)
>>>> +{
>>>> +  free (string);
>>>> +}
>>>> +
>>>>  #endif /* STRING_VECTOR_H */
>>>
>>> Comments please :)
>>
>> Agreed.
> 
> I'm also in agreement that tweaking our vector interface for the ways
> in which we actively use it (without having to spell out explicit
> (void*) casts) would be welcome.
In retrospect, "inline" appears quite the opposite of what we want
here
-- we need to take and pass the address of the function. While (per my
superficial reading) C99 does not seem to preclude taking the address of
an inline function, the intent is certainly to flatten direct calls, and
we're going to make indirect calls instead.

It's probably best to extend the macros we have; I'll need to dig into
it to see what feels least awkward.
> 
>>
>>> (3) At the last hunk, the code suggests we're between fork()
and exec().
>>> Per POSIX
>>>
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html>,
>>> there we can only call async-signal-safe functions:
>>>
>>>> the child process may only execute async-signal-safe operations
until
>>>> such time as one of the exec functions is called
>>>
>>> The list of async-signal-safe functions can be found at
>>>
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03>.
>>> snprintf() and sprintf() are not on that list, so it makes sense
for
>>> nbd_internal_fork_safe_itoa() to exist.
>>
>> Yes, in the past we have actually hit real bugs because of this.  One
>> I recall is:
>>
>>
https://github.com/libguestfs/libguestfs/commit/e1c9bbb3d1d5ef81490977060120dda0963eb567
>>
>> They are very hard to diagnose.  This one only happened when a certain
>> glibc feature was enabled.
>>
>>> The remaining functions we call in this context also seem to be on
the
>>> list... except for execvp().
>>>
>>> execvp() scans PATH, and is not safe to use in this concept.
>>
>> That's quite annoying.
>>
>>> I think we should call execve() instead. First, it is
async-signal-safe.
>>> Second, it could take "env.ptr" directly; I do find the
"environ"
>>> assignment a bit dubious, even if it happens to conform to POSIX.
>>>
>>> What image are we executing here, to begin with? Do we really
depend on
>>> PATH searching? Or do we rely on execvp() transparently launching
shell
>>> scripts?
>>
>> Yes we do depend on path search.  It is usually called in cases such
>> as this:
>>
>>
https://gitlab.com/nbdkit/libnbd/-/blob/5a02c7d2cc6a201f9e5531c0c20c2f3c22b805a2/examples/open-qcow2.c#L35
>>
>> I wonder if we can just ignore this one until someone complains
>> about the bug.
> 
> The alternative to relying on execvp() to scan PATH is to pre-scan
> PATH ourselves before fork().  I wish there were a helper function in
> glibc that would quickly return the absolute path that execvp() would
> otherwise utilize.
We use execvp() in another spot -- "generator/states-connect.c".

I'll try to come up with some new nbd_internal_fork_safe_*() APIs, for
preparing and then "running" execvp. It's not only PATH scanning
that we
need, but launching the shell (for which I'll just assume /bin/sh) if
the initial execve() fails with ENOEXEC. And running the shell with
execve() requires a modified argv too -- it needs to be shifted down one
slot, and the first argument must be duplicated (as it will stand for
both the name the shell's going to be executed under, and the name of
the script file for the shell to parse).
> fexecve() also comes in handy for avoiding TOCTTOU
> races, but it is newer and not as widely available.
For now I'd just like to get rid of execvp(). It bothers me very much
that we avoid something as harmless as snprintf() but rely on execvp()
not causing issues. TOCTTOU is another level (and I suspect that this
TOCTTOU race is harmless, unless we define a threat model).

Thank you both for all the comments -- I'd like to post a series that
only implements the refactorings and the execvp() replacement. I hope we
can delay the LISTEN_FDNAMES addition until after that.

Laszlo

Laszlo Ersek

2023-Feb-05 08:10 UTC

head link

[Libguestfs] [PATCH libnbd] generator: Pass LISTEN_FDNAMES=nbd with systemd socket activation

On 2/1/23 17:17, Laszlo Ersek wrote:> On 1/31/23 18:19, Eric Blake wrote:
>> The alternative to relying on execvp() to scan PATH is to pre-scan
>> PATH ourselves before fork().  I wish there were a helper function in
>> glibc that would quickly return the absolute path that execvp() would
>> otherwise utilize.
>
> We use execvp() in another spot -- "generator/states-connect.c".
>
> I'll try to come up with some new nbd_internal_fork_safe_*() APIs, for
> preparing and then "running" execvp.
There's a problem.

getenv() is not (generally) thread-safe -- even if we called
getenv("PATH") before forking.

  https://pubs.opengroup.org/onlinepubs/9699919799/functions/getenv.html

An application may call a libnbd interface from one of its threads, and
call getenv() -- or setenv(), or unsetenv(), or putenv() -- from another
one of its threads. Because there is no synchronization around
*existent* getenv() calls in libnbd (LOGNAME, HOME, LIBNBD_DEBUG), this
would lead to undefined behavior.

We can't robustly pre-fetch these environment variables in a library
constructor funcion, like errors_init(), for two reasons:

- Client code that manipulates these variables intentionally, would
  break, even if single-threaded. Example:
"tests/debug-environment.c".

- dlopen() and dlsym() are thread-safe functions (all POSIX functions
  are, unless explicitly noted otherwise

<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_407>).
  If an application were to dlopen() libnbd, it could expect dlopen() to
  be thread-safe, and so no libnbd constructor function within would be
  permitted to call thread-unsafe functions.

A robust solution would call for an explicit libnbd initialization
function (where libnbd would snapshot the relevant environment variables
into global variables, and the application would be told, by way of
documentation, that this would have to be done before any threads are
created). But that breaks compatibility with existent libnbd
applications.

Another robust solution would be for libnbd to:

- expose a global mutex called "env_mutex" (possibly wrapped into an
  nbd_ accessor API, such as "nbd_lock_environment" or something),

- acquire the mutex around all getenv() and similar calls,

- document for any client application that any environment accesses
  anywhere in the application must be protected by this lock. In
  particular, because getenv() my *modify* internal state, its return
  value must be copied before releasing the mutex.

This is per POSIX:
> The returned string pointer might be invalidated or [...] the string
> content might be overwritten by a subsequent call to getenv
Linux/glibc is a bit more helpful:
<https://man7.org/linux/man-pages/man3/getenv.3.html> calls getenv()
"MT-Safe env", and that is explained as follows
<https://man7.org/linux/man-pages/man7/attributes.7.html>:
>        env    Functions marked with env as an MT-Safety issue access
>               the environment with getenv(3) or similar, without any
>               guards to ensure safety in the presence of concurrent
>               modifications.
>
>               We do not mark these functions as MT-Unsafe, however,
>               because functions that modify the environment are all
>               marked with const:env and regarded as unsafe.  Being
>               unsafe, the latter are not to be called when multiple
>               threads are running or asynchronous signals are enabled,
>               and so the environment can be considered effectively
>               constant in these contexts, which makes the former safe.
IOW the argument on Linux/glibc is that "getenv() will not break
getenv(), because it does not modify anything".

It still needs to be protected from writes: setenv() is marked as
"MT-Unsafe const:env"
<https://man7.org/linux/man-pages/man3/setenv.3.html>, and see the
following excerpt about const:env", again from attributes(7):
>        const  Functions marked with const as an MT-Safety issue non-
>               atomically modify internal objects that are better
>               regarded as constant, because a substantial portion of the
>               GNU C Library accesses them without synchronization.
>               Unlike race, which causes both readers and writers of
>               internal objects to be regarded as MT-Unsafe, this mark is
>               applied to writers only.  Writers remain MT-Unsafe to
>               call, but the then-mandatory constness of objects they
>               modify enables readers to be regarded as MT-Safe (as long
>               as no other reasons for them to be unsafe remain), since
>               the lack of synchronization is not a problem when the
>               objects are effectively constant.
>
>               The identifier that follows the const mark will appear by
>               itself as a safety note in readers.  Programs that wish to
>               work around this safety issue, so as to call writers, may
>               use a non-recursive read-write lock associated with the
>               identifier, and guard all calls to functions marked with
>               const followed by the identifier with a write lock, and
>               all calls to functions marked with the identifier by
>               itself with a read lock.
I don't like it:

- it requires the same amount of work in the source code (both libnbd
  and client apps),

- readers excluding readers is safer per POSIX, where getenv() may be
  entirely MT-Unsafe,

- the environment is not a highly contended resource, so the performance
  impact of getenv() excluding getenv() is likely negligible.

Summary:
- ignore the whole thing?
- update the documentation, add locking APIs, implemented with a mutex?
- update the documentation, add locking APIs, implemented with an
  rwlock?

Please comment,
Laszlo

Libguestfs - Feb 2023 - [PATCH libnbd] generator: Pass LISTEN_FDNAMES=nbd with systemd socket activation

[Libguestfs] [PATCH libnbd] generator: Pass LISTEN_FDNAMES=nbd with systemd socket activation

[Libguestfs] [PATCH libnbd] generator: Pass LISTEN_FDNAMES=nbd with systemd socket activation