Eric Blake
2020-Feb-14  19:02 UTC
[Libguestfs] alternatives for hooking dlopen() without LD_LIBRARY_PATH or LD_AUDIT?
I've got a situation where I need to hook a dlopen() made by VDDK, a 
proprietary library, where it passes a relative name expecting to 
resolve to a copy of several libraries, including libstdc++.so, that it 
installs alongside itself, and fails to load if that resolves to the 
system libstdc++.so.  The simplest solution of providing LD_LIBRARY_PATH 
is enough to load VDDK, but then poisons any child processes, which 
likewise fail to load if they pick up VDDK's libstdc++.so instead of the 
system one.  Up to now, we've documented this and put the burden on the 
end user, who has to write the convoluted:
LD_LIBRARY_PATH_save=$LD_LIBRARY_PATH \
  LD_LIBRARY_PATH=/path/to/vddklibs:$LD_LIBRARY_PATH \
  nbdkit vddk libdir=/path/to/vddklibs file --run \
    'LD_LIBRARY_PATH=$LD_LIBRARY_PATH_save; program args'
where we would rather the end-user could get away with a more concise:
nbdkit vddk libdir=/path/to/vddklibs file --run 'program args'
Sequentially, we have this scenario:
nbdkit vddk libdir=/path/to/libs file --run 'program args'
- nbdkit binary calls dlopen("/path/to/nbdkit-vddk-plugin.so")
   - nbdkit-vddk-plugin.so calls
     dlopen("/path/to/libs/libvixDiskLib.so") using the libdir= argument
     to load vddk (rather than dlopen("libvixDiskLib.so") relying on
     LD_LIBRARY_PATH)
   - vddk's initializer calls dlopen("libcrypto.so") expecting to
     open /path/to/libs/libcrypto.so; for that to work, either
     LD_LIBRARY_PATH must point there (in which case we have to scrub
     it so that a child process is not penalized), or we have to find
     a way to rewrite vddk's dlopen call from relative into absolute
     before passing it to the real dlopen
- nbdkit binary spawns a child process to exec 'program args'
   - program does not want /path/to/libs in its search path
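The "rewrite from relative into absolute" step in the scenario above can be sketched as a small helper. This is only an illustration, not code from nbdkit: the name rewrite_relative() and its buffer-passing convention are hypothetical, and it assumes that a "relative" request means a bare name with no '/' in it.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: if NAME is a bare library name (no '/') and a
 * readable file of that name exists under LIBDIR, write the absolute path
 * into BUF and return BUF.  Otherwise return NAME unchanged so that the
 * normal search rules still apply. */
static const char *
rewrite_relative (const char *libdir, const char *name,
                  char *buf, size_t buflen)
{
  int n;

  if (libdir == NULL || name == NULL || strchr (name, '/') != NULL)
    return name;

  n = snprintf (buf, buflen, "%s/%s", libdir, name);
  if (n < 0 || (size_t) n >= buflen)
    return name;                /* would not fit; leave the name alone */

  if (access (buf, R_OK) != 0)
    return name;                /* libdir has no such file */

  return buf;
}
```

Falling back to the original name when libdir has no matching file keeps requests for ordinary system libraries working as before.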
Writing my own dlopen() wrapper directly in nbdkit seems like a 
non-starter (my override has to come from a shared library before it can 
replace the shared version that would be imported from -ldl, at least 
for all subsequent shared library loads that want to bind to the 
override).  And if I read 'man dlopen' correctly, since nbdkit used 
dlopen() to load nbdkit-vddk-plugin.so, then dlopen() is already bound 
in the main context, so unless I use RTLD_DEEPBIND from nbdkit, then 
nbdkit-vddk-plugin.so will also see dlopen() bound to -ldl rather than 
anything it loads locally; but even with RTLD_DEEPBIND, it sounds like 
that higher precedence lasts only for nbdkit-vddk-plugin.so and does not 
extend to later bindings performed for libvixDiskLib.so (which means 
vddk is back to -ldl's dlopen, without my hook).  Thus, to hook dlopen 
within the same process, I need some way to create a scope where I can 
provide a shared dlopen() that will take precedence when resolving 
symbols during the load of libvixDiskLib.so, but where that hook code 
can still defer back to the real dlopen() from -ldl and does not 
penalize child processes.
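For reference, a hook of the kind described here (an override that still defers to the real dlopen()) is commonly written with dlsym(RTLD_NEXT, ...). The following is a minimal sketch under stated assumptions, not the code from any posted patch: shim_set_libdir() and the static shim_libdir variable are hypothetical names, and the sketch says nothing about how the override gets ahead of the real dlopen() in symbol search order, which is the hard part discussed above.

```c
#define _GNU_SOURCE             /* for RTLD_NEXT */
#include <dlfcn.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: directory holding the private libraries.  Whoever loads
 * the proprietary library would set this first. */
static const char *shim_libdir;

void
shim_set_libdir (const char *dir)
{
  shim_libdir = dir;
}

/* Interposed dlopen(): prefer libdir/NAME for bare names, then defer to
 * the next dlopen() in symbol search order (normally glibc's). */
void *
dlopen (const char *filename, int flags)
{
  static void *(*real_dlopen) (const char *, int);
  char path[PATH_MAX];

  if (real_dlopen == NULL)
    real_dlopen = (void *(*) (const char *, int)) dlsym (RTLD_NEXT, "dlopen");
  if (real_dlopen == NULL)
    return NULL;                /* no underlying dlopen() found */

  if (shim_libdir != NULL && filename != NULL
      && strchr (filename, '/') == NULL) {
    int n = snprintf (path, sizeof path, "%s/%s", shim_libdir, filename);
    if (n > 0 && (size_t) n < sizeof path && access (path, R_OK) == 0)
      filename = path;          /* use the private copy */
  }

  return real_dlopen (filename, flags);
}
```

Because the rewrite happens inside the process, nothing is added to the environment, so a child process started later is unaffected.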
I managed to create a solution that avoids the need to set 
LD_LIBRARY_PATH at all by installing a shared library that hooks 
dlopen(), then loading both my shim and vddk via dlmopen() without the 
use of RTLD_GLOBAL or RTLD_DEEPBIND.  More links on my solution:
https://www.redhat.com/archives/libguestfs/2020-February/msg00154.html
https://bugzilla.redhat.com/show_bug.cgi?id=1756307#c7
https://sourceware.org/bugzilla/show_bug.cgi?id=15971#c5
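As a rough illustration of the dlmopen() approach (the linked patches are the authoritative version), the sketch below loads a shim into a fresh namespace and then loads a second library into that same namespace. The function name load_in_private_namespace() and both path arguments are hypothetical. The sketch assumes glibc, and assumes that objects loaded later into a namespace resolve symbols against the first object loaded there.

```c
#define _GNU_SOURCE             /* for dlmopen(), LM_ID_NEWLM, RTLD_DI_LMID */
#include <dlfcn.h>
#include <stddef.h>

/* Load SHIM into a new link-map namespace, then load LIB into the same
 * namespace, with the intent that LIB's references (including dlopen) can
 * bind to SHIM.  Returns the handle for LIB, or NULL on failure, in which
 * case dlerror() describes the problem. */
void *
load_in_private_namespace (const char *shim, const char *lib)
{
  Lmid_t lmid;
  void *shim_handle, *lib_handle;

  /* No RTLD_GLOBAL or RTLD_DEEPBIND, matching the description above. */
  shim_handle = dlmopen (LM_ID_NEWLM, shim, RTLD_NOW);
  if (shim_handle == NULL)
    return NULL;

  /* Ask which namespace the loader created for the shim. */
  if (dlinfo (shim_handle, RTLD_DI_LMID, &lmid) != 0) {
    dlclose (shim_handle);
    return NULL;
  }

  lib_handle = dlmopen (lmid, lib, RTLD_NOW);
  if (lib_handle == NULL)
    dlclose (shim_handle);
  return lib_handle;
}
```

Symbols are then looked up with dlsym() on the returned handle, since nothing in the new namespace is visible to the main program's normal symbol resolution.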
However, when Florian saw it, he suggested that my solution of dlmopen() 
for a shim library that overrides dlopen() is reinventing what 
la_objsearch() can already do.  This is in part because the moment you 
dlmopen() a library into a separate namespace, you can't debug it (both 
glibc and gdb need additional patches to expose alternative namespaces 
for debugging), but there may be other nasty surprises lurking.
But after spending more than an hour playing with la_objsearch() and 
reading 'man rtld-audit', it looks like an audit library cannot be 
triggered in glibc except by listing it in LD_AUDIT in the environment 
during exec - which is back to the same problem we have with needing 
LD_LIBRARY_PATH in the environment.  Furthermore, although I know that 
glibc's audit interface is slightly different from the Solaris version 
it copied from, the Solaris documentation states that an audit library 
has some rather tough restrictions (including that using 'malloc' is 
unsafe, 
https://docs.oracle.com/cd/E36784_01/html/E36857/chapter6-3.html#scrolltoc 
"Some system interfaces assume that the interfaces are the only instance 
of their implementation within a process. Examples of such 
implementations are signals and malloc(3C). Audit libraries should avoid 
using such interfaces, as doing so can inadvertently alter the behavior 
of the application.").  But Solaris also stated that a library could 
serve as an audit entry point without LD_AUDIT if it is registered 
locally, via -Wl,-paudit.so.1 when creating the shared library 
(https://docs.oracle.com/cd/E36784_01/html/E36857/chapter6-18.html#scrolltoc); 
it doesn't seem that this functionality exists with glibc 
(/usr/lib64/libaudit.so on Linux has nothing to do with rtld-audit).
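For context, an audit library using la_objsearch() might look roughly like the sketch below, following the interface in 'man rtld-audit'. The PRIVATE_LIBDIR macro is a hypothetical placeholder reusing the example path from earlier in this message. As noted above, glibc would still only activate such a library when it is named in LD_AUDIT.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical placeholder for the private library directory. */
#ifndef PRIVATE_LIBDIR
#define PRIVATE_LIBDIR "/path/to/vddklibs"
#endif

/* Called first by the dynamic linker; report the interface version that
 * this auditor was built against. */
unsigned int
la_version (unsigned int version)
{
  (void) version;
  return LAV_CURRENT;
}

/* Called for each step of a library search.  Returning a different string
 * redirects the search; returning NAME keeps the default behavior. */
char *
la_objsearch (const char *name, uintptr_t *cookie, unsigned int flag)
{
  static char path[4096];       /* static buffer, so no malloc is needed */
  int n;

  (void) cookie;
  if (flag != LA_SER_ORIG || strchr (name, '/') != NULL)
    return (char *) name;       /* only rewrite the original bare name */

  n = snprintf (path, sizeof path, "%s/%s", PRIVATE_LIBDIR, name);
  if (n < 0 || (size_t) n >= sizeof path || access (path, R_OK) != 0)
    return (char *) name;       /* no private copy; search normally */

  return path;
}
```

The static buffer is a simplification that avoids malloc, in the spirit of the Solaris restrictions quoted above; it is overwritten on the next call.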
Does anyone have any ideas on how to let a shared library implement an 
audit interface for just its own process, without having to edit 
LD_AUDIT or re-exec the process?  Or is there yet another way to hook a 
program to rewrite misbehaving dlopen() calls without relying on either 
dlmopen() or la_objsearch(), or requiring pre-set environment variables, 
or having to re-exec a process?
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
Eric Blake
2020-Feb-14  21:29 UTC
Re: [Libguestfs] alternatives for hooking dlopen() without LD_LIBRARY_PATH or LD_AUDIT?
On 2/14/20 1:02 PM, Eric Blake wrote:
> Writing my own dlopen() wrapper directly in nbdkit seems like a
> non-starter (my override has to come from a shared library before it can
> replace the shared version that would be imported from -ldl, at least
> for all subsequent shared library loads that want to bind to the
> override).
Maybe I spoke too soon.  I've tried another approach that looks like it 
will do what I want: put my shim dlopen() in a shared library, but link 
nbdkit against that shared library PRIOR to -ldl (so that name lookup 
for dlopen resolves there first).  The shim library in turn depends on 
-ldl so that dlsym(RTLD_NEXT, "dlopen") still lets me get to the real 
dlopen.  And by linking it directly into nbdkit, rather than into the 
nbdkit-vddk-plugin.so that gets loaded later, the first bound dlopen() 
in use for all subsequent loads is from my shim.  It's still a bit less 
clean than I'd like (it requires tighter coupling between nbdkit and 
nbdkit-vddk-plugin.so than what used to exist), but the fact that it 
works without dlmopen() or LD_LIBRARY_PATH is in its favor.  I'm now 
polishing up the experiment, and will post the patch when it's ready.
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
Carlos O'Donell
2020-Feb-14  22:20 UTC
Re: [Libguestfs] alternatives for hooking dlopen() without LD_LIBRARY_PATH or LD_AUDIT?
On 2/14/20 4:29 PM, Eric Blake wrote:
> On 2/14/20 1:02 PM, Eric Blake wrote:
>
>> Writing my own dlopen() wrapper directly in nbdkit seems like a
>> non-starter (my override has to come from a shared library before
>> it can replace the shared version that would be imported from -ldl,
>> at least for all subsequent shared library loads that want to bind
>> to the override).
>
> Maybe I spoke too soon. I've tried another approach that looks like
> it will do what I want: put my shim dlopen() in a shared library, but
> link nbdkit against that shared library PRIOR to -ldl (so that name
> lookup for dlopen resolves there first). The shim library in turn
> depends on -ldl so that dlsym(RTLD_NEXT, "dlopen") still lets me get
> to the real dlopen. And by linking it directly into nbdkit, rather
> than into the nbdkit-vddk-plugin.so that gets loaded later, the first
> bound dlopen() in use for all subsequent loads is from my shim. It's
> still a bit less clean than I'd like (it requires tighter coupling
> between nbdkit and nbdkit-vddk-plugin.so than what used to exist),
> but the fact that it works without dlmopen() or LD_LIBRARY_PATH is in
> its favor. I'm now polishing up the experiment, and will post the
> patch when it's ready.
I think that's the best solution you're going to get.
The alternatives (LD_LIBRARY_PATH, direct loader invocation, dlmopen) 
all have limitations that aren't helpful to your particular design.
You have design strategies that look like this:
- Move the object higher in the search order at link time (interposition)
   - Investigate static link order.
   - Investigate dynamic loader search order
- Change what object is searched for
   - LD_LIBRARY_PATH, DT_RPATH, DT_RUNPATH, etc.
   - LD_AUDIT with la_objsearch()
Your "shim dlopen()" is a case of moving the static link order around to 
ensure your shim is used first.
-- 
Cheers,
Carlos.
Eric Blake
2020-Feb-17  15:03 UTC
Re: [Libguestfs] alternatives for hooking dlopen() without LD_LIBRARY_PATH or LD_AUDIT?
On 2/14/20 3:29 PM, Eric Blake wrote:
> On 2/14/20 1:02 PM, Eric Blake wrote:
>
>> Writing my own dlopen() wrapper directly in nbdkit seems like a
>> non-starter (my override has to come from a shared library before it
>> can replace the shared version that would be imported from -ldl, at
>> least for all subsequent shared library loads that want to bind to the
>> override).
>
> Maybe I spoke too soon. I've tried another approach that looks like it
> will do what I want: put my shim dlopen() in a shared library, but link
> nbdkit against that shared library PRIOR to -ldl (so that name lookup
> for dlopen resolves there first). The shim library in turn depends on
> -ldl so that dlsym(RTLD_NEXT, "dlopen") still lets me get to the real
> dlopen. And by linking it directly into nbdkit, rather than into the
> nbdkit-vddk-plugin.so that gets loaded later, the first bound dlopen()
> in use for all subsequent loads is from my shim. It's still a bit less
> clean than I'd like (it requires tighter coupling between nbdkit and
> nbdkit-vddk-plugin.so than what used to exist), but the fact that it
> works without dlmopen() or LD_LIBRARY_PATH is in its favor. I'm now
> polishing up the experiment, and will post the patch when it's ready.
Progress report: I've posted a v4 series that relies on a shared library 
in the main executable; but I'm still trying to see if I can further 
reduce things (maybe with -rdynamic) so that the main binary itself 
provides the dlopen() override without needing an auxiliary shared 
library.
https://www.redhat.com/archives/libguestfs/2020-February/msg00162.html
> But after spending more than an hour playing with la_objsearch() and
> reading 'man rtld-audit', it looks like an audit library cannot be
> triggered in glibc except by listing it in LD_AUDIT in the environment
> during exec - which is back to the same problem we have with needing
> LD_LIBRARY_PATH in the environment.  Furthermore, although I know that
> glibc's audit interface is slightly different from the Solaris version
> it copied from, the Solaris documentation states that an audit library
> has some rather tough restrictions (including that using 'malloc' is
> unsafe,
> https://docs.oracle.com/cd/E36784_01/html/E36857/chapter6-3.html#scrolltoc
> "Some system interfaces assume that the interfaces are the only instance
> of their implementation within a process. Examples of such
> implementations are signals and malloc(3C). Audit libraries should avoid
> using such interfaces, as doing so can inadvertently alter the behavior
> of the application.").  But Solaris also stated that a library could
> serve as an audit entry point without LD_AUDIT if it is registered
> locally, via -Wl,-paudit.so.1 when creating the shared library
> (https://docs.oracle.com/cd/E36784_01/html/E36857/chapter6-18.html#scrolltoc);
> it doesn't seem that this functionality exists with glibc
> (/usr/lib64/libaudit.so on Linux has nothing to do with rtld-audit).
I'm just now noticing that 'man ld' reports that you may pass '--audit 
LIB' during linking to add a DT_AUDIT entry naming a library that 
implements the audit interface, which sounds like it might be an 
alternative to LD_AUDIT for getting a library with la_objsearch() to 
actually do something (but doesn't obviate the need for la_objsearch() 
to be in a separate library, rather than part of the main executable, 
unless a library can be reused as its own audit library...).
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org