thr3ads.net - llvm dev - [llvm-dev] LLD support for mach-o aliases (weak or otherwise) [Jun 2017]


Louis Gerbarg via llvm-dev

2017-Jun-14 23:35 UTC

[llvm-dev] LLD support for mach-o aliases (weak or otherwise)

> On Jun 14, 2017, at 2:47 PM, Michael Clark via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> On 15 Jun 2017, at 6:50 AM, Louis Gerbarg <lgerbarg at apple.com> wrote:
>>
>>> On Jun 6, 2017, at 4:08 PM, Michael Clark via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> Hi Folks,
>>> 
>>> I’m working on a port of musl libc to macos (arch triple is
“x86_64-xnu-musl”) to solve some irreconcilable issues I’m having with
libSystem.dylib. I don’t want to use glibc for various reasons, mainly because I
want to static link. I have static PIE + ASLR working which is not actually
supported by the Apple toolchain (*1), but I managed to get it to work. I’m sure
Apple might say “Don’t do that”, but from looking at the history of the xnu
kernel ABI, it seems to be very stable between versions.
>> 
>> I am from Apple, and I will say “Don’t do that.” The kernel ABI for our
>> platforms is not stable; we only guarantee stability at the dynamic link
>> boundary (in this case public symbols exported from libSystem). While the kernel
>> syscall numbers have not changed (though the kernel team reserves the right to
>> do that), the parameter lists and argument marshaling for them certainly have
>> changed. We also do not support static executables on our system.
>> 
>> We even had bincompat issues related to this during the last major release
>> (macOS 10.12 Sierra): Go implemented its own syscall support, which caused
>> all of their binaries that used gettimeofday to break when the internal
>> interface changed <https://github.com/golang/go/issues/16606>. More broadly,
>> you can look at a discussion of their issues here:
>> <https://github.com/golang/go/issues/16606>. In their case they want to avoid
>> invoking an external linker like ld64 or lld, as opposed to avoiding
>> libSystem, but the effect is the same: they shipped a tool that caused
>> unsuspecting developers to have surprise bincompat issues that were entirely
>> avoidable.
> 
> I’m aware that Go has had some issues with the XNU ABI boundary.
> 
> I’m working on a CPU simulator / binary translator and I need control of
> the process address space layout. It seems I may ultimately need to use
> Hypervisor.framework, however that is a lot more work in the short term.

I actually thought about mentioning Hypervisor.framework, but I was not sure
about your use case. If you need full control of the address space (PAGE_ZERO
control, overriding the shared cache mappings, etc.) that is really the only
supported mechanism.

> The issue I am having with libSystem.dylib is the lack of weak linkage
(versus weak_import) i.e. weak aliases. I don’t want to use a wrapper binary
with DYLD_INSERT_LIBRARIES. I want to interpose Libc symbols with some of the
symbols present in my binary (memory allocator, mmap). Interposition support is
somewhat lacking in the Mach-O toolchain and runtime linker despite the Mach-O
format technically supporting what I need (N_INDR and N_WEAK_DEF).
> 
> - https://developer.apple.com/documentation/kernel/nlist_64

Dyld does not generally use nlists at runtime except for things like dladdr(),
and has not for the last 10 years or so. Instead dyld uses a trie to publish
exports, and a small byte code language to describe binding imports. We
still support using nlists for old binaries, but anything built with recent
tools also contains the newer trie and bind op codes, which will be used if they
are available. I do not think our tries can express the sort of import semantics
you want.

I see two ways of potentially doing it (short of Hypervisor.framework). Both of
them are a bit gross and have some bincompat risks, but given you are an open
source project and can rev if need be, that may not be an enormous issue:

You could specify a custom segment in your executable with zero file size and a
vm size that blocks out the address range you need, and then unmap it. There may
be practical limits on it that prevent you from achieving what you want.

You could use implicit interposing. This is a feature added so that ASAN binaries
can avoid the whole re-exec with DYLD_INSERT_LIBRARIES issue. It is not
guaranteed to be stable, but in practice it is probably the most stable option
short of using a hypervisor. The way it works:

Define all the symbols in a dylib along with an interpose section (as though you
were going to load it with DYLD_INSERT_LIBRARIES). Directly link your executable
to that dylib. Dyld will discover the interpose during dependency analysis (before
libSystem initializers are run) and apply the interpose. This has only been
tested in the case of our sanitizer runtimes, but it *SHOULD* work.
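
As a rough illustration of what such a dylib might contain, here is a minimal
sketch. The names demo_mmap, demo_mmap_call_count and demo_interpose_mmap are
hypothetical and not taken from any real runtime. The tuple layout mirrors what
the DYLD_INTERPOSE macro in <mach-o/dyld-interposing.h> generates. The sketch
assumes dyld does not apply an interpose to the image that defines the tuple, so
the inner call reaches the real mmap; that assumption should be verified on the
target system.

```c
#define _DEFAULT_SOURCE   /* expose MAP_ANON on glibc; harmless elsewhere */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Number of calls routed through the replacement (illustration only). */
static unsigned long demo_mmap_calls;

/* Hypothetical replacement: counts the call, then forwards to mmap. */
void *demo_mmap(void *addr, size_t len, int prot, int flags, int fd, off_t off)
{
    demo_mmap_calls++;
    return mmap(addr, len, prot, flags, fd, off);
}

/* Accessor so a caller can observe that the replacement ran. */
unsigned long demo_mmap_call_count(void)
{
    return demo_mmap_calls;
}

#ifdef __APPLE__
/* One (replacement, replacee) tuple placed in __DATA,__interpose.
 * This is only emitted for Mach-O; other platforms skip it. */
__attribute__((used, section("__DATA,__interpose")))
static struct { const void *replacement; const void *replacee; }
demo_interpose_mmap = {
    (const void *)(unsigned long)&demo_mmap,
    (const void *)(unsigned long)&mmap
};
#endif
```

Built as a dylib and linked directly into the executable, the tuple would be
picked up as described above; on other platforms the file still compiles, and
demo_mmap can be called by hand to exercise the forwarding logic.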
> 
> If I could use N_INDR and N_WEAK_DEF to have early bound (runtime link
time) interposition with symbols in my binary replacing the C library allocator
and mmap, and have libSystem use my implementations then I would be happy.
libSystem itself would need to use weak aliases. This is possible with C
libraries on other platforms.
> 
> I’ve tried relentlessly to intercept the malloc_zone implementation.
malloc_zone_register is not sufficient as some of the internal zones are tied to
the internals of Libc and I am getting heap collisions with Libc allocated
objects and my guest address space. On Linux I have enough control to do what I
need and can interpose my symbols to implement versions of libc functions that I
wish to override. The problem on darwin is that I am not able to interpose the
malloc implementation until main starts, and at that point it is too late as
the C library already has created its internal zones. I’m also unable to
interpose mmap. I have already looked at the interpose symbol tricks but they
don’t meet my purposes (not wanting to re-exec with DYLD_INSERT_LIBRARIES).
Weak aliases from libSystem to the allocator implementation and various public
symbols along with N_INDR and N_WEAK_DEF would be required for me to achieve
what I need to achieve (somewhat similarly to the elegant internal
implementation of musl libc).
> 
> With my current solution (musl on xnu) I have successfully reserved 0x1000
– 0x7fff_0000_0000. Essentially the low 128TiB minus 4GiB at the top of the
address space where I place my translator and translator stack. This is
satisfactory for my user mode simulator to emulate Linux processes on macOS.
> 
> I think Hypervisor.framework is probably the correct interface to be using
> if I want to avoid the kernel ABI, however that is a lot more work than making
> syscall wrappers and I would need to implement communication from VM process to
> the host process.

I think that this is probably your best choice from a binary compatibility
standpoint in the long run. It is a lot of work, though I am not sure if it is
really that much more work than trying to port a new libc or maintain a custom
toolchain.

Louis
> 
> I’m actually implementing Linux syscall emulation in a user simulator so
the kernel ABI is probably the technically correct layer. The full system
emulator ultimately needs to use Hypervisor.framework if I am to use hardware
paging instead of soft MMU. I have two simulators, a user-mode sim that emulates
the Linux ABI and a full system emulator: https://rv8.io/
<https://rv8.io/> and I really want to support RISC-V Linux on macOS in
the user mode simulator.
> 
> Proper Linux ABI emulation on macOS would ultimately require kernel
support, at minimum something like binfmt misc, but ideally a kext that
implements another ABI personality (much like Linux ABI emulation on Windows) in
addition to the BSD personality. In fact the FreeBSD linux compat could be used
if the FreeBSD portion of XNU is synced up with current, and we’d get bug fixes
for long standing issues like the macOS TCP_NOPUSH bug that has long since been
fixed in FreeBSD.
> 
>> Ultimately, if you are doing this on your own for fun, that’s great, but if
>> this is something you intend to ship to other people, please reconsider. It
>> is more than a theoretical concern that it will break.
>> 
>> Louis
>> 
>>> In any case the musl libc source makes extensive use of weak
aliases, perhaps to allow easier interposition of C library routines, however
aliases, weak or otherwise are not currently supported by ld64.
>>> 
>>> It appears that the mach-o format supports aliases, but the
functionality has not been exposed via the linker (ld64/LLD).
>>> 
>>> - http://blog.omega-prime.co.uk/?p=121
<http://blog.omega-prime.co.uk/?p=121>
>>> 
>>> The musl code does the following which currently errors out saying
aliases are not currently supported:
>>> 
>>> #undef weak_alias
>>> #define weak_alias(old, new) \
>>>         extern __typeof(old) new __attribute__((weak, alias(#old)))
>>> 
>>> and the macro is used internally like this:
>>> 
>>> int __pthread_join(pthread_t t, void **res)
>>> {
>>>         // implementation here
>>> }
>>> 
>>> weak_alias(__pthread_join, pthread_join);
>>> 
>>> The problem is the actual export used by clients is an alias and I
want to maintain source compatibility.
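
One way to keep both names without alias support is a weak forwarding
definition. The sketch below is only an illustration and not what musl or any
Apple toolchain does. demo_join_impl and demo_join are hypothetical stand-ins
for __pthread_join and pthread_join. Unlike a true alias, the public name is a
separate function with its own address, so pointer comparisons between the two
names differ and each call goes through one extra hop.

```c
#include <stddef.h>

/* Implementation under an internal name (hypothetical stand-in). */
int demo_join_impl(int handle, void **res)
{
    if (res != NULL)
        *res = NULL;              /* nothing to hand back in this sketch */
    return handle < 0 ? -1 : 0;   /* reject negative handles, accept the rest */
}

/* Public name: a weak definition that forwards, instead of a weak alias.
 * A strong definition of demo_join elsewhere would take precedence. */
__attribute__((weak)) int demo_join(int handle, void **res)
{
    return demo_join_impl(handle, res);
}
```

Callers keep using demo_join(handle, &result) unchanged, which is the source
compatibility the macro was meant to provide.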
>>> 
>>> I seem to have found a way to semi-emulate aliases (at least within
one module). My goal is to at least turn them into strong aliases somehow, so I
can at a minimum make the musl source compatible with clang on macos. The
following compiles but foo is not exported:
>>> 
>>> $ cat a.c
>>> #include <stdio.h>
>>> 
>>> void foo() __attribute__((weak_import)) __asm("_bar");
>>> 
>>> void bar()
>>> {
>>>         printf("bar\n");
>>> }
>>> 
>>> int main()
>>> {
>>>         foo();
>>> }
>>> 
>>> $ cc -c a.c -o a.o
>>> $ nm a.o
>>> 0000000000000000 T _bar
>>> 0000000000000020 T _main
>>>                  U _printf
>>> 
>>> Any ideas how I can get foo as an exported symbol? 
>>> 
>>> Is weak alias or plain alias support planned for mach-o in LLD?
>>> 
>>> The goal at a minimum is to make the weak_alias macro emit a strong
alias with clang/ld64 or clang/LLD? so I don’t need to diverge too much from the
upstream musl source (as the lack of alias support currently requires me to
rename function declarations in the source). Of course pthreads which I’m
working on now are going to be completely different… but musl has support for
architecture specific overrides in its build system.
>>> 
>>> BTW I now have some quite non-trivial programs compiling against
musl-xnu + libcxx + libcxxabi on macos.
>>> 
>>> There are a lot of libcxx changes like this:
>>> 
>>> -#ifdef __APPLE__
>>> +#if defined(__APPLE__) && !defined(_LIBCPP_HAS_MUSL_LIBC)
>>> 
>>> Michael.
>>> 
>>> [1] https://gist.github.com/michaeljclark/0a805652ec4be987a782afb902f06a99
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>

Michael Clark via llvm-dev

2017-Jun-15 00:51 UTC

[llvm-dev] LLD support for mach-o aliases (weak or otherwise)

> On 15 Jun 2017, at 11:35 AM, Louis Gerbarg <lgerbarg at apple.com> wrote:
>
>> On Jun 14, 2017, at 2:47 PM, Michael Clark via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> On 15 Jun 2017, at 6:50 AM, Louis Gerbarg <lgerbarg at apple.com> wrote:
>>>
>>>> On Jun 6, 2017, at 4:08 PM, Michael Clark via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>> 
>>>> Hi Folks,
>>>> 
>>>> I’m working on a port of musl libc to macos (arch triple is
“x86_64-xnu-musl”) to solve some irreconcilable issues I’m having with
libSystem.dylib. I don’t want to use glibc for various reasons, mainly because I
want to static link. I have static PIE + ASLR working which is not actually
supported by the Apple toolchain (*1), but I managed to get it to work. I’m sure
Apple might say “Don’t do that”, but from looking at the history of the xnu
kernel ABI, it seems to be very stable between versions.
>>> 
>>> I am from Apple, and I will say “Don’t do that.” The kernel ABI for
our platforms is not stable, we only guarantee stability at the dynamic link
boundary (in this case public symbols exported from libSystem). While the kernel
syscall numbers have not changed (though the kernel team reserves the right to
do that), the parameter lists and argument marshaling for them certainly has
changed. We also do not support static executables on our system.
>>> 
>>> We even had bincompat issues related to this during the last major release
>>> (macOS 10.12 Sierra): Go implemented its own syscall support, which caused
>>> all of their binaries that used gettimeofday to break when the internal
>>> interface changed <https://github.com/golang/go/issues/16606>. More broadly,
>>> you can look at a discussion of their issues here:
>>> <https://github.com/golang/go/issues/16606>. In their case they want to avoid
>>> invoking an external linker like ld64 or lld, as opposed to avoiding
>>> libSystem, but the effect is the same: they shipped a tool that caused
>>> unsuspecting developers to have surprise bincompat issues that were entirely
>>> avoidable.
>> 
>> I’m aware that Go has had some issues with the XNU ABI boundary.
>> 
>> I’m working on a CPU simulator / binary translator and I need control
of the process address space layout. It seems I may ultimately need to use
Hypervisor.framework however that is a lot more work in the short term.
> 
> I actually thought about mentioning Hypervisor.framework, but I was not
sure about your use case. If you need full control of the address space
(PAGE_ZERO control, overriding the shared cache mappings, etc) that is really
the only supported mechanism.
> 
>> The issue I am having with libSystem.dylib is the lack of weak linkage
(versus weak_import) i.e. weak aliases. I don’t want to use a wrapper binary
with DYLD_INSERT_LIBRARIES. I want to interpose Libc symbols with some of the
symbols present in my binary (memory allocator, mmap). Interposition support is
somewhat lacking in the Mach-O toolchain and runtime linker despite the Mach-O
format technically supporting what I need (N_INDR and N_WEAK_DEF).
>> 
>> - https://developer.apple.com/documentation/kernel/nlist_64
>
> Dyld does not generally use nlists at runtime except for things like
> dladdr(), and has not for the last 10 years or so. Instead dyld uses a trie to
> publish exports, and a small byte code language to describe binding imports.
We still support using nlists for old binaries, but anything built with recent
tools also contains the newer trie and bind op codes which will be used if they
are available. I do not think our tries can express the sort of import semantics
you want.
> 
> I see two ways of potentially doing it (short of hypervisor.framework).
Both of them are a bit gross and have some bincompat risks, but given you are an
open source project and can rev if need be that may not be an enormous issue:
> 
> You could specify a custom segment in your executable with zero file size
> and a vm size that blocks out the address range you need, and then unmap it.
> There may be practical limits on it that prevent you from achieving what you
> want.

That’s an interesting approach worth exploring. I wasn’t sure I could unmap the
zero page at runtime.

I’m currently using -Wl,-pagezero_size,0x1000 which frees up the lower address
space but of course the Libc allocator starts using this address space; the
default on x86_64 with ld64 is a 4GiB zero page and that is where Libc normally
allocates. I believe Libc is passing NULL for the address hint to mmap or
vm_allocate and is getting the default address returned by the kernel, so it
allocates at the lowest address possible. I’m likely going to have a similar
problem if I unmap a region after start up and then call a Libc function that
allocates using one of the internal zones. Even when I replaced the default
malloc zone, it seems there were already other zones created and appeared to be
used internally by Libc.

In fact, during this process, I’ve been working on minimising my Libc footprint,
which is obviously required if I want to run in Hypervisor.framework. The early
CRT initialisation hooks for C++ and image relocation stuff will be required,
however I may well end up with a tiny stub that loads an ELF image if I do in
fact use Hypervisor.framework. I’m using C++ so I can use vector, map, string,
shared_ptr, etc., which I find much safer than traditional C and raw pointers.

> You could use implicit interposing. This is a feature added so that ASAN
> binaries can avoid the whole re-exec with DYLD_INSERT_LIBRARIES issue. It is
> not guaranteed to be stable, but in practice it is probably the most stable
> option short of using a hypervisor. The way it works:

I saw the ASAN interposition patch that avoids DYLD_INSERT_LIBRARIES, however I
was not sure how it worked.

> Define all the symbols in a dylib along with an interpose section (as
> though you were going to load it with DYLD_INSERT_LIBRARIES). Directly link your
> executable to that dylib. Dyld will discover the interpose during dependency
> analysis (before libSystem initializers are run) and apply the interpose. This
> has only been tested in the case of our sanitizer runtimes, but it *SHOULD*
> work.

This might be an approach worth exploring, as I can then interpose vm_allocate
and/or mmap to add an address hint to coax Libc into using a reserved area of
memory. I actually tried to get this to work but my interposed functions were
not called, as they were in the main executable, e.g.

https://opensource.apple.com/source/dyld/dyld-433.5/include/mach-o/dyld-interposing.h.auto.html

So I guess I need a dependency on an additional dylib which has my interposed
functions. It’s a pity dyld only searches dylibs and not the main executable for
interpose sections (as it didn’t appear to work with an interpose section in the
main executable).

>> If I could use N_INDR and N_WEAK_DEF to have early bound (runtime link
time) interposition with symbols in my binary replacing the C library allocator
and mmap, and have libSystem use my implementations then I would be happy.
libSystem itself would need to use weak aliases. This is possible with C
libraries on other platforms.
>> 
>> I’ve tried relentlessly to intercept the malloc_zone implementation.
malloc_zone_register is not sufficient as some of the internal zones are tied to
the internals of Libc and I am getting heap collisions with Libc allocated
objects and my guest address space. On Linux I have enough control to do what I
need and can interpose my symbols to implement versions of libc functions that I
wish to override. The problem on darwin is that I am not able to interpose the
malloc implementation until main starts, and at that point it is too late as
the C library already has created its internal zones. I’m also unable to
interpose mmap. I have already looked at the interpose symbol tricks but they
don’t meet my purposes (not wanting to re-exec with DYLD_INSERT_LIBRARIES).
Weak aliases from libSystem to the allocator implementation and various public
symbols along with N_INDR and N_WEAK_DEF would be required for me to achieve
what I need to achieve (somewhat similarly to the elegant internal
implementation of musl libc).
>> 
>> With my current solution (musl on xnu) I have successfully reserved
0x1000 – 0x7fff_0000_0000. Essentially the low 128TiB minus 4GiB at the top of
the address space where I place my translator and translator stack. This is
satisfactory for my user mode simulator to emulate Linux processes on macOS.
>> 
>> I think Hypervisor.framework is probably the correct interface to be
>> using if I want to avoid the kernel ABI, however that is a lot more work than
>> making syscall wrappers and I would need to implement communication from VM
>> process to the host process.
>
> I think that this is probably your best choice from a binary compatibility
> standpoint in the long run. It is a lot of work, though I am not sure if it is
> really that much more work than trying to port a new libc or maintain a custom
> toolchain.

Yes, both of them are quite a bit of work. I need to get early boot code to
switch the CPU into long mode and implement a virtual device to communicate with
the host process, i.e. for console IO. Of course I need a thread implementation
and a bunch of other things.

Thanks,
Michael.
> Louis
> 
>> 
>> I’m actually implementing Linux syscall emulation in a user simulator
so the kernel ABI is probably the technically correct layer. The full system
emulator ultimately needs to use Hypervisor.framework if I am to use hardware
paging instead of soft MMU. I have two simulators, a user-mode sim that emulates
the Linux ABI and a full system emulator: https://rv8.io/
<https://rv8.io/> and I really want to support RISC-V Linux on macOS in
the user mode simulator.
>> 
>> Proper Linux ABI emulation on macOS would ultimately require kernel
support, at minimum something like binfmt misc, but ideally a kext that
implements another ABI personality (much like Linux ABI emulation on Windows) in
addition to the BSD personality. In fact the FreeBSD linux compat could be used
if the FreeBSD portion of XNU is synced up with current, and we’d get bug fixes
for long standing issues like the macOS TCP_NOPUSH bug that has long since been
fixed in FreeBSD.
>> 
>>> Ultimately, if you are doing this on your own for fun, that’s great, but if
>>> this is something you intend to ship to other people, please reconsider. It
>>> is more than a theoretical concern that it will break.
>>> 
>>> Louis
>>> 
>>>> In any case the musl libc source makes extensive use of weak
aliases, perhaps to allow easier interposition of C library routines, however
aliases, weak or otherwise are not currently supported by ld64.
>>>> 
>>>> It appears that the mach-o format supports aliases, but the
functionality has not been exposed via the linker (ld64/LLD).
>>>> 
>>>> - http://blog.omega-prime.co.uk/?p=121
<http://blog.omega-prime.co.uk/?p=121>
>>>> 
>>>> The musl code does the following which currently errors out
saying aliases are not currently supported:
>>>> 
>>>> #undef weak_alias
>>>> #define weak_alias(old, new) \
>>>>         extern __typeof(old) new __attribute__((weak, alias(#old)))
>>>> 
>>>> and the macro is used internally like this:
>>>> 
>>>> int __pthread_join(pthread_t t, void **res)
>>>> {
>>>>         // implementation here
>>>> }
>>>> 
>>>> weak_alias(__pthread_join, pthread_join);
>>>> 
>>>> The problem is the actual export used by clients is an alias
and I want to maintain source compatibility.
>>>> 
>>>> I seem to have found a way to semi-emulate aliases (at least
within one module). My goal is to at least turn them into strong aliases
somehow, so I can at a minimum make the musl source compatible with clang on
macos. The following compiles but foo is not exported:
>>>> 
>>>> $ cat a.c
>>>> #include <stdio.h>
>>>> 
>>>> void foo() __attribute__((weak_import)) __asm("_bar");
>>>> 
>>>> void bar()
>>>> {
>>>>         printf("bar\n");
>>>> }
>>>> 
>>>> int main()
>>>> {
>>>>         foo();
>>>> }
>>>> 
>>>> $ cc -c a.c -o a.o
>>>> $ nm a.o
>>>> 0000000000000000 T _bar
>>>> 0000000000000020 T _main
>>>>                  U _printf
>>>> 
>>>> Any ideas how I can get foo as an exported symbol? 
>>>> 
>>>> Is weak alias or plain alias support planned for mach-o in LLD?
>>>> 
>>>> The goal at a minimum is to make the weak_alias macro emit a
strong alias with clang/ld64 or clang/LLD? so I don’t need to diverge too much
from the upstream musl source (as the lack of alias support currently requires
me to rename function declarations in the source). Of course pthreads which I’m
working on now are going to be completely different… but musl has support for
architecture specific overrides in its build system.
>>>> 
>>>> BTW I now have some quite non-trivial programs compiling
against musl-xnu + libcxx + libcxxabi on macos.
>>>> 
>>>> There are a lot of libcxx changes like this:
>>>> 
>>>> -#ifdef __APPLE__
>>>> +#if defined(__APPLE__) && !defined(_LIBCPP_HAS_MUSL_LIBC)
>>>> 
>>>> Michael.
>>>> 
>>>> [1] https://gist.github.com/michaeljclark/0a805652ec4be987a782afb902f06a99

Michael Clark via llvm-dev

2017-Jun-16 10:13 UTC

[llvm-dev] LLD support for mach-o aliases (weak or otherwise)

Hi Louis,

I tried unmapping the 4GiB zero page at runtime, however the vm_map is owned by
the kernel and it appears it can’t be unmapped after process creation. This left
the option of linking with -pagezero_size 0x1000 and somehow interposing the
allocator.

I then tried the second approach of using implicit interposing and have come up
with a workable solution where I set the zero page size to 0x1000 and then
interpose mmap, vm_map, vm_allocate, mach_vm_map and mach_vm_allocate.
Interestingly the dylib needs to be linked with an explicit image_base otherwise
dyld will load the interposing dylib into the low address space, using the
default map address. After quite a bit of experimenting I came up with what is a
relatively clean solution, and it doesn’t require any knowledge of the internals
of libsystem_malloc.dylib other than how it uses the default address hints (for
entropy). I’ve tested with and without PIE, however I need to link with -no_pie
due to bugs in ld64 when altering both the zero page size and the image base.

- https://github.com/michaeljclark/libSystem-mmap
- https://github.com/michaeljclark/libSystem-mmap/blob/master/mmap-himem.c
- https://github.com/michaeljclark/libSystem-mmap/blob/master/mmap-test.c
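
To make the address-hint idea concrete, here is a minimal sketch of the kind of
policy an interposed mmap or vm_allocate could apply before forwarding the
call. It is not the code in mmap-himem.c. DEMO_HINT_BASE, DEMO_PAGE_SIZE and
demo_choose_hint are hypothetical names, the base address is an assumed value
above the reserved range, and a real version would also need locking and an
upper bound.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE  0x1000ULL           /* assumed 4 KiB pages */
#define DEMO_HINT_BASE  0x7fff01000000ULL   /* hypothetical start of the high region */

/* Next address to hand out as a hint (simple bump pointer). */
static uint64_t demo_next_hint = DEMO_HINT_BASE;

/* Round len up to a whole number of pages. */
static uint64_t demo_round_up(uint64_t len)
{
    return (len + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}

/* Choose the address to pass on: keep an explicit caller hint,
 * otherwise substitute the next address in the high region. */
uint64_t demo_choose_hint(uint64_t caller_addr, uint64_t len)
{
    if (caller_addr != 0)
        return caller_addr;
    uint64_t hint = demo_next_hint;
    demo_next_hint += demo_round_up(len);
    return hint;
}
```

The point of the design is that callers passing a NULL address, such as the
system allocator, get steered away from the low address space, while explicit
requests keep their own hints.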

In the test results, you can see mmap-test-vanilla running with vanilla
libSystem.dylib. The test shows heap allocation addresses (from
libsystem_malloc.dylib) along with addresses from explicit calls to mmap and
vm_allocate with and without address hints (mapping 0x1000 fails of course). In
the first case the lower 4GiB is not available.

In mmap-test-collision I have linked the program with -pagezero_size 0x1000 which
frees up the address space, but you can see libsystem_malloc.dylib heap
allocations colliding/intermingling with the lower 4GiB of address space which I
wish to reserve for the CPU simulator; my original problem. I have already tried
with runtime registration of malloc zones however this didn’t work due to
internal zones and early allocations in the CRT. That is the point I was at
previously (heap collisions).

In the third test mmap-test-himem you can see the same program run with the
interposed memory allocation functions. I’m able to reserve 0x1000 -
0x7ffe00000000 which should be satisfactory for emulating a Linux user process
address space on macOS (as I’m loading ELFs).
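
A small helper can make the test output easier to read against the reserved
range. This is a sketch only: in_guest_range and guest_range_size are
hypothetical names, and treating 0x7ffe00000000 as an exclusive upper bound is
an assumption (it is the -image_base used for the test binaries in the build
output).

```c
#include <stdbool.h>
#include <stdint.h>

#define GUEST_LO 0x1000ULL           /* first byte after the 4 KiB zero page */
#define GUEST_HI 0x7ffe00000000ULL   /* assumed exclusive upper bound */

/* True if addr falls inside the range reserved for the guest. */
bool in_guest_range(uint64_t addr)
{
    return addr >= GUEST_LO && addr < GUEST_HI;
}

/* Size of the reserved range in bytes: 128 TiB minus 8 GiB minus one page. */
uint64_t guest_range_size(void)
{
    return GUEST_HI - GUEST_LO;
}
```

Applied to the heap values in the output, 0x34000 from mmap-test-collision lies
inside the range (the collision), while 0x7fff01002000 from mmap-test-himem lies
above it.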

Thanks for the hints; otherwise I might have been using musl-xnu, although I’m
still keen to get that working for a variety of other reasons, but not for this
project. This approach solves my immediate problem and I don’t need to build a
sysroot of musl-xnu, libcxx, libcxxabi and libunwind to get my project to build
on macOS. I actually already had my project working with musl-xnu, including
stdio, signals, setjmp/longjmp, mach_time, static PIE, ASLR and a lot of other
functionality, but it’s kind of a large and complicated build dependency. I still
need to implement threads and semaphores however I was making rapid progress…

Cheers,
Michael.

$ make test
cc -Wall -Wpedantic -c -o obj/mmap-himem.o mmap-himem.c
cc -dynamiclib \
	-install_name @rpath/mmap-himem.dylib \
	-image_base 0x7ffe80000000 \
	-o lib/mmap-himem.dylib obj/mmap-himem.o
cc -Wall -Wpedantic -c -o obj/mmap-test.o mmap-test.c
cc -o bin/mmap-test-vanilla obj/mmap-test.o
cc -Wl,-no_pie -Wl,-pagezero_size,0x1000 \
	-image_base 0x7ffe00000000 \
	-o bin/mmap-test-collision obj/mmap-test.o
cc -Wl,-no_pie -Wl,-pagezero_size,0x1000 \
	-image_base 0x7ffe00000000 \
	-rpath lib lib/mmap-himem.dylib \
	-o bin/mmap-test-himem obj/mmap-test.o

== mmap-test-vanilla ==
stak=0x7fff5053c7c8
text=0x10f6c4040
heap=0x10f6f9000
map1=0xffffffffffffffff
map2=0x7ff000000000
map3=0x10f71a000
map4=0x10f71b000

== mmap-test-collision ==
stak=0x7fff5fbff7c8
text=0x7ffe00001040
heap=0x34000
map1=0x1000
map2=0x7ff000000000
map3=0x55000
map4=0x56000

== mmap-test-himem ==
stak=0x7fff5fbff7d0
text=0x7ffe00001040
heap=0x7fff01002000
map1=0x1000
map2=0x7ff000000000
map3=0x7fff01026000
map4=0x7fff01028000

Michael.
> On 15 Jun 2017, at 12:51 PM, Michael Clark <michaeljclark at mac.com>
wrote:
> 
>> 
>> On 15 Jun 2017, at 11:35 AM, Louis Gerbarg <lgerbarg at apple.com
<mailto:lgerbarg at apple.com>> wrote:
>> 
>>> 
>>> On Jun 14, 2017, at 2:47 PM, Michael Clark via llvm-dev
<llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
wrote:
>>> 
>>>> 
>>>> On 15 Jun 2017, at 6:50 AM, Louis Gerbarg <lgerbarg at
apple.com <mailto:lgerbarg at apple.com>> wrote:
>>>> 
>>>>> 
>>>>> On Jun 6, 2017, at 4:08 PM, Michael Clark via llvm-dev
<llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
wrote:
>>>>> 
>>>>> Hi Folks,
>>>>> 
>>>>> I’m working on a port of musl libc to macos (arch triple is
“x86_64-xnu-musl”) to solve some irreconcilable issues I’m having with
libSystem.dylib. I don’t want to use glibc for various reasons, mainly because I
want to static link. I have static PIE + ASLR working which is not actually
supported by the Apple toolchain (*1), but I managed to get it to work. I’m sure
Apple might say “Don’t do that”, but from looking at the history of the xnu
kernel ABI, it seems to be very stable between versions.
>>>> 
>>>> I am from Apple, and I will say “Don’t do that.” The kernel ABI
for our platforms is not stable; we only guarantee stability at the dynamic link
boundary (in this case public symbols exported from libSystem). While the kernel
syscall numbers have not changed (though the kernel team reserves the right to
do that), the parameter lists and argument marshaling for them certainly have
changed. We also do not support static executables on our system.
>>>> 
>>>> We even had bincompat issues related to this during the last major
release (macOS 10.12 Sierra): Go implemented its own syscall support, which
caused all of their binaries that used gettimeofday to break, since the
internal interface changed <https://github.com/golang/go/issues/16606>.
More broadly you can look at a discussion of their issues here:
<https://github.com/golang/go/issues/16606>. In their case they want
to avoid invoking an external linker like ld64 or lld as opposed to avoiding
libSystem, but the effect is the same: they shipped a tool that caused
unsuspecting developers to have surprise bincompat issues that were entirely
avoidable.
>>> 
>>> I’m aware that Go has had some issues with the XNU ABI boundary.
>>> 
>>> I’m working on a CPU simulator / binary translator and I need
control of the process address space layout. It seems I may ultimately need to
use Hypervisor.framework however that is a lot more work in the short term.
>> 
>> I actually thought about mentioning Hypervisor.framework, but I was not
sure about your use case. If you need full control of the address space
(PAGE_ZERO control, overriding the shared cache mappings, etc) that is really
the only supported mechanism.
>> 
>>> The issue I am having with libSystem.dylib is the lack of weak
linkage (versus weak_import) i.e. weak aliases. I don’t want to use a wrapper
binary with DYLD_INSERT_LIBRARIES. I want to interpose Libc symbols with some of
the symbols present in my binary (memory allocator, mmap). Interposition support
is somewhat lacking in the Mach-O toolchain and runtime linker despite the
Mach-O format technically supporting what I need (N_INDR and N_WEAK_DEF).
>>> 
>>> - https://developer.apple.com/documentation/kernel/nlist_64
>> Dyld does not generally use nlists at runtime except for things like
dladdr(), and has not for the last 10 years or so. Instead dyld uses a trie to
publish exports, and a small byte code language to describe binding imports.
We still support using nlists for old binaries, but anything built with recent
tools also contains the newer trie and bind op codes which will be used if they
are available. I do not think our tries can express the sort of import semantics
you want.
>> 
>> I see two ways of potentially doing it (short of hypervisor.framework).
Both of them are a bit gross and have some bincompat risks, but given you are an
open source project and can rev if need be that may not be an enormous issue:
>> 
>> You could specify a custom segment in your executable with zero file
size and a vm size that blocks out the address range you need and then unmap it.
There may be practical limits on it that prevent you from achieving what you
want.
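
The segment itself is a link-time construct and is not shown here. As a runtime analogue of the same reserve-first, reuse-later idea, the sketch below holds a range with an inaccessible mapping and later carves usable memory out of it. `reserve_range` and `commit_within` are hypothetical names, and whether a reservation made this late is early enough to keep Libc out of the range is exactly the open question in this thread.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif
#ifndef MAP_NORESERVE
#define MAP_NORESERVE 0 /* not every platform defines it */
#endif

/* Claim len bytes of address space without making them accessible.
 * Nothing else in the process can be mapped there while this is held. */
void *reserve_range(size_t len)
{
    return mmap(NULL, len, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
}

/* Turn part of the reservation into usable memory. MAP_FIXED replaces
 * whatever is at the target address, which is acceptable here only because
 * the target lies inside a range this process reserved itself.
 * offset and len must be multiples of the page size. */
void *commit_within(void *base, size_t offset, size_t len)
{
    return mmap((char *)base + offset, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
}

#ifdef RESERVE_MAIN /* build with -DRESERVE_MAIN for a standalone program */
int main(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *base = reserve_range((size_t)1 << 30);
    if (base == MAP_FAILED)
        return 1;
    return commit_within(base, page, page) == MAP_FAILED;
}
#endif
```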
> 
> That’s an interesting approach worth exploring. I wasn’t sure I could unmap
the zero page at runtime.
> 
> I’m currently using -Wl,-pagezero_size,0x1000 which frees up the lower
address space but of course the Libc allocator starts using this address space;
the default on x86_64 with ld64 is a 4GiB zero page and that is where Libc
normally allocates. I believe Libc is passing NULL for the address hint to mmap
or vm_allocate and is getting the default address returned by the kernel, so it
allocates at the lowest address possible. I’m likely going to have a similar
problem if I unmap a region after start up and then call a Libc function that
allocates using one of the internal zones. Even when I replaced the default
malloc zone, it seems there were already other zones created and appeared to be
used internally by Libc.
> 
> In fact, during this process, I’ve been working on minimising my Libc
footprint, which is obviously required if I want to run in Hypervisor.framework.
The early CRT initialisation hooks for C++ and image relocation stuff will be
required, however I may well end up with a tiny stub that loads an ELF image if
I do in fact use Hypervisor.framework. I’m using C++ so I can use vector, map,
string, shared_ptr, etc. I find that using vector, map, string and shared_ptr is
much safer than traditional C and raw pointers.
> 
>> You could use implicit interposing. This is a feature added so that ASAN
binaries can avoid the whole re-exec with DYLD_INSERT_LIBRARIES issue. It is
not guaranteed to be stable, but in practice it is probably the most stable
option short of using a hypervisor. The way it works:
> 
> I saw the ASAN interposition patch that avoids DYLD_INSERT_LIBRARIES
however I was not sure how it worked.
> 
>> Define all the symbols in a dylib along with an interpose section (as
though you were going to load it with DYLD_INSERT_LIBRARIES). Directly link your
executable to that dylib. Dyld will discover the interpose during dependency
analysis (before libSystem initializers are run) and apply the interpose. This
has only been tested in the case of our sanitizer runtimes, but it *SHOULD*
work.
> 
> This might be an approach worth exploring, as I can then interpose
vm_allocate and/or mmap to add an address hint to coax Libc into using a
reserved area of memory. I actually tried to get this to work but my interposed
functions were not called as they were in the main executable. e.g.
> 
> https://opensource.apple.com/source/dyld/dyld-433.5/include/mach-o/dyld-interposing.h.auto.html
> 
> So I guess I need a dependency on an additional dylib which has my
interposed functions. It’s a pity dyld only searches dylibs and not the main
executable for interpose sections (as it didn’t appear to work with an interpose
section in the main executable).
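
A sketch of what such an interposing dylib could contain, under stated assumptions: `hinted_mmap` and `HOST_HINT_BASE` are hypothetical, and the hint value is only an example taken from the himem output earlier in this thread. The interpose tuple mirrors the layout used by the DYLD_INTERPOSE macro in the dyld-interposing.h header linked above. The file would be built as a dylib that the executable links against; nothing here has been tested against dyld.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical base of the region that unhinted mappings should be steered
 * towards. Assumes a 64-bit address space. */
#define HOST_HINT_BASE ((uintptr_t)0x7fff01000000ULL)

/* Replacement with the same signature as mmap. It only adds a hint when the
 * caller passed none; the kernel remains free to ignore the hint. */
void *hinted_mmap(void *addr, size_t len, int prot, int flags, int fd,
                  off_t off)
{
    if (addr == NULL)
        addr = (void *)HOST_HINT_BASE;
    /* References made from the interposing image itself are not interposed,
     * so this reaches the real mmap. */
    return mmap(addr, len, prot, flags, fd, off);
}

#ifdef __APPLE__
/* One {replacement, replacee} tuple in the __DATA,__interpose section. */
__attribute__((used)) static struct {
    const void *replacement;
    const void *replacee;
} interpose_mmap __attribute__((section("__DATA,__interpose"))) = {
    (const void *)(unsigned long)&hinted_mmap,
    (const void *)(unsigned long)&mmap,
};
#endif
```

On other platforms the tuple is compiled out and only the replacement function remains, which keeps the hinting logic testable on its own.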
> 
>>> If I could use N_INDR and N_WEAK_DEF to have early bound (runtime
link time) interposition with symbols in my binary replacing the C library
allocator and mmap, and have libSystem use my implementations then I would be
happy. libSystem itself would need to use weak aliases. This is possible with C
libraries on other platforms.
>>> 
>>> I’ve tried relentlessly to intercept the malloc_zone
implementation. malloc_zone_register is not sufficient as some of the internal
zones are tied to the internals of Libc and I am getting heap collisions with
Libc allocated objects and my guest address space. On Linux I have enough
control to do what I need and can interpose my symbols to implement versions of
libc functions that I wish to override. The problem on darwin is that I am not
able to interpose the malloc implementation until main starts, and at that point
it is too late as the C library already has created its internal zones. I’m
also unable to interpose mmap. I have already looked at the interpose symbol
tricks but they don’t meet my purposes (not wanting to re-exec with
DYLD_INSERT_LIBRARIES). Weak aliases from libSystem to the allocator
implementation and various public symbols along with N_INDR and N_WEAK_DEF would
be required for me to achieve what I need to achieve (somewhat similarly to the
elegant internal implementation of musl libc).
>>> 
>>> With my current solution (musl on xnu) I have successfully reserved
0x1000 – 0x7fff_0000_0000. Essentially the low 128TiB minus 4GiB at the top of
the address space where I place my translator and translator stack. This is
satisfactory for my user mode simulator to emulate Linux processes on macOS.
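
The arithmetic behind that range can be checked with a few lines of C. The macro and function names are hypothetical, and treating the upper bound as exclusive is an assumption.

```c
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)
#define TIB (UINT64_C(1) << 40)

#define GUEST_BASE UINT64_C(0x1000)      /* first byte after a 4 KiB zero page */
#define GUEST_TOP  (128 * TIB - 4 * GIB) /* start of the top 4 GiB */

/* 128 TiB is 2^47, so subtracting 4 GiB (2^32) gives 0x7fff_0000_0000. */
_Static_assert(GUEST_TOP == UINT64_C(0x7fff00000000), "128 TiB minus 4 GiB");

uint64_t guest_top(void) { return GUEST_TOP; }

/* Size in bytes of the range reserved for the guest. */
uint64_t guest_span(void) { return GUEST_TOP - GUEST_BASE; }

/* Non-zero when addr falls inside the reserved guest range. */
int in_guest_range(uint64_t addr)
{
    return addr >= GUEST_BASE && addr < GUEST_TOP;
}
```

For example, a heap at 0x34000 falls inside the guest range, while one at 0x7fff01002000 sits in the top 4GiB.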
>>> 
>>> I think Hypervisor.framework is probably the correct interface to
be using if I want to avoid the kernel ABI, however that is a lot more work than
making syscall wrappers and I would need to implement communication from VM
process to the host process.
>> 
>> I think that this is probably your best choice from a binary
compatibility standpoint in the long run. It is a lot of work, though I am not
sure if it is really that much more work than trying to port a new libc or
maintain a custom toolchain.
> 
> Yes, both of them are quite a bit of work. I need to get early boot code to
switch the CPU into long mode and implement a virtual device to communicate with
the host process, i.e. for console IO. Of course I need a thread implementation
and a bunch of other things. It’s also quite a lot of work.
> 
> Thanks,
> Michael.
> 
>> Louis
>> 
>>> 
>>> I’m actually implementing Linux syscall emulation in a user
simulator so the kernel ABI is probably the technically correct layer. The full
system emulator ultimately needs to use Hypervisor.framework if I am to use
hardware paging instead of soft MMU. I have two simulators, a user-mode sim that
emulates the Linux ABI and a full system emulator: https://rv8.io/
<https://rv8.io/> and I really want to support RISC-V Linux on macOS in
the user mode simulator.
>>> 
>>> Proper Linux ABI emulation on macOS would ultimately require kernel
support, at minimum something like binfmt_misc, but ideally a kext that
implements another ABI personality (much like Linux ABI emulation on Windows) in
addition to the BSD personality. In fact the FreeBSD linux compat could be used
if the FreeBSD portion of XNU is synced up with current, and we’d get bug fixes
for long standing issues like the macOS TCP_NOPUSH bug that has long since been
fixed in FreeBSD.
>>> 
>>>> Ultimately if you are doing this on your own for fun,
that’s great, but if this is something you intend to ship to other people please
reconsider. It is more than a theoretical concern that it will break.
>>>> 
>>>> Louis
>>>> 
>>>>> In any case the musl libc source makes extensive use of
weak aliases, perhaps to allow easier interposition of C library routines,
however aliases, weak or otherwise are not currently supported by ld64.
>>>>> 
>>>>> It appears that the mach-o format supports aliases, but the
functionality has not been exposed via the linker (ld64/LLD).
>>>>> 
>>>>> - http://blog.omega-prime.co.uk/?p=121
>>>>> 
>>>>> The musl code does the following which currently errors out
saying aliases are not currently supported:
>>>>> 
>>>>> #undef weak_alias
>>>>> #define weak_alias(old, new) \
>>>>>         extern __typeof(old) new __attribute__((weak, alias(#old)))
>>>>> 
>>>>> and the macro is used internally like this:
>>>>> 
>>>>> int __pthread_join(pthread_t t, void **res)
>>>>> {
>>>>>         // implementation here
>>>>> }
>>>>> 
>>>>> weak_alias(__pthread_join, pthread_join);
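
One way to keep a single source tree building on both kinds of target is sketched below. The alias attribute is used where the toolchain accepts it, and a hand-written forwarding function stands in on Darwin. This is an assumption about how a port could be structured, not what musl does; `impl_join` and `public_join` are hypothetical stand-ins for `__pthread_join` and `pthread_join`. The forwarder is a strong definition and costs an extra call, so it gives up the weak semantics.

```c
#if defined(__APPLE__)
/* clang rejects the alias attribute for Darwin targets. */
#define HAVE_ALIAS_ATTRIBUTE 0
#else
#define HAVE_ALIAS_ATTRIBUTE 1
#define weak_alias(old, new) \
        extern __typeof(old) new __attribute__((weak, alias(#old)))
#endif

/* Internal implementation under its private name. */
int impl_join(int id)
{
    return id + 1;
}

#if HAVE_ALIAS_ATTRIBUTE
/* Second, weak symbol for the same code; no extra instructions emitted. */
weak_alias(impl_join, public_join);
#else
/* Strong forwarding function under the public name. */
int public_join(int id)
{
    return impl_join(id);
}
#endif
```

Callers see `public_join` either way, so client source stays unchanged; only the symbol binding differs between the two branches.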
>>>>> 
>>>>> The problem is the actual export used by clients is an
alias and I want to maintain source compatibility.
>>>>> 
>>>>> I seem to have found a way to semi-emulate aliases (at
least within one module). My goal is to at least turn them into strong aliases
somehow, so I can at a minimum make the musl source compatible with clang on
macos. The following compiles but foo is not exported:
>>>>> 
>>>>> $ cat a.c
>>>>> #include <stdio.h>
>>>>> 
>>>>> void foo() __attribute__((weak_import)) __asm("_bar");
>>>>> 
>>>>> void bar()
>>>>> {
>>>>>         printf("bar\n");
>>>>> }
>>>>> 
>>>>> int main()
>>>>> {
>>>>>         foo();
>>>>> }
>>>>> 
>>>>> $ cc -c a.c -o a.o
>>>>> $ nm a.o
>>>>> 0000000000000000 T _bar
>>>>> 0000000000000020 T _main
>>>>>                  U _printf
>>>>> 
>>>>> Any ideas how I can get foo as an exported symbol? 
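
The nm output follows from what an asm label does: it changes the symbol that the C name `foo` refers to and does not create a second symbol. A portable sketch of the same effect is below; `C_SYMBOL` is a hypothetical helper that prepends the platform's symbol prefix (`_` on Darwin, empty on most ELF targets) using the predefined `__USER_LABEL_PREFIX__` macro of GCC and clang. The `weak_import` attribute is left out because it is Darwin specific.

```c
#define STR2(x) #x
#define STR(x) STR2(x)
/* Builds the assembler-level name of a C function, e.g. "_bar" or "bar". */
#define C_SYMBOL(name) STR(__USER_LABEL_PREFIX__) #name

int bar(void)
{
    return 42;
}

/* foo is only another C spelling for the symbol that bar already defines.
 * The object file therefore contains bar's symbol and nothing named foo. */
int foo(void) __asm__(C_SYMBOL(bar));

int call_through_foo(void)
{
    return foo(); /* compiles to a call to bar's symbol */
}
```

Because no `foo` symbol exists in the object file, other modules cannot link against it, which matches the behaviour reported above.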
>>>>> 
>>>>> Is weak alias or plain alias support planned for mach-o in
LLD?
>>>>> 
>>>>> The goal at a minimum is to make the weak_alias macro emit
a strong alias with clang/ld64 or clang/LLD so I don’t need to diverge too much
from the upstream musl source (as the lack of alias support currently requires
me to rename function declarations in the source). Of course pthreads which I’m
working on now are going to be completely different… but musl has support for
architecture specific overrides in its build system.
>>>>> 
>>>>> BTW I now have some quite non-trivial programs compiling
against musl-xnu + libcxx + libcxxabi on macos.
>>>>> 
>>>>> There are a lot of libcxx changes like this:
>>>>> 
>>>>> -#ifdef __APPLE__
>>>>> +#if defined(__APPLE__) && !defined(_LIBCPP_HAS_MUSL_LIBC)
>>>>> 
>>>>> Michael.
>>>>> 
>>>>> [1] https://gist.github.com/michaeljclark/0a805652ec4be987a782afb902f06a99
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Jean-Daniel via llvm-dev

2017-Jun-17 16:46 UTC


[llvm-dev] LLD support for mach-o aliases (weak or otherwise)

> On 15 June 2017, at 01:35, Louis Gerbarg via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> You could use implicit interposing. This is a feature added so that ASAN
binaries can avoid the whole re-exec with DYLD_INSERT_LIBRARIES issue. It is
not guaranteed to be stable, but in practice it is probably the most stable
option short of using a hypervisor. The way it works:
Was it really added for ASAN? I remember using dyld_interpose for a long time.
It was already described in Amit Singh’s Mac OS X Internals book
(which is older than clang). Or are we not talking about the same interpose?
> Define all the symbols in a dylib along with an interpose section (as
though you were going to load it with DYLD_INSERT_LIBRARIES). Directly link your
executable to that dylib. Dyld will discover the interpose during dependency
analysis (before libSystem initializers are run) and apply the interpose. This
has only been tested in the case of our sanitizer runtimes, but it *SHOULD*
work.

Louis Gerbarg via llvm-dev

2017-Jun-22 19:44 UTC


[llvm-dev] LLD support for mach-o aliases (weak or otherwise)

> On Jun 17, 2017, at 9:46 AM, Jean-Daniel <mailing at xenonium.com> wrote:
> 
>> 
>> On 15 June 2017, at 01:35, Louis Gerbarg via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>> You could use implicit interposing. This is a feature added so that ASAN
binaries can avoid the whole re-exec with DYLD_INSERT_LIBRARIES issue. It is
not guaranteed to be stable, but in practice it is probably the most stable
option short of using a hypervisor. The way it works:
> 
> Was it really added for ASAN? I remember using dyld_interpose for a long
time. It was already described in Amit Singh’s Mac OS X Internals
book (which is older than clang). Or are we not talking about the same
interpose?
Interposing has been around for a long time, but interposes would only be
enabled by explicitly interposing the library at launch via
DYLD_INSERT_LIBRARIES. In the case of ASAN where the compiled binaries are ABI
dependent on the ASAN runtime support (and thus cannot really run without the
interposed memory allocators) it doesn’t make sense to use the environment to
control interposing. Instead the interposes are included in the ASAN runtime
dylib (which the applications built for ASAN link to), dyld discovers them
during dependency analysis and uses them. This implicit discovery mechanism was
added for ASAN, though the underlying mechanism for specifying interposed
functions is the same.

Louis
> 
>> Define all the symbols in a dylib along with an interpose section (as
though you were going to load it with DYLD_INSERT_LIBRARIES). Directly link your
executable to that dylib. Dyld will discover the interpose during dependency
analysis (before libSystem initializers are run) and apply the interpose. This
has only been tested in the case of our sanitizer runtimes, but it *SHOULD*
work.
>> 
>>> 
>>> If I could use N_INDR and N_WEAK_DEF to have early bound (runtime
link time) interposition with symbols in my binary replacing the C library
allocator and mmap, and have libSystem use my implementations then I would be
happy. libSystem itself would need to use weak aliases. This is possible with C
libraries on other platforms.
>>> 
>>> I’ve tried relentlessly to intercept the malloc_zone
implementation. malloc_zone_register is not sufficient as some of the internal
zones are tied to the internals of Libc and I am getting heap collisions with
Libc allocated objects and my guest address space. On Linux I have enough
control to do what I need and can interpose my symbols to implement versions of
libc functions that I wish to override. The problem on darwin is that I am not
able to interpose the malloc implementation until main starts, and at that point
it is too late as the C library already has created its internal zones. I’m
also unable to interpose mmap. I have already looked at the interpose symbol
tricks but they don’t meet my purposes (not wanting to re-exec with
DYLD_INSERT_LIBRARIES). Weak aliases from libSystem to the allocator
implementation and various public symbols along with N_INDR and N_WEAK_DEF would
be required for me to achieve what I need to achieve (somewhat similarly to the
elegant internal implementation of musl libc).
>>> 
>>> With my current solution (musl on xnu) I have successfully reserved
0x1000 – 0x7fff_0000_0000. Essentially the low 128TiB minus 4GiB at the top of
the address space where I place my translator and translator stack. This is
satisfactory for my user mode simulator to emulate Linux processes on macOS.
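
The figures above can be checked with a little arithmetic. This is a minimal sketch, assuming the upper bound 0x7fff_0000_0000 is exclusive and that the user address space is 47 bits (128 TiB) wide. The macro and helper names are hypothetical.

```c
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)
#define TIB (UINT64_C(1) << 40)

/* Reserved guest range quoted in the message. */
#define GUEST_BASE  UINT64_C(0x1000)
#define GUEST_LIMIT UINT64_C(0x7fff00000000) /* assumed exclusive */

/* Returns 1 when addr lies inside the reserved guest range. */
int guest_contains(uint64_t addr)
{
    return addr >= GUEST_BASE && addr < GUEST_LIMIT;
}

/* Bytes left between the guest range and the 128 TiB boundary, which is
 * where the translator and its stack are said to live. */
uint64_t translator_region_bytes(void)
{
    return 128 * TIB - GUEST_LIMIT;
}
```

Under those assumptions the limit equals 128 TiB minus 4 GiB exactly, which matches the description.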
>>> 
>>> I think Hypervisor.framework is probably the correct interface to
be using if I want to avoid the kernel ABI, however that is a lot more work than
making syscall wrappers and I would need to implement communication from VM
process to the host process.
>> 
>> I think that this is probably your best choice from a binary
compatibility standpoint in the long run. It is a lot of work, though I am not
sure if it is really that much more work than trying to port a new libc or
maintain a custom toolchain.
>> 
>> Louis
>> 
>>> 
>>> I’m actually implementing Linux syscall emulation in a user
simulator so the kernel ABI is probably the technically correct layer. The full
system emulator ultimately needs to use Hypervisor.framework if I am to use
hardware paging instead of soft MMU. I have two simulators, a user-mode sim that
emulates the Linux ABI and a full system emulator: https://rv8.io/
and I really want to support RISC-V Linux on macOS in
the user mode simulator.
>>> 
>>> Proper Linux ABI emulation on macOS would ultimately require kernel
support, at minimum something like binfmt_misc, but ideally a kext that
implements another ABI personality (much like Linux ABI emulation on Windows) in
addition to the BSD personality. In fact the FreeBSD linux compat could be used
if the FreeBSD portion of XNU is synced up with current, and we’d get bug fixes
for long standing issues like the macOS TCP_NOPUSH bug that has long since been
fixed in FreeBSD.
>>> 
>>>> Ultimately if you are doing this on your own for
fun that’s great, but if this is something you intend to ship to other people please
reconsider. It is more than a theoretical concern that it will break.
>>>> 
>>>> Louis
>>>> 
>>>>> In any case the musl libc source makes extensive use of
weak aliases, perhaps to allow easier interposition of C library routines,
however aliases, weak or otherwise are not currently supported by ld64.
>>>>> 
>>>>> It appears that the mach-o format supports aliases, but the
functionality has not been exposed via the linker (ld64/LLD).
>>>>> 
>>>>> - http://blog.omega-prime.co.uk/?p=121
>>>>> 
>>>>> The musl code does the following which currently errors out
saying aliases are not currently supported:
>>>>> 
>>>>> #undef weak_alias
>>>>> #define weak_alias(old, new) \
>>>>>         extern __typeof(old) new __attribute__((weak, alias(#old)))
>>>>> 
>>>>> and the macro is used internally like this:
>>>>> 
>>>>> int __pthread_join(pthread_t t, void **res)
>>>>> {
>>>>>         // implementation here
>>>>> }
>>>>> 
>>>>> weak_alias(__pthread_join, pthread_join);
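
For reference, a minimal sketch of what the macro does on an ELF target, where clang and GCC accept the alias attribute. The names `impl_square` and `square` are hypothetical stand-ins for `__pthread_join` and `pthread_join`. The sketch deliberately refuses to build for Apple targets, since that is the failure being described.

```c
#if defined(__APPLE__)
#error "alias attributes are rejected for Mach-O targets; this sketch is ELF-only"
#endif

/* Same shape as the musl macro: declare `new` as a weak symbol that is
 * bound to the definition of `old`. */
#define weak_alias(old, new) \
        extern __typeof(old) new __attribute__((weak, alias(#old)))

/* Hypothetical internal implementation. */
int impl_square(int x)
{
    return x * x;
}

/* Public name that clients call. Because it is weak, a strong definition
 * of `square` in another object file would take precedence at link time. */
weak_alias(impl_square, square);
```

Both names refer to the same code, so `square` and `impl_square` compare equal as function pointers and no wrapper call is involved.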
>>>>> 
>>>>> The problem is the actual export used by clients is an
alias and I want to maintain source compatibility.
>>>>> 
>>>>> I seem to have found a way to semi-emulate aliases (at
least within one module). My goal is to at least turn them into strong aliases
somehow, so I can at a minimum make the musl source compatible with clang on
macos. The following compiles but foo is not exported:
>>>>> 
>>>>> $ cat a.c
>>>>> #include <stdio.h>
>>>>> 
>>>>> void foo() __attribute__((weak_import)) __asm("_bar");
>>>>> 
>>>>> void bar()
>>>>> {
>>>>>         printf("bar\n");
>>>>> }
>>>>> 
>>>>> int main()
>>>>> {
>>>>>         foo();
>>>>> }
>>>>> 
>>>>> $ cc -c a.c -o a.o
>>>>> $ nm a.o
>>>>> 0000000000000000 T _bar
>>>>> 0000000000000020 T _main
>>>>>                  U _printf
>>>>> 
>>>>> Any ideas how I can get foo as an exported symbol? 
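
The reason foo never appears in the nm output is that an asm label on a declaration only changes which symbol references to foo resolve to. It does not define a second symbol. One workaround that needs no linker support is a forwarding wrapper, sketched below. This is an assumption about what would be acceptable here and it is not a true alias: foo gets its own address and costs an extra call. The return type is changed to int purely so the result can be checked.

```c
#include <stdio.h>

/* The real implementation; returns the number of characters written. */
int bar(void)
{
    return printf("bar\n");
}

/* A real definition of foo, so the object file exports its own symbol
 * for it. Marking it weak lets a strong definition elsewhere replace it;
 * the exact override rules differ between ELF and Mach-O. */
__attribute__((weak)) int foo(void)
{
    return bar();
}
```

With this form, `nm` on the object file should list a symbol for foo alongside the one for bar.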
>>>>> 
>>>>> Is weak alias or plain alias support planned for mach-o in
LLD?
>>>>> 
>>>>> The goal at a minimum is to make the weak_alias macro emit
a strong alias with clang/ld64 or clang/LLD so I don’t need to diverge too much
from the upstream musl source (as the lack of alias support currently requires
me to rename function declarations in the source). Of course pthreads which I’m
working on now are going to be completely different… but musl has support for
architecture specific overrides in its build system.
>>>>> 
>>>>> BTW I now have some quite non-trivial programs compiling
against musl-xnu + libcxx + libcxxabi on macos.
>>>>> 
>>>>> There are a lot of libcxx changes like this:
>>>>> 
>>>>> -#ifdef __APPLE__
>>>>> +#if defined(__APPLE__) && !defined(_LIBCPP_HAS_MUSL_LIBC)
>>>>> 
>>>>> Michael.
>>>>> 
>>>>> [1] https://gist.github.com/michaeljclark/0a805652ec4be987a782afb902f06a99
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
