Alexander Potapenko
2012-Nov-30 18:32 UTC
[LLVMdev] radr://12777299, "potential pthread/eh bug exposed by libsanitizer"
No, we are not going to use mach_inject. This isn't portable and may be even harder to set up than mach_override.
The new ASan runtime will use the dylib interposition and will in fact require DYLD_INSERT_LIBRARIES to work. However ASan already handles it correctly itself: if the corresponding env var is missing the app is just re-execed.
Dylib interposition is supported by Apple and should work on iOS as well as Mac OS. It will also probably simplify hooking the memory allocations in ASan, which is now very tricky.

On Fri, Nov 30, 2012 at 6:56 AM, Jack Howarth <howarth at bromo.med.uc.edu> wrote:
> On Fri, Nov 30, 2012 at 01:41:05PM +0400, Kostya Serebryany wrote:
>> Just want to remind everyone that we plan to stop using mach_override in
>> asan in favor of OSX's native function interposition.
>> So, we probably don't want to spend too much effort fixing mach_override.
>>
>> --kcc
>
> Kostya,
> Is the native function interposition that is being adopted based on...
>
> https://github.com/rentzsch/mach_inject
>
> ? I assume that any method used will be transparent to the user and not require
> manually setting DYLD_INSERT_LIBRARIES, correct?
> Jack
>
>>
>> On Fri, Nov 30, 2012 at 4:46 AM, Alexander Potapenko <glider at google.com>wrote:
>>
>> > Looks like this happens on x86_64 because the position of __cxa_throw
>> > is too far from the allocated branch island (should be <2G). This can
>> > be solved by allocating the branch islands somewhere near the text
>> > segment (look for kIslandEnd in asan_mac.cc, this is currently
>> > 0x7fffffdf0000) or by patching the function with a longer instruction
>> > sequence that stores the jump target in a register and jumps to that
>> > target (which is a bit more complex to implement).
>> >
>> > Once this problem is fixed, another one is going to arise. This is how
>> > the first bytes of __cxa_throw look like:
>> >
>> > 0x0020c49ba5d916e0 <__cxa_throw+0>: lea 0xb4f01(%rip),%rax #
>> > 0x20c49ba5e465e8 <_ZN10__cxxabiv120__unexpected_handlerE>
>> > 0x0020c49ba5d916e7 <__cxa_throw+7>: push %rbx
>> > 0x0020c49ba5d916e8 <__cxa_throw+8>: lea -0x20(%rdi),%rbx
>> >
>> > If we move the relative LEA instruction somewhere, we must fix the
>> > constant in order to keep it pointing to the same address.
>> > mach_override already does this for relative CALL and JMP
>> > instructions, but not for LEA. This should be fairly simple to fix.
>> >
>> > Note that the 32-bit variant crashes on another invalid address:
>> >
>> > ASAN:SIGSEGV
>> > ================================================================
>> > ==89768== ERROR: AddressSanitizer: SEGV on unknown address 0xcccccccc
>> > (pc 0x00061f8c sp 0xbffa8bd0 bp 0xbffa8cc8 T0)
>> > AddressSanitizer can not provide additional info.
>> > #0 0x61f8b
>> > (/Users/glider/src/gcc-asan/inst/lib/i386/libstdc++.6.dylib+0x3f8b)
>> > #1 0x91391724 (/usr/lib/system/libdyld.dylib+0x2724)
>> > #2 0x0
>> > Stats: 0M malloced (0M for red zones) by 3 calls
>> > Stats: 0M realloced by 0 calls
>> > Stats: 0M freed by 0 calls
>> > Stats: 0M really freed by 0 calls
>> > Stats: 1M (256 full pages) mmaped in 2 calls
>> > mmaps by size class: 7:4095; 8:2047;
>> > mallocs by size class: 7:1; 8:2;
>> > frees by size class:
>> > rfrees by size class:
>> > Stats: malloc large: 0 small slow: 2
>> > ==89768== ABORTING
>> >
>> > My guess is that this is caused by the following code being moved to a
>> > branch island:
>> >
>> > Dump of assembler code for function __cxa_throw:
>> > 0x00008f60 <__cxa_throw+0>: push %esi
>> > 0x00008f61 <__cxa_throw+1>: push %ebx
>> > 0x00008f62 <__cxa_throw+2>: call 0x7a60 <__x86.get_pc_thunk.bx>
>> >
>> > Perhaps this makes __x86.get_pc_thunk.bx return an incorrect value.
>> >
>> > Since libstdc++-v3 is built together with gcc, the two issues related
>> > to instructions being moved to another place can be solved by padding
>> > __cxa_throw() with five NOP instructions (enough to hold a JMP). I
>> > believe this should be acceptable, because the performance penalty for
>> > additional NOPs is negligible, and __cxa_throw() isn't a hot point.
>> >
>> > On Thu, Nov 29, 2012 at 1:01 PM, Nick Kledzik <kledzik at apple.com> wrote:
>> > > I debugged this a bit and it seems the mach_override patching of
>> > __cxa_throw is bogus. The start of that function is patched to jump to
>> > garbage.
>> > >
>> > > Breakpoint 1, 0x0000000100001c19 in main ()
>> > > (gdb) display/i $pc
>> > > 2: x/i $pc 0x100001c19 <main+318>: callq 0x100016386
>> > <dyld_stub___cxa_throw>
>> > > (gdb) si
>> > > 0x0000000100016386 in dyld_stub___cxa_throw ()
>> > > 2: x/i $pc 0x100016386 <dyld_stub___cxa_throw>: jmpq
>> > *0xae1c(%rip) # 0x1000211a8
>> > > (gdb)
>> > > 0x0000000102244870 in __cxa_throw ()
>> > > 2: x/i $pc 0x102244870 <__cxa_throw>: jmpq 0xffd27000
>> > > (gdb) # the above its __cxa_throw in gcc's libstdc++.6.dylib. The
>> > first instruction has been patch to jump to a garbage address.
>> > >
>> > > (gdb) x/8i 0x102244870-8
>> > > 0x102244868
>> > <_ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception+56>:
>> > std
>> > > 0x102244869
>> > <_ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception+57>:
>> > (bad)
>> > > 0x10224486a
>> > <_ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception+58>:
>> > decl (%rdi)
>> > > 0x10224486c
>> > <_ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception+60>:
>> > (bad)
>> > > 0x10224486d
>> > <_ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception+61>:
>> > add %r8b,(%rax)
>> > > 0x102244870 <__cxa_throw>: jmpq 0xffd27000
>> > > 0x102244875 <__cxa_throw+5>: or (%rax),%eax
>> > > 0x102244877 <__cxa_throw+7>: push %rbx
>> > > (gdb)
>> > > (gdb) watch *0x102244870
>> > > Hardware watchpoint 2: *4330899568
>> > > (gdb) r
>> > >
>> > > Old value = -788165304
>> > > New value = -1373139991
>> > > 0x0000000100016203 in __asan_mach_override_ptr_custom ()
>> > > (gdb) bt
>> > > #0 0x0000000100016203 in __asan_mach_override_ptr_custom ()
>> > > #1 0x0000000100015a9e in __interception::OverrideFunction ()
>> > > #2 0x00007fff5fc13378 in ImageLoaderMachO::doModInitFunctions ()
>> > > #3 0x00007fff5fc13762 in ImageLoaderMachO::doInitialization ()
>> > > #4 0x00007fff5fc1006e in ImageLoader::recursiveInitialization ()
>> > > #5 0x00007fff5fc0feba in ImageLoader::runInitializers ()
>> > > #6 0x00007fff5fc01fc0 in dyld::initializeMainExecutable ()
>> > > #7 0x00007fff5fc05b04 in dyld::_main ()
>> > > #8 0x00007fff5fc01397 in dyldbootstrap::start ()
>> > > #9 0x00007fff5fc0105e in _dyld_start ()
>> > > (gdb) x/8i 0x102244870
>> > > 0x102244870 <__cxa_throw>: jmpq 0xffd27000
>> > > 0x102244875 <__cxa_throw+5>: or (%rax),%eax
>> > > 0x102244877 <__cxa_throw+7>: push %rbx
>> > > 0x102244878 <__cxa_throw+8>: lea -0x20(%rdi),%rbx
>> > > 0x10224487c <__cxa_throw+12>: mov %rsi,-0x70(%rdi)
>> > > # Here is where the patching is being done
>> > >
>> > > -Nick
>> > >
>> > > On Nov 29, 2012, at 11:07 AM, Alexander Potapenko wrote:
>> > >>> On Thu, Nov 29, 2012 at 9:55 PM, Jack Howarth <
>> > howarth at bromo.med.uc.edu>
>> > >>> wrote:
>> > >>>>
>> > >>>> Nick,
>> > >>>> Can you take a quick look at the asan_eh_bug.tar.bz testcase
>> > >>>> I uploaded into the newly opened radr://12777299, "potential
>> > >>>> pthread/eh bug exposed by libsanitizer". The FSF gcc developers
>> > >>>> have ported llvm.org's asan code into FSF gcc (and are keeping
>> > >>>> it synced to the upstream llvm.org code). I have been helping
>> > >>>> with the darwin build and testing -fsanitize=address against the
>> > >>>> complete FSF gcc testsuite. This seems to have exposed a potential
>> > >>>> bug in pthread or eh on darwin under libasan. Hundreds of test cases
>> > >>>> in the g++ and libstdc++ testsuites fail under -fsanitize=address
>> > >>>> in the following manner...
>> > >>>>
>> > >>>> ASAN:SIGSEGV
>> > >>>> ================================================================
>> > >>>> ==2738== ERROR: AddressSanitizer: SEGV on unknown address
>> > 0x0000ffd27000
>> > >>>> (pc 0x0000ffd27000 sp 0x7fff55e40828 bp 0x7fff55e408f0 T0)
>> > >>>> AddressSanitizer can not provide additional info.
>> > >>>> #0 0xffd26fff
>> > (/Users/howarth/asan_eh_bug/./cond1_asan.exe+0xf5f67fff)
>> > >>>> #1 0x7fff8bd827e0 (/usr/lib/system/libdyld.dylib+0x27e0)
>> > >>>> #2 0x0
>> > >>>> Stats: 0M malloced (0M for red zones) by 3 calls
>> > >>>> Stats: 0M realloced by 0 calls
>> > >>>> Stats: 0M freed by 0 calls
>> > >>>> Stats: 0M really freed by 0 calls
>> > >>>> Stats: 1M (384 full pages) mmaped in 3 calls
>> > >>>> mmaps by size class: 7:4095; 8:2047; 9:1023;
>> > >>>> mallocs by size class: 7:1; 8:1; 9:1;
>> > >>>> frees by size class:
>> > >>>> rfrees by size class:
>> > >>>> Stats: malloc large: 0 small slow: 3
>> > >>>> ==2738== ABORTING
>> > >>>>
>> > >>>> The failure of...
>> > >>>>
>> > >>>> FAIL: g++.dg/eh/cond1.C -std=c++98 execution test
>> > >>>>
>> > >>>> was used as the test case for the radar report and compiled with...
>> > >>>>
>> > >>>> g++-fsf-4.8 -static-libasan -fsanitize=address -std=c++98 cond1.C -g
>> > -O0
>> > >>>> -o cond1_asan.exe
>> > >>>>
>> > >>>> to produce the above failure. When compiled without libasan as...
>> > >>>>
>> > >>>> g++-fsf-4.8 -std=c++98 cond1.C -g -O0 -o cond1_no_asan.exe
>> > >>>>
>> > >>>> the resulting executable runs fine. Debugging this in gdb seems to
>> > show
>> > >>>> that the failure
>> > >>>> is occuring in the final call to dyld_stub_pthread_once (). The same
>> > test
>> > >>>> case
>> > >>>> compiles fine with -fsanitize=address under llvm 3.2 clang++ and
>> > produces
>> > >>>> no runtime errors
>> > >>>> but the code execution path is very different in that case (because
>> > of the
>> > >>>> different
>> > >>>> libstdc++).
>> > >>>> Can you take a quick peek at this and determine if this is a darwin
>> > >>>> pthread or unwinder
>> > >>>> bug or an issue with libasan that FSF gcc's compiler is exposing?
>> > Thanks
>> > >>>> in advance for
>> > >>>> any help on this.
>> > >>>> Jack
>> > >>>> _______________________________________________
>> > >>>> LLVM Developers mailing list
>> > >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> > >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> > >>>
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Alexander Potapenko
>> > >> Software Engineer
>> > >> Google Moscow
>> > >
>> >
>> >
>> >
>> > --
>> > Alexander Potapenko
>> > Software Engineer
>> > Google Moscow
>> >

--
Alexander Potapenko
Software Engineer
Google Moscow
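
Editorial note: the two relocation problems described in this message (a branch island more than 2 GB away from the patched function, and a RIP-relative LEA whose constant must be fixed when the instruction is moved) both come down to 32-bit displacement arithmetic. The following is a minimal sketch in C of that arithmetic only. The helper names `jmp_rel32_reaches` and `relocate_rip_disp32` are hypothetical and are not taken from mach_override or asan_mac.cc.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 5-byte `jmp rel32` stores (target - address of the next instruction)
 * as a signed 32-bit value, so the target must be within about 2 GB.
 * The unsigned-to-signed conversion assumes a two's-complement target,
 * which holds for x86_64 with GCC and Clang. */
bool jmp_rel32_reaches(uint64_t jmp_addr, uint64_t target)
{
    int64_t disp = (int64_t)(target - (jmp_addr + 5));
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* A RIP-relative operand resolves to insn_addr + insn_len + disp32.
 * If the same instruction is copied from old_addr to new_addr, the
 * length term cancels, so the operand keeps pointing at the same
 * address when new_disp = old_disp + (old_addr - new_addr).
 * Returns false if the adjusted displacement no longer fits in 32 bits. */
bool relocate_rip_disp32(uint64_t old_addr, uint64_t new_addr,
                         int32_t old_disp, int32_t *new_disp)
{
    int64_t d = (int64_t)old_disp + (int64_t)(old_addr - new_addr);
    if (d < INT32_MIN || d > INT32_MAX)
        return false;
    *new_disp = (int32_t)d;
    return true;
}
```

With the addresses quoted in the thread, a jump from `__cxa_throw` at 0x102244870 to an island near 0x7fffffdf0000 does not fit in rel32, which is consistent with the "should be <2G" remark. The same range limit applies to the relocated LEA: its adjusted displacement only fits if the island is close to the original code.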
Alexander Potapenko
2012-Dec-04 17:46 UTC
[LLVMdev] radr://12777299, "potential pthread/eh bug exposed by libsanitizer"
+kledzik at apple.com

The dynamic runtime is using dylib interposition (google for "__DATA,__interpose"). If I'm understanding correctly (Nick, can you please confirm this?) this allows interposing a function regardless of the two-level namespace.
The support for dynamic runtime in ASan is almost there. But the new interposition method has revealed some issues with the allocator which were corked here and there before. Most of those are caused by a CoreFoundation dependency, which I'm trying to eliminate now.

On Mon, Dec 3, 2012 at 8:50 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> On 30 November 2012 13:32, Alexander Potapenko <glider at google.com> wrote:
>> No, we are not going to use mach_inject. This isn't portable and may
>> be even harder to set up than mach_override.
>> The new ASan runtime will use the dylib interposition and will in fact
>> require DYLD_INSERT_LIBRARIES to work. However ASan already handles it
>> correctly itself: if the corresponding env var is missing the app is
>> just re-execed.
>> Dylib interposition is supported by Apple and should work on iOS as
>> well as Mac OS. It will also probably simplify hooking the memory
>> allocations in ASan, which is now very tricky.
>
> This is interesting! I had some difficulties with mach_override myself
> in firefox. Don't you have to disable the two-level namespace to be
> able to override the functions you want? What currently blocks using
> DYLD_INSERT_LIBRARIES instead of mach_override?
>
> Cheers,
> Rafael

--
Alexander Potapenko
Software Engineer
Google Moscow
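
Editorial note: the `__DATA,__interpose` section mentioned in this message holds pairs of (replacement, original) function pointers that dyld reads from libraries loaded through DYLD_INSERT_LIBRARIES. The following is a minimal sketch in C of what one such entry could look like. The names `interpose_entry`, `wrap_malloc` and `wrap_malloc_calls` are hypothetical, and this is not ASan's actual interceptor code. The section attribute is only emitted on Apple platforms; elsewhere the wrapper is an ordinary function.

```c
#include <stddef.h>
#include <stdlib.h>

/* Layout of one interposition tuple: replacement first, original second. */
struct interpose_entry {
    const void *replacement;
    const void *original;
};

/* Hypothetical counter, only here to make the wrapper observable.
 * It is not thread-safe; a real runtime would need atomics or locking. */
size_t wrap_malloc_calls = 0;

/* Replacement for malloc. As far as I understand, dyld does not apply
 * interposition to the image that defines the tuple, so the malloc call
 * below reaches the original allocator instead of recursing. */
void *wrap_malloc(size_t size)
{
    ++wrap_malloc_calls;
    return malloc(size);
}

#ifdef __APPLE__
/* Placing the tuple in __DATA,__interpose asks dyld to rebind calls to
 * malloc in other images to wrap_malloc when this dylib is inserted. */
__attribute__((used, section("__DATA,__interpose")))
static const struct interpose_entry interpose_malloc = {
    (const void *)(unsigned long)&wrap_malloc,
    (const void *)(unsigned long)&malloc
};
#endif
```

On Mac OS such a file would be built as a dynamic library (for example with `clang -dynamiclib`) and its path passed in DYLD_INSERT_LIBRARIES when launching the target program. No change to the target's two-level namespace setting should be needed, subject to the confirmation requested from Nick above.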