> On 2 Sep 2016, at 12:11, Dmitry Vyukov <dvyukov at google.com> wrote: > > On Fri, Sep 2, 2016 at 12:09 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote: >> >>> On 2 Sep 2016, at 11:18, Dmitry Vyukov via llvm-dev <llvm-dev at lists.llvm.org> wrote: >>> >>> On Thu, Sep 1, 2016 at 2:30 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote: >>>> Hi, >>>> >>>> I'm trying to write a TSan interceptor for the C++11 call_once function. There are currently false positive reports, because the inner __call_once function is located in the (non-instrumented) libcxx library, and on macOS we can't expect the users to build their own instrumented libcxx. >>>> >>>> TSan already supports pthread_once and dispatch_once by having interceptors that re-implement the logic. However, doing the same for call_once/__call_once doesn't work, because call_once is explicitly supposed to be exception-safe, but the sanitizer runtime libraries disallow exception handling. >>>> >>>> Any ideas how to handle call_once in TSan? >>> >>> Does anybody remember exact reasons we disable exceptions in sanitizer >>> runtimes? One is that it won't link with C programs. Are there any >>> other? >>> If C is the only reason: there is already a part of tsan runtime is >>> linked only to C++ programs (it contains operator new/delete >>> interceptors). We could add additional files to the cxx part of >>> runtime and build them with exceptions. >>> >>> Alternatively, the interceptor can handle only synchronization but >>> forward actual logic to the real function. Along the lines of: >>> >>> INTERCEPTOR(call_once, o) { >>> __tsan_acquire_release(o); >>> REAL(call_once)(o); >>> } >>> >>> That will have some performance impact. If we hardcode the "fully >>> initialized" value, then we can eliminate the additional overhead: >>> >>> INTERCEPTOR(call_once, o) { >>> if (__atomic_load(o, acquire) == FULLY_INITIALIZED) { >>> __tsan_acquire(o); >>> return; >>> } >>> __tsan_acquire_release(o); >>> REAL(call_once)(o); >>> } >> >> Unfortunately, the first fast-path check is inlined and cannot be intercepted. We can only intercept the inner call to __call_once. But how would __tsan_acquire_release help here? The issue is that we need to perform the release *after* user code has run, but before the "o" flag is changed. Otherwise, TSan will still see a false positive where one thread has already run user code, and another thread already sees that call_once is finished, but the acquire has no release to pair with. > > > > Will then the following work? > > INTERCEPTOR(call_once, o) { > REAL(call_once)(o); > __tsan_release_merge(o); > __tsan_acquire(o); > }Still racy. Suppose thread A is still inside REAL(call_once), but already after it has run user code and updated "o" to ~0. Thread B load-acquires "o", finds ~0, assumes it's fully initialized, keeps going, but user code stores hasn't been properly published. Kuba
INTERCEPTOR(call_once, o) { __tsan_release_merge(o); REAL(call_once)(o); __tsan_acquire(o); } ? On Fri, Sep 2, 2016 at 12:16 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote:> >> On 2 Sep 2016, at 12:11, Dmitry Vyukov <dvyukov at google.com> wrote: >> >> On Fri, Sep 2, 2016 at 12:09 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote: >>> >>>> On 2 Sep 2016, at 11:18, Dmitry Vyukov via llvm-dev <llvm-dev at lists.llvm.org> wrote: >>>> >>>> On Thu, Sep 1, 2016 at 2:30 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote: >>>>> Hi, >>>>> >>>>> I'm trying to write a TSan interceptor for the C++11 call_once function. There are currently false positive reports, because the inner __call_once function is located in the (non-instrumented) libcxx library, and on macOS we can't expect the users to build their own instrumented libcxx. >>>>> >>>>> TSan already supports pthread_once and dispatch_once by having interceptors that re-implement the logic. However, doing the same for call_once/__call_once doesn't work, because call_once is explicitly supposed to be exception-safe, but the sanitizer runtime libraries disallow exception handling. >>>>> >>>>> Any ideas how to handle call_once in TSan? >>>> >>>> Does anybody remember exact reasons we disable exceptions in sanitizer >>>> runtimes? One is that it won't link with C programs. Are there any >>>> other? >>>> If C is the only reason: there is already a part of tsan runtime is >>>> linked only to C++ programs (it contains operator new/delete >>>> interceptors). We could add additional files to the cxx part of >>>> runtime and build them with exceptions. >>>> >>>> Alternatively, the interceptor can handle only synchronization but >>>> forward actual logic to the real function. Along the lines of: >>>> >>>> INTERCEPTOR(call_once, o) { >>>> __tsan_acquire_release(o); >>>> REAL(call_once)(o); >>>> } >>>> >>>> That will have some performance impact. If we hardcode the "fully >>>> initialized" value, then we can eliminate the additional overhead: >>>> >>>> INTERCEPTOR(call_once, o) { >>>> if (__atomic_load(o, acquire) == FULLY_INITIALIZED) { >>>> __tsan_acquire(o); >>>> return; >>>> } >>>> __tsan_acquire_release(o); >>>> REAL(call_once)(o); >>>> } >>> >>> Unfortunately, the first fast-path check is inlined and cannot be intercepted. We can only intercept the inner call to __call_once. But how would __tsan_acquire_release help here? The issue is that we need to perform the release *after* user code has run, but before the "o" flag is changed. Otherwise, TSan will still see a false positive where one thread has already run user code, and another thread already sees that call_once is finished, but the acquire has no release to pair with. >> >> >> >> Will then the following work? >> >> INTERCEPTOR(call_once, o) { >> REAL(call_once)(o); >> __tsan_release_merge(o); >> __tsan_acquire(o); >> } > > Still racy. Suppose thread A is still inside REAL(call_once), but already after it has run user code and updated "o" to ~0. Thread B load-acquires "o", finds ~0, assumes it's fully initialized, keeps going, but user code stores hasn't been properly published. > > Kuba >
Same problem exists, thread A can still be within REAL(call_once), but after it ran user code and set the flag to ~0. Roughly, call_once does: __call_once(flag, arg, func) { mutex_lock(mut); if (flag == BEING_INITIALIZED) { wait } else if (flag == NOT_INITIALIZED_AT_ALL) { flag = BEING_INITIALIZED; mutex_unlock(mut); func(arg); // <=== user code callback mutex_lock(mut); atomic_store(&flag, FULLY_INITIALIZED, mo_release); // "release" store, but within a compiled dylib, thus invisible to TSan } mutex_unlock(mut); } If thread A is just after the release store, which is invisible to TSan, __tsan_acquire in thread B will have no effect, and the stores from the callback to func(arg) will not be synchronized to thread B. Anyway, I just realized that we can wrap "func" into our own callback, which will perform the (extra) __tsan_release... Do you think that would work? E.g.: void call_once_callback_wrapper(...) { orig_func(orig_arg); __tsan_release(flag); } INTERCEPTOR(call_once, o, func, arg) { REAL(call_once)(flag, ..., call_once_callback_wrapper); } Kuba> On 2 Sep 2016, at 13:42, Dmitry Vyukov <dvyukov at google.com> wrote: > > INTERCEPTOR(call_once, o) { > __tsan_release_merge(o); > REAL(call_once)(o); > __tsan_acquire(o); > } > > ? > > > On Fri, Sep 2, 2016 at 12:16 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote: >> >>> On 2 Sep 2016, at 12:11, Dmitry Vyukov <dvyukov at google.com> wrote: >>> >>> On Fri, Sep 2, 2016 at 12:09 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote: >>>> >>>>> On 2 Sep 2016, at 11:18, Dmitry Vyukov via llvm-dev <llvm-dev at lists.llvm.org> wrote: >>>>> >>>>> On Thu, Sep 1, 2016 at 2:30 PM, Kuba Brecka <kuba.brecka at gmail.com> wrote: >>>>>> Hi, >>>>>> >>>>>> I'm trying to write a TSan interceptor for the C++11 call_once function. There are currently false positive reports, because the inner __call_once function is located in the (non-instrumented) libcxx library, and on macOS we can't expect the users to build their own instrumented libcxx. >>>>>> >>>>>> TSan already supports pthread_once and dispatch_once by having interceptors that re-implement the logic. However, doing the same for call_once/__call_once doesn't work, because call_once is explicitly supposed to be exception-safe, but the sanitizer runtime libraries disallow exception handling. >>>>>> >>>>>> Any ideas how to handle call_once in TSan? >>>>> >>>>> Does anybody remember exact reasons we disable exceptions in sanitizer >>>>> runtimes? One is that it won't link with C programs. Are there any >>>>> other? >>>>> If C is the only reason: there is already a part of tsan runtime is >>>>> linked only to C++ programs (it contains operator new/delete >>>>> interceptors). We could add additional files to the cxx part of >>>>> runtime and build them with exceptions. >>>>> >>>>> Alternatively, the interceptor can handle only synchronization but >>>>> forward actual logic to the real function. Along the lines of: >>>>> >>>>> INTERCEPTOR(call_once, o) { >>>>> __tsan_acquire_release(o); >>>>> REAL(call_once)(o); >>>>> } >>>>> >>>>> That will have some performance impact. If we hardcode the "fully >>>>> initialized" value, then we can eliminate the additional overhead: >>>>> >>>>> INTERCEPTOR(call_once, o) { >>>>> if (__atomic_load(o, acquire) == FULLY_INITIALIZED) { >>>>> __tsan_acquire(o); >>>>> return; >>>>> } >>>>> __tsan_acquire_release(o); >>>>> REAL(call_once)(o); >>>>> } >>>> >>>> Unfortunately, the first fast-path check is inlined and cannot be intercepted. We can only intercept the inner call to __call_once. But how would __tsan_acquire_release help here? The issue is that we need to perform the release *after* user code has run, but before the "o" flag is changed. Otherwise, TSan will still see a false positive where one thread has already run user code, and another thread already sees that call_once is finished, but the acquire has no release to pair with. >>> >>> >>> >>> Will then the following work? >>> >>> INTERCEPTOR(call_once, o) { >>> REAL(call_once)(o); >>> __tsan_release_merge(o); >>> __tsan_acquire(o); >>> } >> >> Still racy. Suppose thread A is still inside REAL(call_once), but already after it has run user code and updated "o" to ~0. Thread B load-acquires "o", finds ~0, assumes it's fully initialized, keeps going, but user code stores hasn't been properly published. >> >> Kuba >>