thr3ads.net - llvm dev - [llvm-dev] call

If this information is useful, please help other people find it:
Share via:

Kuba Brecka via llvm-dev

2016-Sep-02 10:16 UTC

[llvm-dev] call_once and TSan

> On 2 Sep 2016, at 12:11, Dmitry Vyukov <dvyukov at google.com> wrote:
> 
> On Fri, Sep 2, 2016 at 12:09 PM, Kuba Brecka <kuba.brecka at
gmail.com> wrote:
>> 
>>> On 2 Sep 2016, at 11:18, Dmitry Vyukov via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>>> 
>>> On Thu, Sep 1, 2016 at 2:30 PM, Kuba Brecka <kuba.brecka at
gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> I'm trying to write a TSan interceptor for the C++11
call_once function.  There are currently false positive reports, because the
inner __call_once function is located in the (non-instrumented) libcxx library,
and on macOS we can't expect the users to build their own instrumented
libcxx.
>>>> 
>>>> TSan already supports pthread_once and dispatch_once by having
interceptors that re-implement the logic.  However, doing the same for
call_once/__call_once doesn't work, because call_once is explicitly supposed
to be exception-safe, but the sanitizer runtime libraries disallow exception
handling.
>>>> 
>>>> Any ideas how to handle call_once in TSan?
>>> 
>>> Does anybody remember exact reasons we disable exceptions in
sanitizer
>>> runtimes? One is that it won't link with C programs. Are there
any
>>> other?
>>> If C is the only reason: there is already a part of tsan runtime is
>>> linked only to C++ programs (it contains operator new/delete
>>> interceptors). We could add additional files to the cxx part of
>>> runtime and build them with exceptions.
>>> 
>>> Alternatively, the interceptor can handle only synchronization but
>>> forward actual logic to the real function. Along the lines of:
>>> 
>>> INTERCEPTOR(call_once, o) {
>>> __tsan_acquire_release(o);
>>> REAL(call_once)(o);
>>> }
>>> 
>>> That will have some performance impact. If we hardcode the
"fully
>>> initialized" value, then we can eliminate the additional
overhead:
>>> 
>>> INTERCEPTOR(call_once, o) {
>>> if (__atomic_load(o, acquire) == FULLY_INITIALIZED) {
>>>    __tsan_acquire(o);
>>>    return;
>>> }
>>> __tsan_acquire_release(o);
>>> REAL(call_once)(o);
>>> }
>> 
>> Unfortunately, the first fast-path check is inlined and cannot be
intercepted.  We can only intercept the inner call to __call_once.  But how
would __tsan_acquire_release help here?  The issue is that we need to perform
the release *after* user code has run, but before the "o" flag is
changed.  Otherwise, TSan will still see a false positive where one thread has
already run user code, and another thread already sees that call_once is
finished, but the acquire has no release to pair with.
> 
> 
> 
> Will then the following work?
> 
> INTERCEPTOR(call_once, o) {
>  REAL(call_once)(o);
>  __tsan_release_merge(o);
>  __tsan_acquire(o);
> }
Still racy.  Suppose thread A is still inside REAL(call_once), but already after
it has run user code and updated "o" to ~0.  Thread B load-acquires
"o", finds ~0, assumes it's fully initialized, keeps going, but
user code stores hasn't been properly published.

Kuba

Dmitry Vyukov via llvm-dev

2016-Sep-02 11:42 UTC

head link

[llvm-dev] call_once and TSan

INTERCEPTOR(call_once, o) {
    __tsan_release_merge(o);
   REAL(call_once)(o);
  __tsan_acquire(o);
}

?


On Fri, Sep 2, 2016 at 12:16 PM, Kuba Brecka <kuba.brecka at gmail.com>
wrote:>
>> On 2 Sep 2016, at 12:11, Dmitry Vyukov <dvyukov at google.com>
wrote:
>>
>> On Fri, Sep 2, 2016 at 12:09 PM, Kuba Brecka <kuba.brecka at
gmail.com> wrote:
>>>
>>>> On 2 Sep 2016, at 11:18, Dmitry Vyukov via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> On Thu, Sep 1, 2016 at 2:30 PM, Kuba Brecka <kuba.brecka at
gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm trying to write a TSan interceptor for the C++11
call_once function.  There are currently false positive reports, because the
inner __call_once function is located in the (non-instrumented) libcxx library,
and on macOS we can't expect the users to build their own instrumented
libcxx.
>>>>>
>>>>> TSan already supports pthread_once and dispatch_once by
having interceptors that re-implement the logic.  However, doing the same for
call_once/__call_once doesn't work, because call_once is explicitly supposed
to be exception-safe, but the sanitizer runtime libraries disallow exception
handling.
>>>>>
>>>>> Any ideas how to handle call_once in TSan?
>>>>
>>>> Does anybody remember exact reasons we disable exceptions in
sanitizer
>>>> runtimes? One is that it won't link with C programs. Are
there any
>>>> other?
>>>> If C is the only reason: there is already a part of tsan
runtime is
>>>> linked only to C++ programs (it contains operator new/delete
>>>> interceptors). We could add additional files to the cxx part of
>>>> runtime and build them with exceptions.
>>>>
>>>> Alternatively, the interceptor can handle only synchronization
but
>>>> forward actual logic to the real function. Along the lines of:
>>>>
>>>> INTERCEPTOR(call_once, o) {
>>>> __tsan_acquire_release(o);
>>>> REAL(call_once)(o);
>>>> }
>>>>
>>>> That will have some performance impact. If we hardcode the
"fully
>>>> initialized" value, then we can eliminate the additional
overhead:
>>>>
>>>> INTERCEPTOR(call_once, o) {
>>>> if (__atomic_load(o, acquire) == FULLY_INITIALIZED) {
>>>>    __tsan_acquire(o);
>>>>    return;
>>>> }
>>>> __tsan_acquire_release(o);
>>>> REAL(call_once)(o);
>>>> }
>>>
>>> Unfortunately, the first fast-path check is inlined and cannot be
intercepted.  We can only intercept the inner call to __call_once.  But how
would __tsan_acquire_release help here?  The issue is that we need to perform
the release *after* user code has run, but before the "o" flag is
changed.  Otherwise, TSan will still see a false positive where one thread has
already run user code, and another thread already sees that call_once is
finished, but the acquire has no release to pair with.
>>
>>
>>
>> Will then the following work?
>>
>> INTERCEPTOR(call_once, o) {
>>  REAL(call_once)(o);
>>  __tsan_release_merge(o);
>>  __tsan_acquire(o);
>> }
>
> Still racy.  Suppose thread A is still inside REAL(call_once), but already
after it has run user code and updated "o" to ~0.  Thread B
load-acquires "o", finds ~0, assumes it's fully initialized, keeps
going, but user code stores hasn't been properly published.
>
> Kuba
>

Kuba Brecka via llvm-dev

2016-Sep-02 12:21 UTC

head link

[llvm-dev] call_once and TSan

Same problem exists, thread A can still be within REAL(call_once), but after it
ran user code and set the flag to ~0.  Roughly, call_once does:

__call_once(flag, arg, func) {
  mutex_lock(mut);
  if (flag == BEING_INITIALIZED) { wait }
  else if (flag == NOT_INITIALIZED_AT_ALL) {
    flag = BEING_INITIALIZED;
    mutex_unlock(mut);
    func(arg);    // <=== user code callback
    mutex_lock(mut);
    atomic_store(&flag, FULLY_INITIALIZED, mo_release);   //
"release" store, but within a compiled dylib, thus invisible to TSan
  }
  mutex_unlock(mut);
}

If thread A is just after the release store, which is invisible to TSan,
__tsan_acquire in thread B will have no effect, and the stores from the callback
to func(arg) will not be synchronized to thread B.

Anyway, I just realized that we can wrap "func" into our own callback,
which will perform the (extra) __tsan_release...  Do you think that would work? 
E.g.:

void call_once_callback_wrapper(...) {
  orig_func(orig_arg);
  __tsan_release(flag);
}

INTERCEPTOR(call_once, o, func, arg) {
  REAL(call_once)(flag, ..., call_once_callback_wrapper);
}

Kuba
> On 2 Sep 2016, at 13:42, Dmitry Vyukov <dvyukov at google.com> wrote:
> 
> INTERCEPTOR(call_once, o) {
>    __tsan_release_merge(o);
>   REAL(call_once)(o);
>  __tsan_acquire(o);
> }
> 
> ?
> 
> 
> On Fri, Sep 2, 2016 at 12:16 PM, Kuba Brecka <kuba.brecka at
gmail.com> wrote:
>> 
>>> On 2 Sep 2016, at 12:11, Dmitry Vyukov <dvyukov at
google.com> wrote:
>>> 
>>> On Fri, Sep 2, 2016 at 12:09 PM, Kuba Brecka <kuba.brecka at
gmail.com> wrote:
>>>> 
>>>>> On 2 Sep 2016, at 11:18, Dmitry Vyukov via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>>>>> 
>>>>> On Thu, Sep 1, 2016 at 2:30 PM, Kuba Brecka <kuba.brecka
at gmail.com> wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I'm trying to write a TSan interceptor for the
C++11 call_once function.  There are currently false positive reports, because
the inner __call_once function is located in the (non-instrumented) libcxx
library, and on macOS we can't expect the users to build their own
instrumented libcxx.
>>>>>> 
>>>>>> TSan already supports pthread_once and dispatch_once by
having interceptors that re-implement the logic.  However, doing the same for
call_once/__call_once doesn't work, because call_once is explicitly supposed
to be exception-safe, but the sanitizer runtime libraries disallow exception
handling.
>>>>>> 
>>>>>> Any ideas how to handle call_once in TSan?
>>>>> 
>>>>> Does anybody remember exact reasons we disable exceptions
in sanitizer
>>>>> runtimes? One is that it won't link with C programs.
Are there any
>>>>> other?
>>>>> If C is the only reason: there is already a part of tsan
runtime is
>>>>> linked only to C++ programs (it contains operator
new/delete
>>>>> interceptors). We could add additional files to the cxx
part of
>>>>> runtime and build them with exceptions.
>>>>> 
>>>>> Alternatively, the interceptor can handle only
synchronization but
>>>>> forward actual logic to the real function. Along the lines
of:
>>>>> 
>>>>> INTERCEPTOR(call_once, o) {
>>>>> __tsan_acquire_release(o);
>>>>> REAL(call_once)(o);
>>>>> }
>>>>> 
>>>>> That will have some performance impact. If we hardcode the
"fully
>>>>> initialized" value, then we can eliminate the
additional overhead:
>>>>> 
>>>>> INTERCEPTOR(call_once, o) {
>>>>> if (__atomic_load(o, acquire) == FULLY_INITIALIZED) {
>>>>>   __tsan_acquire(o);
>>>>>   return;
>>>>> }
>>>>> __tsan_acquire_release(o);
>>>>> REAL(call_once)(o);
>>>>> }
>>>> 
>>>> Unfortunately, the first fast-path check is inlined and cannot
be intercepted.  We can only intercept the inner call to __call_once.  But how
would __tsan_acquire_release help here?  The issue is that we need to perform
the release *after* user code has run, but before the "o" flag is
changed.  Otherwise, TSan will still see a false positive where one thread has
already run user code, and another thread already sees that call_once is
finished, but the acquire has no release to pair with.
>>> 
>>> 
>>> 
>>> Will then the following work?
>>> 
>>> INTERCEPTOR(call_once, o) {
>>> REAL(call_once)(o);
>>> __tsan_release_merge(o);
>>> __tsan_acquire(o);
>>> }
>> 
>> Still racy.  Suppose thread A is still inside REAL(call_once), but
already after it has run user code and updated "o" to ~0.  Thread B
load-acquires "o", finds ~0, assumes it's fully initialized, keeps
going, but user code stores hasn't been properly published.
>> 
>> Kuba
>>

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - Sep 2016 - call_once and TSan

[llvm-dev] call_once and TSan

[llvm-dev] call_once and TSan

[llvm-dev] call_once and TSan

Possibly Parallel Threads