thr3ads.net - llvm dev - [llvm-dev] TSAN hack on AArch64 for Android [Aug 2015]

If this information is useful, please help other people find it:
Share via:

Dmitry Vyukov via llvm-dev

2015-Aug-18 21:32 UTC

[llvm-dev] TSAN hack on AArch64 for Android

On Tue, Aug 18, 2015 at 10:53 PM, Renato Golin <renato.golin at
linaro.org> wrote:> On 18 August 2015 at 19:57, Dmitry Vyukov <dvyukov at google.com>
wrote:
>> Hi Renato,
>>
>> What exactly hack/hacks do you mean?
>
> Hi Dmitry,
>
> The sanitizer code seems to be more accepting to adding complicated
> pre-processor logic, as well as a number of alternative functions to
> work around various inefficiencies in other systems, but in other LLVM
> projects, such as Clang and the back-ends, we tend to be less
> accepting of work-arounds being implemented just for the same of a
> test.
>
> This case adds an immense complexity (see the size of the patch), and
> it has an immense number of odd assumptions (see the amount of
> negative feedback on the patch), that may never have been tested in
> other architectures (see some comments about generalising the wrong
> stuff). All of that, to be able to "experiment" with TSAN on
Android
> in AArch64.
>
> See, if TSAN was completely broken in AArch64, or if other people
> weren't trying to push it upstream, I'd understand someone throwing
a
> bunch of hacked files that "works on their boxes", if that's
disabled
> by default and is only supposed to work when invoked. But none of that
> is true.
>
> TSAN works to an extent in AArch64, and we have been pushing some
> patches to prove that. With ASAN live on the buildbots, and the
> builtins working, we'll now focus on TSAN to get it being tested in
> the buildbots, too. That's Linux AArch64, of course, as we don't
test
> Android at all.
>
> Now, the oddities with Android is that bionic and the rest of the
> run-time is crippled in ways that no other system I know gets close.
> Layers upon layers of libraries, aliases, asm hackery, just to make it
> build. Adding Clang to the list of compilers did not make it better.
>
> This comment baffles me: "Ideally, this should be supported with the
> __thread keyword, but that is not yet supported, and it is a much
> bigger chunk of work to get it integrated into the default android
> cross compiler."
>
> AFAIK, __thread is supported by GCC and Clang, and this seems to be a
> problem in Android's linker. So, to work around a problem in
Android's
> linker, we're adding a huge pile of code to LLVM for a temporary fix
> because of a possible experimentation in Android?
>
> The amount of work being done, and still required, to get this patch
> upstream in any decent shape or form will take many engineer-months,
> probably the same as it would to actually fix the linker in the first
> place. Especially if you add up the work that will be required to
> *remove* all that after the linker is fixed.
>
> But I'm being over-optimistic here. I'm old enough to know that,
> whenever someone introduces a "work-around that works", the
> probability of anyone fixing the original problem is zero. And the
> Android code that I had the displeasure to see, so far, has shown me
> that this is ultimately true on their side. And that's why I'm
> baffled, because we're moving that mentality, that work arounds are
> ok, no matter how ugly they are, into LLVM, and everyone seems to be
> ok with it.
>
> If everyone is ok with it, I'll just shut up. But I'd like to make
> sure that everyone is ok to do so.
>
> My tuppence.
>
> --renato

Hi Renato,


So the problem is only with the tls emulation, right?

I don't know what it worth to add __thread support to android, and
whether somebody is going to address it soon.
But I think we can make the tls emulation _much_ cleaner. First, we
need to put all tsan per-thread data into ThreadState object, accesses
to ThreadState are already wrapped into cur_thread() function. So now
we have a single function to modify, no macros spread across the
codebase, no new files, etc. Then, I've looked at pthread_getspecific
code in bionic and it is actually quite fast and does not call any
other system function (except get_thread() which is basically an
access to real tls). So we can use it, and then the whole support
becomes merely:

INLINE ThreadState *cur_thread() {
#ifdef REAL_TLS
  return reinterpret_cast<ThreadState *>(&cur_thread_placeholder);
#else
  return reinterpret_cast<ThreadState *>(pthread_getspecific(thread_key));
#endif
}

Which looks quite acceptable to me.

Also we could try to ask android to dedicate a real tls slot for
sanitizers (I don't know what are the chances, but ART obviously get a
one). Then we can change the support to:

INLINE ThreadState *cur_thread() {
#ifdef REAL_TLS
  return reinterpret_cast<ThreadState *>(&cur_thread_placeholder);
#else
  return reinterpret_cast<ThreadState
*>(__get_tls()[ANDROID_TSAN_TLS_SLOT]);
#endif
}

Which is roughly the same code-wise, but faster and safer.

Am I missing something?

Dan Albert via llvm-dev

2015-Aug-18 22:43 UTC

head link

[llvm-dev] TSAN hack on AArch64 for Android

>
> Also we could try to ask android to dedicate a real tls slot for
> sanitizers (I don't know what are the chances, but ART obviously get a
> one).

Done: https://android-review.googlesource.com/#/c/167420/

On Tue, Aug 18, 2015 at 2:32 PM, Dmitry Vyukov <dvyukov at google.com>
wrote:
> On Tue, Aug 18, 2015 at 10:53 PM, Renato Golin <renato.golin at
linaro.org>
> wrote:
> > On 18 August 2015 at 19:57, Dmitry Vyukov <dvyukov at
google.com> wrote:
> >> Hi Renato,
> >>
> >> What exactly hack/hacks do you mean?
> >
> > Hi Dmitry,
> >
> > The sanitizer code seems to be more accepting to adding complicated
> > pre-processor logic, as well as a number of alternative functions to
> > work around various inefficiencies in other systems, but in other LLVM
> > projects, such as Clang and the back-ends, we tend to be less
> > accepting of work-arounds being implemented just for the same of a
> > test.
> >
> > This case adds an immense complexity (see the size of the patch), and
> > it has an immense number of odd assumptions (see the amount of
> > negative feedback on the patch), that may never have been tested in
> > other architectures (see some comments about generalising the wrong
> > stuff). All of that, to be able to "experiment" with TSAN on
Android
> > in AArch64.
> >
> > See, if TSAN was completely broken in AArch64, or if other people
> > weren't trying to push it upstream, I'd understand someone
throwing a
> > bunch of hacked files that "works on their boxes", if
that's disabled
> > by default and is only supposed to work when invoked. But none of that
> > is true.
> >
> > TSAN works to an extent in AArch64, and we have been pushing some
> > patches to prove that. With ASAN live on the buildbots, and the
> > builtins working, we'll now focus on TSAN to get it being tested
in
> > the buildbots, too. That's Linux AArch64, of course, as we
don't test
> > Android at all.
> >
> > Now, the oddities with Android is that bionic and the rest of the
> > run-time is crippled in ways that no other system I know gets close.
> > Layers upon layers of libraries, aliases, asm hackery, just to make it
> > build. Adding Clang to the list of compilers did not make it better.
> >
> > This comment baffles me: "Ideally, this should be supported with
the
> > __thread keyword, but that is not yet supported, and it is a much
> > bigger chunk of work to get it integrated into the default android
> > cross compiler."
> >
> > AFAIK, __thread is supported by GCC and Clang, and this seems to be a
> > problem in Android's linker. So, to work around a problem in
Android's
> > linker, we're adding a huge pile of code to LLVM for a temporary
fix
> > because of a possible experimentation in Android?
> >
> > The amount of work being done, and still required, to get this patch
> > upstream in any decent shape or form will take many engineer-months,
> > probably the same as it would to actually fix the linker in the first
> > place. Especially if you add up the work that will be required to
> > *remove* all that after the linker is fixed.
> >
> > But I'm being over-optimistic here. I'm old enough to know
that,
> > whenever someone introduces a "work-around that works", the
> > probability of anyone fixing the original problem is zero. And the
> > Android code that I had the displeasure to see, so far, has shown me
> > that this is ultimately true on their side. And that's why I'm
> > baffled, because we're moving that mentality, that work arounds
are
> > ok, no matter how ugly they are, into LLVM, and everyone seems to be
> > ok with it.
> >
> > If everyone is ok with it, I'll just shut up. But I'd like to
make
> > sure that everyone is ok to do so.
> >
> > My tuppence.
> >
> > --renato
>
>
> Hi Renato,
>
>
> So the problem is only with the tls emulation, right?
>
> I don't know what it worth to add __thread support to android, and
> whether somebody is going to address it soon.
> But I think we can make the tls emulation _much_ cleaner. First, we
> need to put all tsan per-thread data into ThreadState object, accesses
> to ThreadState are already wrapped into cur_thread() function. So now
> we have a single function to modify, no macros spread across the
> codebase, no new files, etc. Then, I've looked at pthread_getspecific
> code in bionic and it is actually quite fast and does not call any
> other system function (except get_thread() which is basically an
> access to real tls). So we can use it, and then the whole support
> becomes merely:
>
> INLINE ThreadState *cur_thread() {
> #ifdef REAL_TLS
>   return reinterpret_cast<ThreadState
*>(&cur_thread_placeholder);
> #else
>   return reinterpret_cast<ThreadState
*>(pthread_getspecific(thread_key));
> #endif
> }
>
> Which looks quite acceptable to me.
>
> Also we could try to ask android to dedicate a real tls slot for
> sanitizers (I don't know what are the chances, but ART obviously get a
> one). Then we can change the support to:
>
> INLINE ThreadState *cur_thread() {
> #ifdef REAL_TLS
>   return reinterpret_cast<ThreadState
*>(&cur_thread_placeholder);
> #else
>   return reinterpret_cast<ThreadState
> *>(__get_tls()[ANDROID_TSAN_TLS_SLOT]);
> #endif
> }
>
> Which is roughly the same code-wise, but faster and safer.
>
> Am I missing something?
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150818/38dde16a/attachment.html>

Renato Golin via llvm-dev

2015-Aug-18 23:11 UTC

head link

[llvm-dev] TSAN hack on AArch64 for Android

On 18 August 2015 at 22:32, Dmitry Vyukov <dvyukov at google.com>
wrote:> So the problem is only with the tls emulation, right?
No. It's with the whole patch, including its contents, how it was
submitted, what is its purpose and how nobody cared about any of this.

Adding TLS emulation is, in its own, a worthy feature. It allows for
platforms that can't do full TLS to use TSAN, which is a great tool.

But when you want to add a feature to a tool that runs on multiple
architectures, you plan the feature, you design it, you send an RFC
email, you discuss in the list.

Then you go about the implementation, submit some preparation patches,
clean up code, and have some more discussions. For a good example, see
Daniel Sanders' "The trouble with triples" and the associated
patches.
It takes a lot of time to conjure a new feature in a complex piece of
software, but more than that, needs consensus and people designing it
beforehand.

But when someone dumps a patch that is nowhere near production
quality, or would even compile in most platforms, people should be
backing up and saying: "Wait a minute. What are you trying to do, and
how did you get here?". We had a similar issue a few months back with
the new Stride Vectorizer. Fantastic feature, one that I had worked
before, but it took us a whole month of "code review" to figure out
that the whole patch was just wrong, and had to be completely
re-written, from where we had stopped two years back.

I'm doing the same now. Let's make sure to agree on what's being
proposed, THEN we'll agree if that's worthy or not, THEN we'll agree
what's the best implementation, THEN we'll review code. We can't
start
from the last iteration, as that'll invariably lead to poor code going
in.

Furthermore, you don't get TLS+TSAN+AArch64+Android support in a
single patch! The likelihood that we'll end up reverting it a hundred
times is really high, and that will completely unbalance our buildbot
infrastructure and break everyone else's workflow. Slow is faster than
ludicrous. One step at a time, please.

First, let's get TSAN working on AArch64. Adhemerval is working on
that. Then, let's see what's missing for Android. Ok, TLS emulation is
required, at least for the time being. What other architecture would
benefit from TLS emulation? None? Ok, then we set out to change the
code, so that TLS emulation can go in. Meanwhile, a group of people
discuss (in public) the design of such framework, and in time, a
number of patches will start to go in, one by one, until the feature
is on. Only then we'll get Android using it.

The time will also give buildbots enough iterations to pick any bug
that we don't find on code review. Bugs that would be obscured by a
giant patch with many other more obvious bugs exploding our CI
infrastructure and getting reverted by angry bot maintainers.

> I don't know what it worth to add __thread support to android, and
> whether somebody is going to address it soon.
Wouldn't it be better to know that before submitting a large change to
another code base? If there is evidence that this is never going to
happen, or that, for some very strong engineering reasons, it's not
possible to do it, then the reasons to do TLS emulation are thoroughly
justified. Right now, it's just a hunch.

> But I think we can make the tls emulation _much_ cleaner. First, we
> need to put all tsan per-thread data into ThreadState object, accesses
> to ThreadState are already wrapped into cur_thread() function. So now
> we have a single function to modify, no macros spread across the
> codebase, no new files, etc.
I'm no expert in that field, but this sounds like a good plan. Doesn't
seem to be part of this patch, though. Nor it should, as it actually
should have been a preparatory patch, submitted after all interested
parties agreed that TLS emulation is the way to go.

> INLINE ThreadState *cur_thread() {
> #ifdef REAL_TLS
>   return reinterpret_cast<ThreadState
*>(&cur_thread_placeholder);
> #else
>   return reinterpret_cast<ThreadState
*>(__get_tls()[ANDROID_TSAN_TLS_SLOT]);
> #endif
> }
>
> Which is roughly the same code-wise, but faster and safer.
Dan "Eastwood" Albert has pulled the trigger. :)

This looks reasonable to me. Does this still mean you need the
emulation, or is Android going to give you a "reasonably looking
ThreadState"?

cheers,
--renato

Joerg Sonnenberger via llvm-dev

2015-Aug-18 23:31 UTC

head link

[llvm-dev] TSAN hack on AArch64 for Android

On Tue, Aug 18, 2015 at 11:32:43PM +0200, Dmitry Vyukov via llvm-dev
wrote:> I don't know what it worth to add __thread support to android, and
> whether somebody is going to address it soon.
http://reviews.llvm.org/D12001 and related changes?

Joerg

Dan Albert via llvm-dev

2015-Aug-19 00:11 UTC

head link

[llvm-dev] TSAN hack on AArch64 for Android

It hasn't been obvious to me whether people mean real ELF TLS when they
talk about __thread support, or just that we need the keyword to be
supported *somehow*. The patch set that Joerg mentioned will do the TLS
emulation via pthread_getspecific for us in the compiler/rtlib. If that's
sufficient then we don't actually need any of that in TSAN, and neither
will any other platform that lacks ELF TLS.

On Tue, Aug 18, 2015 at 4:31 PM, Joerg Sonnenberger via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Tue, Aug 18, 2015 at 11:32:43PM +0200, Dmitry Vyukov via llvm-dev wrote:
> > I don't know what it worth to add __thread support to android, and
> > whether somebody is going to address it soon.
>
> http://reviews.llvm.org/D12001 and related changes?
>
> Joerg
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150818/8a3ab8ee/attachment.html>

Dmitry Vyukov via llvm-dev

2015-Aug-19 10:29 UTC

head link

[llvm-dev] TSAN hack on AArch64 for Android

Wait, this change is not submitted yet, right? Or you mean mailing of
this change in bad shape?

I consider this change as work-in-progress where author is looking for
feedback on his ongoing progress. I guess the change description
should have been spelled this very explicitly.
This change definitely needs more work to be submitted, splitting into
smaller patches with a plan for submission order, refactoring,
cleanup, etc. That is no question.
As for the design, there is not much to design upfront, as you
discover most of the issue only when you hit them (I would not foresee
them). Also, most of the changes in the patch are just spot fixes
(e.g. if a struct definition is different on the new platform, you
just need to provide a correct definition), there is nothing to design
there.

Regarding the tls, whether we call it "emulation" or not, I believe we
can do it a very neat and laconic form.

Refactoring of tsan per-thread structures in preparation for the
"emulation" should go in independently. Agree.
Aarch64 support is already submitted to tsan as far as I see.



On Wed, Aug 19, 2015 at 1:11 AM, Renato Golin <renato.golin at linaro.org>
wrote:> On 18 August 2015 at 22:32, Dmitry Vyukov <dvyukov at google.com>
wrote:
>> So the problem is only with the tls emulation, right?
>
> No. It's with the whole patch, including its contents, how it was
> submitted, what is its purpose and how nobody cared about any of this.
>
> Adding TLS emulation is, in its own, a worthy feature. It allows for
> platforms that can't do full TLS to use TSAN, which is a great tool.
>
> But when you want to add a feature to a tool that runs on multiple
> architectures, you plan the feature, you design it, you send an RFC
> email, you discuss in the list.
>
> Then you go about the implementation, submit some preparation patches,
> clean up code, and have some more discussions. For a good example, see
> Daniel Sanders' "The trouble with triples" and the associated
patches.
> It takes a lot of time to conjure a new feature in a complex piece of
> software, but more than that, needs consensus and people designing it
> beforehand.
>
> But when someone dumps a patch that is nowhere near production
> quality, or would even compile in most platforms, people should be
> backing up and saying: "Wait a minute. What are you trying to do, and
> how did you get here?". We had a similar issue a few months back with
> the new Stride Vectorizer. Fantastic feature, one that I had worked
> before, but it took us a whole month of "code review" to figure
out
> that the whole patch was just wrong, and had to be completely
> re-written, from where we had stopped two years back.
>
> I'm doing the same now. Let's make sure to agree on what's
being
> proposed, THEN we'll agree if that's worthy or not, THEN we'll
agree
> what's the best implementation, THEN we'll review code. We
can't start
> from the last iteration, as that'll invariably lead to poor code going
> in.
>
> Furthermore, you don't get TLS+TSAN+AArch64+Android support in a
> single patch! The likelihood that we'll end up reverting it a hundred
> times is really high, and that will completely unbalance our buildbot
> infrastructure and break everyone else's workflow. Slow is faster than
> ludicrous. One step at a time, please.
>
> First, let's get TSAN working on AArch64. Adhemerval is working on
> that. Then, let's see what's missing for Android. Ok, TLS emulation
is
> required, at least for the time being. What other architecture would
> benefit from TLS emulation? None? Ok, then we set out to change the
> code, so that TLS emulation can go in. Meanwhile, a group of people
> discuss (in public) the design of such framework, and in time, a
> number of patches will start to go in, one by one, until the feature
> is on. Only then we'll get Android using it.
>
> The time will also give buildbots enough iterations to pick any bug
> that we don't find on code review. Bugs that would be obscured by a
> giant patch with many other more obvious bugs exploding our CI
> infrastructure and getting reverted by angry bot maintainers.
>
>
>
>> I don't know what it worth to add __thread support to android, and
>> whether somebody is going to address it soon.
>
> Wouldn't it be better to know that before submitting a large change to
> another code base? If there is evidence that this is never going to
> happen, or that, for some very strong engineering reasons, it's not
> possible to do it, then the reasons to do TLS emulation are thoroughly
> justified. Right now, it's just a hunch.
>
>
>> But I think we can make the tls emulation _much_ cleaner. First, we
>> need to put all tsan per-thread data into ThreadState object, accesses
>> to ThreadState are already wrapped into cur_thread() function. So now
>> we have a single function to modify, no macros spread across the
>> codebase, no new files, etc.
>
> I'm no expert in that field, but this sounds like a good plan.
Doesn't
> seem to be part of this patch, though. Nor it should, as it actually
> should have been a preparatory patch, submitted after all interested
> parties agreed that TLS emulation is the way to go.
>
>
>> INLINE ThreadState *cur_thread() {
>> #ifdef REAL_TLS
>>   return reinterpret_cast<ThreadState
*>(&cur_thread_placeholder);
>> #else
>>   return reinterpret_cast<ThreadState
*>(__get_tls()[ANDROID_TSAN_TLS_SLOT]);
>> #endif
>> }
>>
>> Which is roughly the same code-wise, but faster and safer.
>
> Dan "Eastwood" Albert has pulled the trigger. :)
>
> This looks reasonable to me. Does this still mean you need the
> emulation, or is Android going to give you a "reasonably looking
> ThreadState"?
>
> cheers,
> --renato

Maybe Matching Threads

Search for more possibly parallel threads

llvm dev - Aug 2015 - TSAN hack on AArch64 for Android

[llvm-dev] TSAN hack on AArch64 for Android

[llvm-dev] TSAN hack on AArch64 for Android

[llvm-dev] TSAN hack on AArch64 for Android

[llvm-dev] TSAN hack on AArch64 for Android

[llvm-dev] TSAN hack on AArch64 for Android

[llvm-dev] TSAN hack on AArch64 for Android

Maybe Matching Threads