Dmitry Vyukov via llvm-dev
2015-Aug-18 18:57 UTC
[llvm-dev] TSAN hack on AArch64 for Android
On Tue, Aug 18, 2015 at 8:28 PM, Kostya Serebryany <kcc at google.com> wrote:> +dvyukov > > On Mon, Aug 17, 2015 at 8:37 AM, Renato Golin <renato.golin at linaro.org> > wrote: >> >> Folks, >> >> The review of patch http://reviews.llvm.org/D11532 is extremely slow >> due to the number of hacks, left-overs and general undesired changes >> and style that the submission has. That happens, and it's ok when the >> overall direction the patch is going was agreed, and is acceptable as >> generally good. >> >> But this is not the case. >> >> To wake up the elephant in the room, do we really think that adding >> such a hack to an LLVM project for the sake of saving Android the >> burden of carrying such hack is *acceptable*?Hi Renato, What exactly hack/hacks do you mean?>> The only "reason" given to add such a hack was that, and I quote: >> * "what are we to do for NOW when the "proper" fix is maybe months >> off, or possibly longer?" >> * "This patch will allow people to experiment with TSAN on their >> android devices" >> * "don't let the perfect be the enemy of "limping along for a bit" >> >> So, in order to let *some* people *experiment* with TSAN on *Android*, >> we're going to consciously make TSAN *limp along* for the foreseeable >> future? Is that a wise price to pay for such a far fetched goal? >> >> The "proper" solution seems to be to fix TLS support, which: >> * "Ideally, this should be supported with the __thread keyword, but >> that is not yet supported, and it is a much bigger chunk of work to >> get it integrated into the default android cross compiler." >> >> So, to avoid a larger amount of work on another compiler, we're going >> to cripple LLVM? >> >> I cannot see why this would *ever* be a wise decision... Please, >> someone enlighten me. >> >> cheers, >> --renato > >
Renato Golin via llvm-dev
2015-Aug-18 20:53 UTC
[llvm-dev] TSAN hack on AArch64 for Android
On 18 August 2015 at 19:57, Dmitry Vyukov <dvyukov at google.com> wrote:> Hi Renato, > > What exactly hack/hacks do you mean?Hi Dmitry, The sanitizer code seems to be more accepting to adding complicated pre-processor logic, as well as a number of alternative functions to work around various inefficiencies in other systems, but in other LLVM projects, such as Clang and the back-ends, we tend to be less accepting of work-arounds being implemented just for the same of a test. This case adds an immense complexity (see the size of the patch), and it has an immense number of odd assumptions (see the amount of negative feedback on the patch), that may never have been tested in other architectures (see some comments about generalising the wrong stuff). All of that, to be able to "experiment" with TSAN on Android in AArch64. See, if TSAN was completely broken in AArch64, or if other people weren't trying to push it upstream, I'd understand someone throwing a bunch of hacked files that "works on their boxes", if that's disabled by default and is only supposed to work when invoked. But none of that is true. TSAN works to an extent in AArch64, and we have been pushing some patches to prove that. With ASAN live on the buildbots, and the builtins working, we'll now focus on TSAN to get it being tested in the buildbots, too. That's Linux AArch64, of course, as we don't test Android at all. Now, the oddities with Android is that bionic and the rest of the run-time is crippled in ways that no other system I know gets close. Layers upon layers of libraries, aliases, asm hackery, just to make it build. Adding Clang to the list of compilers did not make it better. This comment baffles me: "Ideally, this should be supported with the __thread keyword, but that is not yet supported, and it is a much bigger chunk of work to get it integrated into the default android cross compiler." AFAIK, __thread is supported by GCC and Clang, and this seems to be a problem in Android's linker. So, to work around a problem in Android's linker, we're adding a huge pile of code to LLVM for a temporary fix because of a possible experimentation in Android? The amount of work being done, and still required, to get this patch upstream in any decent shape or form will take many engineer-months, probably the same as it would to actually fix the linker in the first place. Especially if you add up the work that will be required to *remove* all that after the linker is fixed. But I'm being over-optimistic here. I'm old enough to know that, whenever someone introduces a "work-around that works", the probability of anyone fixing the original problem is zero. And the Android code that I had the displeasure to see, so far, has shown me that this is ultimately true on their side. And that's why I'm baffled, because we're moving that mentality, that work arounds are ok, no matter how ugly they are, into LLVM, and everyone seems to be ok with it. If everyone is ok with it, I'll just shut up. But I'd like to make sure that everyone is ok to do so. My tuppence. --renato
Dmitry Vyukov via llvm-dev
2015-Aug-18 21:32 UTC
[llvm-dev] TSAN hack on AArch64 for Android
On Tue, Aug 18, 2015 at 10:53 PM, Renato Golin <renato.golin at linaro.org> wrote:> On 18 August 2015 at 19:57, Dmitry Vyukov <dvyukov at google.com> wrote: >> Hi Renato, >> >> What exactly hack/hacks do you mean? > > Hi Dmitry, > > The sanitizer code seems to be more accepting to adding complicated > pre-processor logic, as well as a number of alternative functions to > work around various inefficiencies in other systems, but in other LLVM > projects, such as Clang and the back-ends, we tend to be less > accepting of work-arounds being implemented just for the same of a > test. > > This case adds an immense complexity (see the size of the patch), and > it has an immense number of odd assumptions (see the amount of > negative feedback on the patch), that may never have been tested in > other architectures (see some comments about generalising the wrong > stuff). All of that, to be able to "experiment" with TSAN on Android > in AArch64. > > See, if TSAN was completely broken in AArch64, or if other people > weren't trying to push it upstream, I'd understand someone throwing a > bunch of hacked files that "works on their boxes", if that's disabled > by default and is only supposed to work when invoked. But none of that > is true. > > TSAN works to an extent in AArch64, and we have been pushing some > patches to prove that. With ASAN live on the buildbots, and the > builtins working, we'll now focus on TSAN to get it being tested in > the buildbots, too. That's Linux AArch64, of course, as we don't test > Android at all. > > Now, the oddities with Android is that bionic and the rest of the > run-time is crippled in ways that no other system I know gets close. > Layers upon layers of libraries, aliases, asm hackery, just to make it > build. Adding Clang to the list of compilers did not make it better. > > This comment baffles me: "Ideally, this should be supported with the > __thread keyword, but that is not yet supported, and it is a much > bigger chunk of work to get it integrated into the default android > cross compiler." > > AFAIK, __thread is supported by GCC and Clang, and this seems to be a > problem in Android's linker. So, to work around a problem in Android's > linker, we're adding a huge pile of code to LLVM for a temporary fix > because of a possible experimentation in Android? > > The amount of work being done, and still required, to get this patch > upstream in any decent shape or form will take many engineer-months, > probably the same as it would to actually fix the linker in the first > place. Especially if you add up the work that will be required to > *remove* all that after the linker is fixed. > > But I'm being over-optimistic here. I'm old enough to know that, > whenever someone introduces a "work-around that works", the > probability of anyone fixing the original problem is zero. And the > Android code that I had the displeasure to see, so far, has shown me > that this is ultimately true on their side. And that's why I'm > baffled, because we're moving that mentality, that work arounds are > ok, no matter how ugly they are, into LLVM, and everyone seems to be > ok with it. > > If everyone is ok with it, I'll just shut up. But I'd like to make > sure that everyone is ok to do so. > > My tuppence. > > --renatoHi Renato, So the problem is only with the tls emulation, right? I don't know what it worth to add __thread support to android, and whether somebody is going to address it soon. But I think we can make the tls emulation _much_ cleaner. First, we need to put all tsan per-thread data into ThreadState object, accesses to ThreadState are already wrapped into cur_thread() function. So now we have a single function to modify, no macros spread across the codebase, no new files, etc. Then, I've looked at pthread_getspecific code in bionic and it is actually quite fast and does not call any other system function (except get_thread() which is basically an access to real tls). So we can use it, and then the whole support becomes merely: INLINE ThreadState *cur_thread() { #ifdef REAL_TLS return reinterpret_cast<ThreadState *>(&cur_thread_placeholder); #else return reinterpret_cast<ThreadState *>(pthread_getspecific(thread_key)); #endif } Which looks quite acceptable to me. Also we could try to ask android to dedicate a real tls slot for sanitizers (I don't know what are the chances, but ART obviously get a one). Then we can change the support to: INLINE ThreadState *cur_thread() { #ifdef REAL_TLS return reinterpret_cast<ThreadState *>(&cur_thread_placeholder); #else return reinterpret_cast<ThreadState *>(__get_tls()[ANDROID_TSAN_TLS_SLOT]); #endif } Which is roughly the same code-wise, but faster and safer. Am I missing something?