Dmitry Vyukov via llvm-dev
2015-Aug-19 11:20 UTC
[llvm-dev] TSAN hack on AArch64 for Android
On Wed, Aug 19, 2015 at 1:15 PM, Renato Golin <renato.golin at linaro.org> wrote:> On 19 August 2015 at 11:29, Dmitry Vyukov <dvyukov at google.com> wrote: >> Wait, this change is not submitted yet, right? Or you mean mailing of >> this change in bad shape? > > Right. > > Jason has submitted high quality patches before, so this is in no way > a reprimand to his submission. I am specifically worried about the > lack of higher level questions from the reviewers / code owners about > the general idea of the patch, not with the submission itself. > > >> I consider this change as work-in-progress where author is looking for >> feedback on his ongoing progress. I guess the change description >> should have been spelled this very explicitly. > > For that kind of work, we normally follow the guidelines: > > 1. Send an email to the list explaining the problem / design, have > some high level discussions > 2. Send an "[RFC]" patch, explicitly identified as "not-for-commit" > quality, have some specific discussions > 3. Send *another* patch, for commit, with loads of tests, comments, > FIXMEs, have final review > > People normally ignore most spurious errors from RFC patches, but if > it's not marked as such, you'll get a lot of heat. > > Also, it is good practice to have a few steps *before* sending a big > final patch: > > 1. Compile your code in different ways (debug/release/asserts) on at > least x86_64/ARM/AArch64 and run check-all with at least > LLVM+Clang+"your-component". > 2. If you're targeting a different platform, make sure you run the > tests on them, too. (native/cross, doesn't matter much) > 3. Make sure you add enough tests, or describe extensively what tests > you have made to be sure the patch is good > 4. Review your diff cautiously, to make sure no left-overs remain: > * no commented out code > * no spurious whitespace changes > * no wrong comments, or fixed FIXME's left > > >> This change definitely needs more work to be submitted, splitting into >> smaller patches with a plan for submission order, refactoring, >> cleanup, etc. That is no question. > > More importantly, this patch may be moot, because Joerg's work already > does that, and Dan's patch to Android makes the remaining step a > 5-line patch. > > *That* is the question. How comfortable are we to add work-arounds > before even questioning if the proper solution is possible / > available. > > The submission is explicit: "This is a work-around, a better > implementation would be X". It is our job to evaluate the real cost of > X and see if it's worth putting a hack. > > It seems the cost is low, or paid already, so the patch should have > been refused before any real review was made. > > >> As for the design, there is not much to design upfront, as you >> discover most of the issue only when you hit them (I would not foresee >> them). > > TLS emulation vs proper implementation? Emulation in TSAN or RT or > some other lib? > > TSAN is hardly the place for run-time library calls, RT is a much > better place. Work arounds in LLVM is hardly a good replacement for a > proper implementation in the Android linker. > > All those are design decisions. In the end, RT has already a TLS > emulation coming about and Android just got a TLS slot for TSAN. Good > design decisions stemmed from design discussions in the list. > > >> Refactoring of tsan per-thread structures in preparation for the >> "emulation" should go in independently. Agree. >> Aarch64 support is already submitted to tsan as far as I see. > > Not yet. This is still work in progress. > > Adhemerval has implemented the functionality and we have tested on our > hardware. But there are issues with other hardware we don't have > access to, so progress has slowed down considerably. We're working on > some AArch64 buildbots to speed that up a lot, but as usual, > unforeseen complications hunt our sleep. :) > > Despite the experimental nature of our TSAN and Joerg's TLS emulation > implementations, I think it's enough to give Jason a working base to > start from scratch and use that to make TSAN work on Android (with > Dan's change).Sounds good to me.
Hi Renato and gang. I guess I need to back up a few steps. I missed this thread before I took off last week. I had been discussing the TSAN on android/aarch64 issues with the folks at thread-sanitizer at googlegroups.com for several months now. I had not explicitly been including llvm-dev (and you), and that was a mistake on my part. Palm-on-face? :-( Secondly, the "early release" of the gigantic patch for tsan+android+aarch64 was due to not being familiar with how phabricator does things by default. Sorry for the noise. I expected it to add an audience only after I did an expliclit "make-public" step. I was wrong on that, but I should not have been. Bad tool! :-) Now for what remains: As far as I can tell, there are several big hurdles to this work landing on compiler-rt Patch Size! The patch in question is a bit smaller now thanks to some dovetailing with http://reviews.llvm.org/D11484 being merged. I have already split up the patch into three chunks. I can do more as needed. TLS! I am willing to redo the TLS emulation part. So there are several possible strategies for doing this -- this was explicltly mentioned in the commit message, but it seems to have gotten lost in phabricator somehow. 1. use pthread_get/setspecific() (more on this below!!) 2. use some other workaround that does not use pthread API 3. use the new TLS slot from Dan's recent push to android. https://android-review.googlesource.com/#/c/167420/ 4. use the emutls work http://reviews.llvm.org/D12001 The first approach fails because pthread_key maintenance happens at thread/process destruction time, which means that a NEW key was being generated for the thread undergoing death, resulting in a new ThreadState being allocated for the dying thread. The second approach was what I selected (and what is on display at phabricator). I works, but isn't necessarily very pretty. The third and fourth approaches are recent, and I have not yet had a chance to take a look. At first blush, emutls seems the most problematic, as it uses intercepted calls like malloc(). TESTS! Currently, about 2/3 tests for tsan fail/flake on android+aarch64. They aren't xfailed, because some of them are flaky, and thus unexpectedly pass at times. So I decided to mark them as unsupported on android+aarch64, and thus is "green". These tests will need to triaged individually. *If anyone has a simple hack to mark a set of tests as unsupported that does not touch individual test files, please let me know!* *Whether I move them to a directory or prepend 2 lines is roughly the same size patch!* ANDROID-BIONIC! The required patches on bionic adds private symbols to select APIs in bionic so that TSAN can not intercept bionic-to-bionic calls. https://code.google.com/p/android/issues/detail?id=179564 Now, it turns out that some of the early workarounds specific to android (such as early addition of REAL versions of memset() and friends even before dlsym() is called()) becomes unnecessary with the bionic patches. This also implies that I should be able to go back to using the pthread API for TLS, because TSAN will no longer be intercepting calls at thread/process destruction time I have not yet had a chance to test this out. Assuming this works, what is your opinion on the "proper" TLS emulation in TSAN? a) always use pthread api? b) leave the use of __thread keyword there and use some variant of arch+os conditionals to select the appropriate technique? If it's (b), what is the appropriate technique? i.e. use pthread API selectively, or use some other arch+OS specific workaround? Thanks -Jason p.s. I handily dismiss the "nuclear" option, which is a complete reimplementation of tsan to avoid weak-symbol aliasing, and to do manual interception/patching of dynamically linked routines. Yeah, ignore this one, please. > -----Original Message----- > From: Dmitry Vyukov [mailto:dvyukov at google.com] > Sent: Wednesday, August 19, 2015 4:21 AM > To: Renato Golin > Cc: Kostya Serebryany; Jason Kim; Tamas Berghammer; Dan Albert; Stephen > Hines; Tim Northover; kanheim at a-bix.com; Chad Rosier; Evgenii Stepanov; > Adhemerval Zanella; LLVM Dev; Kristof Beyls; James Molloy > Subject: Re: TSAN hack on AArch64 for Android > > On Wed, Aug 19, 2015 at 1:15 PM, Renato Golin <renato.golin at linaro.org> > wrote: > > On 19 August 2015 at 11:29, Dmitry Vyukov <dvyukov at google.com> > wrote: > >> Wait, this change is not submitted yet, right? Or you mean mailing of > >> this change in bad shape? > > > > Right. > > > > Jason has submitted high quality patches before, so this is in no way > > a reprimand to his submission. I am specifically worried about the > > lack of higher level questions from the reviewers / code owners about > > the general idea of the patch, not with the submission itself. > > > > > >> I consider this change as work-in-progress where author is looking > >> for feedback on his ongoing progress. I guess the change description > >> should have been spelled this very explicitly. > > > > For that kind of work, we normally follow the guidelines: > > > > 1. Send an email to the list explaining the problem / design, have > > some high level discussions 2. Send an "[RFC]" patch, explicitly > > identified as "not-for-commit" > > quality, have some specific discussions 3. Send *another* patch, for > > commit, with loads of tests, comments, FIXMEs, have final review > > > > People normally ignore most spurious errors from RFC patches, but if > > it's not marked as such, you'll get a lot of heat. > > > > Also, it is good practice to have a few steps *before* sending a big > > final patch: > > > > 1. Compile your code in different ways (debug/release/asserts) on at > > least x86_64/ARM/AArch64 and run check-all with at least > > LLVM+Clang+"your-component". > > 2. If you're targeting a different platform, make sure you run the > > tests on them, too. (native/cross, doesn't matter much) 3. Make sure > > you add enough tests, or describe extensively what tests you have made > > to be sure the patch is good 4. Review your diff cautiously, to make > > sure no left-overs remain: > > * no commented out code > > * no spurious whitespace changes > > * no wrong comments, or fixed FIXME's left > > > > > >> This change definitely needs more work to be submitted, splitting > >> into smaller patches with a plan for submission order, refactoring, > >> cleanup, etc. That is no question. > > > > More importantly, this patch may be moot, because Joerg's work already > > does that, and Dan's patch to Android makes the remaining step a > > 5-line patch. > > > > *That* is the question. How comfortable are we to add work-arounds > > before even questioning if the proper solution is possible / > > available. > > > > The submission is explicit: "This is a work-around, a better > > implementation would be X". It is our job to evaluate the real cost of > > X and see if it's worth putting a hack. > > > > It seems the cost is low, or paid already, so the patch should have > > been refused before any real review was made. > > > > > >> As for the design, there is not much to design upfront, as you > >> discover most of the issue only when you hit them (I would not > >> foresee them). > > > > TLS emulation vs proper implementation? Emulation in TSAN or RT or > > some other lib? > > > > TSAN is hardly the place for run-time library calls, RT is a much > > better place. Work arounds in LLVM is hardly a good replacement for a > > proper implementation in the Android linker. > > > > All those are design decisions. In the end, RT has already a TLS > > emulation coming about and Android just got a TLS slot for TSAN. Good > > design decisions stemmed from design discussions in the list. > > > > > >> Refactoring of tsan per-thread structures in preparation for the > >> "emulation" should go in independently. Agree. > >> Aarch64 support is already submitted to tsan as far as I see. > > > > Not yet. This is still work in progress. > > > > Adhemerval has implemented the functionality and we have tested on > our > > hardware. But there are issues with other hardware we don't have > > access to, so progress has slowed down considerably. We're working on > > some AArch64 buildbots to speed that up a lot, but as usual, > > unforeseen complications hunt our sleep. :) > > > > Despite the experimental nature of our TSAN and Joerg's TLS emulation > > implementations, I think it's enough to give Jason a working base to > > start from scratch and use that to make TSAN work on Android (with > > Dan's change). > > Sounds good to me.
Renato Golin via llvm-dev
2015-Aug-26 21:12 UTC
[llvm-dev] TSAN hack on AArch64 for Android
On 26 August 2015 at 20:13, Jason Kim <jasonk at codeaurora.org> wrote:> I had been discussing the TSAN on android/aarch64 issues with the folks at thread-sanitizer at googlegroups.com for several months now. I had not explicitly been including llvm-dev (and you), and that was a mistake on my part. Palm-on-face? :-(Yup, that would have saved a lot of emails... :)> Patch Size! > The patch in question is a bit smaller now thanks to some dovetailing with http://reviews.llvm.org/D11484 being merged. > I have already split up the patch into three chunks. I can do more as needed.Before I have any opinion on the current patch, we need to know what's going to change.> 1. use pthread_get/setspecific() (more on this below!!)If the re-creation of the thread state disappears after the bionic changes, I'd seriously investigate this one, as it seems the most logic.> 2. use some other workaround that does not use pthread APII'm trying to avoid this at all costs. :)> 3. use the new TLS slot from Dan's recent push to android. https://android-review.googlesource.com/#/c/167420/I asked somewhere else, but, will this give you a reasonably looking thread state without resorting to pthread_get/specific calls? Will this be a unique and non-redundant state for each thread?> 4. use the emutls work http://reviews.llvm.org/D12001I only suggested this because Joerg/Dan mentioned it. I haven't looked at that patch in detail. From what I'm seeing, options 1 and 3 above may very well suit us.> TESTS! > Currently, about 2/3 tests for tsan fail/flake on android+aarch64. > They aren't xfailed, because some of them are flaky, and thus unexpectedly pass at times. > So I decided to mark them as unsupported on android+aarch64, and thus is "green".So, if they currently pass on the production buildbots, you can't mark them as unstable because of your patch, as that will mean your patch is making them unstable, and we can't just make dozens of tests unstable for any amount of features. If your patch introduces the instability, you'll have to fix them before commit. That's why I suggested splitting it into many small chunks, none of which will leave the tree in an unstable state, because you'll have time to look at each instability individually, and fix them before commit. The bots will also be happier, and every one wins. I have no idea how to split your patch so that this happens, but since you know what the instability is, we could work together to test and validate *before* the patches go live. I have a fast AArch64 machine just for that purpose, so we can do this as fast as possible.> I have not yet had a chance to test this out. Assuming this works, what is your opinion on the "proper" TLS emulation in TSAN? > > a) always use pthread api? > b) leave the use of __thread keyword there and use some variant of arch+os conditionals to select the appropriate technique?I'm not familiar with pthread enough to give you a good answer. I'm hoping Joerg/Dan could chime in. Having said that, I'd rather use something that is arch/os agnostic, even if it slows it down a few percent. If the slow down is too great, then we can discuss case by case. cheers, -renato