Dmitry Vyukov via llvm-dev
2015-Aug-19 10:29 UTC
[llvm-dev] TSAN hack on AArch64 for Android
Wait, this change is not submitted yet, right? Or you mean mailing of this change in bad shape? I consider this change as work-in-progress where author is looking for feedback on his ongoing progress. I guess the change description should have been spelled this very explicitly. This change definitely needs more work to be submitted, splitting into smaller patches with a plan for submission order, refactoring, cleanup, etc. That is no question. As for the design, there is not much to design upfront, as you discover most of the issue only when you hit them (I would not foresee them). Also, most of the changes in the patch are just spot fixes (e.g. if a struct definition is different on the new platform, you just need to provide a correct definition), there is nothing to design there. Regarding the tls, whether we call it "emulation" or not, I believe we can do it a very neat and laconic form. Refactoring of tsan per-thread structures in preparation for the "emulation" should go in independently. Agree. Aarch64 support is already submitted to tsan as far as I see. On Wed, Aug 19, 2015 at 1:11 AM, Renato Golin <renato.golin at linaro.org> wrote:> On 18 August 2015 at 22:32, Dmitry Vyukov <dvyukov at google.com> wrote: >> So the problem is only with the tls emulation, right? > > No. It's with the whole patch, including its contents, how it was > submitted, what is its purpose and how nobody cared about any of this. > > Adding TLS emulation is, in its own, a worthy feature. It allows for > platforms that can't do full TLS to use TSAN, which is a great tool. > > But when you want to add a feature to a tool that runs on multiple > architectures, you plan the feature, you design it, you send an RFC > email, you discuss in the list. > > Then you go about the implementation, submit some preparation patches, > clean up code, and have some more discussions. For a good example, see > Daniel Sanders' "The trouble with triples" and the associated patches. > It takes a lot of time to conjure a new feature in a complex piece of > software, but more than that, needs consensus and people designing it > beforehand. > > But when someone dumps a patch that is nowhere near production > quality, or would even compile in most platforms, people should be > backing up and saying: "Wait a minute. What are you trying to do, and > how did you get here?". We had a similar issue a few months back with > the new Stride Vectorizer. Fantastic feature, one that I had worked > before, but it took us a whole month of "code review" to figure out > that the whole patch was just wrong, and had to be completely > re-written, from where we had stopped two years back. > > I'm doing the same now. Let's make sure to agree on what's being > proposed, THEN we'll agree if that's worthy or not, THEN we'll agree > what's the best implementation, THEN we'll review code. We can't start > from the last iteration, as that'll invariably lead to poor code going > in. > > Furthermore, you don't get TLS+TSAN+AArch64+Android support in a > single patch! The likelihood that we'll end up reverting it a hundred > times is really high, and that will completely unbalance our buildbot > infrastructure and break everyone else's workflow. Slow is faster than > ludicrous. One step at a time, please. > > First, let's get TSAN working on AArch64. Adhemerval is working on > that. Then, let's see what's missing for Android. Ok, TLS emulation is > required, at least for the time being. What other architecture would > benefit from TLS emulation? None? Ok, then we set out to change the > code, so that TLS emulation can go in. Meanwhile, a group of people > discuss (in public) the design of such framework, and in time, a > number of patches will start to go in, one by one, until the feature > is on. Only then we'll get Android using it. > > The time will also give buildbots enough iterations to pick any bug > that we don't find on code review. Bugs that would be obscured by a > giant patch with many other more obvious bugs exploding our CI > infrastructure and getting reverted by angry bot maintainers. > > > >> I don't know what it worth to add __thread support to android, and >> whether somebody is going to address it soon. > > Wouldn't it be better to know that before submitting a large change to > another code base? If there is evidence that this is never going to > happen, or that, for some very strong engineering reasons, it's not > possible to do it, then the reasons to do TLS emulation are thoroughly > justified. Right now, it's just a hunch. > > >> But I think we can make the tls emulation _much_ cleaner. First, we >> need to put all tsan per-thread data into ThreadState object, accesses >> to ThreadState are already wrapped into cur_thread() function. So now >> we have a single function to modify, no macros spread across the >> codebase, no new files, etc. > > I'm no expert in that field, but this sounds like a good plan. Doesn't > seem to be part of this patch, though. Nor it should, as it actually > should have been a preparatory patch, submitted after all interested > parties agreed that TLS emulation is the way to go. > > >> INLINE ThreadState *cur_thread() { >> #ifdef REAL_TLS >> return reinterpret_cast<ThreadState *>(&cur_thread_placeholder); >> #else >> return reinterpret_cast<ThreadState *>(__get_tls()[ANDROID_TSAN_TLS_SLOT]); >> #endif >> } >> >> Which is roughly the same code-wise, but faster and safer. > > Dan "Eastwood" Albert has pulled the trigger. :) > > This looks reasonable to me. Does this still mean you need the > emulation, or is Android going to give you a "reasonably looking > ThreadState"? > > cheers, > --renato
Renato Golin via llvm-dev
2015-Aug-19 11:15 UTC
[llvm-dev] TSAN hack on AArch64 for Android
On 19 August 2015 at 11:29, Dmitry Vyukov <dvyukov at google.com> wrote:> Wait, this change is not submitted yet, right? Or you mean mailing of > this change in bad shape?Right. Jason has submitted high quality patches before, so this is in no way a reprimand to his submission. I am specifically worried about the lack of higher level questions from the reviewers / code owners about the general idea of the patch, not with the submission itself.> I consider this change as work-in-progress where author is looking for > feedback on his ongoing progress. I guess the change description > should have been spelled this very explicitly.For that kind of work, we normally follow the guidelines: 1. Send an email to the list explaining the problem / design, have some high level discussions 2. Send an "[RFC]" patch, explicitly identified as "not-for-commit" quality, have some specific discussions 3. Send *another* patch, for commit, with loads of tests, comments, FIXMEs, have final review People normally ignore most spurious errors from RFC patches, but if it's not marked as such, you'll get a lot of heat. Also, it is good practice to have a few steps *before* sending a big final patch: 1. Compile your code in different ways (debug/release/asserts) on at least x86_64/ARM/AArch64 and run check-all with at least LLVM+Clang+"your-component". 2. If you're targeting a different platform, make sure you run the tests on them, too. (native/cross, doesn't matter much) 3. Make sure you add enough tests, or describe extensively what tests you have made to be sure the patch is good 4. Review your diff cautiously, to make sure no left-overs remain: * no commented out code * no spurious whitespace changes * no wrong comments, or fixed FIXME's left> This change definitely needs more work to be submitted, splitting into > smaller patches with a plan for submission order, refactoring, > cleanup, etc. That is no question.More importantly, this patch may be moot, because Joerg's work already does that, and Dan's patch to Android makes the remaining step a 5-line patch. *That* is the question. How comfortable are we to add work-arounds before even questioning if the proper solution is possible / available. The submission is explicit: "This is a work-around, a better implementation would be X". It is our job to evaluate the real cost of X and see if it's worth putting a hack. It seems the cost is low, or paid already, so the patch should have been refused before any real review was made.> As for the design, there is not much to design upfront, as you > discover most of the issue only when you hit them (I would not foresee > them).TLS emulation vs proper implementation? Emulation in TSAN or RT or some other lib? TSAN is hardly the place for run-time library calls, RT is a much better place. Work arounds in LLVM is hardly a good replacement for a proper implementation in the Android linker. All those are design decisions. In the end, RT has already a TLS emulation coming about and Android just got a TLS slot for TSAN. Good design decisions stemmed from design discussions in the list.> Refactoring of tsan per-thread structures in preparation for the > "emulation" should go in independently. Agree. > Aarch64 support is already submitted to tsan as far as I see.Not yet. This is still work in progress. Adhemerval has implemented the functionality and we have tested on our hardware. But there are issues with other hardware we don't have access to, so progress has slowed down considerably. We're working on some AArch64 buildbots to speed that up a lot, but as usual, unforeseen complications hunt our sleep. :) Despite the experimental nature of our TSAN and Joerg's TLS emulation implementations, I think it's enough to give Jason a working base to start from scratch and use that to make TSAN work on Android (with Dan's change). cheers, --renato
Dmitry Vyukov via llvm-dev
2015-Aug-19 11:20 UTC
[llvm-dev] TSAN hack on AArch64 for Android
On Wed, Aug 19, 2015 at 1:15 PM, Renato Golin <renato.golin at linaro.org> wrote:> On 19 August 2015 at 11:29, Dmitry Vyukov <dvyukov at google.com> wrote: >> Wait, this change is not submitted yet, right? Or you mean mailing of >> this change in bad shape? > > Right. > > Jason has submitted high quality patches before, so this is in no way > a reprimand to his submission. I am specifically worried about the > lack of higher level questions from the reviewers / code owners about > the general idea of the patch, not with the submission itself. > > >> I consider this change as work-in-progress where author is looking for >> feedback on his ongoing progress. I guess the change description >> should have been spelled this very explicitly. > > For that kind of work, we normally follow the guidelines: > > 1. Send an email to the list explaining the problem / design, have > some high level discussions > 2. Send an "[RFC]" patch, explicitly identified as "not-for-commit" > quality, have some specific discussions > 3. Send *another* patch, for commit, with loads of tests, comments, > FIXMEs, have final review > > People normally ignore most spurious errors from RFC patches, but if > it's not marked as such, you'll get a lot of heat. > > Also, it is good practice to have a few steps *before* sending a big > final patch: > > 1. Compile your code in different ways (debug/release/asserts) on at > least x86_64/ARM/AArch64 and run check-all with at least > LLVM+Clang+"your-component". > 2. If you're targeting a different platform, make sure you run the > tests on them, too. (native/cross, doesn't matter much) > 3. Make sure you add enough tests, or describe extensively what tests > you have made to be sure the patch is good > 4. Review your diff cautiously, to make sure no left-overs remain: > * no commented out code > * no spurious whitespace changes > * no wrong comments, or fixed FIXME's left > > >> This change definitely needs more work to be submitted, splitting into >> smaller patches with a plan for submission order, refactoring, >> cleanup, etc. That is no question. > > More importantly, this patch may be moot, because Joerg's work already > does that, and Dan's patch to Android makes the remaining step a > 5-line patch. > > *That* is the question. How comfortable are we to add work-arounds > before even questioning if the proper solution is possible / > available. > > The submission is explicit: "This is a work-around, a better > implementation would be X". It is our job to evaluate the real cost of > X and see if it's worth putting a hack. > > It seems the cost is low, or paid already, so the patch should have > been refused before any real review was made. > > >> As for the design, there is not much to design upfront, as you >> discover most of the issue only when you hit them (I would not foresee >> them). > > TLS emulation vs proper implementation? Emulation in TSAN or RT or > some other lib? > > TSAN is hardly the place for run-time library calls, RT is a much > better place. Work arounds in LLVM is hardly a good replacement for a > proper implementation in the Android linker. > > All those are design decisions. In the end, RT has already a TLS > emulation coming about and Android just got a TLS slot for TSAN. Good > design decisions stemmed from design discussions in the list. > > >> Refactoring of tsan per-thread structures in preparation for the >> "emulation" should go in independently. Agree. >> Aarch64 support is already submitted to tsan as far as I see. > > Not yet. This is still work in progress. > > Adhemerval has implemented the functionality and we have tested on our > hardware. But there are issues with other hardware we don't have > access to, so progress has slowed down considerably. We're working on > some AArch64 buildbots to speed that up a lot, but as usual, > unforeseen complications hunt our sleep. :) > > Despite the experimental nature of our TSAN and Joerg's TLS emulation > implementations, I think it's enough to give Jason a working base to > start from scratch and use that to make TSAN work on Android (with > Dan's change).Sounds good to me.