----- Original Message -----> From: "Chandler Carruth" <chandlerc at google.com> > To: "Sanjay Patel" <spatel at rotateright.com>, "Clang" <cfe-dev at cs.uiuc.edu>, llvmdev at cs.uiuc.edu, > joerg at britannica.bec.de > Sent: Friday, June 26, 2015 8:55:22 PM > Subject: Re: [LLVMdev] [cfe-dev] bitwise ops on booleans > > > On Fri, Jun 26, 2015 at 2:01 PM Sanjay Patel < spatel at rotateright.com > > wrote: > > On Fri, Jun 26, 2015 at 2:17 PM, Joerg Sonnenberger < > joerg at britannica.bec.de > wrote: > > On Fri, Jun 26, 2015 at 12:51:38PM -0600, Sanjay Patel wrote: > > Assuming the transform is correct, what is the recommended way to > > write > > this in C/C++ to achieve the desired effect: we want both > > comparisons to be > > evaluated (do *not* want short-circuiting)? > > Why do you want that? > > For performance. The statement that materializing the condition bits > is as expensive as a branch is dependent on the arch, uarch, and > data set. We're currently only using arch for the DAG transform. So > for example, x86 globally changes all of these while PPC does not. > SimplifyCFG doesn't even consider arch and does the inverse > transform. > > I don't have any problem teaching codegen to use very specific > information to weigh the costs of an additional branch. But that > needs to be done in codegen, as it is rare on x86 at this point that > a well predicted branch is more expensive than anything, and given > the quality of branch predictors on x86, it is also relatively rare > that branches are so unpredictable. > > Again, we need to support both sides (lots of architectures outside > of x86 with weaker branch prediction!) but it should be a detailed > arch decision of how to lower this kind of code. > > > The programmer may have more knowledge about the predictability of a > specific branch and would like a way to specify that to the > compiler. > > We could introduce an *unpredictabie* annotation to LLVM if there is > sufficient demand for this. Specifically, this is different from a > 50/50 probability, as it implies a lack of pattern which the > processor can exploit to predict the behavior. This doesn't seem > like a bad construct for LLVM to have, although it seems like an > expensive one to add and so I would want to see a lot of really > compelling evidence that we *have* to let programmers make this > annotation. > > However, I'm really strongly opposed to tying this in any way to the > bitwise and operator. That is madness. What about masking bits > signifies anything about predictability? What about all the cases > where I'm not using and but I still have unpredictable branches that > should be avoided if possible?I am also strongly opposed to this. Amusingly enough, I was taking part in a meeting over the last few days where, among other things, experienced developers from several labs were providing feedback on compiler development priorities for future systems. Their top general request was that we try harder to make sure that different ways of writing the same thing were optimized in equivalent ways. This seems like a nice example of the kind of thing to which they were referring. Implicit hints like this are a) not portable b) often not documented c) not stable across compiler upgrades and d) not obvious to readers of the code. We should try to fix these kinds of things when we find them, and not introduce them on purpose.> > If you want to add support to *clang* for this, you'd be much better > off proposing something like __builtin_expect which instead > annotates a lack of predictability. I'm much more willing to > tolerate code that looks like:+1 -Hal> > > if (UNPREDICTABLE(x < 3) && UNPREDICTABLE(y > 7)) { ... } > > That is, if we actually must support this. I'm still extremely > skeptical of the entire value proposition here. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Ok, I understand better why the 'bitwise and' hack looks insane to you guys. :) It would take a change in the C language spec to happen. Since that's practically impossible, it actually makes this mission a little clearer, if not easier. There's widespread misunderstanding: http://stackoverflow.com/questions/4014535/differences-in-boolean-operators-vs-and-vs http://stackoverflow.com/questions/1279217/difference-between-and-or-and-for-comparison I've personally used the & vs. && hack for optimization purposes on many occasions over the years, so I wonder if C optimizers have only recently (last 5 years or so?) gotten smart enough to transform those bitwise expressions back into logical. In any case, I'll come up with some perf data to better illustrate the motivation. Thanks! On Sat, Jun 27, 2015 at 10:16 AM, Hal Finkel <hfinkel at anl.gov> wrote:> ----- Original Message ----- > > From: "Chandler Carruth" <chandlerc at google.com> > > To: "Sanjay Patel" <spatel at rotateright.com>, "Clang" < > cfe-dev at cs.uiuc.edu>, llvmdev at cs.uiuc.edu, > > joerg at britannica.bec.de > > Sent: Friday, June 26, 2015 8:55:22 PM > > Subject: Re: [LLVMdev] [cfe-dev] bitwise ops on booleans > > > > > > On Fri, Jun 26, 2015 at 2:01 PM Sanjay Patel < spatel at rotateright.com > > > wrote: > > > > On Fri, Jun 26, 2015 at 2:17 PM, Joerg Sonnenberger < > > joerg at britannica.bec.de > wrote: > > > > On Fri, Jun 26, 2015 at 12:51:38PM -0600, Sanjay Patel wrote: > > > Assuming the transform is correct, what is the recommended way to > > > write > > > this in C/C++ to achieve the desired effect: we want both > > > comparisons to be > > > evaluated (do *not* want short-circuiting)? > > > > Why do you want that? > > > > For performance. The statement that materializing the condition bits > > is as expensive as a branch is dependent on the arch, uarch, and > > data set. We're currently only using arch for the DAG transform. So > > for example, x86 globally changes all of these while PPC does not. > > SimplifyCFG doesn't even consider arch and does the inverse > > transform. > > > > I don't have any problem teaching codegen to use very specific > > information to weigh the costs of an additional branch. But that > > needs to be done in codegen, as it is rare on x86 at this point that > > a well predicted branch is more expensive than anything, and given > > the quality of branch predictors on x86, it is also relatively rare > > that branches are so unpredictable. > > > > Again, we need to support both sides (lots of architectures outside > > of x86 with weaker branch prediction!) but it should be a detailed > > arch decision of how to lower this kind of code. > > > > > > The programmer may have more knowledge about the predictability of a > > specific branch and would like a way to specify that to the > > compiler. > > > > We could introduce an *unpredictabie* annotation to LLVM if there is > > sufficient demand for this. Specifically, this is different from a > > 50/50 probability, as it implies a lack of pattern which the > > processor can exploit to predict the behavior. This doesn't seem > > like a bad construct for LLVM to have, although it seems like an > > expensive one to add and so I would want to see a lot of really > > compelling evidence that we *have* to let programmers make this > > annotation. > > > > However, I'm really strongly opposed to tying this in any way to the > > bitwise and operator. That is madness. What about masking bits > > signifies anything about predictability? What about all the cases > > where I'm not using and but I still have unpredictable branches that > > should be avoided if possible? > > I am also strongly opposed to this. Amusingly enough, I was taking part in > a meeting over the last few days where, among other things, experienced > developers from several labs were providing feedback on compiler > development priorities for future systems. Their top general request was > that we try harder to make sure that different ways of writing the same > thing were optimized in equivalent ways. This seems like a nice example of > the kind of thing to which they were referring. Implicit hints like this > are a) not portable b) often not documented c) not stable across compiler > upgrades and d) not obvious to readers of the code. We should try to fix > these kinds of things when we find them, and not introduce them on purpose. > > > > > If you want to add support to *clang* for this, you'd be much better > > off proposing something like __builtin_expect which instead > > annotates a lack of predictability. I'm much more willing to > > tolerate code that looks like: > > +1 > > -Hal > > > > > > > if (UNPREDICTABLE(x < 3) && UNPREDICTABLE(y > 7)) { ... } > > > > That is, if we actually must support this. I'm still extremely > > skeptical of the entire value proposition here. > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150627/9518b4c4/attachment.html>
> ----- Original Message ----- >> > From: "Chandler Carruth" <chandlerc at google.com> >> > I don't have any problem teaching codegen to use very specific >> > information to weigh the costs of an additional branch. But that >> > needs to be done in codegen, as it is rare on x86 at this point that >> > a well predicted branch is more expensive than anything, and given >> > the quality of branch predictors on x86, it is also relatively rare >> > that branches are so unpredictable. >> >I agree that it's rare, but it's common enough that it makes some of our very knowledgeable and perf-starved customers furious when LLVM unilaterally declares that it knows best and turns all bit-logic into branches. At the least, I think this optimization needs a chicken switch because there's no way we can know in advance that adding a branch will beat a compound predicate in all cases. I attached a test program and some results to PR23827. Feedback and more data points are certainly appreciated. Also, we should be careful here: there's no provision for branch prediction in the x86 arch. That's purely a micro-arch concept for x86 AFAIK. We don't know what micro-arches will look like N years from now...even if they're all excellent at prediction today. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150629/f042a148/attachment.html>