> On Jun 29, 2017, at 9:32 AM, Hal Finkel <hfinkel at anl.gov> wrote: > > > On 06/29/2017 10:41 AM, Peter Lawrence wrote: >> >>> On Jun 29, 2017, at 4:39 AM, Hal Finkel <hfinkel at anl.gov <mailto:hfinkel at anl.gov>> wrote: >>> >>> On 06/28/2017 05:33 PM, Peter Lawrence wrote: >>>> Chandler, >>>> where we disagree is in whether the current project is moving the issue >>>> forward. It is not. It is making the compiler more complex for no additional value. >>>> >>>> The current project is not based in evidence, I have asked for any SPEC benchmark >>>> that shows performance gain by the compiler taking advantage of “undefined behavior” >>>> and no one can show that. >>> >>> I can't comment on SPEC, but this does remind me of code I was working on recently. To abstract the relevant parts, it looked something like this: >>> >>> template <typename T> >>> int do_something(T mask, bool cond) { >>> if (mask & 2) >>> return 1; >>> >>> if (cond) { >>> T high_mask = mask >> 48; >>> if (high_mask > 5) >>> do_something_1(high_mask); >>> else if (high_mask > 3) >>> do_something_2(); >>> } >>> >>> return 0; >>> } >>> >>> This function ended up being instantiated on different types T (e.g. unsigned char, unsigned int, unsigned long, etc.) and, dynamically, cond was always false when T was char. The question is: Can the compiler eliminate all of the code predicated on cond for the smaller types? In this case, this code was hot, and moreover, performance depended on the fact that, for T = unsigned char, the function was inlined and the branch on cond was eliminated. In the relevant translation unit, however, the compiler would never see how cond was set. >>> >>> Luckily, we do the right thing here currently. In the case where T = unsigned char, we end up folding both of the high_mask tests as though they were false. That entire part of the code is eliminated, the function is inlined, and everyone is happy. >>> >>> Why was I looking at this? As it turns out, if the 'else if' in this example is just 'else', we don't actually eliminate both sides of the branch. The same is true for many other variants of the conditionals (i.e. we don't recognize all of the code as dead). >> >> >> I apologize in advance if I have missed something here and am misreading your example... >> >> This doesn’t make sense to me, a shift amount of 48 is “undefined” for unsigned char, >> How do we know this isn’t a source code bug, >> What makes us think the the user intended the result to be “0”. > > As I said, this is representation of what the real code did, and looked like, after other inlining had taken place, etc. In the original form, the user's intent was clear. That code is never executed when T is a small integer type.I will still have a hard time believing this until I see a real example, can you fill in the details ? Peter Lawrence. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170629/018df4c5/attachment.html>
On Thu, Jun 29, 2017 at 10:43 AM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote:> I will still have a hard time believing this until I see a real example, can > you fill in the details ?JFYI, I gave a couple of real bug reports related to this in the previous email. -- Sanjoy
2017-06-29 10:43 GMT-07:00 Peter Lawrence via llvm-dev < llvm-dev at lists.llvm.org>:> > On Jun 29, 2017, at 9:32 AM, Hal Finkel <hfinkel at anl.gov> wrote: > > > On 06/29/2017 10:41 AM, Peter Lawrence wrote: > > > On Jun 29, 2017, at 4:39 AM, Hal Finkel <hfinkel at anl.gov> wrote: > > On 06/28/2017 05:33 PM, Peter Lawrence wrote: > > Chandler, > where we disagree is in whether the current project is > moving the issue > forward. It is not. It is making the compiler more complex for no > additional value. > > The current project is not based in evidence, I have asked for any SPEC > benchmark > that shows performance gain by the compiler taking advantage of “undefined > behavior” > and no one can show that. > > > I can't comment on SPEC, but this does remind me of code I was working on > recently. To abstract the relevant parts, it looked something like this: > > template <typename T> > int do_something(T mask, bool cond) { > if (mask & 2) > return 1; > > if (cond) { > T high_mask = mask >> 48; > if (high_mask > 5) > do_something_1(high_mask); > else if (high_mask > 3) > do_something_2(); > } > > return 0; > } > > This function ended up being instantiated on different types T (e.g. > unsigned char, unsigned int, unsigned long, etc.) and, dynamically, cond > was always false when T was char. The question is: Can the compiler > eliminate all of the code predicated on cond for the smaller types? In this > case, this code was hot, and moreover, performance depended on the fact > that, for T = unsigned char, the function was inlined and the branch on > cond was eliminated. In the relevant translation unit, however, the > compiler would never see how cond was set. > > Luckily, we do the right thing here currently. In the case where T > unsigned char, we end up folding both of the high_mask tests as though they > were false. That entire part of the code is eliminated, the function is > inlined, and everyone is happy. > > Why was I looking at this? As it turns out, if the 'else if' in this > example is just 'else', we don't actually eliminate both sides of the > branch. The same is true for many other variants of the conditionals (i.e. > we don't recognize all of the code as dead). > > > > I apologize in advance if I have missed something here and am misreading > your example... > > This doesn’t make sense to me, a shift amount of 48 is “undefined” for > unsigned char, > How do we know this isn’t a source code bug, > What makes us think the the user intended the result to be “0”. > > > As I said, this is representation of what the real code did, and looked > like, after other inlining had taken place, etc. In the original form, the > user's intent was clear. That code is never executed when T is a small > integer type. > > > > I will still have a hard time believing this until I see a real example, > can you fill in the details ? >Hal gave you a real example, have you tried? I feel like you're asking more effort from others than you are ready to put in: it took me less than 5 minutes to reproduce what Hal was describing using his snippet: See the difference between https://godbolt.org/g/YYtsxB and https://godbolt.org/g/dTBBDq -- Mehdi -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170629/c13bd271/attachment.html>
On 06/29/2017 12:43 PM, Peter Lawrence wrote:> >> On Jun 29, 2017, at 9:32 AM, Hal Finkel <hfinkel at anl.gov >> <mailto:hfinkel at anl.gov>> wrote: >> >> >> On 06/29/2017 10:41 AM, Peter Lawrence wrote: >>> >>>> On Jun 29, 2017, at 4:39 AM, Hal Finkel <hfinkel at anl.gov >>>> <mailto:hfinkel at anl.gov>> wrote: >>>> >>>> On 06/28/2017 05:33 PM, Peter Lawrence wrote: >>>>> Chandler, >>>>> where we disagree is in whether the current project >>>>> is moving the issue >>>>> forward. It is not. It is making the compiler more complex for >>>>> no additional value. >>>>> >>>>> The current project is not based in evidence, I have asked for any >>>>> SPEC benchmark >>>>> that shows performance gain by the compiler taking advantage of >>>>> “undefined behavior” >>>>> and no one can show that. >>>> >>>> I can't comment on SPEC, but this does remind me of code I was >>>> working on recently. To abstract the relevant parts, it looked >>>> something like this: >>>> >>>> template <typename T> >>>> int do_something(T mask, bool cond) { >>>> if (mask & 2) >>>> return 1; >>>> >>>> if (cond) { >>>> T high_mask = mask >> 48; >>>> if (high_mask > 5) >>>> do_something_1(high_mask); >>>> else if (high_mask > 3) >>>> do_something_2(); >>>> } >>>> >>>> return 0; >>>> } >>>> >>>> This function ended up being instantiated on different types T >>>> (e.g. unsigned char, unsigned int, unsigned long, etc.) and, >>>> dynamically, cond was always false when T was char. The question >>>> is: Can the compiler eliminate all of the code predicated on cond >>>> for the smaller types? In this case, this code was hot, and >>>> moreover, performance depended on the fact that, for T = unsigned >>>> char, the function was inlined and the branch on cond was >>>> eliminated. In the relevant translation unit, however, the compiler >>>> would never see how cond was set. >>>> >>>> Luckily, we do the right thing here currently. In the case where T >>>> = unsigned char, we end up folding both of the high_mask tests as >>>> though they were false. That entire part of the code is eliminated, >>>> the function is inlined, and everyone is happy. >>>> >>>> Why was I looking at this? As it turns out, if the 'else if' in >>>> this example is just 'else', we don't actually eliminate both sides >>>> of the branch. The same is true for many other variants of the >>>> conditionals (i.e. we don't recognize all of the code as dead). >>> >>> >>> I apologize in advance if I have missed something here and am >>> misreading your example... >>> >>> This doesn’t make sense to me, a shift amount of 48 is “undefined” >>> for unsigned char, >>> How do we know this isn’t a source code bug, >>> What makes us think the the user intended the result to be “0”. >> >> As I said, this is representation of what the real code did, and >> looked like, after other inlining had taken place, etc. In the >> original form, the user's intent was clear. That code is never >> executed when T is a small integer type. > > > I will still have a hard time believing this until I see a real > example, can you fill in the details ?As I recall, it was roughly like this: There is some collection of objects that have some flags (stored in a bit mask). Some of these objects optionally supported tracing, and for such objects, the tracing flags are stored in the high bits of the bit mask. Some of the low bits had universal meanings. For objects that didn't support tracing at all, they used a smaller bit mask (to save on space, etc.). External logic guaranteed that the tracing bits were only checked for objects that supported tracing (in addition to other conditions being true). There was a template class that handled the bit masks and interpreted some of the tracing bits (on supported objects, but only if tracing was enabled). This function was a member of that template class. The 48 was not directly specified in the function like this, but was a parameter, and so only appeared like this after inlining. -Hal> > > Peter Lawrence. > > > > >-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170629/d3693a19/attachment.html>
Mehdi, I think the following was the point of the conversation, That both those examples are illegal C programs. They are both “undefined behavior” because they both use a shift amount that is too large. They both should have been rejected by the compiler even though they weren’t. Hal agrees wth this assessment, That’s why we’re waiting for a more complete example. My belief is that undefined behavior is an optimization hazard, Not an optimization opportunity. Its just a belief, I could be proved Wrong at any moment, but it feels right to me. I would start looking for a more complete example myself, but my Belief is so strong that "optimizing undefined behavior" seems like a self-contradiction to me, and I don’t know where to Even start looking. I write compiler test programs in my spare time as a hobby, (which someday I’d like to contribute to llvm) So it’s not like I don’t have the knowledge or the inclination, I just don’t know how to approach this problem. You would think that since “optimization of undefined behavior” Has become such a bedrock concept in llvm that by now some Concrete examples would be readily at hand, But this doesn’t seem to be the case. So I’m eagerly awaiting Hal’s (or anyone else's) next email That has a complete example. Peter Lawrence.>>>> >>>> I can't comment on SPEC, but this does remind me of code I was working on recently. To abstract the relevant parts, it looked something like this: >>>> >>>> template <typename T> >>>> int do_something(T mask, bool cond) { >>>> if (mask & 2) >>>> return 1; >>>> >>>> if (cond) { >>>> T high_mask = mask >> 48; >>>> if (high_mask > 5) >>>> do_something_1(high_mask); >>>> else if (high_mask > 3) >>>> do_something_2(); >>>> } >>>> >>>> return 0; >>>> } >>>> >>>> This function ended up being instantiated on different types T (e.g. unsigned char, unsigned int, unsigned long, etc.) and, dynamically, cond was always false when T was char. The question is: Can the compiler eliminate all of the code predicated on cond for the smaller types? In this case, this code was hot, and moreover, performance depended on the fact that, for T = unsigned char, the function was inlined and the branch on cond was eliminated. In the relevant translation unit, however, the compiler would never see how cond was set. >>>> >>>> Luckily, we do the right thing here currently. In the case where T = unsigned char, we end up folding both of the high_mask tests as though they were false. That entire part of the code is eliminated, the function is inlined, and everyone is happy. >>>> >>>> Why was I looking at this? As it turns out, if the 'else if' in this example is just 'else', we don't actually eliminate both sides of the branch. The same is true for many other variants of the conditionals (i.e. we don't recognize all of the code as dead). >>> >>> >>> I apologize in advance if I have missed something here and am misreading your example... >>> >>> This doesn’t make sense to me, a shift amount of 48 is “undefined” for unsigned char, >>> How do we know this isn’t a source code bug, >>> What makes us think the the user intended the result to be “0”. >> >> As I said, this is representation of what the real code did, and looked like, after other inlining had taken place, etc. In the original form, the user's intent was clear. That code is never executed when T is a small integer type. > > > I will still have a hard time believing this until I see a real example, can you fill in the details ? > > > Hal gave you a real example, have you tried? I feel like you're asking more effort from others than you are ready to put in: it took me less than 5 minutes to reproduce what Hal was describing using his snippet: > > See the difference between https://godbolt.org/g/YYtsxB <https://godbolt.org/g/YYtsxB> and https://godbolt.org/g/dTBBDq <https://godbolt.org/g/dTBBDq> > > -- > Mehdi > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170629/e91d4d21/attachment.html>
On Thu, Jun 29, 2017 at 12:55 PM, Hal Finkel via llvm-dev < llvm-dev at lists.llvm.org> wrote:> > On 06/29/2017 12:43 PM, Peter Lawrence wrote: > > > On Jun 29, 2017, at 9:32 AM, Hal Finkel <hfinkel at anl.gov> wrote: > > > On 06/29/2017 10:41 AM, Peter Lawrence wrote: > > > On Jun 29, 2017, at 4:39 AM, Hal Finkel <hfinkel at anl.gov> wrote: > > On 06/28/2017 05:33 PM, Peter Lawrence wrote: > > Chandler, > where we disagree is in whether the current project is > moving the issue > forward. It is not. It is making the compiler more complex for no > additional value. > > The current project is not based in evidence, I have asked for any SPEC > benchmark > that shows performance gain by the compiler taking advantage of “undefined > behavior” > and no one can show that. > > > I can't comment on SPEC, but this does remind me of code I was working on > recently. To abstract the relevant parts, it looked something like this: > > template <typename T> > int do_something(T mask, bool cond) { > if (mask & 2) > return 1; > > if (cond) { > T high_mask = mask >> 48; > if (high_mask > 5) > do_something_1(high_mask); > else if (high_mask > 3) > do_something_2(); > } > > return 0; > } > > This function ended up being instantiated on different types T (e.g. > unsigned char, unsigned int, unsigned long, etc.) and, dynamically, cond > was always false when T was char. The question is: Can the compiler > eliminate all of the code predicated on cond for the smaller types? In this > case, this code was hot, and moreover, performance depended on the fact > that, for T = unsigned char, the function was inlined and the branch on > cond was eliminated. In the relevant translation unit, however, the > compiler would never see how cond was set. > > Luckily, we do the right thing here currently. In the case where T > unsigned char, we end up folding both of the high_mask tests as though they > were false. That entire part of the code is eliminated, the function is > inlined, and everyone is happy. > > Why was I looking at this? As it turns out, if the 'else if' in this > example is just 'else', we don't actually eliminate both sides of the > branch. The same is true for many other variants of the conditionals (i.e. > we don't recognize all of the code as dead). > > > > I apologize in advance if I have missed something here and am misreading > your example... > > This doesn’t make sense to me, a shift amount of 48 is “undefined” for > unsigned char, > How do we know this isn’t a source code bug, > What makes us think the the user intended the result to be “0”. > > > As I said, this is representation of what the real code did, and looked > like, after other inlining had taken place, etc. In the original form, the > user's intent was clear. That code is never executed when T is a small > integer type. > > > > I will still have a hard time believing this until I see a real example, > can you fill in the details ? > > > As I recall, it was roughly like this: There is some collection of objects > that have some flags (stored in a bit mask). Some of these objects > optionally supported tracing, and for such objects, the tracing flags are > stored in the high bits of the bit mask. Some of the low bits had universal > meanings. For objects that didn't support tracing at all, they used a > smaller bit mask (to save on space, etc.). External logic guaranteed that > the tracing bits were only checked for objects that supported tracing (in > addition to other conditions being true). There was a template class that > handled the bit masks and interpreted some of the tracing bits (on > supported objects, but only if tracing was enabled). This function was a > member of that template class. The 48 was not directly specified in the > function like this, but was a parameter, and so only appeared like this > after inlining. >This is such a cool example Hal! Thanks for sharing! -- Sean Silva> > -Hal > > > > Peter Lawrence. > > > > > > > -- > Hal Finkel > Lead, Compiler Technology and Programming Languages > Leadership Computing Facility > Argonne National Laboratory > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170629/8f1d432a/attachment.html>