Mehdi Amini via llvm-dev
2016-Feb-25 06:02 UTC
[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
> On Feb 24, 2016, at 9:41 PM, Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > On Wed, Feb 24, 2016 at 9:35 PM Hal Finkel <hfinkel at anl.gov <mailto:hfinkel at anl.gov>> wrote: > ----- Original Message ----- > > > From: "Chandler Carruth via llvm-dev" <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> > > To: "Philip Reames" <listmail at philipreames.com <mailto:listmail at philipreames.com>>, "Duncan P. N. Exon > > Smith" <dexonsmith at apple.com <mailto:dexonsmith at apple.com>>, "Sanjoy Das" > > <sanjoy at playingwithpointers.com <mailto:sanjoy at playingwithpointers.com>> > > Cc: "llvm-dev" <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> > > Sent: Wednesday, February 24, 2016 10:29:23 PM > > Subject: Re: [llvm-dev] Possible soundness issue with > > available_externally (split from "RFC: Add guard intrinsics") > > > Yea, I'm pretty sad about all of this. I'm also not seeing a lot of > > awesome paths forward. > > > Here is the least bad strategy I can come up with. Curious if folks > > think this is sufficient: > > I may not completely understand the problem, but this seems like overkill. The underlying restriction is that, if the compiler makes a non-determinism-collapsing choice when optimizing a function, it must make the same choice for all definitions of that function (undefined behavior excluded). > > This isn't enough, because some definition in some other module may *not be optimized at all*, and yet may get selected at link time. > > Put another way, it must *prove* that the same choice will *always* be made for all definitions. This is akin to proving that the optimizer is run over all translation units for C++ linkonce_odr functions, which you can't do.Even if the optimizer is ran, it could take different decision because the context would be different: linkonce_odr foo() { bar(); } If bar() is present in the TU it can gets inlined into foo(). So the optimizer would optimize differently foo().> > The result would be failing to optimize the bodies of linkonce_odr functions in any way which was externally detectable such as this. I think that would be *much* worse than losing the ability to do function attribute deduction for such functions? > > Thus, with an externally_available function, the CSE in Sanjoy's original example should be forbidden. Richard's example again demonstrates this principle, although in this case the non-determinism is in the choice of a globally-visible implementation technique rather than non-determinism from memory-subsystem reordering. > > There is a complication, which you imply in your proposal, that such optimizations need to be forbidden not just in the externally_available functions themselves, but in any local function transitively called by one. This, however, we can take care of with an (easily-deduced) attribute.I'm not sure why " such optimizations need to be forbidden [...] in any local function transitively called by one", can you illustrate with an example? -- Mehdi> > In short, it is not clear to me that the number of problematic optimizations is large (seems likely restricted to things involving atomics in practice), > and while I understand the auditing difficulties here, we should just restrict these in appropriate contexts instead of trying to restrict all information flow into or out of comdats. > > -Hal > > > 1) Stop deducing function attributes within comdats by examining the > > bodies of the functions (so that we remain free to transform the > > bodies of functions). > > 2) Teach frontends to emit (even at O0!!!) trivially deduced function > > attributes for comdats so that we continue to catch easy cases. > > 3) Ensure and specify that we never hoist code *into* a comdat group > > in which it would not have been executed previously. I don't know of > > anything in LLVM that does this today, but it would become an > > important invariant. > > 4) Work a lot harder to do internalizing and removing of this > > restriction. > > > Pretty horrible. But I think it is correct. > > > As a slight modification to #1 and #2, we could have a very carefully > > crafted deduction rule where we only deduce function attributes for > > functions prior to any modification of their function bodies. Such > > attributes should be conservatively correct because we would never > > lift new code into the function bodies. This would at least allow us > > to do bottom-up deduction to catch interprocedural cases. But it > > would become incredibly subtle that this is only valid prior to > > *any* transformations of the comdat-containing functions. > > > I'm starting to think this subtle rule might be worth it. But I'm > > frankly terrified by the implications. > > > On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev < > > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > wrote: > > > > On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev wrote: > > > > > >> On 2016-Feb-24, at 19:46, Sanjoy Das < > > > >> sanjoy at playingwithpointers.com <mailto:sanjoy at playingwithpointers.com> > wrote: > > > > > >> > > > > > >> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth < > > > >> chandlerc at google.com <mailto:chandlerc at google.com> > wrote: > > > > > >>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N. Exon Smith > > > > > >>> < dexonsmith at apple.com <mailto:dexonsmith at apple.com> > wrote: > > > > > >>>> > > > > > >>>>> On 2016-Feb-24, at 19:17, Chandler Carruth < > > > >>>>> chandlerc at google.com <mailto:chandlerc at google.com> > wrote: > > > > > >>>>> > > > > > >>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy Das via llvm-dev > > > > > >>>>> < llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > wrote: > > > > > >>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan P. N. Exon Smith > > > > > >>>>> < dexonsmith at apple.com <mailto:dexonsmith at apple.com> > wrote: > > > > > >>>>>>> If we do not inline @foo(), and instead re-link the call > > > >>>>>>> site > > > >>>>>>> in > > > > > >>>>>>> @main > > > > > >>>>>>> to some non-optimized copy (or differently optimized copy) > > > >>>>>>> of > > > >>>>>>> @foo, > > > > > >>>>>>> then it is possible for the program to have the behavior > > > >>>>>>> {print("Y"); > > > > > >>>>>>> print ("X")}, which was disallowed in the earlier program. > > > > > >>>>>>> > > > > > >>>>>>> In other words, opt refined the semantics of @foo() (i.e. > > > >>>>>>> reduced the > > > > > >>>>>>> set of behaviors it may have) in ways that would make later > > > > > >>>>>>> optimizations invalid if we de-refine the implementation of > > > >>>>>>> @foo(). > > > > > >>>>>> I'm probably missing something obvious here. How could the > > > >>>>>> result of > > > > > >>>>>> `%t0 != %t1` be different at optimization time in one file > > > >>>>>> than from > > > > > >>>>>> runtime in the "real" implementation? Doesn't this make the > > > >>>>>> CSE > > > > > >>>>>> invalid? > > > > > >>>>> `%t0` and `%t1` are "allowed" to "always be the same", i.e. > > > >>>>> an > > > > > >>>>> implementation of @foo that always feeds in the same > > > > > >>>>> value for `%t0` and `%t1` is a valid implementation (which is > > > >>>>> why the > > > > > >>>>> CSE was valid); but it is not the *only* valid > > > >>>>> implementation. > > > >>>>> If I > > > > > >>>>> don't CSE the two load instructions (also a valid thing to > > > >>>>> do), > > > >>>>> and > > > > > >>>>> this is a second thread writing to `%par`, then the two > > > >>>>> values > > > >>>>> loaded > > > > > >>>>> can be different, and you could end up printing `"X"` in > > > >>>>> `@foo`. > > > > > >>>>> > > > > > >>>>> Did that make sense? > > > > > >>>> Yes. To be sure I understand the scope: this is only a problem > > > >>>> for > > > > > >>>> atomics, correct? (Because multi-threaded behaviour with other > > > >>>> globals > > > > > >>>> is UB?) > > > > > >>>> > > > > > >>>>>> Does linkonce_odr linkage have the same problem? > > > > > >>>>>> - If so, do you want to change it too? > > > > > >>>>>> - Else, why not? > > > > > >>>>> Going by the specification in the LangRef, I'd say it depends > > > >>>>> on how > > > > > >>>>> you define "definitive". If you're allowed to replace the > > > >>>>> body > > > >>>>> of a > > > > > >>>>> function with a differently optimized body, then the above > > > >>>>> problem > > > > > >>>>> exists. > > > > > >>>>> > > > > > >>>>> I believe that is the case, and I strongly believe the > > > >>>>> problem > > > >>>>> you > > > > > >>>>> outline exists for linkonce_odr exactly as it does for > > > >>>>> available_externally. > > > > > >>>>> > > > > > >>>>> Which is what makes this scary: every C++ inline function > > > >>>>> today > > > >>>>> can > > > > > >>>>> trigger this. > > > > > >>>> Every C/C++ inline or template function. But only the ones > > > >>>> that > > > >>>> use > > > > > >>>> atomics, right? > > > > > >>> > > > > > >>> Well, with *this* example... > > > > > >> Atomic are one source of non-determinism that compilers can > > > >> reason > > > > > >> about. I don't know if the following snippet is well defined or > > > >> not, > > > > > >> but you could have similar issues with > > > > > >> > > > > > >> > > > > > >> void foo() { > > > > > >> int *p = malloc(sizeof(int)); > > > > > >> if (*p < 10) print("X"); > > > > > >> } > > > > > >> > > > > > >> or (again, I don't know if this is actually well defined) > > > > > >> > > > > > >> void foo() { > > > > > >> int t; // it is probably reasonable to fold compares with > > > > > >> ptrtoint(alloca) to undef > > > > > >> if ((intptr_t)(&t) < 10) print("X"); > > > > > >> } > > > > > >> > > > > > > The first one at least is UB, but as Richard pointed out the > > > > scope > > > > > > is certainly broader than atomics (it's not even just > > > > well-defined > > > > > > non-deterministism). > > > > > > > > > > > > I'm kind of terrified by the implications. > > > > > Me too. :( > > > > > > > > > > > >> -- Sanjoy > > > > > >> > > > > > >>>> > > > > > >>>> Not that I'm sure that will end up being a helpful > > > >>>> distinction. > > > > > >>> > > > > > >>> Right. See Richard's comment. I think that sums up the real > > > >>> issue > > > >>> here. =/ > > > > > > _______________________________________________ > > > > > > LLVM Developers mailing list > > > > > > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> > > > > > > _______________________________________________ > > > > > LLVM Developers mailing list > > > > > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> > > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> > > -- > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160224/669118b0/attachment-0001.html>
Hal Finkel via llvm-dev
2016-Feb-25 06:20 UTC
[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
----- Original Message -----> From: "Mehdi Amini" <mehdi.amini at apple.com> > To: "Chandler Carruth" <chandlerc at google.com> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" > <llvm-dev at lists.llvm.org> > Sent: Thursday, February 25, 2016 12:02:16 AM > Subject: Re: [llvm-dev] Possible soundness issue with > available_externally (split from "RFC: Add guard intrinsics")> > On Feb 24, 2016, at 9:41 PM, Chandler Carruth via llvm-dev < > > llvm-dev at lists.llvm.org > wrote: >> > On Wed, Feb 24, 2016 at 9:35 PM Hal Finkel < hfinkel at anl.gov > > > wrote: >> > > ----- Original Message ----- > > >> > > > From: "Chandler Carruth via llvm-dev" < llvm-dev at lists.llvm.org > > > > > > > > > > > > To: "Philip Reames" < listmail at philipreames.com >, "Duncan P. > > > > N. > > > > Exon > > > > > > > Smith" < dexonsmith at apple.com >, "Sanjoy Das" > > > > > > > < sanjoy at playingwithpointers.com > > > > > > > > Cc: "llvm-dev" < llvm-dev at lists.llvm.org > > > > > > > > Sent: Wednesday, February 24, 2016 10:29:23 PM > > > > > > > Subject: Re: [llvm-dev] Possible soundness issue with > > > > > > > available_externally (split from "RFC: Add guard intrinsics") > > >> > > > Yea, I'm pretty sad about all of this. I'm also not seeing a > > > > lot > > > > of > > > > > > > awesome paths forward. > > >> > > > Here is the least bad strategy I can come up with. Curious if > > > > folks > > > > > > > think this is sufficient: > > >> > > I may not completely understand the problem, but this seems like > > > overkill. The underlying restriction is that, if the compiler > > > makes > > > a non-determinism-collapsing choice when optimizing a function, > > > it > > > must make the same choice for all definitions of that function > > > (undefined behavior excluded). > > > > > This isn't enough, because some definition in some other module may > > *not be optimized at all*, and yet may get selected at link time. >> > Put another way, it must *prove* that the same choice will *always* > > be made for all definitions. This is akin to proving that the > > optimizer is run over all translation units for C++ linkonce_odr > > functions, which you can't do. > > Even if the optimizer is ran, it could take different decision > because the context would be different:> linkonce_odr foo() { > bar(); > }> If bar() is present in the TU it can gets inlined into foo(). So the > optimizer would optimize differently foo().> > The result would be failing to optimize the bodies of linkonce_odr > > functions in any way which was externally detectable such as this. > > I > > think that would be *much* worse than losing the ability to do > > function attribute deduction for such functions? >> > > Thus, with an externally_available function, the CSE in Sanjoy's > > > original example should be forbidden. Richard's example again > > > demonstrates this principle, although in this case the > > > non-determinism is in the choice of a globally-visible > > > implementation technique rather than non-determinism from > > > memory-subsystem reordering. > > >> > > There is a complication, which you imply in your proposal, that > > > such > > > optimizations need to be forbidden not just in the > > > externally_available functions themselves, but in any local > > > function > > > transitively called by one. This, however, we can take care of > > > with > > > an (easily-deduced) attribute. > > >> I'm not sure why " such optimizations need to be forbidden [...] in > any local function transitively called by one", can you illustrate > with an example?Take Sanjoy's example with the two atomic loads, but outline the body of the function into some private function. The same restrictions need apply. -Hal> -- > Mehdi> > > In short, it is not clear to me that the number of problematic > > > optimizations is large (seems likely restricted to things > > > involving > > > atomics in practice), > > > > > > and while I understand the auditing difficulties here, we should > > > just > > > restrict these in appropriate contexts instead of trying to > > > restrict > > > all information flow into or out of comdats. > > >> > > -Hal > > >> > > > 1) Stop deducing function attributes within comdats by > > > > examining > > > > the > > > > > > > bodies of the functions (so that we remain free to transform > > > > the > > > > > > > bodies of functions). > > > > > > > 2) Teach frontends to emit (even at O0!!!) trivially deduced > > > > function > > > > > > > attributes for comdats so that we continue to catch easy cases. > > > > > > > 3) Ensure and specify that we never hoist code *into* a comdat > > > > group > > > > > > > in which it would not have been executed previously. I don't > > > > know > > > > of > > > > > > > anything in LLVM that does this today, but it would become an > > > > > > > important invariant. > > > > > > > 4) Work a lot harder to do internalizing and removing of this > > > > > > > restriction. > > >> > > > Pretty horrible. But I think it is correct. > > >> > > > As a slight modification to #1 and #2, we could have a very > > > > carefully > > > > > > > crafted deduction rule where we only deduce function attributes > > > > for > > > > > > > functions prior to any modification of their function bodies. > > > > Such > > > > > > > attributes should be conservatively correct because we would > > > > never > > > > > > > lift new code into the function bodies. This would at least > > > > allow > > > > us > > > > > > > to do bottom-up deduction to catch interprocedural cases. But > > > > it > > > > > > > would become incredibly subtle that this is only valid prior to > > > > > > > *any* transformations of the comdat-containing functions. > > >> > > > I'm starting to think this subtle rule might be worth it. But > > > > I'm > > > > > > > frankly terrified by the implications. > > >> > > > On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev < > > > > > > > llvm-dev at lists.llvm.org > wrote: > > >> > > > > On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev > > > > > wrote: > > > > > > > > > > > > > > > >> On 2016-Feb-24, at 19:46, Sanjoy Das < > > > > > > > > >> sanjoy at playingwithpointers.com > wrote: > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth < > > > > > > > > >> chandlerc at google.com > wrote: > > > > > > > > > > > > > > > >>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N. Exon Smith > > > > > > > > > > > > > > > >>> < dexonsmith at apple.com > wrote: > > > > > > > > > > > > > > > >>>> > > > > > > > > > > > > > > > >>>>> On 2016-Feb-24, at 19:17, Chandler Carruth < > > > > > > > > >>>>> chandlerc at google.com > wrote: > > > > > > > > > > > > > > > >>>>> > > > > > > > > > > > > > > > >>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy Das via llvm-dev > > > > > > > > > > > > > > > >>>>> < llvm-dev at lists.llvm.org > wrote: > > > > > > > > > > > > > > > >>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan P. N. Exon > > > > > >>>>> Smith > > > > > > > > > > > > > > > >>>>> < dexonsmith at apple.com > wrote: > > > > > > > > > > > > > > > >>>>>>> If we do not inline @foo(), and instead re-link the > > > > > >>>>>>> call > > > > > > > > >>>>>>> site > > > > > > > > >>>>>>> in > > > > > > > > > > > > > > > >>>>>>> @main > > > > > > > > > > > > > > > >>>>>>> to some non-optimized copy (or differently optimized > > > > > >>>>>>> copy) > > > > > > > > >>>>>>> of > > > > > > > > >>>>>>> @foo, > > > > > > > > > > > > > > > >>>>>>> then it is possible for the program to have the > > > > > >>>>>>> behavior > > > > > > > > >>>>>>> {print("Y"); > > > > > > > > > > > > > > > >>>>>>> print ("X")}, which was disallowed in the earlier > > > > > >>>>>>> program. > > > > > > > > > > > > > > > >>>>>>> > > > > > > > > > > > > > > > >>>>>>> In other words, opt refined the semantics of @foo() > > > > > >>>>>>> (i.e. > > > > > > > > >>>>>>> reduced the > > > > > > > > > > > > > > > >>>>>>> set of behaviors it may have) in ways that would make > > > > > >>>>>>> later > > > > > > > > > > > > > > > >>>>>>> optimizations invalid if we de-refine the > > > > > >>>>>>> implementation > > > > > >>>>>>> of > > > > > > > > >>>>>>> @foo(). > > > > > > > > > > > > > > > >>>>>> I'm probably missing something obvious here. How could > > > > > >>>>>> the > > > > > > > > >>>>>> result of > > > > > > > > > > > > > > > >>>>>> `%t0 != %t1` be different at optimization time in one > > > > > >>>>>> file > > > > > > > > >>>>>> than from > > > > > > > > > > > > > > > >>>>>> runtime in the "real" implementation? Doesn't this > > > > > >>>>>> make > > > > > >>>>>> the > > > > > > > > >>>>>> CSE > > > > > > > > > > > > > > > >>>>>> invalid? > > > > > > > > > > > > > > > >>>>> `%t0` and `%t1` are "allowed" to "always be the same", > > > > > >>>>> i.e. > > > > > > > > >>>>> an > > > > > > > > > > > > > > > >>>>> implementation of @foo that always feeds in the same > > > > > > > > > > > > > > > >>>>> value for `%t0` and `%t1` is a valid implementation > > > > > >>>>> (which > > > > > >>>>> is > > > > > > > > >>>>> why the > > > > > > > > > > > > > > > >>>>> CSE was valid); but it is not the *only* valid > > > > > > > > >>>>> implementation. > > > > > > > > >>>>> If I > > > > > > > > > > > > > > > >>>>> don't CSE the two load instructions (also a valid thing > > > > > >>>>> to > > > > > > > > >>>>> do), > > > > > > > > >>>>> and > > > > > > > > > > > > > > > >>>>> this is a second thread writing to `%par`, then the two > > > > > > > > >>>>> values > > > > > > > > >>>>> loaded > > > > > > > > > > > > > > > >>>>> can be different, and you could end up printing `"X"` > > > > > >>>>> in > > > > > > > > >>>>> `@foo`. > > > > > > > > > > > > > > > >>>>> > > > > > > > > > > > > > > > >>>>> Did that make sense? > > > > > > > > > > > > > > > >>>> Yes. To be sure I understand the scope: this is only a > > > > > >>>> problem > > > > > > > > >>>> for > > > > > > > > > > > > > > > >>>> atomics, correct? (Because multi-threaded behaviour with > > > > > >>>> other > > > > > > > > >>>> globals > > > > > > > > > > > > > > > >>>> is UB?) > > > > > > > > > > > > > > > >>>> > > > > > > > > > > > > > > > >>>>>> Does linkonce_odr linkage have the same problem? > > > > > > > > > > > > > > > >>>>>> - If so, do you want to change it too? > > > > > > > > > > > > > > > >>>>>> - Else, why not? > > > > > > > > > > > > > > > >>>>> Going by the specification in the LangRef, I'd say it > > > > > >>>>> depends > > > > > > > > >>>>> on how > > > > > > > > > > > > > > > >>>>> you define "definitive". If you're allowed to replace > > > > > >>>>> the > > > > > > > > >>>>> body > > > > > > > > >>>>> of a > > > > > > > > > > > > > > > >>>>> function with a differently optimized body, then the > > > > > >>>>> above > > > > > > > > >>>>> problem > > > > > > > > > > > > > > > >>>>> exists. > > > > > > > > > > > > > > > >>>>> > > > > > > > > > > > > > > > >>>>> I believe that is the case, and I strongly believe the > > > > > > > > >>>>> problem > > > > > > > > >>>>> you > > > > > > > > > > > > > > > >>>>> outline exists for linkonce_odr exactly as it does for > > > > > > > > >>>>> available_externally. > > > > > > > > > > > > > > > >>>>> > > > > > > > > > > > > > > > >>>>> Which is what makes this scary: every C++ inline > > > > > >>>>> function > > > > > > > > >>>>> today > > > > > > > > >>>>> can > > > > > > > > > > > > > > > >>>>> trigger this. > > > > > > > > > > > > > > > >>>> Every C/C++ inline or template function. But only the > > > > > >>>> ones > > > > > > > > >>>> that > > > > > > > > >>>> use > > > > > > > > > > > > > > > >>>> atomics, right? > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > >>> Well, with *this* example... > > > > > > > > > > > > > > > >> Atomic are one source of non-determinism that compilers > > > > > >> can > > > > > > > > >> reason > > > > > > > > > > > > > > > >> about. I don't know if the following snippet is well > > > > > >> defined > > > > > >> or > > > > > > > > >> not, > > > > > > > > > > > > > > > >> but you could have similar issues with > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> void foo() { > > > > > > > > > > > > > > > >> int *p = malloc(sizeof(int)); > > > > > > > > > > > > > > > >> if (*p < 10) print("X"); > > > > > > > > > > > > > > > >> } > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> or (again, I don't know if this is actually well defined) > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> void foo() { > > > > > > > > > > > > > > > >> int t; // it is probably reasonable to fold compares with > > > > > > > > > > > > > > > >> ptrtoint(alloca) to undef > > > > > > > > > > > > > > > >> if ((intptr_t)(&t) < 10) print("X"); > > > > > > > > > > > > > > > >> } > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > The first one at least is UB, but as Richard pointed out > > > > > > the > > > > > > > > > scope > > > > > > > > > > > > > > > > is certainly broader than atomics (it's not even just > > > > > > > > > well-defined > > > > > > > > > > > > > > > > non-deterministism). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm kind of terrified by the implications. > > > > > > > > > > > > > > > Me too. :( > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> -- Sanjoy > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >>>> > > > > > > > > > > > > > > > >>>> Not that I'm sure that will end up being a helpful > > > > > > > > >>>> distinction. > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > >>> Right. See Richard's comment. I think that sums up the > > > > > >>> real > > > > > > > > >>> issue > > > > > > > > >>> here. =/ > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > > > > > LLVM Developers mailing list > > > > > > > > > > > > > > > > llvm-dev at lists.llvm.org > > > > > > > > > > > > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > > > > > > >> > > > > _______________________________________________ > > > > > > > > > > > > > > > LLVM Developers mailing list > > > > > > > > > > > > > > > llvm-dev at lists.llvm.org > > > > > > > > > > > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > > > > > > >> > > > _______________________________________________ > > > > > > > LLVM Developers mailing list > > > > > > > llvm-dev at lists.llvm.org > > > > > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > >> > > -- > > >> > > -- > > > > > > Hal Finkel > > > > > > Assistant Computational Scientist > > > > > > Leadership Computing Facility > > > > > > Argonne National Laboratory > > >> > _______________________________________________ > > > LLVM Developers mailing list > > > llvm-dev at lists.llvm.org > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- -- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Mehdi Amini via llvm-dev
2016-Feb-25 06:28 UTC
[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
> On Feb 24, 2016, at 10:20 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > ----- Original Message ----- > >> From: "Mehdi Amini" <mehdi.amini at apple.com> >> To: "Chandler Carruth" <chandlerc at google.com> >> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" >> <llvm-dev at lists.llvm.org> >> Sent: Thursday, February 25, 2016 12:02:16 AM >> Subject: Re: [llvm-dev] Possible soundness issue with >> available_externally (split from "RFC: Add guard intrinsics") > >>> On Feb 24, 2016, at 9:41 PM, Chandler Carruth via llvm-dev < >>> llvm-dev at lists.llvm.org > wrote: >> > >>> On Wed, Feb 24, 2016 at 9:35 PM Hal Finkel < hfinkel at anl.gov > >>> wrote: >> > >>>> ----- Original Message ----- >>> >> > >>>>> From: "Chandler Carruth via llvm-dev" < llvm-dev at lists.llvm.org >>>>>> >>> >> >>>>> To: "Philip Reames" < listmail at philipreames.com >, "Duncan P. >>>>> N. >>>>> Exon >>> >> >>>>> Smith" < dexonsmith at apple.com >, "Sanjoy Das" >>> >> >>>>> < sanjoy at playingwithpointers.com > >>> >> >>>>> Cc: "llvm-dev" < llvm-dev at lists.llvm.org > >>> >> >>>>> Sent: Wednesday, February 24, 2016 10:29:23 PM >>> >> >>>>> Subject: Re: [llvm-dev] Possible soundness issue with >>> >> >>>>> available_externally (split from "RFC: Add guard intrinsics") >>> >> > >>>>> Yea, I'm pretty sad about all of this. I'm also not seeing a >>>>> lot >>>>> of >>> >> >>>>> awesome paths forward. >>> >> > >>>>> Here is the least bad strategy I can come up with. Curious if >>>>> folks >>> >> >>>>> think this is sufficient: >>> >> > >>>> I may not completely understand the problem, but this seems like >>>> overkill. The underlying restriction is that, if the compiler >>>> makes >>>> a non-determinism-collapsing choice when optimizing a function, >>>> it >>>> must make the same choice for all definitions of that function >>>> (undefined behavior excluded). >>> >> >>> This isn't enough, because some definition in some other module may >>> *not be optimized at all*, and yet may get selected at link time. >> > >>> Put another way, it must *prove* that the same choice will *always* >>> be made for all definitions. This is akin to proving that the >>> optimizer is run over all translation units for C++ linkonce_odr >>> functions, which you can't do. >> >> Even if the optimizer is ran, it could take different decision >> because the context would be different: > >> linkonce_odr foo() { >> bar(); >> } > >> If bar() is present in the TU it can gets inlined into foo(). So the >> optimizer would optimize differently foo(). > >>> The result would be failing to optimize the bodies of linkonce_odr >>> functions in any way which was externally detectable such as this. >>> I >>> think that would be *much* worse than losing the ability to do >>> function attribute deduction for such functions? >> > >>>> Thus, with an externally_available function, the CSE in Sanjoy's >>>> original example should be forbidden. Richard's example again >>>> demonstrates this principle, although in this case the >>>> non-determinism is in the choice of a globally-visible >>>> implementation technique rather than non-determinism from >>>> memory-subsystem reordering. >>> >> > >>>> There is a complication, which you imply in your proposal, that >>>> such >>>> optimizations need to be forbidden not just in the >>>> externally_available functions themselves, but in any local >>>> function >>>> transitively called by one. This, however, we can take care of >>>> with >>>> an (easily-deduced) attribute. >>> >> > >> I'm not sure why " such optimizations need to be forbidden [...] in >> any local function transitively called by one", can you illustrate >> with an example? > > Take Sanjoy's example with the two atomic loads, but outline the body of the function into some private function. The same restrictions need apply.Do you mean like: static void bar() { %t0 = load atomic %ptr %t1 = load atomic %ptr if (%t0 != %t1) print("X"); } void foo() available_externally { bar(); } void main() { foo(); print("Y"); } ``` Sorry, I still fail to understand why you can't optimize bar(), including deducing attributes on it. Any reference to bar() is "final": bar() can't be interposed by the linker. I think you should be able to infer attributes (and do IPA in general on any function that matches GlobalValue::isStrongDefinitionForLinker()). -- Mehdi> > -Hal > >> -- >> Mehdi > >>>> In short, it is not clear to me that the number of problematic >>>> optimizations is large (seems likely restricted to things >>>> involving >>>> atomics in practice), >>> >> >>>> and while I understand the auditing difficulties here, we should >>>> just >>>> restrict these in appropriate contexts instead of trying to >>>> restrict >>>> all information flow into or out of comdats. >>> >> > >>>> -Hal >>> >> > >>>>> 1) Stop deducing function attributes within comdats by >>>>> examining >>>>> the >>> >> >>>>> bodies of the functions (so that we remain free to transform >>>>> the >>> >> >>>>> bodies of functions). >>> >> >>>>> 2) Teach frontends to emit (even at O0!!!) trivially deduced >>>>> function >>> >> >>>>> attributes for comdats so that we continue to catch easy cases. >>> >> >>>>> 3) Ensure and specify that we never hoist code *into* a comdat >>>>> group >>> >> >>>>> in which it would not have been executed previously. I don't >>>>> know >>>>> of >>> >> >>>>> anything in LLVM that does this today, but it would become an >>> >> >>>>> important invariant. >>> >> >>>>> 4) Work a lot harder to do internalizing and removing of this >>> >> >>>>> restriction. >>> >> > >>>>> Pretty horrible. But I think it is correct. >>> >> > >>>>> As a slight modification to #1 and #2, we could have a very >>>>> carefully >>> >> >>>>> crafted deduction rule where we only deduce function attributes >>>>> for >>> >> >>>>> functions prior to any modification of their function bodies. >>>>> Such >>> >> >>>>> attributes should be conservatively correct because we would >>>>> never >>> >> >>>>> lift new code into the function bodies. This would at least >>>>> allow >>>>> us >>> >> >>>>> to do bottom-up deduction to catch interprocedural cases. But >>>>> it >>> >> >>>>> would become incredibly subtle that this is only valid prior to >>> >> >>>>> *any* transformations of the comdat-containing functions. >>> >> > >>>>> I'm starting to think this subtle rule might be worth it. But >>>>> I'm >>> >> >>>>> frankly terrified by the implications. >>> >> > >>>>> On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev < >>> >> >>>>> llvm-dev at lists.llvm.org > wrote: >>> >> > >>>>>> On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev >>>>>> wrote: >>> >> >>>>> >>> >> >>>>>>>> On 2016-Feb-24, at 19:46, Sanjoy Das < >>> >> >>>>>>>> sanjoy at playingwithpointers.com > wrote: >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>>> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth < >>> >> >>>>>>>> chandlerc at google.com > wrote: >>> >> >>>>> >>> >> >>>>>>>>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N. Exon Smith >>> >> >>>>> >>> >> >>>>>>>>> < dexonsmith at apple.com > wrote: >>> >> >>>>> >>> >> >>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>> On 2016-Feb-24, at 19:17, Chandler Carruth < >>> >> >>>>>>>>>>> chandlerc at google.com > wrote: >>> >> >>>>> >>> >> >>>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy Das via llvm-dev >>> >> >>>>> >>> >> >>>>>>>>>>> < llvm-dev at lists.llvm.org > wrote: >>> >> >>>>> >>> >> >>>>>>>>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan P. N. Exon >>>>>>>>>>> Smith >>> >> >>>>> >>> >> >>>>>>>>>>> < dexonsmith at apple.com > wrote: >>> >> >>>>> >>> >> >>>>>>>>>>>>> If we do not inline @foo(), and instead re-link the >>>>>>>>>>>>> call >>> >> >>>>>>>>>>>>> site >>> >> >>>>>>>>>>>>> in >>> >> >>>>> >>> >> >>>>>>>>>>>>> @main >>> >> >>>>> >>> >> >>>>>>>>>>>>> to some non-optimized copy (or differently optimized >>>>>>>>>>>>> copy) >>> >> >>>>>>>>>>>>> of >>> >> >>>>>>>>>>>>> @foo, >>> >> >>>>> >>> >> >>>>>>>>>>>>> then it is possible for the program to have the >>>>>>>>>>>>> behavior >>> >> >>>>>>>>>>>>> {print("Y"); >>> >> >>>>> >>> >> >>>>>>>>>>>>> print ("X")}, which was disallowed in the earlier >>>>>>>>>>>>> program. >>> >> >>>>> >>> >> >>>>>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>>>> In other words, opt refined the semantics of @foo() >>>>>>>>>>>>> (i.e. >>> >> >>>>>>>>>>>>> reduced the >>> >> >>>>> >>> >> >>>>>>>>>>>>> set of behaviors it may have) in ways that would make >>>>>>>>>>>>> later >>> >> >>>>> >>> >> >>>>>>>>>>>>> optimizations invalid if we de-refine the >>>>>>>>>>>>> implementation >>>>>>>>>>>>> of >>> >> >>>>>>>>>>>>> @foo(). >>> >> >>>>> >>> >> >>>>>>>>>>>> I'm probably missing something obvious here. How could >>>>>>>>>>>> the >>> >> >>>>>>>>>>>> result of >>> >> >>>>> >>> >> >>>>>>>>>>>> `%t0 != %t1` be different at optimization time in one >>>>>>>>>>>> file >>> >> >>>>>>>>>>>> than from >>> >> >>>>> >>> >> >>>>>>>>>>>> runtime in the "real" implementation? Doesn't this >>>>>>>>>>>> make >>>>>>>>>>>> the >>> >> >>>>>>>>>>>> CSE >>> >> >>>>> >>> >> >>>>>>>>>>>> invalid? >>> >> >>>>> >>> >> >>>>>>>>>>> `%t0` and `%t1` are "allowed" to "always be the same", >>>>>>>>>>> i.e. >>> >> >>>>>>>>>>> an >>> >> >>>>> >>> >> >>>>>>>>>>> implementation of @foo that always feeds in the same >>> >> >>>>> >>> >> >>>>>>>>>>> value for `%t0` and `%t1` is a valid implementation >>>>>>>>>>> (which >>>>>>>>>>> is >>> >> >>>>>>>>>>> why the >>> >> >>>>> >>> >> >>>>>>>>>>> CSE was valid); but it is not the *only* valid >>> >> >>>>>>>>>>> implementation. >>> >> >>>>>>>>>>> If I >>> >> >>>>> >>> >> >>>>>>>>>>> don't CSE the two load instructions (also a valid thing >>>>>>>>>>> to >>> >> >>>>>>>>>>> do), >>> >> >>>>>>>>>>> and >>> >> >>>>> >>> >> >>>>>>>>>>> this is a second thread writing to `%par`, then the two >>> >> >>>>>>>>>>> values >>> >> >>>>>>>>>>> loaded >>> >> >>>>> >>> >> >>>>>>>>>>> can be different, and you could end up printing `"X"` >>>>>>>>>>> in >>> >> >>>>>>>>>>> `@foo`. >>> >> >>>>> >>> >> >>>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>> Did that make sense? >>> >> >>>>> >>> >> >>>>>>>>>> Yes. To be sure I understand the scope: this is only a >>>>>>>>>> problem >>> >> >>>>>>>>>> for >>> >> >>>>> >>> >> >>>>>>>>>> atomics, correct? (Because multi-threaded behaviour with >>>>>>>>>> other >>> >> >>>>>>>>>> globals >>> >> >>>>> >>> >> >>>>>>>>>> is UB?) >>> >> >>>>> >>> >> >>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>>> Does linkonce_odr linkage have the same problem? >>> >> >>>>> >>> >> >>>>>>>>>>>> - If so, do you want to change it too? >>> >> >>>>> >>> >> >>>>>>>>>>>> - Else, why not? >>> >> >>>>> >>> >> >>>>>>>>>>> Going by the specification in the LangRef, I'd say it >>>>>>>>>>> depends >>> >> >>>>>>>>>>> on how >>> >> >>>>> >>> >> >>>>>>>>>>> you define "definitive". If you're allowed to replace >>>>>>>>>>> the >>> >> >>>>>>>>>>> body >>> >> >>>>>>>>>>> of a >>> >> >>>>> >>> >> >>>>>>>>>>> function with a differently optimized body, then the >>>>>>>>>>> above >>> >> >>>>>>>>>>> problem >>> >> >>>>> >>> >> >>>>>>>>>>> exists. >>> >> >>>>> >>> >> >>>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>> I believe that is the case, and I strongly believe the >>> >> >>>>>>>>>>> problem >>> >> >>>>>>>>>>> you >>> >> >>>>> >>> >> >>>>>>>>>>> outline exists for linkonce_odr exactly as it does for >>> >> >>>>>>>>>>> available_externally. >>> >> >>>>> >>> >> >>>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>> Which is what makes this scary: every C++ inline >>>>>>>>>>> function >>> >> >>>>>>>>>>> today >>> >> >>>>>>>>>>> can >>> >> >>>>> >>> >> >>>>>>>>>>> trigger this. >>> >> >>>>> >>> >> >>>>>>>>>> Every C/C++ inline or template function. But only the >>>>>>>>>> ones >>> >> >>>>>>>>>> that >>> >> >>>>>>>>>> use >>> >> >>>>> >>> >> >>>>>>>>>> atomics, right? >>> >> >>>>> >>> >> >>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>> Well, with *this* example... >>> >> >>>>> >>> >> >>>>>>>> Atomic are one source of non-determinism that compilers >>>>>>>> can >>> >> >>>>>>>> reason >>> >> >>>>> >>> >> >>>>>>>> about. I don't know if the following snippet is well >>>>>>>> defined >>>>>>>> or >>> >> >>>>>>>> not, >>> >> >>>>> >>> >> >>>>>>>> but you could have similar issues with >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>>> void foo() { >>> >> >>>>> >>> >> >>>>>>>> int *p = malloc(sizeof(int)); >>> >> >>>>> >>> >> >>>>>>>> if (*p < 10) print("X"); >>> >> >>>>> >>> >> >>>>>>>> } >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>>> or (again, I don't know if this is actually well defined) >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>>> void foo() { >>> >> >>>>> >>> >> >>>>>>>> int t; // it is probably reasonable to fold compares with >>> >> >>>>> >>> >> >>>>>>>> ptrtoint(alloca) to undef >>> >> >>>>> >>> >> >>>>>>>> if ((intptr_t)(&t) < 10) print("X"); >>> >> >>>>> >>> >> >>>>>>>> } >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>> The first one at least is UB, but as Richard pointed out >>>>>>> the >>> >> >>>>>>> scope >>> >> >>>>> >>> >> >>>>>>> is certainly broader than atomics (it's not even just >>> >> >>>>>>> well-defined >>> >> >>>>> >>> >> >>>>>>> non-deterministism). >>> >> >>>>> >>> >> >>>>>>> >>> >> >>>>> >>> >> >>>>>>> I'm kind of terrified by the implications. >>> >> >>>>> >>> >> >>>>>> Me too. :( >>> >> >>>>> >>> >> >>>>>>> >>> >> >>>>> >>> >> >>>>>>>> -- Sanjoy >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>> Not that I'm sure that will end up being a helpful >>> >> >>>>>>>>>> distinction. >>> >> >>>>> >>> >> >>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>> Right. See Richard's comment. I think that sums up the >>>>>>>>> real >>> >> >>>>>>>>> issue >>> >> >>>>>>>>> here. =/ >>> >> >>>>> >>> >> >>>>>>> _______________________________________________ >>> >> >>>>> >>> >> >>>>>>> LLVM Developers mailing list >>> >> >>>>> >>> >> >>>>>>> llvm-dev at lists.llvm.org >>> >> >>>>> >>> >> >>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >> >>>>> >>> >> > >>>>>> _______________________________________________ >>> >> >>>>> >>> >> >>>>>> LLVM Developers mailing list >>> >> >>>>> >>> >> >>>>>> llvm-dev at lists.llvm.org >>> >> >>>>> >>> >> >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >> >>>>> >>> >> > >>>>> _______________________________________________ >>> >> >>>>> LLVM Developers mailing list >>> >> >>>>> llvm-dev at lists.llvm.org >>> >> >>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >> > >>>> -- >>> >> > >>>> -- >>> >> >>>> Hal Finkel >>> >> >>>> Assistant Computational Scientist >>> >> >>>> Leadership Computing Facility >>> >> >>>> Argonne National Laboratory >>> >> > >>> _______________________________________________ >> >>> LLVM Developers mailing list >> >>> llvm-dev at lists.llvm.org >> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > -- > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory
Possibly Parallel Threads
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")