Mehdi Amini via llvm-dev
2016-Feb-25 06:28 UTC
[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
> On Feb 24, 2016, at 10:20 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > ----- Original Message ----- > >> From: "Mehdi Amini" <mehdi.amini at apple.com> >> To: "Chandler Carruth" <chandlerc at google.com> >> Cc: "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" >> <llvm-dev at lists.llvm.org> >> Sent: Thursday, February 25, 2016 12:02:16 AM >> Subject: Re: [llvm-dev] Possible soundness issue with >> available_externally (split from "RFC: Add guard intrinsics") > >>> On Feb 24, 2016, at 9:41 PM, Chandler Carruth via llvm-dev < >>> llvm-dev at lists.llvm.org > wrote: >> > >>> On Wed, Feb 24, 2016 at 9:35 PM Hal Finkel < hfinkel at anl.gov > >>> wrote: >> > >>>> ----- Original Message ----- >>> >> > >>>>> From: "Chandler Carruth via llvm-dev" < llvm-dev at lists.llvm.org >>>>>> >>> >> >>>>> To: "Philip Reames" < listmail at philipreames.com >, "Duncan P. >>>>> N. >>>>> Exon >>> >> >>>>> Smith" < dexonsmith at apple.com >, "Sanjoy Das" >>> >> >>>>> < sanjoy at playingwithpointers.com > >>> >> >>>>> Cc: "llvm-dev" < llvm-dev at lists.llvm.org > >>> >> >>>>> Sent: Wednesday, February 24, 2016 10:29:23 PM >>> >> >>>>> Subject: Re: [llvm-dev] Possible soundness issue with >>> >> >>>>> available_externally (split from "RFC: Add guard intrinsics") >>> >> > >>>>> Yea, I'm pretty sad about all of this. I'm also not seeing a >>>>> lot >>>>> of >>> >> >>>>> awesome paths forward. >>> >> > >>>>> Here is the least bad strategy I can come up with. Curious if >>>>> folks >>> >> >>>>> think this is sufficient: >>> >> > >>>> I may not completely understand the problem, but this seems like >>>> overkill. The underlying restriction is that, if the compiler >>>> makes >>>> a non-determinism-collapsing choice when optimizing a function, >>>> it >>>> must make the same choice for all definitions of that function >>>> (undefined behavior excluded). >>> >> >>> This isn't enough, because some definition in some other module may >>> *not be optimized at all*, and yet may get selected at link time. >> > >>> Put another way, it must *prove* that the same choice will *always* >>> be made for all definitions. This is akin to proving that the >>> optimizer is run over all translation units for C++ linkonce_odr >>> functions, which you can't do. >> >> Even if the optimizer is ran, it could take different decision >> because the context would be different: > >> linkonce_odr foo() { >> bar(); >> } > >> If bar() is present in the TU it can gets inlined into foo(). So the >> optimizer would optimize differently foo(). > >>> The result would be failing to optimize the bodies of linkonce_odr >>> functions in any way which was externally detectable such as this. >>> I >>> think that would be *much* worse than losing the ability to do >>> function attribute deduction for such functions? >> > >>>> Thus, with an externally_available function, the CSE in Sanjoy's >>>> original example should be forbidden. Richard's example again >>>> demonstrates this principle, although in this case the >>>> non-determinism is in the choice of a globally-visible >>>> implementation technique rather than non-determinism from >>>> memory-subsystem reordering. >>> >> > >>>> There is a complication, which you imply in your proposal, that >>>> such >>>> optimizations need to be forbidden not just in the >>>> externally_available functions themselves, but in any local >>>> function >>>> transitively called by one. This, however, we can take care of >>>> with >>>> an (easily-deduced) attribute. >>> >> > >> I'm not sure why " such optimizations need to be forbidden [...] in >> any local function transitively called by one", can you illustrate >> with an example? > > Take Sanjoy's example with the two atomic loads, but outline the body of the function into some private function. The same restrictions need apply.Do you mean like: static void bar() { %t0 = load atomic %ptr %t1 = load atomic %ptr if (%t0 != %t1) print("X"); } void foo() available_externally { bar(); } void main() { foo(); print("Y"); } ``` Sorry, I still fail to understand why you can't optimize bar(), including deducing attributes on it. Any reference to bar() is "final": bar() can't be interposed by the linker. I think you should be able to infer attributes (and do IPA in general on any function that matches GlobalValue::isStrongDefinitionForLinker()). -- Mehdi> > -Hal > >> -- >> Mehdi > >>>> In short, it is not clear to me that the number of problematic >>>> optimizations is large (seems likely restricted to things >>>> involving >>>> atomics in practice), >>> >> >>>> and while I understand the auditing difficulties here, we should >>>> just >>>> restrict these in appropriate contexts instead of trying to >>>> restrict >>>> all information flow into or out of comdats. >>> >> > >>>> -Hal >>> >> > >>>>> 1) Stop deducing function attributes within comdats by >>>>> examining >>>>> the >>> >> >>>>> bodies of the functions (so that we remain free to transform >>>>> the >>> >> >>>>> bodies of functions). >>> >> >>>>> 2) Teach frontends to emit (even at O0!!!) trivially deduced >>>>> function >>> >> >>>>> attributes for comdats so that we continue to catch easy cases. >>> >> >>>>> 3) Ensure and specify that we never hoist code *into* a comdat >>>>> group >>> >> >>>>> in which it would not have been executed previously. I don't >>>>> know >>>>> of >>> >> >>>>> anything in LLVM that does this today, but it would become an >>> >> >>>>> important invariant. >>> >> >>>>> 4) Work a lot harder to do internalizing and removing of this >>> >> >>>>> restriction. >>> >> > >>>>> Pretty horrible. But I think it is correct. >>> >> > >>>>> As a slight modification to #1 and #2, we could have a very >>>>> carefully >>> >> >>>>> crafted deduction rule where we only deduce function attributes >>>>> for >>> >> >>>>> functions prior to any modification of their function bodies. >>>>> Such >>> >> >>>>> attributes should be conservatively correct because we would >>>>> never >>> >> >>>>> lift new code into the function bodies. This would at least >>>>> allow >>>>> us >>> >> >>>>> to do bottom-up deduction to catch interprocedural cases. But >>>>> it >>> >> >>>>> would become incredibly subtle that this is only valid prior to >>> >> >>>>> *any* transformations of the comdat-containing functions. >>> >> > >>>>> I'm starting to think this subtle rule might be worth it. But >>>>> I'm >>> >> >>>>> frankly terrified by the implications. >>> >> > >>>>> On Wed, Feb 24, 2016 at 8:13 PM Philip Reames via llvm-dev < >>> >> >>>>> llvm-dev at lists.llvm.org > wrote: >>> >> > >>>>>> On 02/24/2016 08:10 PM, Duncan P. N. Exon Smith via llvm-dev >>>>>> wrote: >>> >> >>>>> >>> >> >>>>>>>> On 2016-Feb-24, at 19:46, Sanjoy Das < >>> >> >>>>>>>> sanjoy at playingwithpointers.com > wrote: >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>>> On Wed, Feb 24, 2016 at 7:38 PM, Chandler Carruth < >>> >> >>>>>>>> chandlerc at google.com > wrote: >>> >> >>>>> >>> >> >>>>>>>>> On Wed, Feb 24, 2016 at 7:34 PM Duncan P. N. Exon Smith >>> >> >>>>> >>> >> >>>>>>>>> < dexonsmith at apple.com > wrote: >>> >> >>>>> >>> >> >>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>> On 2016-Feb-24, at 19:17, Chandler Carruth < >>> >> >>>>>>>>>>> chandlerc at google.com > wrote: >>> >> >>>>> >>> >> >>>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>> On Wed, Feb 24, 2016 at 7:10 PM Sanjoy Das via llvm-dev >>> >> >>>>> >>> >> >>>>>>>>>>> < llvm-dev at lists.llvm.org > wrote: >>> >> >>>>> >>> >> >>>>>>>>>>> On Wed, Feb 24, 2016 at 6:51 PM, Duncan P. N. Exon >>>>>>>>>>> Smith >>> >> >>>>> >>> >> >>>>>>>>>>> < dexonsmith at apple.com > wrote: >>> >> >>>>> >>> >> >>>>>>>>>>>>> If we do not inline @foo(), and instead re-link the >>>>>>>>>>>>> call >>> >> >>>>>>>>>>>>> site >>> >> >>>>>>>>>>>>> in >>> >> >>>>> >>> >> >>>>>>>>>>>>> @main >>> >> >>>>> >>> >> >>>>>>>>>>>>> to some non-optimized copy (or differently optimized >>>>>>>>>>>>> copy) >>> >> >>>>>>>>>>>>> of >>> >> >>>>>>>>>>>>> @foo, >>> >> >>>>> >>> >> >>>>>>>>>>>>> then it is possible for the program to have the >>>>>>>>>>>>> behavior >>> >> >>>>>>>>>>>>> {print("Y"); >>> >> >>>>> >>> >> >>>>>>>>>>>>> print ("X")}, which was disallowed in the earlier >>>>>>>>>>>>> program. >>> >> >>>>> >>> >> >>>>>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>>>> In other words, opt refined the semantics of @foo() >>>>>>>>>>>>> (i.e. >>> >> >>>>>>>>>>>>> reduced the >>> >> >>>>> >>> >> >>>>>>>>>>>>> set of behaviors it may have) in ways that would make >>>>>>>>>>>>> later >>> >> >>>>> >>> >> >>>>>>>>>>>>> optimizations invalid if we de-refine the >>>>>>>>>>>>> implementation >>>>>>>>>>>>> of >>> >> >>>>>>>>>>>>> @foo(). >>> >> >>>>> >>> >> >>>>>>>>>>>> I'm probably missing something obvious here. How could >>>>>>>>>>>> the >>> >> >>>>>>>>>>>> result of >>> >> >>>>> >>> >> >>>>>>>>>>>> `%t0 != %t1` be different at optimization time in one >>>>>>>>>>>> file >>> >> >>>>>>>>>>>> than from >>> >> >>>>> >>> >> >>>>>>>>>>>> runtime in the "real" implementation? Doesn't this >>>>>>>>>>>> make >>>>>>>>>>>> the >>> >> >>>>>>>>>>>> CSE >>> >> >>>>> >>> >> >>>>>>>>>>>> invalid? >>> >> >>>>> >>> >> >>>>>>>>>>> `%t0` and `%t1` are "allowed" to "always be the same", >>>>>>>>>>> i.e. >>> >> >>>>>>>>>>> an >>> >> >>>>> >>> >> >>>>>>>>>>> implementation of @foo that always feeds in the same >>> >> >>>>> >>> >> >>>>>>>>>>> value for `%t0` and `%t1` is a valid implementation >>>>>>>>>>> (which >>>>>>>>>>> is >>> >> >>>>>>>>>>> why the >>> >> >>>>> >>> >> >>>>>>>>>>> CSE was valid); but it is not the *only* valid >>> >> >>>>>>>>>>> implementation. >>> >> >>>>>>>>>>> If I >>> >> >>>>> >>> >> >>>>>>>>>>> don't CSE the two load instructions (also a valid thing >>>>>>>>>>> to >>> >> >>>>>>>>>>> do), >>> >> >>>>>>>>>>> and >>> >> >>>>> >>> >> >>>>>>>>>>> this is a second thread writing to `%par`, then the two >>> >> >>>>>>>>>>> values >>> >> >>>>>>>>>>> loaded >>> >> >>>>> >>> >> >>>>>>>>>>> can be different, and you could end up printing `"X"` >>>>>>>>>>> in >>> >> >>>>>>>>>>> `@foo`. >>> >> >>>>> >>> >> >>>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>> Did that make sense? >>> >> >>>>> >>> >> >>>>>>>>>> Yes. To be sure I understand the scope: this is only a >>>>>>>>>> problem >>> >> >>>>>>>>>> for >>> >> >>>>> >>> >> >>>>>>>>>> atomics, correct? (Because multi-threaded behaviour with >>>>>>>>>> other >>> >> >>>>>>>>>> globals >>> >> >>>>> >>> >> >>>>>>>>>> is UB?) >>> >> >>>>> >>> >> >>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>>> Does linkonce_odr linkage have the same problem? >>> >> >>>>> >>> >> >>>>>>>>>>>> - If so, do you want to change it too? >>> >> >>>>> >>> >> >>>>>>>>>>>> - Else, why not? >>> >> >>>>> >>> >> >>>>>>>>>>> Going by the specification in the LangRef, I'd say it >>>>>>>>>>> depends >>> >> >>>>>>>>>>> on how >>> >> >>>>> >>> >> >>>>>>>>>>> you define "definitive". If you're allowed to replace >>>>>>>>>>> the >>> >> >>>>>>>>>>> body >>> >> >>>>>>>>>>> of a >>> >> >>>>> >>> >> >>>>>>>>>>> function with a differently optimized body, then the >>>>>>>>>>> above >>> >> >>>>>>>>>>> problem >>> >> >>>>> >>> >> >>>>>>>>>>> exists. >>> >> >>>>> >>> >> >>>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>> I believe that is the case, and I strongly believe the >>> >> >>>>>>>>>>> problem >>> >> >>>>>>>>>>> you >>> >> >>>>> >>> >> >>>>>>>>>>> outline exists for linkonce_odr exactly as it does for >>> >> >>>>>>>>>>> available_externally. >>> >> >>>>> >>> >> >>>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>>> Which is what makes this scary: every C++ inline >>>>>>>>>>> function >>> >> >>>>>>>>>>> today >>> >> >>>>>>>>>>> can >>> >> >>>>> >>> >> >>>>>>>>>>> trigger this. >>> >> >>>>> >>> >> >>>>>>>>>> Every C/C++ inline or template function. But only the >>>>>>>>>> ones >>> >> >>>>>>>>>> that >>> >> >>>>>>>>>> use >>> >> >>>>> >>> >> >>>>>>>>>> atomics, right? >>> >> >>>>> >>> >> >>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>> Well, with *this* example... >>> >> >>>>> >>> >> >>>>>>>> Atomic are one source of non-determinism that compilers >>>>>>>> can >>> >> >>>>>>>> reason >>> >> >>>>> >>> >> >>>>>>>> about. I don't know if the following snippet is well >>>>>>>> defined >>>>>>>> or >>> >> >>>>>>>> not, >>> >> >>>>> >>> >> >>>>>>>> but you could have similar issues with >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>>> void foo() { >>> >> >>>>> >>> >> >>>>>>>> int *p = malloc(sizeof(int)); >>> >> >>>>> >>> >> >>>>>>>> if (*p < 10) print("X"); >>> >> >>>>> >>> >> >>>>>>>> } >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>>> or (again, I don't know if this is actually well defined) >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>>> void foo() { >>> >> >>>>> >>> >> >>>>>>>> int t; // it is probably reasonable to fold compares with >>> >> >>>>> >>> >> >>>>>>>> ptrtoint(alloca) to undef >>> >> >>>>> >>> >> >>>>>>>> if ((intptr_t)(&t) < 10) print("X"); >>> >> >>>>> >>> >> >>>>>>>> } >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>> The first one at least is UB, but as Richard pointed out >>>>>>> the >>> >> >>>>>>> scope >>> >> >>>>> >>> >> >>>>>>> is certainly broader than atomics (it's not even just >>> >> >>>>>>> well-defined >>> >> >>>>> >>> >> >>>>>>> non-deterministism). >>> >> >>>>> >>> >> >>>>>>> >>> >> >>>>> >>> >> >>>>>>> I'm kind of terrified by the implications. >>> >> >>>>> >>> >> >>>>>> Me too. :( >>> >> >>>>> >>> >> >>>>>>> >>> >> >>>>> >>> >> >>>>>>>> -- Sanjoy >>> >> >>>>> >>> >> >>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>>> Not that I'm sure that will end up being a helpful >>> >> >>>>>>>>>> distinction. >>> >> >>>>> >>> >> >>>>>>>>> >>> >> >>>>> >>> >> >>>>>>>>> Right. See Richard's comment. I think that sums up the >>>>>>>>> real >>> >> >>>>>>>>> issue >>> >> >>>>>>>>> here. =/ >>> >> >>>>> >>> >> >>>>>>> _______________________________________________ >>> >> >>>>> >>> >> >>>>>>> LLVM Developers mailing list >>> >> >>>>> >>> >> >>>>>>> llvm-dev at lists.llvm.org >>> >> >>>>> >>> >> >>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >> >>>>> >>> >> > >>>>>> _______________________________________________ >>> >> >>>>> >>> >> >>>>>> LLVM Developers mailing list >>> >> >>>>> >>> >> >>>>>> llvm-dev at lists.llvm.org >>> >> >>>>> >>> >> >>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >> >>>>> >>> >> > >>>>> _______________________________________________ >>> >> >>>>> LLVM Developers mailing list >>> >> >>>>> llvm-dev at lists.llvm.org >>> >> >>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >> > >>>> -- >>> >> > >>>> -- >>> >> >>>> Hal Finkel >>> >> >>>> Assistant Computational Scientist >>> >> >>>> Leadership Computing Facility >>> >> >>>> Argonne National Laboratory >>> >> > >>> _______________________________________________ >> >>> LLVM Developers mailing list >> >>> llvm-dev at lists.llvm.org >> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > -- > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory
Sanjoy Das via llvm-dev
2016-Feb-25 06:36 UTC
[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
Mehdi Amini via llvm-dev wrote: >> Take Sanjoy's example with the two atomic loads, but outline the body of the function into some private function. The same restrictions need apply. > > Do you mean like: > > static void bar() { > %t0 = load atomic %ptr > %t1 = load atomic %ptr > if (%t0 != %t1) print("X"); > } > void foo() available_externally { > bar(); > } > void main() { > foo(); > print("Y"); > } > ``` > > Sorry, I still fail to understand why you can't optimize bar(), > including deducing attributes on it. Any reference to bar() is > "final": bar() can't be interposed by the linker. > > I think you should be able to infer attributes (and do IPA in > general on any function that matches > GlobalValue::isStrongDefinitionForLinker()). > I suppose you could have one instance of foo() inline bar() _before_ any optimization, and another instance of foo() call a refined readnone bar. Then replacing the latter with the former will be a problem. -- Sanjoy
Sanjoy Das via llvm-dev
2016-Feb-25 06:45 UTC
[llvm-dev] Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
Mehdi pointed out on IRC that my explanation is incorrect. Specifically, in the case I'm trying to sketch out, replacing the latter with the former will *not* be a problem, since the latter won't have any function attributes added by the virtue of being something from a comdat.> I suppose you could have one instance of foo() inline bar() _before_ > any optimization, and another instance of foo() call a refined > readnone bar. Then replacing the latter with the former will be > a problem.-- Sanjoy
Possibly Parallel Threads
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")
- Possible soundness issue with available_externally (split from "RFC: Add guard intrinsics")