Sanjoy Das via llvm-dev
2016-Feb-23 17:32 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
On Mon, Feb 22, 2016 at 11:18 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
>> # step A: Introduce an `interposable` function attribute
>>
>> We can bike shed on the name and the exact specification, but the
>> general idea is that you cannot do IPA / IPO over callsites calling
>> `interposable` functions without inlining them. This attribute will
>> (usually) have to be used on function bodies that can deoptimize (e.g. has
>> a side exit / guard in it); but also has more general use cases.
>
> Note that we already have this *exact* concept in the IR via linkage for
> better or worse. I think it is really confusing as you are currently

I was going to have a more detailed discussion on this in the (yet to
be started) review thread for `interposable`: we'd like to be able to
inline `interposable` functions. The "interposition" can only happen
at physical function boundaries, so opt is allowed to do as much
IPA/IPO as it wants once it makes the physical function boundary go
away via inlining. None of the linkage types seem to have this property.

Part of the challenge here is to specify the attribute in a way that
allows inlining, but not IPA without inlining. In fact, maybe it is
best to not call it "interposable" at all?

Actually, I think one of the problems we're trying to solve with
`interposable` is applicable to the available_externally linkage as
well. Say we have

```
void foo() available_externally {
  %t0 = load atomic %ptr
  %t1 = load atomic %ptr
  if (%t0 != %t1) print("X");
}
void main() {
  foo();
  print("Y");
}
```

Now the possible behaviors of the above program are {print("X"),
print("Y")} or {print("Y")}. But if we run opt then we have

```
void foo() available_externally readnone nounwind {
  ;; After CSE'ing the two loads and folding the condition
}
void main() {
  foo();
  print("Y");
}
```

and some generic reordering

```
void foo() available_externally readnone nounwind {
  ;; After CSE'ing the two loads and folding the condition
}
void main() {
  print("Y");
  foo();  // legal since we're moving a readnone nounwind function that
          // was guaranteed to execute (hence can't have UB)
}
```

Now if we do not inline @foo(), and instead re-link the call site in
@main to some non-optimized copy (or differently optimized copy) of
foo, then it is possible for the program to have the behavior
{print("Y"); print("X")}, which was disallowed in the earlier
program.

In other words, opt refined the semantics of @foo() (i.e. reduced the
set of behaviors it may have) in ways that would make later
optimizations invalid if we de-refine the implementation of @foo().

Given this, I'd say we don't need a new attribute / linkage type, and
can add our restriction to the available_externally linkage.

> describing it because it seems deeply overlapping with linkage, which is
> where the whole interposition thing comes from, and yet you never mention
> how it interacts with linkage at all. What does it mean to have a common
> linkage function that lacks the interposable attribute? Or a LinkOnceODR
> function that does have that attribute?

What would you say about adding this as a new kind of linkage? I was
trying to avoid doing that since the intended semantics of
GlobalValue::InterposableLinkage don't just describe what a linker
does, but also restrict what can be legally linked in (for the
can-inline-but-can't-IPA property to hold), but perhaps that's the
best way forward?

[Edit: I wrote this section before I wrote the available_externally
thing above.]

> If the goal is to factor replaceability out of linkage, we should actually
> factor it out rather than adding yet one more way to talk about this.
>
> And generally, we need to be *really* careful adding function attributes.
> Look at the challenges we had figuring out norecurse. Adding attributes
> needs to be viewed as nearly as high cost as adding instructions,
> substantially higher cost than intrinsics.

Only indirectly relevant to this discussion, but this is news to me --
my mental cost model was "attributes are easy to add and maintain", so
I didn't think too hard about alternatives.

> I think it would be really helpful to work to describe these things in terms
> of semantic contracts on the IR rather than in terms of implementation
> strategies. For example, not all IR interacts with an interpreter, and so I
> don't think we should use the term "interpreter" to specify the semantic
> model exposed by the IR.

That's what I was getting at by:

>> Chandler raised some points on IRC around making `guard_on` (and
>> possibly `side_exit`?) more generally applicable to unmanaged
>> languages; so we'd want to be careful to specify these in a way that
>> allows for implementations in unmanaged environments (by function
>> cloning, for instance).

-- Sanjoy
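
[Illustration] The refinement argument in this message can be checked with a small model. The sketch below is plain Python, not LLVM code: it assumes each function is represented by the set of output traces it may produce, and the names `seq`, `is_refinement`, `FOO_UNOPTIMIZED` and `FOO_REFINED` are hypothetical helpers invented for the illustration.

```python
from itertools import product

# A trace is a tuple of printed strings; a function is modeled (as an
# assumption of this sketch) by the set of traces it may produce.
FOO_UNOPTIMIZED = {("X",), ()}  # the two atomic loads may differ, so "X" may print
FOO_REFINED = {()}              # after CSE and folding: never prints
PRINT_Y = {("Y",)}


def seq(*steps):
    """Return every trace obtained by running the steps in order."""
    return {sum(parts, ()) for parts in product(*steps)}


def is_refinement(new, old):
    """A transformed program is valid if it adds no new behaviors."""
    return new <= old


# main() { foo(); print("Y"); } with the unoptimized foo.
original = seq(FOO_UNOPTIMIZED, PRINT_Y)

# After refining foo to readnone and moving the call past print("Y").
reordered_refined = seq(PRINT_Y, FOO_REFINED)

# Same reordered main, but the call is re-linked to the unoptimized foo.
reordered_relinked = seq(PRINT_Y, FOO_UNOPTIMIZED)

print(is_refinement(reordered_refined, original))   # reordering alone is fine
print(is_refinement(reordered_relinked, original))  # de-refining foo is not
print(reordered_relinked - original)                # the trace that was disallowed
```

In this model the reordering is only justified by the refined body of foo, so swapping the unoptimized body back in afterwards introduces the trace ("Y", "X"), which the original program could never produce.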
Chandler Carruth via llvm-dev
2016-Feb-23 18:55 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
On Tue, Feb 23, 2016 at 9:33 AM Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> On Mon, Feb 22, 2016 at 11:18 PM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
> >> # step A: Introduce an `interposable` function attribute
> >>
> >> We can bike shed on the name and the exact specification, but the
> >> general idea is that you cannot do IPA / IPO over callsites calling
> >> `interposable` functions without inlining them. This attribute will
> >> (usually) have to be used on function bodies that can deoptimize (e.g. has
> >> a side exit / guard in it); but also has more general use cases.
> >
> > Note that we already have this *exact* concept in the IR via linkage for
> > better or worse. I think it is really confusing as you are currently
>
> I was going to have a more detailed discussion on this in the (yet to
> be started) review thread for `interposable`: we'd like to be able to
> inline `interposable` functions. The "interposition" can only happen
> at physical function boundaries, so opt is allowed to do as much
> IPA/IPO as it wants once it makes the physical function boundary go
> away via inlining. None of the linkage types seem to have this property.
>
> Part of the challenge here is to specify the attribute in a way that
> allows inlining, but not IPA without inlining. In fact, maybe it is
> best to not call it "interposable" at all?

Yea, this is something *very* different from interposable. GCC and other
compilers that work to support symbol interposition make specific efforts to
not inline them in specific ways (that frankly I don't fully understand, as
it doesn't seem to be always which is what the definition of interposable
indicates to me...).

> Actually, I think one of the problems we're trying to solve with
> `interposable` is applicable to the available_externally linkage as
> well. Say we have
>
> ```
> void foo() available_externally {
>   %t0 = load atomic %ptr
>   %t1 = load atomic %ptr
>   if (%t0 != %t1) print("X");
> }
> void main() {
>   foo();
>   print("Y");
> }
> ```
>
> Now the possible behaviors of the above program are {print("X"),
> print("Y")} or {print("Y")}. But if we run opt then we have
>
> ```
> void foo() available_externally readnone nounwind {
>   ;; After CSE'ing the two loads and folding the condition
> }
> void main() {
>   foo();
>   print("Y");
> }
> ```
>
> and some generic reordering
>
> ```
> void foo() available_externally readnone nounwind {
>   ;; After CSE'ing the two loads and folding the condition
> }
> void main() {
>   print("Y");
>   foo();  // legal since we're moving a readnone nounwind function that
>           // was guaranteed to execute (hence can't have UB)
> }
> ```
>
> Now if we do not inline @foo(), and instead re-link the call site in
> @main to some non-optimized copy (or differently optimized copy) of
> foo, then it is possible for the program to have the behavior
> {print("Y"); print("X")}, which was disallowed in the earlier
> program.
>
> In other words, opt refined the semantics of @foo() (i.e. reduced the
> set of behaviors it may have) in ways that would make later
> optimizations invalid if we de-refine the implementation of @foo().
>
> Given this, I'd say we don't need a new attribute / linkage type, and
> can add our restriction to the available_externally linkage.

Interesting example, I agree it seems quite broken. Even more interesting, I
can't see anything we do in LLVM that prevents this from breaking
essentially everywhere. =[[[[[[

link_once and link_once_odr at least seem equally broken because we don't
put the caller and callee into a single comdat or anything to ensure that
the optimized one is selected at link time.

But there are also multiple different kinds of overriding we should think
about:

1) Can the definition get replaced at link time (or at runtime via an
interpreter) with a differently *optimized* variant stemming from the same
definition (thus it has the same behavior but not the same refinement). This
is the "ODR" guarantee in some linkages (and vaguely implied for
available_externally)

2) Can the definition get replaced at link time (or at runtime via an
interpreter) with a function that has fundamentally different behavior

3) To support replacing the definition, the call edge must be preserved.

To support interposition you need #3, the most restrictive model. LLVM (I
think) actually does a decent job of modeling this as we say that the
function is totally opaque. We don't do IPA or inlining. But I don't think
that's what you're looking for.

I'm curious whether your use case is actually in the #1 bucket or #2
bucket. That is, I'm wondering if there is any way in which the
"different implementation" would actually break in the face of
optimizations on things like *non-deduced* function attributes, etc.

If your use case looks more like #1, then I actually think this is what we
want for link_once_odr and available_externally. You probably want the
former rather than the latter as you don't want it to be discardable.

If your use case looks more like #2, then I think it's essentially
"link_once" or "link_any", and it isn't clear that LLVM does a great job of
modeling this today.

I'd be mildly interested in factoring the discarding semantics from the
"what do other definitions look like" semantics. The former are what I think
fit cleanly into linkages, and the latter I think we wedged into them
because they seemed to correspond in some cases and because attributes used
to be very limited in number.
Sanjoy Das via llvm-dev
2016-Feb-23 23:06 UTC
[llvm-dev] RFC: Add guard intrinsics to LLVM
On Tue, Feb 23, 2016 at 10:55 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
>> Part of the challenge here is to specify the attribute in a way that
>> allows inlining, but not IPA without inlining. In fact, maybe it is
>> best to not call it "interposable" at all?
>
> Yea, this is something *very* different from interposable. GCC and other
> compilers that work to support symbol interposition make specific efforts to
> not inline them in specific ways (that frankly I don't fully understand, as
> it doesn't seem to be always which is what the definition of interposable
> indicates to me...).

Sure, not calling it interposable is fine for me. Credit where credit
is due: Philip had warned me about this exact thing offline (that the
term "interposable" is already taken).

>> In other words, opt refined the semantics of @foo() (i.e. reduced the
>> set of behaviors it may have) in ways that would make later
>> optimizations invalid if we de-refine the implementation of @foo().
>>
>> Given this, I'd say we don't need a new attribute / linkage type, and
>> can add our restriction to the available_externally linkage.
>
> Interesting example, I agree it seems quite broken. Even more interesting, I
> can't see anything we do in LLVM that prevents this from breaking
> essentially everywhere. =[[[[[[
>
> link_once and link_once_odr at least seem equally broken because we don't
> put the caller and callee into a single comdat or anything to ensure that
> the optimized one is selected at link time.
>
> But there are also multiple different kinds of overriding we should think
> about:
>
> 1) Can the definition get replaced at link time (or at runtime via an
> interpreter) with a differently *optimized* variant stemming from the same
> definition (thus it has the same behavior but not the same refinement). This
> is the "ODR" guarantee in some linkages (and vaguely implied for
> available_externally)
>
> 2) Can the definition get replaced at link time (or at runtime via an
> interpreter) with a function that has fundamentally different behavior
>
> 3) To support replacing the definition, the call edge must be preserved.

I'm working under the context of an optimizer that does not know
whether its input has been previously optimized or is "raw" IR.
Realistically, I'd say deviating LLVM from this will be painful.

Given that I don't see how (2) and (3) are different:

Firstly, (1) and (2) are not _that_ different -- a differently
optimized variant of a function can have completely different
observable behavior (e.g. the "original" function could have started
with "if (*ptr != *ptr) { call @unknown(); return; }").

The only practical difference I can see between (1) and (2) is that in
(2) inlining is incorrect since it would be retroactively invalid on
replacement. In (1) we have the invariant that the function in
question is always *a* valid implementation of what we started with,
but this cannot be used to infer anything about the function we'll
actually call at runtime.

Thus, I don't understand the difference between (2) and (3); both of
them seem to imply "don't do IPA/IPO, including inlining" while (1)
implies "the only IPA/IPO you can do is inlining".

> I'm curious whether your use case is actually in the #1 bucket or #2
> bucket. That is, I'm wondering if there is any way in which the
> "different implementation" would actually break in the face of
> optimizations on things like *non-deduced* function attributes, etc.

With the understanding I have at this time (that isn't complete, as I
say above) I'd say we're (1). We can replace a possibly inlined
callee with another arbitrary function, but if that happens the
runtime will deoptimize the caller.

I'm not sure if I understood your second statement -- but assuming I
did -- we do "manually" attach attributes to some well-known functions
(e.g. in the standard library), but they never get replaced.

-- Sanjoy
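
[Illustration] The reading of the three replacement models given in this message can be written down as a small lookup. This is only a sketch of that reading in Python, not an LLVM API: the `Replacement` enum, the transform names `"inline"` and `"infer_attributes"`, and the helper functions are all hypothetical names chosen for the illustration.

```python
from enum import Enum


class Replacement(Enum):
    """The three kinds of overriding listed in the quoted message."""
    SAME_DEFINITION = 1      # (1) differently optimized copy of the same definition
    DIFFERENT_BEHAVIOR = 2   # (2) a function with fundamentally different behavior
    CALL_EDGE_PRESERVED = 3  # (3) the call edge itself must survive


def allowed_ipo(model):
    """Interprocedural transforms permitted under the reading in this message."""
    if model is Replacement.SAME_DEFINITION:
        # The visible body is always *a* valid implementation, so it may be
        # inlined, but nothing may be inferred about the body called at runtime.
        return frozenset({"inline"})
    # Under this reading, (2) and (3) both forbid all IPA/IPO, inlining included.
    return frozenset()


def can_apply(transform, model):
    """True if the named transform is permitted across such a call site."""
    return transform in allowed_ipo(model)


for model in Replacement:
    print(model.name, sorted(allowed_ipo(model)))
```

Under this encoding (2) and (3) are indistinguishable, which is the point being questioned above; a model in which they differ would need some transform that one permits and the other does not.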