On 07/17/2015 02:03 AM, Chandler Carruth wrote:

> On Thu, Jul 16, 2015 at 11:08 AM Hal Finkel <hfinkel at anl.gov> wrote:

> > ----- Original Message -----
> > From: "Chandler Carruth" <chandlerc at google.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
> > Sent: Thursday, July 16, 2015 2:33:21 AM
> > Subject: Re: [LLVMdev] [RFC] Defining Infinite Loops

> > > On Thu, Jul 16, 2015 at 12:27 AM Hal Finkel <hfinkel at anl.gov> wrote:

> > > > ----- Original Message -----
> > > > From: "Chandler Carruth" <chandlerc at google.com>
> > > > To: "Hal Finkel" <hfinkel at anl.gov>, "LLVM Dev" <llvmdev at cs.uiuc.edu>
> > > > Sent: Thursday, July 16, 2015 1:00:05 AM
> > > > Subject: Re: [LLVMdev] [RFC] Defining Infinite Loops

> > > > > FWIW, I'm very much in favor of having a firm and clear answer to these questions.

> > > > > I also agree that it is an absolute requirement that LLVM have *some* mechanism for supporting both languages with defined behavior for infinite loops and a language requirement that all loops terminate.

> > > > > However, I'd like to float an alternative approach. I've not spent a lot of time thinking about it, so I'm not sure it's actually better. I'm wondering if you've already thought about it.

> > > > > What if we have an @llvm.noop.sideeffect() or some such which doesn't read or write memory in any way, but which a frontend can place inside a loop body to mark that its execution (perhaps infinitely) is in-and-of-itself a side effect of the program. We could then teach loop unrolling or the few other things that would care to collapse multiple ones into a single one, and not count them towards anything.

> > > > > I know an intrinsic is kind of awkward for this, but it seems like the least bad way we have to annotate a loop in a fully generic way. I'd somewhat like this to be a property of the *loop* and not of the function. And it really needs to be truly generic, handling unnatural CFGs and even recursion-loops. My only idea for how to accomplish that is an intrinsic to mark the dynamic path which if executed infinitely can't be eliminated.

> > > > My largest concern is that the frontend would need to add these things all over the place, not just before the loop backedges. For one thing, if the language has gotos, where should they be inserted?

> > > The target of every goto.

> > > For computed goto, every label whose address is taken.

> > > This at least doesn't seem that bad to me.

> > > > Before every branch will be conservatively correct, but I'm worried that will unnecessarily block optimizations. They'd also be needed at the entry to every function.

> > > Only external, address taken, or internal-and-recursively-called functions. All of which we already have some blockers to optimization, so this part I'm not worried about.

> > > > On the other hand, maybe if we added an optimization that removed these things along any control-flow path that already had any other side effect, it might not be too bad?
> > > Absolutely, much like lower-expect, we'd need to make sure that easy cases were folded quickly in the optimizer so this didn't get out of hand.

> > > > > As for why I'm particularly interested in this being a property of the loop, consider if you were to have a mixture of Java and C++ code, all compiled to LLVM. How do you inline between them?

> > > > You add the attribute to the caller.

> > > This has the really unfortunate side effect of pessimizing code during cross language optimizations.

> > > > FWIW, I suspect I might care a lot about this particular case (because I believe that Fortran has defined behavior for infinite loops).

> > > Yea, you could argue that C does too, which is one reason why I'm so interested in this being done really well even in an LTO situation.

> > > I think it would be really useful to not have this cross between adjacent loops after inlining when they come from different source languages, and it would be nice for it to not apply to nested loops when those nested loops were inlined from a language without this guarantee.

> > > But I'm still not convinced that the noise of the intrinsic is *definitely* worth it. I come from the background of the C++ rules' rationale, and so I naturally see the languages that define this as giving up optimizations and so wanting to localize the impact of that... Not sure that's actually the right perspective though. ;]

> > I'm leaning toward agreeing with you, primarily because I think it will more-naturally fit into the optimizer than the attribute. We need to check loops for side effects anyway (if we otherwise default to C++-like rules), and so this intrinsic will do the right thing without any special logic.

> FWIW, this is why I first started thinking along these lines. I'm increasingly thinking that if this approach works it will make the implementation of testing for this more natural in the optimizer. Simple things like instruction predicates will "just work", etc.

> I'm really interested in the downsides. You mentioned a few potential ones, but seem to be happy with my responses. I wonder, are there others?

I'm really not a fan of this approach. I think it could be made to work, but I suspect we'd run into a lot of quality of implementation issues if we went down this route.

We'd have to teach many places in the optimizer to merge or split such calls. Consider simple things like tail commoning, if-then-else hoisting, or the like. In each of these, we'd need code to recognize having the intrinsic on one path, but not the other, and then still perform the optimization. Think the various debug intrinsics, but worse.

I worry about the interactions with memory aliasing and hoisting rules. We don't currently have the notion of a non-memory side effect in LLVM. To prevent this intrinsic from being DCE'd we'd likely need to mark it as writing to some known location. Given the number of places in the optimizer which give up when encountering any write (EarlyCSE for one), that would be problematic for optimization effectiveness. The other approach would be to teach LLVM about non-memory side effects. I think this is solvable, but a large investment.

In practice, most of the contributors to LLVM care about C++. I worry we'd end up in a situation where languages which need infinite loops would become second class citizens and that a large number of optimizations wouldn't apply to them in practice. This is by far my biggest worry.

Now, I'm certainly biased, but I'd much rather see a scheme where a quality of implementation issue affects the performance of C++. These are far more likely to be actually fixed. :)

Earlier in this thread, the idea of using metadata on loops was mentioned. Chandler's point about generic handling for recursive loops is a good one, but in general, a metadata annotation of finiteness seems like a better starting point.

What if we introduced a piece of branch (really, control transfer instruction) metadata (not loop) called "productive" (to steal Sanjoy's term) whose semantics are such that it can be assumed to only execute a finite number of times between productive actions (volatile, synchronization, io, etc.)? We then tag *all* branches emitted by Clang with this metadata. This gives us the benefit of the loop metadata in that a single tagged backedge branch implies a productive loop, but allows productive recursive functions to be converted into productive loops in a natural way.

The function attribute "productive" now has an easy inference rule in that if all branches in the function are productive and all callees are productive, so is the function. This would seem to allow us to perform DSE, LICM, and related optimizations without trouble.

Inlining now has a reasonable definition where you can inline between languages w/the semantics of each language preserved.

One cool thing here is that the set of operations which are "productive" could actually be encoded in the metadata. This could potentially support other languages than C++ w.r.t. the "no infinite loops except when" type rules.

Thoughts?

Philip

p.s. The current implementation of readonly is correct w.r.t. C++ rules only because we designate all atomic, volatile, or fencing operations as read/write. One thing we might end up with out of this discussion is the possibility of talking about functions which are readonly, but involved in synchronization. That seems like it might be useful for optimizing concurrent C++ programs in interesting ways. It would require a separate notion of a synchronization side effect independent of read/write though. This seems related, but slightly orthogonal to the assumptions about finiteness of loops.
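[To make the metadata proposal concrete, here is a rough IR sketch. The !productive kind and the contents of !0 are hypothetical, nothing in LLVM defines them; only the instruction-metadata attachment syntax is standard, and the function is an arbitrary example.]

  ; A loop whose backedge the frontend (e.g. Clang under C++ rules) would tag.
  ; A tagged backedge may be assumed to execute only finitely many times
  ; between productive actions, so if the body contains no such actions the
  ; loop can be treated as finite (and deleted if it is otherwise dead).
  define void @count_to(i32 %n) {
  entry:
    br label %loop

  loop:
    %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
    %i.next = add i32 %i, 1
    %cmp = icmp slt i32 %i.next, %n
    br i1 %cmp, label %loop, label %exit, !productive !0

  exit:
    ret void
  }

  ; Hypothetical: the set of actions considered "productive", encoded in the
  ; metadata node as suggested above.
  !0 = !{!"volatile", !"atomic", !"io"}

A frontend for a language with well-defined infinite loops would simply leave its backedges untagged, which is what makes cross-language inlining fall out naturally in this scheme.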
> "productive" (to steal Sanjoy's term)Just to make it clear: this isn't something I came up with. It is used (perhaps not extensively) in describing co-inductive data structures. Insofar I understand the concept, it roughly means that to get to a finite number of "things" (external actions, in our case) you only have to execute a finite number of steps.> What if we introduced a piece of branch (really, control transfer > instruction) metadata (not loop) called "productive" (to steal Sanjoy's > term) whose semantics are such that it can be assumed to only execute a > finite number of times between productive actions (volatile, > synchronization, io, etc..). We then tag *all* branches emitted by Clang > with this metadata. This gives us the benefit of the loop metadata in that a > single tagged backedge branch implies a productive loop, but allows > productive recursive functions to be converted into productive loops in a > natural way.I think this is a good idea. However, I'll suggest one modification: our specification of a "productive" control transfer should be that only a finite number of *productive control transfers* can run between productive actions (volatile write, IO etc.). This is subtly different from stating that any one productive control transfer executes only a finite number of times between productive actions. While discussing offline we decided that these were equivalent (and theoretically they are), but this second definition lets simple local CFG manipulation be more aggressive. For instance, say you wish to split an edge or a basic block. With your definition, marking the newly inserted branches as productive would involve proving that the new branch is not part of an existing (well defined) infinite loop, since inserting a productive branch in an otherwise well-defined infinite loop will introduce UB. However, if we go with this second definition, we can (aggressively) mark the newly inserted branches as productive, as long as they themselves do not introduce a cycle. Another way of phrasing my suggestion is that any infinite "loop" that does not do (IO / volatile memory access / atomic operations) has to have at least one non-productive control transfer. A "loop" in this definition has to be general enough to encompass all possible control flow transfer mechanisms (so mutual recursion will have to count as "loop"ing, for whatever definition we come up for a loop). The only thing that bugs me about this notion of "productivity" (both your idea and mine) is that, going by the Java spec, it isn't that a thread stalled in an infinite loop is not performing any actions; but the stalling is itself an "external" action (similar to writing to a socket or program termination). It would be nice if it were possible to precisely capture that. -- Sanjoy
----- Original Message -----
> From: "Philip Reames" <listmail at philipreames.com>
> To: "Chandler Carruth" <chandlerc at google.com>, "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
> Sent: Friday, July 17, 2015 7:06:33 PM
> Subject: Re: [LLVMdev] [RFC] Defining Infinite Loops
> > I'm really interested in the downsides. You mentioned a few potential ones, but seem to be happy with my responses. I wonder, are there others?

> I'm really not a fan of this approach. I think it could be made to work, but I suspect we'd run into a lot of quality of implementation issues if we went down this route.

> We'd have to teach many places in the optimizer to merge or split such calls. Consider simple things like tail commoning, if-then-else hoisting, or the like. In each of these, we'd need code to recognize having the intrinsic on one path, but not the other, and then still perform the optimization. Think the various debug intrinsics, but worse.

> I worry about the interactions with memory aliasing and hoisting rules. We don't currently have the notion of a non-memory side effect in LLVM. To prevent this intrinsic from being DCE'd we'd likely need to mark it as writing to some known location.

We'd do the same thing we did with @llvm.assume, for which we have all of the same problems (including those mentioned above, only worse). For @llvm.assume we tag it as writing to an unknown location, but taught BasicAA that it does not alias with any particular location.

> Given the number of places in the optimizer which give up when encountering any write (EarlyCSE for one), that would be problematic for optimization effectiveness. The other approach would be to teach LLVM about non-memory side effects. I think this is solvable, but a large investment.

EarlyCSE has specific code to deal with @llvm.assume as well (four lines of code, but code nevertheless).

> In practice, most of the contributors to LLVM care about C++. I worry we'd end up in a situation where languages which need infinite loops would become second class citizens and that a large number of optimizations wouldn't apply to them in practice. This is by far my biggest worry.

> Now, I'm certainly biased, but I'd much rather see a scheme where a quality of implementation issue affects the performance of C++. These are far more likely to be actually fixed. :)

> Earlier in this thread, the idea of using metadata on loops was mentioned. Chandler's point about generic handling for recursive loops is a good one, but in general, a metadata annotation of finiteness seems like a better starting point.

> What if we introduced a piece of branch (really, control transfer instruction) metadata (not loop) called "productive" (to steal Sanjoy's term) whose semantics are such that it can be assumed to only execute a finite number of times between productive actions (volatile, synchronization, io, etc.)? We then tag *all* branches emitted by Clang with this metadata. This gives us the benefit of the loop metadata in that a single tagged backedge branch implies a productive loop, but allows productive recursive functions to be converted into productive loops in a natural way.

I don't see how this helps us with the recursive functions case. If I have this:

void foo() {
  bar();
  foo();
}

where do I put the metadata? Maybe we could also tag the 'ret' as well?

> The function attribute "productive" now has an easy inference rule in that if all branches in the function are productive and all callees are productive, so is the function. This would seem to allow us to perform DSE, LICM, and related optimizations without trouble.

> Inlining now has a reasonable definition where you can inline between languages w/the semantics of each language preserved.

> One cool thing here is that the set of operations which are "productive" could actually be encoded in the metadata. This could potentially support other languages than C++ w.r.t. the "no infinite loops except when" type rules.

> Thoughts?

I was at first afraid of the number of places that create branch instructions, and updating them all to preserve the metadata when appropriate. However:

$ grep -r 'BranchInst::Create\|CreateBr' lib/Transforms | wc -l
99
$ grep -r 'BranchInst::Create\|CreateBr' lib/Transforms -l | wc -l
33

And so there are only 99 places in 33 files that create branches in lib/Transforms (and I find the numerical coincidence amusing). So this seems potentially doable.

Also, if I'm right and we'd need them on 'ret' as well:

$ grep -r 'ReturnInst::Create\|CreateRet' lib/Transforms | wc -l
33
$ grep -r 'ReturnInst::Create\|CreateRet' lib/Transforms -l | wc -l
11

And now I'm even more amused.

Chandler, what do you think?

-Hal

> Philip

> p.s. The current implementation of readonly is correct w.r.t. C++ rules only because we designate all atomic, volatile, or fencing operations as read/write. One thing we might end up with out of this discussion is the possibility of talking about functions which are readonly, but involved in synchronization. That seems like it might be useful for optimizing concurrent C++ programs in interesting ways. It would require a separate notion of a synchronization side effect independent of read/write though. This seems related, but slightly orthogonal to the assumptions about finiteness of loops.

I agree; we do need some notion of side effects other than based on memory dependencies. This, however, is a (hopefully) separate project.

Thanks again,
Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
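[A sketch of what the "tag the ret as well" option might look like on the example above; !productive remains hypothetical, and metadata attachments on ret use only the ordinary instruction-metadata syntax.]

  ; Models: void foo() { bar(); foo(); }
  define void @foo() {
  entry:
    call void @bar()
    call void @foo()
    ; No branch exists to carry the tag, so the recursion-as-loop would be
    ; expressed by tagging the return (and, via the inference rule quoted
    ; above, by @bar and @foo themselves being "productive" callees).
    ret void, !productive !0
  }

  declare void @bar()

  !0 = !{}

Tail-recursion elimination would then need to turn the tagged ret into a tagged backedge branch for the information to survive, which is the kind of preservation burden discussed later in the thread.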
----- Original Message -----
> From: "Sanjoy Das" <sanjoy at playingwithpointers.com>
> To: "Philip Reames" <listmail at philipreames.com>
> Cc: "Chandler Carruth" <chandlerc at google.com>, "Hal Finkel" <hfinkel at anl.gov>, "LLVM Dev" <llvmdev at cs.uiuc.edu>
> Sent: Saturday, July 18, 2015 2:52:56 AM
> Subject: Re: [LLVMdev] [RFC] Defining Infinite Loops

> > "productive" (to steal Sanjoy's term)

> Just to make it clear: this isn't something I came up with. It is used (perhaps not extensively) in describing co-inductive data structures. Insofar as I understand the concept, it roughly means that to get to a finite number of "things" (external actions, in our case) you only have to execute a finite number of steps.

> > What if we introduced a piece of branch (really, control transfer instruction) metadata (not loop) called "productive" (to steal Sanjoy's term) whose semantics are such that it can be assumed to only execute a finite number of times between productive actions (volatile, synchronization, io, etc.)? We then tag *all* branches emitted by Clang with this metadata. This gives us the benefit of the loop metadata in that a single tagged backedge branch implies a productive loop, but allows productive recursive functions to be converted into productive loops in a natural way.

> I think this is a good idea. However, I'll suggest one modification: our specification of a "productive" control transfer should be that only a finite number of *productive control transfers* can run between productive actions (volatile write, IO etc.). This is subtly different from stating that any one productive control transfer executes only a finite number of times between productive actions.

> While discussing offline we decided that these were equivalent (and theoretically they are), but this second definition lets simple local CFG manipulation be more aggressive. For instance, say you wish to split an edge or a basic block. With your definition, marking the newly inserted branches as productive would involve proving that the new branch is not part of an existing (well defined) infinite loop, since inserting a productive branch in an otherwise well-defined infinite loop will introduce UB. However, if we go with this second definition, we can (aggressively) mark the newly inserted branches as productive, as long as they themselves do not introduce a cycle.

This does indeed sound like a good idea if we use metadata like this.

> Another way of phrasing my suggestion is that any infinite "loop" that does not do (IO / volatile memory access / atomic operations) has to have at least one non-productive control transfer. A "loop" in this definition has to be general enough to encompass all possible control flow transfer mechanisms (so mutual recursion will have to count as "loop"ing, for whatever definition we come up with for a loop).

> The only thing that bugs me about this notion of "productivity" (both your idea and mine) is that, going by the Java spec, it isn't that a thread stalled in an infinite loop is not performing any actions; but the stalling is itself an "external" action (similar to writing to a socket or program termination). It would be nice if it were possible to precisely capture that.

I think the problem is that this depends on the definition of your abstract machine. In Java, if you have a Thread you can query its state: it is observable whether it is still running or not. Many languages don't have abstract machines that contain such a notion, and so a thread not performing productive actions is indistinguishable from a thread not running at all. I think that, in terms of defining LLVM, we probably need to stick closer to this latter vantage point.

-Hal

> -- Sanjoy

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
The quoting from different mail clients has made this almost impossible for me to reply in-line. Sorry for that.

The two ideas are "sideeffect" marking intrinsics and metadata on control flow instructions to indicate "productive".

I have to say, I'm not a huge fan of the productive-metadata approach. There are a bunch of reasons why, although I'm not sure that they are strictly speaking "good" reasons.

The first, and primary, problem I have is that it is a very different and new concept for the IR, and it doesn't fit well with at least the way I think about either the IR or how such languages would be modeled. For example, in the IR, I would expect to check that the *instructions* being executed have some effect, rather than having to check an annotation on control flow to realize that erasing that control flow is allowed.

I also worry a great deal about propagating, updating, preserving, and finding metadata. We have had a miserable time getting profile metadata to survive the optimizer, and I think we would have the same struggles here. I also feel like finding all of the relevant edges in the CFG that need to be marked as productive before it is correct to assume productive would be challenging as well. Whereas the reverse problem of determining that an infinite loop must remain is relatively easy: we *already* have to scan the instructions a loop executes to draw conclusions such as this, and so it would be very natural to discover an intrinsic that acts as a sigil.

I'm sure I'm somewhat biased here, and it's hard for me to discount that, but I feel like the more basic model of as-if is one that doesn't view infinite loops as having observable side-effects, and that it is easier to graft a model for representing those side-effects for languages which expect them than it is to annotate the cases where the stronger guarantee holds.

One significant advantage of using the intrinsic is precisely the similarity with llvm.assume that Hal mentioned. We can and should start to unify the handling of these so that we don't actually pay a complexity cost proportional to the number of these that exist.
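[For comparison, the intrinsic form argued for above would look roughly like this. The @llvm.noop.sideeffect() name comes from the earlier proposal in the thread; no such intrinsic exists in LLVM today, so this is illustrative only.]

  ; Proposed: the intrinsic neither reads nor writes memory, but may not be
  ; removed, so its presence in the body keeps an otherwise empty infinite
  ; loop from being deleted under C++-style forward-progress rules.
  declare void @llvm.noop.sideeffect()

  ; Emitted by a frontend for a language in which this loop is well defined.
  define void @forever() {
  entry:
    br label %loop

  loop:
    call void @llvm.noop.sideeffect()
    br label %loop
  }

Loop unrolling and similar passes would be taught to collapse adjacent copies, and, as suggested earlier in the thread, an early cleanup pass could strip the calls on any path that already has another side effect.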