On Thu, Jul 17, 2014 at 5:31 PM, Philip Reames <listmail at philipreames.com> wrote:
> 3. An "llvm.invariant" has zero code generation cost. Given that, a lot
> of pattern matching and profitability heuristics will need to be adjusted
> to ignore them.

FWIW, this has been the fundamental point of contention in the entire design. I've discussed this several times with Andy, Hal, and others. I'll try to summarize a few points here, although getting all of the points is beyond me. Also, this is *my* takeaway; I'm probably not doing justice to other positions here, but hopefully others will chime in.

IMO, this cannot be avoided. This is *the* tradeoff of encoding assumptions: you risk the complexity of the assumption you are expressing imposing more optimization costs than you gain in simplifications from the extra information. I think we should just make this very clear in the documentation, and then not worry about it too much. The idea is to only use this to express really important, impossible-to-deduce invariants of the program or high-level abstraction in the IR. Unless it has really significant effects, it isn't worth it; don't do it. =/ It may not be the most theoretically satisfying result, but I think it is practical.

The other ideas have been to try to use some metadata encoding scheme to partition the abstraction from the actual IR. However, all of these IMO degenerate to specifying a "shadow IR" -- an entirely different IR that we still have to canonicalize, optimize, and pattern match, but which is somehow different and separated from the actual IR. I think the complexity of this is tremendously higher and would be a real problem long-term. I also don't think that the gain is worth the tradeoff.

> I worry about the maintenance cost of this in the optimizer.

As long as we don't go "crazy" trying to recover the performance, it should be OK. Hal has already put together the patches needed for his current approach. We'll likely have to tweak this approach a little, but we don't need to go through and change every single hasOneUse() check to do something special with these invariants (or assumptions).

> I can see a couple of possibilities here:
> - Canonicalize the placement of "llvm.invariant"s at the end of each
> basic block. This at least reduces the patterns to be matched.

I think that the use of use-def SSA graph information makes the placement of these unimportant. On the flip side, sinking the invariant below a call which might unwind would reduce the scope over which the invariant applies.

> - Remove the invariant instructions entirely from the in-flight IR. Use
> the "llvm.invariant" instruction only for serialization, but have the
> conditions stored directly on the basic blocks when in memory. This
> would reduce the pattern matching problem, but at the cost of extra
> complexity.

This is very similar to the metadata approach -- it boils down to needing to have and hold a reasonably complete "IR" which isn't the actual IR.

> 4. How will the presence of "llvm.invariants" affect things like code
> placement? I could see having lots of extra users (which are dropped
> late in the optimizer) causing somewhat surprising results. (i.e. we
> perform CSE, instruction gets placed differently, performance differs)

Somewhat surprising, but I suspect it won't be terribly different. These won't make it past codegen prep (and may be removed even before LSR or such), and so shouldn't really change the scheduling within a basic block.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140717/83116d38/attachment.html>
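[To make the discussion concrete, here is a sketch of the kind of IR the proposal implies. This is an illustration only, not the final design: the intrinsic's exact name and signature were still under discussion in this thread (the design eventually landed in tree as llvm.assume), and the function and value names below are made up.]

```llvm
; Hypothetical form of the invariant intrinsic under discussion.
declare void @llvm.invariant(i1)

define i32 @abs_known_positive(i32 %x) {
entry:
  ; Assert to the optimizer that %x > 0 here. The call itself generates
  ; no code, but the icmp feeding it is an extra use of %x.
  %pos = icmp sgt i32 %x, 0
  call void @llvm.invariant(i1 %pos)
  ; With the invariant visible, a pass could fold this select to %x.
  %neg = sub i32 0, %x
  %isneg = icmp slt i32 %x, 0
  %res = select i1 %isneg, i32 %neg, i32 %x
  ret i32 %res
}
```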
On 07/17/2014 02:44 PM, Chandler Carruth wrote:
> On Thu, Jul 17, 2014 at 5:31 PM, Philip Reames
> <listmail at philipreames.com> wrote:
> > 3. An "llvm.invariant" has zero code generation cost. Given that,
> > a lot of pattern matching and profitability heuristics will need
> > to be adjusted to ignore them.
>
> FWIW, this has been the fundamental point of contention in the entire
> design. I've discussed this several times with Andy, Hal, and others.
> I'll try to summarize a few points here, although getting all of the
> points is beyond me. Also, this is *my* takeaway; I'm probably not
> doing justice to other positions here, but hopefully others will chime in.
>
> IMO, this cannot be avoided. This is *the* tradeoff of encoding
> assumptions: you risk the complexity of the assumption you are
> expressing imposing more optimization costs than you gain in
> simplifications from the extra information. I think we should just
> make this very clear in the documentation, and then not worry about it
> too much. The idea is to only use this to express really important,
> impossible-to-deduce invariants of the program or high-level
> abstraction in the IR. Unless it has really significant effects, it
> isn't worth it; don't do it. =/ It may not be the most theoretically
> satisfying result, but I think it is practical.
>
> The other ideas have been to try to use some metadata encoding scheme
> to partition the abstraction from the actual IR. However, all of these
> IMO degenerate to specifying a "shadow IR" -- an entirely different IR
> that we still have to canonicalize, optimize, and pattern match, but
> which is somehow different and separated from the actual IR. I think
> the complexity of this is tremendously higher and would be a real
> problem long-term. I also don't think that the gain is worth the tradeoff.

This is a convincing argument; I find myself agreeing.

> > I worry about the maintenance cost of this in the optimizer.
>
> As long as we don't go "crazy" trying to recover the performance, it
> should be OK. Hal has already put together the patches needed for his
> current approach. We'll likely have to tweak this approach a little,
> but we don't need to go through and change every single hasOneUse()
> check to do something special with these invariants (or assumptions).
>
> > I can see a couple of possibilities here:
> > - Canonicalize the placement of "llvm.invariant"s at the end of
> > each basic block. This at least reduces the patterns to be matched.
>
> I think that the use of use-def SSA graph information makes the
> placement of these unimportant.

I suspect that we also match on "empty" or "near empty" basic blocks in numerous places. We'll need to handle that case.

> On the flip side, sinking the invariant below a call which might
> unwind would reduce the scope over which the invariant applies.

You're right; I'd failed to consider this case. Suggestion withdrawn. :)

> > - Remove the invariant instructions entirely from the in-flight
> > IR. Use the "llvm.invariant" instruction only for serialization,
> > but have the conditions stored directly on the basic blocks when
> > in memory. This would reduce the pattern matching problem, but at
> > the cost of extra complexity.
>
> This is very similar to the metadata approach -- it boils down to
> needing to have and hold a reasonably complete "IR" which isn't the
> actual IR.

I see your point and agree. I think I'm convinced this is a cost we're just going to have to pay. I'm not thrilled about that, but I agree it seems to beat the alternatives.

> > 4. How will the presence of "llvm.invariants" affect things like
> > code placement? I could see having lots of extra users (which are
> > dropped late in the optimizer) causing somewhat surprising
> > results. (i.e. we perform CSE, instruction gets placed
> > differently, performance differs)
>
> Somewhat surprising, but I suspect it won't be terribly different.
> These won't make it past codegen prep (and may be removed even before
> LSR or such), and so shouldn't really change the scheduling within a
> basic block.

I suspect we need data on this one. This is not something which should hold up the submission in any way, but it would be interesting to study long term.

Philip
On Jul 17, 2014, at 2:44 PM, Chandler Carruth <chandlerc at google.com> wrote:
> > I worry about the maintenance cost of this in the optimizer.
>
> As long as we don't go "crazy" trying to recover the performance, it
> should be OK. Hal has already put together the patches needed for his
> current approach. We'll likely have to tweak this approach a little,
> but we don't need to go through and change every single hasOneUse()
> check to do something special with these invariants (or assumptions).

My main concern was the impact on use-list iteration and hasOneUse() patterns, and the potential burden on the rest of the optimizer, because we would often need to check for ephemeral values. Is there any way to check that a use is ephemeral? The potential for silently defeating optimizations when you add the intrinsics is also worrisome. It could lead to surprising results when someone adds a new invariant that is supposed to improve performance.

That said, I put a lot of weight in having an implementation. If you can run experiments where you add a significant number of invariants and don’t see performance regressions, then I’d be convinced. If we can make it work in the current code base, then maintaining it should be doable.

It could also be a problem that we introduce a call that potentially writes memory, but that is less subtle than the use-list problem.

On all the other points raised in the thread, I agree with Chandler and Philip’s comments.

-Andy
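[Andy's ephemeral-value concern can be illustrated with a small sketch, again using the provisional llvm.invariant form from this thread; the names here are made up for illustration.]

```llvm
; Sketch of an "ephemeral" value: %sq and %bound exist only to feed the
; invariant and will never become real code, yet they look like ordinary
; uses to hasOneUse() and use-list walks.
declare void @llvm.invariant(i1)

define i32 @f(i32 %x, i32 %y) {
entry:
  %sq = mul i32 %x, %x
  %bound = icmp slt i32 %sq, 100
  call void @llvm.invariant(i1 %bound)
  ; %x now has two uses (%sq and %add), so a hasOneUse(%x)-style
  ; pattern that fired before the invariant was added may not fire now.
  %add = add i32 %x, %y
  ret i32 %add
}
```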
----- Original Message -----
> From: "Andrew Trick" <atrick at apple.com>
> To: "Chandler Carruth" <chandlerc at google.com>
> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
> Sent: Friday, July 18, 2014 2:49:30 AM
> Subject: Re: [LLVMdev] [RFC] Invariants in LLVM
>
> On Jul 17, 2014, at 2:44 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
> > > I worry about the maintenance cost of this in the optimizer.
> >
> > As long as we don't go "crazy" trying to recover the performance, it
> > should be OK. Hal has already put together the patches needed for
> > his current approach. We'll likely have to tweak this approach a
> > little, but we don't need to go through and change every single
> > hasOneUse() check to do something special with these invariants (or
> > assumptions).
>
> My main concern was the impact on use-list iteration and hasOneUse()
> patterns, and the potential burden on the rest of the optimizer
> because we often need to check for ephemeral values. Is there any
> way to check that a use is ephemeral? The potential for silently
> defeating optimizations when you add the intrinsics is also
> worrisome. It could lead to surprising results when someone adds a
> new invariant that is supposed to improve performance.

Regarding hasOneUse(), I propose that we not change it at all. The philosophy that I'd like to experiment with is, "if the invariant is worth having, then it is worth keeping." That means not performing optimizations that would necessitate removal of the invariant (which I think we'd need to do if we somehow updated hasOneUse() to ignore ephemeral values). That having been said, there are certainly times when we'd like to ignore ephemeral values (such as when we do cost estimates for inlining, loop unrolling, etc.), and we do need to worry about efficiency there. It may turn out that the current scheme needs enhancement to be sufficiently efficient.

> That said, I put a lot of weight in having an implementation. If you
> can run experiments where you add a significant number of invariants
> and don’t see performance regressions, then I’d be convinced. If we
> can make it work in the current code base, then maintaining it
> should be doable.
>
> It could also be a problem that we introduce a call that potentially
> writes memory, but that is less subtle than the use-list problem.

Yes, and this is necessary to maintain control dependencies. However, I've updated BasicAA to return NoModRef for any particular Loc-CS and CS-CS query involving an invariant. So, although it writes to somewhere unknown, it won't alias with anywhere in particular, and so hopefully it will not affect code motion for memory accesses for which we have location information.

Thanks again,
Hal

> On all the other points raised in the thread, I agree with Chandler
> and Philip’s comments.
>
> -Andy
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu  http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
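[A sketch of the declaration shape Hal's approach implies: the intrinsic is modeled as potentially writing memory so it is not hoisted or sunk across the control flow it depends on, while BasicAA answers NoModRef for queries against specific locations so known memory accesses can still move around it. The attribute details below are an assumption for illustration, not the committed patch.]

```llvm
; Deliberately NOT marked readnone or readonly: a readnone call could be
; freely reordered or deleted, breaking the control dependence the
; invariant is supposed to preserve.
declare void @llvm.invariant(i1) #0

attributes #0 = { nounwind }
```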