Since introducing the new TargetTransformInfo analysis, there has been some confusion over the role of target heuristics in IR passes. A few patches have led to interesting discussions.

To centralize the discussion, until we get some documentation and better APIs in place, let me throw out an oversimplified Straw Man for a new pass pipeline. It serves two purposes: (1) an overdue reorganization of the pass pipeline, and (2) a formalization of the role of TargetTransformInfo.

---
Canonicalization passes are designed to normalize the IR in order to expose opportunities to subsequent machine-independent passes. This simplifies writing machine-independent optimizations and improves the quality of the compiler.

An important property of these passes is that they are repeatable. They may be invoked multiple times after inlining and should converge to a canonical form. They should not destructively transform the IR in a way that defeats subsequent analysis.

Canonicalization passes can make use of data layout and are affected by ABI, but are otherwise target independent. Adding target-specific hooks to these passes can defeat the purpose of canonical IR.

IR Canonicalization Pipeline:

Function Passes {
  SimplifyCFG
  SROA-1
  EarlyCSE
}
Call-Graph SCC Passes {
  Inline
  Function Passes {
    EarlyCSE
    SimplifyCFG
    InstCombine
    Early Loop Opts {
      LoopSimplify
      Rotate (when obvious)
      Full-Unroll (when obvious)
    }
    SROA-2
    InstCombine
    GVN
    Reassociate
    Generic Loop Opts {
      LICM (Rotate on-demand)
      Unswitch
    }
    SCCP
    InstCombine
    JumpThreading
    CorrelatedValuePropagation
    AggressiveDCE
  }
}

IR optimizations that require target information or destructively modify the IR can run in a separate pipeline. This helps make a cleaner distinction between passes that may and may not use TargetTransformInfo.

TargetTransformInfo encapsulates legal types and operation costs. IR instruction costs are approximate and relative.
They do not represent def-use latencies, nor do they distinguish between latency and CPU resource requirements--that level of machine modeling needs to be done in MI passes.

IR Lowering Pipeline:

Function Passes {
  Target SimplifyCFG (OptimizeCFG?)
  Target InstCombine (InstOptimize?)
  Target Loop Opts {
    SCEV
    IndvarSimplify (mainly sxt/zxt elimination)
    Vectorize/Unroll
    LSR (move LFTR here too)
  }
  SLP Vectorize
  LowerSwitch
  CodeGenPrepare
}

---
The above pass ordering is roughly something I think we can live with. Notice that I have: Full-Unroll -> SROA-2 -> GVN -> Loop-Opts, since that solves some issues we have today. I don't currently have any reason to reorder the "late" IR optimization passes (those after generic loop opts). We either need a GVN utility that loop opts and lowering passes may call on demand after performing code motion, or we can rerun a non-iterative GVN-lite as a cleanup after lowering passes. If anyone can think of important dependencies between IR passes, this would be a good time to point them out.

We could probably make an adjustment to the 'opt' driver so that the user can specify any mix of canonical and lowering passes. The first lowering pass and subsequent passes would run in the lowering function pass manager. 'llc' could also optionally run the lowering pass pipeline as a convenience for users who want to run 'opt' without specifying a triple/cpu.

-Andy
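The key contract above -- canonicalization passes are repeatable and converge to a fixed point, while lowering passes run once afterward and may consult target costs -- can be sketched in miniature. This is a hypothetical toy model, not LLVM code: the "passes" are trivial string rewrites, and the cost-table keys are invented; only the driver structure reflects the proposal.

```python
# Toy model of the two-phase pipeline: canonical passes rerun to a fixed
# point; lowering runs once and may consult target info (a cost table).

def simplify_cfg(ir):
    # Toy canonical rewrite: collapse a redundant branch.
    return ir.replace("br br", "br")

def instcombine(ir):
    # Toy canonical rewrite: fold the identity x + 0.
    return ir.replace("add 0, ", "")

CANONICAL = [simplify_cfg, instcombine]

def run_canonical(ir, max_iters=10):
    """Rerun the canonical pipeline until the IR stops changing."""
    for _ in range(max_iters):
        out = ir
        for p in CANONICAL:
            out = p(out)
        if out == ir:          # fixed point: canonical form reached
            return out
        ir = out
    raise RuntimeError("canonicalization did not converge")

def lower(ir, tti):
    # Lowering may use target costs; "vectorize" only when it is cheap.
    if tti["vector_add_cost"] <= tti["scalar_add_cost"]:
        ir = ir.replace("add", "vadd")
    return ir

canon = run_canonical("br br add 0, x add x,y")
assert canon == run_canonical(canon)   # repeatable: already canonical
print(lower(canon, {"vector_add_cost": 1, "scalar_add_cost": 1}))
# -> br x vadd x,y
```

The point of the sketch is the separation: nothing in `run_canonical` ever looks at `tti`, which is the discipline the proposal asks of the canonicalization pipeline.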
There seems to be a lot of interest recently in LTO. How do you see the situation of splitting the IR passes between per-TU processing and multi-TU ("link time") processing?

-- Sean Silva
Andy and I briefly discussed this the other day; we have not yet had a chance to list a detailed pass order for the pre- and post-IPO scalar optimizations. This is the wish list in our minds:

pre-IPO: based on the ordering he proposes, get rid of the inlining (or just inline tiny functions), get rid of all loop xforms...

post-IPO: get rid of inlining, or maybe we still need it, only performing inlining for callees which have now become tiny; enable the loop xforms.

The SCC pass manager seems to be important for inlining. No matter how the inlining looks in the future, I think the pass manager is still useful for scalar opt. It enables us to achieve cheap interprocedural opt hands down, in the sense that we can optimize a callee, analyze it, and feed whatever detailed info back to the caller (say, info like "the callee always returns constant 5" or "the callee's return value is in 5-10"); such info is difficult to obtain at the IPO stage, as it cannot afford to take such a close look.

I think it is too early to discuss the pre-IPO and post-IPO thing; let us focus on what Andy is proposing.

On 7/17/13 6:04 PM, Sean Silva wrote:
> There seems to be a lot of interest recently in LTO. How do you see
> the situation of splitting the IR passes between per-TU processing and
> multi-TU ("link time") processing?
>
> -- Sean Silva
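The "cheap interprocedural opt" enabled by bottom-up SCC order can be illustrated with a toy model: visiting callees before callers means each callee is already optimized and summarized by the time its caller is processed, so the caller can fold the call using the summary. This is a hypothetical sketch, not the LLVM CallGraphSCC pass manager; function bodies here are just Python expression strings, and the call graph is acyclic for simplicity.

```python
# Toy sketch of bottom-up SCC traversal: optimize and summarize callees
# first, then let callers fold calls cheaply using those summaries.
from graphlib import TopologicalSorter

# Each function body is an expression; "call f" refers to another function.
funcs = {"leaf": "2 + 3", "mid": "call leaf", "top": "call mid * 2"}
callees = {"leaf": [], "mid": ["leaf"], "top": ["mid"]}

summaries = {}  # name -> constant result, when known

def optimize(name):
    body = funcs[name]
    # Cheap IPO: substitute known-constant callee results into the body.
    for callee, const in summaries.items():
        body = body.replace(f"call {callee}", str(const))
    # If the body is now a constant expression, record a summary for callers.
    try:
        summaries[name] = eval(body)  # toy constant folding
    except (NameError, SyntaxError):
        pass                           # unresolved calls remain
    funcs[name] = body

# TopologicalSorter over the callee map yields callees before callers.
for fn in TopologicalSorter(callees).static_order():
    optimize(fn)

print(funcs)       # every call has been folded away
print(summaries)   # {'leaf': 5, 'mid': 5, 'top': 10}
```

Run top-down instead and the summaries arrive too late to fold anything -- which is the ordering argument for keeping the SCC pass manager.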
On Jul 16, 2013, at 9:38 PM, Andrew Trick <atrick at apple.com> wrote:
> IR Canonicalization Pipeline:
>
> Function Passes {
>   SimplifyCFG
>   SROA-1
>   EarlyCSE
> }
> Call-Graph SCC Passes {
>   Inline
>   Function Passes {
>     EarlyCSE
>     SimplifyCFG
>     InstCombine
>     Early Loop Opts {
>       LoopSimplify
>       Rotate (when obvious)
>       Full-Unroll (when obvious)
>     }
>     SROA-2
>     InstCombine
>     GVN...

I should explain: SROA-1 and SROA-2 are not necessarily different versions of SROA (though they could be in the future); I just wanted to be clear that it is run twice.

-Andy
Krzysztof Parzyszek
2013-Jul-29 16:05 UTC
[LLVMdev] IR Passes and TargetTransformInfo: Straw Man
On 7/16/2013 11:38 PM, Andrew Trick wrote:
> Since introducing the new TargetTransformInfo analysis, there has been some confusion over the role of target heuristics in IR passes. A few patches have led to interesting discussions.
>
> To centralize the discussion, until we get some documentation and better APIs in place, let me throw out an oversimplified Straw Man for a new pass pipeline. It serves two purposes: (1) an overdue reorganization of the pass pipeline (2) a formalization of the role of TargetTransformInfo.
>
> ---
> Canonicalization passes are designed to normalize the IR in order to expose opportunities to subsequent machine-independent passes. This simplifies writing machine-independent optimizations and improves the quality of the compiler.
>
> An important property of these passes is that they are repeatable. They may be invoked multiple times after inlining and should converge to a canonical form. They should not destructively transform the IR in a way that defeats subsequent analysis.
>
> Canonicalization passes can make use of data layout and are affected by ABI, but are otherwise target independent. Adding target-specific hooks to these passes can defeat the purpose of canonical IR.
>
> IR Canonicalization Pipeline:
>
> Function Passes {
>   SimplifyCFG
>   SROA-1
>   EarlyCSE
> }
> Call-Graph SCC Passes {
>   Inline
>   Function Passes {
>     EarlyCSE
>     SimplifyCFG
>     InstCombine
>     Early Loop Opts {
>       LoopSimplify
>       Rotate (when obvious)
>       Full-Unroll (when obvious)
>     }
>     SROA-2
>     InstCombine
>     GVN
>     Reassociate
>     Generic Loop Opts {
>       LICM (Rotate on-demand)
>       Unswitch
>     }
>     SCCP
>     InstCombine
>     JumpThreading
>     CorrelatedValuePropagation
>     AggressiveDCE
>   }
> }

I'm a bit late to this, but the examples of the "generic loop opts" above are really better left until later. They have a potential to obscure the code and make other loop optimizations harder.
Specifically, there has to be a place where loop nest optimizations can be done (such as loop interchange or unroll-and-jam). There are also array expansion and loop distribution, which can be highly target-dependent in terms of their applicability. I don't know if TTI could provide enough details to account for all circumstances that would motivate such transformations, but assuming that it could, there still needs to be room left for it in the design.

On a different, but related note---one thing I've asked recently was about the "proper" solution for recognizing target-specific loop idioms. On Hexagon, we have builtin functions that handle certain specific loop patterns. In order to separate the target-dependent code from the target-independent, we would basically have to replicate the loop idiom recognition in our own target-specific pass. Not only that, but it would have to run before the loops may be subjected to other optimizations that could obfuscate the opportunity.

To solve this, I was thinking about having target-specific hooks in the idiom recognition code that could transform a given loop in the target's own way. Still, that would imply target-specific transformations running before the "official" lowering code.

-K

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
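The hook being proposed could look roughly like the following. This is a hypothetical interface sketch, not an existing LLVM API: `LoopIdiomRecognize`, the hook protocol, and the builtin name `__builtin_hexagon_vaddsat` are all invented for illustration. The structure it shows is the proposal itself: generic idiom recognition offers each loop to target callbacks first, so a target can rewrite its own idioms before generic transforms obscure them.

```python
# Hypothetical sketch of target hooks in loop idiom recognition. Loops
# are modeled as strings; all names and builtins here are invented.

class LoopIdiomRecognize:
    def __init__(self, target_hooks=None):
        self.target_hooks = target_hooks or []

    def run(self, loop):
        # Give the target a chance to claim the loop first.
        for hook in self.target_hooks:
            rewritten = hook(loop)
            if rewritten is not None:
                return rewritten
        # Generic idioms (toy): a zero-fill loop becomes memset.
        if loop == "for i: a[i] = 0":
            return "memset(a, 0, n)"
        return loop   # no idiom matched; loop left alone

def hexagon_hook(loop):
    """Toy target hook: match a saturating-add loop, emit a builtin."""
    if loop == "for i: a[i] = sat_add(a[i], b[i])":
        return "__builtin_hexagon_vaddsat(a, b, n)"
    return None   # not ours; fall through to generic recognition

lir = LoopIdiomRecognize(target_hooks=[hexagon_hook])
print(lir.run("for i: a[i] = sat_add(a[i], b[i])"))  # target builtin wins
print(lir.run("for i: a[i] = 0"))                    # generic memset
```

Returning `None` to mean "not my idiom" keeps the hook composable: multiple targets (or multiple idiom tables for one target) can be chained without the generic recognizer knowing about any of them.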
On Jul 29, 2013, at 9:05 AM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:
> On 7/16/2013 11:38 PM, Andrew Trick wrote:
>> Since introducing the new TargetTransformInfo analysis, there has been some confusion over the role of target heuristics in IR passes. A few patches have led to interesting discussions.
>>
>> To centralize the discussion, until we get some documentation and better APIs in place, let me throw out an oversimplified Straw Man for a new pass pipeline. It serves two purposes: (1) an overdue reorganization of the pass pipeline (2) a formalization of the role of TargetTransformInfo.
>>
>> ---
>> Canonicalization passes are designed to normalize the IR in order to expose opportunities to subsequent machine-independent passes. This simplifies writing machine-independent optimizations and improves the quality of the compiler.
>>
>> An important property of these passes is that they are repeatable. They may be invoked multiple times after inlining and should converge to a canonical form. They should not destructively transform the IR in a way that defeats subsequent analysis.
>>
>> Canonicalization passes can make use of data layout and are affected by ABI, but are otherwise target independent. Adding target-specific hooks to these passes can defeat the purpose of canonical IR.
>>
>> IR Canonicalization Pipeline:
>>
>> Function Passes {
>>   SimplifyCFG
>>   SROA-1
>>   EarlyCSE
>> }
>> Call-Graph SCC Passes {
>>   Inline
>>   Function Passes {
>>     EarlyCSE
>>     SimplifyCFG
>>     InstCombine
>>     Early Loop Opts {
>>       LoopSimplify
>>       Rotate (when obvious)
>>       Full-Unroll (when obvious)
>>     }
>>     SROA-2
>>     InstCombine
>>     GVN
>>     Reassociate
>>     Generic Loop Opts {
>>       LICM (Rotate on-demand)
>>       Unswitch
>>     }
>>     SCCP
>>     InstCombine
>>     JumpThreading
>>     CorrelatedValuePropagation
>>     AggressiveDCE
>>   }
>> }
>
> I'm a bit late to this, but the examples of the "generic loop opts" above are really better left until later.
> They have a potential to obscure the code and make other loop optimizations harder. Specifically, there has to be a place where loop nest optimizations can be done (such as loop interchange or unroll-and-jam). There are also array expansion and loop distribution, which can be highly target-dependent in terms of their applicability. I don't know if TTI could provide enough details to account for all circumstances that would motivate such transformations, but assuming that it could, there still needs to be room left for it in the design.

You mean that LICM and Unswitching should be left for later? For the purpose of exposing scalar optimizations, I'm not sure I agree with that, but I'd be interested in examples. I think you're only worried about the impact on loop nest optimizations. Admittedly I'm not making much concession for that, because I think of loop nest optimization as a different tool that will probably want fairly substantial changes to the pass pipeline anyway. Here are a few ways it might work:

(1) The loop nest optimizer extends the standard PMB by plugging in its own passes prior to Generic Loop Opts, in addition to loading TTI. The loop nest optimizer's passes are free to query TTI.

(2) The loop nest optimizer suppresses generic loop opts through a PMB flag (assuming they are too disruptive). It registers its own loop passes with the Target Loop Opts. It registers instances of generic loop opts to now run after loop nest optimization, and registers new instances of scalar opts to rerun after Target Loop Opts if needed.

(3) If the loop nest optimizer were part of the llvm core libs, then we could have a completely separate pass manager builder for it.

> On a different, but related note---one thing I've asked recently was about the "proper" solution for recognizing target-specific loop idioms. On Hexagon, we have builtin functions that handle certain specific loop patterns.
> In order to separate the target-dependent code from the target-independent, we would basically have to replicate the loop idiom recognition in our own target-specific pass. Not only that, but it would have to run before the loops may be subjected to other optimizations that could obfuscate the opportunity. To solve this, I was thinking about having target-specific hooks in the idiom recognition code that could transform a given loop in the target's own way. Still, that would imply target-specific transformations running before the "official" lowering code.

We may be able to run loop idiom recognition as part of Target Loop Opts. If that misses too many optimizations, then targets can add a second instance of loop-idiom in the target loop opts. Targets can also add extra instances of scalar opts passes in the lowering pipeline, if needed, to clean up. The lowering pass order should be completely configurable.

Are you afraid that LICM and unswitching will obfuscate the loops to the point that you can't recognize the idiom? The current pass pipeline would have the same problem.

-Andy
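Option (2) above -- a builder flag that suppresses the generic loop opts, plus registration points for a loop nest optimizer's own passes and for rerun instances of the generic opts -- can be modeled concretely. This is a hypothetical sketch: `PipelineBuilder`, its flag, and the pass names are invented stand-ins, not the LLVM PassManagerBuilder API.

```python
# Hypothetical model of option (2): a builder flag suppresses generic
# loop opts, and a loop nest optimizer registers its own passes (plus
# reruns of generic opts) into the lowering-side loop opts. All names
# are invented for illustration.

class PipelineBuilder:
    def __init__(self, disable_generic_loop_opts=False):
        self.disable_generic_loop_opts = disable_generic_loop_opts
        self.target_loop_passes = []     # run inside Target Loop Opts

    def add_target_loop_pass(self, name):
        self.target_loop_passes.append(name)

    def build(self):
        canonical = ["SimplifyCFG", "InstCombine", "GVN"]
        if not self.disable_generic_loop_opts:
            canonical += ["LICM", "Unswitch"]   # generic loop opts
        lowering = (["IndvarSimplify"] + self.target_loop_passes +
                    ["Vectorize", "LSR", "CodeGenPrepare"])
        return canonical + lowering

# A loop nest optimizer suppresses the disruptive generic loop opts,
# registers its own pass, then reruns LICM after nest optimization.
pmb = PipelineBuilder(disable_generic_loop_opts=True)
pmb.add_target_loop_pass("UnrollAndJam")
pmb.add_target_loop_pass("LICM")      # rerun instance, post-nest-opts
print(pmb.build())
```

The default build (no flag, no registrations) reproduces the standard ordering, which is the property that lets the nest optimizer remain an opt-in client rather than a fork of the pipeline.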