thr3ads.net - llvm dev - [LLVMdev] IR Passes and TargetTransformInfo: Straw Man [Jul 2013]

Andrew Trick

2013-Jul-17 04:38 UTC

[LLVMdev] IR Passes and TargetTransformInfo: Straw Man

Since introducing the new TargetTransformInfo analysis, there has been some
confusion over the role of target heuristics in IR passes. A few patches have
led to interesting discussions.

To centralize the discussion, until we get some documentation and better APIs in
place, let me throw out an oversimplified Straw Man for a new pass pipeline. It
serves two purposes: (1) an overdue reorganization of the pass pipeline, and (2) a
formalization of the role of TargetTransformInfo.

---
Canonicalization passes are designed to normalize the IR in order to expose
opportunities to subsequent machine independent passes. This simplifies writing
machine independent optimizations and improves the quality of the compiler.

An important property of these passes is that they are repeatable. They may be
invoked multiple times after inlining and should converge to a canonical form.
They should not destructively transform the IR in a way that defeats subsequent
analysis.
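
As a toy illustration of the repeatability property (this is not LLVM code; the
tuple encoding of expressions and the helper names `canonicalize` and
`run_to_fixed_point` are invented for this sketch), the following moves
constants to the right-hand side of commutative operations and reruns the pass
until nothing changes:

```python
# Toy IR: an expression is ("op", lhs, rhs); leaves are ints (constants)
# or strs (named values). This encoding is an assumption of the sketch.
COMMUTATIVE = {"add", "mul", "and", "or", "xor"}


def canonicalize(expr):
    """Return expr with constants moved to the right of commutative ops."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = canonicalize(lhs), canonicalize(rhs)
    if op in COMMUTATIVE and isinstance(lhs, int) and not isinstance(rhs, int):
        lhs, rhs = rhs, lhs
    return (op, lhs, rhs)


def run_to_fixed_point(expr, max_iters=8):
    """Rerun the pass until the expression stops changing.

    Returns (canonical_expr, number_of_runs_including_the_no_change_run).
    """
    for runs in range(1, max_iters + 1):
        new = canonicalize(expr)
        if new == expr:
            return expr, runs
        expr = new
    raise RuntimeError("canonicalization did not converge")


before = ("sub", ("add", 4, "x"), ("mul", 2, ("add", 1, "y")))
after, runs = run_to_fixed_point(before)
print(after)  # ('sub', ('add', 'x', 4), ('mul', ('add', 'y', 1), 2))
```

Running `canonicalize` on `after` again returns the same expression, which is
the sense in which a canonicalization pass can safely be invoked multiple times.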

Canonicalization passes can make use of data layout and are affected by ABI, but
are otherwise target independent. Adding target specific hooks to these passes
can defeat the purpose of canonical IR.

IR Canonicalization Pipeline:

Function Passes {
  SimplifyCFG
  SROA-1
  EarlyCSE
}
Call-Graph SCC Passes {
  Inline
  Function Passes {
    EarlyCSE
    SimplifyCFG
    InstCombine
    Early Loop Opts {
      LoopSimplify
      Rotate (when obvious)
      Full-Unroll (when obvious)
    }
    SROA-2
    InstCombine
    GVN
    Reassociate
    Generic Loop Opts {
      LICM (Rotate on-demand)
      Unswitch
    }
    SCCP
    InstCombine
    JumpThreading
    CorrelatedValuePropagation
    AggressiveDCE
  }
}
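
One way to read the nesting above is as a tree of pass managers that flattens
into a linear execution order. The sketch below is only a model of the listing
(the tuple encoding and `flatten` are invented here, and SROA-1/SROA-2 are both
written as `SROA`); it can be used to check ordering constraints such as
Full-Unroll running before the second SROA, GVN and the generic loop opts:

```python
# A str is a pass; a (manager_name, children) tuple is a nested pass manager.
CANONICALIZATION = [
    ("Function Passes", ["SimplifyCFG", "SROA", "EarlyCSE"]),
    ("Call-Graph SCC Passes", [
        "Inline",
        ("Function Passes", [
            "EarlyCSE", "SimplifyCFG", "InstCombine",
            ("Early Loop Opts", ["LoopSimplify", "Rotate", "Full-Unroll"]),
            "SROA", "InstCombine", "GVN", "Reassociate",
            ("Generic Loop Opts", ["LICM", "Unswitch"]),
            "SCCP", "InstCombine", "JumpThreading",
            "CorrelatedValuePropagation", "AggressiveDCE",
        ]),
    ]),
]


def flatten(nodes):
    """Yield pass names in listing order, ignoring pass-manager nesting."""
    for node in nodes:
        if isinstance(node, str):
            yield node
        else:
            _manager_name, children = node
            yield from flatten(children)


order = list(flatten(CANONICALIZATION))
second_sroa = order.index("SROA", order.index("Inline"))
print(order.index("Full-Unroll") < second_sroa
      < order.index("GVN") < order.index("LICM"))  # True
```

Note that the flattened order hides the fact that the inner function passes run
once per SCC rather than once per module.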

IR optimizations that require target information or destructively modify the IR
can run in a separate pipeline. This helps make a cleaner distinction
between passes that may and may not use TargetTransformInfo.

TargetTransformInfo encapsulates legal types and operation costs. IR instruction
costs are approximate and relative. They do not represent def-use latencies, nor
do they distinguish between latency and CPU resource requirements; that level
of machine modeling needs to be done in MI passes.
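
To make "approximate and relative" concrete, here is a minimal sketch of how a
lowering pass might compare two instruction sequences using such costs. The
cost table, target names and opcode names are all made up for illustration and
are not taken from any real TargetTransformInfo implementation:

```python
# Hypothetical relative costs per target. Only the ratios matter; the
# numbers are not cycles and say nothing about latency or resources.
TOY_COSTS = {
    "targetA": {"add": 1, "mul": 1, "vadd4": 1, "vmul4": 2},
    "targetB": {"add": 1, "mul": 3, "vadd4": 4, "vmul4": 16},
}


def sequence_cost(target, opcodes):
    """Sum the relative cost of a straight-line opcode sequence."""
    table = TOY_COSTS[target]
    return sum(table[op] for op in opcodes)


def should_vectorize(target, scalar_body, vector_body, width=4):
    """Vectorize only if one vector iteration beats `width` scalar ones."""
    scalar = width * sequence_cost(target, scalar_body)
    vector = sequence_cost(target, vector_body)
    return vector < scalar


scalar_body = ["mul", "add"]
vector_body = ["vmul4", "vadd4"]
print(should_vectorize("targetA", scalar_body, vector_body))  # True
print(should_vectorize("targetB", scalar_body, vector_body))  # False
```

The same IR gets a different answer on each target, which is why a decision of
this kind belongs in the lowering pipeline rather than in canonicalization.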

IR Lowering Pipeline:

Function Passes {
  Target SimplifyCFG (OptimizeCFG?)
  Target InstCombine (InstOptimize?)
  Target Loop Opts {
    SCEV
    IndvarSimplify (mainly sxt/zxt elimination)
    Vectorize/Unroll
    LSR (move LFTR here too)
  }
  SLP Vectorize
  LowerSwitch
  CodeGenPrepare
}
---

The above pass ordering is roughly something I think we can live with. Notice
that I have:
  Full-Unroll -> SROA-2 -> GVN -> Loop-Opts
since that solves some issues we have today.

I don't currently have any reason to reorder the "late" IR
optimization passes (those after generic loop opts). We do need either a
GVN-util that loop opts and lowering passes may call on-demand after
performing code motion, or a non-iterative GVN-lite that we rerun as a cleanup
after lowering passes.

If anyone can think of important dependencies between IR passes, this would be
a good time to point them out.

We could probably make an adjustment to the 'opt' driver so that the user
can specify any mix of canonical and lowering passes. The first lowering pass
and subsequent passes would run in the lowering function pass manager.
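
A rough sketch of that driver rule, assuming the driver simply knows which pass
names count as lowering passes (the `LOWERING_PASSES` set and `split_pipeline`
are hypothetical and do not describe how 'opt' is actually implemented):

```python
# Pass names taken from the Straw Man's lowering pipeline; treating them as
# a fixed set is an assumption of this sketch.
LOWERING_PASSES = {
    "IndvarSimplify", "Vectorize", "LSR",
    "SLPVectorize", "LowerSwitch", "CodeGenPrepare",
}


def split_pipeline(passes):
    """Split a user-specified pass list at the first lowering pass.

    Everything before it runs in the canonicalization pass manager; the
    first lowering pass and everything after it, including any canonical
    passes the user listed later, runs in the lowering pass manager.
    """
    for i, name in enumerate(passes):
        if name in LOWERING_PASSES:
            return list(passes[:i]), list(passes[i:])
    return list(passes), []


canonical, lowering = split_pipeline(["InstCombine", "GVN", "LSR", "InstCombine"])
print(canonical)  # ['InstCombine', 'GVN']
print(lowering)   # ['LSR', 'InstCombine']
```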

'llc' could also optionally run the lowering pass pipeline as a
convenience for users who want to run 'opt' without specifying a triple/cpu.

-Andy

Sean Silva

2013-Jul-18 01:04 UTC

There seems to be a lot of interest recently in LTO. How do you see the
situation of splitting the IR passes between per-TU processing and multi-TU
("link time") processing?

-- Sean Silva

Shuxin Yang

2013-Jul-18 02:09 UTC

Andy and I briefly discussed this the other day, but we have not yet had a
chance to list a detailed pass order for the pre- and post-IPO scalar
optimizations.

This is the wish list we have in mind:

pre-IPO:  based on the ordering he proposes, get rid of the inlining (or
          just inline tiny functions), and get rid of all loop xforms...

post-IPO: get rid of inlining, or, if we still need it, only inline
          callees that have now become tiny. Enable the loop xforms.

          The SCC pass manager seems to be important for inlining. No
          matter what inlining looks like in the future, I think the
          pass manager is still useful for scalar opts. It lets us
          achieve cheap inter-procedural optimization, in the sense
          that we can optimize a callee, analyze it, and feed detailed
          information back to the caller (say, "the callee always
          returns constant 5" or "the callee's return value is in
          5-10"). Such information is difficult to obtain at the IPO
          stage, which cannot afford to take such a close look.

I think it is too early to discuss the pre-IPO and post-IPO split; let
us focus on what Andy is proposing.
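
The callee-to-caller feedback described above can be sketched as a bottom-up
walk of the call graph. This is a deliberately tiny model, not how LLVM's SCC
pass manager works: every function either returns a constant, returns the
result of one call, or is unknown, and recursion is assumed absent so each SCC
is a single function. All names are invented for the example:

```python
def post_order(call_graph, root):
    """Return functions reachable from root, callees before callers."""
    seen, order = set(), []

    def visit(fn):
        if fn in seen:
            return
        seen.add(fn)
        for callee in call_graph.get(fn, []):
            visit(callee)
        order.append(fn)

    visit(root)
    return order


def summarize_returns(bodies, call_graph, root):
    """Map each function to its known constant return value, or None."""
    summary = {}
    for fn in post_order(call_graph, root):
        kind, payload = bodies[fn]
        if kind == "const":
            summary[fn] = payload               # e.g. "returns constant 5"
        elif kind == "call":
            summary[fn] = summary.get(payload)  # reuse the callee's summary
        else:
            summary[fn] = None                  # nothing known
    return summary


bodies = {
    "main": ("call", "helper"),
    "helper": ("call", "five"),
    "five": ("const", 5),
}
call_graph = {"main": ["helper"], "helper": ["five"]}
summary = summarize_returns(bodies, call_graph, "main")
print(summary)  # {'five': 5, 'helper': 5, 'main': 5}
```

Because callees are summarized first, each caller sees an already-analyzed
callee without a separate whole-program analysis.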


On 7/17/13 6:04 PM, Sean Silva wrote:
> There seems to be a lot of interest recently in LTO. How do you see
> the situation of splitting the IR passes between per-TU processing and
> multi-TU ("link time") processing?
>
> -- Sean Silva

Andrew Trick

2013-Jul-18 02:49 UTC

On Jul 16, 2013, at 9:38 PM, Andrew Trick <atrick at apple.com> wrote:
> IR Canonicalization Pipeline:
> 
> Function Passes {
>  SimplifyCFG
>  SROA-1
>  EarlyCSE
> }
> Call-Graph SCC Passes {
>  Inline
>  Function Passes {
>    EarlyCSE
>    SimplifyCFG
>    InstCombine
>    Early Loop Opts {
>      LoopSimplify
>      Rotate (when obvious)
>      Full-Unroll (when obvious)
>    }
>    SROA-2
>    InstCombine
>    GVN...

I should explain: SROA-1 and SROA-2 are not necessarily different versions of
SROA (though they could be in the future); I just wanted to be clear that SROA
is run twice.

-Andy

Krzysztof Parzyszek

2013-Jul-29 16:05 UTC

On 7/16/2013 11:38 PM, Andrew Trick wrote:
> [...]
> IR Canonicalization Pipeline:
> [...]
>      Generic Loop Opts {
>        LICM (Rotate on-demand)
>        Unswitch
>      }
> [...]
>
I'm a bit late to this, but the examples of the "generic loop opts" above
are really better left until later.  They have the potential to
obscure the code and make other loop optimizations harder.
Specifically, there has to be a place where loop nest optimizations can
be done (such as loop interchange or unroll-and-jam).  There is also
array expansion and loop distribution, which can be highly
target-dependent in terms of their applicability.  I don't know if TTI
could provide enough details to account for all circumstances that would
motivate such transformations, but assuming that it could, there still
needs to be room left for it in the design.


On a different, but related note: one thing I asked about recently was
the "proper" solution for recognizing target-specific loop idioms.
On Hexagon, we have builtin functions that handle certain specific
loop patterns.  In order to separate the target-dependent code from the
target-independent, we would basically have to replicate the loop idiom
recognition in our own target-specific pass.  Not only that, but it
would have to run before the loops are subjected to other
optimizations that could obfuscate the opportunity.  To solve this, I
was thinking about having target-specific hooks in the idiom recognition
code that could transform a given loop in the target's own way.  Still,
that would imply target-specific transformations running before the
"official" lowering code.
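
A sketch of what such a hook might look like, purely to illustrate the shape of
the idea. The loop description (a dict with `body` and `stride`), the hook
signature and the builtin name `toy_target.mac_builtin` are all hypothetical;
the only thing borrowed from LLVM is that its loop idiom recognition handles
memset-like and memcpy-like loops:

```python
def recognize_idiom(loop, target_hooks=()):
    """Return a replacement for the loop, or None if no idiom matches.

    Target hooks are consulted first, so a target can claim a loop before
    the generic recognizers (or later passes) get to rewrite it.
    """
    for hook in target_hooks:
        replacement = hook(loop)
        if replacement is not None:
            return replacement
    # Generic, target-independent idioms.
    if loop.get("stride") == 1 and loop.get("body") == "store_constant":
        return "memset"
    if loop.get("stride") == 1 and loop.get("body") == "copy":
        return "memcpy"
    return None


def toy_target_hook(loop):
    """Hypothetical hook for a target with a multiply-accumulate builtin."""
    if loop.get("body") == "multiply_accumulate":
        return "toy_target.mac_builtin"
    return None


mac_loop = {"body": "multiply_accumulate", "stride": 1}
print(recognize_idiom(mac_loop))                     # None
print(recognize_idiom(mac_loop, [toy_target_hook]))  # toy_target.mac_builtin
```

The recognition logic stays shared; only the decision about what to emit is
target-specific, which is exactly the part that conflicts with keeping the
canonicalization pipeline target independent.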

-K

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Andrew Trick

2013-Jul-29 23:28 UTC

On Jul 29, 2013, at 9:05 AM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:
> On 7/16/2013 11:38 PM, Andrew Trick wrote:
>> [...]
>>     Generic Loop Opts {
>>       LICM (Rotate on-demand)
>>       Unswitch
>>     }
>> [...]
>
> I'm a bit late to this, but the examples of the "generic loop opts" above
> are really better left until later.  They have the potential to obscure the
> code and make other loop optimizations harder. Specifically, there has to be
> a place where loop nest optimizations can be done (such as loop interchange
> or unroll-and-jam).  There is also array expansion and loop distribution,
> which can be highly target-dependent in terms of their applicability.  I
> don't know if TTI could provide enough details to account for all
> circumstances that would motivate such transformations, but assuming that it
> could, there still needs to be room left for it in the design.

You mean that LICM and Unswitching should be left for later? For the purpose of
exposing scalar optimizations, I'm not sure I agree with that, but I'd be
interested in examples.

I think you're only worried about the impact on loop nest optimizations.
Admittedly I'm not making much of a concession for that, because I think of
loop nest optimization as a different tool that will probably want fairly
substantial changes to the pass pipeline anyway. Here are a few ways it
might work:

(1) Loop nest optimizer extends the standard PMB by plugging in its own passes
prior to Generic Loop Opts in addition to loading TTI. The loop nest
optimizer's passes are free to query TTI.

(2) Loop nest optimizer suppresses generic loop opts through a PMB flag
(assuming they are too disruptive). It registers its own loop passes with the
Target Loop Opts. It registers instances of generic loop opts to now run after
loop nest optimization, and registers new instances of scalar opts to rerun
after Target Loop Opts if needed.

(3) If the loop nest optimizer were part of llvm core libs, then we could have a
completely separate passmanager builder for it.

> On a different, but related note: one thing I asked about recently was the
> "proper" solution for recognizing target-specific loop idioms.  On Hexagon,
> we have builtin functions that handle certain specific loop patterns.  In
> order to separate the target-dependent code from the target-independent, we
> would basically have to replicate the loop idiom recognition in our own
> target-specific pass.  Not only that, but it would have to run before the
> loops are subjected to other optimizations that could obfuscate the
> opportunity.  To solve this, I was thinking about having target-specific
> hooks in the idiom recognition code that could transform a given loop in the
> target's own way.  Still, that would imply target-specific transformations
> running before the "official" lowering code.

We may be able to run loop idiom recognition as part of Target Loop Opts. If
that misses too many optimizations, then targets can add a second instance of
loop-idiom in the target loop opts. Targets can also add extra instances of
scalar opts passes in the lowering pipeline, if needed, to clean up. The lowering
pass order should be completely configurable.

Are you afraid that LICM and unswitching will obfuscate the loops to the point
that you can’t recognize the idiom? The current pass pipeline would have the
same problem.
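
Options (1) and (2) above, and the idea of a configurable lowering pass order,
could be modeled roughly as follows. `ToyPipelineBuilder`, its extension-point
names and the `LoopNestOpt` pass are invented for this sketch and do not
correspond to the real PassManagerBuilder interface:

```python
class ToyPipelineBuilder:
    """Minimal stand-in for a pass manager builder with extension points."""

    def __init__(self):
        self.run_generic_loop_opts = True  # the "PMB flag" from option (2)
        self.extensions = {
            "before_generic_loop_opts": [],
            "target_loop_opts": [],
        }

    def add_extension(self, point, pass_name):
        """Register an extra pass at a named extension point."""
        self.extensions[point].append(pass_name)

    def build(self):
        """Return (canonical_passes, lowering_passes) as flat lists."""
        canonical = ["InstCombine", "GVN", "Reassociate"]
        canonical += self.extensions["before_generic_loop_opts"]
        if self.run_generic_loop_opts:
            canonical += ["LICM", "Unswitch"]
        lowering = ["IndvarSimplify"]
        lowering += self.extensions["target_loop_opts"]
        lowering += ["LSR"]
        return canonical, lowering


# Option (1): plug the loop nest optimizer in before Generic Loop Opts.
opt1 = ToyPipelineBuilder()
opt1.add_extension("before_generic_loop_opts", "LoopNestOpt")

# Option (2): suppress generic loop opts and rerun them after the loop
# nest optimizer inside Target Loop Opts.
opt2 = ToyPipelineBuilder()
opt2.run_generic_loop_opts = False
for name in ["LoopNestOpt", "LICM", "Unswitch"]:
    opt2.add_extension("target_loop_opts", name)

print(opt1.build())
print(opt2.build())
```

In the second configuration LICM and Unswitch no longer run before the loop
nest optimizer, at the cost of moving them out of the canonicalization
pipeline.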

-Andy
