thr3ads.net - llvm dev - [LLVMdev] Add a function splitting pass to LLVM which extracts cold regions into their own functions [May 2012]

If this information is useful, please help other people find it:
Share via:

Andrew Trick

2012-May-21 20:36 UTC

[LLVMdev] Add a function splitting pass to LLVM which extracts cold regions into their own functions

Tobias,

Thanks for taking the time to summarize all this. It's a great writeup.
I'm moving the thread to llvm-dev. My responses below.

On May 21, 2012, at 5:06 AM, Tobias Grosser <tobias at grosser.es>
wrote:> First of all some information about the RegionInfo pass:
> 
> ======================================================================>
The very first paper I was looking at, when implementing it was "The
Program Structure Tree: Computing Control Regions in Linear Time (1994)".
Nadav already mentioned this and it is also mentioned in the
"RegionInfo.h" file.
> 
> The algorithm described in this paper is very easy to implement. However,
it only calculates SESE regions, where SESE means a region with a single entry
and a single exit _edge_. The word _edge_ is here very important, as the
algorithm misses a lot of regions that could easily be transformed to a SESE
_edge_ region, but that are no SESE _edge_ regions.
> 
> Have a look at:
> 
> opt -view-regions-only -only-simple-regions
test/Analysis/RegionInfo/condition_complicated.ll
> 
> All regions in this examples would not be detected by the first algorithm,
because they have both multiple entry and multiple exit edges. However,
transforming them to regions with a single entry and a single exit _edge_ is
simple. We call regions that are not single entry single exit edge regions, but
can easily transformed to such regions 'refined' regions. Simple single
entry singe exit _edge_ regions are called 'simple'.
> 
> For my work on Polly, I wanted to detect both 'simple' and
'refined' regions. The paper 'The Refined Process Structure Tree -
Jussi Vanhatalo, Hagen Voelyer, Jana Koehler - 2009' describes an algorithm
how to detect such regions in linear time. However, unfortunately, the proposed
algorithm is complex to implement (and I am not convinced it is fast in the
common cases).
> 
> At that point I started an IRC discussion about what to do and Chris
> suggested to look for an dominance information based algorithm. I did what
he said and came up with the existing algorithm, which is, due to taking
advantage of dominance information, rather simple. Unfortunately, I think I
misunderstood Chris, as I later realized he did not want me to use the
DominanceFrontier analysis, but just the normal DominatorTree.
> 
> So why did I not change the algorithm and remove the use of
DominanceFrontier? There are two reasons:
> 
> 1) It was and is not obvious for me how to do so
> 
> I did some analysis on SPEC and polyhedron.com and there are 3-8 times more
'refined' regions than 'simple' regions [1]. For the use in
Polly, I wanted to retain the ability to optimize 'refined' regions.
> 
> Unfortunately, until today I was not aware of an algorithm that has linear
complexity, is in practice fast and can calculate 'refined' regions.
> 
> 2) For many use cases the existing solution works perfectly fine
> 
> The RegionInfo analysis is obviously non-linear as it uses the
DominanceFrontier analysis. However, for me it works surprisingly well.
> When I committed it I tested it on the LLVM test suite, polyhedron.com and
the SPEC 2006 benchmarks [1]. I got the following timings (in seconds):
> 
> Name             DomTree  PostDomTree DomFrontier RegionTree
> SPEC 2006        1.109    0.911       0.525       0.662
> Polyhedron.com   0.034    0.029       0.016       0.022
> 
> On these examples DomFrontier and RegionTree calculation is very fast.
Also, I used RegionInfo for over two years in my daily Polly work. And I know
people who test Polly on their larger internal test suites. I got _never_ any
complains about RegionInfo speed issues. So RegionInfo is for now good enough
for my work.
> 
> I also heard that RegionInfo is used in some commercial compilers, but did
not hear of any performance problems there.
> 
> Reason 2) does not mean I am not in favor of improving RegionInfo. Very
much in contrast, I support any work that makes RegionInfo useable on the LLVM
machine cfgs or that allows it to be used as a clang default analysis pass. I
just raised that point to explain why I did not spend a large amount of time on
1)
> ======================================================================That's interesting. It looks like PostDomTree is the only thing we're
really concerned with pulling in. If RegionInfo can be further improved,
that's great, but I'm confident that any algorithm that does what
you're doing will require PostDomTree.

As a side note, in a perfectly efficient compiler, we should only have to pay
for either LoopInfo or Dominators, not both, since one can be efficiently
computed from the other. However, I think we're always stuck computing
PostDomTree from scratch whenever we need to determine control dependence, or
regions based on it.
> Andrew:
> > I'm dismayed that you depend on RegionInfo, but it is
understandable. > We could probably improve RegionInfo to be more efficient
and
> > self-contained (no DomFrontier), but I think we'd be much better
off
> > dropping RegionInfo and handling SEME regions. AFAIK CodeExtractor
> > can already handle that. Is there any particular reason you need
> > RegionInfo other than finding a connected set of blocks dominated by
> a cold block? Can FunctionSplitter just find its own regions?
> 
> Andrew, can you define what you mean by SEME region? As written above,
RegionInfo already detects SEME regions with a single entry block and multiple
exit edges that terminate at the same block. What is the _exact_ kind of SEME
region you are talking about? Does it require single entry edge or do you allow
various entry edges if they terminate at the same basic block?
> 
> Also, is there a good algorithm available to detect such regions. I am very
much in favor of replacing the existing RegionInfo algorithm with something
faster _and_ more generic.
I was calling your regions SESE, otherwise I need a new acronym!

I'm not considering the number of entry or exit edges, because restricting
them to end in the same block gives you the same properties as a single block
entry/exit AFAICT.

I'm not aware of a better algorithm for building RegionInfo when one of the
goals is to represent nested control dependence.

I don't know if we need a standalone SEME RegionInfo pass. Beyond LoopInfo,
I don't know what criteria would be used to pick the regions, but I
haven't read the papers you mention below.

Some passes, like FunctionSplitting, may have ad-hoc profile-driven region
formation logic. For them, the Region API is just a way to package a SEME-block
set of connected blocks.
> Also, supporting SEME regions with one entry block would be great
> as this would make the LoopInfo tree a subtree of the RegionTree. So
> a LoopTree would be a simple RegionTree that is already available by
default. Also, the code extractor
Yes, exactly.
> Chandler:
> > I'm dismayed that RegionInfo is still in the tree and not being
> > improved. ;] I think we should fix RegionInfo to be efficient and
> > reasonably well implemented. It as *already* handling SEME regions
> > except for the "ME" case of function returns (something that
is
> > trivial to fix).
> 
> I explained above, why RegionInfo is good enough for our cases and why
improving it is not straightforward. Advice how to make it better is highly
welcome.
I would have to do a lot more homework before I could suggest improving
RegionInfo. DomFrontiers seemed unnecessary, but if you say it's an
efficient and easy to reason about way of handling CFGs that aren't already
neatly nested, then I believe you.

The problem I was pointing out is that FunctionSplitting does not need to be
informed by control dependence. So it's pulling in a costly analysis
(PostDominators), and the only thing it gets in return is an unnecessary
limitation. FunctionSplitting should be able to work on any region that LoopInfo
can detect.
> Andrew:
>> There are two issues then, Region API vs. Region discovery.
>> 
>> A convenient API for visiting and traversing regions is nice, though by
>> design it's never needed in LLVM. A problem with the existing
RegionInfo
>> API is that it is superficially limited to SESE--though you say
>> otherwise.
> 
> Not exactly SESE, but yes. I don't see a reason to widen the API to a
more generic SEME region, in case there is an algorithm to detect such regions.
> 
> Andrew:
>> That may be right for some clients, but then those clients
>> likely don't need the region discovery that it provides. They can
>> probably get by with finding control equivalent regions, which is
>> trivial if you have postdominators--walk down in the dom tree and up in
>> the posdom tree at the same time.
> 
> Are you referring to the SESE regions as defined in that paper:
> "The Program Structure Tree: Computing Control Regions in Linear
Time"
I'm not referring to a specific paper. I was suggesting that a pass could
probably directly query control dependence without building RegionInfo. But if
the only expensive thing in RegionInfo is PostDominators, then it just comes
down to the interface that's most natural.
> As explained above, they have a very low resolution.
> 
> Andrew:
>> Maybe you just want a neat Region iterator API utility that could be
>> used by anyone doing region discovery, including RegionInfo. But you
>> wouldn't need to pull in RegionInfo analysis.
> 
> Sounds like a good idea.
> 
> Chandler:
> > It would be better to have such clarification in documentation,
> > ideally in the header files of these passes. This is a very
"meta"
> > point, but it is frustrating to contributors to have arbitrary
> > restrictions materialize only after designing an optimization. I
> > would rather that people are aware of the constraints they need to
> > work within if they want to implement a particular optimization pass.
> > Essentially, if there are analyses or passes which are known to be
> > unacceptable for the normal compilation pipeline, I think that would
> > be an important thing to mention in the high-level comments for the
> > pass. (It is possible that I missed such comments, or that I never
> > looked in the right place, but I feel like the fact that dom-frontier
> > and region-info is essentially on the chopping block should be more
> > clear than it currently is... ;]
> 
> True. We should clearly document which passes are acceptable in the default
optimization chain.
One (hopefully obvious) rule of thumb. When looking at the cost/benefit of a new
transformation, the compile time must include time spent in  analysis passes
that now get pulled in at a new point in the standard pass order. If you're
pulling in a new Analysis, you need to be able to show that it solves an
important problem for you in the most reasonably efficient way.

There may be some analysis passes that are currently in use but are out-of-favor
and we would like to remove. This is a collective "we", so my strategy
is to look at commit logs and ask around. There are many cases in which I would
have benefitted from more documentation. Stale docs hurt though.

-Andy 
> I also looked myself into papers again and discovered two interesting ones:
> 
> 1) Simplified Computation and Generalization of
>   the Refined Process Structure Tree
>   Azrtem Polyvyanyy, Jussi Vanhatalo, and Hagen Völzer, 2010
> 
> A very recent paper, that gives us high resolution SESE regions
('simple' and 'refined' and even a little bit more) in linear
time.
> They state this algorithm is simpler than the previous. Basically, it seems
to be equivalent to calculate the treeconnected components.
> 
> 2) Code Compaction of Matching Single-Entry Multiple-Exit Regions (2003)
>   Cached
>   Wen-ke Chen , Bengu Li , Rajiv Gupta, 2003
> 
> I missed this one earlier. They calculate SEME regions based on the control
dependence graph.
> 
> I am currently a little busy, but I like the idea of moving to SEME regions
and having a generic Region iterator API. We could test this API with the
current RegionInfo and the LoopInfo passes as region discovery algorithms (one
more precise, the other available by default). In case a more generic and faster
SEME region algorithm is implemented, the Region iterator API users would
automatically take advantage of it.
> 
> Cheers
> Tobi
> 
> 
> 
> 
> 
> 
> 
>

Tobias Grosser

2012-May-21 23:44 UTC

head link

[LLVMdev] Add a function splitting pass to LLVM which extracts cold regions into their own functions

On 05/21/2012 10:36 PM, Andrew Trick wrote:> Tobias,
>
> Thanks for taking the time to summarize all this. It's a great writeup.
I'm moving the thread to llvm-dev. My responses below.
>
[..]
>> Name             DomTree  PostDomTree DomFrontier RegionTree
>> SPEC 2006        1.109    0.911       0.525       0.662
>> Polyhedron.com   0.034    0.029       0.016       0.022
>>
>> On these examples DomFrontier and RegionTree calculation is very fast.
Also, I used RegionInfo for over two years in my daily Polly work. And I know
people who test Polly on their larger internal test suites. I got _never_ any
complains about RegionInfo speed issues. So RegionInfo is for now good enough
for my work.
>>
>> I also heard that RegionInfo is used in some commercial compilers, but
did not hear of any performance problems there.
>>
>> Reason 2) does not mean I am not in favor of improving RegionInfo. Very
much in contrast, I support any work that makes RegionInfo useable on the LLVM
machine cfgs or that allows it to be used as a clang default analysis pass. I
just raised that point to explain why I did not spend a large amount of time on
1)
>>
======================================================================>
> That's interesting. It looks like PostDomTree is the only thing
we're really concerned with pulling in. If RegionInfo can be further
improved, that's great, but I'm confident that any algorithm that does
what you're doing will require PostDomTree.
I thought the reason DomFrontier is a problem was the worst case 
quadratic complexity. I have never seen a case triggering that behavior, 
but I must admit I did not specifically look for it.
> As a side note, in a perfectly efficient compiler, we should only have to
pay for either LoopInfo or Dominators, not both, since one can be efficiently
computed from the other. However, I think we're always stuck computing
PostDomTree from scratch whenever we need to determine control dependence, or
regions based on it.
Yes, probably the PostDomtree or something equally expensive is 
necessary. I believe in case a transformation needs detailed control 
regions, we need to see if the benefits it gives are worth the added 
analysis overhead. In case we can get around calculating detailed 
control regions, we should definitely try to go this way.
>> Andrew:
>>> I'm dismayed that you depend on RegionInfo, but it is
understandable.>  We could probably improve RegionInfo to be more efficient
and
>>> self-contained (no DomFrontier), but I think we'd be much
better off
>>> dropping RegionInfo and handling SEME regions. AFAIK CodeExtractor
>>> can already handle that. Is there any particular reason you need
>>> RegionInfo other than finding a connected set of blocks dominated
by>  a cold block? Can FunctionSplitter just find its own regions?
>>
>> Andrew, can you define what you mean by SEME region? As written above,
RegionInfo already detects SEME regions with a single entry block and multiple
exit edges that terminate at the same block. What is the _exact_ kind of SEME
region you are talking about? Does it require single entry edge or do you allow
various entry edges if they terminate at the same basic block?
>>
>> Also, is there a good algorithm available to detect such regions. I am
very much in favor of replacing the existing RegionInfo algorithm with something
faster _and_ more generic.
>
> I was calling your regions SESE, otherwise I need a new acronym!
>
> I'm not considering the number of entry or exit edges, because
restricting them to end in the same block gives you the same properties as a
single block entry/exit AFAICT.
Yes. It is trivial to transform between them, but it is a lot easier to 
detect SESE regions if they have just a single entry and exit edge.
> I'm not aware of a better algorithm for building RegionInfo when one of
the goals is to represent nested control dependence.
Even without. I did not find a way to check for two given basic blocks, 
if they form a SESE region without using DomFrontier (just using DomTree 
and PostDomTree). It is trivial if there is a single entry and a single 
exit edge, but for multiple edges (ending at the same block) I could not 
find a good solution.

(Solving this problem, would remove the requirement for DomFrontier in 
RegionInfo)
> I don't know if we need a standalone SEME RegionInfo pass. Beyond
LoopInfo, I don't know what criteria would be used to pick the regions, but
I haven't read the papers you mention below.
> Some passes, like FunctionSplitting, may have ad-hoc profile-driven region
formation logic. For them, the Region API is just a way to package a SEME-block
set of connected blocks.
>
>> Also, supporting SEME regions with one entry block would be great
>> as this would make the LoopInfo tree a subtree of the RegionTree. So
>> a LoopTree would be a simple RegionTree that is already available by
default. Also, the code extractor
>
> Yes, exactly.
>
>> Chandler:
>>> I'm dismayed that RegionInfo is still in the tree and not being
>>> improved. ;] I think we should fix RegionInfo to be efficient and
>>> reasonably well implemented. It as *already* handling SEME regions
>>> except for the "ME" case of function returns (something
that is
>>> trivial to fix).
>>
>> I explained above, why RegionInfo is good enough for our cases and why
improving it is not straightforward. Advice how to make it better is highly
welcome.
>
> I would have to do a lot more homework before I could suggest improving
RegionInfo. DomFrontiers seemed unnecessary, but if you say it's an
efficient and easy to reason about way of handling CFGs that aren't already
neatly nested, then I believe you.
I said it's good enough for my uses. Different tools may have different 
requirements.
> The problem I was pointing out is that FunctionSplitting does not need to
be informed by control dependence. So it's pulling in a costly analysis
(PostDominators), and the only thing it gets in return is an unnecessary
limitation. FunctionSplitting should be able to work on any region that LoopInfo
can detect.
Sure. If FunctionSplitting can be written without using the RegionInfo 
analysis, avoiding the (re)calculation of it is the way to go. To me it 
seems RegionInfo gives more details. It can e.g. give separate regions 
for the if/else parts of a condition. If this level of detail is needed, 
should to be evaluated. For this it seems to be good to define generic 
regions that can either be calculated based on LoopInfo or on 
RegionInfo. Like this the same pass can be used with different analysis 
depending on how much details are required.
>> As explained above, they have a very low resolution.
>>
>> Andrew:
>>> Maybe you just want a neat Region iterator API utility that could
be
>>> used by anyone doing region discovery, including RegionInfo. But
you
>>> wouldn't need to pull in RegionInfo analysis.
>>
>> Sounds like a good idea.
>>
>> Chandler:
>>> It would be better to have such clarification in documentation,
>>> ideally in the header files of these passes. This is a very
"meta"
>>> point, but it is frustrating to contributors to have arbitrary
>>> restrictions materialize only after designing an optimization. I
>>> would rather that people are aware of the constraints they need to
>>> work within if they want to implement a particular optimization
pass.
>>> Essentially, if there are analyses or passes which are known to be
>>> unacceptable for the normal compilation pipeline, I think that
would
>>> be an important thing to mention in the high-level comments for the
>>> pass. (It is possible that I missed such comments, or that I never
>>> looked in the right place, but I feel like the fact that
dom-frontier
>>> and region-info is essentially on the chopping block should be more
>>> clear than it currently is... ;]
>>
>> True. We should clearly document which passes are acceptable in the
default optimization chain.
>
> One (hopefully obvious) rule of thumb. When looking at the cost/benefit of
a new transformation, the compile time must include time spent in  analysis
passes that now get pulled in at a new point in the standard pass order. If
you're pulling in a new Analysis, you need to be able to show that it solves
an important problem for you in the most reasonably efficient way.
Yes, that's why I want to the RegionInfo to give high detail results. 
They are costly, but allow us to measure the benefit of a high 
resolution. When pulling passes in the default optimization chain, it 
obviously needs to be understood if the high resolution is necessary or 
if a faster, lower resolution analysis is the better option.
> There may be some analysis passes that are currently in use but are
out-of-favor and we would like to remove. This is a collective "we",
so my strategy is to look at commit logs and ask around. There are many cases in
which I would have benefitted from more documentation. Stale docs hurt though.
Maybe we should just add your 'rule of thumb' to some documentation and 
mention the PostDom or RegionInfo pass as an example.

Tobi

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - May 2012 - [LLVMdev] Add a function splitting pass to LLVM which extracts cold regions into their own functions

[LLVMdev] Add a function splitting pass to LLVM which extracts cold regions into their own functions

[LLVMdev] Add a function splitting pass to LLVM which extracts cold regions into their own functions

Maybe Matching Threads