thr3ads.net - llvm dev - [LLVMdev] Make LoopBase inherit from "RegionBase"? [Jan 2010]

If this information is useful, please help other people find it:
Share via:

Tobias Grosser

2010-Jan-13 01:14 UTC

[LLVMdev] Make LoopBase inherit from "RegionBase"?

On 01/13/10 00:56, Jan Sjodin wrote:> Why not use the "standard" algorithm for detecting SESE-regions
and building a program structure tree?
> It should handle everything you want. It also becomes much simpler to
specify a connected SESE-region
> by entry/exit edges, while a disconnected region is specified by entry/exit
blocks. Only defining regions on
> blocks is not enough to be able to quickly determine how to replace/move a
region in a CFG.
> 
> The algorithm can be found in this paper:
> The Program Structure Tree: Computing Control Regions in Linear Time
> by Richard Johnson ,  David Pearson , KeshavPingali
Hi Jan,

great to read you again.

And thanks for pointing me to this paper, I read it but did some further
research. I like the idea using edges to define the regions, however it
does not catch all regions.

Defining regions using just edges stops us from detection a lot of very
common regions.

Example:

A very common CFG:

      \ /
       1
      / \
     2   3
     |   |
     4   6
     |   |
     5   7
      \ /
       8
      / \
     9   10
     |   |
    11   12
      \ /
       13
      / \

I would detect these two regions:

Region A: 1 -> 8   containing {1,2,3,4,5,6,7}
Region B: 8 -> 13  containing {8,9,10,11,12}

If I use edges to define the regions the detection is not possible at all.

After region detection the CFG can always be split up to create single
entry single exit edges, if they are needed e.g. for code generation.

      \ /
      1_a
       |
       1
      / \
     2   3
     |   |
     4   6
     |   |
     5   7
      \ /
      8_a
       |
       8
      / \
     9   10
     |   |
    11   12
      \ /
      13_a
       |
       13
      / \

Now the regions can be defined using edges:

Region A: (1_a,1) -> (8_a, 8)   containing {1,2,3,4,5,6,7,8_a}
Region B: (8_a, 8) -> (13_a, 13)  containing {8,9,10,11,12,13_a}

In general this approach saves a preliminary pass that has to insert new
bbs, to generate these edges. As we do not have to modify the CFG other
passes like dominance information are still valid, and we do not have to
create a lot of auxiliary bbs, to be able to detect all regions. This
saves memory and runtime. In general it is probably not too easy to
decide where to insert these bbs either:

CFG:
        0
        |
        1
       / |
      2   |
     / \   3
    4   5  |
    |   |  |
    6   7  8
     \  | /
      \ |/       region A: 1 -> 9 {1,2,3,4,5,6,7,8}
        9        region B: 2 -> 9 {2,4,5,6,7}

So we need one bb that joins 6 and 7 and one that joins the two regions

CFG:    0
        |
        1
       / |
      2   |
     / \   3
    4   5  |
    |   |  |
    6   7  8
     \  |  |
      \ |  |     region A: (0,1) -> (9b,9)  {1,2,3,4,5,6,7,8,9a,9b}
       9a  |     region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a}
        \ /
         9b
         |
         9

My approach is comparable to this paper:

The Refined Process Structure Tree by Jussi Vanhatalo, Hagen Völzer,
Jana Koehler

The implementation however takes advantage of the existence of
Dominance/PostDominance information. Therefore it is simpler and
hopefully faster. At the moment run time is comparable to dominance tree
calculation.

If you want, have a look into some results I got with a pass extracting
maximal non trivial regions that do not contain loops from libbz2:

http://tobias.osaft.eu/llvm/region/bz2NoLoops/

Interesting ones:

regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png
has a lot of exit edges.

regions_without_loops.bzopen_or_bzdopen.dot.png
the first region has two entry edges. One is the loop latch.

(Keep in mind all regions have the same color, so if it seems there is
an edge into a region, there are just two regions close by)

Without a prepass that exposes the edges almost no region could be
detected with the "standard" approach.

It would be great to hear your opinion about this.

See you

Tobias

Jan Sjodin

2010-Jan-13 04:09 UTC

head link

[LLVMdev] Make LoopBase inherit from "RegionBase"?

Hi Tobias
> In general this approach saves a preliminary pass that has to insert new
> bbs, to generate these edges. As we do not have to modify the CFG other
> passes like dominance information are still valid, and we do not have to
> create a lot of auxiliary bbs, to be able to detect all regions. This
> saves memory and runtime. In general it is probably not too easy to
> decide where to insert these bbs either:
The general rule is to split all blocks with multiple in-edges and multiple
out-edges
into blocks with either multiple in-edges or multiple out-edges, but not both.
One option is to keep this as an invariant throughout the compiler and make use
of the merge blocks (multiple in-edges) to contain only PHI-nodes, and all other
code
in regular basic blocks. There are different variations on this that may or may
not be
useful.
> CFG:
>        0
>        |
>        1
>       / |
>      2   |
>     / \   3
>    4   5  |
>    |   |  |
>    6   7  8
>     \  | /
>      \ |/       region A: 1 -> 9 {1,2,3,4,5,6,7,8}
>        9        region B: 2 -> 9 {2,4,5,6,7}
>
> So we need one bb that joins 6 and 7 and one that joins the two regions
>
> CFG:    0
>        |
>        1
>       / |
>      2   |
>     / \   3
>    4   5  |
>    |   |  |
>    6   7  8
>     \  |  |
>      \ |  |     region A: (0,1) -> (9b,9)  {1,2,3,4,5,6,7,8,9a,9b}
>       9a  |     region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a}
>        \ /
>         9b
>         |
>         9
It is fairly simple to use the information from the algorithm to decide
where those merges should be inserted to get the expected regions.
This may be needed in the cases where a sub-region is too complicated
to be represented and must be abstracted into a "black box". 
> My approach is comparable to this paper:
> The Refined Process Structure Tree by JussiVanhatalo, Hagen Völzer,
> Jana Koehler
I was looking through some slides that described their algorithm. One case that
seems to be legal is this:

      |
     Entry
    /    \
  R0      R1
    \     /
     Exit
       |

With two fragments: Entry->R0->Exit and Entry->R1->Exit, which means
that a fragment cannot be identified using only the entry and exit blocks, but
the internal blocks or edges will also need to be listed. I don't know if
this is
relevant to your implementation.
> The implementation however takes advantage of the existence of
> Dominance/PostDominance information. Therefore it is simpler and
> hopefully faster. At the moment run time is comparable to dominance tree
> calculation.
Both algorithms are linear so there is really no big difference in time imo. 
I believe the biggest difference that you mention is that you can capture more 
complicated regions without having to modify the CFG with the current
algorithm.
> If you want, have a look into some results I got with a pass extracting
> maximal non trivial regions that do not contain loops from libbz2:
> 
> http://tobias.osaft.eu/llvm/region/bz2NoLoops/
> 
> Interesting ones:
> 
> regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png
> has a lot of exit edges.
I think this example proves the strengths and weaknesses of both
approaches. Making that region into structured control flow would add a lot
of additional blocks. This will also happen after generating code
from the polyhedral model, so either way the cost is there if the optimization
is successful.

The second  case is where the optimization fails (no profitable
transformation found) and the CFG can remain untouched.  

The third case is if one of those blocks contains something complicated. 
I believe the current algorithm simply fails and cannot detect the region. If
the
CFG is modified this would allow an internal SESE-region to become a black box,
and the
the outer regions could be optimized. 
> regions_without_loops.bzopen_or_bzdopen.dot.png
> the first region has two entry edges. One is the loop latch.
> (Keep in mind all regions have the same color, so if it seems there is
> an edge into a region, there are just two regions close by)
>
> Without a prepass that exposes the edges almost no region could be
> detected with the "standard" approach.
Indeed the CFG will have to be modified for these cases. I it seems to me that
the trade-off
between the two approaches is that the algorithm that you currently have is a
cheaper up
front, but may be less capable in some cases, while the "standard"
algorithm will be more
expensive, but can handle problematic regions better. Would you agree?

- Jan

Apparently Analagous Threads

Search for more apparently analagous threads

llvm dev - Jan 2010 - [LLVMdev] Make LoopBase inherit from "RegionBase"?

[LLVMdev] Make LoopBase inherit from "RegionBase"?

[LLVMdev] Make LoopBase inherit from "RegionBase"?

Apparently Analagous Threads