thr3ads.net - llvm dev - [LLVMdev] Make LoopBase inherit from "RegionBase"? [Jan 2010]

Tobias Grosser

2010-Jan-13 09:19 UTC

[LLVMdev] Make LoopBase inherit from "RegionBase"?

On 01/13/10 05:09, Jan Sjodin wrote:
> Hi Tobias
> 
>> In general this approach saves a preliminary pass that has to insert new
>> bbs, to generate these edges. As we do not have to modify the CFG other
>> passes like dominance information are still valid, and we do not have to
>> create a lot of auxiliary bbs, to be able to detect all regions. This
>> saves memory and runtime. In general it is probably not too easy to
>> decide where to insert these bbs either:
> 
> The general rule is to split all blocks with multiple in-edges and multiple
> out-edges into blocks with either multiple in-edges or multiple out-edges,
> but not both.

This is not sufficient, as shown in the example below. It would allow
only one region.

> One option is to keep this as an invariant throughout the compiler and make
> use of the merge blocks (multiple in-edges) to contain only PHI-nodes, and
> all other code in regular basic blocks. There are different variations on
> this that may or may not be useful.

This might be possible, however probably doubling the number of bbs.
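
The splitting rule quoted above can be sketched in a few lines of Python. This is only an illustration on a plain successor-list graph, not LLVM code; the function name `split_mixed_blocks` and the `.merge` suffix are hypothetical.

```python
def split_mixed_blocks(succs):
    """Split every block that has several in-edges and several out-edges.

    All in-edges are redirected to a new "<name>.merge" block (hypothetical
    naming), which has a single edge to the original block.
    """
    # Collect predecessors from the successor lists.
    preds = {b: [] for b in succs}
    for b, targets in succs.items():
        for t in targets:
            preds[t].append(b)

    result = {b: list(t) for b, t in succs.items()}
    for b in succs:
        if len(preds[b]) > 1 and len(succs[b]) > 1:
            merge = b + ".merge"
            # Redirect every edge p -> b to p -> merge.
            for p in preds[b]:
                result[p] = [merge if t == b else t for t in result[p]]
            result[merge] = [b]
    return result


# Block "x" has two in-edges and two out-edges, so it is split.
cfg = {"a": ["x"], "b": ["x"], "x": ["c", "d"], "c": [], "d": []}
split = split_mixed_blocks(cfg)
```

After the call, `x.merge` carries the two in-edges and `x` keeps the two out-edges. Blocks that have only multiple in-edges or only multiple out-edges are left alone.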
> 
>> CFG:
>>        0
>>        |
>>        1
>>       / |
>>      2   |
>>     / \   3
>>    4   5  |
>>    |   |  |
>>    6   7  8
>>     \  | /
>>      \ |/       region A: 1 -> 9 {1,2,3,4,5,6,7,8}
>>        9        region B: 2 -> 9 {2,4,5,6,7}
>>
>> So we need one bb that joins 6 and 7 and one that joins the two regions
>>
>> CFG:    0
>>        |
>>        1
>>       / |
>>      2   |
>>     / \   3
>>    4   5  |
>>    |   |  |
>>    6   7  8
>>     \  |  |
>>      \ |  |     region A: (0,1) -> (9b,9)  {1,2,3,4,5,6,7,8,9a,9b}
>>       9a  |     region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a}
>>        \ /
>>         9b
>>         |
>>         9
> 
> It is fairly simple to use the information from the algorithm to decide
> where those merges should be inserted to get the expected regions.
> This may be needed in the cases where a sub-region is too complicated
> to be represented and must be abstracted into a "black box".

From which algorithm? The program structure tree does not give this
information, does it?

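As an illustration of the two CFGs quoted above, here is a small Python sketch (not LLVM code) that inserts the merge blocks 9a and 9b and lists the edges leaving each region. The helper names `insert_merge_block` and `exit_edges` are hypothetical, and block 9 is assumed to have no successors because the diagram does not show any.

```python
def insert_merge_block(succs, sources, target, name):
    """Route the edges source -> target (for each source) through a new block."""
    result = {b: list(t) for b, t in succs.items()}
    for src in sources:
        result[src] = [name if t == target else t for t in result[src]]
    result[name] = [target]
    return result


def exit_edges(succs, blocks):
    """All edges that start inside `blocks` and end outside of it."""
    return sorted((b, t) for b in blocks for t in succs[b] if t not in blocks)


# First CFG from the mail (successor lists).
cfg = {
    "0": ["1"], "1": ["2", "3"], "2": ["4", "5"], "3": ["8"],
    "4": ["6"], "5": ["7"], "6": ["9"], "7": ["9"], "8": ["9"], "9": [],
}
region_a = {"1", "2", "3", "4", "5", "6", "7", "8"}
region_b = {"2", "4", "5", "6", "7"}

# Second CFG: 9a joins 6 and 7, 9b joins region B (via 9a) and block 8.
with_9a = insert_merge_block(cfg, ["6", "7"], "9", "9a")
with_9b = insert_merge_block(with_9a, ["9a", "8"], "9", "9b")

before_b = exit_edges(cfg, region_b)
after_b = exit_edges(with_9b, region_b | {"9a"})
after_a = exit_edges(with_9b, region_a | {"9a", "9b"})
```

In the unmodified CFG region B leaves through two edges (6 -> 9 and 7 -> 9). After the merge blocks are inserted, each region has the single exit edge given in the second diagram: (9a,9b) for region B and (9b,9) for region A.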
>> My approach is comparable to this paper:
>> The Refined Process Structure Tree by Jussi Vanhatalo, Hagen Völzer,
>> Jana Koehler
> 
> I was looking through some slides that described their algorithm. One case
> that seems to be legal is this:
> 
>       |
>      Entry
>     /    \
>   R0      R1
>     \     /
>      Exit
>        |
> 
> With two fragments: Entry->R0->Exit and Entry->R1->Exit, which means
> that a fragment cannot be identified using only the entry and exit blocks,
> but the internal blocks or edges will also need to be listed. I don't know
> if this is relevant to your implementation.

No. The ideas are comparable, however I believe their implementation is
a little complicated. ;-)

I would mark the regions as:

Region A: R0 -> Exit, containing {R0}
Region B: R1 -> Exit, containing {R1}
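
A minimal Python sketch of how an (entry, exit) pair can name a set of blocks, assuming the convention that the entry block belongs to the region and the exit block does not. That convention and the helper name `region_blocks` are assumptions for illustration, not a description of the actual implementation.

```python
def region_blocks(succs, entry, exit_block):
    """Blocks reachable from `entry` without passing through `exit_block`."""
    seen, stack = set(), [entry]
    while stack:
        block = stack.pop()
        if block == exit_block or block in seen:
            continue
        seen.add(block)
        stack.extend(succs[block])
    return seen


# The diamond from the quoted example.
diamond = {"Entry": ["R0", "R1"], "R0": ["Exit"], "R1": ["Exit"], "Exit": []}
region_a = region_blocks(diamond, "R0", "Exit")
region_b = region_blocks(diamond, "R1", "Exit")
```

Under this convention the two fragments are told apart by their entry blocks alone: `region_a` is `{"R0"}` and `region_b` is `{"R1"}`.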

>> The implementation however takes advantage of the existence of
>> Dominance/PostDominance information. Therefore it is simpler and
>> hopefully faster. At the moment run time is comparable to dominance tree
>> calculation.
> 
> Both algorithms are linear so there is really no big difference in time
> imo.

Sure. However in terms of maintainability it is nice to be able to reuse
existing analysis instead of writing another triconnected component
analysis upfront.

> I believe the biggest difference that you mention is that you can capture
> more complicated regions without having to modify the CFG with the current
> algorithm.

Yes.

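The thread does not spell out how the dominance and post-dominance information is used, so the following Python sketch only shows what that information looks like for the first example CFG. It uses the textbook iterative data-flow formulation rather than LLVM's dominator tree construction, assumes every block is reachable from the root, and assumes block 9 is the only exit.

```python
def dominators(succs, root):
    """Iterative data-flow computation of dominator sets."""
    preds = {b: [] for b in succs}
    for b, targets in succs.items():
        for t in targets:
            preds[t].append(b)

    dom = {b: set(succs) for b in succs}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for b in succs:
            if b == root:
                continue
            # A block is dominated by itself and by whatever dominates
            # all of its predecessors.
            new = set.intersection(*(dom[p] for p in preds[b])) | {b}
            if new != dom[b]:
                dom[b], changed = new, True
    return dom


def reverse(succs):
    """Flip every edge, so post-dominators become dominators."""
    rev = {b: [] for b in succs}
    for b, targets in succs.items():
        for t in targets:
            rev[t].append(b)
    return rev


cfg = {
    "0": ["1"], "1": ["2", "3"], "2": ["4", "5"], "3": ["8"],
    "4": ["6"], "5": ["7"], "6": ["9"], "7": ["9"], "8": ["9"], "9": [],
}
dom = dominators(cfg, "0")
pdom = dominators(reverse(cfg), "9")
```

For this graph block 9 post-dominates both 1 and 2, while only 1 (and 0) dominate 9. Block 2 does not dominate 9 because of the path through 3 and 8.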
>> If you want, have a look into some results I got with a pass extracting
>> maximal non trivial regions that do not contain loops from libbz2:
>>
>> http://tobias.osaft.eu/llvm/region/bz2NoLoops/
>>
>> Interesting ones:
>>
>> regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png
>> has a lot of exit edges.
> 
> I think this example proves the strengths and weaknesses of both
> approaches. Making that region into structured control flow would add a lot
> of additional blocks. This will also happen after generating code
> from the polyhedral model, so either way the cost is there if the
> optimization is successful.

Yes, but just in this case and not for regions where the model cannot be
applied. If the regions pass is used for analysis purposes, nothing has
to be touched.

> The second case is where the optimization fails (no profitable
> transformation found) and the CFG can remain untouched.
>
> The third case is if one of those blocks contains something complicated.
> I believe the current algorithm simply fails and cannot detect the region.

Which algorithm? The one in Graphite? Or the region detection I wrote
here? This is just plain region detection, that does not even look at
the content but builds a region tree (program structure tree). It just
detects every possible region.
The selection would be a later pass.

> If the CFG is modified this would allow an internal SESE-region to become a
> black box, and the outer regions could be optimized.

This is an optimization, however I think it is orthogonal to the region
detection problem. Say it works with any algorithm.

>> regions_without_loops.bzopen_or_bzdopen.dot.png
>> the first region has two entry edges. One is the loop latch.
>> (Keep in mind all regions have the same color, so if it seems there is
>> an edge into a region, there are just two regions close by)
>>
>> Without a prepass that exposes the edges almost no region could be
>> detected with the "standard" approach.
> 
> Indeed the CFG will have to be modified for these cases. It seems to me
> that the trade-off between the two approaches is that the algorithm that
> you currently have is cheaper up front, but may be less capable in some
> cases, while the "standard" algorithm will be more expensive, but can
> handle problematic regions better. Would you agree?

I agree that the algorithm I have is cheaper upfront, but I do not yet
see a case where the algorithm is less capable. Would you mind giving
an example or highlighting the relevant part of the discussion?

Thanks a lot

Tobias

Jan Sjodin

2010-Jan-20 17:05 UTC

[LLVMdev] Make LoopBase inherit from "RegionBase"?

>>> bbs, to generate these edges. As we do not have to modify the CFG other
>>> passes like dominance information are still valid, and we do not have to
>>> create a lot of auxiliary bbs, to be able to detect all regions. This
>>> saves memory and runtime. In general it is probably not too easy to
>>> decide where to insert these bbs either:
>>
>> The general rule is to split all blocks with multiple in-edges and
>> multiple out-edges into blocks with either multiple in-edges or multiple
>> out-edges, but not both.
> This is not sufficient, as shown in the example below. It would allow
> only one region.

Yes, but with the insertion of merge-blocks it will allow more (see below).

>> One option is to keep this as an invariant throughout the compiler and
>> make use of the merge blocks (multiple in-edges) to contain only
>> PHI-nodes, and all other code in regular basic blocks. There are different
>> variations on this that may or may not be useful.
> This might be possible, however probably doubling the number of bbs.

I don't know if it will be double, but there will be more basic blocks for
sure.

>>> CFG:
>>>        0
>>>        |
>>>        1
>>>       / |
>>>      2   |
>>>     / \   3
>>>    4   5  |
>>>    |   |  |
>>>    6   7  8
>>>     \  | /
>>>      \ |/       region A: 1 -> 9 {1,2,3,4,5,6,7,8}
>>>        9        region B: 2 -> 9 {2,4,5,6,7}
>>>
>>> So we need one bb that joins 6 and 7 and one that joins the two regions
>>>
>>> CFG:    0
>>>        |
>>>        1
>>>       / |
>>>      2   |
>>>     / \   3
>>>    4   5  |
>>>    |   |  |
>>>    6   7  8
>>>     \  |  |
>>>      \ |  |     region A: (0,1) -> (9b,9)  {1,2,3,4,5,6,7,8,9a,9b}
>>>       9a  |     region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a}
>>>        \ /
>>>         9b
>>>         |
>>>         9
>>
>> It is fairly simple to use the information from the algorithm to decide
>> where those merges should be inserted to get the expected regions.
>> This may be needed in the cases where a sub-region is too complicated
>> to be represented and must be abstracted into a "black box".
> From which algorithm?  The program structure tree does not give this
> information, does it?

The algorithm that computes the SESE-regions can be used to determine
where the merge-nodes should be inserted. There are a couple of ways
of doing it, but if the bracket sets on two edges can be intersected to
match a third edge (which dominates the first two), you can insert a
merge block for the two edges. You don't have to compute the
dominators, but it helps to explain the problem that way.

>>> My approach is comparable to this paper:
>>> The Refined Process Structure Tree by Jussi Vanhatalo, Hagen Völzer,
>>> Jana Koehler
>>
>> I was looking through some slides that described their algorithm. One case
>> that seems to be legal is this:
>>
>>       |
>>      Entry
>>     /    \
>>   R0      R1
>>     \     /
>>      Exit
>>        |
>>
>> With two fragments: Entry->R0->Exit and Entry->R1->Exit, which means
>> that a fragment cannot be identified using only the entry and exit blocks,
>> but the internal blocks or edges will also need to be listed. I don't know
>> if this is relevant to your implementation.
>
> No. The ideas are comparable, however I believe their implementation is
> a little complicated. ;-)

Do you have the same definition of a region and entry/exit blocks as they do?

> I would mark the regions as:
>
> Region A: R0 -> Exit, containing {R0}
> Region B: R1 -> Exit, containing {R1}

Is the entry always contained and is the exit never contained, or is that
specified per region? Depending on the restrictions of entry and exit blocks
a loop with a single basic block cannot be an entry or exit by itself.
Example:

    |
    A
   /| _
  / |/ \
 B  R  |
  \ |\_/
   \|
    C
    |

If you only care about R in this case how is the region formed?

>>> The implementation however takes advantage of the existence of
>>> Dominance/PostDominance information. Therefore it is simpler and
>>> hopefully faster. At the moment run time is comparable to dominance tree
>>> calculation.
>>
>> Both algorithms are linear so there is really no big difference in time
>> imo.
> Sure. However in terms of maintainability it is nice to be able to reuse
> existing analysis instead of writing another triconnected component
> analysis upfront.
>
>> I believe the biggest difference that you mention is that you can capture
>> more complicated regions without having to modify the CFG with the current
>> algorithm.
> Yes.
>
>>> If you want, have a look into some results I got with a pass extracting
>>> maximal non trivial regions that do not contain loops from libbz2:
>>>
>>> http://tobias.osaft.eu/llvm/region/bz2NoLoops/
>>>
>>> Interesting ones:
>>>
>>> regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png
>>> has a lot of exit edges.
>>
>> I think this example proves the strengths and weaknesses of both
>> approaches. Making that region into structured control flow would add a lot
>> of additional blocks. This will also happen after generating code
>> from the polyhedral model, so either way the cost is there if the
>> optimization is successful.
> Yes, but just in this case and not for regions where the model cannot be
> applied. If the regions pass is used for analysis purposes, nothing has
> to be touched.
>
>> The second case is where the optimization fails (no profitable
>> transformation found) and the CFG can remain untouched.
>>
>> The third case is if one of those blocks contains something complicated.
>> I believe the current algorithm simply fails and cannot detect the region.
> Which algorithm? The one in Graphite? Or the region detection I wrote
> here? This is just plain region detection, that does not even look at
> the content but builds a region tree (program structure tree). It just
> detects every possible region.
> The selection would be a later pass.

My assumption was that there is a selection in there somewhere.
Do you plan to refine the regions in the selection phase in any way?

>> If the CFG is modified this would allow an internal SESE-region to become a
>> black box, and the outer regions could be optimized.
> This is an optimization, however I think it is orthogonal to the region
> detection problem. Say it works with any algorithm.

I believe that creating a black-box will map a lot more cleanly to an edge-based
region definition, since block-based may include multiple entry/exit sub-regions
that will not encapsulate control flow in a reasonable way.

>>> regions_without_loops.bzopen_or_bzdopen.dot.png
>>> the first region has two entry edges. One is the loop latch.
>>> (Keep in mind all regions have the same color, so if it seems there is
>>> an edge into a region, there are just two regions close by)
>>>
>>> Without a prepass that exposes the edges almost no region could be
>>> detected with the "standard" approach.
>>
>> Indeed the CFG will have to be modified for these cases. It seems to me
>> that the trade-off between the two approaches is that the algorithm that
>> you currently have is cheaper up front, but may be less capable in some
>> cases, while the "standard" algorithm will be more expensive, but can
>> handle problematic regions better. Would you agree?
>
> I agree that the algorithm I have is cheaper upfront, but I do not yet
> see a case where the algorithm is less capable. Would you mind giving
> an example or highlighting the relevant part of the discussion?

With the insertion of extra merge-blocks the code becomes more structured and
the PST can be refined further. A more fine-grained PST may allow more cases
to be handled.

Thanks

Jan