>>> bbs, to generate these edges. As we do not have to modify the CFG
other
>>> passes like dominance information are still valid, and we do not
have to
>>> create a lot of auxiliary bbs, to be able to detect all regions.
This
>>> saves memory and runtime. In general it is probably not too easy to
>>> decide where to insert these bbs either:
>>
>> The general rule is to split all blocks with multiple in-edges and
multiple out-edges
>> into blocks with either multiple in-edges or multiple out-edges, but
not both.
> This is not sufficient, as shown in the example below. It would allow
> only one region.
Yes, but with the insertion of merge-blocks it will allow more (see below).
>> One option is to keep this as an invariant throughout the compiler and
make use
>> of the merge blocks (multiple in-edges) to contain only PHI-nodes, and
all other code
>> in regular basic blocks. There are different variations on this that
may or may not be
>> useful.
> This might be possible, however probably doubling the number of bbs.
I don't know if it will be double, but there will be more basic blocks for
sure.
>>>> CFG:
>>> 0
>>> |
>>> 1
>>> / |
>>> 2 |
>>> / \ 3
>>> 4 5 |
>>> | | |
>>> 6 7 8
>>> \ | /
>>> \ |/ region A: 1 -> 9 {1,2,3,4,5,6,7,8}
>>> 9 region B: 2 -> 9 {2,4,5,6,7}
>>>
>>> So we need one bb that joins 6 and 7 and one that joins the two
regions
>>>
>>> CFG: 0
>>> |
>>> 1
>>> / |
>>> 2 |
>>> / \ 3
>>> 4 5 |
>>> | | |
>>> 6 7 8
>>> \ | |
>>> \ | | region A: (0,1) -> (9b,9)
{1,2,3,4,5,6,7,8,9a,9b}
>>> 9a | region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a}
>>> \ /
>>> 9b
>>> |
>>> 9
>>
>> It is fairly simple to use the information from the algorithm to decide
>> where those merges should be inserted to get the expected regions.
>> This may be needed in the cases where a sub-region is too complicated
>> to be represented and must be abstracted into a "black box".
> From which algorithm? The program structure tree does not give this
> information, does it?
The algorithm that computes the SESE-regions can be used to determine
where the merge-nodes should be inserted. There are a couple of ways
of doing it, but if the bracket sets on two eges can be intersected to
match a third edge (which dominates the first two), you can insert a
merge block for the two edges. You don't have to compute the
dominators, but it helps to explain the problem that way.
>>>> My approach is comparable to this paper:
>>>> The Refined Process Structure Tree by JussiVanhatalo, Hagen
Völzer,
>>>> Jana Koehler
>>>
>>> I was looking through some slides that described their algorithm.
One case that
>>> seems to be legal is this:
>>>
>>> |
>>> Entry
>>> / \
>>> R0 R1
>>> \ /
>>> Exit
>>> |
>>>
>>> With two fragments: Entry->R0->Exit and
Entry->R1->Exit, which means
>>> that a fragment cannot be identified using only the entry and exit
blocks, but
>>> the internal blocks or edges will also need to be listed. I
don't know if this is
>>> relevant to your implementation.
>
> No. The ideas are comparable, however I believe their implementation is
> a little complicated. ;-)
Do you have the same definition of a region and entry/exit blocks as they do?
> I would mark the regions as:
>
> Region A: R0 -> Exit, containing {R0}
> Region B: R1 -> Exit, containing {R1}
Is the entry always contained and is the exit never contained, or is that
specified
per region? Depending on the restrictions of entry and exit blocks a loop with a
single
basic block cannot be an entry or exit by itself. Example:
|
A
/| _
/ |/ \
B R |
\ |\_/
\|
C
|
If you only care about R in this case how is the region formed?
>>> The implementation however takes advantage of the existence of
>>> Dominance/PostDominance information. Therefore it is simpler and
>>> hopefully faster. At the moment run time is comparable to dominance
tree
>>> calculation.
>>
>> Both algorithms are linear so there is really no big difference in time
imo.
> Sure. However in terms of maintainability it is nice to be able to reuse
> existing analysis instead of write another triconnected component
> analysis upfront.
>
>> I believe the biggest difference that you mention is that you can
capture more
>> complicated regions without having to modify the CFG with the current
>> algorithm.
> Yes.
>
>>> If you want, have a look into some results I got with a pass
extracting
>>> maximal non trivial regions that do not contain loops from libbz2:
>>>
>>> http://tobias.osaft.eu/llvm/region/bz2NoLoops/
>>>
>>> Interesting ones:
>>>
>>> regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png
>>> has a lot of exit edges.
>>
>> I think this example proves the strengths and weaknesses of both
>> approaches. Making that region into structured control flow would add a
lot
>> of additional blocks. This will also happen after generating code
>> from the polyhedral model, so either way the cost is there if the
optimization
>> is successful.
> Yes, but just in this case and not for regions where the model cannot be
> applied. If the regions pass is used for analysis purposes, nothing has
> to be touched.
>
>> The second case is where the optimization fails (no profitable
>> transformation found) and the CFG can remain untouched.
>>
>> The third case is if one of those blocks contains something
complicated.
>> I believe the current algorithm simply fails and cannot detect the
region.
> Which algorithm? The one in Graphite? Or the region detection I wrote
> here? This is just plain region detection, that does not even look at
> the content but builds a region tree (program structure tree). It just
> detects every possible region.
> The selection would be a later pass.
My assumption was that there is a selection in there somewhere.
Do you plan to refine the regions in the selection phase in any way?
>> If the
>> CFG is modified this would allow an internal SESE-region to become a
black box, and the
>> the outer regions could be optimized.
>This is an optimization, however I think it is orthogonal to the region
>detection problem. Say it works with any algorithm.
I believe that creating a black-box will map a lot more cleanly to an edge-based
region definition, since block-based may include multiple entry/exit sub-regions
that will not encapsulate control flow in a reasonable way.
>>> regions_without_loops.bzopen_or_bzdopen.dot.png
>>> the first region has two entry edges. One is the loop latch.
>>> (Keep in mind all regions have the same color, so if it seems there
is
>>> an edge into a region, there are just two regions close by)
>>>
>>> Without a prepass that exposes the edges almost no region could be
>>> detected with the "standard" approach.
>>
>> Indeed the CFG will have to be modified for these cases. I it seems to
me that the trade-off
>> between the two approaches is that the algorithm that you currently
have is a cheaper up
>> front, but may be less capable in some cases, while the
"standard" algorithm will be more
>> expensive, but can handle problematic regions better. Would you agree?
>
> I agree that the algorithm I have is cheaper upfront, but I do not yet
> see a case where the algorithm is less capable. Would you mind to give
> an example or to highlight the relevant part of the discussion?
With the insertion of extra merge-blocks the code becomes more structured and
the PST
can be refined further. A more fine-grained PST may allow more cases to be
handled.
Thanks
Jan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100120/6ce1a3c3/attachment.html>