On 01/13/10 00:56, Jan Sjodin wrote:> Why not use the "standard" algorithm for detecting SESE-regions and building a program structure tree? > It should handle everything you want. It also becomes much simpler to specify a connected SESE-region > by entry/exit edges, while a disconnected region is specified by entry/exit blocks. Only defining regions on > blocks is not enough to be able to quickly determine how to replace/move a region in a CFG. > > The algorithm can be found in this paper: > The Program Structure Tree: Computing Control Regions in Linear Time > by Richard Johnson , David Pearson , KeshavPingaliHi Jan, great to read you again. And thanks for pointing me to this paper, I read it but did some further research. I like the idea using edges to define the regions, however it does not catch all regions. Defining regions using just edges stops us from detection a lot of very common regions. Example: A very common CFG: \ / 1 / \ 2 3 | | 4 6 | | 5 7 \ / 8 / \ 9 10 | | 11 12 \ / 13 / \ I would detect these two regions: Region A: 1 -> 8 containing {1,2,3,4,5,6,7} Region B: 8 -> 13 containing {8,9,10,11,12} If I use edges to define the regions the detection is not possible at all. After region detection the CFG can always be split up to create single entry single exit edges, if they are needed e.g. for code generation. \ / 1_a | 1 / \ 2 3 | | 4 6 | | 5 7 \ / 8_a | 8 / \ 9 10 | | 11 12 \ / 13_a | 13 / \ Now the regions can be defined using edges: Region A: (1_a,1) -> (8_a, 8) containing {1,2,3,4,5,6,7,8_a} Region B: (8_a, 8) -> (13_a, 13) containing {8,9,10,11,12,13_a} In general this approach saves a preliminary pass that has to insert new bbs, to generate these edges. As we do not have to modify the CFG other passes like dominance information are still valid, and we do not have to create a lot of auxiliary bbs, to be able to detect all regions. This saves memory and runtime. In general it is probably not too easy to decide where to insert these bbs either: CFG: 0 | 1 / | 2 | / \ 3 4 5 | | | | 6 7 8 \ | / \ |/ region A: 1 -> 9 {1,2,3,4,5,6,7,8} 9 region B: 2 -> 9 {2,4,5,6,7} So we need one bb that joins 6 and 7 and one that joins the two regions CFG: 0 | 1 / | 2 | / \ 3 4 5 | | | | 6 7 8 \ | | \ | | region A: (0,1) -> (9b,9) {1,2,3,4,5,6,7,8,9a,9b} 9a | region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a} \ / 9b | 9 My approach is comparable to this paper: The Refined Process Structure Tree by Jussi Vanhatalo, Hagen Völzer, Jana Koehler The implementation however takes advantage of the existence of Dominance/PostDominance information. Therefore it is simpler and hopefully faster. At the moment run time is comparable to dominance tree calculation. If you want, have a look into some results I got with a pass extracting maximal non trivial regions that do not contain loops from libbz2: http://tobias.osaft.eu/llvm/region/bz2NoLoops/ Interesting ones: regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png has a lot of exit edges. regions_without_loops.bzopen_or_bzdopen.dot.png the first region has two entry edges. One is the loop latch. (Keep in mind all regions have the same color, so if it seems there is an edge into a region, there are just two regions close by) Without a prepass that exposes the edges almost no region could be detected with the "standard" approach. It would be great to hear your opinion about this. See you Tobias
Hi Tobias> In general this approach saves a preliminary pass that has to insert new> bbs, to generate these edges. As we do not have to modify the CFG other > passes like dominance information are still valid, and we do not have to > create a lot of auxiliary bbs, to be able to detect all regions. This > saves memory and runtime. In general it is probably not too easy to > decide where to insert these bbs either:The general rule is to split all blocks with multiple in-edges and multiple out-edges into blocks with either multiple in-edges or multiple out-edges, but not both. One option is to keep this as an invariant throughout the compiler and make use of the merge blocks (multiple in-edges) to contain only PHI-nodes, and all other code in regular basic blocks. There are different variations on this that may or may not be useful.> CFG: > 0 > | > 1 > / | > 2 | > / \ 3 > 4 5 | > | | | > 6 7 8 > \ | / > \ |/ region A: 1 -> 9 {1,2,3,4,5,6,7,8} > 9 region B: 2 -> 9 {2,4,5,6,7} > > So we need one bb that joins 6 and 7 and one that joins the two regions > > CFG: 0 > | > 1 > / | > 2 | > / \ 3 > 4 5 | > | | | > 6 7 8 > \ | | > \ | | region A: (0,1) -> (9b,9) {1,2,3,4,5,6,7,8,9a,9b} > 9a | region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a} > \ / > 9b > | > 9It is fairly simple to use the information from the algorithm to decide where those merges should be inserted to get the expected regions. This may be needed in the cases where a sub-region is too complicated to be represented and must be abstracted into a "black box".> My approach is comparable to this paper: > The Refined Process Structure Tree by JussiVanhatalo, Hagen Völzer, > Jana KoehlerI was looking through some slides that described their algorithm. One case that seems to be legal is this: | Entry / \ R0 R1 \ / Exit | With two fragments: Entry->R0->Exit and Entry->R1->Exit, which means that a fragment cannot be identified using only the entry and exit blocks, but the internal blocks or edges will also need to be listed. I don't know if this is relevant to your implementation.> The implementation however takes advantage of the existence of > Dominance/PostDominance information. Therefore it is simpler and > hopefully faster. At the moment run time is comparable to dominance tree > calculation.Both algorithms are linear so there is really no big difference in time imo. I believe the biggest difference that you mention is that you can capture more complicated regions without having to modify the CFG with the current algorithm.> If you want, have a look into some results I got with a pass extracting > maximal non trivial regions that do not contain loops from libbz2: > > http://tobias.osaft.eu/llvm/region/bz2NoLoops/ > > Interesting ones: > > regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png > has a lot of exit edges.I think this example proves the strengths and weaknesses of both approaches. Making that region into structured control flow would add a lot of additional blocks. This will also happen after generating code from the polyhedral model, so either way the cost is there if the optimization is successful. The second case is where the optimization fails (no profitable transformation found) and the CFG can remain untouched. The third case is if one of those blocks contains something complicated. I believe the current algorithm simply fails and cannot detect the region. If the CFG is modified this would allow an internal SESE-region to become a black box, and the the outer regions could be optimized.> regions_without_loops.bzopen_or_bzdopen.dot.png > the first region has two entry edges. One is the loop latch. > (Keep in mind all regions have the same color, so if it seems there is > an edge into a region, there are just two regions close by) > > Without a prepass that exposes the edges almost no region could be > detected with the "standard" approach.Indeed the CFG will have to be modified for these cases. I it seems to me that the trade-off between the two approaches is that the algorithm that you currently have is a cheaper up front, but may be less capable in some cases, while the "standard" algorithm will be more expensive, but can handle problematic regions better. Would you agree? - Jan