Hi folks, A few months back I finished writing and testing a pass which implements "structural analysis" as described originally by Sharir in 1980 ("Structural analysis: A new approach to flow analysis in optimizing compilers") and more recently by Muchnick in Advanced Compiler Design and Implementation. It analyses the CFG and recognises specific region schema, such as if-then, if-then-else, while-loop, etc., and builds a "control tree" which represents the hierarchical structure of the input program. Somebody asked about this pass a couple of years ago: http://old.nabble.com/Structural-Analysis-to9429397.html#a9429397 I wrote this to get results for a paper, and it needs some tidying up, but it's been tested on 10,000-15,000 functions from open source programs and it works fine, albeit rather slow on big programs because of the exceptionally slow (and exceptionally lazy...) way I implemented calculating dominators, but this could be fixed in the future. Would this be a welcome addition to the project? The only reason that I ask is that it doesn't actually perform any optimisations, it just builds the control tree. Also, this was the first non-trivial piece of C++ code I ever wrote, and I think my code absolutely stinks in places. But it works, and it might be of use to somebody. What do you think? Best wishes, James -- View this message in context: http://old.nabble.com/Seeking-advice-on-Structural-Analysis-pass-tp27890316p27890316.html Sent from the LLVM - Dev mailing list archive at Nabble.com.
On Mar 13, 2010, at 11:54 AM, James Stanier wrote:> It > analyses the CFG and recognises specific region schema, such as if- > then, > if-then-else, while-loop, etc., and builds a "control tree" which > represents > the hierarchical structure of the input program.I am not sure if my definition of a control flow tree is the same as yours, but if it is, I am very interested. For my recent Ph.D. work, I created control flow trees from Java bytecode in order to calculate the WCET of a program. Operating on a CFT instead of a CFG makes the calculation much faster because, in many cases, one can simply perform a recursive descent into the tree (a linear operation), rather than formulating an ILP problem from the CFG and waiting for an ILP solver to do its thing (an exponential operation). For Java, I got lucky because I found a bytecode decompiler that creates a CFT of Java programs as a side-effect of its reverse engineering process. I was then able to adapt this internal CFT data structure into something more general-purpose and readily usable for WCET analysis. I also was able to transform the CFT into a CFG (for performance comparisons) by adding back-edges to the tree in the appropriate places. I am now attempting to adapt the work I did for Java to C++, using LLVM as the foundation. It's difficult, however, because LLVM only provides CFGs, not CFTs. And while transforming a CFT into a CFG is relatively straightforward, going the other direction is not. I assume this is where I could benefit from your and Tobias's work. It sounds like you are both working on (or have solved) the problem of converting LLVM's CFGs into CFTs. Am I understanding that correctly? Just to be clear, a rough example of what I mean by CFT and CFG can be found in the attachments. Both diagrams represent the exact same control flow (from SpeedSensor.java); they are simply two different ways of structuring that control flow: one as a tree, the other as a graph. Trevor -------------- next part -------------- A non-text attachment was scrubbed... Name: SpeedSensor.java Type: application/octet-stream Size: 478 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100315/41c61b7e/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: cft.pdf Type: application/pdf Size: 34703 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100315/41c61b7e/attachment.pdf> -------------- next part -------------- A non-text attachment was scrubbed... Name: cfg.pdf Type: application/pdf Size: 30589 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100315/41c61b7e/attachment-0001.pdf>
On 03/16/2010 05:02 PM, James Stanier wrote:> > Hi Trevor, > > I just studied your diagrams -- I *think* that the "control tree" of > structural analysis is not same as the "control flow tree" from your own > Java work. However, as you understand your work more than I do, I present > you with some fancy coloured diagrams that I drew that might help you (and > others!) understand what structural analysis does a bit better than the > black-and-white ones in Muchnick's book. > > This first diagram shows a CFG in step 1, and the steps of reduction that > structural analysis performs: > http://www.informatics.sussex.ac.uk/users/js231/img/sa-example-reduction.pdf > > For example, in the first step, the red cloud indicates that the blocks "B6" > and "exit" are matched to the "continuous blocks" schema. These are then > reduced into an abstract node which is called B6a and is coloured red (the > same colour as the cloud) in the next step. Then, block "B5" is matched to > the "self loop" schema, and replaced by the pink abstract node "B5a" in the > next diagram. This continues until there is only one node left, and > structural analysis has finished. > > This second diagram shows the control tree that is built while these > reductions are occurring, which shows the hierarchical structure of the > program. The coloured nodes are the abstract nodes from the first diagram, > and all leaf nodes are the original basic blocks. > http://www.informatics.sussex.ac.uk/users/js231/img/sa-controltree.pdf > > Does this make it any clearer? I'm happy to explain it in more detail.Hi James, this is very close to what ether and me have implemented. However you have got the nicer graphics, they look great. If you want to try the RegionInfo pass call: opt -regions -analyze somefile.ll This will print you a tree that is not as nice as your graphics, but should have the same semantics. The algorithm I used is probably different, however the results are the same. The main difference I see between our approaches is that you classify the regions, whereas in the RegionInfo analysis a tree out of (yet unclassified) regions is created Therefore the RegionInfo analysis is probably a little bit more general, as I am just looking for anything that is either a single entry single exit region (simple region) or can be transformed to such a region be inserting merge blocks (refined region). As your pass seems to work on block schemas it will probaly only detect regions that fit in a schema that you have implemented. Is this correct? I would be interested how your pass handles unstructured control flow. Especially in the problematic cases I put in the RegionInfo test suite. They are available at this place: http://repo.or.cz/w/llvm-complete/pofl.git/tree/refs/heads/regioninfo:/test/Analysis/RegionInfo Ether and me are currently adding doxygen comments. If you are interested you could also have a look at our doxygen documentation http:://tobias.osaft.eu/regioninfo/doxygen To me it seems as if our approaches are close enough together to merge them. I think your code could take advantage of the iterators and the RegionPass we have written. Whereas we would be really glad to have the classification of the regions, This way we could e.g. write RegionPasses that only work on continuous blocks. Or only on if/then/else blocks. Something that seems very useful to me. I would be interested in your opinion Tobias