James Courtier-Dutton
2012-Sep-13 10:58 UTC
[LLVMdev] [OT] Control Flow Graph(CFG) into Abstract Syntax Tree(AST)
Hi, I know most compilers go from AST to CFG. I am writing a decompiler, so I was wondering if anyone knew of any documents describing how best to get from CFG to AST. The decompiler project is open source. https://github.com/jcdutton/libbeauty The decompiler already contains a disassembler and a virtual machine resulting in an annotated CFG. It uses information gained from using a virtual machine to annotate the CFG. Another use of the VM will be to help analyze self modifying code. The decompiler can output C source code form the CFG, but it is not easy to understand the result due to lack of structure such as for {} loops. I wish to create an AST from the CFG in order to be able to output for {}, while {} and if...then...else structure. The CFG to AST step seems a little complicated, so I was looking for some pointers to how best to do it to save me re-inventing the wheel. Kind Regards James
Carolina Simões Gomes
2012-Sep-13 23:10 UTC
[LLVMdev] [OT] Control Flow Graph(CFG) into Abstract Syntax Tree(AST)
Hello, Yours is a very interesting problem. I am also interested in this, hope someone can answer. On Thu, Sep 13, 2012 at 4:58 AM, James Courtier-Dutton < james.dutton at gmail.com> wrote:> Hi, > > I know most compilers go from AST to CFG. > I am writing a decompiler, so I was wondering if anyone knew of any > documents describing how best to get from CFG to AST. > The decompiler project is open source. > https://github.com/jcdutton/libbeauty > > The decompiler already contains a disassembler and a virtual machine > resulting in an annotated CFG. It uses information gained from using a > virtual machine to annotate the CFG. Another use of the VM will be to > help analyze self modifying code. > > The decompiler can output C source code form the CFG, but it is not > easy to understand the result due to lack of structure such as for {} > loops. > I wish to create an AST from the CFG in order to be able to output for > {}, while {} and if...then...else structure. > The CFG to AST step seems a little complicated, so I was looking for > some pointers to how best to do it to save me re-inventing the wheel. > > Kind Regards > > James > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- [Carolina Simões Gomes] M.Sc. Student in Computing Science University of Alberta, Canada CAS Partner - IBM Toronto +1 (780) 863-0155 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120913/1b64333c/attachment.html>
Joshua Cranmer
2012-Sep-16 22:12 UTC
[LLVMdev] [OT] Control Flow Graph(CFG) into Abstract Syntax Tree(AST)
On 9/13/2012 5:58 AM, James Courtier-Dutton wrote:> Hi, > > I know most compilers go from AST to CFG. > I am writing a decompiler, so I was wondering if anyone knew of any > documents describing how best to get from CFG to AST.The general process that you are trying to do is known as "structuring control flow graphs." The basic algorithm, as far as I know it, amounts to three steps (in C terms): 1. Make switch statements 2. Make loops 3. Figure out the if/else stuff Google finds this article: <http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=914984> for a more detailed algorithm. -- Joshua Cranmer News submodule owner DXR coauthor
Ralf Karrenberg
2012-Sep-21 08:51 UTC
[LLVMdev] [OT] Control Flow Graph(CFG) into Abstract Syntax Tree(AST)
Hi, Simon Moll (in CC) has written a decompiler for LLVM in his Bachelor's Thesis here at Saarland University. The thesis is titled "Decompilation of LLVM IR" and can be found here: http://www.cdl.uni-saarland.de/publications/ The library he implemented is called "Axtor" (for "AST Extractor") and has been used primarily to generate OpenCL code from LLVM. In this context, it successfully compiles OpenCL -> LLVM -> OpenCL "cycles". Simon will be able to give you more information, also in terms of license details which I don't remember. Cheers, Ralf On 13.09.2012 12:58, James Courtier-Dutton wrote:> Hi, > > I know most compilers go from AST to CFG. > I am writing a decompiler, so I was wondering if anyone knew of any > documents describing how best to get from CFG to AST. > The decompiler project is open source. > https://github.com/jcdutton/libbeauty > > The decompiler already contains a disassembler and a virtual machine > resulting in an annotated CFG. It uses information gained from using a > virtual machine to annotate the CFG. Another use of the VM will be to > help analyze self modifying code. > > The decompiler can output C source code form the CFG, but it is not > easy to understand the result due to lack of structure such as for {} > loops. > I wish to create an AST from the CFG in order to be able to output for > {}, while {} and if...then...else structure. > The CFG to AST step seems a little complicated, so I was looking for > some pointers to how best to do it to save me re-inventing the wheel. > > Kind Regards > > James > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
James Courtier-Dutton
2012-Sep-21 17:15 UTC
[LLVMdev] [OT] Control Flow Graph(CFG) into Abstract Syntax Tree(AST)
On 21 September 2012 09:51, Ralf Karrenberg <Chareos at gmx.de> wrote:> Hi, > > Simon Moll (in CC) has written a decompiler for LLVM in his Bachelor's > Thesis here at Saarland University. The thesis is titled "Decompilation of > LLVM IR" and can be found here: > http://www.cdl.uni-saarland.de/publications/ > > The library he implemented is called "Axtor" (for "AST Extractor") and has > been used primarily to generate OpenCL code from LLVM. In this context, it > successfully compiles OpenCL -> LLVM -> OpenCL "cycles". > Simon will be able to give you more information, also in terms of license > details which I don't remember. >I have read the pdf doc. It covers some things well, but is missing an important piece of information. It only needs to cover well formed CFG. I.e. Normal loops and if...then...else. It does not need to handle "goto". Specifically: Section 4.1.2 1) Figure 4.2: Ir-reducible control-flow. 2) Figure 4.3: A CFG with loop-crossing branches. 3) Figure 4.4: Ab-normal selection paths. If the original program was written in C, these types of CFG are all possible. The only way to represent these in AST is to introduce "gotos" at strategic points in the CFG. I am looking at various CFG graphs, and from looking at them, it is quite obvious to me which branches should be gotos, and which should be loops or if...then...else. The problem I have is coming up with an algorithm that would make the same decisions as I would.
Jianfei Hu
2012-Oct-13 05:51 UTC
[LLVMdev] [OT] Control Flow Graph(CFG) into Abstract Syntax Tree(AST)
Interesting things. I once also wanted to get structural information from LLVM binary code. And I found it just consisted of connected basic blocks, CFG, not ast-like tree. What I am wondering is that, why LLVM does not include some high level IR, before converting to RISC-like instruction. I think different work and anlysis should be done on different level. As far as I know, open64 compiler has several different level IR, called WHIRL. Could anyone answer this question? 2012/9/13 James Courtier-Dutton <james.dutton at gmail.com>:> Hi, > > I know most compilers go from AST to CFG. > I am writing a decompiler, so I was wondering if anyone knew of any > documents describing how best to get from CFG to AST. > The decompiler project is open source. > https://github.com/jcdutton/libbeauty > > The decompiler already contains a disassembler and a virtual machine > resulting in an annotated CFG. It uses information gained from using a > virtual machine to annotate the CFG. Another use of the VM will be to > help analyze self modifying code. > > The decompiler can output C source code form the CFG, but it is not > easy to understand the result due to lack of structure such as for {} > loops. > I wish to create an AST from the CFG in order to be able to output for > {}, while {} and if...then...else structure. > The CFG to AST step seems a little complicated, so I was looking for > some pointers to how best to do it to save me re-inventing the wheel. > > Kind Regards > > James > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-- Best Wishes, Jianfei Hu