Tomas Kalibera via llvm-dev
2015-Sep-15 09:10 UTC
[llvm-dev] LLVM: mapping unoptimized IR back to clang AST
Hi,

I would like to rewrite a C program based on analyzing the LLVM IR of that C program, produced by CLANG. Did anyone have any hints on how to map the IR back to CLANG AST?

Can I do better than invoking clang with "-g -O0" to produce an IR for this task? (can I get more debug info, disable more optimizations?)

The debug info in LLVM IR does not seem to have information on C macros (while CLANG AST does) - is there a way to get that information from the IR?

Is it possible to add some custom meta-data to CLANG AST nodes that would somehow propagate through CLANG to the LLVM IR?
I could think of wrapping some AST nodes into dummy function calls, but that seems rather crude.

Indeed, some analysis can also be done at AST level, but it seems to me that it is easier to do at the IR level. Also the IR level has linking information, one can do inter-procedural analyses.

Thanks
Tomas
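For readers unfamiliar with the "dummy function call" idea mentioned above, here is a minimal sketch of what such wrapping might look like. The name `ast_mark_int` and the integer node ids are hypothetical, invented purely for illustration; nothing like this is provided by clang. The `noinline` attribute is a GCC/clang extension, used here on the assumption that the marker calls should stay visible in the IR.

```c
/* Hypothetical marker function: returns its value unchanged, but the call
 * site carries a node id that an IR-level analysis could look for.
 * The name and the id scheme are assumptions made for this sketch. */
__attribute__((noinline))
int ast_mark_int(int node_id, int value) {
    (void)node_id;   /* the id only matters to whoever inspects the IR */
    return value;
}

/* Two sub-expressions tagged with ids 1 and 2. */
int scale(int x) {
    return ast_mark_int(1, x * 2) + ast_mark_int(2, x + 3);
}
```

Compiled with something like `clang -S -emit-llvm -g -O0`, each wrapped expression should show up as a call to the marker with a constant first argument. The cost is that the source being analyzed is no longer the original source, which is the crudeness the question refers to.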
Joshua Cranmer 🐧 via llvm-dev
2015-Sep-16 13:48 UTC
[llvm-dev] LLVM: mapping unoptimized IR back to clang AST
On 9/15/2015 4:10 AM, Tomas Kalibera via llvm-dev wrote:
> Hi,
>
> I would like to rewrite a C program based on analyzing the LLVM IR of
> that C program, produced by CLANG. Did anyone have any hints on how to
> map the IR back to CLANG AST?
>
> Can I do better than invoking clang with "-g -O0" to produce an IR for
> this task? (can I get more debug info, disable more optimizations?)
>
> The debug info in LLVM IR does not seem to have information on C
> macros (while CLANG AST does) - is there a way to get that information
> from the IR?

Going from C to IR is inherently a lossy transformation. Macros don't exist except at a pre-lexing stage (although clang does retain them through semantic analysis). There are several ASTs that would map to the same IR (for example, for (;;) {}, while (1) {}, and do {} while (1); are all the exact same control flow, yet have different ASTs). While it might be the case that you could do some disambiguation based on things like basic block names, such introspection would be highly brittle and likely to break if any optimization pass is run. Note that if you don't run some basic optimization passes (such as mem2reg), the IR is going to be much harder to analyze (e.g., you would need def-use tracking through memory to do basic constant propagation of local variables!).

Or, put another way, if you try to do C -> IR -> C, you will have to accept that the output C may look nothing like the input C.

> Is it possible to add some custom meta-data to CLANG AST nodes that
> would somehow propagate through CLANG to the LLVM IR?
> I could think of wrapping some AST nodes into dummy function calls,
> but that seems rather crude.
>
> Indeed, some analysis can also be done at AST level, but it seems to
> me that it is easier to do at the IR level. Also the IR level has
> linking information, one can do inter-procedural analyses.

If you're trying to rewrite C code, you'll want to do that all at AST level. It is too lossy to convert to IR and back again.
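To make the "several ASTs, same control flow" point concrete, here is a small sketch with three functions that compute the same sum using `for`, `while`, and `do`/`while`. The function names are made up for this illustration. The loops are bounded, unlike the infinite loops in the example above, so that they can actually be run and compared.

```c
/* Three syntactically different loops with the same behavior.
 * Each has a distinct clang AST node (ForStmt, WhileStmt, DoStmt),
 * but after lowering they reduce to branches between basic blocks. */

int sum_for(int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += i;
    return s;
}

int sum_while(int n) {
    int s = 0;
    int i = 0;
    while (i < n) {
        s += i;
        i++;
    }
    return s;
}

int sum_do(int n) {
    int s = 0;
    int i = 0;
    if (n <= 0)          /* guard: a do-loop body always runs once */
        return 0;
    do {
        s += i;
        i++;
    } while (i < n);
    return s;
}
```

At `-O0`, clang keeps locals such as `s` and `i` in stack slots (`alloca` with explicit loads and stores), which is why the reply recommends running something like mem2reg before attempting any value-level analysis. How closely the three functions resemble each other in the IR will depend on the clang version and the passes run, so treat this as an illustration rather than a guarantee.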
-- 
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist