thr3ads.net - llvm dev - [llvm-dev] LLVM: mapping unoptimized IR back to clang AST [Sep 2015]

Tomas Kalibera via llvm-dev

2015-Sep-15 09:10 UTC

[llvm-dev] LLVM: mapping unoptimized IR back to clang AST

Hi,

I would like to rewrite a C program based on analyzing the LLVM IR of 
that C program, produced by CLANG. Does anyone have any hints on how to 
map the IR back to the CLANG AST?

Can I do better than invoking clang with "-g -O0" to produce an IR for
this task? (can I get more debug info, disable more optimizations?)

The debug info in LLVM IR does not seem to have information on C macros 
(while CLANG AST does) - is there a way to get that information from the IR?
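
To make the macro question concrete, here is a small sketch (the macro and
function names are invented for illustration). Both functions become the same
token sequence after preprocessing, so anything that starts from the
preprocessed code alone cannot tell which one was written with SQUARE; only
the front end's record of macro expansions keeps that distinction.

```c
#include <assert.h>

/* Hypothetical macro, used only for this illustration. */
#define SQUARE(x) ((x) * (x))

/* Written with the macro: the front end can still see that the
   expression came from an expansion of SQUARE. */
int area_macro(int side) {
    return SQUARE(side);
}

/* Written by hand: identical to area_macro once the preprocessor has run. */
int area_expanded(int side) {
    return ((side) * (side));
}

/* Sanity check that the two spellings compute the same value. */
void check_areas_agree(int side) {
    assert(area_macro(side) == area_expanded(side));
}
```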

Is it possible to add some custom meta-data to CLANG AST nodes that 
would somehow propagate through CLANG to the LLVM IR?
I could think of wrapping some AST nodes into dummy function calls, but 
that seems rather crude.
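
A minimal sketch of the dummy-call idea, assuming a source-level tool has
already assigned numeric IDs to the AST nodes of interest. The name
ast_marker, the ID scheme, and the counter array are all hypothetical; the
only point is that each marker is an ordinary call with a constant argument,
which should remain visible as a call instruction in unoptimized IR.

```c
#include <assert.h>

#define MAX_MARKERS 4

/* Hypothetical bookkeeping so that the marker has an observable effect. */
static int marker_count[MAX_MARKERS];

/* Hypothetical marker: node_id identifies the AST node being tagged. */
void ast_marker(int node_id) {
    assert(node_id >= 0 && node_id < MAX_MARKERS);
    marker_count[node_id]++;
}

/* Returns how often the marker with the given ID has been executed. */
int marker_hits(int node_id) {
    assert(node_id >= 0 && node_id < MAX_MARKERS);
    return marker_count[node_id];
}

/* Example function with markers inserted by hand. */
int clamp_to_zero(int x) {
    ast_marker(0);          /* tags the function body */
    if (x < 0) {
        ast_marker(1);      /* tags the then-branch */
        return 0;
    }
    ast_marker(2);          /* tags the fall-through path */
    return x;
}
```

The markers change the program being analyzed, which is part of why the
approach feels crude: they would have to be stripped again before the
rewritten source is produced.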

Indeed, some analysis can also be done at AST level, but it seems to me 
that it is easier to do at the IR level. Also, the IR level has linking 
information, so one can do inter-procedural analyses.

Thanks
Tomas

Joshua Cranmer 🐧 via llvm-dev

2015-Sep-16 13:48 UTC

[llvm-dev] LLVM: mapping unoptimized IR back to clang AST

On 9/15/2015 4:10 AM, Tomas Kalibera via llvm-dev wrote:
> Hi,
>
> I would like to rewrite a C program based on analyzing the LLVM IR of 
> that C program, produced by CLANG. Does anyone have any hints on how to 
> map the IR back to the CLANG AST?
>
> Can I do better than invoking clang with "-g -O0" to produce an IR for
> this task? (can I get more debug info, disable more optimizations?)
>
> The debug info in LLVM IR does not seem to have information on C 
> macros (while CLANG AST does) - is there a way to get that information 
> from the IR?
Going from C to IR is inherently a lossy transformation. Macros don't 
exist except at a pre-lexing stage (although clang does retain them 
through semantic analysis). There are several ASTs that would map to the 
same IR (for example, for (;;) {}, while (1) {}, and do {} while (1); 
are all the exact same control flow, yet have different IR). While it 
might be the case that you could do some disambiguation based on things 
like basic block names, such introspection would be highly brittle and 
likely to break if any optimization pass is run. Note that if you don't 
run some basic optimization passes (such as mem2reg), the IR is going to 
be much harder to analyze (e.g., you would need def-use tracking through 
memory to do basic constant propagation of local variables!).
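
As a rough illustration of the loop example (the function names are made up,
and a break is added so that the loops terminate and can be run), the three
functions below have different AST node kinds but the same control flow. The
local variable i also shows the mem2reg point: at -O0 it lives in a stack
slot, so every read and write of i goes through memory.

```c
#include <assert.h>

/* Loop written as for (;;): a ForStmt in the AST. */
int count_for(int n) {
    int i = 0;              /* stack slot at -O0, SSA value after mem2reg */
    for (;;) {
        if (i >= n) break;
        i++;
    }
    return i;
}

/* Same loop written as while (1): a WhileStmt in the AST. */
int count_while(int n) {
    int i = 0;
    while (1) {
        if (i >= n) break;
        i++;
    }
    return i;
}

/* Same loop written as do {} while (1): a DoStmt in the AST. */
int count_do(int n) {
    int i = 0;
    do {
        if (i >= n) break;
        i++;
    } while (1);
    return i;
}

/* All three spellings must return the same result for any n. */
void check_loops_agree(int n) {
    assert(count_for(n) == count_while(n));
    assert(count_while(n) == count_do(n));
}
```

A tool that only sees the control-flow graph has no reliable way to decide
which of these three spellings the programmer originally wrote.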

Or, put another way, if you try to do C -> IR -> C, you will have to 
accept that the output C may look nothing like the input C.
> Is it possible to add some custom meta-data to CLANG AST nodes that 
> would somehow propagate through CLANG to the LLVM IR?
> I could think of wrapping some AST nodes into dummy function calls, 
> but that seems rather crude.
>
> Indeed, some analysis can also be done at AST level, but it seems to 
> me that it is easier to do at the IR level. Also, the IR level has 
> linking information, so one can do inter-procedural analyses.
If you're trying to rewrite C code, you'll want to do that all at AST 
level. It is too lossy to convert to IR and back again.

-- 
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist
