James Courtier-Dutton via llvm-dev
2018-Sep-02 15:42 UTC
[llvm-dev] Understanding optimizations below LLVM IR.
Consider the following x86_64 assembly:

    cmpl $0x12,%rax
    sbb  %esi,%esi
    and  $0xffffffffffffffdf,%esi
    add  $0x5b,%esi

It contains the SBB instruction. SBB cannot be represented directly in LLVM IR.

    cmpl: if (%rax < $0x12) set the carry flag.  // AT&T operand order: computes %rax - $0x12, unsigned borrow
    sbb:  if carry flag set: %esi = 0xffffffffffffffff else %esi = 0;  // Note: 0xffffffffffffffff = -1
    and:  if carry flag set: %esi = 0xffffffffffffffdf else %esi = 0;  // Note: 0xffffffffffffffdf = -33
    add:  if carry flag set: %esi = 0x3a else %esi = 0x5b;

This can then be converted into:

    if (%rax < $0x12) {
        %esi = 58; // 0x3a
    } else {
        %esi = 91; // 0x5b
    }

So, the SBB is used here to remove the need for a branch instruction.
WOW, compilers are clever!!!

I am writing a de-compiler "Binary -> LLVM IR". So, I obviously need to treat SBB as a special case and transform it into something that can be represented in LLVM IR.

I wish to obtain a list of all the optimizations done by LLVM that result in assembly that cannot immediately be represented in LLVM IR. The above is one example.

For example:
1) List all optimizations that result in an SBB instruction.

Where in LLVM should I start looking?

Kind Regards

James
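To double-check the arithmetic in the walkthrough above, here is a minimal C sketch that emulates the four instructions on 32-bit values and compares the result with the branch-free "compare plus select" form a decompiler could emit (in LLVM IR terms, roughly an `icmp ult` feeding a `select`). The function names `sbb_sequence` and `select_form` are made up for illustration, and the sketch assumes AT&T operand order, where `cmp $0x12,%rax` sets the carry flag when %rax is below 0x12 as an unsigned value.

```c
#include <stdint.h>

/* Emulates cmp / sbb / and / add from the listing, on 32-bit %esi.
 * Assumption: the carry flag after the compare is the unsigned borrow
 * of (rax - 0x12), i.e. it is set when rax < 0x12. */
uint32_t sbb_sequence(uint64_t rax)
{
    uint32_t cf  = rax < 0x12u;   /* cmp: carry = unsigned borrow        */
    uint32_t esi = 0u - cf;       /* sbb %esi,%esi: 0 or 0xffffffff      */
    esi &= 0xffffffdfu;           /* and: -33 truncated to 32 bits       */
    esi += 0x5bu;                 /* add: wraps around to 0x3a if CF set */
    return esi;
}

/* The same computation written as a plain compare feeding a select. */
uint32_t select_form(uint64_t rax)
{
    return rax < 0x12u ? 0x3au : 0x5bu;
}
```

Both functions should agree for every input, which is the property a lifter would rely on when collapsing the idiom.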
Krzysztof Parzyszek via llvm-dev
2018-Sep-02 17:00 UTC
[llvm-dev] Understanding optimizations below LLVM IR.
On 9/2/2018 10:42 AM, James Courtier-Dutton via llvm-dev wrote:
>
> I am writing a de-compiler "Binary -> LLVM IR". So, I obviously need to
> treat SBB as a special case and transform it into something that can be
> represented in LLVM IR.

Not a "special case"; it's just an instruction whose function needs to be represented in the LLVM IR somehow.

> I wish to obtain a list of all the optimizations done by LLVM that
> result in assembly that cannot immediately be represented in LLVM IR.

That won't take you anywhere. Think of this as a compiler that takes source programs in ELF format (for example) and produces output in .ll format. The resulting .ll will never look exactly like the original bitcode; the best you can get is that it will have the same semantics.

The SBB instruction uses the carry bit and modifies the carry bit, so you need to represent the carry in your bitcode model somehow, and then do just that: write bitcode that produces the result of the subtraction and the value of the simulated carry bit.

Things like EFLAGS are difficult to model because they behave like a global variable, but if you assume some default value for them at function entry, you can still "decompile" functions that use them.

-Krzysztof
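The "simulated carry bit" approach described above can be sketched in C as follows. This is only an illustration under stated assumptions: `flags_t`, `emu_cmp32` and `emu_sbb32` are hypothetical names, only the carry flag is modelled, and operands are 32 bits wide. A lifter emitting LLVM IR could express the same borrow computation with ordinary compares and subtractions, or with the `llvm.usub.with.overflow` intrinsics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-function flag state; only CF is modelled here. */
typedef struct {
    bool cf;  /* simulated carry flag */
} flags_t;

/* cmp src,dst (AT&T order): computes dst - src and keeps only the flags. */
void emu_cmp32(flags_t *f, uint32_t dst, uint32_t src)
{
    f->cf = dst < src;  /* unsigned borrow */
}

/* sbb src,dst: returns dst - src - CF and updates CF with the borrow out. */
uint32_t emu_sbb32(flags_t *f, uint32_t dst, uint32_t src)
{
    /* Widen to 64 bits so src + CF cannot overflow. */
    uint64_t subtrahend = (uint64_t)src + (f->cf ? 1u : 0u);
    uint32_t result = (uint32_t)((uint64_t)dst - subtrahend);
    f->cf = (uint64_t)dst < subtrahend;
    return result;
}
```

Calling `emu_cmp32` and then `emu_sbb32(&f, x, x)` reproduces the all-ones-or-zero mask from the original listing; later optimization passes can then fold the explicit flag variable away.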