Hi,

I am interested in whether LLVM bitcode is ready to be used as a distribution format (stored in a software distribution package). Is it as easy to revert LLVM IR to C/C++ source code as it is for Java bytecode? My understanding is:

1. LLVM IR is more like assembly code, so it is not easy to reverse engineer.
2. If it is easy to reverse engineer, does that mean it is not suitable as a distribution format? Otherwise, code obfuscation at the IR level must be added.

Thanks,
Wan Xiaofei
LLVM IR is at a higher level than assembly code: it keeps some names, and it is easier to revert IR to source code than a binary format. The main purpose of LLVM IR is code generation. I don't think adding obfuscation to it is particularly worthwhile; those who need it can use tools and approaches specifically aimed at obfuscation. Even a simple renaming of identifiers in the source code makes a C/C++ file very difficult to analyze. In other cases one might use anti-debugger tricks or execute the code in a virtual machine. Everything depends on the level of obfuscation required, and it is impractical to make LLVM IR a tool for that.

Thanks,
--Serge

2013/10/15 Wan, Xiaofei <xiaofei.wan at intel.com>

> HI,
>
> I am interested in whether LLVM bit-code is ready for a distribution
> format (stored in software distribution package); is it easy to revert LLVM
> IR to C/C++ source code like Java byte code? My understanding is that.
> 1. LLVM IR is more like assembly code, so it is not easy for reverse
> engineering.
> 2. If it is easy for reverse engineering, does it mean it is not suitable
> for distribution format? Otherwise code obfuscation in IR level must be
> added.
>
> Thanks
> Wan Xiaofei
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Thanks, I just wanted to confirm the conclusion that LLVM IR is easy to revert into source code. Code obfuscation is not worth discussing here; at least it is outside IR's scope, haha.

But one more question: some optimization passes are applied in the frontend before the bitcode is generated, so might it not be so easy to revert the IR to source code?

Thanks,
Wan Xiaofei

From: Serge Pavlov [mailto:sepavloff at gmail.com]
Sent: Tuesday, October 15, 2013 12:01 PM
To: Wan, Xiaofei
Cc: LLVMdev at cs.uiuc.edu; Chris Lattner (sabre at nondot.org)
Subject: Re: [LLVMdev] Reverse engineering for LLVM bit-code
On 10/14/2013 9:31 PM, Wan, Xiaofei wrote:
> HI,
>
> I am interested in whether LLVM bit-code is ready for a distribution format (stored in software distribution package); is it easy to revert LLVM IR to C/C++ source code like Java byte code? My understanding is that.
> 1. LLVM IR is more like assembly code, so it is not easy for reverse engineering.

IDA and HexRays show that it is entirely possible to reverse engineer assembly code (at least that which comes out of a C/C++ compiler) to C/C++ code.

But even though that's the question you asked, it's not what you meant to ask. What makes Java easy to reverse engineer is that it retains the full structural typing and names of the original program [1]. LLVM lacks names for the fields of structural types (although it does retain struct names and global names). Moreover, optimization passes will render all SSA names completely illegible, and they often appear to destroy structural typing a fair amount too.

> 2. If it is easy for reverse engineering, does it mean it is not suitable for distribution format? Otherwise code obfuscation in IR level must be added.

If you are super-paranoid about reverse engineering, replace all names of functions with garbage names and all types with equivalent i8 arrays. The resulting IR will be pretty much exactly as informative about the original source code as the resulting assembly will be.

[1] Due to a version control bug, I ended up losing the source code to my C++ project while retaining the resulting library. I found the library much easier to decompile than the obfuscated Java bytecode I once set myself the project of decompiling (where the only obfuscation that provided a meaningful barrier to comprehension was name obfuscation).

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist
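As a rough illustration of the "replace all names of functions with garbage names" suggestion above, here is a minimal sketch that operates on textual LLVM IR with regular expressions. It is not part of the original thread and is not how a real tool would do it (a real implementation would work on the in-memory module through LLVM's own APIs). The function name rename_defined_functions, the sample IR, and the f0, f1, ... naming scheme are all hypothetical. The sketch assumes simple unquoted identifiers, leaves declared (external) functions and main untouched so that the module can still link, and does not attempt the "i8 arrays" type rewrite.

```python
import re

# Matches a simple, unquoted LLVM global identifier after the '@' sigil.
# Assumption: quoted names such as @"odd name" are out of scope for this toy.
_NAME = r'[A-Za-z_.$][\w.$]*'


def rename_defined_functions(ir_text):
    """Give every function *defined* in textual LLVM IR an opaque name.

    Returns (new_ir_text, mapping). Declared-only functions keep their
    names because they must still resolve against external symbols.
    """
    # Collect names from lines of the form 'define <stuff> @name('.
    defined = re.findall(r'^define\b[^@\n]*@(' + _NAME + r')\s*\(',
                         ir_text, flags=re.M)

    mapping = {}
    for name in defined:
        if name != "main" and name not in mapping:
            mapping[name] = "f%d" % len(mapping)

    def substitute(match):
        name = match.group(1)
        return "@" + mapping.get(name, name)

    # Rewrite every reference (definitions and call sites alike).
    return re.sub(r'@(' + _NAME + r')', substitute, ir_text), mapping


# Hypothetical sample module, written by hand for this sketch.
SAMPLE_IR = """\
declare i32 @puts(i8*)

define i32 @compute_checksum(i32 %x) {
entry:
  %r = add i32 %x, 1
  ret i32 %r
}

define i32 @main() {
entry:
  %v = call i32 @compute_checksum(i32 41)
  ret i32 %v
}
"""

if __name__ == "__main__":
    new_ir, mapping = rename_defined_functions(SAMPLE_IR)
    print(new_ir)
```

After the rewrite, the descriptive function name is gone from both the definition and the call site, while the external declaration is unchanged. The local SSA names (%x, %r, %v) are left alone here; as noted above, optimization passes tend to make those illegible anyway.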