I went ahead and tried to translate my legacy machine language into IR. Legacy branch instructions have me stumped. The branch instructions are already resolved to destination addresses in the legacy machine code. For example, there is an instruction that performs an unconditional branch to the address stored in legacy register B1.

I can represent register B1 as a local variable:

    %B1 = alloca i32 ; storage for emulated register B1

I can generate IR corresponding to the legacy machine code that calculates the value to store in %B1, and I can generate the IR that stores it:

    store i32 %indirectAddr, i32* %B1

Now, how do I generate an IR "br" instruction to the calculated address in %B1? I don't have a suitable destination label in my IR. I can't create a block for the destination address if the address is calculated by the legacy code (can I?).

Am I taking the wrong approach? Are there any suggestions?
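To make the problem concrete, here is a small Python model (not LLVM IR) of the most a purely static translator can do with the branch through B1. It lowers the branch to a lookup over the block entry addresses it discovered ahead of time. In IR, that would be a load of %B1 feeding a switch instruction with one case per known block label. The guest addresses and block functions are hypothetical.

```python
# Hypothetical model of a *statically* translated program. Each translated
# block updates the emulated registers and returns the guest address to
# continue at (None means halt).

def block_100(regs):
    # Legacy code computes the destination from data, stores it in B1,
    # then branches unconditionally to the address in B1.
    regs["B1"] = 0x200 + 0x100 * regs["A"]
    return regs["B1"]

def block_200(regs):
    # Reached when A == 0: halt.
    return None

def block_300(regs):
    # Reached when A == 1: decrement A and jump back to 0x100.
    regs["A"] -= 1
    return 0x100

# Entry addresses the static translator discovered ahead of time; this dict
# plays the role of a switch on the loaded value of %B1.
KNOWN_BLOCKS = {0x100: block_100, 0x200: block_200, 0x300: block_300}

def run(regs, pc=0x100):
    while pc is not None:
        block = KNOWN_BLOCKS.get(pc)
        if block is None:
            # The "default" case: no translated block exists for this target.
            raise LookupError(f"no translated block for guest address {pc:#x}")
        pc = block(regs)
    return regs

print(run({"A": 1}))  # loops once through 0x300, then halts via 0x200
try:
    run({"A": 2})  # computed target 0x400 was never discovered statically
except LookupError as err:
    print(err)
```

With A = 1 the program loops once and halts. With A = 2, the computed target 0x400 is not among the blocks found ahead of time, so the lookup falls into the default case. That default case is exactly where a purely static translation has no label to branch to.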
Eli Friedman
2008-Jun-23 04:54 UTC
[LLVMdev] Advice - llvm as binary to binary translator ?
On Sun, Jun 22, 2008 at 8:08 PM, Erik Buck <erik.buck at sbcglobal.net> wrote:
> Now, how do I generate an IR "br" instruction to the calculated
> address in %B1 ? I don't have a suitable destination label in my IR.
> I can't create a block for the destination address if the address is
> calculated by the legacy code (can I?)
>
> Am I off to the wrong approach ? Are there any suggestions ?

Completely wrong approach; you're not going to get anywhere trying to statically translate machine code. (I actually tried something like this once, and it broke apart in a similar way.)

As far as I know, the only project that has made any real progress using LLVM for binary translation is llvm-qemu (http://code.google.com/p/llvm-qemu/). The approach there is roughly to JIT one basic block at a time instead of interpreting one instruction at a time, which is relatively simple and has relatively low overhead. There's some documentation about llvm-qemu on its website, and there's good information on how to get started with the JIT at http://llvm.org/docs/tutorial/.

That said, there are a lot of ways to speed up a pure interpreter; I'd suggest trying those before attempting a JIT-based solution. A JIT can be a useful tool, but translation time can end up being a significant factor, and it will likely take a lot of work to get good performance out of it.

-Eli
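Here is a rough sketch of the block-at-a-time idea, written in Python rather than against the LLVM API. The guest instruction set (movi, addi, bnz, jr, halt) and every function name are hypothetical, and Python closures stand in for the host code a JIT would emit. This is not llvm-qemu's actual code. Each basic block is translated the first time execution reaches its guest address and is then cached under that address. As a result, an indirect branch through B1 just returns the runtime address, and the dispatch loop translates whatever block lives there.

```python
# Hypothetical guest ISA, one instruction per address:
#   ("movi", reg, imm)     reg = imm
#   ("addi", reg, imm)     reg += imm
#   ("bnz", reg, target)   branch to target if reg != 0, else fall through
#   ("jr", reg)            unconditional branch to the address held in reg
#   ("halt",)              stop
BLOCK_ENDERS = {"bnz", "jr", "halt"}


def compile_op(instr):
    """Turn one straight-line instruction into a host function, once."""
    op, reg, imm = instr
    if op == "movi":
        def run_op(regs):
            regs[reg] = imm
    elif op == "addi":
        def run_op(regs):
            regs[reg] += imm
    else:
        raise ValueError(f"unsupported instruction {instr!r}")
    return run_op


def translate_block(memory, pc):
    """Decode from pc through the first branch and build one host function
    for the whole basic block (stand-in for emitting IR and JIT-compiling)."""
    ops = []
    while memory[pc][0] not in BLOCK_ENDERS:
        ops.append(compile_op(memory[pc]))
        pc += 1
    terminator, fallthrough = memory[pc], pc + 1

    def block(regs):
        for run_op in ops:
            run_op(regs)
        kind = terminator[0]
        if kind == "bnz":
            return terminator[2] if regs[terminator[1]] != 0 else fallthrough
        if kind == "jr":
            return regs[terminator[1]]  # target known only at run time
        return None  # halt

    return block


def run(memory, regs, pc):
    """Dispatch loop: translate each block on first use, cache it by its
    guest address, and follow whatever address the block returns."""
    cache = {}
    translations = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:
            block = cache[pc] = translate_block(memory, pc)
            translations += 1
        pc = block(regs)
    return regs, translations


program = {
    0x10: ("movi", "A", 3),
    0x11: ("movi", "B1", 0x20),  # computed destination stored in B1
    0x12: ("jr", "B1"),          # branch to the address in B1
    0x20: ("addi", "A", -1),
    0x21: ("bnz", "A", 0x20),
    0x22: ("halt",),
}
regs, translations = run(program, {}, 0x10)
print(regs, translations)
```

In this run, the loop block at 0x20 executes three times but is translated only once. Three blocks are translated in total: the entry block, the loop block and the halt block. The translation-time cost Eli mentions corresponds to translate_block. With a real JIT, that step is where LLVM code generation time would be spent, so it only pays off for blocks that run often.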