First, is there a way to search the archives for this list? I apologize in advance if I have stepped on a FAQ.

My goal is to execute legacy binary machine code from a very old, one-of-a-kind computer on a variety of modern computers. I already wrote an emulator for the legacy machine that executes the old machine code. However, my emulator is just an interpreter and therefore has some limitations:

- The emulator spends a lot of time in an executive loop that fetches legacy instructions, decodes them, and jumps to the appropriate C functions that emulate each legacy instruction. The executive loop also has to handle emulated interrupts, support single-step debugging, etc. (a simplified sketch of this loop appears at the end of this message).

- The emulator is compiled and run on only a few modern hardware/operating system combinations. The emulator is fairly portable, but extensive optimizations on some platforms restrict capabilities on other platforms.

- The emulator executes the legacy machine code unmodified, which is good, but it means opportunities for optimization are lost. The legacy machine code is full of dead code, jumps to jumps, redundant sub-expressions, unnecessary memory accesses, etc. Back in the old days, compilers really didn't optimize at all; they generated horrible code that was sometimes hand-modified.

My idea is to convert my emulator into a translator that emits LLVM IR, either directly or via calls to the LLVM library, and then execute the result via JIT or native code compilation.

Is this a reasonable approach?
Can this approach be used even when the legacy code is self-modifying? After a code modification, a re-translation and re-JIT would be needed.

Are there any general suggestions?
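For reference, the heart of the current executive loop looks roughly like the C sketch below. The opcodes, register layout, and 16-bit little-endian operand encoding here are invented purely for illustration; the real legacy instruction set is different.

    #include <stdint.h>

    /* Simplified sketch of the executive loop: fetch, decode, dispatch.
       Opcodes, registers, and operand encoding are invented for
       illustration only. */
    enum { OP_HALT = 0x00, OP_LOAD = 0x01, OP_ADD = 0x02, OP_JMP = 0x03 };

    typedef struct {
        uint16_t pc;          /* legacy program counter */
        uint16_t acc;         /* legacy accumulator     */
        uint8_t  mem[65536];  /* emulated legacy memory */
        int      halted;
    } LegacyCpu;

    static uint16_t fetch16(LegacyCpu *c)        /* read a 16-bit operand */
    {
        uint16_t v = (uint16_t)(c->mem[c->pc] | (c->mem[(uint16_t)(c->pc + 1)] << 8));
        c->pc += 2;
        return v;
    }

    void run(LegacyCpu *c)
    {
        while (!c->halted) {
            /* the real loop also polls emulated interrupts and the
               single-step debugger hook at this point */
            uint8_t op = c->mem[c->pc++];               /* fetch              */
            switch (op) {                               /* decode + dispatch  */
            case OP_LOAD: c->acc  = c->mem[fetch16(c)]; break;
            case OP_ADD:  c->acc += c->mem[fetch16(c)]; break;
            case OP_JMP:  c->pc   = fetch16(c);         break;
            case OP_HALT:
            default:      c->halted = 1;                break;
            }
        }
    }

In the real emulator each case calls out to a separate C function, which is where most of the dispatch overhead comes from.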
Harry Metcalfe
2008-Jun-24 08:28 UTC
[LLVMdev] Advice - llvm as binary to binary translator ?
Hi Erik,

I'm currently writing an IA-32 to LLVM IR translator. I'm only midway through, but I can certainly say that there have been more difficulties than I anticipated when I began!

I think that it is a reasonable approach, perhaps especially in your case, since you have an emulator already. Automatic static translation is equivalent to the halting problem for IA-32 code, though perhaps it wouldn't be for yours (what architecture are you using?). A dynamic phase is therefore necessary for me -- if it is for you too, you'll have a leg up.

Self-modifying code is both hideous and unusual, and very difficult to deal with. I'm leaving it to one side.

General thoughts: are you sure that LLVM IR is suitable? You may be better off with a lower-level representation. At least in my case, LLVM enforces a level of structure that doesn't exist in machine code. That's something you'll also probably have to deal with.

Its type system also hampers the modification of translated code, so it's advantageous to ensure that you won't need to change any code once it is translated. This is of particular importance when you're trying to figure out the bounds of an array, and things like that: a change to the size of an array is a change of its type, which means it's much easier just to get the size of the array right in the first place. I'm currently in the process of altering my code so that a lot more analysis takes place before translation even begins!

Finally, how will you deal with memory accesses and aliasing? This is certainly the thorniest problem, and it's the one my dynamic phase exists to solve.

Do email me off-list if you like -- it sounds like we're pursuing similar lines of inquiry!

Harry
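P.S. To make the direct-emission route concrete: one way to sidestep some of the structure and aliasing questions is to keep the entire legacy machine state in globals (one array for memory, one global per register) and emit explicit loads and stores against them. Below is a minimal C sketch of a text emitter for an invented "add memory word to accumulator" instruction; the globals, opcode, and exact IR syntax (which varies between LLVM releases) are all assumptions, not anything from your machine.

    #include <stdio.h>
    #include <stdint.h>

    /* Emit textual LLVM IR for one invented legacy instruction:
       "ADD addr"  ->  acc += 16-bit word at legacy address addr.
       Assumes the module prologue already declared
           @acc        = global i16 0
           @legacy_mem = global [65536 x i8] zeroinitializer
       so all legacy state is reached through explicit loads and stores,
       and no pointer structure has to be recovered from the old code.
       (Endianness and alignment of the 16-bit load are glossed over.) */
    static int tmp;   /* running counter for SSA value names %t0, %t1, ... */

    static void emit_add_mem(FILE *out, uint16_t addr)
    {
        fprintf(out, "  %%t%d = getelementptr i8, ptr @legacy_mem, i32 %u\n",
                tmp, (unsigned)addr);
        fprintf(out, "  %%t%d = load i16, ptr %%t%d\n", tmp + 1, tmp);
        fprintf(out, "  %%t%d = load i16, ptr @acc\n", tmp + 2);
        fprintf(out, "  %%t%d = add i16 %%t%d, %%t%d\n",
                tmp + 3, tmp + 1, tmp + 2);
        fprintf(out, "  store i16 %%t%d, ptr @acc\n", tmp + 3);
        tmp += 4;
    }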
You may also want to take a look at Valgrind. It is capable of translating IA-32/64, PPC-32/64, etc. to its own SSA-style IR, and it has backends from that IR to the very same architectures. You can also build a backend to LLVM and let it further optimize the generated code (although Valgrind already has its own optimizers).

Building such translators is not easy business; I can tell you that from experience. Depending on what you want to achieve, I would reuse something that already exists.

Nuno
Jonathan S. Shapiro
2008-Jun-24 18:33 UTC
[LLVMdev] Advice - llvm as binary to binary translator ?
This is off-topic. I would reply directly to Erik, but I seem to have missed the note that had his reply address.

Erik, Harry:

The evidence in the literature supporting non-trivial optimization in a dynamic binary-to-binary translator should be viewed with caution. Our suspicion when we started the HDTrans work was that all the fancy optimizations accomplish, in most cases, is to (almost) make up for the computational cost of generating the IR in the first place. The numbers we achieved on IA-32 to IA-32 translation seem to support that view. There are special cases where this might not be true, and of course there are specific counter-examples, but the HDTrans numbers apparently came as a shock to some people in the dynamic translator world.

Erik mentioned in particular a number of control flow optimizations that are very easily handled in a direct emitter (i.e. with no IR). Dead code is also naturally eliminated by any dynamic translator. Redundant expressions are not, but the performance advantage of modern processors over legacy machines is so large that eliminating those may not be worthwhile. Newer machines typically have more registers and/or fast L1 caches, both of which are your friend in a dynamic translator.

Given this, if performance actually matters, your time and effort might be better spent crafting a really tight binary decoder that pays careful attention to cache utilization, and then implementing a relatively modest number of branch-to-branch and call/return optimizations that are fairly simple to do with a low-level direct translator.

Regardless of whether you accept my argument about the value of optimization, it's clear from the HDTrans numbers that you can lose a lot more performance in decoding and IR generation than the published research work has generally acknowledged.

Please don't hesitate to contact me directly if I can be helpful.

shap
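P.S. For the branch-to-branch case, one simple approach in a direct translator is to follow chains of unconditional jumps in the legacy image before emitting the host branch. A small C sketch, with an invented jump opcode and 16-bit little-endian target encoding:

    #include <stdint.h>

    #define OP_JMP 0x03   /* invented opcode for an unconditional jump */

    /* Collapse a chain of unconditional jumps to its final target so the
       emitted host branch skips the intermediate hops.  The hop limit
       guards against a chain that loops back on itself. */
    static uint16_t resolve_jump_chain(const uint8_t *mem, uint16_t target)
    {
        for (int hops = 0; hops < 16 && mem[target] == OP_JMP; hops++)
            target = (uint16_t)(mem[target + 1] | (mem[target + 2] << 8));
        return target;
    }

The translator then emits a single host branch to resolve_jump_chain(mem, original_target) instead of materializing the intermediate jumps.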