Hello,

We are a research project in a joint French-Chinese laboratory. We are
considering using LLVM in our project, but we'd like some additional
information before we dive in. Since we are new kids on the block, please
bear with us...

We are interested in using LLVM for emulation of real hardware. What we
have as input is the binary code of the program to run. Today we emulate
each instruction's behavior sequentially, which has pros and cons. We want
to build a faster simulator, and one idea is to decompile the binary code
into an LLVM representation, then compile it for the simulation host and
run it. Hopefully it would be faster, because we may be able to use one
LLVM instruction for several machine instructions, and we can benefit from
the real host stack and the real registers instead of a simulated stack
and simulated registers.

So we have several questions:

1. Do you have an opinion on the feasibility of the project?
   Do you know if it has been done before?

2. There is an in-memory representation for LLVM. Where should we look in
   the documentation to understand how to generate it properly?

3. We want to generate the in-memory IR directly and dynamically call the
   LLVM code generator on a chunk of code that has been decompiled, not on
   a complete program. Is this possible? Is it worthwhile in terms of
   performance?

Sincerely,
-- Vania

===============================================
Vania JOLOBOFF
LIAMA Sino French Laboratory
95 Zhongguancun East Road
Beijing 100080, China
Tel +86 10 8261 4528  http://liama.ia.ac.cn/
vania@liama.ia.ac.cn or vania.joloboff@inria.fr
----- Original Message -----
From: "John Criswell" <criswell at uiuc.edu>
To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
Cc: "Claude Helmstetter" <claude at liama.ia.ac.cn>
Sent: Tuesday, April 01, 2008 9:52 PM
Subject: Re: [LLVMdev] Newbie

> Vania Joloboff wrote:
>> 3. We want to generate the in-memory IR directly and dynamically call
>> the LLVM code generator on a chunk of code that has been decompiled,
>> not on a complete program. Is this possible? Is it worthwhile in
>> terms of performance?

You would basically *have* to do this: binary translation is a very
difficult problem. What ISA are you currently implementing? If it has a
fixed instruction width, that makes the process *much* easier. Even with
a fixed instruction width, keep in mind that it is virtually impossible
to do static binary translation on anything but the simplest of programs
(in particular, indirect jumps are painful). Whatever you do will
probably end up being dynamic, so I think this is your best route.

> This should be possible using the JIT libraries included with LLVM.
> I have not used these extensively, but I'm sure someone else on the
> list has and would be happy to answer any specific questions you may
> have.
>
> Whether it will be worthwhile in performance, I am not sure, but since
> you are currently doing emulation, I'd think that dynamic binary
> translation with LLVM would be much faster.
>
> -- John T.
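The point about indirect jumps is worth making concrete: because the
target of an indirect jump is a run-time value, the translator must be
able to map guest addresses to translated code on the fly, which is
exactly what a static translator cannot do. A minimal sketch in C++, with
all names invented for illustration:

#include <map>

typedef unsigned long guest_addr_t;
typedef void (*translated_block_t)(void); // host code for one guest block

// Hypothetical: decode the guest block at 'target' and translate it.
translated_block_t translate_block(guest_addr_t target);

static std::map<guest_addr_t, translated_block_t> block_cache;

// Called whenever the guest performs an indirect jump: the target is
// only known now, at run time, which is what defeats static translation.
void dispatch(guest_addr_t target) {
  std::map<guest_addr_t, translated_block_t>::iterator it =
      block_cache.find(target);
  if (it == block_cache.end()) {
    // First visit to this address: translate the block now.
    translated_block_t tb = translate_block(target);
    it = block_cache.insert(std::make_pair(target, tb)).first;
  }
  it->second(); // execute the translated block
}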
Vania Joloboff wrote:
> We are interested in using LLVM for emulation of real hardware. [...]

Very cool. I'm probably not the person best qualified to answer your
questions, but since no one else has answered them yet, I'll take a shot.

> 1. Do you have an opinion on the feasibility of the project?
>    Do you know if it has been done before?

There was a Google Summer of Code (GSoC) project last year where someone
started the work of modifying Qemu (a simulator) to use LLVM for JIT
compilation for faster simulation. I don't know how well it worked, but
it's very similar to what you want to do. I'd say it's quite feasible,
and, in fact, LLVM should make it easier with its JIT libraries.

> 2. There is an in-memory representation for LLVM. Where should we look
>    in the documentation to understand how to generate it properly?

The LLVM Programmer's Manual (http://llvm.org/docs/ProgrammersManual.html)
might be a good place to start; it describes the basic classes used for
the LLVM in-memory IR in the latter half of the document. The doxygen
documentation (http://llvm.org/doxygen/) is also surprisingly useful for
the details of the LLVM programming APIs.

The in-memory representation is very easy to generate if you're writing
your program in C++. Basically, there are C++ classes for each type of
object in the LLVM IR. To create the in-memory IR, you simply create a
new object of the correct class. For example, to create a new function,
you simply create a new Function object (i.e. Function *F = new
Function(...)).

> 3. We want to generate the in-memory IR directly and dynamically call
>    the LLVM code generator on a chunk of code that has been decompiled,
>    not on a complete program. Is this possible? Is it worthwhile in
>    terms of performance?

This should be possible using the JIT libraries included with LLVM. I
have not used these extensively, but I'm sure someone else on the list
has and would be happy to answer any specific questions you may have.

Whether it will be worthwhile in performance, I am not sure, but since
you are currently doing emulation, I'd think that dynamic binary
translation with LLVM would be much faster.

-- John T.
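To make John's description concrete, here is a minimal sketch of building
a trivial function in the in-memory IR and running it through the JIT. It
is written against the LLVM 2.x C++ API current at the time of this
thread (the raw new Function(...) constructors shown here were later
replaced by Create factory methods); the module, function, and block
names are invented:

#include "llvm/Module.h"
#include "llvm/Function.h"
#include "llvm/BasicBlock.h"
#include "llvm/Instructions.h"
#include "llvm/Constants.h"
#include "llvm/DerivedTypes.h"
#include "llvm/ModuleProvider.h"
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include <vector>

using namespace llvm;

int main() {
  // A module to hold the generated code for one decompiled chunk.
  Module *M = new Module("chunk");

  // i32 chunk_0() -- a function taking no arguments, returning an i32.
  FunctionType *FT =
      FunctionType::get(Type::Int32Ty, std::vector<const Type*>(), false);
  Function *F = new Function(FT, Function::ExternalLinkage, "chunk_0", M);

  // A single basic block that just returns 42.
  BasicBlock *BB = new BasicBlock("entry", F);
  new ReturnInst(ConstantInt::get(Type::Int32Ty, 42), BB);

  // JIT-compile the function and call it through a native pointer.
  ExecutionEngine *EE =
      ExecutionEngine::create(new ExistingModuleProvider(M));
  int (*fp)() = (int (*)())EE->getPointerToFunction(F);
  return fp() == 42 ? 0 : 1;
}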
On Tue, Apr 1, 2008 at 9:49 AM, Vania Joloboff <vania at liama.ia.ac.cn> wrote:
> 1. Do you have an opinion on the feasibility of the project?
>    Do you know if it has been done before?

Using LLVM for dynamic binary translation is definitely feasible. Last
year I was working on llvm-qemu during Google Summer of Code 2007, which
in fact does binary translation with LLVM. It is a modified version of
qemu which uses the LLVM JIT for optimization and code generation.
Currently it translates from ARM machine code to LLVM IR (at the basic
block level) and via the LLVM JIT to x86 machine code. All source
architectures supported by qemu (x86, x86-64, ARM, SPARC, PowerPC, MIPS,
m68k) can be translated to LLVM IR this way (adding support for one of
these architectures only requires minor changes to llvm-qemu).

The end result was that llvm-qemu ran at about half the speed of regular
qemu on the synthetic benchmark nbench (using a hotspot-like approach:
interpretation of blocks with few executions and JITing of blocks with
high execution counts).

However, there is still potential for improvement. One is an efficient
implementation of direct block chaining: in certain cases a block can
jump directly to its successor instead of falling back to the dispatcher.
This is currently implemented with calls instead of jmps, but it should
be possible to implement it with jmps now, after the recent work on tail
call optimizations. Direct block chaining is a very useful optimization;
on the nbench test case, enabling direct block chaining for regular qemu
leads to a 100% speed increase.

Another promising improvement would be the capability to build
"super"-blocks from a set of connected basic blocks, resembling a "hot
path". This work is partially finished and, once complete, should yield a
significant performance improvement, since a "super"-block offers a lot
more optimization potential than a single basic block.

Nevertheless, it is unlikely that llvm-qemu will ever be much faster than
regular qemu (whose code generator it currently replaces completely).
This is due to the fact that regular qemu has a very lightweight code
generator (it basically only copies blocks of memory, performs some
patching on them, and does only static register allocation) which
generates reasonably good code with a very low overhead in compilation
time. In contrast, the LLVM JIT generates really high-quality code (in
fact the JIT and the static compiler share the same code generator), but
at a higher price in terms of compilation time.
Ideally, the current code generator of qemu would coexist with the LLVM
JIT in llvm-qemu, allowing for different levels of code
quality/compilation time depending on the execution frequency of a
particular block.

I guess in your situation the chances are much higher that you will see a
significant performance increase, since you apparently don't do any
dynamic binary translation yet, especially if you decide to use your
existing interpreter in combination with the LLVM JIT in a hotspot-like
manner.

An important question is how you perform the translation from your source
architecture to LLVM IR. For llvm-qemu I could benefit from the fact that
qemu translates from source machine code to an intermediate
representation whose instructions are implemented in C; thus I could use
llvm-gcc to compile the instructions to equivalent LLVM IR and did not
have to worry about the actual translation from machine code to qemu IR.
Going directly from machine code to LLVM IR certainly requires more
effort. Which architectures are you interested in particularly?

> 3. We want to generate the in-memory IR directly and dynamically call
>    the LLVM code generator on a chunk of code that has been decompiled,
>    not on a complete program. Is this possible? Is it worthwhile in
>    terms of performance?

Yes, that's perfectly possible, and that's what llvm-qemu does too
(translation is performed at the basic block level). As Patrick already
pointed out, static recompilation is not really feasible in most cases.

If you're interested, you can find llvm-qemu at
http://code.google.com/p/llvm-qemu/; the wiki contains a page which lists
the progress of the project (including some numbers regarding
performance).

Greetings,

Tilmann Scheller
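For readers who want to see the shape of the hotspot-like scheme Tilmann
describes (interpret cold blocks, JIT hot ones), here is a schematic
sketch; the threshold and helper functions are invented, not taken from
llvm-qemu:

#include <map>

typedef unsigned long guest_addr_t;
typedef void (*native_block_t)(void);

const unsigned kHotThreshold = 50; // illustrative; tune empirically

struct BlockInfo {
  unsigned exec_count;
  native_block_t native; // null until the block has been JIT-compiled
  BlockInfo() : exec_count(0), native(0) {}
};

static std::map<guest_addr_t, BlockInfo> blocks;

// Hypothetical helpers provided by the simulator:
void interpret_block(guest_addr_t pc);              // existing interpreter
native_block_t jit_compile_block(guest_addr_t pc);  // build IR, run the JIT

void execute_block(guest_addr_t pc) {
  BlockInfo &bi = blocks[pc];
  if (bi.native) {
    bi.native();           // hot path: run JIT-compiled host code
  } else if (++bi.exec_count >= kHotThreshold) {
    bi.native = jit_compile_block(pc); // block just became hot
    bi.native();
  } else {
    interpret_block(pc);   // cold path: fall back to the interpreter
  }
}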
Hi Tilmann,

> Nevertheless, it is unlikely that llvm-qemu will ever be much faster
> than regular qemu [...] In contrast, the LLVM JIT generates really
> high-quality code (in fact the JIT and the static compiler share the
> same code generator), but at a higher price in terms of compilation
> time.

How about storing generated code on disc? Or the intermediate IR? I'd
typically use the same things under qemu day after day and would be happy
to slowly build up a cache on disc. Perhaps starting qemu with a 'spend
time and add to cache' option when I'm getting it to learn, and a 'use
only what's in the cache already' option when I haven't got the time to
wait.

Cheers, Ralph.
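Ralph's cache idea maps naturally onto LLVM's bitcode serialization: a
Module can be written to disc and parsed back later. A rough sketch,
assuming the LLVM 2.x bitcode API (exact signatures varied between 2.x
releases, and the cache layout and key scheme here are invented):

#include <fstream>
#include <string>
#include "llvm/Module.h"
#include "llvm/Bitcode/ReaderWriter.h"
#include "llvm/Support/MemoryBuffer.h"

using namespace llvm;

// Save the module holding a translated block under a key derived from,
// e.g., a hash of the guest code bytes. Assumes cache/ already exists.
void save_to_cache(Module *M, const std::string &key) {
  std::ofstream out(("cache/" + key + ".bc").c_str(),
                    std::ios::out | std::ios::binary);
  WriteBitcodeToFile(M, out);
}

// Try to reload a previously translated block; returns null on a miss.
Module *load_from_cache(const std::string &key) {
  std::string path = "cache/" + key + ".bc";
  MemoryBuffer *buf = MemoryBuffer::getFile(path.c_str(), path.size());
  if (!buf)
    return 0; // cache miss: fall back to translating from scratch
  std::string err;
  Module *M = ParseBitcodeFile(buf, &err);
  delete buf;
  return M; // null if the file was not valid bitcode
}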
Thanks to all those who responded to our email.

Tilmann Scheller wrote:
> I guess in your situation the chances are much higher that you will see
> a significant performance increase, since you apparently don't do any
> dynamic binary translation yet, especially if you decide to use your
> existing interpreter in combination with the LLVM JIT in a hotspot-like
> manner.

We do dynamic binary translation. We are in a similar situation to qemu,
except that we are SystemC/TLM compliant for hardware and bus models. Our
current technology is somewhat like qemu's: we translate the binary into
"semantic ops", which are pre-compiled at build time, like qemu, and then
we execute that code. The difference is that we extensively use partial
evaluation to translate each binary instruction into one specialized op.

We do not intend to use LLVM at the basic block level. Our initial idea
is similar to what you call a super-block, but also to carry out the LLVM
compilation in a separate thread, so that we can use the now-available
quad-core PCs without slowing down the ongoing execution.
In fact, this is similar to what we did a few years ago with the Java VM,
but on a separate computer, because we did not even have dual-core at the
time.

> An important question is how you perform the translation from your
> source architecture to LLVM IR. [...] Going directly from machine code
> to LLVM IR certainly requires more effort.

This is the effort we are considering. But we already have C++ binary
decoders.

> Which architectures are you interested in particularly?

ARM, MIPS, PowerPC and SH.

Thanks again,
Vania

--
===============================================
Vania JOLOBOFF
LIAMA Sino French Laboratory
95 Zhongguancun East Road
Beijing 100080, China
Tel +86 10 8261 4528  http://liama.ia.ac.cn/
vania@liama.ia.ac.cn or vania.joloboff@inria.fr
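Vania's plan to run LLVM compilation in a separate thread amounts to a
producer/consumer queue between the simulation thread and a compiler
thread. A generic pthreads sketch (all names invented; note that the LLVM
JIT of this era was not generally considered thread-safe, so all LLVM
work is confined to the compiler thread):

#include <pthread.h>
#include <queue>

typedef unsigned long guest_addr_t;

static std::queue<guest_addr_t> work_queue; // super-blocks awaiting JIT
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_nonempty = PTHREAD_COND_INITIALIZER;

// Hypothetical: build IR for the super-block at 'pc', JIT it, and publish
// the resulting host-code pointer so the simulation thread picks it up on
// its next visit (until then it keeps running the existing semantic ops).
void compile_and_publish(guest_addr_t pc);

// Simulation thread: request compilation of a hot super-block and keep
// simulating without waiting for the result.
void request_compile(guest_addr_t pc) {
  pthread_mutex_lock(&queue_lock);
  work_queue.push(pc);
  pthread_cond_signal(&queue_nonempty);
  pthread_mutex_unlock(&queue_lock);
}

// Compiler thread: drain the queue, one super-block at a time.
void *compiler_thread(void *) {
  for (;;) {
    pthread_mutex_lock(&queue_lock);
    while (work_queue.empty())
      pthread_cond_wait(&queue_nonempty, &queue_lock);
    guest_addr_t pc = work_queue.front();
    work_queue.pop();
    pthread_mutex_unlock(&queue_lock);
    compile_and_publish(pc); // all LLVM work happens on this thread
  }
  return 0;
}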