Denis Steckelmacher
2014-Feb-25 18:06 UTC
[LLVMdev] [GSoC 2014] Using LLVM as a code-generation backend for Valgrind
On 02/25/2014 04:50 PM, John Criswell wrote:
>
> I think a more interesting idea would be to use LLVM to perform
> instrumentation and then to use Valgrind to instrument third-party
> libraries linked into the program.
>
> What I'm imagining is this: Let's say you instrument a program with
> SAFECode or Asan to find memory safety errors. When you run the program
> under Valgrind, the portion of the code instrumented by SAFECode or Asan
> runs natively without dynamic binary instrumentation because it's
> already been instrumented. When the program calls uninstrumented code
> (e.g., code in a dynamic library), Valgrind starts dynamic binary
> instrumentation to do the instrumentation.
>
> A really neat thing you could do with this is to share run-time data
> structures between the LLVM and Valgrind instrumentation. For example,
> Valgrind could use SAFECode's meta-data on object allocations and
> vice-versa.

Someone proposed caching the results of a JIT compilation. Caching LLVM bitcode is easy (and the LLVM optimizations operate on bitcode, so they don't need to be re-run when the bitcode is reloaded), and may be a good way to speed up Valgrind. Caching native binary code is more difficult and would only be useful if LLVM's codegen is slow (I think the codegen can be configured to be fast, for instance by using a simpler register allocator).

If every .so is cached in a separate bitcode file, loading an application would only require generating bitcode for the application itself, not for the libraries it uses, provided that they haven't changed since another application using them was analyzed. That may speed up the start-up of Valgrind.
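To make the per-.so caching idea a bit more concrete, here is a rough sketch of what the cache lookup could look like. It is not tied to any real Valgrind or LLVM API: the translate callback stands in for the libVEX -> LLVM IR -> serialization step (with LLVM, the serialize/reload step would be the bitcode writer and reader), and the cache layout (one file per .so, keyed by a hash of the path and invalidated by mtime) is just one possible choice made up for the example.

// Minimal sketch, assuming a POSIX system; helper names are hypothetical.
#include <cstdio>
#include <ctime>
#include <functional>
#include <string>
#include <vector>
#include <sys/stat.h>

// Last-modification time of a file, or 0 if it does not exist.
static time_t mtime_of(const std::string &path) {
    struct stat st;
    return stat(path.c_str(), &st) == 0 ? st.st_mtime : 0;
}

static bool read_file(const std::string &path, std::vector<char> &out) {
    FILE *f = fopen(path.c_str(), "rb");
    if (!f) return false;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    out.resize(size > 0 ? (size_t)size : 0);
    bool ok = out.empty() || fread(out.data(), 1, out.size(), f) == out.size();
    fclose(f);
    return ok;
}

static bool write_file(const std::string &path, const std::vector<char> &data) {
    FILE *f = fopen(path.c_str(), "wb");
    if (!f) return false;
    bool ok = fwrite(data.data(), 1, data.size(), f) == data.size();
    fclose(f);
    return ok;
}

// Reuse the cached buffer for soPath if it is newer than the library itself;
// otherwise call translate() (which would run libVEX -> LLVM IR -> the LLVM
// optimizers and serialize the module) and store the result for next time.
std::vector<char> getCachedTranslation(
        const std::string &soPath, const std::string &cacheDir,
        std::vector<char> (*translate)(const std::string &)) {
    std::string cachePath = cacheDir + "/" +
        std::to_string(std::hash<std::string>()(soPath)) + ".bc";
    std::vector<char> buf;
    if (mtime_of(cachePath) >= mtime_of(soPath) && read_file(cachePath, buf))
        return buf;                      // cache hit: skip re-translation
    buf = translate(soPath);             // cache miss: translate and store
    write_file(cachePath, buf);
    return buf;
}

Whether the win is large depends on how much of the start-up time is actually spent in translation versus in the LLVM optimizations that the cache lets you skip.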
Timur Iskhodzhanov
2014-Feb-25 18:21 UTC
[LLVMdev] [GSoC 2014] Using LLVM as a code-generation backend for Valgrind
Valgrind is still going to be single threaded, right?

On 25 Feb 2014 at 22:10, "Denis Steckelmacher" <steckdenis at yahoo.fr> wrote:
> [full quote of the previous message snipped]
Kirill Batuzov
2014-Feb-26 11:23 UTC
[LLVMdev] [Valgrind-developers] [GSoC 2014] Using LLVM as a code-generation backend for Valgrind
Hi,

Only one letter got to the valgrind-developers mailing list, so I'll quote the first message of the thread so that those who do not read llvmdev know what this discussion is about.

=== Begin of the first message ===

> Hi,
>
> I've seen on the LLVM Open Projects Page [1] an idea about using LLVM to
> generate native code in Valgrind. From what I know, Valgrind uses libVEX
> to translate native instructions into a bitcode, used to add the
> instrumentation and then translated back to native code for execution.
>
> Valgrind and LLVM are two tools that I use nearly every day. I'm also
> very interested in code generation and optimization, so adding the
> possibility to use LLVM to generate native code in libVEX interests me
> very much. Is it a good idea? Could an LLVM backend bring something
> useful to Valgrind (for instance, faster execution or more targets
> supported)?
>
> I've sent this message to the LLVM and Valgrind mailing lists because
> I originally found the idea on LLVM's website, but Valgrind is
> the object of the idea. By the way, does anyone already know whether LLVM
> or Valgrind will be a mentoring organization for this year's GSoC?
>
> You can find in [2] the list of my past projects. During GSoC 2011,
> I had the chance to use the Clang libraries to compile C code, and the
> LLVM JIT to execute it (with instrumented stdlib functions). I have also
> played with the LLVM C bindings to generate code when I explored some
> parts of Mesa.
>
> Denis Steckelmacher
>
> [1] : http://llvm.org/OpenProjects.html#misc_new
> [2] : http://steckdenis.be/page-projects.html

=== End of the first message ===

The idea of using an LLVM backend in a dynamic binary translation (DBT) project has become popular recently. Unfortunately, it has not proven to be a good one. I suggest you check the related work in QEMU. The DBT part of both QEMU and Valgrind works in a similar way, and there have been a number of attempts to use LLVM as a QEMU backend. They mostly resulted in slowdowns. In [1] the authors reported a 35x slowdown, in [2] there was around a 2x slowdown. Finally, in [3] the authors reported a performance gain, but there are some catches.

1. They used LLVM not only for the backend: they replaced the internal representation with LLVM. This is not an option for Valgrind, because you would need to rewrite all existing tools (including third-party ones) to do it.

2. They used the SPEC CPU benchmarks to measure their speedup. The important thing about these tests is that they have little code to translate but a lot of computation to do with the translated code. Even so, some of these tests do not fare too well (like 403.gcc). On real-life applications (like Firefox), where there is a lot of library code to translate and not so much computation to do, the results may be totally different.

LLVM does not do well as a DBT backend mostly for two reasons. First, in DBT you need to translate while the application is running, and you need to do it really fast; a compiler is not optimized for that task. The LLVM JIT? Maybe. Second, in DBT you translate code in small portions, like basic blocks or extended basic blocks. They have a very simple structure: there are no loops, and there is no redundancy left over from translating a high-level language to a low-level one. There is nothing that sophisticated optimizations can do better than very simple ones.
In conclusion, I second what has already been said: this project sounds like fun to do, but do not expect many practical results from it.

> It would also be interesting to cache the LLVM-generated code
> between runs

The tricky part here is to build a mapping between binary code fragments and cached translations from previous runs. In the worst case, all you know about the binary code is its address (which can vary between runs) and the binary code itself.

[1] : "Dynamically Translating x86 to LLVM using QEMU"
      http://infoscience.epfl.ch/record/149975/files/x86-llvm-translator-chipounov_2.pdf
[2] : The llvm-qemu project. http://code.google.com/p/llvm-qemu/
[3] : "LnQ: Building High Performance Dynamic Binary Translator with Existing Compiler Backends"
      http://people.cs.nctu.edu.tw/~chenwj/slide/paper/lnq.pdf

--
Kirill
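As a rough illustration of the matching problem Kirill describes: since the load address can vary between runs, a persistent cache would probably have to be keyed on the contents of the guest code fragment rather than on where it was mapped. Here is a minimal sketch of that idea; the fragment and translation types are made up for the example and are not Valgrind's or LLVM's actual data structures.

#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a cached translation (e.g. serialized bitcode
// or host code produced on a previous run).
struct CachedTranslation {
    std::vector<uint8_t> payload;
};

// FNV-1a over the guest instruction bytes: the key depends only on the code
// itself, not on the address it happened to be loaded at in this run.
static uint64_t fragment_key(const uint8_t *code, size_t len) {
    uint64_t h = 1469598103934665603ULL;        // FNV offset basis
    for (size_t i = 0; i < len; ++i) {
        h ^= code[i];
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

class TranslationCache {
    std::unordered_map<uint64_t, CachedTranslation> cache;
public:
    // Returns the cached translation for this fragment, or null on a miss.
    // A real implementation would also compare the bytes themselves to
    // guard against hash collisions.
    const CachedTranslation *lookup(const uint8_t *code, size_t len) const {
        auto it = cache.find(fragment_key(code, len));
        return it == cache.end() ? nullptr : &it->second;
    }
    void insert(const uint8_t *code, size_t len, CachedTranslation t) {
        cache[fragment_key(code, len)] = std::move(t);
    }
};

Whether such a cache pays off still depends on how expensive re-translation is relative to hashing and disk I/O, which is exactly the question the QEMU numbers above raise.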