thr3ads.net - llvm dev - [LLVMdev] GSoC 2011: Fast JIT Code Generation for x86-64 [Apr 2011]


Viktor Pavlu

2011-Apr-01 13:53 UTC

[LLVMdev] GSoC 2011: Fast JIT Code Generation for x86-64

Hi All,

I'd like to propose a fast path through code generation for x86-64 in
the JIT execution engine as part of 2011's Google Summer of Code
program.

While the LLVM-JIT is very popular as a first try at jitting
programming languages, projects have abandoned the LLVM-JIT when
disappointed with the overall runtime. The problem is that the
benefit of faster execution is traded for longer compile time -- which
only pays off for code that is executed frequently. One solution to
this problem is an adaptive compilation scheme with separate compile
strategies for cold and hot code.
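
To make the cold/hot split concrete, here is a minimal sketch of a
counter-based tier policy. It is an illustration only: `TierPolicy`,
`Tier`, `recordCall`, and the threshold are hypothetical names and
values, not part of LLVM or of this proposal.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical tiers: Cold code takes the fast, unoptimized path;
// Hot code is worth handing to the slower optimizing path.
enum class Tier { Cold, Hot };

class TierPolicy {
public:
  explicit TierPolicy(std::uint64_t hotThreshold) : threshold_(hotThreshold) {
    assert(hotThreshold > 0 && "threshold must be positive");
  }

  // Record one execution of `fn` and report which tier should serve it.
  Tier recordCall(const std::string &fn) {
    std::uint64_t &count = calls_[fn]; // starts at 0 for unseen functions
    ++count;
    return count >= threshold_ ? Tier::Hot : Tier::Cold;
  }

private:
  std::uint64_t threshold_;
  std::unordered_map<std::string, std::uint64_t> calls_;
};
```

A real scheme would likely count per call site or per trace and
recompile in the background; the sketch only shows the decision itself.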

The aim of my project is to create the fast path for code that is
compiled the first time in such an adaptive compilation scheme: a code
generator that produces unoptimized code in a very short time.

I plan to implement a two-pass (almost) linear code generator
specifically for x86-64 that

 - performs analyses (e.g. live-range analysis) on LLVM-IR in the
   first pass and

 - then generates x86-64 instructions directly from IR in a second
   pass that writes to the executable memory (e.g. in
   X86CodeEmitter.cpp),

circumventing the more expensive backend passes.
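
The two passes described above could look roughly like the following
sketch. It works on a toy straight-line IR rather than on LLVM-IR, emits
pseudo-assembly text instead of machine code, and has no spilling; the
`Inst` layout and the functions `lastUses` and `emit` are assumptions
made for illustration, not the proposed implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy IR: instruction i defines value i.
struct Inst {
  char op;  // 'c' = constant, '+' = add, 'r' = return
  int lhs;  // value index of the left operand (the constant itself for 'c')
  int rhs;  // value index of the right operand, -1 if absent
};

// Pass 1: record, for every value, the index of its last use.
std::vector<int> lastUses(const std::vector<Inst> &code) {
  std::vector<int> last(code.size(), -1);
  for (int i = 0; i < static_cast<int>(code.size()); ++i) {
    const Inst &in = code[i];
    if (in.op == 'c') continue;  // constants have no value operands
    last[in.lhs] = i;
    if (in.rhs >= 0) last[in.rhs] = i;
  }
  return last;
}

// Pass 2: one forward walk that emits code and recycles a register as
// soon as the value held in it has seen its last use.
std::vector<std::string> emit(const std::vector<Inst> &code) {
  const std::vector<int> last = lastUses(code);
  std::vector<std::string> freeRegs = {"rdx", "rcx", "rax"};  // taken from the back
  std::vector<std::string> regOf(code.size());
  std::vector<std::string> out;

  auto alloc = [&freeRegs]() {
    assert(!freeRegs.empty() && "sketch has no spilling");
    std::string r = freeRegs.back();
    freeRegs.pop_back();
    return r;
  };

  for (int i = 0; i < static_cast<int>(code.size()); ++i) {
    const Inst &in = code[i];
    if (in.op == 'c') {
      regOf[i] = alloc();
      out.push_back("mov " + regOf[i] + ", " + std::to_string(in.lhs));
    } else if (in.op == '+') {
      if (last[in.lhs] == i) {
        regOf[i] = regOf[in.lhs];  // left operand dies here: reuse its register
      } else {
        regOf[i] = alloc();
        out.push_back("mov " + regOf[i] + ", " + regOf[in.lhs]);
      }
      out.push_back("add " + regOf[i] + ", " + regOf[in.rhs]);
      if (last[in.rhs] == i && in.rhs != in.lhs)
        freeRegs.push_back(regOf[in.rhs]);  // right operand dies here
    } else {  // 'r'
      if (regOf[in.lhs] != "rax")
        out.push_back("mov rax, " + regOf[in.lhs]);
      out.push_back("ret");
    }
  }
  return out;
}
```

For the input `{c 2, c 3, + 0 1, r 2}` this yields `mov rax, 2`,
`mov rcx, 3`, `add rax, rcx`, `ret`. Both passes touch each instruction
once, which is where the "(almost) linear" cost would come from.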


This code generator can then be used as part of an adaptive compilation
framework within LLVM (a GSoC proposal in this direction is currently
being discussed on the llvm-dev mailing list[1]), or from outside of
LLVM -- the latter being my main motivation.

I currently work on generating fast cycle-accurate simulators[2]. For
this, our institute has implemented a two-part adaptive compilation
scheme using the LLVM-JIT. Although most optimizations are turned off
already and the FastISel instruction selector is used, the "fast" path
for first-time code generation is still the bottleneck of the
simulators. This is for the largest part due to the SelectionDAG
instruction selection process, hence the motivation for a simpler,
two-pass code generator.

As for my personal details, I'm a PhD student at Vienna University of
Technology (TU Wien) with a strong background in compiler theory,
acquired in a wide variety of undergraduate- and graduate-level
courses.

I appreciate any suggestions and would be very excited if someone is
interested in mentoring this.

Please note that I'm offline until April 4, so I cannot respond before
next Tuesday.

- Viktor Pavlu


---
[1]: GSoC Proposal: Adaptive Compilation Framework for LLVM JIT Compiler
http://groups.google.com/group/llvm-dev/browse_thread/thread/b4dfd837e208f9dc/

[2]: Optimal Code Generation for Explicitly Parallel Processors
http://www.complang.tuwien.ac.at/epicopt/

Joshua Warner

2011-Apr-01 15:06 UTC

Hi Viktor,

I think this is a great idea overall!  This problem is something that has
*almost* turned me away from LLVM several times now.

I'm by no means an influential member of the community (and hence have no
real say in GSoC projects), but I do have a few comments.

> I plan to implement a two-pass (almost) linear code generator
> specifically for x86-64 that
>
>  - performs analyses (e.g. live-range analysis) on LLVM-IR in the
>   first pass and
>
I assume this is for collecting information for register allocation?  For
fast code generation, I would go with a local, bottom-up, linear register
allocator, which shouldn't require an explicit live-range analysis pass.  It
only needs to know liveness information within a single block (mostly),
which should be easier and faster to compute on-demand instead of in an
analysis pass.
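
One way to read the suggestion above is the following sketch of a local,
bottom-up allocator for a single block. Walking the block backwards, the
first time a value is seen as an operand is its last use, so liveness
falls out of the walk itself and no separate analysis pass is needed.
The `Op` layout and `allocateBottomUp` are hypothetical; the sketch
ignores values that live across blocks and has no spilling.

```cpp
#include <cassert>
#include <vector>

// Toy block: instruction i defines value i and reads up to two earlier values.
struct Op {
  int lhs;  // value index of the first operand, -1 if absent
  int rhs;  // value index of the second operand, -1 if absent
};

// Returns the register number chosen for each value (-1 if never used).
std::vector<int> allocateBottomUp(const std::vector<Op> &block, int numRegs) {
  std::vector<int> regOf(block.size(), -1);
  std::vector<int> freeRegs;
  for (int r = numRegs - 1; r >= 0; --r)
    freeRegs.push_back(r);  // register 0 is handed out first

  for (int i = static_cast<int>(block.size()) - 1; i >= 0; --i) {
    // An operand without a register is seen here for the first time in the
    // backward walk, so this instruction is its last use.
    for (int v : {block[i].lhs, block[i].rhs}) {
      if (v < 0 || regOf[v] >= 0) continue;  // absent, or already live
      assert(!freeRegs.empty() && "sketch has no spilling");
      regOf[v] = freeRegs.back();
      freeRegs.pop_back();
    }
    // Reaching the definition of value i ends its live range, so earlier
    // instructions may reuse its register. Releasing it only after the
    // operands are placed keeps result and operands in distinct registers.
    if (regOf[i] >= 0) freeRegs.push_back(regOf[i]);
  }
  return regOf;
}
```

Releasing the result register before placing the operands would save a
register per instruction, at the cost of extra care for non-commutative
two-address instructions.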

>
>  - then generates x86-64 instructions directly from IR in a second
>   pass that writes to the executable memory (e.g. in
>   X86CodeEmitter.cpp),
>
It sounds as if you intend to hand-write most of the code generation
part.  If this is the case, I would suggest that it would be significantly
more valuable to generate it from the *.td files instead.  That way, it
should be a lot easier to port to other architectures.

Sincerely,
Joshua

Óscar Fuentes

2011-Apr-02 05:09 UTC

Viktor Pavlu <vpavlu at gmail.com> writes:

[snip]
> While the LLVM-JIT is very popular as a first try at jitting
> programming languages, projects have abandoned the LLVM-JIT when
> disappointed with the overall runtime.
Hear, hear.
> The problem is, that the benefit of faster execution is traded for
> longer compile time -- which only pays off for code that is executed
> frequently. One solution to this problem is an adaptive compilation
> scheme with separate compile strategies for cold and hot code.
[snip]
> I currently work on generating fast cycle-accurate simulators[2]. For
> this, our institute has implemented a two-part adaptive compilation
> scheme using the LLVM-JIT. Although most optimizations are turned off
> already and the FastISel instruction selector is used, the "fast" path
> for first-time code generation is still the bottleneck of the
> simulators. This is for the largest part due to the SelectionDAG
> instruction selection process, hence the motivation for a simpler,
> two-pass code generator.
Well, anything that makes the JIT usable for those of us compiling
medium-sized code (on the order of hundreds of KB to a few MB of
generated native code) is greatly appreciated.

As a means of improving runtime performance, my compiler supports the
LLVM JIT and a dumb X86 assembler generator that makes very simple
optimizations and has some hard constraints. The latter runs in a
fraction of the time, and its code performs about as well as or better
than the LLVM JIT's (without LLVM's optimization passes). So I'm pretty
sure that it is possible to dramatically reduce the time required by the
JIT without a severe impact on performance.

[snip]

Eric Christopher

2011-Apr-04 19:50 UTC

On Apr 1, 2011, at 6:53 AM, Viktor Pavlu wrote:
> I currently work on generating fast cycle-accurate simulators[2]. For
> this, our institute has implemented a two-part adaptive compilation
> scheme using the LLVM-JIT. Although most optimizations are turned off
> already and the FastISel instruction selector is used, the "fast" path
> for first-time code generation is still the bottleneck of the
> simulators. This is for the largest part due to the SelectionDAG
> instruction selection process, hence the motivation for a simpler,
> two-pass code generator.
This is effectively what fastisel was created for - there are just IR
constructs that don't go through that path. The idea is that fastisel
will get most of the IR and everything that'd be really hard we just
punt to the DAG. I imagine running more things through fastisel would
help.

That won't help the slow register allocation problem though - even
the fast allocator is pretty slow. I haven't seen what your plan
is for register allocation or were you planning on just using a few
registers in defined ways?

Also, X86CodeEmitter.cpp is going away to be replaced with the MC
emitters.

-eric

Viktor Pavlu

2011-Apr-05 09:56 UTC

On Mon, Apr 4, 2011 at 9:50 PM, Eric Christopher <echristo at apple.com>
wrote:
> On Apr 1, 2011, at 6:53 AM, Viktor Pavlu wrote:
>
>> [...] Although most optimizations are turned off
>> already and the FastISel instruction selector is used, the "fast" path
>> for first-time code generation is still the bottleneck [...]
>
> This is effectively what fastisel was created for - there are just IR
> constructs that don't go through that path. The idea is that fastisel
> will get most of the IR and everything that'd be really hard we just
> punt to the DAG. I imagine running more things through fastisel would
> help.
To me, increasing FastISel's coverage seemed more involved than
directly emitting opcodes to memory, with less prospect of
reducing overhead.
> That won't help the slow register allocation problem though - even
> the fast allocator is pretty slow. I haven't seen what your plan
> is for register allocation or were you planning on just using a few
> registers in defined ways?
My first idea was to implement a linear scan allocator integrated
into the code generation pass.
> Also, X86CodeEmitter.cpp is going away to be replaced with the MC
> emitters.
Yes, I remember reading about this on the mailing list.
With our simulator generators we are still living in 2.2/2.6 land,
but we will change that.

X86CodeEmitter was only meant to indicate that in my intended fast
path there is nothing in between the LLVM-IR passes and the final
emission of the code, i.e. an LLVM-IR pass that produces x86-64.
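
As a rough illustration of "nothing in between", the sketch below writes
x86-64 machine code bytes straight into a buffer, with no intermediate
machine-instruction representation. The helper names are hypothetical
and only three encodings are covered (`mov eax, imm32` as B8,
`add eax, imm32` as 05, `ret` as C3); a real emitter would start from
LLVM-IR and copy the bytes into executable memory, which is left out
here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append a 32-bit immediate in little-endian byte order.
void emitImm32(std::vector<std::uint8_t> &buf, std::uint32_t imm) {
  for (int shift = 0; shift < 32; shift += 8)
    buf.push_back(static_cast<std::uint8_t>(imm >> shift));
}

// mov eax, imm32: opcode B8 followed by the immediate.
void emitMovEaxImm(std::vector<std::uint8_t> &buf, std::uint32_t imm) {
  buf.push_back(0xB8);
  emitImm32(buf, imm);
}

// add eax, imm32: opcode 05 followed by the immediate.
void emitAddEaxImm(std::vector<std::uint8_t> &buf, std::uint32_t imm) {
  buf.push_back(0x05);
  emitImm32(buf, imm);
}

// ret: single byte C3.
void emitRet(std::vector<std::uint8_t> &buf) { buf.push_back(0xC3); }

// Encode a function body equivalent to "return a + b" for two constants.
std::vector<std::uint8_t> encodeAddConstants(std::uint32_t a, std::uint32_t b) {
  std::vector<std::uint8_t> buf;
  emitMovEaxImm(buf, a);
  emitAddEaxImm(buf, b);
  emitRet(buf);
  assert(buf.size() == 11 && "5 + 5 + 1 bytes expected");
  return buf;
}
```

Calling `encodeAddConstants(2, 3)` produces the bytes
`B8 02 00 00 00 05 03 00 00 00 C3`.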

- Viktor
