Viktor Pavlu
2011-Apr-05 09:56 UTC
[LLVMdev] GSoC 2011: Fast JIT Code Generation for x86-64
On Mon, Apr 4, 2011 at 9:50 PM, Eric Christopher <echristo at apple.com> wrote:
>
> On Apr 1, 2011, at 6:53 AM, Viktor Pavlu wrote:
>
>> [...] Although most optimizations are turned off
>> already and the FastISel instruction selector is used, the "fast" path
>> for first-time code generation is still the bottleneck [...]
>
> This is effectively what fastisel was created for - there are just IR
> constructs that don't go through that path. The idea is that fastisel
> will get most of the IR and everything that'd be really hard we just
> punt to the DAG. I imagine running more things through fastisel would
> help.

To me, increasing coverage of the FastISel seemed more involved than
directly emitting opcodes to memory, with a lesser outlook on
reducing overhead.

> That won't help the slow register allocation problem though - even
> the fast allocator is pretty slow. I haven't seen what your plan
> is for register allocation or were you planning on just using a few
> registers in defined ways?

My first idea was to implement a linear scan allocator integrated
into the code generation pass.

> Also, X86CodeEmitter.cpp is going away to be replaced with the MC
> emitters.

Yes, I remember reading about this on the mailing list.
With our simulator generators we are still living in 2.2/2.6 land,
though, but we will change that.

X86CodeEmitter was only meant to indicate that in my intended fast
path there is nothing in between the LLVM-IR passes and the final
emission of the code, i.e. an LLVM-IR pass that produces x86-64.

- Viktor
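The linear-scan allocator Viktor mentions is presumably the classic Poletto/Sarkar scheme: walk live intervals in order of increasing start point, hand out free registers, and when none remain spill whichever interval ends last. A minimal sketch of that idea follows; the types and names are illustrative stand-ins, not LLVM code, and it assumes at least one physical register is available.

    // Minimal sketch of classic linear-scan register allocation (Poletto/Sarkar).
    // Everything here is an illustrative stand-in, not LLVM code.
    #include <algorithm>
    #include <iterator>
    #include <set>
    #include <vector>

    struct Interval { int start, end, vreg; int phys = -1; bool spilled = false; };

    void linearScan(std::vector<Interval> &intervals, int numPhysRegs) {
      // Process intervals in order of increasing start point.
      std::sort(intervals.begin(), intervals.end(),
                [](const Interval &a, const Interval &b) { return a.start < b.start; });

      std::vector<int> freeRegs;
      for (int r = 0; r < numPhysRegs; ++r)   // assumes numPhysRegs >= 1
        freeRegs.push_back(r);

      // Intervals currently holding a register, ordered by increasing end point.
      auto byEnd = [](const Interval *a, const Interval *b) { return a->end < b->end; };
      std::multiset<Interval *, decltype(byEnd)> active(byEnd);

      for (Interval &cur : intervals) {
        // Expire intervals that ended before 'cur' starts and reclaim their registers.
        while (!active.empty() && (*active.begin())->end < cur.start) {
          freeRegs.push_back((*active.begin())->phys);
          active.erase(active.begin());
        }

        if (!freeRegs.empty()) {
          cur.phys = freeRegs.back();
          freeRegs.pop_back();
          active.insert(&cur);
        } else {
          // No register free: spill whichever interval ends last.
          Interval *last = *std::prev(active.end());
          if (last->end > cur.end) {
            cur.phys = last->phys;   // steal the register from the longest-lived interval
            last->spilled = true;
            last->phys = -1;
            active.erase(std::prev(active.end()));
            active.insert(&cur);
          } else {
            cur.spilled = true;      // the current interval itself is spilled
          }
        }
      }
    }

The appeal of this algorithm for a JIT fast path is that it is a single pass over the live intervals with no interference graph, which is why it is the usual choice when compile time dominates.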
Jim Grosbach
2011-Apr-05 17:16 UTC
[LLVMdev] GSoC 2011: Fast JIT Code Generation for x86-64
On Apr 5, 2011, at 2:56 AM, Viktor Pavlu wrote:

> On Mon, Apr 4, 2011 at 9:50 PM, Eric Christopher <echristo at apple.com> wrote:
>>
>> On Apr 1, 2011, at 6:53 AM, Viktor Pavlu wrote:
>>
>>> [...] Although most optimizations are turned off
>>> already and the FastISel instruction selector is used, the "fast" path
>>> for first-time code generation is still the bottleneck [...]
>>
>> This is effectively what fastisel was created for - there are just IR
>> constructs that don't go through that path. The idea is that fastisel
>> will get most of the IR and everything that'd be really hard we just
>> punt to the DAG. I imagine running more things through fastisel would
>> help.
>
> To me, increasing coverage of the FastISel seemed more involved than
> directly emitting opcodes to memory, with a lesser outlook on
> reducing overhead.

That seems extremely unlikely. You'd be effectively re-implementing both
fast-isel and the MC binary emitter layers, and it sounds like a new
register allocator as well.

What Eric is suggesting is instead locating which IR constructs are not
being handled by fast-isel and are causing problems (i.e., are being
frequently encountered in your code-base) and implementing fast-isel
handling for them. That will remove the selectiondag overhead that
you've identified as the primary compile-time problem.

-Jim

>> That won't help the slow register allocation problem though - even
>> the fast allocator is pretty slow. I haven't seen what your plan
>> is for register allocation or were you planning on just using a few
>> registers in defined ways?
>
> My first idea was to implement a linear scan allocator integrated
> into the code generation pass.
>
>> Also, X86CodeEmitter.cpp is going away to be replaced with the MC
>> emitters.
>
> Yes, I remember reading about this on the mailing list.
> With our simulator generators we are still living in 2.2/2.6 land,
> though, but we will change that.
>
> X86CodeEmitter was only meant to indicate that in my intended fast
> path there is nothing in between the LLVM-IR passes and the final
> emission of the code, i.e. an LLVM-IR pass that produces x86-64.
>
> - Viktor
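To make the trade-off Jim describes concrete: fast-isel is essentially a per-instruction dispatcher that handles common IR forms directly and punts everything else to the SelectionDAG, so "increasing coverage" means adding cases to that dispatch rather than rebuilding an emitter and allocator from scratch. A schematic sketch of that structure, using stand-in types rather than the real FastISel interface:

    // Schematic only: stand-in types illustrating the fast-isel structure, not
    // the real LLVM FastISel interface. Common IR forms are matched directly;
    // anything unhandled is punted to the general (SelectionDAG) path, and every
    // punt costs the compile time Viktor is trying to avoid.
    enum class Opcode { Add, Load, Store, Call, Other };
    struct Instr { Opcode op; /* operands elided for brevity */ };

    struct FastPathSelector {
      bool selectAdd(const Instr &)  { /* emit one machine add here  */ return true; }
      bool selectLoad(const Instr &) { /* emit one machine load here */ return true; }

      // Returns false when the instruction must fall back to the slow path;
      // "increasing coverage" means turning more of these cases into true.
      bool trySelect(const Instr &I) {
        switch (I.op) {
        case Opcode::Add:  return selectAdd(I);
        case Opcode::Load: return selectLoad(I);
        default:           return false; // punt to the SelectionDAG-style path
        }
      }
    };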
----- Original Message ----
> From: Jim Grosbach <grosbach at apple.com>
> To: Viktor Pavlu <vpavlu at gmail.com>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Tue, April 5, 2011 1:16:34 PM
> Subject: Re: [LLVMdev] GSoC 2011: Fast JIT Code Generation for x86-64
>
> On Apr 5, 2011, at 2:56 AM, Viktor Pavlu wrote:
>
> > On Mon, Apr 4, 2011 at 9:50 PM, Eric Christopher <echristo at apple.com> wrote:
> >>
> >> On Apr 1, 2011, at 6:53 AM, Viktor Pavlu wrote:
> >>
> >>> [...] Although most optimizations are turned off
> >>> already and the FastISel instruction selector is used, the "fast" path
> >>> for first-time code generation is still the bottleneck [...]
> >>
> >> This is effectively what fastisel was created for - there are just IR
> >> constructs that don't go through that path. The idea is that fastisel
> >> will get most of the IR and everything that'd be really hard we just
> >> punt to the DAG. I imagine running more things through fastisel would
> >> help.
> >
> > To me, increasing coverage of the FastISel seemed more involved than
> > directly emitting opcodes to memory, with a lesser outlook on
> > reducing overhead.
>
> That seems extremely unlikely. You'd be effectively re-implementing both
> fast-isel and the MC binary emitter layers, and it sounds like a new
> register allocator as well.
>
> What Eric is suggesting is instead locating which IR constructs are not
> being handled by fast-isel and are causing problems (i.e., are being
> frequently encountered in your code-base) and implementing fast-isel
> handling for them. That will remove the selectiondag overhead that
> you've identified as the primary compile-time problem.
>
> -Jim

An alternative that would expand LLVM's capabilities would be to write an
interpreter for the LLVM IR itself. A well-written interpretation
framework could be used by the compiler as well.

- Jan
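LLVM already ships an IR interpreter (lli selects it with -force-interpreter), and a host program can request it through the ExecutionEngine API. A rough sketch using that API roughly as it looked around the 2.x releases; header paths and constructor details shifted between versions, so treat this as an outline rather than copy-paste code:

    // Sketch: running a module through LLVM's bundled IR interpreter via the
    // ExecutionEngine API of the 2.x/3.0 era (details varied by release).
    #include "llvm/ExecutionEngine/ExecutionEngine.h"
    #include "llvm/ExecutionEngine/GenericValue.h"
    #include "llvm/ExecutionEngine/Interpreter.h"  // pulls the interpreter into the link
    #include "llvm/Module.h"                       // pre-3.3 header location

    #include <string>
    #include <vector>

    llvm::GenericValue interpretMain(llvm::Module *M) {
      std::string Err;
      llvm::ExecutionEngine *EE = llvm::EngineBuilder(M)
                                      .setEngineKind(llvm::EngineKind::Interpreter)
                                      .setErrorStr(&Err)
                                      .create();   // null on failure; Err holds the reason
      llvm::Function *F = M->getFunction("main");
      std::vector<llvm::GenericValue> NoArgs;      // call main() with no arguments
      return EE->runFunction(F, NoArgs);
    }

Whether this helps Viktor's use case depends on the workload: the interpreter avoids code-generation latency entirely but executes far more slowly than even unoptimized JITted code.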
Eric Christopher
2011-Apr-05 18:33 UTC
[LLVMdev] GSoC 2011: Fast JIT Code Generation for x86-64
On Apr 5, 2011, at 2:56 AM, Viktor Pavlu wrote:

> On Mon, Apr 4, 2011 at 9:50 PM, Eric Christopher <echristo at apple.com> wrote:
>>
>> On Apr 1, 2011, at 6:53 AM, Viktor Pavlu wrote:
>>
>>> [...] Although most optimizations are turned off
>>> already and the FastISel instruction selector is used, the "fast" path
>>> for first-time code generation is still the bottleneck [...]
>>
>> This is effectively what fastisel was created for - there are just IR
>> constructs that don't go through that path. The idea is that fastisel
>> will get most of the IR and everything that'd be really hard we just
>> punt to the DAG. I imagine running more things through fastisel would
>> help.
>
> To me, increasing coverage of the FastISel seemed more involved than
> directly emitting opcodes to memory, with a lesser outlook on
> reducing overhead.

Then you're not quite understanding what fast-isel does. The idea behind
fast-isel is that common code that can easily be splatted out as what is
effectively assembly is handled that way. If you're seeing a lot of time
in DAG instruction selection, then the code you're putting through
fast-isel isn't getting all the way through and fast-isel is punting to
the SelectionDAG. If this is happening a lot you'll probably want to
change the IR that you're generating, if possible.

If you'd like to see the constructs that fast-isel is punting to the
SelectionDAG on, there are options to make it more verbose, or even
abort.

>> That won't help the slow register allocation problem though - even
>> the fast allocator is pretty slow. I haven't seen what your plan
>> is for register allocation or were you planning on just using a few
>> registers in defined ways?
>
> My first idea was to implement a linear scan allocator integrated
> into the code generation pass.

You may want to look at the fast register allocator then.

>> Also, X86CodeEmitter.cpp is going away to be replaced with the MC
>> emitters.
>
> Yes, I remember reading about this on the mailing list.
> With our simulator generators we are still living in 2.2/2.6 land,
> though, but we will change that.
>
> X86CodeEmitter was only meant to indicate that in my intended fast
> path there is nothing in between the LLVM-IR passes and the final
> emission of the code, i.e. an LLVM-IR pass that produces x86-64.

Effectively, what you're talking about then is a pass that rewrites
fast-isel to use MC instead of machine instructions.

Also, one of the advantages of fast-isel is that there's very little
that has to go through the hand-coded parts - a great deal is
autogenerated from the .td files. This is something that any
replacement should do as well.

-eric
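The verbosity/abort knobs Eric refers to were, in trees of that era, the hidden options -fast-isel-verbose and -fast-isel-abort (names recalled from memory; verify against your checkout). A JIT host program can flip the same cl::opts programmatically, for example:

    // Sketch: a JIT host turning on fast-isel diagnostics by feeding the
    // corresponding cl::opts through ParseCommandLineOptions. The option names
    // ("fast-isel-verbose", "fast-isel-abort") are recalled from the 2.9-era
    // tree and are hidden options; double-check them in your checkout.
    #include "llvm/Support/CommandLine.h"

    void enableFastIselDiagnostics() {
      const char *Args[] = { "my-jit", "-fast-isel-verbose", "-fast-isel-abort" };
      llvm::cl::ParseCommandLineOptions(3, const_cast<char **>(Args));
    }

With -fast-isel-abort set, any IR construct that falls off the fast path stops the compiler immediately, which makes the problematic constructs easy to enumerate in a test run.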
Óscar Fuentes
2011-Apr-05 19:41 UTC
[LLVMdev] GSoC 2011: Fast JIT Code Generation for x86-64
Jim Grosbach <grosbach at apple.com> writes:

>> To me, increasing coverage of the FastISel seemed more involved than
>> directly emitting opcodes to memory, with a lesser outlook on
>> reducing overhead.
>
> That seems extremely unlikely. You'd be effectively re-implementing
> both fast-isel and the MC binary emitter layers, and it sounds like a
> new register allocator as well.
>
> What Eric is suggesting is instead locating which IR constructs are
> not being handled by fast-isel and are causing problems (i.e., are
> being frequently encountered in your code-base) and implementing
> fast-isel handling for them. That will remove the selectiondag
> overhead that you've identified as the primary compile-time problem.

At some point in the past someone was kind enough to add fast-isel
support for some instructions frequently emitted by my compiler, hoping
that it would speed up JITting. The results were disappointing
(negligible, IIRC). Either fast-isel does not make much of a difference
or the main inefficiency is elsewhere.