Hi everyone,

Quick summary:

LLVM now has a new register allocator particularly suitable for compiling (very) large, machine-generated functions.

Longer story:

I've recently been using LLVM in an application that involves JITing fairly large functions that have no control flow - they're just flat sequences of instructions, anywhere from 100 to 10000+ in size. (The control flow is all in the host program, which works out which monster function to call and when.)

The default (linearscan) register allocator wasn't doing a good job, as it doesn't (yet) have live range splitting. It would quickly use all available registers and then get stuck, using only a single register (and the stack, a lot!) for hundreds or thousands of instructions at a time, greatly slowing (+bloating) the code. Luckily, I didn't have the presence of mind to actually try using the local register allocator before I ripped its guts out, turning it into RegAllocBigBlock.cpp, which is now in CVS. (Had I done that, I would have been quite happy to find that the local allocator produced much better code, much faster, than linearscan.)

The good news is that the new "BigBlock" allocator turns out to produce even better code than the local allocator when blocks are very large. We're talking a +10~20% speed boost on average. (If your basic blocks are small, or there's not much register pressure, you'll actually get the same code out of both local and BigBlock.)

While BigBlock isn't (and never will be) as fast as the local allocator, it's not much slower, doesn't use much memory, and is certainly faster than linearscan. So if you're compiling very large, (probably) machine-generated blocks of straight-line code, give the Local and BigBlock allocators a try, especially if you're JITing things and compile time is important.

If anyone has any problems with or comments on the BigBlock allocator, please drop me an email!

Enjoy,

Duraid
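[Editor's note: for readers wondering what these "flat sequences of instructions" look like, here is a minimal sketch of a generator for such a test case. This is not Duraid's actual generator or the test cases from the thread; the function name `@bigblock` and the chain-of-adds shape are illustrative assumptions. It emits a textual .ll function consisting of one basic block with N dependent add instructions.]

```cpp
#include <sstream>
#include <string>
#include <cassert>

// Emit a flat, straight-line LLVM IR function: a single entry block
// containing `n` add instructions, each depending on the previous one,
// followed by a ret. (Hypothetical generator for illustration only.)
std::string emitFlatFunction(unsigned n) {
  std::ostringstream out;
  out << "define i32 @bigblock(i32 %x) {\n";
  out << "entry:\n";
  std::string prev = "%x";
  for (unsigned i = 0; i < n; ++i) {
    out << "  %t" << i << " = add i32 " << prev << ", " << (i + 1) << "\n";
    prev = "%t" + std::to_string(i);  // next instruction reads this result
  }
  out << "  ret i32 " << prev << "\n}\n";
  return out.str();
}
```

Calling `emitFlatFunction(10000)` produces a 10000-instruction block of the kind discussed in the thread, which can then be fed to llc or the JIT under each allocator for comparison.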
Hi Duraid,

> Hi everyone,
>
> Quick summary:
>
> LLVM now has a new register allocator particularly suitable for
> compiling (very) large, machine-generated functions.

Congrats! Very good job!

> Longer story:
>
> I've recently been using LLVM in an application that involves JITing
> fairly large functions that have no control flow - they're just flat
> sequences of instructions, anywhere from 100 to 10000+ in size. (The
> control flow is all in the host program, which works out which
> monster function to call and when.)
>
> The default (linearscan) register allocator wasn't doing a good job, as
> it doesn't (yet) have live range splitting. It would quickly use all
> available registers and then get stuck, using only a single register
> (and the stack, a lot!) for hundreds or thousands of instructions at
> a time, greatly slowing (+bloating) the code.

True. I'm working on a version of linear scan based on Wimmer's thesis. It supports live range splitting, and I'd like to compare it with yours. Do you have any good examples of those fairly large functions that are just flat sequences of instructions, anywhere from 100 to 10000+ in size? It would be nice if you could send me those test cases (as C or .ll files). I could then use them as a basis for a comparison and report the results.

> The good news is the new "BigBlock" allocator turns out to
> produce even better code than the local allocator when blocks are very
> large. We're talking a +10~20% speed boost on average. (If your basic
> blocks are small, or there's not much register pressure, you'll
> actually get the same code out of both local and BigBlock.)

Do you have numbers comparing it to the current version of LLVM's linear scan? The win of your allocator over linear scan should be even bigger, I guess.

> While BigBlock isn't (and never will be) as fast as the local
> allocator, it's not much slower, doesn't use much memory, and is
> certainly faster than linearscan.
> So if you're compiling very large,
> (probably) machine-generated blocks of straight-line code, give the
> Local and BigBlock allocators a try, especially if you're JITing
> things and compile time is important.

I looked at your code, and I see some things that could be significantly sped up, e.g.:

- InsnTimes handling. I have the feeling this map can be eliminated completely.

- Use of the VRegReadTable. The vector of read occurrences can be shortened every time you process the corresponding instruction. This makes it shorter and makes searches inside this vector faster, thus making chooseReg much faster. Probably some other optimizations can also be applied to the chooseReg function.

- PhysRegsUseOrder - you remove some elements from the middle of this vector in removePhysReg. This is not a very efficient operation on vectors, since it needs to copy the tail of the vector. I think using a list data structure could be much more efficient for this purpose.

I think these changes may significantly improve the performance of your BigBlock register allocator. I'll try to come up with some more concrete proposals or even patches over the weekend or next week.

-Roman
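[Editor's note: Roman's PhysRegsUseOrder point is a general one about container choice, worth a minimal illustration. Erasing from the middle of a std::vector must shift the entire tail down, while unlinking a node from a std::list is constant time once you hold an iterator. The function and variable names below are generic stand-ins, not the allocator's actual code.]

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Removing a register from the middle of a use-order sequence.
// With a vector, erase() copies every later element down one slot: O(n).
void removeFromVector(std::vector<unsigned> &order, unsigned reg) {
  std::vector<unsigned>::iterator it =
      std::find(order.begin(), order.end(), reg);
  if (it != order.end())
    order.erase(it);  // shifts the tail of the vector
}

// With a list, erase() just unlinks one node: O(1) after the find.
void removeFromList(std::list<unsigned> &order, unsigned reg) {
  std::list<unsigned>::iterator it =
      std::find(order.begin(), order.end(), reg);
  if (it != order.end())
    order.erase(it);  // no tail copy
}
```

Note that the linear find is the same cost in both containers; the win is confined to the erase itself, and a list additionally keeps iterators to the remaining elements valid across removals, which can matter if the allocator caches positions into the sequence.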
Hi Roman,

> True. I'm working on a version of linear scan based on Wimmer's
> thesis. It supports live range splitting, and I'd like to compare it
> with yours. Do you have any good examples of those fairly large
> functions that are just flat sequences of instructions, anywhere from
> 100 to 10000+ in size?

You can find a couple attached to bug #1512, but I'm happy to email you more. Just in case people out there think JITing 10000-instruction functions is crazy (I did), you should know that the Core2 processor appears to have no real problem running *very* large functions out of L2$.

> I could then use them as a basis for a
> comparison and report the results.

No problem - I'd be curious to see if your allocator can improve the code enough to justify the extra time spent.

> Do you have numbers comparing it to the current version of LLVM's
> linear scan? The win of your allocator over linear scan should be
> even bigger, I guess.

Both local and BigBlock are way, way ahead of the linearscan allocator for my application (and I imagine others, such as large unrolled linear algebra kernels, FFTs/codecs, etc). On the code I have, performance improves by around 50~100% using the local allocator. Using BigBlock gives another 10~20% on top of that, with the gap more or less proportional to the size of the block. For blocks with <200 instructions, local and BigBlock tend to produce more or less the same code.

> I looked at your code, and I see some things that could be
> significantly sped up, e.g.:
> - InsnTimes handling. I have the feeling this map can be eliminated
> completely.

Absolutely, I agree it should go. I was in a hurry, and my first attempt at eliminating it didn't work, so I just put it back. I'll get around to killing it again at some point.

> - Use of the VRegReadTable. The vector of read occurrences can be
> shortened every time you process the corresponding instruction.
> This
> makes it shorter and makes searches inside this vector faster, thus
> making chooseReg much faster. Probably some other optimizations can
> also be applied to the chooseReg function.

chooseReg is of course where most of the time is spent, but truth be told, 98% of the time my application spends in LLVM is *not* in the register allocator. Or I should say *was* not - a quick patch by Evan Cheng earlier today to some scheduler code more than doubled LLVM's speed on large functions.

However, there are still some very serious LLVM performance issues when codegenning large basic blocks. A couple of years ago, I clocked the LLVM JIT at roughly 10,000 cycles per byte of code generated. Currently, it's *well* over 200,000. (Again, this is on large functions; on small functions, the performance is presumably much better.) Of course, back then LLVM didn't *have* a scheduler, and even things like instruction selection were simpler. But somewhere, some nasty speed issues have crept in when compiling huge functions. I'll do my best to try and sift through these over the next few weeks. (Or at least provide Evan with plenty of test cases and beg him to fix things. ;)

> - PhysRegsUseOrder - you remove some elements from the middle of this
> vector in removePhysReg. This is not a very efficient operation on
> vectors, since it needs to copy the tail of the vector. I think using
> a list data structure could be much more efficient for this purpose.

Actually, PhysRegsUseOrder isn't really needed for BigBlock - it's a leftover from the local allocator. So there are even more brutal performance fixes to be made to BigBlock; however, your suggestion applies to the Local allocator for sure!

> I think these changes may significantly improve the performance of
> your BigBlock register allocator.
> I'll try to come up with some more
> concrete proposals or even patches over the weekend or next week.

Patches would be a dream - while I don't expect BigBlock to ever be more than ~25% of total codegen time, every little bit helps. :)

Thanks for your comments,

Duraid
Thanks Duraid. Nice job. I'd like to point out a couple of things.

1. The local register allocator is currently broken. :-( Actually it has been broken for a while (due to physical sub-register changes in earlier passes). I am currently working on it. You will probably need to merge the fixes back once I am done.

2. Please benchmark against some more "normally sized" programs. Perhaps SPEC? I don't expect a local register allocator to win against linearscan.

Thanks,

Evan

On Jun 22, 2007, at 2:46 AM, Duraid Madina wrote:

> Hi everyone,
>
> Quick summary:
>
> LLVM now has a new register allocator particularly suitable for
> compiling (very) large, machine-generated functions.
>
> Longer story:
>
> I've recently been using LLVM in an application that involves JITing
> fairly large functions that have no control flow - they're just flat
> sequences of instructions, anywhere from 100 to 10000+ in size. (The
> control flow is all in the host program, which works out which monster
> function to call and when.)
>
> The default (linearscan) register allocator wasn't doing a good
> job, as it doesn't (yet) have live range splitting. It would quickly
> use all available registers and then get stuck, using only a single
> register (and the stack, a lot!) for hundreds or thousands of
> instructions at a time, greatly slowing (+bloating) the code. Luckily,
> I didn't have the presence of mind to actually try using the local
> register allocator before I ripped its guts out, turning it into
> RegAllocBigBlock.cpp that's now in CVS. (Had I done that, I would have
> been quite happy to find that the local allocator produced much better
> code, much faster than linearscan.)
>
> The good news is the new "BigBlock" allocator turns out to produce
> even better code than the local allocator when blocks are very large.
> We're talking a +10~20% speed boost on average.
> (If your basic blocks are
> small, or there's not much register pressure, you'll actually get the
> same code out of both local and BigBlock.)
>
> While BigBlock isn't (and never will be) as fast as the local
> allocator, it's not much slower, doesn't use much memory, and is
> certainly faster than linearscan. So if you're compiling very large,
> (probably) machine-generated blocks of straight-line code, give the
> Local and BigBlock allocators a try, especially if you're JITing
> things and compile time is important.
>
> If anyone has any problems with or comments on the BigBlock
> allocator, please drop me an email!
>
> Enjoy,
>
> Duraid
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu  http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev