thr3ads.net - llvm dev - [LLVMdev] Greedy register allocation [May 2011]

Jakob Stoklund Olesen

2011-May-03 17:06 UTC

[LLVMdev] Greedy register allocation

On May 3, 2011, at 9:19 AM, David A. Greene wrote:
> Jakob Stoklund Olesen <stoklund at 2pi.dk> writes:
> 
>>    +10.0% SingleSource/Benchmarks/CoyoteBench/huffbench
>>    +12.0% SingleSource/Benchmarks/McGill/chomp
>>    +18.0% SingleSource/Benchmarks/BenchmarkGame/n-body
>>    +45.5% SingleSource/Benchmarks/BenchmarkGame/puzzle
>>    +10.0% SingleSource/Benchmarks/Shootout/heapsort
>>    +10.5% MultiSource/Benchmarks/Trimaran/enc-3des/enc-3des
>>    +10.9% SingleSource/Benchmarks/Shootout-C++/heapsort
>>    +11.7% MultiSource/Benchmarks/Ptrdist/bc/bc
>>    +12.0% MultiSource/Benchmarks/McCat/17-bintr/bintr
>>    +55.2% SingleSource/Benchmarks/Shootout/methcall
> 
> Yikes!  Do we know why these codes got so much worse?  Even 5% is a big
> deal on x86.

On x86-64, n-body and puzzle have the exact same instructions as with linear
scan. The only difference is the choice of registers. This causes some loops to
be a few bytes longer or shorter which can easily change performance by that
much if that small loop is all the benchmark does.
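
To make the size effect concrete, here is a minimal sketch, assuming legacy (non-VEX) SSE2 encodings. On x86-64, a register-to-register scalar op such as addsd (F2 0F 58 /r) takes 4 bytes when both operands are in %xmm0-%xmm7 and needs one extra REX prefix byte when either operand is in %xmm8-%xmm15. The function name is hypothetical, and the model covers only this one instruction form.

```python
def sse_scalar_reg_reg_len(dst: int, src: int) -> int:
    """Byte length of a legacy-SSE scalar double op (e.g. addsd) on two xmm registers.

    Toy model: covers only the register-to-register form, no memory operands.
    """
    if not (0 <= dst < 16 and 0 <= src < 16):
        raise ValueError("xmm register numbers must be in 0..15")
    base = 4  # mandatory prefix F2, opcode bytes 0F 58, ModRM byte
    needs_rex = dst >= 8 or src >= 8  # xmm8-xmm15 need a REX prefix
    return base + (1 if needs_rex else 0)


# Same instruction, different register choice, different size.
low_only = sse_scalar_reg_reg_len(dst=0, src=1)   # addsd %xmm1, %xmm0
with_high = sse_scalar_reg_reg_len(dst=0, src=8)  # addsd %xmm8, %xmm0
```

A loop whose instructions are otherwise identical can therefore grow or shrink by a byte for each instruction that touches a high register.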

The greedy allocator is trying to pick registers so inner loops are as small as
possible, but that is not always the right thing to do.

Unfortunately, we don't model the effects of code alignment, so there is a
lot of luck involved.

I am working my way through the regressions, looking for things the allocator
did wrong. Any help is appreciated, please file bugs if you find examples of
stupid register allocation.

/jakob

David A. Greene

2011-May-03 19:03 UTC

Jakob Stoklund Olesen <stoklund at 2pi.dk> writes:
>> Yikes!  Do we know why these codes got so much worse?  Even 5% is a big
>> deal on x86.
>
> On x86-64, n-body and puzzle have the exact same instructions as with
> linear scan. The only difference is the choice of registers. This
> causes some loops to be a few bytes longer or shorter which can easily
> change performance by that much if that small loop is all the
> benchmark does.

Ok, I can believe that.

> The greedy allocator is trying to pick registers so inner loops are as
> small as possible, but that is not always the right thing to do.

How does it balance that against spill cost?

> Unfortunately, we don't model the effects of code alignment, so there
> is a lot of luck involved.

As with any allocator.  :)

> I am working my way through the regressions, looking for things the
> allocator did wrong. Any help is appreciated, please file bugs if you
> find examples of stupid register allocation.

Certainly.  I would ask that we keep linearscan around, if possible, as
long as there are significant regressions like this.  Our customers tend
to really, really care about performance.

                             -Dave

Jakob Stoklund Olesen

2011-May-03 20:20 UTC

On May 3, 2011, at 12:03 PM, David A. Greene wrote:
>> 
>> The greedy allocator is trying to pick registers so inner loops are as
>> small as possible, but that is not always the right thing to do.
> 
> How does it balance that against spill cost?

I added the CostPerUse field to the register descriptions. The allocator will
try to minimize the spill weight assigned to registers with a CostPerUse. It
does this by swapping physical register assignments, but it won't do so if
that requires extra spilling.
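
The swapping rule can be sketched as a toy model. This is not LLVM's code: LiveRange, penalty, and try_swap are hypothetical names, live ranges are simplified to single half-open intervals, and the penalty is just the total spill weight sitting in registers with a nonzero CostPerUse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveRange:
    name: str
    start: int     # half-open interval [start, end)
    end: int
    weight: float  # toy spill weight


def overlaps(a: LiveRange, b: LiveRange) -> bool:
    return a.start < b.end and b.start < a.end


def penalty(assignment: dict, cost_per_use: dict) -> float:
    """Total spill weight assigned to registers that have a CostPerUse."""
    return sum(lr.weight for lr, reg in assignment.items() if cost_per_use[reg] > 0)


def try_swap(assignment: dict, cost_per_use: dict, a: LiveRange, b: LiveRange) -> dict:
    """Swap the registers of a and b if that is legal and lowers the penalty.

    Nothing is ever spilled: an illegal swap leaves the assignment unchanged.
    """
    trial = dict(assignment)
    trial[a], trial[b] = assignment[b], assignment[a]
    for moved in (a, b):
        for other, reg in trial.items():
            if other != moved and reg == trial[moved] and overlaps(moved, other):
                return assignment  # interference: would need a spill, so give up
    if penalty(trial, cost_per_use) < penalty(assignment, cost_per_use):
        return trial
    return assignment


cost_per_use = {"xmm0": 0, "xmm8": 1}
hot = LiveRange("hot", 0, 10, 8.0)
cold = LiveRange("cold", 0, 10, 1.0)
before = {hot: "xmm8", cold: "xmm0"}
after = try_swap(before, cost_per_use, hot, cold)  # hot moves to xmm0
```

In this example the heavily used range ends up in the cheap register, and the penalty drops from 8.0 to 1.0. If a third range already occupied xmm0 while overlapping `hot`, the swap would be rejected rather than forcing a spill.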

This is actually the cause of the n-body regression. The benchmark has nested
loops:

	%vreg1 = const pool load
header1:
	; large blocks with lots of floating point ops
header2:
	; small loop using %vreg1
	jnz header2
...
	jnz header1

The def of %vreg1 has been hoisted by LICM so it is live across a block with
lots of floating point code. The allocator uses the low xmm registers for the
large block, and %xmm8 is left for %vreg1 which has a low spill weight. This
significantly improves code size, but the small loop suffers.

A low xmm register could be used for %vreg1, but then %vreg1 would need to be
rematerialized. The allocator won't go that far just to use cheaper
registers.

In this case it might have helped to split the live range and rematerialize, but
usually that won't be the case.
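
The trade-off can be put in numbers with a small sketch. The use counts below are hypothetical, not measured from n-body, and the model assumes one extra prefix byte per instruction that names %xmm8-%xmm15.

```python
def compare_assignments(vreg1_uses_in_small_loop: int,
                        temp_uses_in_large_block: int) -> dict:
    """Extra prefix bytes for two register assignments (toy model).

    Assumes one extra byte per instruction that names a high xmm register.
    """
    return {
        # %vreg1 lives in %xmm8; the large block keeps all the low registers.
        "vreg1_in_xmm8": {
            "total": vreg1_uses_in_small_loop,
            "small_loop": vreg1_uses_in_small_loop,
        },
        # %vreg1 gets a low register; one large-block temporary moves to %xmm8.
        "temp_in_xmm8": {
            "total": temp_uses_in_large_block,
            "small_loop": 0,
        },
    }


# Hypothetical counts: 2 uses of %vreg1 in the small loop,
# 12 uses of the displaced temporary in the large block.
result = compare_assignments(2, 12)
```

With these made-up counts, keeping %vreg1 in %xmm8 costs 2 extra bytes in total against 12 for the alternative, so it wins on code size, but both of those bytes land inside the small loop.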

/jakob

Jakob Stoklund Olesen

2011-May-03 20:30 UTC

On May 3, 2011, at 12:03 PM, David A. Greene wrote:
>> 
>> I am working my way through the regressions, looking for things the
>> allocator did wrong. Any help is appreciated, please file bugs if you
>> find examples of stupid register allocation.
> 
> Certainly.  I would ask that we keep linearscan around, if possible, as
> long as there are significant regressions like this.  Our customers tend
> to really, really care about performance.

That's reasonable, and it is also useful to keep it around as a reference
when greedy breaks.

On the other hand, I really want to clean up the code surrounding register
allocation, and that is much easier to do after linear scan is gone. There is a
good chance linear scan won't make it to the 3.0 release.

/jakob
