thr3ads.net - llvm dev - [LLVMdev] Greedy register allocation [May 2011]

Jakob Stoklund Olesen

2011-May-03 20:20 UTC

[LLVMdev] Greedy register allocation

On May 3, 2011, at 12:03 PM, David A. Greene wrote:
>> 
>> The greedy allocator is trying to pick registers so inner loops are as
>> small as possible, but that is not always the right thing to do.
> 
> How does it balance that against spill cost?

I added the CostPerUse field to the register descriptions. The allocator will
try to minimize the spill weight assigned to registers with a CostPerUse. It
does this by swapping physical register assignments, but it won't do so if
that requires extra spilling.
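
As a rough illustration of the swapping idea, here is a minimal Python sketch. It is a toy model and not LLVM code: the register names, the COST_PER_USE table, the spill weights, and the helper functions are all hypothetical. It only shows the shape of the trade: exchange two physical register assignments when that lowers the spill weight sitting in costly registers, and reject any exchange that would create a conflict (the stand-in here for "extra spilling").

```python
from itertools import combinations

# Hypothetical per-register cost: 1 means "costs an extra prefix byte per use".
COST_PER_USE = {"xmm0": 0, "xmm1": 0, "xmm8": 1}

def weighted_cost(assignment, spill_weight):
    """Total spill weight placed in registers that have a CostPerUse."""
    return sum(spill_weight[v] * COST_PER_USE[r] for v, r in assignment.items())

def is_legal(assignment, interferes):
    """Interfering virtual registers must not share a physical register."""
    return all(assignment[a] != assignment[b] for a, b in interferes)

def improve_by_swapping(assignment, spill_weight, interferes):
    """Swap register pairs while that strictly lowers the weighted cost.

    A swap that would make two interfering vregs share a register is
    rejected, so the result never needs more spilling than the input.
    """
    assignment = dict(assignment)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(sorted(assignment), 2):
            trial = dict(assignment)
            trial[a], trial[b] = assignment[b], assignment[a]
            if is_legal(trial, interferes) and (
                weighted_cost(trial, spill_weight)
                < weighted_cost(assignment, spill_weight)
            ):
                assignment = trial
                improved = True
    return assignment

# Made-up example: the heavily used value starts out in the costly register.
spill_weight = {"hot": 10.0, "cold": 1.0, "other": 5.0}
interferes = [("hot", "cold"), ("hot", "other"), ("cold", "other")]
before = {"hot": "xmm8", "cold": "xmm0", "other": "xmm1"}
after = improve_by_swapping(before, spill_weight, interferes)
```

In this made-up example the value with the lowest spill weight ends up in the costly register, which is the same outcome described for %vreg1 below.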

This is actually the cause of the n-body regression. The benchmark has nested
loops:

	%vreg1 = const pool load
header1:
	; large blocks with lots of floating point ops
header2:
	; small loop using %vreg1
	jnz header2
...
	jnz header1

The def of %vreg1 has been hoisted by LICM so it is live across a block with
lots of floating point code. The allocator uses the low xmm registers for the
large block, and %xmm8 is left for %vreg1 which has a low spill weight. This
significantly improves code size, but the small loop suffers.

A low xmm register could be used for %vreg1, but would need to be
rematerialized. The allocator won't go that far just to use cheaper
registers.

In this case it might have helped to split the live range and rematerialize, but
usually that won't be the case.

/jakob

David A. Greene

2011-May-03 22:23 UTC

Jakob Stoklund Olesen <stoklund at 2pi.dk> writes:
>>> The greedy allocator is trying to pick registers so inner loops are as
>>> small as possible, but that is not always the right thing to do.
>> 
>> How does it balance that against spill cost?
>
> I added the CostPerUse field to the register descriptions. The
> allocator will try to minimize the spill weight assigned to registers
> with a CostPerUse. It does this by swapping physical register
> assignments, but it won't do so if that requires extra spilling.

CostPerUse models the encoding size of the register?

> This is actually the cause of the n-body regression. The benchmark has
> nested loops:
>
> 	%vreg1 = const pool load
> header1:
> 	; large blocks with lots of floating point ops
> header2:
> 	; small loop using %vreg1
> 	jnz header2
> ...
> 	jnz header1
>
> The def of %vreg1 has been hoisted by LICM so it is live across a
> block with lots of floating point code. The allocator uses the low xmm
> registers for the large block, and %xmm8 is left for %vreg1 which has
> a low spill weight. This significantly improves code size, but the
> small loop suffers.

Why does %xmm8 have a low spill weight?  It's used in an inner loop.

> In this case it might have helped to split the live range and
> rematerialize, but usually that won't be the case.

That was my initial reaction.  Splitting should have at least
rematerialized the value just before header2.  That should significantly
improve things.  This is a classic motivational case for live range
splitting.

Another way to approach this is to add a register pressure heuristic to
LICM so it doesn't spill so much stuff out over such a large loop body.

                                   -Dave

Jakob Stoklund Olesen

2011-May-03 22:28 UTC

On May 3, 2011, at 3:23 PM, David A. Greene wrote:
> Jakob Stoklund Olesen <stoklund at 2pi.dk> writes:
> 
>>>> The greedy allocator is trying to pick registers so inner loops are as
>>>> small as possible, but that is not always the right thing to do.
>>> 
>>> How does it balance that against spill cost?
>> 
>> I added the CostPerUse field to the register descriptions. The
>> allocator will try to minimize the spill weight assigned to registers
>> with a CostPerUse. It does this by swapping physical register
>> assignments, but it won't do so if that requires extra spilling.
> 
> CostPerUse models the encoding size of the register?

Yes, something like that.

>> This is actually the cause of the n-body regression. The benchmark has
>> nested loops:
>> 
>> 	%vreg1 = const pool load
>> header1:
>> 	; large blocks with lots of floating point ops
>> header2:
>> 	; small loop using %vreg1
>> 	jnz header2
>> ...
>> 	jnz header1
>> 
> 
>> The def of %vreg1 has been hoisted by LICM so it is live across a
>> block with lots of floating point code. The allocator uses the low xmm
>> registers for the large block, and %xmm8 is left for %vreg1 which has
>> a low spill weight. This significantly improves code size, but the
>> small loop suffers.
> 
> Why does %xmm8 have a low spill weight?  It's used in an inner loop.

Because it is rematerializable and live across a big block where it isn't
used.
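
To make those two effects concrete, here is a small Python sketch. It is a toy model under stated assumptions, not the allocator's actual formula: toy_spill_weight, the 0.5 discount for rematerializable values, the use frequencies, and the live range sizes are all made up for illustration. The assumption is that weight grows with use frequency, shrinks as the live range gets longer, and is discounted when the value can be recomputed cheaply.

```python
def toy_spill_weight(use_freqs, live_range_size, rematerializable):
    """Hypothetical spill weight: use density, discounted if rematerializable.

    use_freqs        -- estimated execution frequency of each def/use
    live_range_size  -- number of instruction slots the value is live across
    rematerializable -- True if the value can be recomputed instead of reloaded
    """
    weight = sum(use_freqs) / live_range_size
    if rematerializable:
        weight *= 0.5  # assumed discount; the real factor may differ
    return weight

# %vreg1-like value: hot uses in the small loop, but live across a big block.
vreg1_weight = toy_spill_weight([1, 100, 100], live_range_size=400,
                                rematerializable=True)

# A short-lived temporary inside the big floating point block.
temp_weight = toy_spill_weight([10, 10, 10], live_range_size=4,
                               rematerializable=False)
```

With these made-up numbers the long, rematerializable range gets a lower weight than the short temporary even though its uses are hotter, so under this model it is the one left with the costly register.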

>> In this case it might have helped to split the live range and
>> rematerialize, but usually that won't be the case.
> 
> That was my initial reaction.  Splitting should have at least
> rematerialized the value just before header2.  That should significantly
> improve things.  This is a classic motivational case for live range
> splitting.

Well, not really. Note that there are plenty of registers available and no
spilling is necessary.

It's just that an REX prefix is required on some instructions when %xmm8 is
used. Is it worth it to undo LICM just for that? In this case, probably. In
general, no.
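
To put a rough number on the code size side of that trade-off, here is a small Python sketch. It assumes the legacy (non-VEX) SSE encoding on x86-64, where an instruction that names %xmm8 through %xmm15 carries one extra REX prefix byte. The helper names and the instruction lists are hypothetical, and the model ignores every other reason an instruction might need a REX prefix.

```python
def needs_rex(reg):
    """True for xmm8..xmm15, which need a REX prefix in legacy SSE encoding."""
    return reg.startswith("xmm") and int(reg[3:]) >= 8

def extra_bytes(instructions):
    """Count one REX byte for each instruction that touches a high register.

    instructions -- list of tuples holding the register operands of each
                    instruction; opcodes are irrelevant for this estimate.
    """
    return sum(1 for regs in instructions if any(needs_rex(r) for r in regs))

# Hypothetical small inner loop: two of three instructions use the hoisted value.
loop_with_xmm8 = [("xmm8", "xmm0"), ("xmm0", "xmm1"), ("xmm8", "xmm1")]
loop_with_xmm2 = [("xmm2", "xmm0"), ("xmm0", "xmm1"), ("xmm2", "xmm1")]

bytes_high = extra_bytes(loop_with_xmm8)
bytes_low = extra_bytes(loop_with_xmm2)
```

The difference is a couple of bytes per loop iteration's worth of code, which is why undoing LICM for it only pays off when the loop is small and hot.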

/jakob
