thr3ads.net - llvm dev - [LLVMdev] register allocation [Jan 2012]

If this information is useful, please help other people find it:
Share via:

Jonas Paulsson

2012-Jan-20 14:40 UTC

[LLVMdev] register allocation

> On Jan 19, 2012, at 5:31 AM, Jonas Paulsson wrote: >  LLVM would have to be extended with an RegClass/register-attribute
'spillable'> >
 > What exactly are you proposing? Why can't you do what the PowerPC and
Hexagon targets do?

Yes, I can move a CR to a GPR and save it to the stack, but due to a very
irregular register file this is about 10 times more expensive than
saving/restoring an ordinary register. These registers should basically never
have to be spilled, at least not on -O3.

 > Spill-free register allocation sounds great, why not do it for all
register classes?

Well, in this architecture, each GPR has its own CR, and I have made a
super-register that includes the GPR and CR when the flag is needed. I need to
write code like:

GPR_CR = cmp GPR, 0                                                           
// GPR_CR def
<other code, possibly using just GPR's without setting the CR>
br #, GPR_CR                                                                    
// GPR_CR use

Basically, I would like to make sure that the GPR_CR reg is not spilled before
another GPR-only reg is spilled, as it would be idiotic. As the super-register
is wider than the GPR I do not see why this happened, but at -O0
this happened for some reason or other.

I tried a method with simply handing out pregs as suitable in preRA. I would
like to confirm with you if possible that this should work with PBQP and
greedy/basic, which scans the LiveIntervals for pregs and remembers it.
What's more, setting the GPR_CR class to 'not-spillable' would
probably do the trick here as we basically do not want to do this, and I would
not have to pre-allocate. But there is probably a better way, or?
> , and a register allocator would have to implement register pairing. That
would be PBQP, right? >
 > If by pairing you mean GCC's multiple-alternatives, then PBQP should
be able to support that.> > /jakob

I take it then that it is not possible to write operand-combinations as in GCC
in LLVM so as to handle register pairing on a high level?

The PBQP algorithm as such supports register pairing per the article, but it was
not implemented in RegAllocPBQP.cpp as far as I can see.

For my needs, I would like to say that whenever two registers are used in the
same instruction, they must follow any register-pairing rule defined for any
such occurence. Possibly, one could add a Constraint per instruction def
as well to indicate the use of the register pairing rule, and to allow instances
where it does not apply.

PBQP extension (suggestion)
=====================
Tablegen:

def regPair : registerPair<AddrReg0, OffsReg0>,
~or~
def regPair:  registerPairing<AddrReg0, [OffsReg0, OffsReg1, OffsReg2]>;
~or~
??

in the instruction such as a load:

ld dst, addrReg, offsReg

then PBQP must follow the rule and only allocate legal combinations of addrReg
and offsReg.

I beleive this should work by setting the cost to infinity for illegal
combinations. I supervised a bacheolor student here last year (Jakob Stengård),
and as far as I recall, Lang was interested in helping him getting this
implemented
then, although Jakob ran out of time. Has anything changed since then? What is
your estimation on the amount of  work for this? Would anyone (Lang?) be willing
to supervise it?

Jonas

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120120/1168d75f/attachment.html>

Jakob Stoklund Olesen

2012-Jan-20 17:34 UTC

head link

[LLVMdev] register allocation

On Jan 20, 2012, at 6:40 AM, Jonas Paulsson wrote:
>  > What exactly are you proposing? Why can't you do what the PowerPC
and Hexagon targets do?
>  
> Yes, I can move a CR to a GPR and save it to the stack, but due to a very
irregular register file this is about 10 times more expensive than
saving/restoring an ordinary register. These registers should basically never
> have to be spilled, at least not on -O3.
And the register allocator is trying very hard not to do that, but sometimes it
just isn't possible.

You have the choice between aborting compilation or inserting expensive spill
code when that happens.
> Basically, I would like to make sure that the GPR_CR reg is not spilled
before another GPR-only reg is spilled, as it would be idiotic. As the
super-register is wider than the GPR I do not see why this happened, but at -O0
> this happened for some reason or other.
I sounds like you disabled optimizations, and now optimizations are missing. The
fast register allocator tries very hard to be fast. That includes producing
suboptimal code. Sometimes it is faster to spill than to run an expensive
interference computation.
> What's more, setting the GPR_CR class to 'not-spillable' would
probably do the trick here as we basically do not want to do this, and I would
not have to pre-allocate. But there is probably a better way, or?
I am sorry, I simply don't understand what you are asking for. You might as
well suggest that we all don't pay taxes. It sounds tempting, but the plan
needs some details.

If the fast register allocator is not working for you, you can use the greedy
register allocator in -O0 builds by passing -regalloc=greedy. It's not that
much slower.
> I take it then that it is not possible to write operand-combinations as in
GCC in LLVM so as to handle register pairing on a high level?
That's right.
>  The PBQP algorithm as such supports register pairing per the article, but
it was not implemented in RegAllocPBQP.cpp as far as I can see.
The PBQP allocator allows targets to specify complicated constraints that the
normal constraint model doesn't support. Constraints are modeled as
matrices, so any specific type of constraint wouldn't be mentioned in the
source file.
> PBQP extension (suggestion)
> =====================>  
> Tablegen:
>  
> def regPair : registerPair<AddrReg0, OffsReg0>,
> ~or~
> def regPair:  registerPairing<AddrReg0, [OffsReg0, OffsReg1,
OffsReg2]>;
> ~or~
> ??
I am not sure how much easier that would be than the current approach. I'll
leave that up to Lang.
> I beleive this should work by setting the cost to infinity for illegal
combinations. I supervised a bacheolor student here last year (Jakob Stengård),
and as far as I recall, Lang was interested in helping him getting this
implemented
> then, although Jakob ran out of time. Has anything changed since then? What
is your estimation on the amount of  work for this? Would anyone (Lang?) be
willing to supervise it?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120120/84f9a0ab/attachment.html>

Lang Hames

2012-Jan-20 19:24 UTC

head link

[LLVMdev] register allocation

Hi Jonas,

As Jakob mentioned, PBQP is designed to provide support constraints that
aren't supported in the current model, without requiring any modifications
to the PBQP allocator itself. Constraints are expressed to the PBQP
allocator via matrices, as described in
http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-September/034781.html .

If you decide to try the PBQP route and need pointers to further material,
or help with the API, please just let me know.

- Lang.


On Fri, Jan 20, 2012 at 9:34 AM, Jakob Stoklund Olesen <stoklund at
2pi.dk>wrote:
>
> On Jan 20, 2012, at 6:40 AM, Jonas Paulsson wrote:
>
>   > What exactly are you proposing? Why can't you do what the
PowerPC and
> Hexagon targets do?
>
> Yes, I can move a CR to a GPR and save it to the stack, but due to a very
> irregular register file this is about 10 times more expensive than
> saving/restoring an ordinary register. These registers should basically
> never
> have to be spilled, at least not on -O3.
>
>
> And the register allocator is trying very hard not to do that, but
> sometimes it just isn't possible.
>
> You have the choice between aborting compilation or inserting expensive
> spill code when that happens.
>
> Basically, I would like to make sure that the GPR_CR reg is not spilled
> before another GPR-only reg is spilled, as it would be idiotic. As the
> super-register is wider than the GPR I do not see why this happened, but at
> -O0
> this happened for some reason or other.
>
>
> I sounds like you disabled optimizations, and now optimizations are
> missing. The fast register allocator tries very hard to be fast. That
> includes producing suboptimal code. Sometimes it is faster to spill than to
> run an expensive interference computation.
>
> What's more, setting the GPR_CR class to 'not-spillable' would
probably do
> the trick here as we basically do not want to do this, and I would not have
> to pre-allocate. But there is probably a better way, or?
>
>
> I am sorry, I simply don't understand what you are asking for. You
might
> as well suggest that we all don't pay taxes. It sounds tempting, but
the
> plan needs some details.
>
> If the fast register allocator is not working for you, you can use the
> greedy register allocator in -O0 builds by passing -regalloc=greedy.
It's
> not that much slower.
>
> I take it then that it is not possible to write operand-combinations as in
> GCC in LLVM so as to handle register pairing on a high level?
>
>
> That's right.
>
>  The PBQP algorithm as such supports register pairing per the article,
> but it was not implemented in RegAllocPBQP.cpp as far as I can see.
>
>
> The PBQP allocator allows targets to specify complicated constraints that
> the normal constraint model doesn't support. Constraints are modeled as
> matrices, so any specific type of constraint wouldn't be mentioned in
the
> source file.
>
> PBQP extension (suggestion)
> =====================>
> Tablegen:
>
> def regPair : registerPair<AddrReg0, OffsReg0>,
> ~or~
> def regPair:  registerPairing<AddrReg0, [OffsReg0, OffsReg1,
OffsReg2]>;
> ~or~
> ??
>
>
> I am not sure how much easier that would be than the current approach.
> I'll leave that up to Lang.
>
> I beleive this should work by setting the cost to infinity for illegal
> combinations. I supervised a bacheolor student here last year (Jakob
> Stengård), and as far as I recall, Lang was interested in helping him
> getting this implemented
> then, although Jakob ran out of time. Has anything changed since then? What
> is your estimation on the amount of  work for this? Would anyone (Lang?)
> be willing to supervise it?
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120120/f0b7b707/attachment.html>

Jeroen Dobbelaere

2012-Feb-02 16:40 UTC

head link

[LLVMdev] register allocation

Hi Jakob, Jonas et al,

Jakob wrote:
[...]> Jonas wrote:
> [...]
> > What's more, setting the GPR_CR class to 'not-spillable'
would probably do the trick here as we
> > basically do not want to do this, and I would not have to
pre-allocate. But there is probably a
> > better way, or?
> 
> I am sorry, I simply don't understand what you are asking for. You
might as well suggest that we all
> don't pay taxes. It sounds tempting, but the plan needs some details.
> 
> If the fast register allocator is not working for you, you can use the
greedy register allocator in
> -O0 builds by passing -regalloc=greedy. It's not that much slower.[...]

I believe that Jonas wants to have the following:
- A register class specification such that you get the same effect as 'glue
code', but without
  having to introducing custom nodes and lowering to represent this glue.

I think that 'glue' is today the only solution if you do not want to
produce spill code.

In a backend on which I have been working, we have a compare instruction that
sets a flag (true or false), and a conditional branch instruction, that jumps if
the flag is true.

Initially, I tried to specify this by providing a lowering of the compare
instruction :

(CCReg only contains one register)

def CMPEQ : Instr<
  (outs CCReg:$dst),
  (ins  IntRegs:$lhs, IntRegs:$rhs),
  "cmpeq $lhs, $src",
  [(cmpeq IntRegs:$dst, IntRegs:$src)>;

def BCC : Instr<
  (outs),
  (ins CCReg:$cc, brtarget:$addr),
  "bcc $addr",
  [(brcond CCReg:$cc, bb:$addr)]>;
This worked, but for more complex programs, llvm tries to generate code to move
CC and to spill CC.
I was able to get rid of the 'moves' by setting the copycost to -1.
For getting rid of the spills, I was forced to introduce custom nodes, that pass
the CC register
through 'glue'.

Having a property to register classes that identifies a 'glue-like'
behavior would make sense to me.

Greetings,

Jeroen Dobbelaere

Seemingly Similar Threads

Search for more seemingly similar threads

llvm dev - Jan 2012 - [LLVMdev] register allocation

[LLVMdev] register allocation

[LLVMdev] register allocation

[LLVMdev] register allocation

[LLVMdev] register allocation

Seemingly Similar Threads