> On Jan 19, 2012, at 5:31 AM, Jonas Paulsson wrote:> LLVM would have to be extended with an RegClass/register-attribute 'spillable'>> > What exactly are you proposing? Why can't you do what the PowerPC and Hexagon targets do? Yes, I can move a CR to a GPR and save it to the stack, but due to a very irregular register file this is about 10 times more expensive than saving/restoring an ordinary register. These registers should basically never have to be spilled, at least not on -O3. > Spill-free register allocation sounds great, why not do it for all register classes? Well, in this architecture, each GPR has its own CR, and I have made a super-register that includes the GPR and CR when the flag is needed. I need to write code like: GPR_CR = cmp GPR, 0 // GPR_CR def <other code, possibly using just GPR's without setting the CR> br #, GPR_CR // GPR_CR use Basically, I would like to make sure that the GPR_CR reg is not spilled before another GPR-only reg is spilled, as it would be idiotic. As the super-register is wider than the GPR I do not see why this happened, but at -O0 this happened for some reason or other. I tried a method with simply handing out pregs as suitable in preRA. I would like to confirm with you if possible that this should work with PBQP and greedy/basic, which scans the LiveIntervals for pregs and remembers it. What's more, setting the GPR_CR class to 'not-spillable' would probably do the trick here as we basically do not want to do this, and I would not have to pre-allocate. But there is probably a better way, or?> , and a register allocator would have to implement register pairing. That would be PBQP, right?> > If by pairing you mean GCC's multiple-alternatives, then PBQP should be able to support that.>> /jakob I take it then that it is not possible to write operand-combinations as in GCC in LLVM so as to handle register pairing on a high level? The PBQP algorithm as such supports register pairing per the article, but it was not implemented in RegAllocPBQP.cpp as far as I can see. For my needs, I would like to say that whenever two registers are used in the same instruction, they must follow any register-pairing rule defined for any such occurence. Possibly, one could add a Constraint per instruction def as well to indicate the use of the register pairing rule, and to allow instances where it does not apply. PBQP extension (suggestion) ===================== Tablegen: def regPair : registerPair<AddrReg0, OffsReg0>, ~or~ def regPair: registerPairing<AddrReg0, [OffsReg0, OffsReg1, OffsReg2]>; ~or~ ?? in the instruction such as a load: ld dst, addrReg, offsReg then PBQP must follow the rule and only allocate legal combinations of addrReg and offsReg. I beleive this should work by setting the cost to infinity for illegal combinations. I supervised a bacheolor student here last year (Jakob Stengård), and as far as I recall, Lang was interested in helping him getting this implemented then, although Jakob ran out of time. Has anything changed since then? What is your estimation on the amount of work for this? Would anyone (Lang?) be willing to supervise it? Jonas -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120120/1168d75f/attachment.html>
On Jan 20, 2012, at 6:40 AM, Jonas Paulsson wrote:> > What exactly are you proposing? Why can't you do what the PowerPC and Hexagon targets do? > > Yes, I can move a CR to a GPR and save it to the stack, but due to a very irregular register file this is about 10 times more expensive than saving/restoring an ordinary register. These registers should basically never > have to be spilled, at least not on -O3.And the register allocator is trying very hard not to do that, but sometimes it just isn't possible. You have the choice between aborting compilation or inserting expensive spill code when that happens.> Basically, I would like to make sure that the GPR_CR reg is not spilled before another GPR-only reg is spilled, as it would be idiotic. As the super-register is wider than the GPR I do not see why this happened, but at -O0 > this happened for some reason or other.I sounds like you disabled optimizations, and now optimizations are missing. The fast register allocator tries very hard to be fast. That includes producing suboptimal code. Sometimes it is faster to spill than to run an expensive interference computation.> What's more, setting the GPR_CR class to 'not-spillable' would probably do the trick here as we basically do not want to do this, and I would not have to pre-allocate. But there is probably a better way, or?I am sorry, I simply don't understand what you are asking for. You might as well suggest that we all don't pay taxes. It sounds tempting, but the plan needs some details. If the fast register allocator is not working for you, you can use the greedy register allocator in -O0 builds by passing -regalloc=greedy. It's not that much slower.> I take it then that it is not possible to write operand-combinations as in GCC in LLVM so as to handle register pairing on a high level?That's right.> The PBQP algorithm as such supports register pairing per the article, but it was not implemented in RegAllocPBQP.cpp as far as I can see.The PBQP allocator allows targets to specify complicated constraints that the normal constraint model doesn't support. Constraints are modeled as matrices, so any specific type of constraint wouldn't be mentioned in the source file.> PBQP extension (suggestion) > =====================> > Tablegen: > > def regPair : registerPair<AddrReg0, OffsReg0>, > ~or~ > def regPair: registerPairing<AddrReg0, [OffsReg0, OffsReg1, OffsReg2]>; > ~or~ > ??I am not sure how much easier that would be than the current approach. I'll leave that up to Lang.> I beleive this should work by setting the cost to infinity for illegal combinations. I supervised a bacheolor student here last year (Jakob Stengård), and as far as I recall, Lang was interested in helping him getting this implemented > then, although Jakob ran out of time. Has anything changed since then? What is your estimation on the amount of work for this? Would anyone (Lang?) be willing to supervise it?-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120120/84f9a0ab/attachment.html>
Hi Jonas, As Jakob mentioned, PBQP is designed to provide support constraints that aren't supported in the current model, without requiring any modifications to the PBQP allocator itself. Constraints are expressed to the PBQP allocator via matrices, as described in http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-September/034781.html . If you decide to try the PBQP route and need pointers to further material, or help with the API, please just let me know. - Lang. On Fri, Jan 20, 2012 at 9:34 AM, Jakob Stoklund Olesen <stoklund at 2pi.dk>wrote:> > On Jan 20, 2012, at 6:40 AM, Jonas Paulsson wrote: > > > What exactly are you proposing? Why can't you do what the PowerPC and > Hexagon targets do? > > Yes, I can move a CR to a GPR and save it to the stack, but due to a very > irregular register file this is about 10 times more expensive than > saving/restoring an ordinary register. These registers should basically > never > have to be spilled, at least not on -O3. > > > And the register allocator is trying very hard not to do that, but > sometimes it just isn't possible. > > You have the choice between aborting compilation or inserting expensive > spill code when that happens. > > Basically, I would like to make sure that the GPR_CR reg is not spilled > before another GPR-only reg is spilled, as it would be idiotic. As the > super-register is wider than the GPR I do not see why this happened, but at > -O0 > this happened for some reason or other. > > > I sounds like you disabled optimizations, and now optimizations are > missing. The fast register allocator tries very hard to be fast. That > includes producing suboptimal code. Sometimes it is faster to spill than to > run an expensive interference computation. > > What's more, setting the GPR_CR class to 'not-spillable' would probably do > the trick here as we basically do not want to do this, and I would not have > to pre-allocate. But there is probably a better way, or? > > > I am sorry, I simply don't understand what you are asking for. You might > as well suggest that we all don't pay taxes. It sounds tempting, but the > plan needs some details. > > If the fast register allocator is not working for you, you can use the > greedy register allocator in -O0 builds by passing -regalloc=greedy. It's > not that much slower. > > I take it then that it is not possible to write operand-combinations as in > GCC in LLVM so as to handle register pairing on a high level? > > > That's right. > > The PBQP algorithm as such supports register pairing per the article, > but it was not implemented in RegAllocPBQP.cpp as far as I can see. > > > The PBQP allocator allows targets to specify complicated constraints that > the normal constraint model doesn't support. Constraints are modeled as > matrices, so any specific type of constraint wouldn't be mentioned in the > source file. > > PBQP extension (suggestion) > =====================> > Tablegen: > > def regPair : registerPair<AddrReg0, OffsReg0>, > ~or~ > def regPair: registerPairing<AddrReg0, [OffsReg0, OffsReg1, OffsReg2]>; > ~or~ > ?? > > > I am not sure how much easier that would be than the current approach. > I'll leave that up to Lang. > > I beleive this should work by setting the cost to infinity for illegal > combinations. I supervised a bacheolor student here last year (Jakob > Stengård), and as far as I recall, Lang was interested in helping him > getting this implemented > then, although Jakob ran out of time. Has anything changed since then? What > is your estimation on the amount of work for this? Would anyone (Lang?) > be willing to supervise it? > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120120/f0b7b707/attachment.html>
Hi Jakob, Jonas et al, Jakob wrote: [...]> Jonas wrote: > [...] > > What's more, setting the GPR_CR class to 'not-spillable' would probably do the trick here as we > > basically do not want to do this, and I would not have to pre-allocate. But there is probably a > > better way, or? > > I am sorry, I simply don't understand what you are asking for. You might as well suggest that we all > don't pay taxes. It sounds tempting, but the plan needs some details. > > If the fast register allocator is not working for you, you can use the greedy register allocator in > -O0 builds by passing -regalloc=greedy. It's not that much slower.[...] I believe that Jonas wants to have the following: - A register class specification such that you get the same effect as 'glue code', but without having to introducing custom nodes and lowering to represent this glue. I think that 'glue' is today the only solution if you do not want to produce spill code. In a backend on which I have been working, we have a compare instruction that sets a flag (true or false), and a conditional branch instruction, that jumps if the flag is true. Initially, I tried to specify this by providing a lowering of the compare instruction : (CCReg only contains one register) def CMPEQ : Instr< (outs CCReg:$dst), (ins IntRegs:$lhs, IntRegs:$rhs), "cmpeq $lhs, $src", [(cmpeq IntRegs:$dst, IntRegs:$src)>; def BCC : Instr< (outs), (ins CCReg:$cc, brtarget:$addr), "bcc $addr", [(brcond CCReg:$cc, bb:$addr)]>;This worked, but for more complex programs, llvm tries to generate code to move CC and to spill CC. I was able to get rid of the 'moves' by setting the copycost to -1. For getting rid of the spills, I was forced to introduce custom nodes, that pass the CC register through 'glue'. Having a property to register classes that identifies a 'glue-like' behavior would make sense to me. Greetings, Jeroen Dobbelaere