Hal Finkel
2012-Jun-08 02:31 UTC
[LLVMdev] Strong vs. default phi elimination and single-reg classes
Hello again, I am trying to implement an optimization pass for PowerPC such that simple loops use the special "counter register" (CTR) to track the induction variable. This is helpful because, in addition to reducing register pressure, there is a combined decrement-compare-and-branch instruction BZND (there are also other related instructions). I started this process by converting the Hexagon Hardware-Loops pass to work with analogous PPC instructions. This worked fairly well, but left a bunch of unneeded unconditional branches. To fix this, I added support into AnalyzeBranch (and InsertBranch and RemoveBranch). Unfortunately, this really broke things, the branch instructions (which both used and defined the count register) were moved around in invalid ways causing both compile-time (live-out assertions) and run-time failures. Instead of trying to track down these problems, I thought it would be better to use a register class instead of the physical register. This register class has only one register (the count register), and because of how the loops pass is setup, spilling this register is never necessary. Here's the problem: If I use strong phi elimination, then this works, and the CTR register is allocated as it should be. When using the default phi elimination, extra copies are introduced (which I don't completely understand), and the register allocator tries to spill the count register. For example, with strong-phi elimination, I get (as a simple example): BB#0: derived from LLVM BB %entry Live Ins: %X3 %vreg2<def> = COPY %X3<kill>; G8RC:%vreg2 %vreg4<def> = LI 2048; GPRC:%vreg4 %vreg3<def> = OR8To4 %vreg2<kill>, %vreg2; GPRC:%vreg3 G8RC:%vreg2 %vreg9<def> = COPY %vreg4<kill>; GPRC:%vreg9,%vreg4 %vreg10<def> = RLDICL %vreg9<kill>, 0, 32; GPRC:%vreg10,%vreg9 %vreg11<def> = MTCTR8r %vreg10<kill>; CTRRC8:%vreg11 GPRC:%vreg10 Successors according to CFG: BB#1 112B BB#1: derived from LLVM BB %for.body, ADDRESS TAKEN Predecessors according to CFG: BB#0 BB#1 %vreg12<def> = PHI %vreg13, <BB#1>, %vreg11, <BB#0>;CTRRC8:%vreg12,%vreg13,%vreg11 %vreg5<def> = LDtoc <ga:@a>, %X2; G8RC:%vreg5 %vreg6<def> = LWZ 0, %vreg5; mem:Volatile LD4[@a](tbaa=!"int") GPRC:%vreg6 G8RC:%vreg5 %vreg7<def> = ADD4 %vreg6<kill>, %vreg3; GPRC:%vreg7,%vreg6,%vreg3 STW %vreg7<kill>, 0, %vreg5<kill>; mem:Volatile ST4[@a](tbaa=!"int") GPRC:%vreg7 G8RC:%vreg5 %vreg13<def> = COPY %vreg12<kill>; CTRRC8:%vreg13,%vreg12 %vreg13<def> = BDNZ8 %vreg13, <BB#1>; CTRRC8:%vreg13 B <BB#2> Successors according to CFG: BB#2 BB#1 but with default phi elimination I get: 0B BB#0: derived from LLVM BB %entry Live Ins: %X3 %vreg2<def> = COPY %X3<kill>; G8RC:%vreg2 %vreg4<def> = LI 2048; GPRC:%vreg4 %vreg3<def> = OR8To4 %vreg2<kill>, %vreg2; GPRC:%vreg3 G8RC:%vreg2 %vreg9<def> = COPY %vreg4<kill>; GPRC:%vreg9,%vreg4 %vreg10<def> = RLDICL %vreg9<kill>, 0, 32; GPRC:%vreg10,%vreg9 %vreg11<def> = MTCTR8r %vreg10<kill>; CTRRC8:%vreg11 GPRC:%vreg10 %vreg14<def> = COPY %vreg11<kill>; CTRRC8:%vreg14,%vreg11 Successors according to CFG: BB#1 128B BB#1: derived from LLVM BB %for.body, ADDRESS TAKEN Predecessors according to CFG: BB#0 BB#1 %vreg12<def> = COPY %vreg14<kill>; CTRRC8:%vreg12,%vreg14 %vreg5<def> = LDtoc <ga:@a>, %X2; G8RC:%vreg5 %vreg6<def> = LWZ 0, %vreg5; mem:Volatile LD4[@a](tbaa=!"int") GPRC:%vreg6 G8RC:%vreg5 %vreg7<def> = ADD4 %vreg6<kill>, %vreg3; GPRC:%vreg7,%vreg6,%vreg3 STW %vreg7<kill>, 0, %vreg5<kill>; mem:Volatile ST4[@a](tbaa=!"int") GPRC:%vreg7 G8RC:%vreg5 %vreg14<def> = COPY %vreg13<kill>; CTRRC8:%vreg14,%vreg13 %vreg13<def> = COPY %vreg12<kill>; CTRRC8:%vreg13,%vreg12 %vreg13<def> = BDNZ8 %vreg13, <BB#1>; CTRRC8:%vreg13 B <BB#2> Successors according to CFG: BB#2 BB#1 Is is those two extra copies at the end that seem to be the problem. The register allocator believes that it needs two count registers, one for vreg14 and one for vreg13 (thus the need to spill). Maybe turning on strong phi elimination by default would be a good thing, but is there another solution here? Does this problem indicate that I'm setting up my phi nodes (or register constraints, etc.) incorrectly? Thanks in advance, Hal -- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory
Jakob Stoklund Olesen
2012-Jun-08 05:14 UTC
[LLVMdev] Strong vs. default phi elimination and single-reg classes
On Jun 7, 2012, at 7:31 PM, Hal Finkel wrote:> 112B BB#1: derived from LLVM BB %for.body, ADDRESS TAKEN > Predecessors according to CFG: BB#0 BB#1 > %vreg12<def> = PHI %vreg13, <BB#1>, %vreg11, <BB#0>;CTRRC8:%vreg12,%vreg13,%vreg11 > %vreg13<def> = COPY %vreg12<kill>; CTRRC8:%vreg13,%vreg12 > %vreg13<def> = BDNZ8 %vreg13, <BB#1>; CTRRC8:%vreg13 > B <BB#2> > Successors according to CFG: BB#2 BB#1Phi-elim works by inserting copies in the PHI predecessors before the first terminator. That isn't possible here since the first terminator defines the value of %vreg13 we want to copy. I think that phi-elim is not only causing spilling. It is miscompiling the code. We simply don't support terminator instructions that define virtual registers ATM, it seems. I can see how strong phi-elim would work in this case, but I don't think it can properly handle this in general. For example: 112B BB#1: derived from LLVM BB %for.body, ADDRESS TAKEN Predecessors according to CFG: BB#0 BB#1 %vreg12<def> = PHI %vreg13, <BB#1>, %vreg11, <BB#0>;CTRRC8:%vreg12,%vreg13,%vreg11 %vreg99<def> = PHI %vreg13, <BB#1>, %vreg2, <BB#0> %vreg13<def> = COPY %vreg12<kill>; CTRRC8:%vreg13,%vreg12 %vreg13<def> = BDNZ8 %vreg13, <BB#1>; CTRRC8:%vreg13 B <BB#2> Successors according to CFG: BB#2 BB#1 Since it is impossible to copy %vreg13 at the end of BB#1, it must be merged with %vreg12. The second PHI use means that it must also be merged with %vreg99, but that is not possible if %vreg11 and %vreg2 have different values leaving BB#0. Instead of trying to solve this hairy problem, I think you would be better off without the CTRRC8 register class. Your problems with {Analyze,Insert,Remove}Branch should be fixable. /jakob
Hal Finkel
2012-Jun-08 05:54 UTC
[LLVMdev] Strong vs. default phi elimination and single-reg classes
On Thu, 7 Jun 2012 22:14:00 -0700 Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:> > On Jun 7, 2012, at 7:31 PM, Hal Finkel wrote: > > > 112B BB#1: derived from LLVM BB %for.body, ADDRESS TAKEN > > Predecessors according to CFG: BB#0 BB#1 > > %vreg12<def> = PHI %vreg13, <BB#1>, %vreg11, > > <BB#0>;CTRRC8:%vreg12,%vreg13,%vreg11 %vreg13<def> = COPY > > %vreg12<kill>; CTRRC8:%vreg13,%vreg12 %vreg13<def> = BDNZ8 %vreg13, > > <BB#1>; CTRRC8:%vreg13 B <BB#2> > > Successors according to CFG: BB#2 BB#1 > > > Phi-elim works by inserting copies in the PHI predecessors before the > first terminator. That isn't possible here since the first terminator > defines the value of %vreg13 we want to copy. I think that phi-elim > is not only causing spilling. It is miscompiling the code. > > We simply don't support terminator instructions that define virtual > registers ATM, it seems. > > I can see how strong phi-elim would work in this case, but I don't > think it can properly handle this in general. For example: > > 112B BB#1: derived from LLVM BB %for.body, ADDRESS TAKEN > Predecessors according to CFG: BB#0 BB#1 > %vreg12<def> = PHI %vreg13, <BB#1>, %vreg11, > <BB#0>;CTRRC8:%vreg12,%vreg13,%vreg11 %vreg99<def> = PHI %vreg13, > <BB#1>, %vreg2, <BB#0> %vreg13<def> = COPY %vreg12<kill>; > CTRRC8:%vreg13,%vreg12 %vreg13<def> = BDNZ8 %vreg13, <BB#1>; > CTRRC8:%vreg13 B <BB#2> > Successors according to CFG: BB#2 BB#1 > > Since it is impossible to copy %vreg13 at the end of BB#1, it must be > merged with %vreg12. The second PHI use means that it must also be > merged with %vreg99, but that is not possible if %vreg11 and %vreg2 > have different values leaving BB#0. > > Instead of trying to solve this hairy problem, I think you would be > better off without the CTRRC8 register class.Thanks for explaining! I can roll-back.> Your problems with > {Analyze,Insert,Remove}Branch should be fixable.I don't think that the problem was with those functions. Adding support for BDNZ and friends in those functions just enabled other passes to start moving the blocks around, and that seems to have exposed problems of its own. For example, sometimes LiveIntervals asserts with: register: %CTR8 clang: /llvm-trunk/lib/CodeGen/LiveIntervalAnalysis.cpp:446: void llvm::LiveInterval s::handlePhysicalRegisterDef(llvm::MachineBasicBlock*, llvm::MachineBasicBlock::iterator, llvm::SlotIndex, llvm::MachineOperand&, llvm::LiveInt erval&): Assertion `!isAllocatable(interval.reg) && "Physregs shouldn't be live out!"' failed. in this case the loop is quite simple: 944B BB#8: derived from LLVM BB %for.inc6, ADDRESS TAKEN Live Ins: %CTR8 Predecessors according to CFG: BB#8 BB#3 960B BDNZ8 <BB#8>, %CTR8<imp-def>, %CTR8<imp-use,kill> Successors according to CFG: BB#8 BB#10 the preheader is: 240B BB#3: Predecessors according to CFG: BB#2 256B %vreg28<def> = LI 0; GPRC:%vreg28 272B %vreg30<def> = COPY %vreg17<kill>; GPRC:%vreg30,%vreg17 288B %vreg31<def> = RLDICL %vreg30<kill>, 0, 32;GPRC:%vreg31,%vreg30 304B MTCTR8 %vreg31<kill>,%CTR8<imp-def,dead>; GPRC:%vreg31 320B B <BB#8> Successors according to CFG: BB#8 So maybe LiveInterval would need to be updated to support terminators that define registers? There are also mis-compiles, but I'm hoping that they all stem from the same underlying problem. Thanks again, Hal> > /jakob-- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory
Apparently Analagous Threads
- [LLVMdev] Strong vs. default phi elimination and single-reg classes
- [LLVMdev] Strong vs. default phi elimination and single-reg classes
- [LLVMdev] RegisterCoalescing Pass seems to ignore part of CFG.
- Machine Scheduler on Power PC: Latency Limit and Register Pressure
- [LLVMdev] RegisterCoalescing pass crashes with ImplicitDef registers