Michele Scandale
2012-Jul-25  07:48 UTC
[LLVMdev] Question about an unusual jump instruction
Dear all,
I'm working on an exploratory backend on llvm. In the instruction set
I'm using
I have an instruction (called DECJNZ) that decrements a register and, if the 
decremented value is not zero, jumps (with a relative jump) to a given offset.
I've described in tablegen this instruction as follow:
def DECJNZ : Instruction {
let Namespace = "MyTarget";
let OutOperandList = (outs GprRegs:$R0);
let InOperandList = (ins GprRegs: $R1, imm16:$dest);
let AsmString = "DECJNZ $R0, $dest";
let isBranch = 1;
let isTerminator = 1;
let Constraints = "$R1 = $R0";
let Defs = [SR];
}
I would like to create an optimization pass to make countable loops faster by 
using this instruction.
The simplest loop that I would like to optimize is like:
//////////////////////////
int i = a;
do {
   // loop body
   --i;
} while (i != 0);
//////////////////////////
After code selection I've something like:
BB0:
   %vreg0<def> = COPY %R0; // R0 contains 'a'
   J <#BB1>
BB1:
   %vreg1<def> = PHI %vreg0, <#BB0>, %vreg3, <#BB3>
   J <#BB2>
BB2:
   // loop body
BB3:
   %vreg3<def> = ADDI %vreg1<kill>, 1
   CMPNE %vreg3, 0, %SR<implicit,def>
   JNZ <#BB1>
   J <#BB4>
BB4:
   // end
With the optimization pass I replace the decrement, comparison and conditional 
jump with the DECJNZ. The resulting code will be:
BB0:
   %vreg0<def> = COPY %R0; // R0 contains 'a'
   J <#BB1>
BB1:
   %vreg1<def> = PHI %vreg0, <#BB0>, %vreg3, <#BB3>
   J <#BB2>
BB2:
   // loop body
BB3:
   %vreg3<def> = DECJNZ %vreg1<kill>, <#BB1>,
%SR<implicit,def>
   J <#BB4>
BB4:
   // end
A first problem was related to PHIElimination, while eliminating the PHI-node a 
copy was generated before the DECJNZ, because it's a terminator instruction,
but
the copy should use the value defined by the DECJNZ.
To solve this problem I wrote a preprocess pass which is run just before 
PHIElimination and change the opcode of PHIs that have at least one source alue 
generated by a DECJNZ. In this way it is ignored by the PHIElimination passes 
and then a pass run just after PHIElimination that 'lowers' in a custom
way the
marked PHIs and updates the information about live variables.
With some tests I had a second problem generated by the spilling of the register
used as loop counter. A store instruction is generated after the definition of 
the value, so is inserted between the DECJNZ and J.
I think that with another pass I can try to manually move the spill-store 
instruction at the beginning of the destination basic block, but I think
it's
not enough to preserve the semantics of the code.
Is my approach correct? Does it exist a cleaner and more elegant way to support 
this kind of instruction? I tried to look for a similar instruction in other 
targets, but I've found nothing...
Thanks in advance for future replies.
Regards,
Michele Scandale
On Wed, Jul 25, 2012 at 12:48 AM, Michele Scandale <michele.scandale at gmail.com> wrote:> Dear all, > > I'm working on an exploratory backend on llvm. In the instruction set I'm using > I have an instruction (called DECJNZ) that decrements a register and, if the > decremented value is not zero, jumps (with a relative jump) to a given offset. > > I've described in tablegen this instruction as follow: > > def DECJNZ : Instruction { > let Namespace = "MyTarget"; > let OutOperandList = (outs GprRegs:$R0); > let InOperandList = (ins GprRegs: $R1, imm16:$dest); > let AsmString = "DECJNZ $R0, $dest"; > let isBranch = 1; > let isTerminator = 1; > let Constraints = "$R1 = $R0"; > let Defs = [SR]; > } > > I would like to create an optimization pass to make countable loops faster by > using this instruction. > > The simplest loop that I would like to optimize is like: > > ////////////////////////// > int i = a; > do { > // loop body > --i; > } while (i != 0); > ////////////////////////// > > > After code selection I've something like: > > BB0: > %vreg0<def> = COPY %R0; // R0 contains 'a' > J <#BB1> > BB1: > %vreg1<def> = PHI %vreg0, <#BB0>, %vreg3, <#BB3> > J <#BB2> > BB2: > // loop body > BB3: > %vreg3<def> = ADDI %vreg1<kill>, 1 > CMPNE %vreg3, 0, %SR<implicit,def> > JNZ <#BB1> > J <#BB4> > BB4: > // end > > With the optimization pass I replace the decrement, comparison and conditional > jump with the DECJNZ. The resulting code will be: > > BB0: > %vreg0<def> = COPY %R0; // R0 contains 'a' > J <#BB1> > BB1: > %vreg1<def> = PHI %vreg0, <#BB0>, %vreg3, <#BB3> > J <#BB2> > BB2: > // loop body > BB3: > %vreg3<def> = DECJNZ %vreg1<kill>, <#BB1>, %SR<implicit,def> > J <#BB4> > BB4: > // end > > A first problem was related to PHIElimination, while eliminating the PHI-node a > copy was generated before the DECJNZ, because it's a terminator instruction, but > the copy should use the value defined by the DECJNZ. > > To solve this problem I wrote a preprocess pass which is run just before > PHIElimination and change the opcode of PHIs that have at least one source alue > generated by a DECJNZ. In this way it is ignored by the PHIElimination passes > and then a pass run just after PHIElimination that 'lowers' in a custom way the > marked PHIs and updates the information about live variables. > > With some tests I had a second problem generated by the spilling of the register > used as loop counter. A store instruction is generated after the definition of > the value, so is inserted between the DECJNZ and J. > > I think that with another pass I can try to manually move the spill-store > instruction at the beginning of the destination basic block, but I think it's > not enough to preserve the semantics of the code. > > Is my approach correct? Does it exist a cleaner and more elegant way to support > this kind of instruction? I tried to look for a similar instruction in other > targets, but I've found nothing...See PPCCTRLoops.cpp in the PPC backend. -Eli
Michele Scandale
2012-Jul-25  08:55 UTC
[LLVMdev] Question about an unusual jump instruction
Il 25/07/2012 10:07, Eli Friedman ha scritto:> On Wed, Jul 25, 2012 at 12:48 AM, Michele Scandale > <michele.scandale at gmail.com> wrote: >> Dear all, >> >> I'm working on an exploratory backend on llvm. In the instruction set I'm using >> I have an instruction (called DECJNZ) that decrements a register and, if the >> decremented value is not zero, jumps (with a relative jump) to a given offset. >> >> I've described in tablegen this instruction as follow: >> >> def DECJNZ : Instruction { >> let Namespace = "MyTarget"; >> let OutOperandList = (outs GprRegs:$R0); >> let InOperandList = (ins GprRegs: $R1, imm16:$dest); >> let AsmString = "DECJNZ $R0, $dest"; >> let isBranch = 1; >> let isTerminator = 1; >> let Constraints = "$R1 = $R0"; >> let Defs = [SR]; >> } >> >> I would like to create an optimization pass to make countable loops faster by >> using this instruction. >> >> The simplest loop that I would like to optimize is like: >> >> ////////////////////////// >> int i = a; >> do { >> // loop body >> --i; >> } while (i != 0); >> ////////////////////////// >> >> >> After code selection I've something like: >> >> BB0: >> %vreg0<def> = COPY %R0; // R0 contains 'a' >> J <#BB1> >> BB1: >> %vreg1<def> = PHI %vreg0, <#BB0>, %vreg3, <#BB3> >> J <#BB2> >> BB2: >> // loop body >> BB3: >> %vreg3<def> = ADDI %vreg1<kill>, 1 >> CMPNE %vreg3, 0, %SR<implicit,def> >> JNZ <#BB1> >> J <#BB4> >> BB4: >> // end >> >> With the optimization pass I replace the decrement, comparison and conditional >> jump with the DECJNZ. The resulting code will be: >> >> BB0: >> %vreg0<def> = COPY %R0; // R0 contains 'a' >> J <#BB1> >> BB1: >> %vreg1<def> = PHI %vreg0, <#BB0>, %vreg3, <#BB3> >> J <#BB2> >> BB2: >> // loop body >> BB3: >> %vreg3<def> = DECJNZ %vreg1<kill>, <#BB1>, %SR<implicit,def> >> J <#BB4> >> BB4: >> // end >> >> A first problem was related to PHIElimination, while eliminating the PHI-node a >> copy was generated before the DECJNZ, because it's a terminator instruction, but >> the copy should use the value defined by the DECJNZ. >> >> To solve this problem I wrote a preprocess pass which is run just before >> PHIElimination and change the opcode of PHIs that have at least one source alue >> generated by a DECJNZ. In this way it is ignored by the PHIElimination passes >> and then a pass run just after PHIElimination that 'lowers' in a custom way the >> marked PHIs and updates the information about live variables. >> >> With some tests I had a second problem generated by the spilling of the register >> used as loop counter. A store instruction is generated after the definition of >> the value, so is inserted between the DECJNZ and J. >> >> I think that with another pass I can try to manually move the spill-store >> instruction at the beginning of the destination basic block, but I think it's >> not enough to preserve the semantics of the code. >> >> Is my approach correct? Does it exist a cleaner and more elegant way to support >> this kind of instruction? I tried to look for a similar instruction in other >> targets, but I've found nothing... > > See PPCCTRLoops.cpp in the PPC backend. > > -Eli >I took a quick look to PPCCTRLoops.cpp, but I don't think it's the same of my case. The instruction I have defines explicitly a GPR register, while the BDNZ defines and uses an implicit dedicated register. I based my optimization pass on what seen in Hexagon target too, but due to the fact that I have an explicit GPR as loop counter, my instruction defines a virtual register and the fact that the instruction is also a terminator creates all the problems I listed in my previous mail. Regards, Michele Scandale