Michele Scandale
2012-Jul-25 07:48 UTC
[LLVMdev] Question about an unusual jump instruction
Dear all, I'm working on an exploratory backend on llvm. In the instruction set I'm using I have an instruction (called DECJNZ) that decrements a register and, if the decremented value is not zero, jumps (with a relative jump) to a given offset. I've described in tablegen this instruction as follow: def DECJNZ : Instruction { let Namespace = "MyTarget"; let OutOperandList = (outs GprRegs:$R0); let InOperandList = (ins GprRegs: $R1, imm16:$dest); let AsmString = "DECJNZ $R0, $dest"; let isBranch = 1; let isTerminator = 1; let Constraints = "$R1 = $R0"; let Defs = [SR]; } I would like to create an optimization pass to make countable loops faster by using this instruction. The simplest loop that I would like to optimize is like: ////////////////////////// int i = a; do { // loop body --i; } while (i != 0); ////////////////////////// After code selection I've something like: BB0: %vreg0<def> = COPY %R0; // R0 contains 'a' J <#BB1> BB1: %vreg1<def> = PHI %vreg0, <#BB0>, %vreg3, <#BB3> J <#BB2> BB2: // loop body BB3: %vreg3<def> = ADDI %vreg1<kill>, 1 CMPNE %vreg3, 0, %SR<implicit,def> JNZ <#BB1> J <#BB4> BB4: // end With the optimization pass I replace the decrement, comparison and conditional jump with the DECJNZ. The resulting code will be: BB0: %vreg0<def> = COPY %R0; // R0 contains 'a' J <#BB1> BB1: %vreg1<def> = PHI %vreg0, <#BB0>, %vreg3, <#BB3> J <#BB2> BB2: // loop body BB3: %vreg3<def> = DECJNZ %vreg1<kill>, <#BB1>, %SR<implicit,def> J <#BB4> BB4: // end A first problem was related to PHIElimination, while eliminating the PHI-node a copy was generated before the DECJNZ, because it's a terminator instruction, but the copy should use the value defined by the DECJNZ. To solve this problem I wrote a preprocess pass which is run just before PHIElimination and change the opcode of PHIs that have at least one source alue generated by a DECJNZ. In this way it is ignored by the PHIElimination passes and then a pass run just after PHIElimination that 'lowers' in a custom way the marked PHIs and updates the information about live variables. With some tests I had a second problem generated by the spilling of the register used as loop counter. A store instruction is generated after the definition of the value, so is inserted between the DECJNZ and J. I think that with another pass I can try to manually move the spill-store instruction at the beginning of the destination basic block, but I think it's not enough to preserve the semantics of the code. Is my approach correct? Does it exist a cleaner and more elegant way to support this kind of instruction? I tried to look for a similar instruction in other targets, but I've found nothing... Thanks in advance for future replies. Regards, Michele Scandale
On Wed, Jul 25, 2012 at 12:48 AM, Michele Scandale <michele.scandale at gmail.com> wrote:> Dear all, > > I'm working on an exploratory backend on llvm. In the instruction set I'm using > I have an instruction (called DECJNZ) that decrements a register and, if the > decremented value is not zero, jumps (with a relative jump) to a given offset. > > I've described in tablegen this instruction as follow: > > def DECJNZ : Instruction { > let Namespace = "MyTarget"; > let OutOperandList = (outs GprRegs:$R0); > let InOperandList = (ins GprRegs: $R1, imm16:$dest); > let AsmString = "DECJNZ $R0, $dest"; > let isBranch = 1; > let isTerminator = 1; > let Constraints = "$R1 = $R0"; > let Defs = [SR]; > } > > I would like to create an optimization pass to make countable loops faster by > using this instruction. > > The simplest loop that I would like to optimize is like: > > ////////////////////////// > int i = a; > do { > // loop body > --i; > } while (i != 0); > ////////////////////////// > > > After code selection I've something like: > > BB0: > %vreg0<def> = COPY %R0; // R0 contains 'a' > J <#BB1> > BB1: > %vreg1<def> = PHI %vreg0, <#BB0>, %vreg3, <#BB3> > J <#BB2> > BB2: > // loop body > BB3: > %vreg3<def> = ADDI %vreg1<kill>, 1 > CMPNE %vreg3, 0, %SR<implicit,def> > JNZ <#BB1> > J <#BB4> > BB4: > // end > > With the optimization pass I replace the decrement, comparison and conditional > jump with the DECJNZ. The resulting code will be: > > BB0: > %vreg0<def> = COPY %R0; // R0 contains 'a' > J <#BB1> > BB1: > %vreg1<def> = PHI %vreg0, <#BB0>, %vreg3, <#BB3> > J <#BB2> > BB2: > // loop body > BB3: > %vreg3<def> = DECJNZ %vreg1<kill>, <#BB1>, %SR<implicit,def> > J <#BB4> > BB4: > // end > > A first problem was related to PHIElimination, while eliminating the PHI-node a > copy was generated before the DECJNZ, because it's a terminator instruction, but > the copy should use the value defined by the DECJNZ. > > To solve this problem I wrote a preprocess pass which is run just before > PHIElimination and change the opcode of PHIs that have at least one source alue > generated by a DECJNZ. In this way it is ignored by the PHIElimination passes > and then a pass run just after PHIElimination that 'lowers' in a custom way the > marked PHIs and updates the information about live variables. > > With some tests I had a second problem generated by the spilling of the register > used as loop counter. A store instruction is generated after the definition of > the value, so is inserted between the DECJNZ and J. > > I think that with another pass I can try to manually move the spill-store > instruction at the beginning of the destination basic block, but I think it's > not enough to preserve the semantics of the code. > > Is my approach correct? Does it exist a cleaner and more elegant way to support > this kind of instruction? I tried to look for a similar instruction in other > targets, but I've found nothing...See PPCCTRLoops.cpp in the PPC backend. -Eli
Michele Scandale
2012-Jul-25 08:55 UTC
[LLVMdev] Question about an unusual jump instruction
Il 25/07/2012 10:07, Eli Friedman ha scritto:> On Wed, Jul 25, 2012 at 12:48 AM, Michele Scandale > <michele.scandale at gmail.com> wrote: >> Dear all, >> >> I'm working on an exploratory backend on llvm. In the instruction set I'm using >> I have an instruction (called DECJNZ) that decrements a register and, if the >> decremented value is not zero, jumps (with a relative jump) to a given offset. >> >> I've described in tablegen this instruction as follow: >> >> def DECJNZ : Instruction { >> let Namespace = "MyTarget"; >> let OutOperandList = (outs GprRegs:$R0); >> let InOperandList = (ins GprRegs: $R1, imm16:$dest); >> let AsmString = "DECJNZ $R0, $dest"; >> let isBranch = 1; >> let isTerminator = 1; >> let Constraints = "$R1 = $R0"; >> let Defs = [SR]; >> } >> >> I would like to create an optimization pass to make countable loops faster by >> using this instruction. >> >> The simplest loop that I would like to optimize is like: >> >> ////////////////////////// >> int i = a; >> do { >> // loop body >> --i; >> } while (i != 0); >> ////////////////////////// >> >> >> After code selection I've something like: >> >> BB0: >> %vreg0<def> = COPY %R0; // R0 contains 'a' >> J <#BB1> >> BB1: >> %vreg1<def> = PHI %vreg0, <#BB0>, %vreg3, <#BB3> >> J <#BB2> >> BB2: >> // loop body >> BB3: >> %vreg3<def> = ADDI %vreg1<kill>, 1 >> CMPNE %vreg3, 0, %SR<implicit,def> >> JNZ <#BB1> >> J <#BB4> >> BB4: >> // end >> >> With the optimization pass I replace the decrement, comparison and conditional >> jump with the DECJNZ. The resulting code will be: >> >> BB0: >> %vreg0<def> = COPY %R0; // R0 contains 'a' >> J <#BB1> >> BB1: >> %vreg1<def> = PHI %vreg0, <#BB0>, %vreg3, <#BB3> >> J <#BB2> >> BB2: >> // loop body >> BB3: >> %vreg3<def> = DECJNZ %vreg1<kill>, <#BB1>, %SR<implicit,def> >> J <#BB4> >> BB4: >> // end >> >> A first problem was related to PHIElimination, while eliminating the PHI-node a >> copy was generated before the DECJNZ, because it's a terminator instruction, but >> the copy should use the value defined by the DECJNZ. >> >> To solve this problem I wrote a preprocess pass which is run just before >> PHIElimination and change the opcode of PHIs that have at least one source alue >> generated by a DECJNZ. In this way it is ignored by the PHIElimination passes >> and then a pass run just after PHIElimination that 'lowers' in a custom way the >> marked PHIs and updates the information about live variables. >> >> With some tests I had a second problem generated by the spilling of the register >> used as loop counter. A store instruction is generated after the definition of >> the value, so is inserted between the DECJNZ and J. >> >> I think that with another pass I can try to manually move the spill-store >> instruction at the beginning of the destination basic block, but I think it's >> not enough to preserve the semantics of the code. >> >> Is my approach correct? Does it exist a cleaner and more elegant way to support >> this kind of instruction? I tried to look for a similar instruction in other >> targets, but I've found nothing... > > See PPCCTRLoops.cpp in the PPC backend. > > -Eli >I took a quick look to PPCCTRLoops.cpp, but I don't think it's the same of my case. The instruction I have defines explicitly a GPR register, while the BDNZ defines and uses an implicit dedicated register. I based my optimization pass on what seen in Hexagon target too, but due to the fact that I have an explicit GPR as loop counter, my instruction defines a virtual register and the fact that the instruction is also a terminator creates all the problems I listed in my previous mail. Regards, Michele Scandale