Hi all,

We are working on a backend for a machine that has 4-wide vector registers and ops, *but* no vector loads. All the vector register elements are directly accessible, so the VI1 reg (Vector Integer 1) has I4, I5, I6 and I7 as its (integer) subregisters. Subregisters of the same reg *never* overlap.

Therefore, vector loads are lowered to scalar loads followed by a chain of INSERT_VECTOR_ELTs. Then we select those to INSERT_SUBREG; everything is fine up to that point.

Status before live analysis is (unrelated instrs removed):

 36 %reg16388<def> = LDWr %reg16384, 0; mem:LD4[<unknown>]
 68 %reg16392<def> = INSERT_SUBREG %reg16392<undef>, %reg16388<kill>, 1
 76 %reg16394<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
116 %reg16400<def> = MOVEV %reg16392<kill>
124 %reg16400<def> = INSERT_SUBREG %reg16400, %reg16394<kill>, 2
132 %reg16401<def> = LDWr %reg16390<kill>, 0; mem:LD4[<unknown>]
164 %reg16404<def> = MOVEV %reg16400<kill>
172 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16401<kill>, 3
180 %reg16405<def> = LDWr %reg16398<kill>, 0; mem:LD4[<unknown>]
212 %reg16408<def> = MOVEV %reg16404<kill>
220 %reg16408<def> = INSERT_SUBREG %reg16408, %reg16405<kill>, 4

Which after register coalescing gets transformed into:

 36 %reg16404:1<def> = LDWr %reg16384, 0; mem:LD4[<unknown>]
 76 %reg16394<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
124 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16394<kill>, 2
132 %reg16401<def> = LDWr %reg16390<kill>, 0; mem:LD4[<unknown>]
172 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16401<kill>, 3
180 %reg16405<def> = LDWr %reg16398<kill>, 0; mem:LD4[<unknown>]
220 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16405<kill>, 4

The code is correct, but not optimal. I would like the loads to go directly to the subregisters of %reg16404, avoiding the extra copies. But it seems Live Range Analysis treats %reg16404 as live over the whole range, thus preventing coalescing between its subregs and the load destinations.

Is there a way to solve this?

As an alternate approach, I also tried a custom InstrInserter that ended up with the correct code just after MI emission:

 68 %reg16392<def> = LDWr %reg16384<kill>, 0; mem:LD4[<unknown>]
 76 %reg16393<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
 84 %reg16394<def> = LDWr %reg16387<kill>, 0; mem:LD4[<unknown>]
 92 %reg16395<def> = LDWr %reg16388<kill>, 0; mem:LD4[<unknown>]
132 %reg16400:1<def,dead> = MOVI32rr %reg16392<kill>
140 %reg16400:2<def> = MOVI32rr %reg16393<kill>
148 %reg16400:3<def> = MOVI32rr %reg16394<kill>
156 %reg16400:4<def> = MOVI32rr %reg16395<kill>

but then Live Range Analysis asserts because %reg16400 is multiply defined.

Can anyone give me a clue on the correct way to handle this situation?

Thanks!

Carlos

The ARM backend uses REG_SEQUENCE to solve this problem for NEON registers. REG_SEQUENCE is basically a parallel INSERT_SUBREG operation that inserts all of the subregs at once so the coalescer can deal with it. During the TwoAddressInstructionPass, the REG_SEQUENCE operations are replaced by direct references to the subregs, which is what you want. Look at the ARM backend to see how this works.
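
To make the REG_SEQUENCE idea concrete, here is a minimal toy sketch in Python. It is not LLVM code: instructions are plain tuples, `lower_reg_sequence` is a hypothetical helper, and the register names merely echo the dumps in this thread. It assumes each scalar source has a single def and is used only by the REG_SEQUENCE (as the `<kill>` flags in the dumps suggest). The point is only that once all lanes are named in one instruction, each source def can be redirected to write its lane of the destination.

```python
# Toy model only: tuples stand in for machine instructions.
# None of these names are LLVM APIs.

def lower_reg_sequence(instrs):
    """Rewrite each REG_SEQUENCE so its sources are defined directly
    as sub-registers of the destination.

    An instruction is (dst, opcode, operands), where dst is a
    (vreg, subidx) pair and subidx 0 means the whole register.
    REG_SEQUENCE operands are a flat list [src0, idx0, src1, idx1, ...].
    """
    # Map every scalar source vreg to the (vector vreg, lane) it feeds.
    lane_of = {}
    for dst, opcode, operands in instrs:
        if opcode == "REG_SEQUENCE":
            for src, idx in zip(operands[0::2], operands[1::2]):
                lane_of[src] = (dst[0], idx)

    lowered = []
    for dst, opcode, operands in instrs:
        if opcode == "REG_SEQUENCE":
            continue  # replaced by the redirected defs below
        vreg, subidx = dst
        if subidx == 0 and vreg in lane_of:
            dst = lane_of[vreg]  # define the lane directly
        lowered.append((dst, opcode, operands))
    return lowered


before = [
    (("reg16392", 0), "LDWr", ["reg16384", 0]),
    (("reg16393", 0), "LDWr", ["reg16386", 0]),
    (("reg16394", 0), "LDWr", ["reg16387", 0]),
    (("reg16395", 0), "LDWr", ["reg16388", 0]),
    (("reg16400", 0), "REG_SEQUENCE",
     ["reg16392", 1, "reg16393", 2, "reg16394", 3, "reg16395", 4]),
]
after = lower_reg_sequence(before)
for dst, opcode, operands in after:
    print(f"%{dst[0]}:{dst[1]}<def> = {opcode} {operands}")
```

In this toy, the four loads end up defining %reg16400:1 through %reg16400:4 with no intermediate copies. How the real pass performs the rewrite is best read from the ARM backend and TwoAddressInstructionPass themselves.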

On Jul 28, 2010, at 12:25 PM, Carlos Sánchez de La Lama wrote:

> Which after register coalescing gets transformed into:
>
>  36 %reg16404:1<def> = LDWr %reg16384, 0; mem:LD4[<unknown>]
>  76 %reg16394<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
> 124 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16394<kill>, 2
> 132 %reg16401<def> = LDWr %reg16390<kill>, 0; mem:LD4[<unknown>]
> 172 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16401<kill>, 3
> 180 %reg16405<def> = LDWr %reg16398<kill>, 0; mem:LD4[<unknown>]
> 220 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16405<kill>, 4
>
> The code is correct, but not optimal. I would like the loads to go
> directly to the subregisters of %reg16404, avoiding the extra copies.
> But seems Live Range Analysis interprets %reg16404 to be alive in the
> whole range, thus preventing coalescing between its subregs and the load
> destinations.

Right. This is a deficiency in the coalescer. It doesn't deal well with multiple values being inserted into a larger register. As you correctly observed, it doesn't understand that a live interval can be partially defined and so may not interfere.

The opposite direction should be fine: using EXTRACT_SUBREG to get small registers from the larger one. Your stores ought to coalesce properly.

> Is there a way to solve this?

What Bob said. Use REG_SEQUENCE. You may have to use LLVM from Subversion to do that. Your machine code looks like you are using 2.7.

> As an alternate approach, I also tried to do a custom InstrInserter that
> ended with the correct code just after MI emission:
>
>  68 %reg16392<def> = LDWr %reg16384<kill>, 0; mem:LD4[<unknown>]
>  76 %reg16393<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
>  84 %reg16394<def> = LDWr %reg16387<kill>, 0; mem:LD4[<unknown>]
>  92 %reg16395<def> = LDWr %reg16388<kill>, 0; mem:LD4[<unknown>]
> 132 %reg16400:1<def,dead> = MOVI32rr %reg16392<kill>
> 140 %reg16400:2<def> = MOVI32rr %reg16393<kill>
> 148 %reg16400:3<def> = MOVI32rr %reg16394<kill>
> 156 %reg16400:4<def> = MOVI32rr %reg16395<kill>
>
> but then Live Range Analysis asserts because of multiply defined
> %reg16400.

You can't do it like that because the machine code must be in SSA form until TwoAddressInstructionPass runs. This is why we keep the INSERT_SUBREG instruction around instead of just lowering it to

%reg16404:4<def> = COPY %reg16405<kill>

On the other hand, EXTRACT_SUBREG is translated to a subreg COPY immediately.

/jakob
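
The interference argument can be illustrated with a small toy model in Python. This is a sketch under stated assumptions, not how LLVM's LiveIntervals is implemented: intervals are simple half-open `(start, end)` pairs, the slot indexes are read off the coalesced dump quoted in this thread, `END` is a made-up index because the dump does not show where %reg16404 dies, and the per-lane view is hypothetical.

```python
# Toy model only: half-open (start, end) pairs stand in for live intervals.

def overlaps(a, b):
    """True if half-open intervals [a0, a1) and [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]

END = 228  # hypothetical: the dump does not show the last use of %reg16404

# Scalar load results: live from the load to the INSERT_SUBREG that kills them.
scalar_live = {
    "reg16394": (76, 124),
    "reg16401": (132, 172),
    "reg16405": (180, 220),
}

# Whole-register view: %reg16404 counts as live from its first partial def.
whole_reg_live = (36, END)

# Hypothetical lane view: each lane becomes live only at its own def.
lane_live = {1: (36, END), 2: (124, END), 3: (172, END), 4: (220, END)}
target_lane = {"reg16394": 2, "reg16401": 3, "reg16405": 4}

# Which scalars interfere with the vector register under each view?
blocked_whole = [r for r, iv in scalar_live.items()
                 if overlaps(iv, whole_reg_live)]
blocked_lane = [r for r, iv in scalar_live.items()
                if overlaps(iv, lane_live[target_lane[r]])]

print("blocked (whole register):", blocked_whole)
print("blocked (per lane):      ", blocked_lane)
```

Under the whole-register view every scalar overlaps %reg16404 and so cannot be joined with it. Under the hypothetical lane view none of them overlaps the lane it is inserted into, which is the partial-definition case described above.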

Hi,

>> Is there a way to solve this?
>
> What Bob said. Use REG_SEQUENCE. You may have to use LLVM from
> Subversion to do that. Your machine code looks like you are using 2.7.

Ok, so I understand REG_SEQUENCE is to BUILD_VECTOR what INSERT_SUBREG is to INSERT_VECTOR_ELT, in a way. I was actually looking for such a target opcode and could not find it in 2.7 (you are right, that is what I am using). I guess it is time to upgrade.

Thanks Jakob and Bob,

Carlos