thr3ads.net - llvm dev - [LLVMdev] Subregister coalescing [Jul 2010]

Carlos Sánchez de La Lama

2010-Jul-28 19:25 UTC

[LLVMdev] Subregister coalescing

Hi all,

We are working on a backend for a machine that has 4-wide vector
registers and ops, *but* no vector loads. All the vector register elements
are directly accessible, so the VI1 reg (Vector Integer 1) has I4, I5, I6 and
I7 as its (integer) subregisters. Subregisters of the same reg *never*
overlap.

Therefore, vector loads are lowered to scalar loads followed by a chain
of INSERT_VECTOR_ELTs. We then select those to INSERT_SUBREG; everything
is fine up to that point.
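
For readers following along, the lowering described above can be sketched as a toy model in Python. This is only an illustration, not LLVM code: the tuple-based node format, the function name `lower_vector_load`, and the assumption that the four elements sit at consecutive 4-byte offsets from one base pointer are all hypothetical (in the dumps below each load has its own address register). Lanes are numbered from 0 here, while the subregister indexes in the dumps run from 1 to 4.

```python
# Toy model only: these tuples are NOT LLVM data structures.

def lower_vector_load(base, lanes=4, elt_size=4):
    """Expand one vector load into scalar loads plus an insert chain."""
    nodes = [("undef_vector",)]  # node 0: the initial undefined vector
    vec = 0
    for lane in range(lanes):
        # Assumption: elements are contiguous, elt_size bytes apart.
        nodes.append(("scalar_load", base, lane * elt_size))
        scalar = len(nodes) - 1
        # Each insert consumes the previous vector value and yields a new one.
        nodes.append(("insert_vector_elt", vec, scalar, lane))
        vec = len(nodes) - 1
    return nodes, vec


nodes, result = lower_vector_load("ptr")
for index, node in enumerate(nodes):
    print(index, node)
```

Every insert depends on the previous one, which is the serial chain that later shows up as the INSERT_SUBREG sequence.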

The state before live analysis is (unrelated instructions removed):

36 %reg16388<def> = LDWr %reg16384, 0; mem:LD4[<unknown>]
68 %reg16392<def> = INSERT_SUBREG %reg16392<undef>, %reg16388<kill>, 1
76 %reg16394<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
116 %reg16400<def> = MOVEV %reg16392<kill>
124 %reg16400<def> = INSERT_SUBREG %reg16400, %reg16394<kill>, 2
132 %reg16401<def> = LDWr %reg16390<kill>, 0; mem:LD4[<unknown>]
164 %reg16404<def> = MOVEV %reg16400<kill>
172 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16401<kill>, 3
180 %reg16405<def> = LDWr %reg16398<kill>, 0; mem:LD4[<unknown>]
212 %reg16408<def> = MOVEV %reg16404<kill>
220 %reg16408<def> = INSERT_SUBREG %reg16408, %reg16405<kill>, 4

Which after register coalescing gets transformed into:

36 %reg16404:1<def> = LDWr %reg16384, 0; mem:LD4[<unknown>]
76 %reg16394<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
124 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16394<kill>, 2
132 %reg16401<def> = LDWr %reg16390<kill>, 0; mem:LD4[<unknown>]
172 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16401<kill>, 3
180 %reg16405<def> = LDWr %reg16398<kill>, 0; mem:LD4[<unknown>]
220 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16405<kill>, 4

The code is correct, but not optimal. I would like the loads to go
directly to the subregisters of %reg16404, avoiding the extra copies.
But it seems Live Range Analysis considers %reg16404 to be live over the
whole range, thus preventing coalescing between its subregs and the load
destinations.

Is there a way to solve this?

As an alternative approach, I also tried a custom InstrInserter that
produced the correct code just after MI emission:

68 %reg16392<def> = LDWr %reg16384<kill>, 0; mem:LD4[<unknown>]
76 %reg16393<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
84 %reg16394<def> = LDWr %reg16387<kill>, 0; mem:LD4[<unknown>]
92 %reg16395<def> = LDWr %reg16388<kill>, 0; mem:LD4[<unknown>]
132 %reg16400:1<def,dead> = MOVI32rr %reg16392<kill>
140 %reg16400:2<def> = MOVI32rr %reg16393<kill>
148 %reg16400:3<def> = MOVI32rr %reg16394<kill>
156 %reg16400:4<def> = MOVI32rr %reg16395<kill>

but then Live Range Analysis asserts because %reg16400 is defined
multiple times.

Can anyone give me a clue on the correct way to handle this situation?

Thanks!

Carlos

Bob Wilson

2010-Jul-28 20:07 UTC

[LLVMdev] Subregister coalescing

The ARM backend uses REG_SEQUENCE to solve this problem for NEON registers. 
REG_SEQUENCE is basically a parallel INSERT_SUBREG operation that inserts all of
the subregs at once so the coalescer can deal with it.  During the
TwoAddressInstructionPass, the REG_SEQUENCE operations are replaced by direct
references to the subregs, which is what you want.  Look at the ARM backend to
see how this works.
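
To make the idea concrete, here is a minimal Python sketch of the net effect described above: a single REG_SEQUENCE names all the pieces at once, and the defs of those pieces end up written directly into subregisters of the result. It is a toy model under stated assumptions, not the actual TwoAddressInstructionPass logic. The function `eliminate_reg_sequence`, the `(dest, opcode, operands)` tuple format, and the register names `%a`..`%d`, `%v`, `%p0`..`%p3` are all hypothetical, and it assumes each source register has exactly one def and no other use.

```python
def eliminate_reg_sequence(instrs):
    """Rewrite defs of REG_SEQUENCE inputs into subregister defs of its result.

    instrs: list of (dest, opcode, operands). For REG_SEQUENCE the operands
    are [(src_reg, subreg_index), ...]. Toy assumption: every source has a
    single def and no other use, so the rewrite is always legal here.
    """
    renames = {}
    for dest, opcode, operands in instrs:
        if opcode == "REG_SEQUENCE":
            for src, subreg_index in operands:
                renames[src] = f"{dest}:{subreg_index}"

    out = []
    for dest, opcode, operands in instrs:
        if opcode == "REG_SEQUENCE":
            continue  # fully replaced by the renamed defs
        out.append((renames.get(dest, dest), opcode, operands))
    return out


program = [
    ("%a", "LDWr", ["%p0", 0]),
    ("%b", "LDWr", ["%p1", 0]),
    ("%c", "LDWr", ["%p2", 0]),
    ("%d", "LDWr", ["%p3", 0]),
    ("%v", "REG_SEQUENCE", [("%a", 1), ("%b", 2), ("%c", 3), ("%d", 4)]),
]
lowered = eliminate_reg_sequence(program)
for instr in lowered:
    print(instr)
```

Because all four inputs are visible in one instruction, nothing forces the vector to look live between the individual loads, unlike the serial INSERT_SUBREG chain.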

Jakob Stoklund Olesen

2010-Jul-28 20:42 UTC

[LLVMdev] Subregister coalescing

On Jul 28, 2010, at 12:25 PM, Carlos Sánchez de La Lama wrote:
> Which after register coalescing gets transformed into:
> 
> 36 %reg16404:1<def> = LDWr %reg16384, 0; mem:LD4[<unknown>]
> 76 %reg16394<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
> 124 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16394<kill>, 2
> 132 %reg16401<def> = LDWr %reg16390<kill>, 0; mem:LD4[<unknown>]
> 172 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16401<kill>, 3
> 180 %reg16405<def> = LDWr %reg16398<kill>, 0; mem:LD4[<unknown>]
> 220 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16405<kill>, 4
> 
> The code is correct, but not optimal. I would like the loads to go
> directly to the subregisters of %reg16404, avoiding the extra copies.
> But it seems Live Range Analysis considers %reg16404 to be live over the
> whole range, thus preventing coalescing between its subregs and the load
> destinations.

Right. This is a deficiency in the coalescer. It doesn't deal well with
multiple values being inserted into a larger register. As you correctly
observed, it doesn't understand that a live interval can be partially
defined and so may not interfere.
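
The interference argument can be illustrated with a small Python sketch. It is a toy model, not how LiveIntervals is implemented: live ranges are plain half-open `(start, end)` pairs, the slot numbers are read off the post-coalescing listing, and the end point `END = 228` is a made-up stand-in for some later use of the vector. The per-lane ranges are hypothetical; they show what "partially defined" would mean if each subregister were tracked separately.

```python
def overlaps(a, b):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


END = 228  # assumption: placeholder for a later use of the whole vector

# %reg16404 treated as one value: live from its first partial def onwards.
whole_reg = (36, END)

# Hypothetical per-lane view: each lane becomes live only at its own insert.
lanes = {1: (36, END), 2: (124, END), 3: (172, END), 4: (220, END)}

# Scalar load results (%reg16394, %reg16401, %reg16405), keyed by target lane.
scalars = {2: (76, 124), 3: (132, 172), 4: (180, 220)}

for lane, live_range in scalars.items():
    print(
        f"lane {lane}: interferes with whole register = "
        f"{overlaps(live_range, whole_reg)}, with its own lane = "
        f"{overlaps(live_range, lanes[lane])}"
    )
```

Under the whole-register view every scalar overlaps %reg16404, so none of them can be joined with it; under the per-lane view none of them overlaps the lane it is about to fill.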

The opposite direction should be fine: using EXTRACT_SUBREG to get small
registers from the larger one. Your stores ought to coalesce properly.

> Is there a way to solve this?

What Bob said. Use REG_SEQUENCE. You may have to use LLVM from Subversion to do
that. Your machine code looks like you are using 2.7.

> As an alternate approach, I also tried to do a custom InstrInserter that
> ended with the correct code just after MI emission:
> 
> 68 %reg16392<def> = LDWr %reg16384<kill>, 0; mem:LD4[<unknown>]
> 76 %reg16393<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
> 84 %reg16394<def> = LDWr %reg16387<kill>, 0; mem:LD4[<unknown>]
> 92 %reg16395<def> = LDWr %reg16388<kill>, 0; mem:LD4[<unknown>]
> 132 %reg16400:1<def,dead> = MOVI32rr %reg16392<kill>
> 140 %reg16400:2<def> = MOVI32rr %reg16393<kill>
> 148 %reg16400:3<def> = MOVI32rr %reg16394<kill>
> 156 %reg16400:4<def> = MOVI32rr %reg16395<kill>
> 
> but then Live Range Analysis asserts because of multiply defined
> %reg16400.

You can't do it like that because the machine code must be in SSA form until
TwoAddressInstructionPass runs. This is why we keep the INSERT_SUBREG
instruction around instead of just lowering it to

%reg16404:4<def> = COPY %reg16405<kill>

On the other hand, EXTRACT_SUBREG is translated to a subreg COPY immediately.
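
The single-definition requirement can be illustrated with a toy check in Python. This is only a sketch of the SSA property, not the assertion that actually fires inside LLVM. The helper `multiply_defined` and the `(dest, opcode, sources)` tuples are hypothetical, and the registers `%v0`..`%v4` and `%s1`..`%s4` in the second list are invented to show an SSA-style INSERT_SUBREG chain in which every insert defines a fresh virtual register.

```python
from collections import Counter


def multiply_defined(instrs):
    """Return virtual registers with more than one def, ignoring subreg indexes."""
    defs = Counter(dest.split(":")[0] for dest, _opcode, _sources in instrs)
    return sorted(reg for reg, count in defs.items() if count > 1)


# The custom inserter output from the original message: four defs of %reg16400.
custom_inserter_output = [
    ("%reg16400:1", "MOVI32rr", ["%reg16392"]),
    ("%reg16400:2", "MOVI32rr", ["%reg16393"]),
    ("%reg16400:3", "MOVI32rr", ["%reg16394"]),
    ("%reg16400:4", "MOVI32rr", ["%reg16395"]),
]

# Hypothetical SSA-form chain: each INSERT_SUBREG defines a new register.
ssa_insert_chain = [
    ("%v1", "INSERT_SUBREG", ["%v0", "%s1", 1]),
    ("%v2", "INSERT_SUBREG", ["%v1", "%s2", 2]),
    ("%v3", "INSERT_SUBREG", ["%v2", "%s3", 3]),
    ("%v4", "INSERT_SUBREG", ["%v3", "%s4", 4]),
]

print(multiply_defined(custom_inserter_output))
print(multiply_defined(ssa_insert_chain))
```

A subregister def still counts as a def of the whole virtual register, which is why writing the four lanes separately breaks SSA while the INSERT_SUBREG chain does not.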

/jakob

Carlos Sánchez de La Lama

2010-Jul-29 07:16 UTC

[LLVMdev] Subregister coalescing

Hi,

>> Is there a way to solve this?
>
> What Bob said. Use REG_SEQUENCE. You may have to use LLVM from
> Subversion to do that. Your machine code looks like you are using 2.7.

OK, so I understand REG_SEQUENCE is to BUILD_VECTOR what INSERT_SUBREG
is to INSERT_VECTOR_ELT, in a way. I was actually looking for such a
target opcode and could not find it in 2.7 (you are right, that is
what I am using). I guess it is time to upgrade.

Thanks Jakob and Bob,

Carlos
