thr3ads.net - llvm dev - [LLVMdev] Register Pairing [Dec 2010]

If this information is useful, please help other people find it:
Share via:

Borja Ferrer

2010-Nov-27 16:56 UTC

[LLVMdev] Register Pairing

Hello, some months ago i wrote to the mailing list asking some questions
about register pairing, i've been experimenting several things with the help
i got back then.

Some background first: this issue is for a backend for an 8bit
microcontroller with only 8bit regs, however it has a few 16bit instructions
that only work with fixed register pairs, so it doesnt allow all
combinations of regs. This introduces some problems because if data wider
than 8bits is expanded into 8bit operations the 16bit instructions never get
selected, also the reg allocator should allocate adjacent regs to form the
pairs. The most important 16 bit instruction is data movement, this
instruction can move register pairs in a single cycle, doing something like
this:
mov r25, r23
mov r24, r22
into:
movw r25:r24, r23:r22
The key point here is that the movw instruction can only move data of fixed
pairs in this way. movw Rdest+1:Rdest, Rorig+1:Rorig, so movw R25:R23,
R21:R18 is illegal because registers pairs aren't adjacent.

Explaining this as if it was for x86 may make things more clear. Suppose we
only have 8bit regs (ah, al, cl, ch, bl, bh, etc...) with 8 bit instructions
and a few 16 bit instrs that only work with ax, bx, cx,.... We could pair
ah:al to form ax, and so on. Then we have a movw instruction that is able to
work with reg pairs, such as ax, bx, cx, etc... This way if we store a short
in ah:al we could copy it into another pair, but storing it into al:cl would
be bad because movw wouldnt be emitted. Also these reg pairs would be stored
in a pseudo reg class so the 16 bit intrs would be emitted, otherwise none
of them would be produced.

Going back to my case I want to favor storing 16bit or wider data in
adjacent regs to favor the insertion of this instruction like
HARD_REGNO_MODE_OK in gcc. To overcome this problem and to select 16bit
instructions automatically by LLVM i thought on working always with 16 bit
register pairs. I created a pseudo reg class that contains the reg pairs
marking each pair with its 2 subregs of 8 bits. Then i marked the i16 type
legal with addRegisterClass(), although this is nearly a lie because i16 is
only legal in a few specific cases, it makes LLVM think that working with
the pairs is fine.
Now for example to perform a 16 bit sum consider this code:
typedef unsigned short t;
t foo(t a, t b, t c)
{
    return a+b;
}
Incoming args come in register pairs, split them into the 8bit subregs
perform the addition and combine them again into the pair ready for the next
operation, in this case we're returning the result. This gives me this code
before instr sel:
# Machine code for function foo:
Function Live Ins: %R25R24 in reg%16384, %R23R22 in reg%16385
Function Live Outs: %R25R24

BB#0: derived from LLVM BB %entry
    Live Ins: %R25R24 %R23R22
    %reg16385<def> = COPY %R23R22; WDREGS:%reg16385 // COPY B
    %reg16384<def> = COPY %R25R24; WDREGS:%reg16384  //  COPY A
    %reg16387<def> = COPY %reg16384:ssub_0; GPR8:%reg16387
WDREGS:%reg16384
// EXTRACT LO BYTE OF A
    %reg16388<def> = COPY %reg16385:ssub_0; GPR8:%reg16388
WDREGS:%reg16385
// EXTRACT LO BYTE OF B
    %reg16389<def> = COPY %reg16384:ssub_1; GPR8:%reg16389
WDREGS:%reg16384
// EXTRACT HI BYTE OF A
    %reg16390<def> = COPY %reg16385:ssub_1; GPR8:%reg16390
WDREGS:%reg16385
// EXTRACT HI BYTE OF B
    %reg16391<def> = ADDRdRr %reg16388, %reg16387<kill>,
%SREG<imp-def>;
GPR8:%reg16391,16388,16387 // ADD LO BYTES
    %reg16392<def> = ADCRdRr %reg16390, %reg16389<kill>,
%SREG<imp-def,dead>, %SREG<imp-use>; GPR8:%reg16392,16390,16389 //
ADDC HI
BYTES
    %reg16393<def> = REG_SEQUENCE %reg16391<kill>, ssub_0,
%reg16392<kill>,
ssub_1; WDREGS:%reg16393 GPR8:%reg16391,16392 // COMBINE INTO REG PAIR AGAIN
    %R25R24<def> = COPY %reg16393; WDREGS:%reg16393 // COPY REG PAIR INTO
RETURN REG.
    RET

This is fine until we get to the register allocation stage, there it does:
BB#0: derived from LLVM BB %entry
    Live Ins: %R25R24 %R23R22
    %R18<def> = COPY %R24
    %R19<def> = COPY %R25
    %R24<def> = COPY %R22<kill>, %R25R24<imp-def>
    %R24<def> = ADDRdRr %R24, %R18<kill>, %SREG<imp-def>
    %R25<def> = COPY %R23<kill>
    %R25<def> = ADCRdRr %R25, %R19<kill>, %SREG<imp-def,dead>,
%SREG<imp-use,kill>
    RET %R25R24<imp-use,kill>

Here instead of splitting the pair into its own subregs, it's copying each
subreg into r18 and r19 making useless movements.
The optimal code should be:
add r24, r22
adc r25, r23

So my first question would be how to solve this?
And going on further, is this the correct way to model the pairs to get the
behaviour i need, or is there a better and direct way of handling this? You
can think of the x86 example i wrote before.

Thanks for your help.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20101127/6fa166b2/attachment.html>

Jeff Kunkel

2010-Nov-29 18:16 UTC

head link

[LLVMdev] Register Pairing

Can this not be done with a pattern? In <name>InstrInfo.td one could build
a
pattern in functional pseudocode like

def Mov8Mov8ToMov16 : (8 bit register move, r1, r2), (8 bit register move,
r3, r4) may be reduced to (16 bit register move, r1,r2, r3,r4).

However, your case is a bit more complicated than the generic case. The
pattern would have to have a constraint. So it may be worth extending
patterns one is able to define in <..>InstrInfo.td to allow for
conditional
cases.

def Mov8Mov8ToMov16 : (8 bit register move, r1, r2), (8 bit register move,
r3, r4) may be reduced to (16 bit register move, r1,r2, r3,r4) if the
conditional <function name> is satisfied. Where the function name takes in
as many parameters as there are inputs. For this case the function would
take 6 parameters (first reg class, r1, r2, second reg class, r3, r4 ).

- My 2 cents
- Jeff Kunkel


On Sat, Nov 27, 2010 at 11:56 AM, Borja Ferrer <borja.ferav at
gmail.com>wrote:
> Hello, some months ago i wrote to the mailing list asking some questions
> about register pairing, i've been experimenting several things with the
help
> i got back then.
>
> Some background first: this issue is for a backend for an 8bit
> microcontroller with only 8bit regs, however it has a few 16bit
instructions
> that only work with fixed register pairs, so it doesnt allow all
> combinations of regs. This introduces some problems because if data wider
> than 8bits is expanded into 8bit operations the 16bit instructions never
get
> selected, also the reg allocator should allocate adjacent regs to form the
> pairs. The most important 16 bit instruction is data movement, this
> instruction can move register pairs in a single cycle, doing something like
> this:
> mov r25, r23
> mov r24, r22
> into:
> movw r25:r24, r23:r22
> The key point here is that the movw instruction can only move data of fixed
> pairs in this way. movw Rdest+1:Rdest, Rorig+1:Rorig, so movw R25:R23,
> R21:R18 is illegal because registers pairs aren't adjacent.
>
> Explaining this as if it was for x86 may make things more clear. Suppose we
> only have 8bit regs (ah, al, cl, ch, bl, bh, etc...) with 8 bit
instructions
> and a few 16 bit instrs that only work with ax, bx, cx,.... We could pair
> ah:al to form ax, and so on. Then we have a movw instruction that is able
to
> work with reg pairs, such as ax, bx, cx, etc... This way if we store a
short
> in ah:al we could copy it into another pair, but storing it into al:cl
would
> be bad because movw wouldnt be emitted. Also these reg pairs would be
stored
> in a pseudo reg class so the 16 bit intrs would be emitted, otherwise none
> of them would be produced.
>
> Going back to my case I want to favor storing 16bit or wider data in
> adjacent regs to favor the insertion of this instruction like
> HARD_REGNO_MODE_OK in gcc. To overcome this problem and to select 16bit
> instructions automatically by LLVM i thought on working always with 16 bit
> register pairs. I created a pseudo reg class that contains the reg pairs
> marking each pair with its 2 subregs of 8 bits. Then i marked the i16 type
> legal with addRegisterClass(), although this is nearly a lie because i16 is
> only legal in a few specific cases, it makes LLVM think that working with
> the pairs is fine.
> Now for example to perform a 16 bit sum consider this code:
> typedef unsigned short t;
> t foo(t a, t b, t c)
> {
>     return a+b;
> }
> Incoming args come in register pairs, split them into the 8bit subregs
> perform the addition and combine them again into the pair ready for the
next
> operation, in this case we're returning the result. This gives me this
code
> before instr sel:
> # Machine code for function foo:
> Function Live Ins: %R25R24 in reg%16384, %R23R22 in reg%16385
> Function Live Outs: %R25R24
>
> BB#0: derived from LLVM BB %entry
>     Live Ins: %R25R24 %R23R22
>     %reg16385<def> = COPY %R23R22; WDREGS:%reg16385 // COPY B
>     %reg16384<def> = COPY %R25R24; WDREGS:%reg16384  //  COPY A
>     %reg16387<def> = COPY %reg16384:ssub_0; GPR8:%reg16387
WDREGS:%reg16384
> // EXTRACT LO BYTE OF A
>     %reg16388<def> = COPY %reg16385:ssub_0; GPR8:%reg16388
WDREGS:%reg16385
> // EXTRACT LO BYTE OF B
>     %reg16389<def> = COPY %reg16384:ssub_1; GPR8:%reg16389
WDREGS:%reg16384
> // EXTRACT HI BYTE OF A
>     %reg16390<def> = COPY %reg16385:ssub_1; GPR8:%reg16390
WDREGS:%reg16385
> // EXTRACT HI BYTE OF B
>     %reg16391<def> = ADDRdRr %reg16388, %reg16387<kill>,
%SREG<imp-def>;
> GPR8:%reg16391,16388,16387 // ADD LO BYTES
>     %reg16392<def> = ADCRdRr %reg16390, %reg16389<kill>,
> %SREG<imp-def,dead>, %SREG<imp-use>; GPR8:%reg16392,16390,16389
// ADDC HI
> BYTES
>     %reg16393<def> = REG_SEQUENCE %reg16391<kill>, ssub_0,
%reg16392<kill>,
> ssub_1; WDREGS:%reg16393 GPR8:%reg16391,16392 // COMBINE INTO REG PAIR
AGAIN
>     %R25R24<def> = COPY %reg16393; WDREGS:%reg16393 // COPY REG PAIR
INTO
> RETURN REG.
>     RET
>
> This is fine until we get to the register allocation stage, there it does:
> BB#0: derived from LLVM BB %entry
>     Live Ins: %R25R24 %R23R22
>     %R18<def> = COPY %R24
>     %R19<def> = COPY %R25
>     %R24<def> = COPY %R22<kill>, %R25R24<imp-def>
>     %R24<def> = ADDRdRr %R24, %R18<kill>, %SREG<imp-def>
>     %R25<def> = COPY %R23<kill>
>     %R25<def> = ADCRdRr %R25, %R19<kill>,
%SREG<imp-def,dead>,
> %SREG<imp-use,kill>
>     RET %R25R24<imp-use,kill>
>
> Here instead of splitting the pair into its own subregs, it's copying
each
> subreg into r18 and r19 making useless movements.
> The optimal code should be:
> add r24, r22
> adc r25, r23
>
> So my first question would be how to solve this?
> And going on further, is this the correct way to model the pairs to get the
> behaviour i need, or is there a better and direct way of handling this? You
> can think of the x86 example i wrote before.
>
> Thanks for your help.
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20101129/07dc2903/attachment.html>

Jakob Stoklund Olesen

2010-Nov-29 18:37 UTC

head link

[LLVMdev] Register Pairing

On Nov 27, 2010, at 8:56 AM, Borja Ferrer wrote:
> Some background first: this issue is for a backend for an 8bit
microcontroller with only 8bit regs, however it has a few 16bit instructions
that only work with fixed register pairs, so it doesnt allow all combinations of
regs. This introduces some problems because if data wider than 8bits is expanded
into 8bit operations the 16bit instructions never get selected, also the reg
allocator should allocate adjacent regs to form the pairs. The most important 16
bit instruction is data movement, this instruction can move register pairs in a
single cycle
Are you doing this only to take advantage of a double move instruction?

If so, I think you are better off just using 8-bit registers. The 16-bit code
will never be as good as 8-bit code if you don't have a number of 16-bit
instructions.

[...]
> typedef unsigned short t;
> t foo(t a, t b, t c)
> {
>     return a+b;
> }
[...]
> This is fine until we get to the register allocation stage, there it does:
> BB#0: derived from LLVM BB %entry
>     Live Ins: %R25R24 %R23R22
>     %R18<def> = COPY %R24
>     %R19<def> = COPY %R25
>     %R24<def> = COPY %R22<kill>, %R25R24<imp-def>
>     %R24<def> = ADDRdRr %R24, %R18<kill>, %SREG<imp-def>
>     %R25<def> = COPY %R23<kill>
>     %R25<def> = ADCRdRr %R25, %R19<kill>,
%SREG<imp-def,dead>, %SREG<imp-use,kill>
>     RET %R25R24<imp-use,kill>
> 
> Here instead of splitting the pair into its own subregs, it's copying
each subreg into r18 and r19 making useless movements.
> The optimal code should be:
> add r24, r22
> adc r25, r23
> 
> So my first question would be how to solve this?
You should not expect LLVM to generate optimal code for toy examples, that is
not the design goal. You example requires a lot of trickery to compile
optimally, and it is not clear that such tricks would benefit more realistic
code.

How are you doing on real code?

/jakob

Borja Ferrer

2010-Dec-01 21:02 UTC

head link

[LLVMdev] Register Pairing

Jeff thanks for those suggestions, that's exactly what i would like to do,
however i dont know how to do it with my current knowledge :\ As far as i
understand patterns only take one instruction as an input (while the pattern
you wrote before takes two) and also, i dont know how to handle register
copying (COPY) in the .td file because they're handled in a different way to
the rest of instructions.

Jakob i know my approach is not very suitable watching the results i got.
The example i wrote in my previous email was just a trivial function to see
if what i was trying to do was good.
Without doing what i mentioned and letting LLVM expand all operations wider
than 8 bits as you asked, the code produced is excellent supposing that many
of the moves there should be 16 bit moves reducing code size and right
register allocation, also something important for me is that the code is
better than gcc's. When i say right reg allocation it doesnt mean it's
doing
things wrong, i mean it's getting regs freely without pairing regs because i
dont know how to do it. So now i have to push things further and implement
these details to make the backend introduce those 16 bit instructions i dont
know how to insert, and this is where i need help.

My 2 big questions are:

- How to tell the register allocator to store 16 bit data in two contiguous
8 bit regs being the low part an even reg? Remember data would be expanded
into 8 bit ops, so when we're working with dags after type legalization do
we really know the original type before expansion?
As an example, storing a short in r21:r20 would be valid, but r20:r19 or
r20:r18 would be invalid because the in the first case the low reg is odd
and in the second case regs arent contiguous.
To store data wider than 16 bits, for example for a 32 bit int we would use
2 register pairs (4 8bit regs) but here the pairs dont need to be contiguous
so storing it r25:r24:r19:r18 is completely fine.
As i said in my previous email this is achieved by using HARD_REGNO_MODE_OK
in gcc.

- Second, assuming the previous point works so the register constraints are
done, how would i then proceed and combine two 8 bit instructions into a 16
bit one as Jeff pointed out in his email?
For example i want to combine a 16 bit add like this:
// b = b + 1:
(b stored in r25:r24)
add r24, 1
adc r25, 0
into
adw r25:r24, 1

and the one im talking all my mails about
move r25, r23
move r24, r22
into
movw r25:r24, r23:r22

or
move r18, r2
move r19, r3
into
movw r19:r18, r3:r2
any combination of moves with reg pairs are valid.

I wrote a function pass to test, it scanned for moves and checked if next
instruction was a move to see if globally it was a 16 move and replace those
2 insts with a movw. But this pass lacked lots of details for example if a
different instruction was placed in between the moves it wouldnt be able to
detect it.

Your help is really appreciated, and thanks to both for your patience.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20101201/fc30c2c6/attachment.html>

Reasonably Related Threads

Search for more apparently analagous threads

llvm dev - Dec 2010 - [LLVMdev] Register Pairing

[LLVMdev] Register Pairing

[LLVMdev] Register Pairing

[LLVMdev] Register Pairing

[LLVMdev] Register Pairing

Reasonably Related Threads