thr3ads.net - llvm dev - [LLVMdev] Register Pairing [Dec 2010]


Borja Ferrer

2010-Dec-01 21:02 UTC

[LLVMdev] Register Pairing

Jeff, thanks for those suggestions. That's exactly what I would like to do,
but I don't know how to do it with my current knowledge :\ As far as I
understand, patterns only take one instruction as input (while the pattern
you wrote before takes two). Also, I don't know how to handle register
copies (COPY) in the .td file, because they're handled differently from
the rest of the instructions.

Jakob, I know my approach is not very suitable, judging by the results I got.
The example I wrote in my previous email was just a trivial function to see
whether what I was trying to do was any good.
Without doing what I mentioned, and letting LLVM expand all operations wider
than 8 bits as you asked, the code produced is excellent, apart from two
things: many of the moves should be 16 bit moves (reducing code size), and
register allocation is not "right" yet. Something also important for me is
that the code is better than gcc's. When I say "right" reg allocation I don't
mean it's doing things wrong; I mean it's picking regs freely without pairing
them, because I don't know how to do it. So now I have to push things further
and implement these details to make the backend emit those 16 bit
instructions I don't know how to insert, and this is where I need help.

My 2 big questions are:

- How do I tell the register allocator to store 16 bit data in two contiguous
8 bit regs, with the low part in an even reg? Remember that data would be
expanded into 8 bit ops, so when we're working with DAGs after type
legalization, do we really know the original type before expansion?
As an example, storing a short in r21:r20 would be valid, but r20:r19 or
r20:r18 would be invalid: in the first case the low reg is odd, and in the
second case the regs aren't contiguous.
To store data wider than 16 bits, for example a 32 bit int, we would use
2 register pairs (four 8 bit regs), but here the pairs don't need to be
contiguous, so storing it in r25:r24:r19:r18 is completely fine.
As I said in my previous email, gcc achieves this with HARD_REGNO_MODE_OK.

- Second, assuming the previous point works and the register constraints are
in place, how would I then combine two 8 bit instructions into a 16 bit one,
as Jeff pointed out in his email?
For example, I want to combine a 16 bit add like this:
// b = b + 1:
(b stored in r25:r24)
add r24, 1
adc r25, 0
into
adw r25:r24, 1

and the one I've been talking about in all my mails:
move r25, r23
move r24, r22
into
movw r25:r24, r23:r22

or
move r18, r2
move r19, r3
into
movw r19:r18, r3:r2
Any combination of moves between reg pairs is valid.
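
The pairing rule from the first question can be written down as a small predicate. This is only a sketch of the constraint as stated above (low reg even, high reg the next one up, wider values built from independent pairs). The function names are hypothetical and are not part of LLVM or gcc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if hi:lo is a legal 16 bit pair, i.e. the low
   register is even and the high register is the next one up. */
bool is_valid_pair(int hi, int lo)
{
    return (lo % 2 == 0) && (hi == lo + 1);
}

/* Hypothetical helper: the high partner of an even low register. */
int pair_high(int lo)
{
    assert(lo % 2 == 0);   /* only even registers can start a pair */
    return lo + 1;
}

/* A 32 bit value in r3:r2:r1:r0 needs two legal pairs; the pairs themselves
   do not have to be next to each other, they only have to be distinct. */
bool is_valid_quad(int r3, int r2, int r1, int r0)
{
    return is_valid_pair(r3, r2) && is_valid_pair(r1, r0) && r2 != r0;
}
```

With these definitions r21:r20 is accepted, r20:r19 and r20:r18 are rejected, and r25:r24:r19:r18 is accepted as a 32 bit location.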

I wrote a function pass to test this: it scanned for a move and checked
whether the next instruction was also a move, to see if together they formed
a 16 bit move, and then replaced those 2 insts with a movw. But this pass
lacked lots of details; for example, if a different instruction was placed in
between the two moves, it wouldn't be able to detect the pair.
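
For illustration, the adjacent-moves heuristic described above might look roughly like this on a toy instruction array. The `struct insn` model and the function names are assumptions made for this sketch; a real pass would work on LLVM's machine instructions instead.

```c
#include <assert.h>
#include <string.h>

/* Toy instruction model (an assumption, not LLVM's representation).
   For OP_MOVW, dst and src hold the low register of each pair. */
enum op { OP_MOV, OP_MOVW, OP_OTHER };
struct insn { enum op op; int dst, src; };

/* True if a and b are movs that copy the two halves of one aligned pair
   into the matching halves of another aligned pair, in either order. */
static int halves_of_same_pair(const struct insn *a, const struct insn *b)
{
    return a->op == OP_MOV && b->op == OP_MOV
        && (a->dst ^ 1) == b->dst          /* destinations are partners */
        && (a->src ^ 1) == b->src          /* sources are partners */
        && (a->dst & 1) == (a->src & 1);   /* low half goes to low half */
}

/* Replace each pair of adjacent, matching movs with one movw, in place.
   Returns the new instruction count. */
int combine_adjacent_moves(struct insn *code, int n)
{
    assert(code != NULL && n >= 0);
    for (int i = 0; i + 1 < n; i++) {
        if (!halves_of_same_pair(&code[i], &code[i + 1]))
            continue;
        code[i].op = OP_MOVW;
        code[i].dst &= ~1;                 /* keep the low reg of each pair */
        code[i].src &= ~1;
        memmove(&code[i + 1], &code[i + 2],
                (size_t)(n - i - 2) * sizeof *code);
        n--;
    }
    return n;
}
```

As in the pass described above, any instruction sitting between the two moves defeats this version.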

Your help is really appreciated, and thanks to both of you for your patience.

Lang Hames

2010-Dec-02 01:57 UTC

[LLVMdev] Register Pairing

Hi Borja,

> Without doing what I mentioned, and letting LLVM expand all operations wider
> than 8 bits as you asked, the code produced is excellent, apart from two
> things: many of the moves should be 16 bit moves (reducing code size), and
> register allocation is not "right" yet. Something also important for me is
> that the code is better than gcc's. When I say "right" reg allocation I don't
> mean it's doing things wrong; I mean it's picking regs freely without pairing
> them, because I don't know how to do it. So now I have to push things further
> and implement these details to make the backend emit those 16 bit
> instructions I don't know how to insert, and this is where I need help.
>
I'm not sure how to help with inserting the 16-bit instructions, but if you
go with allocating 8-bit virtual registers, the PBQP allocator can handle the
pairing constraint. An overview of how to use the allocator is given in
http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-September/034781.html and
the follow-up emails. If you think it might be useful and have any questions,
I'd be happy to help.
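
As a rough illustration of the idea only (this is not the allocator's actual interface), a PBQP-style pairing constraint between the two 8-bit virtual registers of a 16-bit value can be expressed as a cost matrix over their candidate physical registers: zero where the combination forms a legal pair, infinite otherwise. The function name and array layout below are assumptions, and the spill option a real allocator would also model is left out.

```c
#include <assert.h>
#include <math.h>

/* Fill a row-major n_lo x n_hi cost matrix. Entry (i, j) is the cost of
   giving the low half lo_opts[i] and the high half hi_opts[j]:
   0 if hi:lo is an aligned, contiguous pair, infinity if it is not. */
void fill_pairing_costs(const int *lo_opts, int n_lo,
                        const int *hi_opts, int n_hi, double *costs)
{
    assert(lo_opts != 0 && hi_opts != 0 && costs != 0);
    for (int i = 0; i < n_lo; i++) {
        for (int j = 0; j < n_hi; j++) {
            int ok = (lo_opts[i] % 2 == 0) && (hi_opts[j] == lo_opts[i] + 1);
            costs[i * n_hi + j] = ok ? 0.0 : INFINITY;
        }
    }
}
```

A solver that minimises total cost will then never pick a combination such as r20:r19, because its entry is infinite.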

Cheers,
Lang.

Borja Ferrer

2010-Dec-05 16:17 UTC

[LLVMdev] Register Pairing

Hello Lang, thanks for the suggestion :) It's very interesting. I'll read
the email you've pointed to, to understand how it works. Btw, does this mean
that only your allocator is able to handle or support this type of
constraint?

As a follow-up to my previous email, the following code is a real example of
what I was explaining. Lang, this example is exactly why I need the
constraints and a way to combine instructions.

typedef short t;
extern t mcos(t a);
extern t mdiv(t a, t b);
t foo(t a, t b)
{
    short p1 = mcos(b);
    short p2 = mcos(a);
    return mdiv(p1&p2, p1^p2);
}

This C code produces:
; a<- r25:r24   b<--r23:r22
    mov    r18, r24
    mov    r19, r25 <-- can be combined into a movw r19:r18, r25:r24
    mov    r25, r23
    mov    r24, r22 <-- can be combined into a movw r25:r24, r23:r22
    call    mcos
; here we have the case I was explaining: the pairs don't match because
; they're the other way round. The function result is in r25:r24,
; but it's storing the hi part in r20 instead of r21, so we can't insert a movw
    mov    r20, r25
    mov    r21, r24 <--- should be mov r21, r25; mov r20, r24 to be able to insert a movw
    mov    r25, r19
    mov    r24, r18 <-- can be combined into a movw r25:r24, r19:r18
    call    mcos
; same problem as above: again it's moving the hi part in r25 into r18
; instead of r19, so we've lost another movw here
    mov    r18, r25 <-- should be mov r19, r25
    and    r18, r20
    mov    r19, r24 <-- should be mov r18, r24
    and    r19, r21
    mov    r23, r25 <-- this *
    eor    r23, r20
    mov    r22, r24 <--  * and this could be combined into movw r23:r22, r25:r24
    eor    r22, r21
    mov    r25, r18
    mov    r24, r19 <-- because the result returned by the second call to mcos is stored in r18:r19 (other way round), we've lost another movw
    call    mdiv
    ret

As you can see, there are three cases where we've lost the opportunity to
insert a movw. This is a very important instruction, because if we're able
to insert it everywhere it's possible, we can shrink the code by 7
instructions and 7 cycles, which is a huge gain. Ignoring these issues, the
code produced is optimal. I hope this example makes things clearer.
Also, as I said in my previous email, I wrote a function pass that searches
for 2 moves in a row and combines them into a movw, since I don't know a
better way of combining machine instructions. However, in this case it
wouldn't work for the second half of the program, because there are "and" and
"eor" instructions in between the moves, breaking the simple heuristic the
function pass uses.
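
One way such a pass could tolerate instructions sitting between the two moves is to look ahead for the partner move and accept it only if nothing in between reads or writes the partner's destination, or writes its source. The sketch below does this on a toy instruction array. The model, the names, and the choice to treat calls as barriers are all assumptions made for illustration, not the actual pass.

```c
#include <assert.h>
#include <string.h>

/* Toy instruction model (an assumption, not LLVM's representation).
   OP_MOV writes dst and reads src; OP_ALU (and/eor) reads and writes dst
   and reads src; OP_MOVW holds the low reg of each pair; OP_CALL is a barrier. */
enum op { OP_MOV, OP_MOVW, OP_ALU, OP_CALL };
struct insn { enum op op; int dst, src; };

/* Return the index of a later mov copying the other half of the same pair
   as code[i], or -1. Instructions in between are tolerated only if the
   partner could be hoisted above them without changing any value read. */
static int find_partner(const struct insn *code, int n, int i)
{
    int want_dst = code[i].dst ^ 1;   /* other half of the destination pair */
    int want_src = code[i].src ^ 1;   /* other half of the source pair */

    if ((code[i].dst & 1) != (code[i].src & 1))
        return -1;                    /* low half must go to low half */

    for (int j = i + 1; j < n; j++) {
        const struct insn *k = &code[j];
        if (k->op == OP_MOV && k->dst == want_dst && k->src == want_src)
            return j;
        if (k->op == OP_CALL || k->op == OP_MOVW)
            return -1;                /* conservative barrier */
        if (k->dst == want_dst || k->src == want_dst || k->dst == want_src)
            return -1;                /* would see or clobber the hoisted copy */
    }
    return -1;
}

/* Merge each mov with its partner into a movw placed at the first mov.
   Works in place and returns the new instruction count. */
int combine_moves_lookahead(struct insn *code, int n)
{
    assert(code != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        if (code[i].op != OP_MOV)
            continue;
        int j = find_partner(code, n, i);
        if (j < 0)
            continue;
        code[i].op = OP_MOVW;
        code[i].dst &= ~1;
        code[i].src &= ~1;
        memmove(&code[j], &code[j + 1], (size_t)(n - j - 1) * sizeof *code);
        n--;
    }
    return n;
}
```

Run on the ten instructions after the second call to mcos in the listing above, this merges only the r23/r22 moves across the intervening eor; the swapped pairs are still rejected, which is why the register pairing constraint is needed as well.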

Thanks for the help.
