thr3ads.net - llvm dev - [llvm-dev] PBQP register allocation and copy propagation [Jun 2016]

If this information is useful, please help other people find it:
Share via:

James Molloy via llvm-dev

2016-Jun-02 17:22 UTC

[llvm-dev] PBQP register allocation and copy propagation

Hi Lang and Arnaud,

I've been testing out the PBQP allocator for Thumb-2 and have ran into a
problem I'd love to get your input on.

The problem is exemplfied in the codegen for the function @bar in the
attached IR file:

bar:
        push    {r4, lr}
        sub     sp, #12
 (1)    movw    r2, :lower16:.L_MergedGlobals
 (1)    movt    r2, :upper16:.L_MergedGlobals
        ldm.w   r2, {r0, r1, r3, r12, lr}
        ldrd    r4, r2, [r2, #20]
        strd    lr, r4, [sp]
        str     r2, [sp, #8]
 (2)    mov     r2, r3         ****
        mov     r3, r12        ****
        bl      baz
        add     sp, #12
        pop     {r4, pc}

The two moves marked with **** are unnecessary. Especially the first, which
could be removed simply by swapping r2 and r3. This becomes even more
pronounced when I teach PBQP about how best to use registers to allow
LDM/STM formation; the codegen is perfect apart from those two moves that
just won't go.

I've discovered that the underlying problem is that line (1) is reduced
after line (2). During the backpropagation phase (1) is allocated r2 which
means (2) cannot, as they interfere.

Interestingly they're both reduced by rule RN and as the spill cost is the
same between them (and they're both conservatively allocatable), it's
pure
luck which gets allocated first. I have a patch to account for the
opportunity cost of allocating r2 to (1) instead of (2) by assigning a cost
to r2 in (1)'s affinity matrix; this seems to work. However I don't
really
know what's the right approach here - dealing with this problem purely by
propagating opportunity costs through affinity matrices or doing something
better with the reduction order.

This ties in with another problem I'm seeing with a prototype live-range
splitter for PBQP. Obviously when pre-splitting around every instruction,
many copies are inserted. But the above problem rears its head again!
Consider this:

%vregB = COPY %R0
%vregA = COPY %vregB

      COPY
  A <------> B

Node B has an affinity for register R0. The affinity edge between A and B
is a simple coalescing affinity (identity matrix). If B is assigned first,
it will get R0 and then A will also select R0. However if A is selected
first then it doesn't know anything about node B's affinities and could
thus get any register, causing a MOV.

The above trivial example would be reduced by rule R1, so would actually
work optimally (because the reduction would account for B's costs in A).
However it is trivial to add two or more interfering defs that cause the
reduction rule to change to RN:

%vregC = def...
%vregD = def...
%vregB = COPY %R0
%vregA = COPY %vregB
       = use %vregC, %vregD

Now the reduction order is arbitrary and local costs aren't propagated
unlike R1 and R2. In fact, there is a rule "RM" proposed by Buchwald
[1]
that seeks to fix a very similar case to this.

What do you think?

Cheers,

James

[1] SSA-based Register Allocation with PBQP, Buchwald et al,
https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald11cc.pdf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160602/11044b87/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zia.ll
Type: application/octet-stream
Size: 2607 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160602/11044b87/attachment.obj>

Sebastian Buchwald via llvm-dev

2016-Jun-02 18:56 UTC

head link

[llvm-dev] PBQP register allocation and copy propagation

On 06/02/2016 07:22 PM, James Molloy via llvm-dev wrote:> Hi Lang and Arnaud,
>
> I've been testing out the PBQP allocator for Thumb-2 and have ran into 
> a problem I'd love to get your input on.
>
> The problem is exemplfied in the codegen for the function @bar in the 
> attached IR file:
>
> bar:
>         push    {r4, lr}
>         sub     sp, #12
>  (1)    movw    r2, :lower16:.L_MergedGlobals
>  (1)    movt    r2, :upper16:.L_MergedGlobals
>         ldm.w   r2, {r0, r1, r3, r12, lr}
>         ldrd    r4, r2, [r2, #20]
>         strd    lr, r4, [sp]
>         str     r2, [sp, #8]
>  (2)    mov     r2, r3         ****
>         mov     r3, r12        ****
>         bl      baz
>         add     sp, #12
>         pop     {r4, pc}
>
> The two moves marked with **** are unnecessary. Especially the first, 
> which could be removed simply by swapping r2 and r3. This becomes even 
> more pronounced when I teach PBQP about how best to use registers to 
> allow LDM/STM formation; the codegen is perfect apart from those two 
> moves that just won't go.
>
> I've discovered that the underlying problem is that line (1) is 
> reduced after line (2). During the backpropagation phase (1) is 
> allocated r2 which means (2) cannot, as they interfere.
>
> Interestingly they're both reduced by rule RN and as the spill cost is 
> the same between them (and they're both conservatively allocatable), 
> it's pure luck which gets allocated first. I have a patch to account 
> for the opportunity cost of allocating r2 to (1) instead of (2) by 
> assigning a cost to r2 in (1)'s affinity matrix; this seems to work. 
> However I don't really know what's the right approach here -
dealing
> with this problem purely by propagating opportunity costs through 
> affinity matrices or doing something better with the reduction order.
>
> This ties in with another problem I'm seeing with a prototype 
> live-range splitter for PBQP. Obviously when pre-splitting around 
> every instruction, many copies are inserted. But the above problem 
> rears its head again! Consider this:
>
> %vregB = COPY %R0
> %vregA = COPY %vregB
>
>       COPY
>   A <------> B
>
> Node B has an affinity for register R0. The affinity edge between A 
> and B is a simple coalescing affinity (identity matrix). If B is 
> assigned first, it will get R0 and then A will also select R0. However 
> if A is selected first then it doesn't know anything about node B's
> affinities and could thus get any register, causing a MOV.
>
> The above trivial example would be reduced by rule R1, so would 
> actually work optimally (because the reduction would account for B's 
> costs in A). However it is trivial to add two or more interfering defs 
> that cause the reduction rule to change to RN:
>
> %vregC = def...
> %vregD = def...
> %vregB = COPY %R0
> %vregA = COPY %vregB
>        = use %vregC, %vregDI think one idea to improve the situation is to consider the cost vector 
of adjacent nodes during RN. Let's say you decided to do a RN for node A 
and want to compute the costs for choosing register %Ri. The current 
implementation does this by computing min(row/column i of edge A <--> B) 
but you can do better by adding B's cost vector to the row/column before 
computing the minimum. This way you also consider B's affinitiy for %R0.
>
> Now the reduction order is arbitrary and local costs aren't propagated 
> unlike R1 and R2. In fact, there is a rule "RM" proposed by
Buchwald
> [1] that seeks to fix a very similar case to this.Porting RM to that situation would basically say "always fulfill this 
affinity" (beween A and B). Of course that's the wrong choice, if you 
really need the copy.

Best regards,
Sebastian
>
> What do you think?
>
> Cheers,
>
> James
>
> [1] SSA-based Register Allocation with PBQP, Buchwald et al, 
> https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald11cc.pdf
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160602/ce77e5e8/attachment.html>

Arnaud De Grandmaison via llvm-dev

2016-Jun-03 13:32 UTC

head link

[llvm-dev] PBQP register allocation and copy propagation

Hi James,

I’ve tried to play in the past with the allocation order, which can definitely
be tweaked and improved. The metric we use for spill cost being what it is (i.e.
not targeted for PBQP, but that’s a different subject), I found it made real
sense to use some other heuristics to sort nodes (on top of the spill cost) when
there was a tie between them.  Of course, that’s a heuristic and it can
sometimes be wrong. Another option I tried was to ensure the nodes’ allocation
order gets updated when some costs are propagated.

I have also tried to implement something along the lines of what Sebastian
suggested (i.e. considering the cost of adjacent nodes). This made the
allocation better, but allocation time went very high --- but I had no time to
try to improve the heuristic.

I did not know about Sebastian et al paper, so thanks for pointing to it !

Cheers,
Arnaud

From: Sebastian Buchwald [mailto:Sebastian.Buchwald at kit.edu]
Sent: 02 June 2016 20:57
To: llvm-dev at lists.llvm.org; james at jamesmolloy.co.uk; lhames at gmail.com;
Arnaud De Grandmaison
Subject: Re: [llvm-dev] PBQP register allocation and copy propagation

On 06/02/2016 07:22 PM, James Molloy via llvm-dev wrote:
Hi Lang and Arnaud,

I've been testing out the PBQP allocator for Thumb-2 and have ran into a
problem I'd love to get your input on.

The problem is exemplfied in the codegen for the function @bar in the attached
IR file:

bar:
        push    {r4, lr}
        sub     sp, #12
 (1)    movw    r2, :lower16:.L_MergedGlobals
 (1)    movt    r2, :upper16:.L_MergedGlobals
        ldm.w   r2, {r0, r1, r3, r12, lr}
        ldrd    r4, r2, [r2, #20]
        strd    lr, r4, [sp]
        str     r2, [sp, #8]
 (2)    mov     r2, r3         ****
        mov     r3, r12        ****
        bl      baz
        add     sp, #12
        pop     {r4, pc}

The two moves marked with **** are unnecessary. Especially the first, which
could be removed simply by swapping r2 and r3. This becomes even more pronounced
when I teach PBQP about how best to use registers to allow LDM/STM formation;
the codegen is perfect apart from those two moves that just won't go.

I've discovered that the underlying problem is that line (1) is reduced
after line (2). During the backpropagation phase (1) is allocated r2 which means
(2) cannot, as they interfere.

Interestingly they're both reduced by rule RN and as the spill cost is the
same between them (and they're both conservatively allocatable), it's
pure luck which gets allocated first. I have a patch to account for the
opportunity cost of allocating r2 to (1) instead of (2) by assigning a cost to
r2 in (1)'s affinity matrix; this seems to work. However I don't really
know what's the right approach here - dealing with this problem purely by
propagating opportunity costs through affinity matrices or doing something
better with the reduction order.

This ties in with another problem I'm seeing with a prototype live-range
splitter for PBQP. Obviously when pre-splitting around every instruction, many
copies are inserted. But the above problem rears its head again! Consider this:

%vregB = COPY %R0
%vregA = COPY %vregB

      COPY
  A <------> B

Node B has an affinity for register R0. The affinity edge between A and B is a
simple coalescing affinity (identity matrix). If B is assigned first, it will
get R0 and then A will also select R0. However if A is selected first then it
doesn't know anything about node B's affinities and could thus get any
register, causing a MOV.

The above trivial example would be reduced by rule R1, so would actually work
optimally (because the reduction would account for B's costs in A). However
it is trivial to add two or more interfering defs that cause the reduction rule
to change to RN:

%vregC = def...
%vregD = def...
%vregB = COPY %R0
%vregA = COPY %vregB
       = use %vregC, %vregD
I think one idea to improve the situation is to consider the cost vector of
adjacent nodes during RN. Let's say you decided to do a RN for node A and
want to compute the costs for choosing register %Ri. The current implementation
does this by computing min(row/column i of edge A <--> B) but you can do
better by adding B's cost vector to the row/column before computing the
minimum. This way you also consider B's affinitiy for %R0.



Now the reduction order is arbitrary and local costs aren't propagated
unlike R1 and R2. In fact, there is a rule "RM" proposed by Buchwald
[1] that seeks to fix a very similar case to this.
Porting RM to that situation would basically say "always fulfill this
affinity" (beween A and B). Of course that's the wrong choice, if you
really need the copy.

Best regards,
Sebastian



What do you think?

Cheers,

James

[1] SSA-based Register Allocation with PBQP, Buchwald et al,
https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald11cc.pdf




_______________________________________________

LLVM Developers mailing list

llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>

http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended recipient,
please notify the sender immediately and do not disclose the contents to any
other person, use it for any purpose, or store or copy the information in any
medium. Thank you.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160603/5605ed9e/attachment.html>

Apparently Analagous Threads

Search for more apparently analagous threads

llvm dev - Jun 2016 - PBQP register allocation and copy propagation

[llvm-dev] PBQP register allocation and copy propagation

[llvm-dev] PBQP register allocation and copy propagation

[llvm-dev] PBQP register allocation and copy propagation

Apparently Analagous Threads