James Molloy via llvm-dev
2016-Jun-03 18:09 UTC
[llvm-dev] PBQP register allocation and copy propagation
Hi,> > I think one idea to improve the situation is to consider the costvector of adjacent nodes during RN. Let's say you decided to do a RN for node A and want to compute the costs for choosing register %Ri. The current implementation does this by computing min(row/column i of edge A <--> B) but you can do better by adding B's cost vector to the row/column before computing the minimum. This way you also consider B's affinitiy for %R0. In fact, it seems even worse than this; if RN is used, then all neighbors are disconnected. So during backpropagation, unless I'm missing something, it seems we will only ever consider the node cost, never adding in any edge cost. On Fri, 3 Jun 2016 at 19:04 James Molloy <james at jamesmolloy.co.uk> wrote:> Hi Sebastien, > > > I think one idea to improve the situation is to consider the cost vector > of adjacent nodes during RN. Let's say you decided to do a RN for node A > and want to compute the costs for choosing register %Ri. The current > implementation does this by computing min(row/column i of edge A <--> B) > but you can do better by adding B's cost vector to the row/column before > computing the minimum. This way you also consider B's affinitiy for %R0. > > Am I right in thinking that this extra calculation would take place during > the backpropagation phase? > > I'm also interested as to whether such a calculation would be sufficient > when more copies are inserted (a chain of copies): > > COPY COPY > A <--------> B <---------> C > > If we reduce B last, and we only account for neighboring affinities at the > backpropagation phase, we will not get an optimal allocation. However if we > could somehow propagate neighboring affinities at the *reduction* stage, > perhaps we might get a transitivity property emerging that could solve the > above case optimally? > > I'm very much a newbie in this area so be gentle on me :) > > Cheers, > > James > > On Fri, 3 Jun 2016 at 15:33 Arnaud De Grandmaison < > Arnaud.DeGrandmaison at arm.com> wrote: > >> Hi James, >> >> >> >> I’ve tried to play in the past with the allocation order, which can >> definitely be tweaked and improved. The metric we use for spill cost being >> what it is (i.e. not targeted for PBQP, but that’s a different subject), I >> found it made real sense to use some other heuristics to sort nodes (on top >> of the spill cost) when there was a tie between them. Of course, that’s a >> heuristic and it can sometimes be wrong. Another option I tried was to >> ensure the nodes’ allocation order gets updated when some costs are >> propagated. >> >> >> >> I have also tried to implement something along the lines of what >> Sebastian suggested (i.e. considering the cost of adjacent nodes). This >> made the allocation better, but allocation time went very high --- but I >> had no time to try to improve the heuristic. >> >> >> >> I did not know about Sebastian et al paper, so thanks for pointing to it >> ! >> >> >> >> Cheers, >> >> Arnaud >> >> >> >> *From:* Sebastian Buchwald [mailto:Sebastian.Buchwald at kit.edu] >> *Sent:* 02 June 2016 20:57 >> *To:* llvm-dev at lists.llvm.org; james at jamesmolloy.co.uk; lhames at gmail.com; >> Arnaud De Grandmaison >> *Subject:* Re: [llvm-dev] PBQP register allocation and copy propagation >> >> >> >> On 06/02/2016 07:22 PM, James Molloy via llvm-dev wrote: >> >> Hi Lang and Arnaud, >> >> >> >> I've been testing out the PBQP allocator for Thumb-2 and have ran into a >> problem I'd love to get your input on. >> >> >> >> The problem is exemplfied in the codegen for the function @bar in the >> attached IR file: >> >> >> >> bar: >> >> push {r4, lr} >> >> sub sp, #12 >> >> (1) movw r2, :lower16:.L_MergedGlobals >> >> (1) movt r2, :upper16:.L_MergedGlobals >> >> ldm.w r2, {r0, r1, r3, r12, lr} >> >> ldrd r4, r2, [r2, #20] >> >> strd lr, r4, [sp] >> >> str r2, [sp, #8] >> >> (2) mov r2, r3 **** >> >> mov r3, r12 **** >> >> bl baz >> >> add sp, #12 >> >> pop {r4, pc} >> >> >> >> The two moves marked with **** are unnecessary. Especially the first, >> which could be removed simply by swapping r2 and r3. This becomes even more >> pronounced when I teach PBQP about how best to use registers to allow >> LDM/STM formation; the codegen is perfect apart from those two moves that >> just won't go. >> >> >> >> I've discovered that the underlying problem is that line (1) is reduced >> after line (2). During the backpropagation phase (1) is allocated r2 which >> means (2) cannot, as they interfere. >> >> >> >> Interestingly they're both reduced by rule RN and as the spill cost is >> the same between them (and they're both conservatively allocatable), it's >> pure luck which gets allocated first. I have a patch to account for the >> opportunity cost of allocating r2 to (1) instead of (2) by assigning a cost >> to r2 in (1)'s affinity matrix; this seems to work. However I don't really >> know what's the right approach here - dealing with this problem purely by >> propagating opportunity costs through affinity matrices or doing something >> better with the reduction order. >> >> >> >> This ties in with another problem I'm seeing with a prototype live-range >> splitter for PBQP. Obviously when pre-splitting around every instruction, >> many copies are inserted. But the above problem rears its head again! >> Consider this: >> >> >> >> %vregB = COPY %R0 >> >> %vregA = COPY %vregB >> >> >> >> COPY >> >> A <------> B >> >> >> >> Node B has an affinity for register R0. The affinity edge between A and B >> is a simple coalescing affinity (identity matrix). If B is assigned first, >> it will get R0 and then A will also select R0. However if A is selected >> first then it doesn't know anything about node B's affinities and could >> thus get any register, causing a MOV. >> >> >> >> The above trivial example would be reduced by rule R1, so would actually >> work optimally (because the reduction would account for B's costs in A). >> However it is trivial to add two or more interfering defs that cause the >> reduction rule to change to RN: >> >> >> >> %vregC = def... >> >> %vregD = def... >> >> %vregB = COPY %R0 >> >> %vregA = COPY %vregB >> >> = use %vregC, %vregD >> >> I think one idea to improve the situation is to consider the cost vector >> of adjacent nodes during RN. Let's say you decided to do a RN for node A >> and want to compute the costs for choosing register %Ri. The current >> implementation does this by computing min(row/column i of edge A <--> B) >> but you can do better by adding B's cost vector to the row/column before >> computing the minimum. This way you also consider B's affinitiy for %R0. >> >> >> >> >> Now the reduction order is arbitrary and local costs aren't propagated >> unlike R1 and R2. In fact, there is a rule "RM" proposed by Buchwald [1] >> that seeks to fix a very similar case to this. >> >> Porting RM to that situation would basically say "always fulfill this >> affinity" (beween A and B). Of course that's the wrong choice, if you >> really need the copy. >> >> Best regards, >> Sebastian >> >> >> >> >> What do you think? >> >> >> >> Cheers, >> >> >> >> James >> >> >> >> [1] SSA-based Register Allocation with PBQP, Buchwald et al, >> https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald11cc.pdf >> >> >> >> >> _______________________________________________ >> >> LLVM Developers mailing list >> >> llvm-dev at lists.llvm.org >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >> IMPORTANT NOTICE: The contents of this email and any attachments are >> confidential and may also be privileged. If you are not the intended >> recipient, please notify the sender immediately and do not disclose the >> contents to any other person, use it for any purpose, or store or copy the >> information in any medium. Thank you. >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160603/bad3bac4/attachment.html>
Lang Hames via llvm-dev
2016-Jun-03 21:14 UTC
[llvm-dev] PBQP register allocation and copy propagation
Hi James, Arnaud, Sebastian, I think one idea to improve the situation is to consider the cost vector of> adjacent nodes during RN. Let's say you decided to do a RN for node A and > want to compute the costs for choosing register %Ri. The current > implementation does this by computing min(row/column i of edge A <--> B) > but you can do better by adding B's cost vector to the row/column before > computing the minimum. This way you also consider B's affinitiy for %R0.The current solver does not use an "RN" rule as such (certainly nothing comparable to what was discussed in the 2002 Scholz/Eckstein paper). We have a heuristic rule for selecting nodes for reduction, but we do not propagate costs from heuristically reduced nodes until backpropagation. In fact, it seems even worse than this; if RN is used, then all neighbors> are disconnected. So during backpropagation, unless I'm missing something, > it seems we will only ever consider the node cost, never adding in any edge > cost.I think you're right, but just to be precise: In the current solver any heuristically reduced node is colored during backpropagation taking into account the costs for nodes further up the stack (i.e. ones that have been colored earlier during backpropagation) only. Nodes further down in the stack are not considered. The metric we use for spill cost being what it is (i.e. not targeted for> PBQP, but that’s a different subject)...Yes - this is also a very interesting subject. So: There are four things that we can tweak that I know of: (1) Spill cost metrics. (2) Heuristic node *selection*. (Currently "provably colorable nodes first (in no particular order as far as I remember), possibly uncolorable nodes last in order of spill cost") (3) Speculative assignment during heuristic reduction - propagate costs onto neighbors of the RN node based on a localized guess at the best solution. (4) Look-ahead during backpropagation. All of these are worth looking at, but I'd be inclined to look at (2) (maybe we should reduce nodes with many affinity edges later, so that they're colored earlier?) and (4) (a user-configurable lookahead would let clients trade compile-time for allocation quality smoothly). James - how can I reproduce your test case? I tried with r271685 using 'llc -O3 -regalloc=pbqp zia.ll', but I got: bar: .fnstart @ BB#0: .save {r0, r1, r2, r3, r4, lr} push {r0, r1, r2, r3, r4, lr} ldr r0, .LCPI0_0 ldm.w r0, {r1, r2, r3, r12, lr} ldrd r4, r0, [r0, #20] strd lr, r4, [sp] str r0, [sp, #8] mov r0, r1 mov r1, r2 mov r2, r3 mov r3, r12 bl baz pop {r0, r1, r2, r3, r4, pc} .p2align 2 which has even more registers used. Cheers, Lang. On Fri, Jun 3, 2016 at 11:09 AM, James Molloy <james at jamesmolloy.co.uk> wrote:> Hi, > > > > I think one idea to improve the situation is to consider the cost > vector of adjacent nodes during RN. Let's say you decided to do a RN for > node A and want to compute the costs for choosing register %Ri. The current > implementation does this by computing min(row/column i of edge A <--> B) > but you can do better by adding B's cost vector to the row/column before > computing the minimum. This way you also consider B's affinitiy for %R0. > > In fact, it seems even worse than this; if RN is used, then all neighbors > are disconnected. So during backpropagation, unless I'm missing something, > it seems we will only ever consider the node cost, never adding in any edge > cost. > > On Fri, 3 Jun 2016 at 19:04 James Molloy <james at jamesmolloy.co.uk> wrote: > >> Hi Sebastien, >> >> > I think one idea to improve the situation is to consider the cost >> vector of adjacent nodes during RN. Let's say you decided to do a RN for >> node A and want to compute the costs for choosing register %Ri. The current >> implementation does this by computing min(row/column i of edge A <--> B) >> but you can do better by adding B's cost vector to the row/column before >> computing the minimum. This way you also consider B's affinitiy for %R0. >> >> Am I right in thinking that this extra calculation would take place >> during the backpropagation phase? >> >> I'm also interested as to whether such a calculation would be sufficient >> when more copies are inserted (a chain of copies): >> >> COPY COPY >> A <--------> B <---------> C >> >> If we reduce B last, and we only account for neighboring affinities at >> the backpropagation phase, we will not get an optimal allocation. However >> if we could somehow propagate neighboring affinities at the *reduction* >> stage, perhaps we might get a transitivity property emerging that could >> solve the above case optimally? >> >> I'm very much a newbie in this area so be gentle on me :) >> >> Cheers, >> >> James >> >> On Fri, 3 Jun 2016 at 15:33 Arnaud De Grandmaison < >> Arnaud.DeGrandmaison at arm.com> wrote: >> >>> Hi James, >>> >>> >>> >>> I’ve tried to play in the past with the allocation order, which can >>> definitely be tweaked and improved. The metric we use for spill cost being >>> what it is (i.e. not targeted for PBQP, but that’s a different subject), I >>> found it made real sense to use some other heuristics to sort nodes (on top >>> of the spill cost) when there was a tie between them. Of course, that’s a >>> heuristic and it can sometimes be wrong. Another option I tried was to >>> ensure the nodes’ allocation order gets updated when some costs are >>> propagated. >>> >>> >>> >>> I have also tried to implement something along the lines of what >>> Sebastian suggested (i.e. considering the cost of adjacent nodes). This >>> made the allocation better, but allocation time went very high --- but I >>> had no time to try to improve the heuristic. >>> >>> >>> >>> I did not know about Sebastian et al paper, so thanks for pointing to it >>> ! >>> >>> >>> >>> Cheers, >>> >>> Arnaud >>> >>> >>> >>> *From:* Sebastian Buchwald [mailto:Sebastian.Buchwald at kit.edu] >>> *Sent:* 02 June 2016 20:57 >>> *To:* llvm-dev at lists.llvm.org; james at jamesmolloy.co.uk; lhames at gmail.com; >>> Arnaud De Grandmaison >>> *Subject:* Re: [llvm-dev] PBQP register allocation and copy propagation >>> >>> >>> >>> On 06/02/2016 07:22 PM, James Molloy via llvm-dev wrote: >>> >>> Hi Lang and Arnaud, >>> >>> >>> >>> I've been testing out the PBQP allocator for Thumb-2 and have ran into a >>> problem I'd love to get your input on. >>> >>> >>> >>> The problem is exemplfied in the codegen for the function @bar in the >>> attached IR file: >>> >>> >>> >>> bar: >>> >>> push {r4, lr} >>> >>> sub sp, #12 >>> >>> (1) movw r2, :lower16:.L_MergedGlobals >>> >>> (1) movt r2, :upper16:.L_MergedGlobals >>> >>> ldm.w r2, {r0, r1, r3, r12, lr} >>> >>> ldrd r4, r2, [r2, #20] >>> >>> strd lr, r4, [sp] >>> >>> str r2, [sp, #8] >>> >>> (2) mov r2, r3 **** >>> >>> mov r3, r12 **** >>> >>> bl baz >>> >>> add sp, #12 >>> >>> pop {r4, pc} >>> >>> >>> >>> The two moves marked with **** are unnecessary. Especially the first, >>> which could be removed simply by swapping r2 and r3. This becomes even more >>> pronounced when I teach PBQP about how best to use registers to allow >>> LDM/STM formation; the codegen is perfect apart from those two moves that >>> just won't go. >>> >>> >>> >>> I've discovered that the underlying problem is that line (1) is reduced >>> after line (2). During the backpropagation phase (1) is allocated r2 which >>> means (2) cannot, as they interfere. >>> >>> >>> >>> Interestingly they're both reduced by rule RN and as the spill cost is >>> the same between them (and they're both conservatively allocatable), it's >>> pure luck which gets allocated first. I have a patch to account for the >>> opportunity cost of allocating r2 to (1) instead of (2) by assigning a cost >>> to r2 in (1)'s affinity matrix; this seems to work. However I don't really >>> know what's the right approach here - dealing with this problem purely by >>> propagating opportunity costs through affinity matrices or doing something >>> better with the reduction order. >>> >>> >>> >>> This ties in with another problem I'm seeing with a prototype live-range >>> splitter for PBQP. Obviously when pre-splitting around every instruction, >>> many copies are inserted. But the above problem rears its head again! >>> Consider this: >>> >>> >>> >>> %vregB = COPY %R0 >>> >>> %vregA = COPY %vregB >>> >>> >>> >>> COPY >>> >>> A <------> B >>> >>> >>> >>> Node B has an affinity for register R0. The affinity edge between A and >>> B is a simple coalescing affinity (identity matrix). If B is assigned >>> first, it will get R0 and then A will also select R0. However if A is >>> selected first then it doesn't know anything about node B's affinities and >>> could thus get any register, causing a MOV. >>> >>> >>> >>> The above trivial example would be reduced by rule R1, so would actually >>> work optimally (because the reduction would account for B's costs in A). >>> However it is trivial to add two or more interfering defs that cause the >>> reduction rule to change to RN: >>> >>> >>> >>> %vregC = def... >>> >>> %vregD = def... >>> >>> %vregB = COPY %R0 >>> >>> %vregA = COPY %vregB >>> >>> = use %vregC, %vregD >>> >>> I think one idea to improve the situation is to consider the cost vector >>> of adjacent nodes during RN. Let's say you decided to do a RN for node A >>> and want to compute the costs for choosing register %Ri. The current >>> implementation does this by computing min(row/column i of edge A <--> B) >>> but you can do better by adding B's cost vector to the row/column before >>> computing the minimum. This way you also consider B's affinitiy for %R0. >>> >>> >>> >>> >>> Now the reduction order is arbitrary and local costs aren't propagated >>> unlike R1 and R2. In fact, there is a rule "RM" proposed by Buchwald [1] >>> that seeks to fix a very similar case to this. >>> >>> Porting RM to that situation would basically say "always fulfill this >>> affinity" (beween A and B). Of course that's the wrong choice, if you >>> really need the copy. >>> >>> Best regards, >>> Sebastian >>> >>> >>> >>> >>> What do you think? >>> >>> >>> >>> Cheers, >>> >>> >>> >>> James >>> >>> >>> >>> [1] SSA-based Register Allocation with PBQP, Buchwald et al, >>> https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald11cc.pdf >>> >>> >>> >>> >>> _______________________________________________ >>> >>> LLVM Developers mailing list >>> >>> llvm-dev at lists.llvm.org >>> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> >>> IMPORTANT NOTICE: The contents of this email and any attachments are >>> confidential and may also be privileged. If you are not the intended >>> recipient, please notify the sender immediately and do not disclose the >>> contents to any other person, use it for any purpose, or store or copy the >>> information in any medium. Thank you. >>> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160603/e113c883/attachment.html>
Arnaud De Grandmaison via llvm-dev
2016-Jun-04 13:05 UTC
[llvm-dev] PBQP register allocation and copy propagation
> (1) Spill cost metrics.I think this one cannot be dissociated from some kind of live range pre-splitting. Cheers, Arnaud From: Lang Hames [mailto:lhames at gmail.com] Sent: 03 June 2016 23:15 To: James Molloy Cc: Arnaud De Grandmaison; llvm-dev at lists.llvm.org; Sebastian Buchwald; Bernhard Scholz Subject: Re: [llvm-dev] PBQP register allocation and copy propagation Hi James, Arnaud, Sebastian, I think one idea to improve the situation is to consider the cost vector of adjacent nodes during RN. Let's say you decided to do a RN for node A and want to compute the costs for choosing register %Ri. The current implementation does this by computing min(row/column i of edge A <--> B) but you can do better by adding B's cost vector to the row/column before computing the minimum. This way you also consider B's affinitiy for %R0. The current solver does not use an "RN" rule as such (certainly nothing comparable to what was discussed in the 2002 Scholz/Eckstein paper). We have a heuristic rule for selecting nodes for reduction, but we do not propagate costs from heuristically reduced nodes until backpropagation. In fact, it seems even worse than this; if RN is used, then all neighbors are disconnected. So during backpropagation, unless I'm missing something, it seems we will only ever consider the node cost, never adding in any edge cost. I think you're right, but just to be precise: In the current solver any heuristically reduced node is colored during backpropagation taking into account the costs for nodes further up the stack (i.e. ones that have been colored earlier during backpropagation) only. Nodes further down in the stack are not considered. The metric we use for spill cost being what it is (i.e. not targeted for PBQP, but that’s a different subject)... Yes - this is also a very interesting subject. So: There are four things that we can tweak that I know of: (1) Spill cost metrics. (2) Heuristic node selection. (Currently "provably colorable nodes first (in no particular order as far as I remember), possibly uncolorable nodes last in order of spill cost") (3) Speculative assignment during heuristic reduction - propagate costs onto neighbors of the RN node based on a localized guess at the best solution. (4) Look-ahead during backpropagation. All of these are worth looking at, but I'd be inclined to look at (2) (maybe we should reduce nodes with many affinity edges later, so that they're colored earlier?) and (4) (a user-configurable lookahead would let clients trade compile-time for allocation quality smoothly). James - how can I reproduce your test case? I tried with r271685 using 'llc -O3 -regalloc=pbqp zia.ll', but I got: bar: .fnstart @ BB#0: .save {r0, r1, r2, r3, r4, lr} push {r0, r1, r2, r3, r4, lr} ldr r0, .LCPI0_0 ldm.w r0, {r1, r2, r3, r12, lr} ldrd r4, r0, [r0, #20] strd lr, r4, [sp] str r0, [sp, #8] mov r0, r1 mov r1, r2 mov r2, r3 mov r3, r12 bl baz pop {r0, r1, r2, r3, r4, pc} .p2align 2 which has even more registers used. Cheers, Lang. On Fri, Jun 3, 2016 at 11:09 AM, James Molloy <james at jamesmolloy.co.uk<mailto:james at jamesmolloy.co.uk>> wrote: Hi,> > I think one idea to improve the situation is to consider the cost vector of adjacent nodes during RN. Let's say you decided to do a RN for node A and want to compute the costs for choosing register %Ri. The current implementation does this by computing min(row/column i of edge A <--> B) but you can do better by adding B's cost vector to the row/column before computing the minimum. This way you also consider B's affinitiy for %R0.In fact, it seems even worse than this; if RN is used, then all neighbors are disconnected. So during backpropagation, unless I'm missing something, it seems we will only ever consider the node cost, never adding in any edge cost. On Fri, 3 Jun 2016 at 19:04 James Molloy <james at jamesmolloy.co.uk<mailto:james at jamesmolloy.co.uk>> wrote: Hi Sebastien,> I think one idea to improve the situation is to consider the cost vector of adjacent nodes during RN. Let's say you decided to do a RN for node A and want to compute the costs for choosing register %Ri. The current implementation does this by computing min(row/column i of edge A <--> B) but you can do better by adding B's cost vector to the row/column before computing the minimum. This way you also consider B's affinitiy for %R0.Am I right in thinking that this extra calculation would take place during the backpropagation phase? I'm also interested as to whether such a calculation would be sufficient when more copies are inserted (a chain of copies): COPY COPY A <--------> B <---------> C If we reduce B last, and we only account for neighboring affinities at the backpropagation phase, we will not get an optimal allocation. However if we could somehow propagate neighboring affinities at the *reduction* stage, perhaps we might get a transitivity property emerging that could solve the above case optimally? I'm very much a newbie in this area so be gentle on me :) Cheers, James On Fri, 3 Jun 2016 at 15:33 Arnaud De Grandmaison <Arnaud.DeGrandmaison at arm.com<mailto:Arnaud.DeGrandmaison at arm.com>> wrote: Hi James, I’ve tried to play in the past with the allocation order, which can definitely be tweaked and improved. The metric we use for spill cost being what it is (i.e. not targeted for PBQP, but that’s a different subject), I found it made real sense to use some other heuristics to sort nodes (on top of the spill cost) when there was a tie between them. Of course, that’s a heuristic and it can sometimes be wrong. Another option I tried was to ensure the nodes’ allocation order gets updated when some costs are propagated. I have also tried to implement something along the lines of what Sebastian suggested (i.e. considering the cost of adjacent nodes). This made the allocation better, but allocation time went very high --- but I had no time to try to improve the heuristic. I did not know about Sebastian et al paper, so thanks for pointing to it ! Cheers, Arnaud From: Sebastian Buchwald [mailto:Sebastian.Buchwald at kit.edu<mailto:Sebastian.Buchwald at kit.edu>] Sent: 02 June 2016 20:57 To: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>; james at jamesmolloy.co.uk<mailto:james at jamesmolloy.co.uk>; lhames at gmail.com<mailto:lhames at gmail.com>; Arnaud De Grandmaison Subject: Re: [llvm-dev] PBQP register allocation and copy propagation On 06/02/2016 07:22 PM, James Molloy via llvm-dev wrote: Hi Lang and Arnaud, I've been testing out the PBQP allocator for Thumb-2 and have ran into a problem I'd love to get your input on. The problem is exemplfied in the codegen for the function @bar in the attached IR file: bar: push {r4, lr} sub sp, #12 (1) movw r2, :lower16:.L_MergedGlobals (1) movt r2, :upper16:.L_MergedGlobals ldm.w r2, {r0, r1, r3, r12, lr} ldrd r4, r2, [r2, #20] strd lr, r4, [sp] str r2, [sp, #8] (2) mov r2, r3 **** mov r3, r12 **** bl baz add sp, #12 pop {r4, pc} The two moves marked with **** are unnecessary. Especially the first, which could be removed simply by swapping r2 and r3. This becomes even more pronounced when I teach PBQP about how best to use registers to allow LDM/STM formation; the codegen is perfect apart from those two moves that just won't go. I've discovered that the underlying problem is that line (1) is reduced after line (2). During the backpropagation phase (1) is allocated r2 which means (2) cannot, as they interfere. Interestingly they're both reduced by rule RN and as the spill cost is the same between them (and they're both conservatively allocatable), it's pure luck which gets allocated first. I have a patch to account for the opportunity cost of allocating r2 to (1) instead of (2) by assigning a cost to r2 in (1)'s affinity matrix; this seems to work. However I don't really know what's the right approach here - dealing with this problem purely by propagating opportunity costs through affinity matrices or doing something better with the reduction order. This ties in with another problem I'm seeing with a prototype live-range splitter for PBQP. Obviously when pre-splitting around every instruction, many copies are inserted. But the above problem rears its head again! Consider this: %vregB = COPY %R0 %vregA = COPY %vregB COPY A <------> B Node B has an affinity for register R0. The affinity edge between A and B is a simple coalescing affinity (identity matrix). If B is assigned first, it will get R0 and then A will also select R0. However if A is selected first then it doesn't know anything about node B's affinities and could thus get any register, causing a MOV. The above trivial example would be reduced by rule R1, so would actually work optimally (because the reduction would account for B's costs in A). However it is trivial to add two or more interfering defs that cause the reduction rule to change to RN: %vregC = def... %vregD = def... %vregB = COPY %R0 %vregA = COPY %vregB = use %vregC, %vregD I think one idea to improve the situation is to consider the cost vector of adjacent nodes during RN. Let's say you decided to do a RN for node A and want to compute the costs for choosing register %Ri. The current implementation does this by computing min(row/column i of edge A <--> B) but you can do better by adding B's cost vector to the row/column before computing the minimum. This way you also consider B's affinitiy for %R0. Now the reduction order is arbitrary and local costs aren't propagated unlike R1 and R2. In fact, there is a rule "RM" proposed by Buchwald [1] that seeks to fix a very similar case to this. Porting RM to that situation would basically say "always fulfill this affinity" (beween A and B). Of course that's the wrong choice, if you really need the copy. Best regards, Sebastian What do you think? Cheers, James [1] SSA-based Register Allocation with PBQP, Buchwald et al, https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald11cc.pdf _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160604/d7440344/attachment-0001.html>