> A node should never be put into the conservatively allocatable list if there is a chance of it spilling.I can understand why the logic of NodeMetadata::isConservativelyAllocatable is necessary for the node to be allocatable, but I have not been able to convince myself this is sufficient, especially when the node degree > available registers. Cheers, Arnaud From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Lang Hames Sent: 27 January 2015 06:06 To: Jonas Paulsson Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] PBQP crash Hi Jonas,> * The problematic node that was spilled again, was in the ConservativelyAllocatableNodes set during reduce(). The comment in reduce() “Conservatively allocatable nodes will never spill…” indicates that perhaps this is an incorrect insertion, as the regs did in fact run out in this case.Arnaud is correct: A node should never be put into the conservatively allocatable list if there is a chance of it spilling. Off the top of my head I can imagine 2 things going wrong here: (1) Conservative allocability is mostly-precomputed, and updated with deltas as nodes are removed. It is possible that there is some subtle bug in this code. (2) I think the current conservative allocability test bakes in the assumption that all register options have non-infinite cost. If you assign infinite costs to any physical register I would expect this to blow up. Are you able to share a test case at all? If so that would be great. If not, I can add an option to the allocator to dump abstract PBQP graphs, and I could use these to test the problem on my end. Regards, Lang. On Mon, Jan 26, 2015 at 7:55 AM, Jonas Paulsson <jonas.paulsson at ericsson.com> wrote: Hi, I have run into a test case on an out-of-tree target where PBQP fails to complete register allocation after “Attempting to spill already spilled value” (the triggered assert in InlineSpiller::spill(). First, the original LiveInterval is spilled. It is a load of a symbol into a narrow register class, i.e. a subset of the class of address registers. InlineSpiller decides to rematerialize the load of the symbol to lie right before its only user, which makes good sense. The original def is removed. The new LiveInterval pushed is thus much smaller in the next PBQP round. The spill cost is marked as ‘inf’ during graph building. This small interval has also a lot of overlapping intervals and thus edges in the PBQP graph. It gets pushed on the node stack to later be popped after 17 others. Those 17 nodes use up all registers of the narrow reg-class, and the cost vector has become all infinities. Spill option is selected again, and thus the error is a fact of spilling an already spilled value. I wonder what has gone wrong here, and have some initial thoughts: * The problematic node that was spilled again, was in the ConservativelyAllocatableNodes set during reduce(). The comment in reduce() “Conservatively allocatable nodes will never spill…” indicates that perhaps this is an incorrect insertion, as the regs did in fact run out in this case. In setup(), the node is first put into not-provably-allocatables. However, one of it’s neigbhour invoked handleDisconnectEdge(), and moves it into conservatively-allocatables, because DeniedOpts had become less than NumOpts (in isConservativelyAllocatable(). * There are lots of spillable nodes being popped before the one that can’t be spilled. This seems intuitively wrong, as they are intervals that actually could be spilled. I would really appreciate some help and pointers on what might be going wrong here, Jonas Paulsson -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150127/6a98134e/attachment.html>
Hi, Sorry for the delay, it has taken some extra time as more than one bug showed up ☺ I continued to look into this with your viewpoint that a node that is conservatively allocatable should never be spilled. The first thing I did was therefore to add some extra code with an assert for this. I believe I then found three bugs and fixed the two: Bug 1: Incorrect transpositions in handleAddEdge() / hanldeRemoveEdge(). For the heuristic of DeniedOpts, if N runs along the columns (Transpose == true), the interesting dimension is along the rows, and vice versa. In other words, the question is what the neighbour could maximally deny N, and the answer is found in WorstRow, not WorstCol (as it was). This makes a difference if sub/super registers overlap. Bug 2: When it happens that there are no available physical registers for a node (e.g because of regmasks or calls), the number of rows / columns is just 1, for the spill option. This case must be detected in MatrixMetadata(), or WorstColCountForCurRow will get an undefined value, as std::max_element() is called with an empty vector. Bug 3: Again a conservatively allocatable node was spilled, and the assert triggered. This is a description of what happened in my out-of-tree test case: applyR2() called on node N, which is overlapping Y and Z. The edges (N,Y) and (N,Z) are all-zeroes. Y and Z are overlapping and already have an interference edge. Z is just on the limit of not being conservatively allocatable: NumOpts is 8 and DeniedOpts is also 8. It is contained in NotProvablyAllocatableNodes. G.setEdgeCosts() is called and then the call stack grows with Solver->handleSetEdgeCosts(), handleRemoveEdge() into handleDisconnectEdge(), where NMd.handleRemoveEdge() is called, which decreases the DeniedOpts by one. After this, it looks like a bug to me that Z is moved to ConservativelyAllocatableNodes, because eventually handleSetEdgeCosts() will complete, and the edge between Y and Z will have been added again in handleAddEdge(), and Z:DeniedOpts is again 8! I think this also shows up in a test case for arm. It was found by using the assert mentioned above, and running 'bin/llvm-stress -size 200 -seed 17761'. The attached file (pbqp_reduced.ll) is the failing test case found reduced with bugpoint. Apply patch 2 (the assert), and then run 'llc pbqp_reduced.ll -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a57 -mattr=+neon -optimize-regalloc -regalloc=pbqp'. The assert shows a node that is spilled that was conservatively allocatable. The test case itself will not fail without this assert. Add the Verbose-PBQP-dump patch and use '-debug-only=regalloc -pbqp-dump-graphs' to see the more verbose output which shows the progress of the algorithm leading to the assert: * Applied R2(NId 18)handleDisconnectEdge(9, 2) : DeniedOpts 10 -> 9 NId 9(%vreg15, GPR64common) moved to conservatively-allocatables. handleDisconnectEdge(2, 9) : DeniedOpts 10 -> 9 NId 2(%vreg4, GPR64common) moved to conservatively-allocatables. ... Popped NId 2(%vreg4, GPR64common) , all edge costs added: 2.002748e+01 inf inf inf inf inf inf inf inf inf inf ** selection: 0 llc: ../include/llvm/CodeGen/PBQP/ReductionRules.h:214: llvm::PBQP::Solution llvm::PBQP::backpropagate(GraphT&, StackT, llvm::PBQP::NodeSet&) [with GraphT = llvm::PBQP::Gra ph<llvm::PBQP::RegAlloc::RegAllocSolverImpl>; StackT = std::vector<unsigned int>; llvm::PBQP::NodeSet = std::set<unsigned int>]: Assertion `PushedAsConservativelyAllocatabl eNodes.find(NId) == PushedAsConservativelyAllocatableNodes.end() && "Node in conservatively allocatable set spilled!"' failed. I am not that familiar with the arm architecture, but it looks like NId 2 is incorrectly moved from not-provens to conservatively-allocables, after R2(NId 18). This looks to me like the same bug, but I could be wrong. Note that it *does not show up if the bugfix for the transpositions is applied* (for random reasons I beleive). I am not sure how to fix it myself, as it seems non-trivial relating to temporary edge removal. Does anyone have any idea? In addition to the reduced test case, I attach four patches, where 1 and 4 are not ready to commit, but I would be happy to modify and commit them if wanted: 1. 0001-Assert… The assert (with some extra code related to it) that checks that a node that was conservatively allocatable never spills. My quick way to achieve this was to use an extra set of nodes as a means of remembering their origin. This is a very good idea, considering that this should never happen, and that there is some worry that the code might be incorrect. 2. 0001-Bugfix-in-PBQP-matrix-transpositions… A bugfix for the transposition bug. (Bug 1). Small patch and also sent to llvm-commits for review. 3. 0001-Bugfix-in-PBQP-regarding-Matrixes-and… A bugfix for the 1-column case (Bug 2). Small patch and also sent to llvm-commits for review. 4. 0001-Verbose… Increased debug-dump and pbqp-graph output. Quite crude code, because I had a conflict in the ARM backend when I tried to #define DEBUG_TYPE "regalloc". I think this could help PBQP get a wider audience, perhaps, even if it is still far from perfect. Perhaps this could be cleaned up and a target-independent name for all pbqp-debug-dumps could be found and then commited. Looking forward to your reply, Jonas Paulsson From: Arnaud A. de Grandmaison [mailto:arnaud.degrandmaison at arm.com] Sent: den 27 januari 2015 08:43 To: 'Lang Hames'; Jonas Paulsson Cc: llvmdev at cs.uiuc.edu Subject: RE: [LLVMdev] PBQP crash> A node should never be put into the conservatively allocatable list if there is a chance of it spilling.I can understand why the logic of NodeMetadata::isConservativelyAllocatable is necessary for the node to be allocatable, but I have not been able to convince myself this is sufficient, especially when the node degree > available registers. Cheers, Arnaud From: llvmdev-bounces at cs.uiuc.edu<mailto:llvmdev-bounces at cs.uiuc.edu> [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Lang Hames Sent: 27 January 2015 06:06 To: Jonas Paulsson Cc: llvmdev at cs.uiuc.edu<mailto:llvmdev at cs.uiuc.edu> Subject: Re: [LLVMdev] PBQP crash Hi Jonas,> * The problematic node that was spilled again, was in the ConservativelyAllocatableNodes set during reduce(). The comment in reduce() “Conservatively allocatable nodes will never spill…” indicates that perhaps this is an incorrect insertion, as the regs did in fact run out in this case.Arnaud is correct: A node should never be put into the conservatively allocatable list if there is a chance of it spilling. Off the top of my head I can imagine 2 things going wrong here: (1) Conservative allocability is mostly-precomputed, and updated with deltas as nodes are removed. It is possible that there is some subtle bug in this code. (2) I think the current conservative allocability test bakes in the assumption that all register options have non-infinite cost. If you assign infinite costs to any physical register I would expect this to blow up. Are you able to share a test case at all? If so that would be great. If not, I can add an option to the allocator to dump abstract PBQP graphs, and I could use these to test the problem on my end. Regards, Lang. On Mon, Jan 26, 2015 at 7:55 AM, Jonas Paulsson <jonas.paulsson at ericsson.com<mailto:jonas.paulsson at ericsson.com>> wrote: Hi, I have run into a test case on an out-of-tree target where PBQP fails to complete register allocation after “Attempting to spill already spilled value” (the triggered assert in InlineSpiller::spill(). First, the original LiveInterval is spilled. It is a load of a symbol into a narrow register class, i.e. a subset of the class of address registers. InlineSpiller decides to rematerialize the load of the symbol to lie right before its only user, which makes good sense. The original def is removed. The new LiveInterval pushed is thus much smaller in the next PBQP round. The spill cost is marked as ‘inf’ during graph building. This small interval has also a lot of overlapping intervals and thus edges in the PBQP graph. It gets pushed on the node stack to later be popped after 17 others. Those 17 nodes use up all registers of the narrow reg-class, and the cost vector has become all infinities. Spill option is selected again, and thus the error is a fact of spilling an already spilled value. I wonder what has gone wrong here, and have some initial thoughts: * The problematic node that was spilled again, was in the ConservativelyAllocatableNodes set during reduce(). The comment in reduce() “Conservatively allocatable nodes will never spill…” indicates that perhaps this is an incorrect insertion, as the regs did in fact run out in this case. In setup(), the node is first put into not-provably-allocatables. However, one of it’s neigbhour invoked handleDisconnectEdge(), and moves it into conservatively-allocatables, because DeniedOpts had become less than NumOpts (in isConservativelyAllocatable(). * There are lots of spillable nodes being popped before the one that can’t be spilled. This seems intuitively wrong, as they are intervals that actually could be spilled. I would really appreciate some help and pointers on what might be going wrong here, Jonas Paulsson -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150129/48433cc8/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: pbqp_reduced.ll Type: application/octet-stream Size: 3152 bytes Desc: pbqp_reduced.ll URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150129/48433cc8/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Assert-in-PBQP-that-a-node-that-selects-0-spilled-wa.patch Type: application/octet-stream Size: 2792 bytes Desc: 0001-Assert-in-PBQP-that-a-node-that-selects-0-spilled-wa.patch URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150129/48433cc8/attachment-0001.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Bugfix-in-PBQP-matrix-transpositions.patch Type: application/octet-stream Size: 1899 bytes Desc: 0001-Bugfix-in-PBQP-matrix-transpositions.patch URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150129/48433cc8/attachment-0002.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Bugfix-in-PBQP-regarding-Matrixes-and-spill-only-nod.patch Type: application/octet-stream Size: 1257 bytes Desc: 0001-Bugfix-in-PBQP-regarding-Matrixes-and-spill-only-nod.patch URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150129/48433cc8/attachment-0003.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Verbose-PBQP-dump-output.patch Type: application/octet-stream Size: 12456 bytes Desc: 0001-Verbose-PBQP-dump-output.patch URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150129/48433cc8/attachment-0004.obj>
Hi Arnaud, The conservatively allocatable test is supposed to check two conditions, either of which would be sufficient to make a node allocatable: (1) There exists some register that is not aliased by any register option for any neighbor. This is the "safe row" test. It is straightforward, but likely to fire only rarely. (2) The sum of the maximum number of registers aliased by any register for each neighbor is less than the number of available registers for this node. This is the "worst-column" test. More intuitively: for each neighbor you compute the maximum number of registers that could be "taken" from this node by an allocation to that neighbor. If you assume (conservatively) that none of these "taken" registers alias one another then you can sum them to find the maximum number of registers that might be unavailable to this node. If this sum is lower than the number of registers available then you're safely allocatable. Cheers, Lang. On Mon, Jan 26, 2015 at 11:42 PM, Arnaud A. de Grandmaison < arnaud.degrandmaison at arm.com> wrote:> > A node should never be put into the conservatively allocatable list if > there is a chance of it spilling. > > > > I can understand why the logic of > NodeMetadata::isConservativelyAllocatable is necessary for the node to be > allocatable, but I have not been able to convince myself this is > sufficient, especially when the node degree > available registers. > > > > Cheers, > > Arnaud > > > > *From:* llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] *On > Behalf Of *Lang Hames > *Sent:* 27 January 2015 06:06 > *To:* Jonas Paulsson > *Cc:* llvmdev at cs.uiuc.edu > *Subject:* Re: [LLVMdev] PBQP crash > > > > Hi Jonas, > > > > > * The problematic node that was spilled again, was in the > ConservativelyAllocatableNodes set during reduce(). The comment in reduce() > “Conservatively allocatable nodes will never spill…” indicates that perhaps > this is an incorrect insertion, as the regs did in fact run out in this > case. > > > > Arnaud is correct: A node should never be put into the conservatively > allocatable list if there is a chance of it spilling. Off the top of my > head I can imagine 2 things going wrong here: > > > > (1) Conservative allocability is mostly-precomputed, and updated with > deltas as nodes are removed. It is possible that there is some subtle bug > in this code. > > > > (2) I think the current conservative allocability test bakes in the > assumption that all register options have non-infinite cost. If you assign > infinite costs to any physical register I would expect this to blow up. > > > > Are you able to share a test case at all? If so that would be great. If > not, I can add an option to the allocator to dump abstract PBQP graphs, and > I could use these to test the problem on my end. > > > > Regards, > > Lang. > > > > > > On Mon, Jan 26, 2015 at 7:55 AM, Jonas Paulsson < > jonas.paulsson at ericsson.com> wrote: > > Hi, > > > > I have run into a test case on an out-of-tree target where PBQP fails to > complete register allocation after “Attempting to spill already spilled > value” (the triggered assert in InlineSpiller::spill(). > > > > First, the original LiveInterval is spilled. It is a load of a symbol into > a narrow register class, i.e. a subset of the class of address registers. > InlineSpiller decides to rematerialize the load of the symbol to lie right > before its only user, which makes good sense. The original def is removed. > > > > The new LiveInterval pushed is thus much smaller in the next PBQP round. > The spill cost is marked as ‘inf’ during graph building. This small > interval has also a lot of overlapping intervals and thus edges in the PBQP > graph. It gets pushed on the node stack to later be popped after 17 others. > > Those 17 nodes use up all registers of the narrow reg-class, and the cost > vector has become all infinities. Spill option is selected again, and thus > the error is a fact of spilling an already spilled value. > > > > I wonder what has gone wrong here, and have some initial thoughts: > > > > * The problematic node that was spilled again, was in the > ConservativelyAllocatableNodes set during reduce(). The comment in reduce() > “Conservatively allocatable nodes will never spill…” indicates that perhaps > this is an incorrect insertion, as the regs did in fact run out in this > case. > > In setup(), the node is first put into not-provably-allocatables. > However, one of it’s neigbhour invoked handleDisconnectEdge(), and moves it > into conservatively-allocatables, because DeniedOpts had become less than > NumOpts (in isConservativelyAllocatable(). > > * There are lots of spillable nodes being popped before the one that can’t > be spilled. This seems intuitively wrong, as they are intervals that > actually could be spilled. > > > > I would really appreciate some help and pointers on what might be going > wrong here, > > > > Jonas Paulsson > > > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150130/587247ca/attachment.html>
Thanks for the explanations ! -- Arnaud From: Lang Hames [mailto:lhames at gmail.com] Sent: 30 January 2015 22:28 To: Arnaud De Grandmaison Cc: Jonas Paulsson; LLVM Developers Mailing List Subject: Re: [LLVMdev] PBQP crash Hi Arnaud, The conservatively allocatable test is supposed to check two conditions, either of which would be sufficient to make a node allocatable: (1) There exists some register that is not aliased by any register option for any neighbor. This is the "safe row" test. It is straightforward, but likely to fire only rarely. (2) The sum of the maximum number of registers aliased by any register for each neighbor is less than the number of available registers for this node. This is the "worst-column" test. More intuitively: for each neighbor you compute the maximum number of registers that could be "taken" from this node by an allocation to that neighbor. If you assume (conservatively) that none of these "taken" registers alias one another then you can sum them to find the maximum number of registers that might be unavailable to this node. If this sum is lower than the number of registers available then you're safely allocatable. Cheers, Lang. On Mon, Jan 26, 2015 at 11:42 PM, Arnaud A. de Grandmaison <arnaud.degrandmaison at arm.com<mailto:arnaud.degrandmaison at arm.com>> wrote:> A node should never be put into the conservatively allocatable list if there is a chance of it spilling.I can understand why the logic of NodeMetadata::isConservativelyAllocatable is necessary for the node to be allocatable, but I have not been able to convince myself this is sufficient, especially when the node degree > available registers. Cheers, Arnaud From: llvmdev-bounces at cs.uiuc.edu<mailto:llvmdev-bounces at cs.uiuc.edu> [mailto:llvmdev-bounces at cs.uiuc.edu<mailto:llvmdev-bounces at cs.uiuc.edu>] On Behalf Of Lang Hames Sent: 27 January 2015 06:06 To: Jonas Paulsson Cc: llvmdev at cs.uiuc.edu<mailto:llvmdev at cs.uiuc.edu> Subject: Re: [LLVMdev] PBQP crash Hi Jonas,> * The problematic node that was spilled again, was in the ConservativelyAllocatableNodes set during reduce(). The comment in reduce() “Conservatively allocatable nodes will never spill…” indicates that perhaps this is an incorrect insertion, as the regs did in fact run out in this case.Arnaud is correct: A node should never be put into the conservatively allocatable list if there is a chance of it spilling. Off the top of my head I can imagine 2 things going wrong here: (1) Conservative allocability is mostly-precomputed, and updated with deltas as nodes are removed. It is possible that there is some subtle bug in this code. (2) I think the current conservative allocability test bakes in the assumption that all register options have non-infinite cost. If you assign infinite costs to any physical register I would expect this to blow up. Are you able to share a test case at all? If so that would be great. If not, I can add an option to the allocator to dump abstract PBQP graphs, and I could use these to test the problem on my end. Regards, Lang. On Mon, Jan 26, 2015 at 7:55 AM, Jonas Paulsson <jonas.paulsson at ericsson.com<mailto:jonas.paulsson at ericsson.com>> wrote: Hi, I have run into a test case on an out-of-tree target where PBQP fails to complete register allocation after “Attempting to spill already spilled value” (the triggered assert in InlineSpiller::spill(). First, the original LiveInterval is spilled. It is a load of a symbol into a narrow register class, i.e. a subset of the class of address registers. InlineSpiller decides to rematerialize the load of the symbol to lie right before its only user, which makes good sense. The original def is removed. The new LiveInterval pushed is thus much smaller in the next PBQP round. The spill cost is marked as ‘inf’ during graph building. This small interval has also a lot of overlapping intervals and thus edges in the PBQP graph. It gets pushed on the node stack to later be popped after 17 others. Those 17 nodes use up all registers of the narrow reg-class, and the cost vector has become all infinities. Spill option is selected again, and thus the error is a fact of spilling an already spilled value. I wonder what has gone wrong here, and have some initial thoughts: * The problematic node that was spilled again, was in the ConservativelyAllocatableNodes set during reduce(). The comment in reduce() “Conservatively allocatable nodes will never spill…” indicates that perhaps this is an incorrect insertion, as the regs did in fact run out in this case. In setup(), the node is first put into not-provably-allocatables. However, one of it’s neigbhour invoked handleDisconnectEdge(), and moves it into conservatively-allocatables, because DeniedOpts had become less than NumOpts (in isConservativelyAllocatable(). * There are lots of spillable nodes being popped before the one that can’t be spilled. This seems intuitively wrong, as they are intervals that actually could be spilled. I would really appreciate some help and pointers on what might be going wrong here, Jonas Paulsson -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150130/caf5130e/attachment.html>
- Re-sending to include llvm-dev. HI Jonas, This is great - thank you very much for your analysis! You're spot on about Bug 1 - the row/column checks are transposed there. I have fixed this in r227628. Regarding Bug 2, as discussed on the other thread I'm going to teach the register allocator to prune single-option vregs so that they never make it into the graph. I haven't had a chance to dig in to Bug 3 yet, but I hope to soon. Cheers, Lang. On Thu, Jan 29, 2015 at 9:19 AM, Jonas Paulsson <jonas.paulsson at ericsson.com> wrote:> Hi, > > > > Sorry for the delay, it has taken some extra time as more than one bug > showed up J > > > > I continued to look into this with your viewpoint that a node that is > conservatively allocatable should never be spilled. The first thing I did > was therefore to add some extra code with an assert for this. > > > > I believe I then found three bugs and fixed the two: > > > > Bug 1: Incorrect transpositions in handleAddEdge() / hanldeRemoveEdge(). > For the heuristic of DeniedOpts, if N runs along the columns (Transpose => true), the interesting dimension is along the rows, and vice versa. In > other words, the question is what the neighbour could maximally deny N, and > the answer is found in WorstRow, not WorstCol (as it was). This makes a > difference if sub/super registers overlap. > > > > Bug 2: When it happens that there are no available physical registers for > a node (e.g because of regmasks or calls), the number of rows / columns is > just 1, for the spill option. This case must be detected in > MatrixMetadata(), or WorstColCountForCurRow will get an undefined value, as > std::max_element() is called with an empty vector. > > > > Bug 3: Again a conservatively allocatable node was spilled, and the assert > triggered. This is a description of what happened in my out-of-tree test > case: > > > > applyR2() called on node N, which is overlapping Y and Z. The edges (N,Y) > and (N,Z) are all-zeroes. Y and Z are overlapping and already have an > interference edge. Z is just on the limit of not being conservatively > allocatable: NumOpts is 8 and DeniedOpts is also 8. It is contained in > NotProvablyAllocatableNodes. G.setEdgeCosts() is called and then the call > stack grows with Solver->handleSetEdgeCosts(), handleRemoveEdge() into > handleDisconnectEdge(), where NMd.handleRemoveEdge() is called, which > decreases the DeniedOpts by one. After this, it looks like a bug to me that > Z is moved to ConservativelyAllocatableNodes, because eventually > handleSetEdgeCosts() will complete, and the edge between Y and Z will have > been added again in handleAddEdge(), and Z:DeniedOpts is again 8! > > > > I think this also shows up in a test case for arm. It was found by using > the assert mentioned above, and running 'bin/llvm-stress -size 200 -seed > 17761'. The attached file (pbqp_reduced.ll) is the failing test case found > reduced with bugpoint. Apply patch 2 (the assert), and then run 'llc > pbqp_reduced.ll -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a57 > -mattr=+neon -optimize-regalloc -regalloc=pbqp'. The assert shows a node > that is spilled that was conservatively allocatable. The test case itself > will not fail without this assert. Add the Verbose-PBQP-dump patch and use > '-debug-only=regalloc -pbqp-dump-graphs' to see the more verbose output > which shows the progress of the algorithm leading to the assert: > > > > * Applied R2(NId 18)handleDisconnectEdge(9, 2) : DeniedOpts 10 -> 9 > > NId 9(%vreg15, GPR64common) moved to conservatively-allocatables. > > handleDisconnectEdge(2, 9) : DeniedOpts 10 -> 9 > > NId 2(%vreg4, GPR64common) moved to conservatively-allocatables. > > ... > > Popped NId 2(%vreg4, GPR64common) , all edge costs added: > > 2.002748e+01 inf inf inf inf inf inf inf inf inf inf ** selection: 0 > > llc: ../include/llvm/CodeGen/PBQP/ReductionRules.h:214: > llvm::PBQP::Solution llvm::PBQP::backpropagate(GraphT&, StackT, > llvm::PBQP::NodeSet&) [with GraphT = llvm::PBQP::Gra > > ph<llvm::PBQP::RegAlloc::RegAllocSolverImpl>; StackT > std::vector<unsigned int>; llvm::PBQP::NodeSet = std::set<unsigned int>]: > Assertion `PushedAsConservativelyAllocatabl > > eNodes.find(NId) == PushedAsConservativelyAllocatableNodes.end() && "Node > in conservatively allocatable set spilled!"' failed. > > > > I am not that familiar with the arm architecture, but it looks like NId 2 > is incorrectly moved from not-provens to conservatively-allocables, after > R2(NId 18). This looks to me like the same bug, but I could be wrong. Note > that it **does not show up if the bugfix for the transpositions is > applied** (for random reasons I beleive). > > > > I am not sure how to fix it myself, as it seems non-trivial relating to > temporary edge removal. Does anyone have any idea? > > > > In addition to the reduced test case, I attach four patches, where 1 and 4 > are not ready to commit, but I would be happy to modify and commit them if > wanted: > > > > 1. 0001-Assert… The assert (with some extra code related to it) that > checks that a node that was conservatively allocatable never spills. My > quick way to achieve this was to use an extra set of nodes as a means of > remembering their origin. This is a very good idea, considering that this > should never happen, and that there is some worry that the code might be > incorrect. > > > > 2. 0001-Bugfix-in-PBQP-matrix-transpositions… A bugfix for the > transposition bug. (Bug 1). Small patch and also sent to llvm-commits for > review. > > > > 3. 0001-Bugfix-in-PBQP-regarding-Matrixes-and… A bugfix for the 1-column > case (Bug 2). Small patch and also sent to llvm-commits for review. > > > > 4. 0001-Verbose… Increased debug-dump and pbqp-graph output. Quite crude > code, because I had a conflict in the ARM backend when I tried to #define > DEBUG_TYPE "regalloc". I think this could help PBQP get a wider audience, > perhaps, even if it is still far from perfect. Perhaps this could be > cleaned up and a target-independent name for all pbqp-debug-dumps could be > found and then commited. > > > > Looking forward to your reply, > > > > Jonas Paulsson > > > > > > > > > > > > > > > > *From:* Arnaud A. de Grandmaison [mailto:arnaud.degrandmaison at arm.com] > *Sent:* den 27 januari 2015 08:43 > *To:* 'Lang Hames'; Jonas Paulsson > *Cc:* llvmdev at cs.uiuc.edu > *Subject:* RE: [LLVMdev] PBQP crash > > > > > A node should never be put into the conservatively allocatable list if > there is a chance of it spilling. > > > > I can understand why the logic of > NodeMetadata::isConservativelyAllocatable is necessary for the node to be > allocatable, but I have not been able to convince myself this is > sufficient, especially when the node degree > available registers. > > > > Cheers, > > Arnaud > > > > *From:* llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu > <llvmdev-bounces at cs.uiuc.edu>] *On Behalf Of *Lang Hames > *Sent:* 27 January 2015 06:06 > *To:* Jonas Paulsson > *Cc:* llvmdev at cs.uiuc.edu > *Subject:* Re: [LLVMdev] PBQP crash > > > > Hi Jonas, > > > > > * The problematic node that was spilled again, was in the > ConservativelyAllocatableNodes set during reduce(). The comment in reduce() > “Conservatively allocatable nodes will never spill…” indicates that perhaps > this is an incorrect insertion, as the regs did in fact run out in this > case. > > > > Arnaud is correct: A node should never be put into the conservatively > allocatable list if there is a chance of it spilling. Off the top of my > head I can imagine 2 things going wrong here: > > > > (1) Conservative allocability is mostly-precomputed, and updated with > deltas as nodes are removed. It is possible that there is some subtle bug > in this code. > > > > (2) I think the current conservative allocability test bakes in the > assumption that all register options have non-infinite cost. If you assign > infinite costs to any physical register I would expect this to blow up. > > > > Are you able to share a test case at all? If so that would be great. If > not, I can add an option to the allocator to dump abstract PBQP graphs, and > I could use these to test the problem on my end. > > > > Regards, > > Lang. > > > > > > On Mon, Jan 26, 2015 at 7:55 AM, Jonas Paulsson < > jonas.paulsson at ericsson.com> wrote: > > Hi, > > > > I have run into a test case on an out-of-tree target where PBQP fails to > complete register allocation after “Attempting to spill already spilled > value” (the triggered assert in InlineSpiller::spill(). > > > > First, the original LiveInterval is spilled. It is a load of a symbol into > a narrow register class, i.e. a subset of the class of address registers. > InlineSpiller decides to rematerialize the load of the symbol to lie right > before its only user, which makes good sense. The original def is removed. > > > > The new LiveInterval pushed is thus much smaller in the next PBQP round. > The spill cost is marked as ‘inf’ during graph building. This small > interval has also a lot of overlapping intervals and thus edges in the PBQP > graph. It gets pushed on the node stack to later be popped after 17 others. > > Those 17 nodes use up all registers of the narrow reg-class, and the cost > vector has become all infinities. Spill option is selected again, and thus > the error is a fact of spilling an already spilled value. > > > > I wonder what has gone wrong here, and have some initial thoughts: > > > > * The problematic node that was spilled again, was in the > ConservativelyAllocatableNodes set during reduce(). The comment in reduce() > “Conservatively allocatable nodes will never spill…” indicates that perhaps > this is an incorrect insertion, as the regs did in fact run out in this > case. > > In setup(), the node is first put into not-provably-allocatables. > However, one of it’s neigbhour invoked handleDisconnectEdge(), and moves it > into conservatively-allocatables, because DeniedOpts had become less than > NumOpts (in isConservativelyAllocatable(). > > * There are lots of spillable nodes being popped before the one that can’t > be spilled. This seems intuitively wrong, as they are intervals that > actually could be spilled. > > > > I would really appreciate some help and pointers on what might be going > wrong here, > > > > Jonas Paulsson > > > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150130/db8a7706/attachment.html>
The dump of the graph can certainly be improved. I’ll look into it. The problem you had with an ARM backend is due to the fact that the DEBUG_TYPE macro is supposed to be used cautiously (if at all) in a header, as it will probably collide when it is included in .cpp files: as it did with the AArch64 target: see the comment at the top of include/llvm/Support/Debug.h. Cheers, -- Arnaud From: Jonas Paulsson [mailto:jonas.paulsson at ericsson.com] Sent: 29 January 2015 18:19 To: Arnaud De Grandmaison; 'Lang Hames' Cc: llvmdev at cs.uiuc.edu Subject: RE: [LLVMdev] PBQP crash Hi, Sorry for the delay, it has taken some extra time as more than one bug showed up J I continued to look into this with your viewpoint that a node that is conservatively allocatable should never be spilled. The first thing I did was therefore to add some extra code with an assert for this. I believe I then found three bugs and fixed the two: Bug 1: Incorrect transpositions in handleAddEdge() / hanldeRemoveEdge(). For the heuristic of DeniedOpts, if N runs along the columns (Transpose == true), the interesting dimension is along the rows, and vice versa. In other words, the question is what the neighbour could maximally deny N, and the answer is found in WorstRow, not WorstCol (as it was). This makes a difference if sub/super registers overlap. Bug 2: When it happens that there are no available physical registers for a node (e.g because of regmasks or calls), the number of rows / columns is just 1, for the spill option. This case must be detected in MatrixMetadata(), or WorstColCountForCurRow will get an undefined value, as std::max_element() is called with an empty vector. Bug 3: Again a conservatively allocatable node was spilled, and the assert triggered. This is a description of what happened in my out-of-tree test case: applyR2() called on node N, which is overlapping Y and Z. The edges (N,Y) and (N,Z) are all-zeroes. Y and Z are overlapping and already have an interference edge. Z is just on the limit of not being conservatively allocatable: NumOpts is 8 and DeniedOpts is also 8. It is contained in NotProvablyAllocatableNodes. G.setEdgeCosts() is called and then the call stack grows with Solver->handleSetEdgeCosts(), handleRemoveEdge() into handleDisconnectEdge(), where NMd.handleRemoveEdge() is called, which decreases the DeniedOpts by one. After this, it looks like a bug to me that Z is moved to ConservativelyAllocatableNodes, because eventually handleSetEdgeCosts() will complete, and the edge between Y and Z will have been added again in handleAddEdge(), and Z:DeniedOpts is again 8! I think this also shows up in a test case for arm. It was found by using the assert mentioned above, and running 'bin/llvm-stress -size 200 -seed 17761'. The attached file (pbqp_reduced.ll) is the failing test case found reduced with bugpoint. Apply patch 2 (the assert), and then run 'llc pbqp_reduced.ll -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a57 -mattr=+neon -optimize-regalloc -regalloc=pbqp'. The assert shows a node that is spilled that was conservatively allocatable. The test case itself will not fail without this assert. Add the Verbose-PBQP-dump patch and use '-debug-only=regalloc -pbqp-dump-graphs' to see the more verbose output which shows the progress of the algorithm leading to the assert: * Applied R2(NId 18)handleDisconnectEdge(9, 2) : DeniedOpts 10 -> 9 NId 9(%vreg15, GPR64common) moved to conservatively-allocatables. handleDisconnectEdge(2, 9) : DeniedOpts 10 -> 9 NId 2(%vreg4, GPR64common) moved to conservatively-allocatables. ... Popped NId 2(%vreg4, GPR64common) , all edge costs added: 2.002748e+01 inf inf inf inf inf inf inf inf inf inf ** selection: 0 llc: ../include/llvm/CodeGen/PBQP/ReductionRules.h:214: llvm::PBQP::Solution llvm::PBQP::backpropagate(GraphT&, StackT, llvm::PBQP::NodeSet&) [with GraphT = llvm::PBQP::Gra ph<llvm::PBQP::RegAlloc::RegAllocSolverImpl>; StackT = std::vector<unsigned int>; llvm::PBQP::NodeSet = std::set<unsigned int>]: Assertion `PushedAsConservativelyAllocatabl eNodes.find(NId) == PushedAsConservativelyAllocatableNodes.end() && "Node in conservatively allocatable set spilled!"' failed. I am not that familiar with the arm architecture, but it looks like NId 2 is incorrectly moved from not-provens to conservatively-allocables, after R2(NId 18). This looks to me like the same bug, but I could be wrong. Note that it *does not show up if the bugfix for the transpositions is applied* (for random reasons I beleive). I am not sure how to fix it myself, as it seems non-trivial relating to temporary edge removal. Does anyone have any idea? In addition to the reduced test case, I attach four patches, where 1 and 4 are not ready to commit, but I would be happy to modify and commit them if wanted: 1. 0001-Assert… The assert (with some extra code related to it) that checks that a node that was conservatively allocatable never spills. My quick way to achieve this was to use an extra set of nodes as a means of remembering their origin. This is a very good idea, considering that this should never happen, and that there is some worry that the code might be incorrect. 2. 0001-Bugfix-in-PBQP-matrix-transpositions… A bugfix for the transposition bug. (Bug 1). Small patch and also sent to llvm-commits for review. 3. 0001-Bugfix-in-PBQP-regarding-Matrixes-and… A bugfix for the 1-column case (Bug 2). Small patch and also sent to llvm-commits for review. 4. 0001-Verbose… Increased debug-dump and pbqp-graph output. Quite crude code, because I had a conflict in the ARM backend when I tried to #define DEBUG_TYPE "regalloc". I think this could help PBQP get a wider audience, perhaps, even if it is still far from perfect. Perhaps this could be cleaned up and a target-independent name for all pbqp-debug-dumps could be found and then commited. Looking forward to your reply, Jonas Paulsson From: Arnaud A. de Grandmaison [mailto:arnaud.degrandmaison at arm.com] Sent: den 27 januari 2015 08:43 To: 'Lang Hames'; Jonas Paulsson Cc: llvmdev at cs.uiuc.edu Subject: RE: [LLVMdev] PBQP crash> A node should never be put into the conservatively allocatable list if there is a chance of it spilling.I can understand why the logic of NodeMetadata::isConservativelyAllocatable is necessary for the node to be allocatable, but I have not been able to convince myself this is sufficient, especially when the node degree > available registers. Cheers, Arnaud From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Lang Hames Sent: 27 January 2015 06:06 To: Jonas Paulsson Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] PBQP crash Hi Jonas,> * The problematic node that was spilled again, was in the ConservativelyAllocatableNodes set during reduce(). The comment in reduce() “Conservatively allocatable nodes will never spill…” indicates that perhaps this is an incorrect insertion, as the regs did in fact run out in this case.Arnaud is correct: A node should never be put into the conservatively allocatable list if there is a chance of it spilling. Off the top of my head I can imagine 2 things going wrong here: (1) Conservative allocability is mostly-precomputed, and updated with deltas as nodes are removed. It is possible that there is some subtle bug in this code. (2) I think the current conservative allocability test bakes in the assumption that all register options have non-infinite cost. If you assign infinite costs to any physical register I would expect this to blow up. Are you able to share a test case at all? If so that would be great. If not, I can add an option to the allocator to dump abstract PBQP graphs, and I could use these to test the problem on my end. Regards, Lang. On Mon, Jan 26, 2015 at 7:55 AM, Jonas Paulsson <jonas.paulsson at ericsson.com> wrote: Hi, I have run into a test case on an out-of-tree target where PBQP fails to complete register allocation after “Attempting to spill already spilled value” (the triggered assert in InlineSpiller::spill(). First, the original LiveInterval is spilled. It is a load of a symbol into a narrow register class, i.e. a subset of the class of address registers. InlineSpiller decides to rematerialize the load of the symbol to lie right before its only user, which makes good sense. The original def is removed. The new LiveInterval pushed is thus much smaller in the next PBQP round. The spill cost is marked as ‘inf’ during graph building. This small interval has also a lot of overlapping intervals and thus edges in the PBQP graph. It gets pushed on the node stack to later be popped after 17 others. Those 17 nodes use up all registers of the narrow reg-class, and the cost vector has become all infinities. Spill option is selected again, and thus the error is a fact of spilling an already spilled value. I wonder what has gone wrong here, and have some initial thoughts: * The problematic node that was spilled again, was in the ConservativelyAllocatableNodes set during reduce(). The comment in reduce() “Conservatively allocatable nodes will never spill…” indicates that perhaps this is an incorrect insertion, as the regs did in fact run out in this case. In setup(), the node is first put into not-provably-allocatables. However, one of it’s neigbhour invoked handleDisconnectEdge(), and moves it into conservatively-allocatables, because DeniedOpts had become less than NumOpts (in isConservativelyAllocatable(). * There are lots of spillable nodes being popped before the one that can’t be spilled. This seems intuitively wrong, as they are intervals that actually could be spilled. I would really appreciate some help and pointers on what might be going wrong here, Jonas Paulsson -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150130/af21c9b2/attachment.html>