Hi Lang, I have reduced the testcase as much as possible. The log of the run and the dumped graphes are attached. Cheers, -- Arnaud de Grandmaison On Tuesday, March 27, 2012 01:20:35 Lang Hames wrote:> Hi Arnaud, > > Thanks for attaching those files. I'll take a look at them. > > Commit r153483 adds an option to the PBQP allocator, > "-pbqp-dump-graphs", to dump the PBQP graph for each round of each > function in a compilation unit. The generated files are named "<module > id>.<function>.<round>.pbqpgraph", and contain a simple text > representation of the PBQP graph. The PBQP allocator has been stable > for some time - I think this patch should apply cleanly to the 3.0 > version. > > Can you run the failing test case through the allocator with this > patch applied, passing the "-regalloc=pbqp -pbqp-dump-graphs > -debug-only=regalloc" options. I'll need to take a look at the last > graph dumped before the assertion is triggered. > > Cheers, > Lang. > > On Mon, Mar 26, 2012 at 7:35 AM, Arnaud de Grandmaison > > <arnaud.allarddegrandmaison at parrot.com> wrote: > > Hi Lang, > > > >> From memory your target is not public, so I won't be able to reproduce > >> the crash myself. Is that correct? > > > > Correct. > > > >> If that's the case, I could add functionality to dump the PBQP graphs > >> during allocation. I think they should give me enough information to > >> debug the issue. Would you be able to share the PBQP graphs? > > > > I can share the pbqp graph if you send me a patch with the added > > functionality you want. On my side, I already checked the node cost for > > this register, which is correctly set to inf for spill, as well as the > > edge costs, and they look ok. > > > > I also attached my target's pbqp related file in case you want to double > > check what I did. This is llvm-3.0 based. It comprises 2 passes : the > > FemtoPBQPBuilder, plus a FemtoPairablepass, which undo some of the > > coalescer's work. The insertRegCopy may specifically need a double > > check, as I am not 100% sure to have updated correctly the LiveInterval > > information. > > > > In terms of registers, the Femto target is simplistic : a single > > register > > class GR16, for data and pointers, all i16. It has 16 registers, R0 to > > R15; R15 is used as stack pointer, and R14 potentialy as framepointer. > > A pair is constituted from a register + its successor, i.e. (R0, R1), > > (R1,R2), (R2, R3), ... are valid pairs. This is an instruction encoding > > constraint, as we only have 16bits wide instructions. Pairs involving > > R15 are never allowed, those with R14 may be allowed, depending on each > > function. > > > > Most instructions have no pairing constraints, and do not appear here, > > being handled by the PBQPBuilder default base class. A few instructions > > have 1 or 2 pairs of registers, and are all handled by the 2 passes > > above. > > > > Thanks for your help, > > -- > > Arnaud de Grandmaison > > Senior CPU engineer > > Business Unit Digital Tuner > > > > Parrot S.A. > > 174, quai de Jemmapes > > 75010 Paris - France > > Phone: +33 1 48 03 84 59-------------- next part -------------- A non-text attachment was scrubbed... Name: invalid-spill.tar.gz Type: application/x-compressed-tar Size: 6984 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120327/1a996a3b/attachment.bin>
Hi Arnaud, Apologies for the delayed reply. Thank you for the excellent test case - it exposed a subtle bug in the colorability heuristic. This has been fixed in r153958. In case you are curious, the bug was as follows: the PBQP solver applies applies a simplification step to each matrix. When all elements of a matrix row or column are equal, the value for those elements is "pushed out" to the corresponding element of the cost vector, and the row/column is zeroed out. This is an attempt to turn the matrices into zero matrices which can be eliminated from the problem. This simplification step runs even for rows/columns that are all infinite. When an all infinite row/column is encountered, it will be zeroed out, and the infinite cost attached to some register in the cost vector. Unfortunately the infinite-cost elements of vectors were not being taken into consideration when determining colorability, only the infinities in the matrices were. In your test case this led to a node being falsely assumed to be colorable when it was not, and pushed to the coloring stack too early. By the time it was encountered in the coloring phase it had already been over-constrained, and no finite cost solutions were available. In future I hope to make the simplification step strip infinite rows/columns from the problem altogether (along with their corresponding vector elements). This would speed the solver up further by avoiding consideration of such impossible options. With the fix from r153958 applied the solver now finds a zero-cost solution for the test case you sent me. This should translate to a valid register allocation for your test case. Please try it out and let me know if it works for you. Cheers, Lang. On Tue, Mar 27, 2012 at 5:05 AM, Arnaud de Grandmaison < arnaud.allarddegrandmaison at parrot.com> wrote:> Hi Lang, > > I have reduced the testcase as much as possible. The log of the run and the > dumped graphes are attached. > > Cheers, > -- > Arnaud de Grandmaison > > On Tuesday, March 27, 2012 01:20:35 Lang Hames wrote: > > Hi Arnaud, > > > > Thanks for attaching those files. I'll take a look at them. > > > > Commit r153483 adds an option to the PBQP allocator, > > "-pbqp-dump-graphs", to dump the PBQP graph for each round of each > > function in a compilation unit. The generated files are named "<module > > id>.<function>.<round>.pbqpgraph", and contain a simple text > > representation of the PBQP graph. The PBQP allocator has been stable > > for some time - I think this patch should apply cleanly to the 3.0 > > version. > > > > Can you run the failing test case through the allocator with this > > patch applied, passing the "-regalloc=pbqp -pbqp-dump-graphs > > -debug-only=regalloc" options. I'll need to take a look at the last > > graph dumped before the assertion is triggered. > > > > Cheers, > > Lang. > > > > On Mon, Mar 26, 2012 at 7:35 AM, Arnaud de Grandmaison > > > > <arnaud.allarddegrandmaison at parrot.com> wrote: > > > Hi Lang, > > > > > >> From memory your target is not public, so I won't be able to reproduce > > >> the crash myself. Is that correct? > > > > > > Correct. > > > > > >> If that's the case, I could add functionality to dump the PBQP graphs > > >> during allocation. I think they should give me enough information to > > >> debug the issue. Would you be able to share the PBQP graphs? > > > > > > I can share the pbqp graph if you send me a patch with the added > > > functionality you want. On my side, I already checked the node cost for > > > this register, which is correctly set to inf for spill, as well as the > > > edge costs, and they look ok. > > > > > > I also attached my target's pbqp related file in case you want to > double > > > check what I did. This is llvm-3.0 based. It comprises 2 passes : the > > > FemtoPBQPBuilder, plus a FemtoPairablepass, which undo some of the > > > coalescer's work. The insertRegCopy may specifically need a double > > > check, as I am not 100% sure to have updated correctly the LiveInterval > > > information. > > > > > > In terms of registers, the Femto target is simplistic : a single > > > register > > > class GR16, for data and pointers, all i16. It has 16 registers, R0 to > > > R15; R15 is used as stack pointer, and R14 potentialy as framepointer. > > > A pair is constituted from a register + its successor, i.e. (R0, R1), > > > (R1,R2), (R2, R3), ... are valid pairs. This is an instruction encoding > > > constraint, as we only have 16bits wide instructions. Pairs involving > > > R15 are never allowed, those with R14 may be allowed, depending on each > > > function. > > > > > > Most instructions have no pairing constraints, and do not appear here, > > > being handled by the PBQPBuilder default base class. A few instructions > > > have 1 or 2 pairs of registers, and are all handled by the 2 passes > > > above. > > > > > > Thanks for your help, > > > -- > > > Arnaud de Grandmaison > > > Senior CPU engineer > > > Business Unit Digital Tuner > > > > > > Parrot S.A. > > > 174, quai de Jemmapes > > > 75010 Paris - France > > > Phone: +33 1 48 03 84 59 >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120403/8619d4c5/attachment.html>
Hi Lang, Thanks a lot for taking time to look into this. I will test the fix soon and let you know the results. Cheers, -- Arnaud de Grandmaison On Tuesday, April 03, 2012 17:30:33 Lang Hames wrote:> Hi Arnaud, > > Apologies for the delayed reply. > > Thank you for the excellent test case - it exposed a subtle bug in the > colorability heuristic. This has been fixed in r153958. > > In case you are curious, the bug was as follows: the PBQP solver applies > applies a simplification step to each matrix. When all elements of a matrix > row or column are equal, the value for those elements is "pushed out" to > the corresponding element of the cost vector, and the row/column is zeroed > out. This is an attempt to turn the matrices into zero matrices which can > be eliminated from the problem. This simplification step runs even for > rows/columns that are all infinite. When an all infinite row/column is > encountered, it will be zeroed out, and the infinite cost attached to some > register in the cost vector. Unfortunately the infinite-cost elements of > vectors were not being taken into consideration when determining > colorability, only the infinities in the matrices were. In your test case > this led to a node being falsely assumed to be colorable when it was not, > and pushed to the coloring stack too early. By the time it was encountered > in the coloring phase it had already been over-constrained, and no finite > cost solutions were available. > > In future I hope to make the simplification step strip infinite rows/columns > from the problem altogether (along with their corresponding vector > elements). This would speed the solver up further by avoiding consideration > of such impossible options. > > With the fix from r153958 applied the solver now finds a zero-cost solution > for the test case you sent me. This should translate to a valid register > allocation for your test case. Please try it out and let me know if it > works for you. > > Cheers, > Lang. > > On Tue, Mar 27, 2012 at 5:05 AM, Arnaud de Grandmaison > <arnaud.allarddegrandmaison at parrot.com<mailto:arnaud.allarddegrandmaison at pa > rrot.com>> wrote: Hi Lang, > > I have reduced the testcase as much as possible. The log of the run and the > dumped graphes are attached. > > Cheers, > -- > Arnaud de Grandmaison > > On Tuesday, March 27, 2012 01:20:35 Lang Hames wrote: > > Hi Arnaud, > > > > Thanks for attaching those files. I'll take a look at them. > > > > Commit r153483 adds an option to the PBQP allocator, > > "-pbqp-dump-graphs", to dump the PBQP graph for each round of each > > function in a compilation unit. The generated files are named "<module > > id>.<function>.<round>.pbqpgraph", and contain a simple text > > representation of the PBQP graph. The PBQP allocator has been stable > > for some time - I think this patch should apply cleanly to the 3.0 > > version. > > > > Can you run the failing test case through the allocator with this > > patch applied, passing the "-regalloc=pbqp -pbqp-dump-graphs > > -debug-only=regalloc" options. I'll need to take a look at the last > > graph dumped before the assertion is triggered. > > > > Cheers, > > Lang. > > > > On Mon, Mar 26, 2012 at 7:35 AM, Arnaud de Grandmaison > > > ><arnaud.allarddegrandmaison at parrot.com<mailto:arnaud.allarddegrandmaison at parrot.com>> wrote:> > > Hi Lang, > > > > > >> From memory your target is not public, so I won't be able to > > >> reproduce > > >> the crash myself. Is that correct? > > > > > > Correct. > > > > > >> If that's the case, I could add functionality to dump the PBQP > > >> graphs > > >> during allocation. I think they should give me enough information > > >> to > > >> debug the issue. Would you be able to share the PBQP graphs? > > > > > > I can share the pbqp graph if you send me a patch with the added > > > functionality you want. On my side, I already checked the node cost > > > for > > > this register, which is correctly set to inf for spill, as well as > > > the > > > edge costs, and they look ok. > > > > > > I also attached my target's pbqp related file in case you want to > > > double check what I did. This is llvm-3.0 based. It comprises 2 > > > passes : the FemtoPBQPBuilder, plus a FemtoPairablepass, which undo > > > some of the coalescer's work. The insertRegCopy may specifically > > > need a double check, as I am not 100% sure to have updated > > > correctly the LiveInterval information. > > > > > > In terms of registers, the Femto target is simplistic : a single > > > register > > > class GR16, for data and pointers, all i16. It has 16 registers, R0 > > > to > > > R15; R15 is used as stack pointer, and R14 potentialy as > > > framepointer. > > > A pair is constituted from a register + its successor, i.e. (R0, > > > R1), > > > (R1,R2), (R2, R3), ... are valid pairs. This is an instruction > > > encoding > > > constraint, as we only have 16bits wide instructions. Pairs > > > involving > > > R15 are never allowed, those with R14 may be allowed, depending on > > > each > > > function. > > > > > > Most instructions have no pairing constraints, and do not appear > > > here, > > > being handled by the PBQPBuilder default base class. A few > > > instructions > > > have 1 or 2 pairs of registers, and are all handled by the 2 passes > > > above. > > > > > > Thanks for your help, > > > -- > > > Arnaud de Grandmaison > > > Senior CPU engineer > > > Business Unit Digital Tuner > > > > > > Parrot S.A. > > > 174, quai de Jemmapes > > > 75010 Paris - France > > > Phone: +33 1 48 03 84 59<tel:%2B33%201%2048%2003%2084%2059>