Hi Lang, Thanks a lot for taking time to look into this. I will test the fix soon and let you know the results. Cheers, -- Arnaud de Grandmaison On Tuesday, April 03, 2012 17:30:33 Lang Hames wrote:> Hi Arnaud, > > Apologies for the delayed reply. > > Thank you for the excellent test case - it exposed a subtle bug in the > colorability heuristic. This has been fixed in r153958. > > In case you are curious, the bug was as follows: the PBQP solver applies > applies a simplification step to each matrix. When all elements of a matrix > row or column are equal, the value for those elements is "pushed out" to > the corresponding element of the cost vector, and the row/column is zeroed > out. This is an attempt to turn the matrices into zero matrices which can > be eliminated from the problem. This simplification step runs even for > rows/columns that are all infinite. When an all infinite row/column is > encountered, it will be zeroed out, and the infinite cost attached to some > register in the cost vector. Unfortunately the infinite-cost elements of > vectors were not being taken into consideration when determining > colorability, only the infinities in the matrices were. In your test case > this led to a node being falsely assumed to be colorable when it was not, > and pushed to the coloring stack too early. By the time it was encountered > in the coloring phase it had already been over-constrained, and no finite > cost solutions were available. > > In future I hope to make the simplification step strip infinite rows/columns > from the problem altogether (along with their corresponding vector > elements). This would speed the solver up further by avoiding consideration > of such impossible options. > > With the fix from r153958 applied the solver now finds a zero-cost solution > for the test case you sent me. This should translate to a valid register > allocation for your test case. Please try it out and let me know if it > works for you. > > Cheers, > Lang. > > On Tue, Mar 27, 2012 at 5:05 AM, Arnaud de Grandmaison > <arnaud.allarddegrandmaison at parrot.com<mailto:arnaud.allarddegrandmaison at pa > rrot.com>> wrote: Hi Lang, > > I have reduced the testcase as much as possible. The log of the run and the > dumped graphes are attached. > > Cheers, > -- > Arnaud de Grandmaison > > On Tuesday, March 27, 2012 01:20:35 Lang Hames wrote: > > Hi Arnaud, > > > > Thanks for attaching those files. I'll take a look at them. > > > > Commit r153483 adds an option to the PBQP allocator, > > "-pbqp-dump-graphs", to dump the PBQP graph for each round of each > > function in a compilation unit. The generated files are named "<module > > id>.<function>.<round>.pbqpgraph", and contain a simple text > > representation of the PBQP graph. The PBQP allocator has been stable > > for some time - I think this patch should apply cleanly to the 3.0 > > version. > > > > Can you run the failing test case through the allocator with this > > patch applied, passing the "-regalloc=pbqp -pbqp-dump-graphs > > -debug-only=regalloc" options. I'll need to take a look at the last > > graph dumped before the assertion is triggered. > > > > Cheers, > > Lang. > > > > On Mon, Mar 26, 2012 at 7:35 AM, Arnaud de Grandmaison > > > ><arnaud.allarddegrandmaison at parrot.com<mailto:arnaud.allarddegrandmaison at parrot.com>> wrote:> > > Hi Lang, > > > > > >> From memory your target is not public, so I won't be able to > > >> reproduce > > >> the crash myself. Is that correct? > > > > > > Correct. > > > > > >> If that's the case, I could add functionality to dump the PBQP > > >> graphs > > >> during allocation. I think they should give me enough information > > >> to > > >> debug the issue. Would you be able to share the PBQP graphs? > > > > > > I can share the pbqp graph if you send me a patch with the added > > > functionality you want. On my side, I already checked the node cost > > > for > > > this register, which is correctly set to inf for spill, as well as > > > the > > > edge costs, and they look ok. > > > > > > I also attached my target's pbqp related file in case you want to > > > double check what I did. This is llvm-3.0 based. It comprises 2 > > > passes : the FemtoPBQPBuilder, plus a FemtoPairablepass, which undo > > > some of the coalescer's work. The insertRegCopy may specifically > > > need a double check, as I am not 100% sure to have updated > > > correctly the LiveInterval information. > > > > > > In terms of registers, the Femto target is simplistic : a single > > > register > > > class GR16, for data and pointers, all i16. It has 16 registers, R0 > > > to > > > R15; R15 is used as stack pointer, and R14 potentialy as > > > framepointer. > > > A pair is constituted from a register + its successor, i.e. (R0, > > > R1), > > > (R1,R2), (R2, R3), ... are valid pairs. This is an instruction > > > encoding > > > constraint, as we only have 16bits wide instructions. Pairs > > > involving > > > R15 are never allowed, those with R14 may be allowed, depending on > > > each > > > function. > > > > > > Most instructions have no pairing constraints, and do not appear > > > here, > > > being handled by the PBQPBuilder default base class. A few > > > instructions > > > have 1 or 2 pairs of registers, and are all handled by the 2 passes > > > above. > > > > > > Thanks for your help, > > > -- > > > Arnaud de Grandmaison > > > Senior CPU engineer > > > Business Unit Digital Tuner > > > > > > Parrot S.A. > > > 174, quai de Jemmapes > > > 75010 Paris - France > > > Phone: +33 1 48 03 84 59<tel:%2B33%201%2048%2003%2084%2059>
Hi Lang, The assert is not triggered any longer on my testcases :) I however still get my wrong allocation in some non trivial cases : the pairing constraint is not fulfilled. I have tried to modify the 'ensure pairable' pass (the pass undoing some of the coalescer's work) to always insert register copies for instructions with the pairable constraint, instead of being smart and inserting the copy only when needed. This had no visible effect. Although I am deriving from PBQPBuilder, the PBQP seems to be coalescing some register copy, without taking into account that the source or dest reg may have different constraints. In which part of pbqp would this be happening ? Cheers, -- Arnaud de Grandmaison On 04/05/2012 05:23 PM, Arnaud de Grandmaison wrote:> Hi Lang, > > Thanks a lot for taking time to look into this. I will test the fix soon and > let you know the results. > > Cheers, > -- > Arnaud de Grandmaison > > On Tuesday, April 03, 2012 17:30:33 Lang Hames wrote: >> Hi Arnaud, >> >> Apologies for the delayed reply. >> >> Thank you for the excellent test case - it exposed a subtle bug in the >> colorability heuristic. This has been fixed in r153958. >> >> In case you are curious, the bug was as follows: the PBQP solver applies >> applies a simplification step to each matrix. When all elements of a matrix >> row or column are equal, the value for those elements is "pushed out" to >> the corresponding element of the cost vector, and the row/column is zeroed >> out. This is an attempt to turn the matrices into zero matrices which can >> be eliminated from the problem. This simplification step runs even for >> rows/columns that are all infinite. When an all infinite row/column is >> encountered, it will be zeroed out, and the infinite cost attached to some >> register in the cost vector. Unfortunately the infinite-cost elements of >> vectors were not being taken into consideration when determining >> colorability, only the infinities in the matrices were. In your test case >> this led to a node being falsely assumed to be colorable when it was not, >> and pushed to the coloring stack too early. By the time it was encountered >> in the coloring phase it had already been over-constrained, and no finite >> cost solutions were available. >> >> In future I hope to make the simplification step strip infinite rows/columns >> from the problem altogether (along with their corresponding vector >> elements). This would speed the solver up further by avoiding consideration >> of such impossible options. >> >> With the fix from r153958 applied the solver now finds a zero-cost solution >> for the test case you sent me. This should translate to a valid register >> allocation for your test case. Please try it out and let me know if it >> works for you. >> >> Cheers, >> Lang. >> >> On Tue, Mar 27, 2012 at 5:05 AM, Arnaud de Grandmaison >> <arnaud.allarddegrandmaison at parrot.com<mailto:arnaud.allarddegrandmaison at pa >> rrot.com>> wrote: Hi Lang, >> >> I have reduced the testcase as much as possible. The log of the run and the >> dumped graphes are attached. >> >> Cheers, >> -- >> Arnaud de Grandmaison >> >> On Tuesday, March 27, 2012 01:20:35 Lang Hames wrote: >>> Hi Arnaud, >>> >>> Thanks for attaching those files. I'll take a look at them. >>> >>> Commit r153483 adds an option to the PBQP allocator, >>> "-pbqp-dump-graphs", to dump the PBQP graph for each round of each >>> function in a compilation unit. The generated files are named "<module >>> id>.<function>.<round>.pbqpgraph", and contain a simple text >>> representation of the PBQP graph. The PBQP allocator has been stable >>> for some time - I think this patch should apply cleanly to the 3.0 >>> version. >>> >>> Can you run the failing test case through the allocator with this >>> patch applied, passing the "-regalloc=pbqp -pbqp-dump-graphs >>> -debug-only=regalloc" options. I'll need to take a look at the last >>> graph dumped before the assertion is triggered. >>> >>> Cheers, >>> Lang. >>> >>> On Mon, Mar 26, 2012 at 7:35 AM, Arnaud de Grandmaison >>> >>> > <arnaud.allarddegrandmaison at parrot.com<mailto:arnaud.allarddegrandmaison at parrot.com>> > wrote: >>>> Hi Lang, >>>> >>>>> From memory your target is not public, so I won't be able to >>>>> reproduce >>>>> the crash myself. Is that correct? >>>> Correct. >>>> >>>>> If that's the case, I could add functionality to dump the PBQP >>>>> graphs >>>>> during allocation. I think they should give me enough information >>>>> to >>>>> debug the issue. Would you be able to share the PBQP graphs? >>>> I can share the pbqp graph if you send me a patch with the added >>>> functionality you want. On my side, I already checked the node cost >>>> for >>>> this register, which is correctly set to inf for spill, as well as >>>> the >>>> edge costs, and they look ok. >>>> >>>> I also attached my target's pbqp related file in case you want to >>>> double check what I did. This is llvm-3.0 based. It comprises 2 >>>> passes : the FemtoPBQPBuilder, plus a FemtoPairablepass, which undo >>>> some of the coalescer's work. The insertRegCopy may specifically >>>> need a double check, as I am not 100% sure to have updated >>>> correctly the LiveInterval information. >>>> >>>> In terms of registers, the Femto target is simplistic : a single >>>> register >>>> class GR16, for data and pointers, all i16. It has 16 registers, R0 >>>> to >>>> R15; R15 is used as stack pointer, and R14 potentialy as >>>> framepointer. >>>> A pair is constituted from a register + its successor, i.e. (R0, >>>> R1), >>>> (R1,R2), (R2, R3), ... are valid pairs. This is an instruction >>>> encoding >>>> constraint, as we only have 16bits wide instructions. Pairs >>>> involving >>>> R15 are never allowed, those with R14 may be allowed, depending on >>>> each >>>> function. >>>> >>>> Most instructions have no pairing constraints, and do not appear >>>> here, >>>> being handled by the PBQPBuilder default base class. A few >>>> instructions >>>> have 1 or 2 pairs of registers, and are all handled by the 2 passes >>>> above. >>>> >>>> Thanks for your help, >>>> -- >>>> Arnaud de Grandmaison >>>> Senior CPU engineer >>>> Business Unit Digital Tuner >>>> >>>> Parrot S.A. >>>> 174, quai de Jemmapes >>>> 75010 Paris - France >>>> Phone: +33 1 48 03 84 59<tel:%2B33%201%2048%2003%2084%2059> > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Hi Arnaud, I'm glad to hear that your test case is working. I however still get my wrong allocation in some non trivial cases : the> pairing constraint is not fulfilled. > > I have tried to modify the 'ensure pairable' pass (the pass undoing some > of the coalescer's work) to always insert register copies for > instructions with the pairable constraint, instead of being smart and > inserting the copy only when needed. This had no visible effect. > Although I am deriving from PBQPBuilder, the PBQP seems to be coalescing > some register copy, without taking into account that the source or dest > reg may have different constraints. In which part of pbqp would this be > happening ? > >If you're deriving PBQPBuilder (and not PBQPBuilderWithCoalescing) then PBQP won't attempt any coalescing, though obviously it can "get lucky" and assign registers that are connected by a copy the same register. Those copies shouldn't be necessary - as long as you don't have a circular chain of paired variables (a,b,c,...,a) with all-infinite spill costs you should be safe. Can you send me a PBQP graph for your failing test case? I'll see if I can figure out where the solver is going wrong. - Lang.> Cheers, > -- > Arnaud de Grandmaison > > On 04/05/2012 05:23 PM, Arnaud de Grandmaison wrote: > > Hi Lang, > > > > Thanks a lot for taking time to look into this. I will test the fix soon > and > > let you know the results. > > > > Cheers, > > -- > > Arnaud de Grandmaison > > > > On Tuesday, April 03, 2012 17:30:33 Lang Hames wrote: > >> Hi Arnaud, > >> > >> Apologies for the delayed reply. > >> > >> Thank you for the excellent test case - it exposed a subtle bug in the > >> colorability heuristic. This has been fixed in r153958. > >> > >> In case you are curious, the bug was as follows: the PBQP solver applies > >> applies a simplification step to each matrix. When all elements of a > matrix > >> row or column are equal, the value for those elements is "pushed out" to > >> the corresponding element of the cost vector, and the row/column is > zeroed > >> out. This is an attempt to turn the matrices into zero matrices which > can > >> be eliminated from the problem. This simplification step runs even for > >> rows/columns that are all infinite. When an all infinite row/column is > >> encountered, it will be zeroed out, and the infinite cost attached to > some > >> register in the cost vector. Unfortunately the infinite-cost elements of > >> vectors were not being taken into consideration when determining > >> colorability, only the infinities in the matrices were. In your test > case > >> this led to a node being falsely assumed to be colorable when it was > not, > >> and pushed to the coloring stack too early. By the time it was > encountered > >> in the coloring phase it had already been over-constrained, and no > finite > >> cost solutions were available. > >> > >> In future I hope to make the simplification step strip infinite > rows/columns > >> from the problem altogether (along with their corresponding vector > >> elements). This would speed the solver up further by avoiding > consideration > >> of such impossible options. > >> > >> With the fix from r153958 applied the solver now finds a zero-cost > solution > >> for the test case you sent me. This should translate to a valid register > >> allocation for your test case. Please try it out and let me know if it > >> works for you. > >> > >> Cheers, > >> Lang. > >> > >> On Tue, Mar 27, 2012 at 5:05 AM, Arnaud de Grandmaison > >> <arnaud.allarddegrandmaison at parrot.com<mailto: > arnaud.allarddegrandmaison at pa > >> rrot.com>> wrote: Hi Lang, > >> > >> I have reduced the testcase as much as possible. The log of the run and > the > >> dumped graphes are attached. > >> > >> Cheers, > >> -- > >> Arnaud de Grandmaison > >> > >> On Tuesday, March 27, 2012 01:20:35 Lang Hames wrote: > >>> Hi Arnaud, > >>> > >>> Thanks for attaching those files. I'll take a look at them. > >>> > >>> Commit r153483 adds an option to the PBQP allocator, > >>> "-pbqp-dump-graphs", to dump the PBQP graph for each round of each > >>> function in a compilation unit. The generated files are named "<module > >>> id>.<function>.<round>.pbqpgraph", and contain a simple text > >>> representation of the PBQP graph. The PBQP allocator has been stable > >>> for some time - I think this patch should apply cleanly to the 3.0 > >>> version. > >>> > >>> Can you run the failing test case through the allocator with this > >>> patch applied, passing the "-regalloc=pbqp -pbqp-dump-graphs > >>> -debug-only=regalloc" options. I'll need to take a look at the last > >>> graph dumped before the assertion is triggered. > >>> > >>> Cheers, > >>> Lang. > >>> > >>> On Mon, Mar 26, 2012 at 7:35 AM, Arnaud de Grandmaison > >>> > >>> > > <arnaud.allarddegrandmaison at parrot.com<mailto: > arnaud.allarddegrandmaison at parrot.com>> > > wrote: > >>>> Hi Lang, > >>>> > >>>>> From memory your target is not public, so I won't be able to > >>>>> reproduce > >>>>> the crash myself. Is that correct? > >>>> Correct. > >>>> > >>>>> If that's the case, I could add functionality to dump the PBQP > >>>>> graphs > >>>>> during allocation. I think they should give me enough information > >>>>> to > >>>>> debug the issue. Would you be able to share the PBQP graphs? > >>>> I can share the pbqp graph if you send me a patch with the added > >>>> functionality you want. On my side, I already checked the node cost > >>>> for > >>>> this register, which is correctly set to inf for spill, as well as > >>>> the > >>>> edge costs, and they look ok. > >>>> > >>>> I also attached my target's pbqp related file in case you want to > >>>> double check what I did. This is llvm-3.0 based. It comprises 2 > >>>> passes : the FemtoPBQPBuilder, plus a FemtoPairablepass, which undo > >>>> some of the coalescer's work. The insertRegCopy may specifically > >>>> need a double check, as I am not 100% sure to have updated > >>>> correctly the LiveInterval information. > >>>> > >>>> In terms of registers, the Femto target is simplistic : a single > >>>> register > >>>> class GR16, for data and pointers, all i16. It has 16 registers, R0 > >>>> to > >>>> R15; R15 is used as stack pointer, and R14 potentialy as > >>>> framepointer. > >>>> A pair is constituted from a register + its successor, i.e. (R0, > >>>> R1), > >>>> (R1,R2), (R2, R3), ... are valid pairs. This is an instruction > >>>> encoding > >>>> constraint, as we only have 16bits wide instructions. Pairs > >>>> involving > >>>> R15 are never allowed, those with R14 may be allowed, depending on > >>>> each > >>>> function. > >>>> > >>>> Most instructions have no pairing constraints, and do not appear > >>>> here, > >>>> being handled by the PBQPBuilder default base class. A few > >>>> instructions > >>>> have 1 or 2 pairs of registers, and are all handled by the 2 passes > >>>> above. > >>>> > >>>> Thanks for your help, > >>>> -- > >>>> Arnaud de Grandmaison > >>>> Senior CPU engineer > >>>> Business Unit Digital Tuner > >>>> > >>>> Parrot S.A. > >>>> 174, quai de Jemmapes > >>>> 75010 Paris - France > >>>> Phone: +33 1 48 03 84 59<tel:%2B33%201%2048%2003%2084%2059> > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120418/4d8e5e3c/attachment.html>