Leslie Zhai via llvm-dev
2017-Dec-15 03:18 UTC
[llvm-dev] Register Allocation Graph Coloring algorithm and Others
Hi GCC and LLVM developers,

I am learning register allocation algorithms, and so far I understand that:

* An unlimited number of VirtRegs (pseudos) must be mapped onto a limited set of PhysRegs (hard registers), which may be fixed or aliased[1]

* Memory (20 - 100 cycles) is more expensive than a register (1 cycle), but spill code has to be generated when no PhysReg is available

* Folding spill code into instructions, handling register coalescing, splitting live ranges, doing rematerialization, and doing shrink wrapping are harder than the core coloring problem

* LRA and IRA are the default RA passes in GCC:

$ /opt/gcc-git/bin/gcc hello.c
DEBUG: ../../gcc/lra.c, lra_init_once, line 2441
DEBUG: ../../gcc/ira-build.c, ira_build, line 3409

* Greedy is the default pass for LLVM

But I have some questions; please give me some hints, thanks a lot!

* IRA is a regional register allocator performing graph coloring on a top-down traversal of nested regions. Is it global, as compared with the local LRA?

* Do the papers by Briggs and Chaitin contradict[2] themselves when you examine the text of the paper vs. the pseudocode provided?

* Why is the interference graph expensive to build[3]?

And I am first practicing[4] with HEA, developed by Dr. Rhydian Lewis, for LLVM.

[1] https://reviews.llvm.org/D39712
[2] http://lists.llvm.org/pipermail/llvm-dev/2008-March/012940.html
[3] https://github.com/joaotavio/llvm-register-allocator
[4] https://github.com/xiangzhai/llvm/tree/avr/include/llvm/CodeGen/GCol

-- 
Regards,
Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/
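The first bullet above (unlimited VirtRegs mapped onto limited PhysRegs) is classically modeled as graph coloring: virtual registers are nodes, simultaneous liveness is an edge, and the k physical registers are k colors. The following is a deliberately naive greedy sketch, not the Chaitin-Briggs simplify/select scheme and not code from GCC or LLVM; the node names and the tiny example graph are invented for illustration:

```python
# Greedy k-coloring of an interference graph: virtual registers that
# interfere (are live at the same time) must receive different physical
# registers; when no color is free, the node is spilled to memory.
def color(interference, k):
    colors = {}    # node -> physical register index
    spilled = []   # nodes that did not get a register
    for node in sorted(interference):
        taken = {colors[n] for n in interference[node] if n in colors}
        free = [c for c in range(k) if c not in taken]
        if free:
            colors[node] = free[0]
        else:
            spilled.append(node)
    return colors, spilled

# v0, v1, v2 mutually interfere; v3 only conflicts with v0.
# Only 2 physical registers are available.
graph = {
    "v0": {"v1", "v2"},
    "v1": {"v0", "v2"},
    "v2": {"v0", "v1"},
    "v3": {"v0"},
}
colors, spilled = color(graph, k=2)
print(colors)   # v0 and v1 take the two registers; v3 reuses v1's color
print(spilled)  # v2 conflicts with both registers and is spilled
```

Real allocators differ mainly in the node ordering and in what they do instead of (or before) spilling, which is where most of the engineering discussed later in this thread lives.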
Vladimir Makarov via llvm-dev
2017-Dec-15 04:40 UTC
[llvm-dev] Register Allocation Graph Coloring algorithm and Others
On 12/14/2017 10:18 PM, Leslie Zhai wrote:
> * Memory (20 - 100 cycles) is more expensive than a register (1 cycle), but spill code has to be generated when no PhysReg is available

It might be much less if the memory value is in the L1 cache.

> * Folding spill code into instructions, handling register coalescing, splitting live ranges, doing rematerialization, doing shrink wrapping are harder than RegAlloc

RegAlloc in a wide sense includes all these tasks and more. For some architectures, other tasks like the right live range splitting might be even more important for generated code quality than just better graph coloring.

> * IRA is a regional register allocator performing graph coloring on a top-down traversal of nested regions. Is it global, as compared with the local LRA?

IRA is a global RA. The description of its initial version can be found at

https://vmakarov.fedorapeople.org/vmakarov-submission-cgo2008.pdf

LRA in some ways is also a global RA, but it is a very simplified one (e.g. LRA does not use a conflict graph, and its coloring algorithm is closer to priority coloring). LRA does a lot of other very complicated things besides RA, for example instruction selection, which is quite specific to the GCC machine description. Usually the code selection task is a separate pass in other compilers. Generally speaking, LRA is more complicated, more machine dependent and more buggy than IRA. But fortunately LRA is less complicated than its predecessor, the so-called reload pass.

The IRA and LRA names have a long history and do not correctly reflect the current situation. It would be possible to incorporate the LRA tasks into IRA, but the final RA would be much slower, even more complicated and harder to maintain, and the generated code would not be much better. So to improve RA maintainability, the RA is divided into two parts solving somewhat different tasks. This is a typical engineering approach.

> * Do the papers by Briggs and Chaitin contradict[2] themselves when you examine the text of the paper vs. the pseudocode provided?

I didn't examine Preston Briggs' work so thoroughly, so I cannot say whether that is true. Even so, it is natural for there to be discrepancies between pseudocode and its description, especially for a description of that size. For me, Preston Briggs is famous for his introduction of optimistic coloring.

> * Why is the interference graph expensive to build[3]?

That is because it might be an N^2 algorithm. There are a lot of publications investigating the building of conflict graphs and its cost in RAs.

> And I am first practicing[4] with HEA, developed by Dr. Rhydian Lewis, for LLVM.

When I started to work on RAs very long ago, I used about the same approach: a lot of tiny transformations directed by a cost function and using metaheuristics (I also used tabu search, as HEA does). Nothing good came out of it.

If you are interested in RA algorithms and architectures, I'd recommend Michael Matz's article

ftp://gcc.gnu.org/pub/gcc/summit/2003/Graph%20Coloring%20Register%20Allocation.pdf

as a starting point.
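Vladimir's N^2 point can be made concrete: the naive conflict-graph build tests every pair of live ranges for overlap, so the pair tests (plus the memory for the graph itself) dominate as the number of virtual registers grows. A hypothetical sketch, with live ranges simplified to single half-open intervals (real allocators use multi-segment ranges):

```python
from itertools import combinations

# Naive O(N^2) conflict-graph construction: every pair of live ranges
# is tested for overlap. With tens of thousands of virtual registers,
# this quadratic pass is what makes the interference graph expensive.
def build_interference(live_ranges):
    edges = set()
    for (a, (s1, e1)), (b, (s2, e2)) in combinations(live_ranges.items(), 2):
        if s1 < e2 and s2 < e1:              # half-open intervals overlap
            edges.add(frozenset((a, b)))
    return edges

# v0 overlaps v1, v1 overlaps v2, but v0 and v2 are disjoint.
ranges = {"v0": (0, 10), "v1": (5, 15), "v2": (12, 20)}
edges = build_interference(ranges)
print(sorted(sorted(e) for e in edges))  # [['v0', 'v1'], ['v1', 'v2']]
```

Production allocators avoid the full quadratic scan with interval scans or per-program-point liveness sets, but the graph itself can still be quadratic in size in the worst case.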
Leslie Zhai via llvm-dev
2017-Dec-15 04:58 UTC
[llvm-dev] Register Allocation Graph Coloring algorithm and Others
Hi Vladimir,

Thanks for your kind and very detailed response!

On 2017-12-15 12:40, Vladimir Makarov wrote:
> IRA is a global RA. The description of its initial version can be found at
> https://vmakarov.fedorapeople.org/vmakarov-submission-cgo2008.pdf

I am reading this paper at present :)

> So to improve RA maintainability, the RA is divided into two parts solving somewhat different tasks. This is a typical engineering approach.

I am debugging with printf to become familiar with LRA and IRA.

> When I started to work on RAs very long ago, I used about the same approach: a lot of tiny transformations directed by a cost function and using metaheuristics (I also used tabu search, as HEA does). Nothing good came out of it.

Thanks for your lesson! But are there benchmarks from when you used tabu search as the HEA, AntCol, etc., such as https://pbs.twimg.com/media/DRD-kxcUMAAxZec.jpg ?

> If you are interested in RA algorithms and architectures, I'd recommend Michael Matz's article
> ftp://gcc.gnu.org/pub/gcc/summit/2003/Graph%20Coloring%20Register%20Allocation.pdf
> as a starting point.

Thanks! I am reading it.

-- 
Regards,
Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/
Peter Bergner via llvm-dev
2017-Dec-15 14:48 UTC
[llvm-dev] Register Allocation Graph Coloring algorithm and Others
On 12/14/17 9:18 PM, Leslie Zhai wrote:
> * Do the papers by Briggs and Chaitin contradict[2] themselves when you examine the text of the paper vs. the pseudocode provided?

I've read both of these papers many times (in the past) and I don't recall any contradictions in them. Can you (Dave?) be more specific about what you think are contradictions?

I do admit that pseudocode in papers can be very terse, to the point that it doesn't show all the little details needed to actually implement it, but it definitely shouldn't contradict the written description.

I was very grateful that Preston was more than willing to answer all my many questions regarding his allocator and the many, many details he couldn't mention in his Ph.D. thesis, let alone a short paper.

Peter
Leslie Zhai via llvm-dev
2017-Dec-17 13:15 UTC
[llvm-dev] Register Allocation Graph Coloring algorithm and Others
Hi Dr. Rhydian,

I am trying to build a DIMACS graph from the VirtRegs (pseudos). I can count G.Nodes and G.Edges: https://github.com/xiangzhai/llvm/blob/avr/lib/CodeGen/RegAllocGraphColoring.cpp#L359

It just translates gCol/HybridEA's inputDimacsGraph into a traversal of LLVM's pseudo registers, but I am not clear what Node1 or Node2 is. Do they refer to the indices of vertices? In gCol/HybridEA/graph.txt there is, for example:

e 2 1

Does it mean there is an edge between Node2 and Node1? If so, it might be equivalent to LLVM's VirtReg1->overlaps(*VirtReg2). And following your code, Node1 = 2, Node2 = 1:

if (Node1 < 1 || Node1 > G.Nodes || Node2 < 1 || Node2 > G.Nodes)
  errs() << "Node is out of range\n";
Node1--, Node2--; // Why the decrement?
if (G[Node1][Node2] == 0)
  G.Edges++;
G[Node1][Node2] = 1, G[Node2][Node1] = 1;

Please give me some hints, thanks a lot!

On 2017-12-15 11:18, Leslie Zhai wrote:
> I am learning register allocation algorithms [...]
> And I am first practicing[4] with HEA, developed by Dr. Rhydian Lewis, for LLVM.
> [4] https://github.com/xiangzhai/llvm/tree/avr/include/llvm/CodeGen/GCol

-- 
Regards,
Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/
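To answer the question the message poses for the reader: in DIMACS graph files, vertices are numbered from 1, while the in-memory adjacency matrix is 0-indexed, which is exactly why the code decrements Node1 and Node2 before indexing. A small Python sketch of such a reader (this is my own illustration of the format, not the gCol code itself):

```python
# DIMACS graph format: a "p edge <nodes> <edges>" header line, then one
# "e <u> <v>" line per undirected edge. Vertices are 1-based in the
# file, so we decrement before indexing the 0-based adjacency matrix.
# "e 2 1" therefore records an edge between vertex 2 and vertex 1, the
# analogue of VirtReg1->overlaps(*VirtReg2) in LLVM terms.
def read_dimacs(lines):
    adj, num_edges = None, 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "p":                       # p edge N E
            n = int(parts[2])
            adj = [[0] * n for _ in range(n)]
        elif parts[0] == "e":                     # e u v  (1-based)
            u, v = int(parts[1]), int(parts[2])
            if not (1 <= u <= len(adj) and 1 <= v <= len(adj)):
                raise ValueError("Node is out of range")
            u -= 1
            v -= 1                                # convert to 0-based
            if adj[u][v] == 0:                    # count each edge once
                num_edges += 1
            adj[u][v] = adj[v][u] = 1             # undirected graph
    return adj, num_edges

adj, e = read_dimacs(["p edge 3 2", "e 2 1", "e 3 2"])
print(adj)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(e)    # 2
```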
via llvm-dev
2017-Dec-18 16:41 UTC
[llvm-dev] Register Allocation Graph Coloring algorithm and Others
Leslie Zhai <lesliezhai at llvm.org.cn> writes:

> * Memory (20 - 100 cycles) is more expensive than a register (1 cycle), but spill code has to be generated when no PhysReg is available

As Vladimir said, the cache makes this kind of analysis much more tricky. It's not necessarily the case that memory=bad and register=good. Since there are tradeoffs involved, one needs to evaluate different strategies to determine *when* memory is worse than a register. It may very well be the case that leaving something in memory frees up a register for something much more important. All of the register allocation algorithms try to determine this kind of thing through various heuristics. Which heuristic is most effective is highly target-dependent.

In my experience, changing heuristics can swing performance 20% or more in some cases. Today's processors and memory systems are so complex that 2nd- and even 3rd-order effects become important.

It is very, very wrong on today's machines to use the number of spills as a metric to determine the "goodness" of an allocation. Determining *what* to spill is much more important than the raw number of spills. Many times I have seen code generated with more spills perform much better than code generated with fewer spills. Almost all of the papers around the time of Chaitin-Briggs used the number of spills as the metric. That may have been appropriate at the time, but take those results with giant grains of salt today. Of course they are still very good papers for understanding algorithms and heuristics.

The best way I know to determine what's best for your target is to run a whole bunch of *real* codes on it, trying different allocation algorithms and heuristics. It is a lot of work, still worthy of a Ph.D. even today. Register allocation is not a "solved" problem and probably never will be, as architectures continue to change and become ever more diverse.

> * Folding spill code into instructions, handling register coalescing, splitting live ranges, doing rematerialization, doing shrink wrapping are harder than RegAlloc

Again, as Vladimir said, they are all part of register allocation. Sometimes they are implemented as separate passes, but logically they all contribute to the task of assigning registers as efficiently as possible. And all of them use heuristics. Choosing when and when not to coalesce, for example, can be important. Splitting live ranges is logically the opposite of coalescing. Theoretically one can "undo" a bad coalescing decision by re-splitting the live range, but it's usually not that simple, as other decisions made in the interim can make that tricky if not impossible. It is a delicate balancing act.

> * Do the papers by Briggs and Chaitin contradict[2] themselves when you examine the text of the paper vs. the pseudocode provided?

As others have said, I don't recall any inconsistencies, but that doesn't mean there aren't bugs and/or incomplete descriptions open to interpretation. One of my biggest issues with our computer academic culture is that we do not value repeatability. It is virtually impossible to take a paper, implement the algorithm as described (or as best as possible given ambiguity) and obtain the same results. Heck, half my Ph.D. dissertation was dissecting a published paper, examining the sensitivity of the described algorithm to various parts of the described heuristic that were ambiguous. By interpreting the heuristic description in different ways, I observed very different results. I read papers for the algorithms, heuristics and ideas in them. I pay no attention to results, because in the real world we have to implement the ideas and test them ourselves to see if they will help us.

Peter is right to point you to Preston. He is very accessible, friendly and helpful. I had the good fortune to work with him for a few years and he taught me a lot. He has much real-world experience with codes actually used in production. That experience is gold.

Good luck to you! You're entering a world that some computing professionals think is a waste of time because "we already know how to do that." Those of us in the trenches know better. :)

-David
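David's point that *what* to spill matters more than *how many* spills is usually encoded in a spill-choice heuristic. A classic Chaitin-style priority is estimated spill cost divided by interference degree: prefer to spill what is cheap to reload and frees the most conflicts. A hypothetical sketch (the node names and weights are invented; real allocators typically weight each use by loop depth, e.g. uses * 10**depth):

```python
# Chaitin-style spill choice: pick the node with the lowest
# cost / degree ratio -- cheap to spill, and its removal frees
# the most interference edges. The numbers are illustrative only.
def spill_candidate(nodes):
    # nodes: name -> (estimated spill cost, interference degree)
    return min(nodes, key=lambda n: nodes[n][0] / nodes[n][1])

nodes = {
    "v0": (100.0, 2),   # hot-loop value, few conflicts: keep in a register
    "v1": (3.0, 5),     # rarely used, many conflicts: best spill choice
    "v2": (20.0, 4),
}
print(spill_candidate(nodes))  # v1 has the lowest cost/degree ratio
```

Note that by raw spill count alone, any choice here is "one spill"; the ratio is one simple way of capturing that the three choices are far from equally good.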
Leslie Zhai via llvm-dev
2017-Dec-18 17:52 UTC
[llvm-dev] Register Allocation Graph Coloring algorithm and Others
Hi David,

Thanks for your teaching!

I am a newbie in the compiler area; I only learned Compiler Principles in 2002: https://www.leetcode.cn/2017/12/ilove-compiler-principle.html

But I like to practice and learn :) https://github.com/xiangzhai/llvm/blob/avr/lib/CodeGen/RegAllocGraphColoring.cpp#L327 because theory is not always correct, or is misunderstood by people, so I want to compare solutionByHEA, IRA, Greedy, PBQP and other algorithms.

Thanks for your lessons correcting my mistakes, such as memory=bad register=good; I need to find the answer to *when* memory is worse than a register. I am maintaining the AVR target, which has 32 general registers, 32K flash and 2K SRAM http://www.atmel.com/Images/Atmel-42735-8-bit-AVR-Microcontroller-ATmega328-328P_Datasheet.pdf so perhaps on an MCU, memory really is more expensive than a register? But what about AMDGPU or VLIW processors? I have no experience with them; please teach me.

I am reading LLVM's code (SpillXXX, LiveRangeXXX, RegisterCoalescer, etc.) to get the whole view of CodeGen.

I am also reading Dr. Rhydian Lewis's book, A Guide to Graph Colouring: Algorithms and Applications http://www.springer.com/us/book/9783319257280 and other papers. Even if HEA is not the best solution, I still want to practice and see the benchmark. I am not a computing professional, I am only 34 years old, perhaps I have enough time to waste :)

On 2017-12-19 00:41, dag at cray.com wrote:
> It is very, very wrong on today's machines to use the number of spills as a metric to determine the "goodness" of an allocation. Determining *what* to spill is much more important than the raw number of spills.
> [...]
> Good luck to you! You're entering a world that some computing professionals think is a waste of time because "we already know how to do that." Those of us in the trenches know better. :)
>
> -David

-- 
Regards,
Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/
Matthias Braun via llvm-dev
2017-Dec-19 02:07 UTC
[llvm-dev] Register Allocation Graph Coloring algorithm and Others
> On Dec 18, 2017, at 9:52 AM, Leslie Zhai via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> But I like to practice and learn :) https://github.com/xiangzhai/llvm/blob/avr/lib/CodeGen/RegAllocGraphColoring.cpp#L327 because theory is not always correct, or is misunderstood by people, so I want to compare solutionByHEA, IRA, Greedy, PBQP and other algorithms.

Just as another tip:

- Indeed, in my experience, just implementing some algorithms yourself, comparing them against what existing compilers produce, and then figuring out why is the best way to learn about allocators.

- Don't put all the emphasis on the different coloring techniques. In practice, what matters is usually the way you deal with register constraints and other target-specific coloring constraints, rematerialization, and how you get coalescing into the picture. Most regalloc papers don't talk much about those because they are highly finicky and often target-specific. But they do have a huge impact on allocation quality and can make or break any of those algorithms...

- Matthias
Leslie Zhai via llvm-dev
2017-Dec-19 03:03 UTC
[llvm-dev] Register Allocation Graph Coloring algorithm and Others
Hi Matthias, Thanks for your hint! It is just for learning and practicing for me, just like migrate DragonEgg http://lists.llvm.org/pipermail/llvm-dev/2017-September/117201.html the motivating is for learning from GCC and LLVM developers. 在 2017年12月19日 10:07, Matthias Braun 写道:> > >> On Dec 18, 2017, at 9:52 AM, Leslie Zhai via llvm-dev >> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> Hi David, >> >> Thanks for your teaching! >> >> I am a newbie in compiler area, I only learned Compiler Principle in >> 2002https://www.leetcode.cn/2017/12/ilove-compiler-principle.html >> >> But I like to practice and learn >> :)https://github.com/xiangzhai/llvm/blob/avr/lib/CodeGen/RegAllocGraphColoring.cpp#L327because >> theory is not always correct, or misunderstood by people, so I want >> to compare solutionByHEA, IRA, Greedy, PBQP and other algorithms. > > Just as another tip: > - Indeed in my experience: Just implementing some algorithms yourself > and comparing them against what existing compilers produce and then > figuring out why is the best way to learn about allocators. > - Don't just put emphasis on all the different different coloring > techniques. In practice it is usually the way you deal register > constraints and other target specific coloring constraints, > rematerialization and how you get coalescing into the picture. Most > regalloc papers don't talk too much about that as it's highly finicky > and often target specific. But they do have a huge impact on > allocation quality and can make or break any of those algorithms... 
>
> - Matthias
>
>> Thanks for your lessons correcting my mistakes, such as memory=bad
>> register=good; I need to find the answer *when* memory is worse
>> than a register. I am maintaining the AVR target; there are 32 general
>> registers, 32K flash and 2K SRAM
>> http://www.atmel.com/Images/Atmel-42735-8-bit-AVR-Microcontroller-ATmega328-328P_Datasheet.pdf
>> so perhaps for an MCU, memory might be more expensive than a register?
>> But what about AMDGPU or VLIW processors? I have no experience with them;
>> please teach me.
>>
>> I am reading LLVM's code SpillXXX, LiveRangeXXX, RegisterCoalescer,
>> etc. to get the whole view of CodeGen.
>>
>> I am reading Dr. Rhydian Lewis's book: A Guide to Graph Colouring:
>> Algorithms and Applications http://www.springer.com/us/book/9783319257280
>> and other papers. Even if HEA is not the best solution, I still want to
>> practice and see the benchmark. I am not a computing professional, and I
>> am only 34 years old; perhaps I have enough time to waste :)
>>
>>
>> On 2017-12-19 00:41, dag at cray.com wrote:
>>> Leslie Zhai <lesliezhai at llvm.org.cn> writes:
>>>
>>>> * Memory (20 - 100 cycles) is more expensive than a Register (1 cycle), but
>>>> it has to spill code when a PhysReg is unavailable
>>> As Vladimir said, the cache makes this kind of analysis much more
>>> tricky. It's not necessarily the case that memory=bad and
>>> register=good. Since there are tradeoffs involved, one needs to
>>> evaluate different strategies to determine *when* memory is worse than a
>>> register. It may very well be the case that leaving something in memory
>>> frees up a register for something much more important to use. All of
>>> the register allocation algorithms try to determine this kind of thing
>>> through various heuristics. Which heuristic is most effective is highly
>>> target-dependent.
>>>
>>> In my experience, changing heuristics can swing performance 20% or more
>>> in some cases.
Today's processors and memory systems are so complex
>>> that 2nd- and even 3rd-order effects become important.
>>>
>>> It is very, very wrong on today's machines to use the # of spills as a
>>> metric to determine the "goodness" of an allocation. Determining *what*
>>> to spill is much more important than the raw number of spills. Many
>>> times I have seen code generated with more spills perform much better
>>> than code generated with fewer spills. Almost all of the papers
>>> around the time of Chaitin-Briggs used the # of spills as the metric.
>>> That may have been appropriate at that time, but take those results
>>> with giant grains of salt today. Of course they are still very good
>>> papers for understanding algorithms and heuristics.
>>>
>>> The best way I know to determine what's best for your target is to run a
>>> whole bunch of *real* codes on it, trying different allocation
>>> algorithms and heuristics. It is a lot of work, still worthy of a
>>> Ph.D. even today. Register allocation is not a "solved" problem and
>>> probably never will be, as architectures continue to change and become
>>> ever more diverse.
>>>
>>>> * Folding spill code into instructions, handling register coalescing,
>>>> splitting live ranges, doing rematerialization, doing shrink wrapping
>>>> are harder than RegAlloc
>>> Again, as Vladimir said, they are all part of register allocation.
>>> Sometimes they are implemented as separate passes, but logically they all
>>> contribute work to the task of assigning registers as efficiently as
>>> possible. And all of them use heuristics. Choosing when and when not
>>> to coalesce, for example, can be important. Splitting live ranges is
>>> logically the opposite of coalescing. Theoretically one can "undo" a
>>> bad coalescing decision by re-splitting the live range, but it's usually
>>> not that simple, as other decisions in the interim can make that tricky
>>> if not impossible. It is a delicate balancing act.
>>>
>>>> * The papers by Briggs and Chaitin contradict[2] themselves when
>>>> examining the text of the paper vs. the pseudocode provided?
>>> As others have said, I don't recall any inconsistencies, but that doesn't
>>> mean there aren't bugs and/or incomplete descriptions open to
>>> interpretation. One of my biggest issues with our computing academic
>>> culture is that we do not value repeatability. It is virtually
>>> impossible to take a paper, implement the algorithm as described (or as
>>> best as possible given ambiguity) and obtain the same results. Heck,
>>> half my Ph.D. dissertation was dissecting a published paper, examining
>>> the sensitivity of the described algorithm to various parts of the
>>> described heuristic that were ambiguous. By interpreting the heuristic
>>> description in different ways I observed very different results. I read
>>> papers for the algorithms, heuristics and ideas in them. I pay no
>>> attention to results because in the real world we have to implement the
>>> ideas and test them ourselves to see if they will help us.
>>>
>>> Peter is right to point you to Preston. He is very accessible, friendly
>>> and helpful. I had the good fortune to work with him for a few years
>>> and he taught me a lot. He has much real-world experience on codes
>>> actually used in production. That experience is gold.
>>>
>>> Good luck to you! You're entering a world that some computing
>>> professionals think is a waste of time because "we already know how to
>>> do that." Those of us in the trenches know better. :)
>>>
>>> -David
>>
>> --
>> Regards,
>> Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/
>

-- 
Regards,
Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/
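The simplify/select scheme of Chaitin/Briggs-style graph coloring that the thread keeps returning to can be sketched briefly. This is a minimal illustration only, not LLVM's or GCC's actual allocator: the interference graph, the spill costs, and the classic cost/degree spill metric here are assumptions following the original papers, and a real allocator would insert spill code and rebuild the graph rather than just report the spill.

```python
# Minimal sketch of Chaitin/Briggs-style simplify/select coloring.
# The interference graph and spill costs are plain dicts for illustration.

def color_graph(interference, num_regs, spill_cost):
    """interference: {node: set(neighbors)}. Returns (coloring, spilled)."""
    graph = {n: set(neigh) for n, neigh in interference.items()}
    stack, spilled = [], set()

    # Simplify: repeatedly remove a node with degree < k; when none exists,
    # optimistically push the node with the lowest cost/degree ratio
    # (Briggs' optimism: it may still receive a color during select).
    while graph:
        trivial = [n for n in graph if len(graph[n]) < num_regs]
        node = min(trivial) if trivial else min(
            graph, key=lambda n: spill_cost[n] / max(len(graph[n]), 1))
        stack.append((node, graph.pop(node)))  # remember remaining neighbors
        for neigh in graph.values():
            neigh.discard(node)

    # Select: pop nodes and assign the lowest color unused by neighbors.
    coloring = {}
    for node, neighbors in reversed(stack):
        used = {coloring[n] for n in neighbors if n in coloring}
        free = [c for c in range(num_regs) if c not in used]
        if free:
            coloring[node] = free[0]
        else:
            spilled.add(node)  # a real allocator spills and retries here
    return coloring, spilled

# Example: with 2 registers, the a-b-c triangle cannot be 2-colored; the
# heuristic picks "b", the node with the lowest cost/degree, to spill.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
costs = {"a": 4.0, "b": 1.0, "c": 2.0, "d": 1.0}
coloring, spilled = color_graph(g, 2, costs)
assert spilled == {"b"}
```

Note how the example echoes David's point above: the quality of the result hinges on *which* node the cost/degree tiebreak picks to spill, not on the raw spill count.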