thr3ads.net - llvm dev - [llvm-dev] Register Allocation Graph Coloring algorithm and Others [Dec 2017]

If this information is useful, please help other people find it:
Share via:

Leslie Zhai via llvm-dev

2017-Dec-19 03:03 UTC

[llvm-dev] Register Allocation Graph Coloring algorithm and Others

Hi Matthias,

Thanks for your hint!

It is just for learning and practicing for me, just like migrate 
DragonEgg 
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117201.html the 
motivating is for learning from GCC and LLVM developers.


在 2017年12月19日 10:07, Matthias Braun 写道:>
>
>> On Dec 18, 2017, at 9:52 AM, Leslie Zhai via llvm-dev 
>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>>
>> Hi David,
>>
>> Thanks for your teaching!
>>
>> I am a newbie in compiler area, I only learned Compiler Principle in 
>> 2002https://www.leetcode.cn/2017/12/ilove-compiler-principle.html
>>
>> But I like to practice and learn 
>>
:)https://github.com/xiangzhai/llvm/blob/avr/lib/CodeGen/RegAllocGraphColoring.cpp#L327because
>> theory is not always correct, or misunderstood by people, so I want 
>> to compare solutionByHEA, IRA, Greedy, PBQP and other algorithms.
>
> Just as another tip:
> - Indeed in my experience: Just implementing some algorithms yourself 
> and comparing them against what existing compilers produce and then 
> figuring out why is the best way to learn about allocators.
> - Don't just put emphasis on all the different different coloring 
> techniques. In practice it is usually the way you deal register 
> constraints and other target specific coloring constraints, 
> rematerialization and how you get coalescing into the picture. Most 
> regalloc papers don't talk too much about that as it's highly
finicky
> and often target specific. But they do have a huge impact on 
> allocation quality and can make or break any of those algorithms...
>
> - Matthias
>
>>
>> Thanks for your lessons to correct my mistakes, such as memory=bad 
>> register=good, and I need to find the answer *when* memory is worse 
>> than a register, I am maintaining AVR target, there are 32 general 
>> registers, 32K flash, 2K 
>>
sramhttp://www.atmel.com/Images/Atmel-42735-8-bit-AVR-Microcontroller-ATmega328-328P_Datasheet.pdfso
>> perhaps to MCU, memory might be expensive than register? but what 
>> about AMDGPU or VLIW processors? I don't have experienced on them, 
>> please teach me.
>>
>> I am reading LLVM's code SpillXXX, LiveRangeXXX, RegisterCoalescer,
>> etc. to get the whole view of CodeGen.
>>
>> I am reading Dr. Rhydian Lewis's book: A Guide to Graph Colouring: 
>> Algorithms and 
>> Applicationshttp://www.springer.com/us/book/9783319257280   and other 
>> papers, even if HEA is not the best solution, I still want to 
>> practice and see the benchmark, I am not computing professionals, I 
>> am only 34 olds, perhaps I have enough time to waste :)
>>
>>
>> 在 2017年12月19日 00:41,dag at cray.com <mailto:dag at cray.com>写道:
>>> Leslie Zhai <lesliezhai at llvm.org.cn <mailto:lesliezhai at
llvm.org.cn>>
>>> writes:
>>>
>>>> * Memory (20 - 100 cycles) is expensive than Register (1
cycle), but
>>>> it has to spill code when PhysReg is unavailable
>>> As Vladimir said, the cache makes this kind of analysis much more
>>> tricky.  It's not necessarily the case that memory=bad and
>>> register=good.  Since there are tradeoffs involved, one needs to
>>> evaluate different strategies to determine *when* memory is worse
than a
>>> register.  It may very well be the case that leaving something in
memory
>>> frees up a register for something much more important to use it.
 All of
>>> the register allocation algorithms try to determine this kind of
thing
>>> through various heuristics.  Which heuristic is most effective is
highly
>>> target-dependent.
>>>
>>> In my experience, changing heuristics can swing performance 20% or
more
>>> in some cases.  Today's processors and memory systems are so
complex
>>> that 2nd- and even 3rd-order effects become important.
>>>
>>> It is very, very wrong on today's machines to use # of spills
as a
>>> metric to determine the "goodness" of an allocation.
 Determining *what*
>>> to spill is much more important than the raw number of spills.
 Many
>>> times I have a seen codes generated with more spills perform much
better
>>> than the code generated with fewer spills.  Almost all of the
papers
>>> around the time of Chaiten-Briggs used # of spills as the metric.
 That
>>> may have been appropriate at that time but take those results with
giant
>>> grains of salt today.  Of course they are still very good papers
for
>>> understanding algorithms and heuristics.
>>>
>>> The best way I know to determine what's best for your target is
to run a
>>> whole bunch of *real* codes on them, trying different allocation
>>> algorithms and heuristics.  It is a lot of work, still worthy of a
>>> Ph.D. even today.  Register allocation is not a "solved"
problem and
>>> probably never will be as architectures continue to change and
become
>>> ever more diverse.
>>>
>>>> * Folding spill code into instructions, handling register
coallescing,
>>>> splitting live ranges, doing rematerialization, doing shrink
wrapping
>>>> are harder than RegAlloc
>>> Again, as Vladimir said, they are all part of register allocation.
>>> Sometimes they are implemented as separate passes but logically
they all
>>> contribute work to the task of assigning registers as efficiently
as
>>> possible.  And all of them use heuristics.  Choosing when and when
not
>>> to, for example, coalesce can be important.  Splitting live ranges
is
>>> logically the opposite of coalescing.  Theoretically one can
"undo" a
>>> bad coalescing decision by re-splitting the live range but it's
usually
>>> not that simple as other decisions in the interim can make that
tricky
>>> if not impossible.  It is a delicate balancing act.
>>>
>>>> * The papers by Briggs and Chaiten contradict[2] themselves
when
>>>> examine the text of the paper vs. the pseudocode provided?
>>> As others have said, I don't recall any inconsistencies but
that doesn't
>>> mean there aren't bugs and/or incomplete descriptions open to
>>> interpretation.  One of my biggest issues with our computer
academic
>>> culture is that we do not value repeatability.  It is virtually
>>> impossible to take a paper, implement the algorithm as described
(or as
>>> best as possible given ambiguity) and obtain the same results.
 Heck,
>>> half my Ph.D. dissertation was dissecting a published paper,
examining
>>> the sensitivity of the described algorithm to various parts of the
>>> described heuristic that were ambiguous.  By interpreting the
heuristic
>>> description in different ways I observed very different results.  I
read
>>> papers for the algorithms, heuristics and ideas in them.  I pay no
>>> attention to results because in the real world we have to implement
the
>>> ideas and test them ourselves to see if they will help us.
>>>
>>> Peter is right to point you to Preston.  He is very accessible,
friendly
>>> and helpful.  I had the good fortune to work with him for a few
years
>>> and he taught me a lot.  He has much real-world experience on codes
>>> actually used in production.  That experience is gold.
>>>
>>> Good luck to you!  You're entering a world that some computing
>>> professionals think is a waste of time because "we already
know how to
>>> do that."  Those of us in the trenches know better.  :)
>>>
>>>                                -David
>>
>> --
>> Regards,
>> Leslie Zhai -https://reviews.llvm.org/p/xiangzhai/
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-- 
Regards,
Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/

Jakob Stoklund Olesen via llvm-dev

2017-Dec-20 17:25 UTC

head link

[llvm-dev] Register Allocation Graph Coloring algorithm and Others

> On Dec 18, 2017, at 19:03, Leslie Zhai <lesliezhai at llvm.org.cn>
wrote:
Hi Leslie,

As others have pointed out, the notion that register allocation is isomorphic to
graph coloring is poppycock. There are other important aspects, in particular
the placement of spill/fill/copy instructions. The importance of graph coloring
relative to spill code placement depends on how many registers you have
available. If you are generating code for 32-bit x86 which has only 6-7 general
purpose registers, you will have so much spill code and short live ranges that
graph coloring doesn’t matter much at all. On the other hand, if you have 32
registers like Chaitin did, you have much less spilling in typical code, and the
graph coloring aspect becomes important.

Early compilers would keep each local variable in a stack slot, and the register
allocation optimization would literally allocate a whole local variable to a
register. The C “register” keyword makes sense in that context. Later
improvements like copy coalescing and live range splitting meant that multiple
local variables could use the same register and a variable could live in
different places at different times. It is sometimes useful to take this
development to its logical extreme and look at register allocation as a caching
problem: The register allocator’s job is to make sure that values are available
to the instructions the need them, using the registers as a cache to get the
values there in the most efficient way possible.

Guo, J., Garzarán, M. J., & Padua, D. (2004). The Power of Belady’s
Algorithm in Register Allocation for Long Basic Blocks. In Languages and
Compilers for Parallel Computing (Vol. 2958, pp. 374–389). Berlin, Heidelberg:
Springer Berlin Heidelberg. http://doi.org/10.1007/978-3-540-24644-2_24

Braun, M., & Hack, S. (2009). Register spilling and live-range splitting for
SSA-form programs. International Conference on Compiler Construction.

When you look at register allocation that way, the graph coloring aspect almost
disappears. The optimum approach is probably somewhere in the middle.

A third important aspect is register constraints on individual instructions.
Sometimes you almost need a little constraint solver just to figure out a valid
register assignment for a single instruction. Preston Briggs dealt with this in
his thesis, but it hasn’t gotten as much attention as graph coloring since.

Pereira, F. Q., & Palsberg, J. (2008). Register allocation by puzzle
solving.

Regards,
/jakob

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20171220/a6b8d7f9/attachment.html>

Leslie Zhai via llvm-dev

2017-Dec-21 00:44 UTC

head link

[llvm-dev] Register Allocation Graph Coloring algorithm and Others

Hi Jakob,

Thanks for your kind response!

My usecase is for AVR and RISCV targets, and I just want to learn and 
practice HEA in RA, thanks for your sharing.


在 2017年12月21日 01:25, Jakob Stoklund Olesen 写道:>
>> On Dec 18, 2017, at 19:03, Leslie Zhai <lesliezhai at llvm.org.cn 
>> <mailto:lesliezhai at llvm.org.cn>> wrote:
>
> Hi Leslie,
>
> As others have pointed out, the notion that register allocation is 
> isomorphic to graph coloring is poppycock. There are other important 
> aspects, in particular the placement of spill/fill/copy instructions. 
> The importance of graph coloring relative to spill code placement 
> depends on how many registers you have available. If you are 
> generating code for 32-bit x86 which has only 6-7 general purpose 
> registers, you will have so much spill code and short live ranges that 
> graph coloring doesn’t matter much at all. On the other hand, if you 
> have 32 registers like Chaitin did, you have much less spilling in 
> typical code, and the graph coloring aspect becomes important.
>
> Early compilers would keep each local variable in a stack slot, and 
> the register allocation optimization would literally allocate a whole 
> local variable to a register. The C “register” keyword makes sense in 
> that context. Later improvements like copy coalescing and live range 
> splitting meant that multiple local variables could use the same 
> register and a variable could live in different places at different 
> times. It is sometimes useful to take this development to its logical 
> extreme and look at register allocation as a caching problem: The 
> register allocator’s job is to make sure that values are available to 
> the instructions the need them, using the registers as a cache to get 
> the values there in the most efficient way possible.
>
>     Guo, J., Garzarán, M. J., & Padua, D. (2004). The Power of
>     Belady’s Algorithm in Register Allocation for Long Basic Blocks.
>     In Languages and Compilers for Parallel Computing (Vol. 2958, pp.
>     374–389). Berlin, Heidelberg: Springer Berlin Heidelberg.
>     http://doi.org/10.1007/978-3-540-24644-2_24
>
>     Braun, M., & Hack, S. (2009). Register spilling and live-range
>     splitting for SSA-form programs. International Conference on
>     Compiler Construction.
>
>
> When you look at register allocation that way, the graph coloring 
> aspect almost disappears. The optimum approach is probably somewhere 
> in the middle.
>
> A third important aspect is register constraints on individual 
> instructions. Sometimes you almost need a little constraint solver 
> just to figure out a valid register assignment for a single 
> instruction. Preston Briggs dealt with this in his thesis, but it 
> hasn’t gotten as much attention as graph coloring since.
>
>     Pereira, F. Q., & Palsberg, J. (2008). Register allocation by
>     puzzle solving.
>
>
> Regards,
> /jakob
>
-- 
Regards,
Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/

Bruce Hoult via llvm-dev

2017-Dec-21 11:09 UTC

head link

[llvm-dev] Register Allocation Graph Coloring algorithm and Others

So, both AVR and RISC-V are fairly register-rich with usually 32. RV32E
only has 16, but that's still a lot better than i386. If you use a lot of
16 bit integers then AVR also only has effectively 16 registers (or a more
with a mix of 8 and 16 bit variables). 32 bit integers should be rare in
AVR code, but soft float/double variables are common in Arduino code (both
are implemented as 32 bits), so you only have room for 8 of those.

RISC-V doesn't have any hard constraints on something that *must* go in a
certain register, except the usual argument passing/return convention.
There is a an advantage to allocating both the data src/dst register and
the pointer base register for a load or store from x8-x15 (s0-1,a0-5) as
much as possible as this allows the assembler to use a two byte instruction
instead of a four byte instruction.

I haven't look at AVR in detail for a while, but I seem to recall the top
six registers make up three 16 bit pointer registers X,Y,Z. Any of them can
be used for (C language) *p, *--p, *p++, only Y&Z can be used for p->foo,
and only Z can be used for computed jumps (including function link
register) or loading constants from program memory. Also the various
multiply instructions take their 8 bit operands from any registers but
always produce the 16 bit result in r1:r0. Annoying but nowhere near as bad
as i386 as r0 and r1 are not used for anything else. The usual ABI makes r0
a clobber-at-will temp. r1 is supposed to be "always zero", so you
need to
CLR it after retrieving (or ignoring) the high bits of a multiply result.

On Thu, Dec 21, 2017 at 3:44 AM, Leslie Zhai via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi Jakob,
>
> Thanks for your kind response!
>
> My usecase is for AVR and RISCV targets, and I just want to learn and
> practice HEA in RA, thanks for your sharing.
>
>
> 在 2017年12月21日 01:25, Jakob Stoklund Olesen 写道:
>
>>
>> On Dec 18, 2017, at 19:03, Leslie Zhai <lesliezhai at llvm.org.cn
<mailto:
>>> lesliezhai at llvm.org.cn>> wrote:
>>>
>>
>> Hi Leslie,
>>
>> As others have pointed out, the notion that register allocation is
>> isomorphic to graph coloring is poppycock. There are other important
>> aspects, in particular the placement of spill/fill/copy instructions.
The
>> importance of graph coloring relative to spill code placement depends
on
>> how many registers you have available. If you are generating code for
>> 32-bit x86 which has only 6-7 general purpose registers, you will have
so
>> much spill code and short live ranges that graph coloring doesn’t
matter
>> much at all. On the other hand, if you have 32 registers like Chaitin
did,
>> you have much less spilling in typical code, and the graph coloring
aspect
>> becomes important.
>>
>> Early compilers would keep each local variable in a stack slot, and the
>> register allocation optimization would literally allocate a whole local
>> variable to a register. The C “register” keyword makes sense in that
>> context. Later improvements like copy coalescing and live range
splitting
>> meant that multiple local variables could use the same register and a
>> variable could live in different places at different times. It is
sometimes
>> useful to take this development to its logical extreme and look at
register
>> allocation as a caching problem: The register allocator’s job is to
make
>> sure that values are available to the instructions the need them, using
the
>> registers as a cache to get the values there in the most efficient way
>> possible.
>>
>>     Guo, J., Garzarán, M. J., & Padua, D. (2004). The Power of
>>     Belady’s Algorithm in Register Allocation for Long Basic Blocks.
>>     In Languages and Compilers for Parallel Computing (Vol. 2958, pp.
>>     374–389). Berlin, Heidelberg: Springer Berlin Heidelberg.
>>     http://doi.org/10.1007/978-3-540-24644-2_24
>>
>>     Braun, M., & Hack, S. (2009). Register spilling and live-range
>>     splitting for SSA-form programs. International Conference on
>>     Compiler Construction.
>>
>>
>> When you look at register allocation that way, the graph coloring
aspect
>> almost disappears. The optimum approach is probably somewhere in the
middle.
>>
>> A third important aspect is register constraints on individual
>> instructions. Sometimes you almost need a little constraint solver just
to
>> figure out a valid register assignment for a single instruction.
Preston
>> Briggs dealt with this in his thesis, but it hasn’t gotten as much
>> attention as graph coloring since.
>>
>>     Pereira, F. Q., & Palsberg, J. (2008). Register allocation by
>>     puzzle solving.
>>
>>
>> Regards,
>> /jakob
>>
>>
> --
> Regards,
> Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20171221/5c6414b8/attachment.html>

Leslie Zhai via llvm-dev

2017-Dec-21 16:25 UTC

head link

[llvm-dev] Register Allocation Graph Coloring algorithm and Others

Hi Bruce,

Thanks for your sharing!

I am porting GlobalISel to RISCV target[1], the highest priority in the 
TODO list[2], welcome to contribute to lowRISC, if fixed all the issues, 
then I could try to implement RegAllocGraphColoring in HEA and write 
great Machine Schedulers.

[1] https://github.com/lowRISC/riscv-llvm/issues/19
[2] https://github.com/lowRISC/riscv-llvm/issues

在 2017年12月21日 19:09, Bruce Hoult 写道:> So, both AVR and RISC-V are fairly register-rich with usually 32. 
> RV32E only has 16, but that's still a lot better than i386. If you use 
> a lot of 16 bit integers then AVR also only has effectively 16 
> registers (or a more with a mix of 8 and 16 bit variables). 32 bit 
> integers should be rare in AVR code, but soft float/double variables 
> are common in Arduino code (both are implemented as 32 bits), so you 
> only have room for 8 of those.
>
> RISC-V doesn't have any hard constraints on something that *must* go 
> in a certain register, except the usual argument passing/return 
> convention. There is a an advantage to allocating both the data 
> src/dst register and the pointer base register for a load or store 
> from x8-x15 (s0-1,a0-5) as much as possible as this allows the 
> assembler to use a two byte instruction instead of a four byte 
> instruction.
>
> I haven't look at AVR in detail for a while, but I seem to recall the 
> top six registers make up three 16 bit pointer registers X,Y,Z. Any of 
> them can be used for (C language) *p, *--p, *p++, only Y&Z can be used 
> for p->foo, and only Z can be used for computed jumps (including 
> function link register) or loading constants from program memory. Also 
> the various multiply instructions take their 8 bit operands from any 
> registers but always produce the 16 bit result in r1:r0. Annoying but 
> nowhere near as bad as i386 as r0 and r1 are not used for anything 
> else. The usual ABI makes r0 a clobber-at-will temp. r1 is supposed to 
> be "always zero", so you need to CLR it after retrieving (or
ignoring)
> the high bits of a multiply result.
>
> On Thu, Dec 21, 2017 at 3:44 AM, Leslie Zhai via llvm-dev 
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>
>     Hi Jakob,
>
>     Thanks for your kind response!
>
>     My usecase is for AVR and RISCV targets, and I just want to learn
>     and practice HEA in RA, thanks for your sharing.
>
>
>     在 2017年12月21日 01:25, Jakob Stoklund Olesen 写道:
>
>
>             On Dec 18, 2017, at 19:03, Leslie Zhai
>             <lesliezhai at llvm.org.cn <mailto:lesliezhai at
llvm.org.cn>
>             <mailto:lesliezhai at llvm.org.cn
>             <mailto:lesliezhai at llvm.org.cn>>> wrote:
>
>
>         Hi Leslie,
>
>         As others have pointed out, the notion that register
>         allocation is isomorphic to graph coloring is poppycock. There
>         are other important aspects, in particular the placement of
>         spill/fill/copy instructions. The importance of graph coloring
>         relative to spill code placement depends on how many registers
>         you have available. If you are generating code for 32-bit x86
>         which has only 6-7 general purpose registers, you will have so
>         much spill code and short live ranges that graph coloring
>         doesn’t matter much at all. On the other hand, if you have 32
>         registers like Chaitin did, you have much less spilling in
>         typical code, and the graph coloring aspect becomes important.
>
>         Early compilers would keep each local variable in a stack
>         slot, and the register allocation optimization would literally
>         allocate a whole local variable to a register. The C
>         “register” keyword makes sense in that context. Later
>         improvements like copy coalescing and live range splitting
>         meant that multiple local variables could use the same
>         register and a variable could live in different places at
>         different times. It is sometimes useful to take this
>         development to its logical extreme and look at register
>         allocation as a caching problem: The register allocator’s job
>         is to make sure that values are available to the instructions
>         the need them, using the registers as a cache to get the
>         values there in the most efficient way possible.
>
>             Guo, J., Garzarán, M. J., & Padua, D. (2004). The Power of
>             Belady’s Algorithm in Register Allocation for Long Basic
>         Blocks.
>             In Languages and Compilers for Parallel Computing (Vol.
>         2958, pp.
>             374–389). Berlin, Heidelberg: Springer Berlin Heidelberg.
>         http://doi.org/10.1007/978-3-540-24644-2_24
>         <http://doi.org/10.1007/978-3-540-24644-2_24>
>
>             Braun, M., & Hack, S. (2009). Register spilling and
live-range
>             splitting for SSA-form programs. International Conference on
>             Compiler Construction.
>
>
>         When you look at register allocation that way, the graph
>         coloring aspect almost disappears. The optimum approach is
>         probably somewhere in the middle.
>
>         A third important aspect is register constraints on individual
>         instructions. Sometimes you almost need a little constraint
>         solver just to figure out a valid register assignment for a
>         single instruction. Preston Briggs dealt with this in his
>         thesis, but it hasn’t gotten as much attention as graph
>         coloring since.
>
>             Pereira, F. Q., & Palsberg, J. (2008). Register allocation
by
>             puzzle solving.
>
>
>         Regards,
>         /jakob
>
>
>     -- 
>     Regards,
>     Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/
>     <https://reviews.llvm.org/p/xiangzhai/>
>
>
>
>
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>
>
-- 
Regards,
Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/

llvm dev - Dec 2017 - Register Allocation Graph Coloring algorithm and Others

[llvm-dev] Register Allocation Graph Coloring algorithm and Others

[llvm-dev] Register Allocation Graph Coloring algorithm and Others

[llvm-dev] Register Allocation Graph Coloring algorithm and Others

[llvm-dev] Register Allocation Graph Coloring algorithm and Others

[llvm-dev] Register Allocation Graph Coloring algorithm and Others