thr3ads.net - llvm dev - [LLVMdev] Greedy Register Allocation in LLVM 3.0 [Sep 2011]

If this information is useful, please help other people find it:
Share via:

Andrew Trick

2011-Sep-27 05:11 UTC

[LLVMdev] Greedy Register Allocation in LLVM 3.0

On Sep 26, 2011, at 4:22 AM, Leo Romanoff wrote:> Hi Jakob,
> 
> Thanks for a very interesting description of the new register allocation
algorithm in LLVM!!!
> 
> Could you elaborate a bit on the following topics:
> 
> 1) Do you have any plans to publish something more formal and detailed
about this algorithm, e.g. a paper or something similar?  It would be nice to
better understand how this algorithm relates to well-known algorithms described
in the literature, e.g. linear scan, graph coloring, iterated register
coalescing by Chaitin-Briggs, register allocation by priority-based coloring by
Chow and Hennesey and if it represents an extension of one of those algorithms
or if it is a totally new algorithm. And of course it would be nice if the
terminology used by this description would follow the one published in the
literature, as sometimes use of a different terminology can be misleading. For
example, it is rather well known that the Linear Scan register allocation of
LLVM was not a real linear scan register allocator, as it allowed for
backtracking and re-coloring. It was de-facto more of a special case of
graph-coloring register allocator. But due to its name, many
> published research papers assumed that it is a real linear scan
implementation and took it as a representative example for such allocators.
> 
> 2) A comparison of the quality of generated allocations  and allocator
performance with well-known register allocation algorithms would be very
interesting to better understand pros and cons of this new algorithm. Do you
have any numbers at hand? If not, do you plan to produce a more thorough
analysis?
> 
> 3) It is stated in the blog that this allocator is much simpler than the
old linear scan implementation of LLVM. Can you provide a bit more information?
>    For example:
>     - How many lines of code it is and how it compares in this regard to
the old linear scan?
>     - Are there any wired places like the former SimpleRegisterCoalescer,
which was not that simple at all and had an extremely wired logic ;-)
>     - You explained that it produces  in most situations a better code than
linear scan, which is cool! But it would be also interesting to know in which
pathological cases it still looses against linear scan and why
>     - Can you provide a rough estimate of the algorithmic complexity of the
new algorithm? Is it O(n), O(n^2) or something else?
> 
> Thanks,
>   Roman
It may be more helpful to explain how LLVM's register allocator came
into existence before debating the high level algorithm.

When I began working on LLVM last October, Jakob was developing an
infrastructure for global live range splitting. It was just becoming
clear that LLVM's Linear Scan was inadequate for this work, and no one
I talked to knew how to fix it. Chris and Evan were very receptive to
replacing the register allocator, but the design space was wide
open. I knew from experience that we did not want to build a
traditional interference graph. Chris, Evan, and Jakob all agreed on
this point. I reasoned that an efficient implementation of
LiveInterval along with a new "LiveIntervalUnion" data structure and
mechanism for caching interference tests would be sufficient, and
offered to build a prototype to prove it.

This was an important step. My goal was never to develop a
theoretically superior register allocator algorithm in its own
right. On the contrary, Jakob and I wanted the global splitter to
drive allocation, rather than an allocation algorithm that fell back
to splitting when it failed. The important stake holders at the time
strongly agreed that the problem Jakob was solving, global live range
splitting, was the difficult problem that deserved attention, and that
the register allocation algorithm was only important in that it did
not inhibit splitting or spilling or unduly impact compile time. I
believe this goal led to a register allocator that is fundamentally
different from anything I'm aware of at the level of software design.

To summarize, Jakob and I both strongly agreed on the following design
goals, primarily for engineering reasons, not theoretical reasons:

- early aggressive coalescing

- no predetermined allocation order
  i.e. no fixed order CFG traversal,
       splitting changes allocation order

- no explicit interference graph

- on-the-fly live range splitting at the finest possible granularity

  When I say this the new allocator is fundamentally different from
  the other allocators I've seen, this is mainly what I'm referring
  to. At each decision point made by the allocator, the splitter may
  rewrite the code, modify live ranges, and incrementally update every
  bit of state in the allocator. There is no abandoning the current
  solution and no iterative application of an algorithm.

When I implemented the Basic/Greedy prototypes, I was able to replace
the code in RegAllocLinearScan.cpp with the code you can still see in
RegAllocBasic.cpp. You can see for yourself what Jakob means by the
new allocator being much simpler. And as Jakob explained, the real
benefit will be in removing VirtRegWriter.cpp which is made possible
by his spill code optimizations. Those optimizations are in turn made
practical by the new framework for incrementally updating the register
allocator's state.

The naming decision was literally a five minute brainstorm driven by
the need for unique filenames that are easy to type. "Greedy" is no
better description of our design than "Linear Scan" is of the LLVM
implementation by that name or "Graph Coloring" is of most
Chaitin-Briggs implementations. I believe the best name for the new
design would be one that conveys that it is actually a live range
splitter that happens to perform allocation. But I am not attached to
the names, and always figured the final algorithm, if successful,
would simply be known as the "LLVM Register Allocator".

It's worth mentioning that we considered calling it a "Priority"
allocator, but I objected to this because I felt it to be
misleading. Register reassignment and eviction were part of my
original prototype and I felt those features were more important than
allocation order. I used a priority queue for convenience, but
believed the choice was somewhat insignificant. In fact, the design
was intended to work reasonably well even if priorities were
inverted. I initially used spill weight as a place holder for whatever
dynamic heuristic Jakob would find works well with the splitter. One
thing Jakob discovered is that an "inverted" heuristic actually
cooperates better with his splitter. The Greedy allocator now
allocates registers to the largest live ranges first.

To be fair to Linear Scan, it is also worth noting that the framework
for live intervals was directly inspired by the LLVM implementation of
the linear scan algorithm.

In the end I believe the effectiveness of any allocator reflects the
work that goes into modeling the details of the target
architecture. This is not easy to achieve in a retargetable code
generator. Consequently, that is where much of the effort continues to
be directed in LLVM, rather than in attempting to rank algorithms in
their most abstract interpretation.

I hope this brief chronicle of the algorithm's inception dispels some of
its mystery.

-Andy

Leo Romanoff

2011-Sep-27 07:11 UTC

head link

[LLVMdev] Greedy Register Allocation in LLVM 3.0

Hi Jakob, Hi Andy,

First of all, thanks a lot for very elaborative and interesting answers!

> It may be more helpful to explain how LLVM's register allocator came
> into existence before debating the high level algorithm.
> 
> When I began working on LLVM last October, Jakob was developing an
> infrastructure for global live range splitting. It was just becoming
> clear that LLVM's Linear Scan was inadequate for this work, and no one
> I talked to knew how to fix it. Chris and Evan were very receptive to
> replacing the register allocator, but the design space was wide
> open. I knew from experience that we did not want to build a
> traditional interference graph. Chris, Evan, and Jakob all agreed on
> this point. I reasoned that an efficient implementation of
> LiveInterval along with a new "LiveIntervalUnion" data structure
and
> mechanism for caching interference tests would be sufficient, and
> offered to build a prototype to prove it.
> 
> This was an important step. My goal was never to develop a
> theoretically superior register allocator algorithm in its own
> right. On the contrary, Jakob and I wanted the global splitter to
> drive allocation, rather than an allocation algorithm that fell back
> to splitting when it failed. The important stake holders at the time
> strongly agreed that the problem Jakob was solving, global live range
> splitting, was the difficult problem that deserved attention, and that
> the register allocation algorithm was only important in that it did
> not inhibit splitting or spilling or unduly impact compile time. I
> believe this goal led to a register allocator that is fundamentally
> different from anything I'm aware of at the level of software design.
> 
> To summarize, Jakob and I both strongly agreed on the following design
> goals, primarily for engineering reasons, not theoretical reasons:
I totally understand that the goal was/is to build a real, efficient register
allocator, instead of producing a theoretical result, which may sound good in
theory, but would not be very practical in terms of quality or compilation speed
on real world use-cases.
> - early aggressive coalescing
> 
> - no predetermined allocation order
>   i.e. no fixed order CFG traversal,
>        splitting changes allocation order
> 
> - no explicit interference graph
> 
> - on-the-fly live range splitting at the finest possible granularity
This very nicely summarizes main objectives of the new allocator.
>   When I say this the new allocator is fundamentally different from
>   the other allocators I've seen, this is mainly what I'm referring
>   to. At each decision point made by the allocator, the splitter may
>   rewrite the code, modify live ranges, and incrementally update every
>   bit of state in the allocator. There is no abandoning the current
>   solution and no iterative application of an algorithm.
While I agree that most "classical" allocators (e.g. graph coloring)
do not do fine grained splitting and usually build an explicit interference
graph, I also want to mention that many of the modern proposals published in the
literature often do very agressive splitting (sometimes even before and after at
each instruction, or at the beginning and end of each basic block). Many of
proposals also avoid construction of interference graphs and replace it with
on-demand interference computation, which is often very cheap and efficient,
e.g. for SSA-progams. There are also descriptions how to compute liveness
information very efficiently. Quite some of these register allocation proposals
are also able to handle overlapping register classes. A few names to be
mentioned in these areas  are for example Hack, Bouchez, Boissinot,
Pereira, Wimmer, etc
> When I implemented the Basic/Greedy prototypes, I was able to replace
> the code in RegAllocLinearScan.cpp with the code you can still see in
> RegAllocBasic.cpp. You can see for yourself what Jakob means by the
> new allocator being much simpler. And as Jakob explained, the real
> benefit will be in removing VirtRegWriter.cpp which is made possibler c
> by his spill code optimizations. Those optimizations are in turn made
> practical by the new framework for incrementally updating the register
> allocator's state.
> 
> The naming decision was literally a five minute brainstorm driven by
> the need for unique filenames that are easy to type. "Greedy" is
no
> better description of our design than "Linear Scan" is of the
LLVM
> implementation by that name or "Graph Coloring" is of most
> Chaitin-Briggs implementations. I believe the best name for the new
> design would be one that conveys that it is actually a live range
> splitter that happens to perform allocation. But I am not attached to
> the names, and always figured the final algorithm, if successful,
> would simply be known as the "LLVM Register Allocator".

It is true that names are not always reflecting the essense. But on the other
hand, there is a lot of ongoing research on register allocation (and compilers
in general) and it looks like more and more such efforts choose LLVM as a
platform for experimentation. Quite some results and comparisons are published.
So, it would be nice to have a somewhat concrete (but may be not ideal) name for
the LLVM allocator rather than "LLVM register allocator".  In a few
years, there may be another, even better register alloctor for LLVM. If it will
be called "LLVM register allocator" again, it would lead to a lot of
misunderstandings when reading descriptions and papers. The typical question
then will be "Which LLVM register allocator  was meant?"  ;-)
> It's worth mentioning that we considered calling it a
"Priority"
> allocator, but I objected to this because I felt it to be
> misleading. Register reassignment and eviction were part of my
> original prototype and I felt those features were more important than
> allocation order. I used a priority queue for convenience, but
> believed the choice was somewhat insignificant. In fact, the design
> was intended to work reasonably well even if priorities were
> inverted. I initially used spill weight as a place holder for whatever
> dynamic heuristic Jakob would find works well with the splitter. One
> thing Jakob discovered is that an "inverted" heuristic actually
> cooperates better with his splitter. The Greedy allocator now
> allocates registers to the largest live ranges first.
> 
> To be fair to Linear Scan, it is also worth noting that the framework
> for live intervals was directly inspired by the LLVM implementation of
> the linear scan algorithm.
> 
> In the end I believe the effectiveness of any allocator reflects the
> work that goes into modeling the details of the target
> architecture. This is not easy to achieve in a retargetable code
> generator. Consequently, that is where much of the effort continues to
> be directed in LLVM, rather than in attempting to rank algorithms in
> their most abstract interpretation.
Absolutely. And yet a detailed description, deep explanation and comparison
(either theoretical or practical based on benchmarks) is always useful as it
introduces more clarity and helps to avoid misunderstandings. 
The aim is not to produce a paper of "academic" quality or something
comparable just for the sake of it, because you want it to be published at a
conference. There are enough people in academia who would do it ;-)
The reason should be that it makes it easier to further develop and improve the
LLVM register allocator and may also lead to better register allocation
algorithms overall. And there is also a lot to gain from developments outside of
LLVM. Once register allocation people (both from academia and from industry)
have understood  how this algorithm works and why certain (practical) decisions
were made, they may realize how to improve it even further by using  and
borrowing some approaches and tricks invented in scope of other frameworks and
register allocation algorithms.
> I hope this brief chronicle of the algorithm's inception dispels some
of
> its mystery.
Yes. It was very educative! Actually, it would be nice to have a page with a
detailed description of the current and may be old register allocator as a part
of the LLVM web-site. For starters, the blog entry from Jakob and explanations
from this mail thread could be put there. Later it could be extended and
improved. What do you think about this idea?


Thanks again,
  Roman

John McCall

2011-Sep-27 07:52 UTC

head link

[LLVMdev] Greedy Register Allocation in LLVM 3.0

On Sep 27, 2011, at 12:11 AM, Leo Romanoff wrote:> It is true that names are not always reflecting the essense. But on the
other hand, there is a lot of ongoing research on register allocation (and
compilers in general) and it looks like more and more such efforts choose LLVM
as a platform for experimentation. Quite some results and comparisons are
published. So, it would be nice to have a somewhat concrete (but may be not
ideal) name for the LLVM allocator rather than "LLVM register
allocator".  In a few years, there may be another, even better register
alloctor for LLVM. If it will be called "LLVM register allocator"
again, it would lead to a lot of misunderstandings when reading descriptions and
papers. The typical question then will be "Which LLVM register allocator 
was meant?"  ;-)
I suppose you could call it the Tricky Jakob allocator. :)

John.

Jakob Stoklund Olesen

2011-Sep-27 14:18 UTC

head link

[LLVMdev] Greedy Register Allocation in LLVM 3.0

On Sep 27, 2011, at 12:11 AM, Leo Romanoff wrote:
> Quite some of these register allocation proposals are also able to handle
overlapping register classes.
That's interesting. Do you have any references?

/jakob

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110927/e4b1f13d/attachment.html>

Andrew Trick

2011-Sep-27 17:25 UTC

head link

[LLVMdev] Greedy Register Allocation in LLVM 3.0

On Sep 27, 2011, at 12:11 AM, Leo Romanoff wrote:
>> I hope this brief chronicle of the algorithm's inception dispels
some of
>> its mystery.
> 
> Yes. It was very educative! Actually, it would be nice to have a page with
a detailed description of the current and may be old register allocator as a
part of the LLVM web-site. For starters, the blog entry from Jakob and
explanations from this mail thread could be put there. Later it could be
extended and improved. What do you think about this idea?
> 
> 
> Thanks again,
>   Roman
Thanks for your interest. You make several good points and give proper credit to
academics who are solving the problem in a more rigorous fashion.

I'll let Jakob decide how to proceed with the documentation. As I said, the
heavy lifting was in the splitter/spiller design, and that could use some
explanation.

-Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110927/41bbe485/attachment.html>

Chris Lattner

2011-Sep-28 22:08 UTC

head link

[LLVMdev] Greedy Register Allocation in LLVM 3.0

On Sep 27, 2011, at 12:11 AM, Leo Romanoff wrote:
> 
> 
> It is true that names are not always reflecting the essense. But on the
other hand, there is a lot of ongoing research on register allocation (and
compilers in general) and it looks like more and more such efforts choose LLVM
as a platform for experimentation. Quite some results and comparisons are
published. So, it would be nice to have a somewhat concrete (but may be not
ideal) name for the LLVM allocator rather than "LLVM register
allocator".  In a few years, there may be another, even better register
alloctor for LLVM. If it will be called "LLVM register allocator"
again, it would lead to a lot of misunderstandings when reading descriptions and
papers. The typical question then will be "Which LLVM register allocator 
was meant?"  ;-)
I propose "the LLVM 3.0 register allocator".  We have no plans to
throw away the greedy allocator in 3.1, but we'll probably make major
enhancements to it, which make it incomparable to 3.0.  If it does eventually
get thrown away and replaced, then a version number can capture that.

-Chris

Apparently Analagous Threads

Search for more apparently analagous threads

llvm dev - Sep 2011 - [LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

Apparently Analagous Threads