thr3ads.net - llvm dev - [LLVMdev] Greedy Register Allocation in LLVM 3.0 [Sep 2011]

If this information is useful, please help other people find it:
Share via:

Jakob Stoklund Olesen

2011-Sep-19 15:44 UTC

[LLVMdev] Greedy Register Allocation in LLVM 3.0

I just uploaded a blog post outlining the new register allocation algorithm in
LLVM 3.0.

  http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html

Please direct comments here.

/jakob

Carlo Alberto Ferraris

2011-Sep-26 09:12 UTC

head link

[LLVMdev] Greedy Register Allocation in LLVM 3.0

Just a quick question: is greedy still a local allocator (i.e. only 
takes into consideration the current bb) or a global one (takes into 
consideration the whole function)?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110926/d87aa2bd/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cafxx.vcf
Type: text/x-vcard
Size: 230 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110926/d87aa2bd/attachment.vcf>

Cameron Zwarich

2011-Sep-26 09:18 UTC

head link

[LLVMdev] Greedy Register Allocation in LLVM 3.0

The greedy allocator is global, but so was the old linear scan allocator.

Cameron

On Sep 26, 2011, at 2:12 AM, Carlo Alberto Ferraris <cafxx at
strayorange.com> wrote:
> Just a quick question: is greedy still a local allocator (i.e. only takes
into consideration the current bb) or a global one (takes into consideration the
whole function)?
> <cafxx.vcf>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Leo Romanoff

2011-Sep-26 11:22 UTC

head link

[LLVMdev] Greedy Register Allocation in LLVM 3.0

Hi Jakob,

Thanks for a very interesting description of the new register allocation
algorithm in LLVM!!!

Could you elaborate a bit on the following topics:

1) Do you have any plans to publish something more formal and detailed about
this algorithm, e.g. a paper or something similar?  It would be nice to better
understand how this algorithm relates to well-known algorithms described in the
literature, e.g. linear scan, graph coloring, iterated register coalescing by
Chaitin-Briggs, register allocation by priority-based coloring by Chow and
Hennesey and if it represents an extension of one of those algorithms or if it
is a totally new algorithm. And of course it would be nice if the terminology
used by this description would follow the one published in the literature, as
sometimes use of a different terminology can be misleading. For example, it is
rather well known that the Linear Scan register allocation of LLVM was not a
real linear scan register allocator, as it allowed for backtracking and
re-coloring. It was de-facto more of a special case of graph-coloring register
allocator. But due to its name, many
 published research papers assumed that it is a real linear scan implementation
and took it as a representative example for such allocators.

2) A comparison of the quality of generated allocations  and allocator
performance with well-known register allocation algorithms would be very
interesting to better understand pros and cons of this new algorithm. Do you
have any numbers at hand? If not, do you plan to produce a more thorough
analysis?

3) It is stated in the blog that this allocator is much simpler than the old
linear scan implementation of LLVM. Can you provide a bit more information?
   For example:
    - How many lines of code it is and how it compares in this regard to the old
linear scan?
    - Are there any wired places like the former SimpleRegisterCoalescer, which
was not that simple at all and had an extremely wired logic ;-)
    - You explained that it produces  in most situations a better code than
linear scan, which is cool! But it would be also interesting to know in which
pathological cases it still looses against linear scan and why
    - Can you provide a rough estimate of the algorithmic complexity of the new
algorithm? Is it O(n), O(n^2) or something else?

Thanks,
  Roman


----- Ursprüngliche Message -----> Von: Jakob Stoklund Olesen <jolesen at apple.com>
> An: "llvmdev at cs.uiuc.edu List" <llvmdev at cs.uiuc.edu>
> Cc:
> Gesendet: 17:44 Montag, 19.September 2011
> Betreff: [LLVMdev] Greedy Register Allocation in LLVM 3.0
>
> I just uploaded a blog post outlining the new register allocation algorithm
in
> LLVM 3.0.
>
>   http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html
>
> Please direct comments here.
>
> /jakob
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu        http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Jakob Stoklund Olesen

2011-Sep-26 16:49 UTC

head link

[LLVMdev] Greedy Register Allocation in LLVM 3.0

On Sep 26, 2011, at 4:22 AM, Leo Romanoff wrote:> 1) Do you have any plans to publish something more formal and detailed
about this algorithm, e.g. a paper or something similar?
I wasn't planning on that, but I might put something together in the long
dark California winter nights.
> It would be nice to better understand how this algorithm relates to
well-known algorithms described in the literature, e.g. linear scan, graph
coloring, iterated register coalescing by Chaitin-Briggs, register allocation by
priority-based coloring by Chow and Hennesey and if it represents an extension
of one of those algorithms or if it is a totally new algorithm. And of course it
would be nice if the terminology used by this description would follow the one
published in the literature, as sometimes use of a different terminology can be
misleading. For example, it is rather well known that the Linear Scan register
allocation of LLVM was not a real linear scan register allocator, as it allowed
for backtracking and re-coloring. It was de-facto more of a special case of
graph-coloring register allocator. But due to its name, many
> published research papers assumed that it is a real linear scan
implementation and took it as a representative example for such allocators.
So much for peer review ;-)

Much of the terminology in LLVM's register allocation framework comes from
the linear scan literature, as far as I know. Sometimes it doesn't make any
sense. For example, our LiveInterval class represents a sorted list of
intervals. Each interval is called a LiveRange. I am hoping to clean this
stuff up gradually.

Most articles on register allocation start with the premise that register
allocation is isomorphic to graph coloring. This is quite wrong in the real
world where we must deal with overlapping register classes and aliasing
registers. The terminology is often based on this false premise, and that makes
me reluctant to adopt it.

Ideally, I would like to settle on a language that is as close as possible to
the established conventions without being misleading.

I don't think you can find any production quality compiler using a
'pure' published register allocator. The devil is in the details, and
the published algorithms are too simple.

Anyway, the 'basic' register allocator is quite similar to Chow and
Hennessy's priority-based coloring. They separate the unconstrained live
ranges first while basic just drops everything in the priority queue.
Unconstrained live ranges only make sense if you assume graph coloring
isomorphism.

You could compare the greedy allocator to George and Appel's iterated
register coalescing running in reverse. Greedy coalesces live ranges
aggressively, and splits them on demand.
> 2) A comparison of the quality of generated allocations and allocator
performance with well-known register allocation algorithms would be very
interesting to better understand pros and cons of this new algorithm. Do you
have any numbers at hand? If not, do you plan to produce a more thorough
analysis?
Unfortunately, it would be too much work to make a fair comparison. As I said,
the published algorithms are not good enough for a real compiler. Each would
require a lot of work before producing decent results. You would need to work
around the missing support for overlapping register classes and register
aliases, and you would need to handle pre-colored registers and call-clobbered
registers somehow.

The greedy allocator was designed to handle these real-world problems as part of
the core algorithm. It was my judgment that adapting something like George and
Appel's iterated register coalescing to the real world would be just as much
work as starting from scratch. It was my hope that handling the real-world
problems from the start would lead to a cleaner design.

GCC's new integrated register allocator was designed by adapting
Chaitin-Briggs to the real world, as far as I know. That is the most fair
comparison I can think of.
> 3) It is stated in the blog that this allocator is much simpler than the
old linear scan implementation of LLVM. Can you provide a bit more information?
> For example:
> - How many lines of code it is and how it compares in this regard to
the old linear scan?
A lot of source files are involved, and some are shared, so I don't have
these numbers.

The worst linear scan code is the rewriter which is 2600 lines in
VirtRegRewriter.cpp, and parts of LiveIntervalAnalysis.cpp.

The hard part of greedy is the live range splitting code in SplitKit.cpp at 1400
lines.

The rest is of comparable size, but greedy gets you global live range splitting
for the same price.
> - Are there any wired places like the former SimpleRegisterCoalescer,
which was not that simple at all and had an extremely wired logic ;-)
Greedy is still using the same coalescer. It has been renamed RegisterCoalescer
for honesty. ;-)

Since greedy handles pre-colored registers as part if its algorithm, the
coalescer no longer needs to handle physical register coalescing. This code can
be removed once linear scan is gone.

I suspect that parts of SplitKit.cpp might make people squint. As the author, I
find it perfectly clear, of course.
> - You explained that it produces in most situations a better code than
linear scan, which is cool! But it would be also interesting to know in which
pathological cases it still looses against linear scan and why
The regressions I have seen have mostly been luck. That is, neither heuristic
knows what it is doing, and one gets it just right by accident.

I have a vague sense that linear scan is sometimes doing better with some high
register pressure coloring problems. It is hard to quantify, though.
> - Can you provide a rough estimate of the algorithmic complexity of the
new algorithm? Is it O(n), O(n^2) or something else?
Something like N log (M), where N is the number of live ranges, and M is the
number of continuous live range segments. I wouldn't be surprised if the
worst-case behavior is much worse.

The complexity of global live range splitting depends on the topology of the CFG
spanned by the live range, but each split is sub-linear to linear in the number
of blocks spanned.

Local live range splitting has a known quadratic behavior in large basic blocks
that we still need to eliminate. It rarely shows up in practice.

/jakob

Jakob Stoklund Olesen

2011-Sep-26 18:27 UTC

head link

[LLVMdev] Greedy Register Allocation in LLVM 3.0

On Sep 26, 2011, at 2:12 AM, Carlo Alberto Ferraris wrote:
> Just a quick question: is greedy still a local allocator (i.e. only takes
into consideration the current bb) or a global one (takes into consideration the
whole function)?
RegAllocGreedy, RegAllocBasic, RegAllocLinearScan and RegAllocPBQP are global.

RegAllocFast is local.  It is only used for -O0.  In the future, it may get
primitive support for the kind of global live ranges that clang -O0 can produce.

/jakob

Andrew Trick

2011-Sep-27 05:11 UTC

head link

[LLVMdev] Greedy Register Allocation in LLVM 3.0

On Sep 26, 2011, at 4:22 AM, Leo Romanoff wrote:> Hi Jakob,
> 
> Thanks for a very interesting description of the new register allocation
algorithm in LLVM!!!
> 
> Could you elaborate a bit on the following topics:
> 
> 1) Do you have any plans to publish something more formal and detailed
about this algorithm, e.g. a paper or something similar?  It would be nice to
better understand how this algorithm relates to well-known algorithms described
in the literature, e.g. linear scan, graph coloring, iterated register
coalescing by Chaitin-Briggs, register allocation by priority-based coloring by
Chow and Hennesey and if it represents an extension of one of those algorithms
or if it is a totally new algorithm. And of course it would be nice if the
terminology used by this description would follow the one published in the
literature, as sometimes use of a different terminology can be misleading. For
example, it is rather well known that the Linear Scan register allocation of
LLVM was not a real linear scan register allocator, as it allowed for
backtracking and re-coloring. It was de-facto more of a special case of
graph-coloring register allocator. But due to its name, many
> published research papers assumed that it is a real linear scan
implementation and took it as a representative example for such allocators.
> 
> 2) A comparison of the quality of generated allocations  and allocator
performance with well-known register allocation algorithms would be very
interesting to better understand pros and cons of this new algorithm. Do you
have any numbers at hand? If not, do you plan to produce a more thorough
analysis?
> 
> 3) It is stated in the blog that this allocator is much simpler than the
old linear scan implementation of LLVM. Can you provide a bit more information?
>    For example:
>     - How many lines of code it is and how it compares in this regard to
the old linear scan?
>     - Are there any wired places like the former SimpleRegisterCoalescer,
which was not that simple at all and had an extremely wired logic ;-)
>     - You explained that it produces  in most situations a better code than
linear scan, which is cool! But it would be also interesting to know in which
pathological cases it still looses against linear scan and why
>     - Can you provide a rough estimate of the algorithmic complexity of the
new algorithm? Is it O(n), O(n^2) or something else?
> 
> Thanks,
>   Roman
It may be more helpful to explain how LLVM's register allocator came
into existence before debating the high level algorithm.

When I began working on LLVM last October, Jakob was developing an
infrastructure for global live range splitting. It was just becoming
clear that LLVM's Linear Scan was inadequate for this work, and no one
I talked to knew how to fix it. Chris and Evan were very receptive to
replacing the register allocator, but the design space was wide
open. I knew from experience that we did not want to build a
traditional interference graph. Chris, Evan, and Jakob all agreed on
this point. I reasoned that an efficient implementation of
LiveInterval along with a new "LiveIntervalUnion" data structure and
mechanism for caching interference tests would be sufficient, and
offered to build a prototype to prove it.

This was an important step. My goal was never to develop a
theoretically superior register allocator algorithm in its own
right. On the contrary, Jakob and I wanted the global splitter to
drive allocation, rather than an allocation algorithm that fell back
to splitting when it failed. The important stake holders at the time
strongly agreed that the problem Jakob was solving, global live range
splitting, was the difficult problem that deserved attention, and that
the register allocation algorithm was only important in that it did
not inhibit splitting or spilling or unduly impact compile time. I
believe this goal led to a register allocator that is fundamentally
different from anything I'm aware of at the level of software design.

To summarize, Jakob and I both strongly agreed on the following design
goals, primarily for engineering reasons, not theoretical reasons:

- early aggressive coalescing

- no predetermined allocation order
  i.e. no fixed order CFG traversal,
       splitting changes allocation order

- no explicit interference graph

- on-the-fly live range splitting at the finest possible granularity

  When I say this the new allocator is fundamentally different from
  the other allocators I've seen, this is mainly what I'm referring
  to. At each decision point made by the allocator, the splitter may
  rewrite the code, modify live ranges, and incrementally update every
  bit of state in the allocator. There is no abandoning the current
  solution and no iterative application of an algorithm.

When I implemented the Basic/Greedy prototypes, I was able to replace
the code in RegAllocLinearScan.cpp with the code you can still see in
RegAllocBasic.cpp. You can see for yourself what Jakob means by the
new allocator being much simpler. And as Jakob explained, the real
benefit will be in removing VirtRegWriter.cpp which is made possible
by his spill code optimizations. Those optimizations are in turn made
practical by the new framework for incrementally updating the register
allocator's state.

The naming decision was literally a five minute brainstorm driven by
the need for unique filenames that are easy to type. "Greedy" is no
better description of our design than "Linear Scan" is of the LLVM
implementation by that name or "Graph Coloring" is of most
Chaitin-Briggs implementations. I believe the best name for the new
design would be one that conveys that it is actually a live range
splitter that happens to perform allocation. But I am not attached to
the names, and always figured the final algorithm, if successful,
would simply be known as the "LLVM Register Allocator".

It's worth mentioning that we considered calling it a "Priority"
allocator, but I objected to this because I felt it to be
misleading. Register reassignment and eviction were part of my
original prototype and I felt those features were more important than
allocation order. I used a priority queue for convenience, but
believed the choice was somewhat insignificant. In fact, the design
was intended to work reasonably well even if priorities were
inverted. I initially used spill weight as a place holder for whatever
dynamic heuristic Jakob would find works well with the splitter. One
thing Jakob discovered is that an "inverted" heuristic actually
cooperates better with his splitter. The Greedy allocator now
allocates registers to the largest live ranges first.

To be fair to Linear Scan, it is also worth noting that the framework
for live intervals was directly inspired by the LLVM implementation of
the linear scan algorithm.

In the end I believe the effectiveness of any allocator reflects the
work that goes into modeling the details of the target
architecture. This is not easy to achieve in a retargetable code
generator. Consequently, that is where much of the effort continues to
be directed in LLVM, rather than in attempting to rank algorithms in
their most abstract interpretation.

I hope this brief chronicle of the algorithm's inception dispels some of
its mystery.

-Andy

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Sep 2011 - [LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

[LLVMdev] Greedy Register Allocation in LLVM 3.0

Maybe Matching Threads