On Sep 26, 2011, at 4:22 AM, Leo Romanoff wrote:> Hi Jakob, > > Thanks for a very interesting description of the new register allocation algorithm in LLVM!!! > > Could you elaborate a bit on the following topics: > > 1) Do you have any plans to publish something more formal and detailed about this algorithm, e.g. a paper or something similar? It would be nice to better understand how this algorithm relates to well-known algorithms described in the literature, e.g. linear scan, graph coloring, iterated register coalescing by Chaitin-Briggs, register allocation by priority-based coloring by Chow and Hennesey and if it represents an extension of one of those algorithms or if it is a totally new algorithm. And of course it would be nice if the terminology used by this description would follow the one published in the literature, as sometimes use of a different terminology can be misleading. For example, it is rather well known that the Linear Scan register allocation of LLVM was not a real linear scan register allocator, as it allowed for backtracking and re-coloring. It was de-facto more of a special case of graph-coloring register allocator. But due to its name, many > published research papers assumed that it is a real linear scan implementation and took it as a representative example for such allocators. > > 2) A comparison of the quality of generated allocations and allocator performance with well-known register allocation algorithms would be very interesting to better understand pros and cons of this new algorithm. Do you have any numbers at hand? If not, do you plan to produce a more thorough analysis? > > 3) It is stated in the blog that this allocator is much simpler than the old linear scan implementation of LLVM. Can you provide a bit more information? > For example: > - How many lines of code it is and how it compares in this regard to the old linear scan? > - Are there any wired places like the former SimpleRegisterCoalescer, which was not that simple at all and had an extremely wired logic ;-) > - You explained that it produces in most situations a better code than linear scan, which is cool! But it would be also interesting to know in which pathological cases it still looses against linear scan and why > - Can you provide a rough estimate of the algorithmic complexity of the new algorithm? Is it O(n), O(n^2) or something else? > > Thanks, > RomanIt may be more helpful to explain how LLVM's register allocator came into existence before debating the high level algorithm. When I began working on LLVM last October, Jakob was developing an infrastructure for global live range splitting. It was just becoming clear that LLVM's Linear Scan was inadequate for this work, and no one I talked to knew how to fix it. Chris and Evan were very receptive to replacing the register allocator, but the design space was wide open. I knew from experience that we did not want to build a traditional interference graph. Chris, Evan, and Jakob all agreed on this point. I reasoned that an efficient implementation of LiveInterval along with a new "LiveIntervalUnion" data structure and mechanism for caching interference tests would be sufficient, and offered to build a prototype to prove it. This was an important step. My goal was never to develop a theoretically superior register allocator algorithm in its own right. On the contrary, Jakob and I wanted the global splitter to drive allocation, rather than an allocation algorithm that fell back to splitting when it failed. The important stake holders at the time strongly agreed that the problem Jakob was solving, global live range splitting, was the difficult problem that deserved attention, and that the register allocation algorithm was only important in that it did not inhibit splitting or spilling or unduly impact compile time. I believe this goal led to a register allocator that is fundamentally different from anything I'm aware of at the level of software design. To summarize, Jakob and I both strongly agreed on the following design goals, primarily for engineering reasons, not theoretical reasons: - early aggressive coalescing - no predetermined allocation order i.e. no fixed order CFG traversal, splitting changes allocation order - no explicit interference graph - on-the-fly live range splitting at the finest possible granularity When I say this the new allocator is fundamentally different from the other allocators I've seen, this is mainly what I'm referring to. At each decision point made by the allocator, the splitter may rewrite the code, modify live ranges, and incrementally update every bit of state in the allocator. There is no abandoning the current solution and no iterative application of an algorithm. When I implemented the Basic/Greedy prototypes, I was able to replace the code in RegAllocLinearScan.cpp with the code you can still see in RegAllocBasic.cpp. You can see for yourself what Jakob means by the new allocator being much simpler. And as Jakob explained, the real benefit will be in removing VirtRegWriter.cpp which is made possible by his spill code optimizations. Those optimizations are in turn made practical by the new framework for incrementally updating the register allocator's state. The naming decision was literally a five minute brainstorm driven by the need for unique filenames that are easy to type. "Greedy" is no better description of our design than "Linear Scan" is of the LLVM implementation by that name or "Graph Coloring" is of most Chaitin-Briggs implementations. I believe the best name for the new design would be one that conveys that it is actually a live range splitter that happens to perform allocation. But I am not attached to the names, and always figured the final algorithm, if successful, would simply be known as the "LLVM Register Allocator". It's worth mentioning that we considered calling it a "Priority" allocator, but I objected to this because I felt it to be misleading. Register reassignment and eviction were part of my original prototype and I felt those features were more important than allocation order. I used a priority queue for convenience, but believed the choice was somewhat insignificant. In fact, the design was intended to work reasonably well even if priorities were inverted. I initially used spill weight as a place holder for whatever dynamic heuristic Jakob would find works well with the splitter. One thing Jakob discovered is that an "inverted" heuristic actually cooperates better with his splitter. The Greedy allocator now allocates registers to the largest live ranges first. To be fair to Linear Scan, it is also worth noting that the framework for live intervals was directly inspired by the LLVM implementation of the linear scan algorithm. In the end I believe the effectiveness of any allocator reflects the work that goes into modeling the details of the target architecture. This is not easy to achieve in a retargetable code generator. Consequently, that is where much of the effort continues to be directed in LLVM, rather than in attempting to rank algorithms in their most abstract interpretation. I hope this brief chronicle of the algorithm's inception dispels some of its mystery. -Andy
Hi Jakob, Hi Andy, First of all, thanks a lot for very elaborative and interesting answers!> It may be more helpful to explain how LLVM's register allocator came > into existence before debating the high level algorithm. > > When I began working on LLVM last October, Jakob was developing an > infrastructure for global live range splitting. It was just becoming > clear that LLVM's Linear Scan was inadequate for this work, and no one > I talked to knew how to fix it. Chris and Evan were very receptive to > replacing the register allocator, but the design space was wide > open. I knew from experience that we did not want to build a > traditional interference graph. Chris, Evan, and Jakob all agreed on > this point. I reasoned that an efficient implementation of > LiveInterval along with a new "LiveIntervalUnion" data structure and > mechanism for caching interference tests would be sufficient, and > offered to build a prototype to prove it. > > This was an important step. My goal was never to develop a > theoretically superior register allocator algorithm in its own > right. On the contrary, Jakob and I wanted the global splitter to > drive allocation, rather than an allocation algorithm that fell back > to splitting when it failed. The important stake holders at the time > strongly agreed that the problem Jakob was solving, global live range > splitting, was the difficult problem that deserved attention, and that > the register allocation algorithm was only important in that it did > not inhibit splitting or spilling or unduly impact compile time. I > believe this goal led to a register allocator that is fundamentally > different from anything I'm aware of at the level of software design. > > To summarize, Jakob and I both strongly agreed on the following design > goals, primarily for engineering reasons, not theoretical reasons:I totally understand that the goal was/is to build a real, efficient register allocator, instead of producing a theoretical result, which may sound good in theory, but would not be very practical in terms of quality or compilation speed on real world use-cases.> - early aggressive coalescing > > - no predetermined allocation order > i.e. no fixed order CFG traversal, > splitting changes allocation order > > - no explicit interference graph > > - on-the-fly live range splitting at the finest possible granularityThis very nicely summarizes main objectives of the new allocator.> When I say this the new allocator is fundamentally different from > the other allocators I've seen, this is mainly what I'm referring > to. At each decision point made by the allocator, the splitter may > rewrite the code, modify live ranges, and incrementally update every > bit of state in the allocator. There is no abandoning the current > solution and no iterative application of an algorithm.While I agree that most "classical" allocators (e.g. graph coloring) do not do fine grained splitting and usually build an explicit interference graph, I also want to mention that many of the modern proposals published in the literature often do very agressive splitting (sometimes even before and after at each instruction, or at the beginning and end of each basic block). Many of proposals also avoid construction of interference graphs and replace it with on-demand interference computation, which is often very cheap and efficient, e.g. for SSA-progams. There are also descriptions how to compute liveness information very efficiently. Quite some of these register allocation proposals are also able to handle overlapping register classes. A few names to be mentioned in these areas are for example Hack, Bouchez, Boissinot, Pereira, Wimmer, etc> When I implemented the Basic/Greedy prototypes, I was able to replace > the code in RegAllocLinearScan.cpp with the code you can still see in > RegAllocBasic.cpp. You can see for yourself what Jakob means by the > new allocator being much simpler. And as Jakob explained, the real > benefit will be in removing VirtRegWriter.cpp which is made possibler c > by his spill code optimizations. Those optimizations are in turn made > practical by the new framework for incrementally updating the register > allocator's state. > > The naming decision was literally a five minute brainstorm driven by > the need for unique filenames that are easy to type. "Greedy" is no > better description of our design than "Linear Scan" is of the LLVM > implementation by that name or "Graph Coloring" is of most > Chaitin-Briggs implementations. I believe the best name for the new > design would be one that conveys that it is actually a live range > splitter that happens to perform allocation. But I am not attached to > the names, and always figured the final algorithm, if successful, > would simply be known as the "LLVM Register Allocator".It is true that names are not always reflecting the essense. But on the other hand, there is a lot of ongoing research on register allocation (and compilers in general) and it looks like more and more such efforts choose LLVM as a platform for experimentation. Quite some results and comparisons are published. So, it would be nice to have a somewhat concrete (but may be not ideal) name for the LLVM allocator rather than "LLVM register allocator". In a few years, there may be another, even better register alloctor for LLVM. If it will be called "LLVM register allocator" again, it would lead to a lot of misunderstandings when reading descriptions and papers. The typical question then will be "Which LLVM register allocator was meant?" ;-)> It's worth mentioning that we considered calling it a "Priority" > allocator, but I objected to this because I felt it to be > misleading. Register reassignment and eviction were part of my > original prototype and I felt those features were more important than > allocation order. I used a priority queue for convenience, but > believed the choice was somewhat insignificant. In fact, the design > was intended to work reasonably well even if priorities were > inverted. I initially used spill weight as a place holder for whatever > dynamic heuristic Jakob would find works well with the splitter. One > thing Jakob discovered is that an "inverted" heuristic actually > cooperates better with his splitter. The Greedy allocator now > allocates registers to the largest live ranges first. > > To be fair to Linear Scan, it is also worth noting that the framework > for live intervals was directly inspired by the LLVM implementation of > the linear scan algorithm. > > In the end I believe the effectiveness of any allocator reflects the > work that goes into modeling the details of the target > architecture. This is not easy to achieve in a retargetable code > generator. Consequently, that is where much of the effort continues to > be directed in LLVM, rather than in attempting to rank algorithms in > their most abstract interpretation.Absolutely. And yet a detailed description, deep explanation and comparison (either theoretical or practical based on benchmarks) is always useful as it introduces more clarity and helps to avoid misunderstandings. The aim is not to produce a paper of "academic" quality or something comparable just for the sake of it, because you want it to be published at a conference. There are enough people in academia who would do it ;-) The reason should be that it makes it easier to further develop and improve the LLVM register allocator and may also lead to better register allocation algorithms overall. And there is also a lot to gain from developments outside of LLVM. Once register allocation people (both from academia and from industry) have understood how this algorithm works and why certain (practical) decisions were made, they may realize how to improve it even further by using and borrowing some approaches and tricks invented in scope of other frameworks and register allocation algorithms.> I hope this brief chronicle of the algorithm's inception dispels some of > its mystery.Yes. It was very educative! Actually, it would be nice to have a page with a detailed description of the current and may be old register allocator as a part of the LLVM web-site. For starters, the blog entry from Jakob and explanations from this mail thread could be put there. Later it could be extended and improved. What do you think about this idea? Thanks again, Roman
On Sep 27, 2011, at 12:11 AM, Leo Romanoff wrote:> It is true that names are not always reflecting the essense. But on the other hand, there is a lot of ongoing research on register allocation (and compilers in general) and it looks like more and more such efforts choose LLVM as a platform for experimentation. Quite some results and comparisons are published. So, it would be nice to have a somewhat concrete (but may be not ideal) name for the LLVM allocator rather than "LLVM register allocator". In a few years, there may be another, even better register alloctor for LLVM. If it will be called "LLVM register allocator" again, it would lead to a lot of misunderstandings when reading descriptions and papers. The typical question then will be "Which LLVM register allocator was meant?" ;-)I suppose you could call it the Tricky Jakob allocator. :) John.
Jakob Stoklund Olesen
2011-Sep-27 14:18 UTC
[LLVMdev] Greedy Register Allocation in LLVM 3.0
On Sep 27, 2011, at 12:11 AM, Leo Romanoff wrote:> Quite some of these register allocation proposals are also able to handle overlapping register classes.That's interesting. Do you have any references? /jakob -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110927/e4b1f13d/attachment.html>
On Sep 27, 2011, at 12:11 AM, Leo Romanoff wrote:>> I hope this brief chronicle of the algorithm's inception dispels some of >> its mystery. > > Yes. It was very educative! Actually, it would be nice to have a page with a detailed description of the current and may be old register allocator as a part of the LLVM web-site. For starters, the blog entry from Jakob and explanations from this mail thread could be put there. Later it could be extended and improved. What do you think about this idea? > > > Thanks again, > RomanThanks for your interest. You make several good points and give proper credit to academics who are solving the problem in a more rigorous fashion. I'll let Jakob decide how to proceed with the documentation. As I said, the heavy lifting was in the splitter/spiller design, and that could use some explanation. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110927/41bbe485/attachment.html>
On Sep 27, 2011, at 12:11 AM, Leo Romanoff wrote:> > > It is true that names are not always reflecting the essense. But on the other hand, there is a lot of ongoing research on register allocation (and compilers in general) and it looks like more and more such efforts choose LLVM as a platform for experimentation. Quite some results and comparisons are published. So, it would be nice to have a somewhat concrete (but may be not ideal) name for the LLVM allocator rather than "LLVM register allocator". In a few years, there may be another, even better register alloctor for LLVM. If it will be called "LLVM register allocator" again, it would lead to a lot of misunderstandings when reading descriptions and papers. The typical question then will be "Which LLVM register allocator was meant?" ;-)I propose "the LLVM 3.0 register allocator". We have no plans to throw away the greedy allocator in 3.1, but we'll probably make major enhancements to it, which make it incomparable to 3.0. If it does eventually get thrown away and replaced, then a version number can capture that. -Chris