On Thu, 12 Apr 2007, Fernando Magno Quintao Pereira wrote:>> I'm definitely interested in improving coalescing and it sounds like >> this would fall under that work. Do you have references to papers >> that talk about the various algorithms? > > Some suggestions: > > @InProceedings{Budimlic02, > AUTHOR = {Zoran Budimlic and Keith D. Cooper and Timothy J. Harvey > and Ken Kennedy and Timothy S. Oberg and Steven W. Reeves}, > YEAR = "2002", > TITLE = "Fast copy coalescing and live-range identification",Yep, this is the one I was thinking of. It is available online here: http://www.cs.rice.edu/~keith/LACSI/pldi02.pdf> And I have a quite fast algo that I believe is simpler than [Budimlic02] > and I can share it with you :)Can you briefly explain what it simplifies? -Chris -- http://nondot.org/sabre/ http://llvm.org/
> Can you briefly explain what it simplifies?Let me try to explain my algorithm first, and then you can try to compare with the others already described. The objective is to convert the SSA-form program into a conventional SSA-form program (CSSA). In this flavor of SSA, the variables related by phi-functions do not interfere, and can be assigned the same register, as in Budimlic's case, or the same memory address if spilled (that is what I am doing). One way to convert a program in CSSA-form is to split the live range of each virtual that is part of a phi function. You can do this using the algorithm below: PhiLifting For each instruction (a_i := phi(a_{i1}:B_1, a_{i2}:B_2, ..., a_{im}:B_m)) Create new virtual v_i Add copy instruction I = ( a_i := v_i ) at the end of block B_j Replace a_i by v_i in phi For each virtual a_{ij} in phi Create new virtual v_{ij} Add copy instruction I = (v_{ij} := a_{ij}) at the end of block B_j Replace a_{ij} by v_{ij} in \phi This algorithm will add one copy for each virtual that is a parameter of a phi-function, and one copy for each virtual that is the result of a phi-function. After you run it, given a phi function (a := phi(b, c)), you can assign the same register to a, b and c, because their live ranges don't overlap. But this is too convervative, because now the live ranges are very small. In order to increase this, you can merge classes of phi-related virtuals using the algorithm below: PhiMemCoalesce (S_I = { I_1, I_2, ..., I_q } // instructions created by PhiLifting. S_Q = { Q_1, Q_2, ..., Q_r }) /// Classes of phi-related virtuals. For each instruction I = ( v_{ij} := a_{ij} ) in S_I S_I := S_I - { I } Let Q_v be the equivalence class of v_{ij} Let Q_a be the equivalence class of a_{ij} if Q_v intersection Q_v = {} // they don't overlap S_Q := S_Q - { Q_v } S_Q := S_Q - { Q_a } S_Q := S_Q + { Q_a union Q_v } Virtuals that are in the same phi function are said to be in the same equivalence class of phi-related virtuals. For instance, for a := phi(b, c), the equivalence class is Q = {a, b, c}. To make it efficient, I do not build the interference graph of the source program in order to perform operations such as 'Q_i intersection Q_j'. Instead, I rely on the interval representation of live ranges commonly used in versions of the linear scan algorithm. Each virtual is represented as a collection of ordered intervals on the linearized control flow graph. Thus, a set Q of virtuals is a set of ordered integer intervals. In this way, the intersection of two phi-equivalence classes Q_i and Q_j can be found in time linear on the number of disjoint intervals in both sets. Because a phi-equivalence class can have O(V) disjoint intervals, the final complexity of algorithm PhiMemCoalesce is O(|L_i| * V). In my experiments, the interval based implementation of PhiMemCoalesce accounts for less than 1% of the total compilation time, whereas the interference graph based algorithm takes more than 30% of the compilation time! Fernando
Chris Lattner wrote:> On Thu, 12 Apr 2007, Fernando Magno Quintao Pereira wrote: >>> I'm definitely interested in improving coalescing and it sounds like >>> this would fall under that work. Do you have references to papers >>> that talk about the various algorithms? >> Some suggestions: >> >> @InProceedings{Budimlic02, >> AUTHOR = {Zoran Budimlic and Keith D. Cooper and Timothy J. Harvey >> and Ken Kennedy and Timothy S. Oberg and Steven W. Reeves}, >> YEAR = "2002", >> TITLE = "Fast copy coalescing and live-range identification", > > Yep, this is the one I was thinking of. It is available online here: > http://www.cs.rice.edu/~keith/LACSI/pldi02.pdfI was just looking at this today. One thing that strikes me about all these papers I've read on the topic is that no one seems to consider the interaction of coalescing with spilling. By definition coalescing increases lifetimes and can cause more interferences. Yet the papers all try to coalesce as many copies as possible. On, say, the x86 machines, the cost of a spill is really not much different from the cost of a copy so it's probably close to a wash in the end. But there are many machines where the cost of each operation is vastly different. Aggressive coalescing is not always good. Often you'd rather copy than spill. Is anyone aware of publications addressing the interplay among coalescing, live range splitting, register allocation and spilling? This is one of the reasons I want to separate all of these concerns. We shouldn't force developers to always coalesce or always use some generic measure of spill cost. -Dave
On Apr 16, 2007, at 1:17 PM, David Greene wrote:> Chris Lattner wrote: >> On Thu, 12 Apr 2007, Fernando Magno Quintao Pereira wrote: >>>> I'm definitely interested in improving coalescing and it sounds >>>> like >>>> this would fall under that work. Do you have references to papers >>>> that talk about the various algorithms? >>> Some suggestions: >>> >>> @InProceedings{Budimlic02, >>> AUTHOR = {Zoran Budimlic and Keith D. Cooper and Timothy J. >>> Harvey >>> and Ken Kennedy and Timothy S. Oberg and Steven W. Reeves}, >>> YEAR = "2002", >>> TITLE = "Fast copy coalescing and live-range identification", >> >> Yep, this is the one I was thinking of. It is available online here: >> http://www.cs.rice.edu/~keith/LACSI/pldi02.pdf > > I was just looking at this today. One thing that strikes me about > all these papers I've read on the topic is that no one seems to > consider the interaction of coalescing with spilling. By definition > coalescing increases lifetimes and can cause more interferences. > Yet the papers all try to coalesce as many copies as possible. > > On, say, the x86 machines, the cost of a spill is really not much > different from the cost of a copy so it's probably close to a wash in > the end. But there are many machines where the cost of each operation > is vastly different. Aggressive coalescing is not always good. Often > you'd rather copy than spill. > > Is anyone aware of publications addressing the interplay among > coalescing, live range splitting, register allocation and spilling? > > This is one of the reasons I want to separate all of these concerns. > We shouldn't force developers to always coalesce or always use some > generic measure of spill cost.These are "implementation details". They don't necessarily make good paper material. :-) Obviously, smart heuristics can make a big difference here (estimated register pressures, etc.) But the more important thing is how the passes down stream can recover from the earlier mistakes. By this, we mean live range splitting and re-materialization. Unfortunately, neither of these are feasible given the current infrastructure. Chris and I have been discussing a rewrite informally, but no formal plan has been materialized. Evan> > -Dave > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Mon, 16 Apr 2007, David Greene wrote:>> Yep, this is the one I was thinking of. It is available online here: >> http://www.cs.rice.edu/~keith/LACSI/pldi02.pdf > > I was just looking at this today. One thing that strikes me about > all these papers I've read on the topic is that no one seems to > consider the interaction of coalescing with spilling. By definition > coalescing increases lifetimes and can cause more interferences. > Yet the papers all try to coalesce as many copies as possible.Funny you should mention it, we've been (okay, Evan has been ;-) fighting this issue lately.> On, say, the x86 machines, the cost of a spill is really not much > different from the cost of a copy so it's probably close to a wash in > the end. But there are many machines where the cost of each operation > is vastly different. Aggressive coalescing is not always good. Often > you'd rather copy than spill.Even on X86, load are more expensive than copies (At least on sane implementations).> Is anyone aware of publications addressing the interplay among > coalescing, live range splitting, register allocation and spilling? > > This is one of the reasons I want to separate all of these concerns. > We shouldn't force developers to always coalesce or always use some > generic measure of spill cost.In my mind (a dangerous place to go), it's not about coallescing vs spilling, it's about splitting vs spilling. In particular, coallescing does increase live ranges of values, but you can have the same increase from code written without copies. Basically, not doing aggressive copy coallescing means that you are dependent on the user telling you where live ranges should be split. In an aggressive compiler, the user doesn't actually have control, as most of the explicit copies are gone by the time regalloc happens. This means that you have an arbitrary set of copies that come from things like 2-addr elim, phi elim, extensions that don't generate code, etc. Instead of doing this. I think the right design is to: 1. Coallesce vregs together as aggressively as possible. Build huge live ranges, delete as many copies as possible. 2. Start priority based register allocation. 3. For each live range without a register available, consider: 1. live range splitting 2. remat 3. spilling In particular, by doing live range splitting, the *register allocator* can control where the copies go, rather than relying on luck that previous passes introduce copies in convenient places. In practice, right now we have no splitting capability. In addition, we coallesce physregs with vregs, which can end up pinning a physreg across an entire function even if it has few uses. This is all very bad, Evan is working on making the coallescer more careful about coallescing vregs with pregs now. In the next 6 months or so, I'd like to start investigating a bigger refactoring of linear scan, which will hopefully allow us to build in some of the smarter techniques like aggressive remat, splitting, etc. The first step to getting that is to make sure the datastructures (e.g. live intervals) are sane. -Chris -- http://nondot.org/sabre/ http://llvm.org/
On 4/17/07, David Greene <greened at obbligato.org> wrote:> Is anyone aware of publications addressing the interplay among > coalescing, live range splitting, register allocation and spilling?I think in the paper I mentioned above ("Design and Implementation of GCRA in GCC"; though the implementation failed at last the paper itself contains a lot of useful information) there are descriptions of different coalescing techniques: iterated, optimistic and extended coalescing. You can find the papers on these methods at citeseer, for example. Here are the links: http://citeseer.ist.psu.edu/park98optimistic.html for optimistic and http://citeseer.ist.psu.edu/197136.html for iterated coalescing. The main idea is to try to split live ranges of the coalesced virtual regs that haven't got a color during the select step and to color them separately. I think this is exactly what Chris suggests (with rematerialization, too, of course). -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20070417/1f06a106/attachment.html>
On 4/16/07, David Greene <greened at obbligato.org> wrote:> Chris Lattner wrote: > > On Thu, 12 Apr 2007, Fernando Magno Quintao Pereira wrote: > >>> I'm definitely interested in improving coalescing and it sounds like > >>> this would fall under that work. Do you have references to papers > >>> that talk about the various algorithms? > >> Some suggestions: > >> > >> @InProceedings{Budimlic02, > >> AUTHOR = {Zoran Budimlic and Keith D. Cooper and Timothy J. Harvey > >> and Ken Kennedy and Timothy S. Oberg and Steven W. Reeves}, > >> YEAR = "2002", > >> TITLE = "Fast copy coalescing and live-range identification", > > > > Yep, this is the one I was thinking of. It is available online here: > > http://www.cs.rice.edu/~keith/LACSI/pldi02.pdf > > I was just looking at this today. One thing that strikes me about > all these papers I've read on the topic is that no one seems to > consider the interaction of coalescing with spilling. > By definition > coalescing increases lifetimes and can cause more interferences. > Yet the papers all try to coalesce as many copies as possible.Most papers I read these days try to coalesce anything that won't cause spills. Some of these apparoaches are to coalesce everything, then undo as necessary to avoid spills, etc. But certainly, I don't believe recent papers in the area of coalescing ignore spill costs.> > Is anyone aware of publications addressing the interplay among > coalescing, live range splitting, register allocation and spilling?Yes, they are around, i'll dig them out of my pdfs :)