thr3ads.net - llvm dev - [LLVMdev] Regalloc Refactoring [Apr 2007]

If this information is useful, please help other people find it:
Share via:

Chris Lattner

2007-Apr-14 03:06 UTC

[LLVMdev] Regalloc Refactoring

On Thu, 12 Apr 2007, Fernando Magno Quintao Pereira
wrote:>> I'm definitely interested in improving coalescing and it sounds
like
>> this would fall under that work.  Do you have references to papers
>> that talk about the various algorithms?
>
> Some suggestions:
>
> @InProceedings{Budimlic02,
>   AUTHOR     = {Zoran Budimlic and Keith D. Cooper and Timothy J. Harvey
> and Ken Kennedy and Timothy S. Oberg and Steven W. Reeves},
>   YEAR       = "2002",
>   TITLE      = "Fast copy coalescing and live-range
identification",
Yep, this is the one I was thinking of.  It is available online here:
http://www.cs.rice.edu/~keith/LACSI/pldi02.pdf
> And I have a quite fast algo that I believe is simpler than [Budimlic02]
> and I can share it with you :)
Can you briefly explain what it simplifies?

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Fernando Magno Quintao Pereira

2007-Apr-14 05:09 UTC

head link

[LLVMdev] Regalloc Refactoring

> Can you briefly explain what it simplifies?
Let me try to explain my algorithm first, and then you can try to compare 
with the others already described.

The objective is to convert the SSA-form program into a conventional 
SSA-form program (CSSA). In this flavor of SSA, the variables related by 
phi-functions do not interfere, and can be assigned the same register, as 
in Budimlic's case, or the same memory address if spilled (that is what I 
am doing). One way to convert a program in CSSA-form is to split the live 
range of each virtual that is part of a phi function. You can do this 
using the algorithm below:

PhiLifting
   For each instruction
   (a_i := phi(a_{i1}:B_1, a_{i2}:B_2, ..., a_{im}:B_m))
     Create new virtual v_i
       Add copy instruction I =  ( a_i := v_i ) at the end of block B_j
       Replace a_i by v_i in phi
       For each virtual  a_{ij} in phi
         Create new virtual v_{ij}
         Add copy instruction I =  (v_{ij} := a_{ij})
         at the end of block B_j
         Replace a_{ij} by v_{ij} in \phi

This algorithm will add one copy for each virtual that is a parameter of a 
phi-function, and one copy for each virtual that is the result of a 
phi-function. After you run it, given a phi function (a := phi(b, c)), you 
can assign the same register to a, b and c, because their live ranges 
don't overlap. But this is too convervative, because now the live ranges 
are very small. In order to increase this, you can merge classes of 
phi-related virtuals using the algorithm below:

PhiMemCoalesce
(S_I = { I_1, I_2, ..., I_q } // instructions created by PhiLifting.
S_Q = { Q_1, Q_2, ..., Q_r }) /// Classes of phi-related virtuals.
For each instruction  I = ( v_{ij} := a_{ij} ) in S_I
     S_I := S_I - { I }
         Let Q_v be the equivalence class of v_{ij}
             Let Q_a be the equivalence class of a_{ij}
                 if Q_v intersection Q_v = {} // they don't overlap
                     S_Q := S_Q - { Q_v }
                     S_Q := S_Q - { Q_a }
                     S_Q := S_Q + { Q_a union Q_v }

Virtuals that are in the same phi function are said to be in the same 
equivalence class of phi-related virtuals. For instance, for a := phi(b, 
c), the equivalence class is Q = {a, b, c}.

To make it efficient, I do not build the interference graph of the source 
program in order to perform operations such as 'Q_i intersection Q_j'. 
Instead, I rely on the interval representation of live ranges commonly 
used in versions of the linear scan algorithm. Each virtual is represented 
as a collection of ordered intervals on the linearized control flow graph. 
Thus, a set Q of virtuals is a set of ordered integer intervals. In this 
way, the intersection of two phi-equivalence classes Q_i and Q_j can be 
found in time linear on the number of disjoint intervals in both sets. 
Because a phi-equivalence class can have O(V) disjoint intervals, the 
final complexity of algorithm PhiMemCoalesce is O(|L_i| * V). In my 
experiments, the interval based implementation of PhiMemCoalesce accounts 
for less than 1% of the total compilation time, whereas the interference 
graph based algorithm takes more than 30% of the compilation time!

Fernando

David Greene

2007-Apr-16 20:17 UTC

head link

[LLVMdev] Regalloc Refactoring

Chris Lattner wrote:> On Thu, 12 Apr 2007, Fernando Magno Quintao Pereira wrote:
>>> I'm definitely interested in improving coalescing and it sounds
like
>>> this would fall under that work.  Do you have references to papers
>>> that talk about the various algorithms?
>> Some suggestions:
>>
>> @InProceedings{Budimlic02,
>>   AUTHOR     = {Zoran Budimlic and Keith D. Cooper and Timothy J.
Harvey
>> and Ken Kennedy and Timothy S. Oberg and Steven W. Reeves},
>>   YEAR       = "2002",
>>   TITLE      = "Fast copy coalescing and live-range
identification",
> 
> Yep, this is the one I was thinking of.  It is available online here:
> http://www.cs.rice.edu/~keith/LACSI/pldi02.pdf
I was just looking at this today.  One thing that strikes me about
all these papers I've read on the topic is that no one seems to
consider the interaction of coalescing with spilling.  By definition
coalescing increases lifetimes and can cause more interferences.
Yet the papers all try to coalesce as many copies as possible.

On, say, the x86 machines, the cost of a spill is really not much
different from the cost of a copy so it's probably close to a wash in
the end.  But there are many machines where the cost of each operation
is vastly different.  Aggressive coalescing is not always good.  Often
you'd rather copy than spill.

Is anyone aware of publications addressing the interplay among
coalescing, live range splitting, register allocation and spilling?

This is one of the reasons I want to separate all of these concerns.
We shouldn't force developers to always coalesce or always use some
generic measure of spill cost.

                                -Dave

Evan Cheng

2007-Apr-16 22:20 UTC

head link

[LLVMdev] Regalloc Refactoring

On Apr 16, 2007, at 1:17 PM, David Greene wrote:
> Chris Lattner wrote:
>> On Thu, 12 Apr 2007, Fernando Magno Quintao Pereira wrote:
>>>> I'm definitely interested in improving coalescing and it
sounds
>>>> like
>>>> this would fall under that work.  Do you have references to
papers
>>>> that talk about the various algorithms?
>>> Some suggestions:
>>>
>>> @InProceedings{Budimlic02,
>>>   AUTHOR     = {Zoran Budimlic and Keith D. Cooper and Timothy J.  
>>> Harvey
>>> and Ken Kennedy and Timothy S. Oberg and Steven W. Reeves},
>>>   YEAR       = "2002",
>>>   TITLE      = "Fast copy coalescing and live-range
identification",
>>
>> Yep, this is the one I was thinking of.  It is available online here:
>> http://www.cs.rice.edu/~keith/LACSI/pldi02.pdf
>
> I was just looking at this today.  One thing that strikes me about
> all these papers I've read on the topic is that no one seems to
> consider the interaction of coalescing with spilling.  By definition
> coalescing increases lifetimes and can cause more interferences.
> Yet the papers all try to coalesce as many copies as possible.
>
> On, say, the x86 machines, the cost of a spill is really not much
> different from the cost of a copy so it's probably close to a wash in
> the end.  But there are many machines where the cost of each operation
> is vastly different.  Aggressive coalescing is not always good.  Often
> you'd rather copy than spill.
>
> Is anyone aware of publications addressing the interplay among
> coalescing, live range splitting, register allocation and spilling?
>
> This is one of the reasons I want to separate all of these concerns.
> We shouldn't force developers to always coalesce or always use some
> generic measure of spill cost.
These are "implementation details". They don't necessarily make
good
paper material. :-)

Obviously, smart heuristics can make a big difference here (estimated  
register pressures, etc.) But the more important thing is how the  
passes down stream can recover from the earlier mistakes. By this, we  
mean live range splitting and re-materialization. Unfortunately,  
neither of these are feasible given the current infrastructure. Chris  
and I have been discussing a rewrite informally, but no formal plan  
has been materialized.

Evan
>
>                                 -Dave
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Chris Lattner

2007-Apr-16 22:43 UTC

head link

[LLVMdev] Regalloc Refactoring

On Mon, 16 Apr 2007, David Greene wrote:>> Yep, this is the one I was thinking of.  It is available online here:
>> http://www.cs.rice.edu/~keith/LACSI/pldi02.pdf
>
> I was just looking at this today.  One thing that strikes me about
> all these papers I've read on the topic is that no one seems to
> consider the interaction of coalescing with spilling.  By definition
> coalescing increases lifetimes and can cause more interferences.
> Yet the papers all try to coalesce as many copies as possible.
Funny you should mention it, we've been (okay, Evan has been ;-) fighting 
this issue lately.
> On, say, the x86 machines, the cost of a spill is really not much
> different from the cost of a copy so it's probably close to a wash in
> the end.  But there are many machines where the cost of each operation
> is vastly different.  Aggressive coalescing is not always good.  Often
> you'd rather copy than spill.
Even on X86, load are more expensive than copies (At least on sane 
implementations).
> Is anyone aware of publications addressing the interplay among
> coalescing, live range splitting, register allocation and spilling?
>
> This is one of the reasons I want to separate all of these concerns.
> We shouldn't force developers to always coalesce or always use some
> generic measure of spill cost.
In my mind (a dangerous place to go), it's not about coallescing vs 
spilling, it's about splitting vs spilling.

In particular, coallescing does increase live ranges of values, but you 
can have the same increase from code written without copies.  Basically, 
not doing aggressive copy coallescing means that you are dependent on the 
user telling you where live ranges should be split.  In an aggressive 
compiler, the user doesn't actually have control, as most of the explicit 
copies are gone by the time regalloc happens.  This means that you have an 
arbitrary set of copies that come from things like 2-addr elim, phi elim, 
extensions that don't generate code, etc.

Instead of doing this.  I think the right design is to:

1. Coallesce vregs together as aggressively as possible.  Build huge live 
ranges, delete as many copies as possible.
2. Start priority based register allocation.
3. For each live range without a register available, consider:
    1. live range splitting
    2. remat
    3. spilling

In particular, by doing live range splitting, the *register allocator* can 
control where the copies go, rather than relying on luck that previous 
passes introduce copies in convenient places.

In practice, right now we have no splitting capability.  In addition, we 
coallesce physregs with vregs, which can end up pinning a physreg across 
an entire function even if it has few uses.  This is all very bad, Evan is 
working on making the coallescer more careful about coallescing vregs with 
pregs now.

In the next 6 months or so, I'd like to start investigating a bigger 
refactoring of linear scan, which will hopefully allow us to build in some 
of the smarter techniques like aggressive remat, splitting, etc.  The 
first step to getting that is to make sure the datastructures (e.g. live 
intervals) are sane.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Anton Vayvod

2007-Apr-17 05:21 UTC

head link

[LLVMdev] Regalloc Refactoring

On 4/17/07, David Greene <greened at obbligato.org> wrote:
> Is anyone aware of publications addressing the interplay among
> coalescing, live range splitting, register allocation and spilling?

 I think in the paper I mentioned above ("Design and Implementation of GCRA
in GCC"; though the implementation failed at last the paper itself contains
a lot of useful information) there are descriptions of different coalescing
techniques: iterated, optimistic and extended coalescing. You can find the
papers on these methods at citeseer, for example. Here are the links:
http://citeseer.ist.psu.edu/park98optimistic.html for optimistic and
http://citeseer.ist.psu.edu/197136.html for iterated coalescing.

The main idea is to try to split live ranges of the coalesced virtual regs
that haven't got a color during the select step and to color them
separately. I think this is exactly what Chris suggests (with
rematerialization, too, of course).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20070417/1f06a106/attachment.html>

Daniel Berlin

2007-Apr-17 17:37 UTC

head link

[LLVMdev] Regalloc Refactoring

On 4/16/07, David Greene <greened at obbligato.org>
wrote:> Chris Lattner wrote:
> > On Thu, 12 Apr 2007, Fernando Magno Quintao Pereira wrote:
> >>> I'm definitely interested in improving coalescing and it
sounds like
> >>> this would fall under that work.  Do you have references to
papers
> >>> that talk about the various algorithms?
> >> Some suggestions:
> >>
> >> @InProceedings{Budimlic02,
> >>   AUTHOR     = {Zoran Budimlic and Keith D. Cooper and Timothy J.
Harvey
> >> and Ken Kennedy and Timothy S. Oberg and Steven W. Reeves},
> >>   YEAR       = "2002",
> >>   TITLE      = "Fast copy coalescing and live-range
identification",
> >
> > Yep, this is the one I was thinking of.  It is available online here:
> > http://www.cs.rice.edu/~keith/LACSI/pldi02.pdf
>
> I was just looking at this today.  One thing that strikes me about
> all these papers I've read on the topic is that no one seems to
> consider the interaction of coalescing with spilling.
>   By definition
> coalescing increases lifetimes and can cause more interferences.
> Yet the papers all try to coalesce as many copies as possible.
Most papers I read these days try to coalesce anything that won't
cause spills.  Some of these apparoaches are to coalesce everything,
then undo as necessary to avoid spills, etc.

But certainly, I don't believe recent papers in the area of coalescing
ignore spill costs.
>
> Is anyone aware of publications addressing the interplay among
> coalescing, live range splitting, register allocation and spilling?
Yes, they are around, i'll dig them out of my pdfs :)

Seemingly Similar Threads

Search for more reasonably related threads

llvm dev - Apr 2007 - [LLVMdev] Regalloc Refactoring

[LLVMdev] Regalloc Refactoring

[LLVMdev] Regalloc Refactoring

[LLVMdev] Regalloc Refactoring

[LLVMdev] Regalloc Refactoring

[LLVMdev] Regalloc Refactoring

[LLVMdev] Regalloc Refactoring

[LLVMdev] Regalloc Refactoring

Seemingly Similar Threads