thr3ads.net - llvm dev - [LLVMdev] allocating registers less "sparingly" [Nov 2007]

If this information is useful, please help other people find it:
Share via:

Pekka Jääskeläinen

2007-Nov-05 10:55 UTC

[LLVMdev] allocating registers less "sparingly"

Hello LLVM people,

Our customizable TTA target [1] is capable of having plenty of registers
and register file ports to improve instruction level parallelism and
reduce spills. It's totally up to the designer of the particular TTA
processor how much the processor has registers and register file resources
along with other TTA components.

We have ported LLVM 2.1 to produce an intermediate TTA program format with
registers allocated (it adapts to the architecture's register files) but
without
operations bound to target's function units, nor scheduled to instructions.
The operation binding and the final scheduling is done as a post pass in
our TTA-framework called TCE [2] (not yet released to public).

The problem is that the register allocators in LLVM seem to reuse registers
too much, that is, allocate them "too sparingly". This leads to false
dependencies which unnecessarily restrict scheduling freedom in our post pass
instruction scheduler. That is, even if the target has 64 registers, LLVM
might only use, say 13, and have multiple writes to the same register in
the same basic block. Is there some easy way to tune this behavior so it
reuses registers only if it really has to (when it runs out of new ones)?

I can understand that for many architectures without static scheduling etc.
the "allocate registers sparingly"-approach is the best one as it
might lead
to less context data to be saved in calls, etc., but in our case the calls are
not usually the bottleneck as our target programs are at the moment mainly DSP
kernels and multimedia codecs which are usually loop-oriented. In addition,
LLVM seems to do an excellent job in inlining function calls.

I look forward to your replies.

The links:

[1] http://en.wikipedia.org/wiki/Transport_triggered_architecture
[2] http://tce.cs.tut.fi/papers/TCE_Overview.pdf

Best regards,
-- 
Pekka Jääskeläinen,
Researcher at Tampere Univ. of Technology, Finland.

Evan Cheng

2007-Nov-05 17:21 UTC

head link

[LLVMdev] allocating registers less "sparingly"

On Nov 5, 2007, at 2:55 AM, Pekka Jääskeläinen wrote:
> Hello LLVM people,
>
> Our customizable TTA target [1] is capable of having plenty of  
> registers
> and register file ports to improve instruction level parallelism and
> reduce spills. It's totally up to the designer of the particular TTA
> processor how much the processor has registers and register file  
> resources
> along with other TTA components.
>
> We have ported LLVM 2.1 to produce an intermediate TTA program  
> format with
> registers allocated (it adapts to the architecture's register  
> files) but without
> operations bound to target's function units, nor scheduled to  
> instructions.
> The operation binding and the final scheduling is done as a post  
> pass in
> our TTA-framework called TCE [2] (not yet released to public).
>
> The problem is that the register allocators in LLVM seem to reuse  
> registers
> too much, that is, allocate them "too sparingly". This leads to
false
> dependencies which unnecessarily restrict scheduling freedom in our  
> post pass
> instruction scheduler. That is, even if the target has 64  
> registers, LLVM
> might only use, say 13, and have multiple writes to the same  
> register in
> the same basic block. Is there some easy way to tune this behavior  
> so it
> reuses registers only if it really has to (when it runs out of new  
> ones)?
What you want is to remove anti-dependency before post-allocation  
scheduling. Alas, there is nothing in current LLVM implementation  
that deals with this. There are quite a few papers on this topic,  
e.g. http://citeseer.ist.psu.edu/calland97removal.html

I am also very interested in this. Do you plan to contribute your  
work back to the community? :-)

Evan
>
> I can understand that for many architectures without static  
> scheduling etc.
> the "allocate registers sparingly"-approach is the best one as it
> might lead
> to less context data to be saved in calls, etc., but in our case  
> the calls are
> not usually the bottleneck as our target programs are at the moment  
> mainly DSP
> kernels and multimedia codecs which are usually loop-oriented. In  
> addition,
> LLVM seems to do an excellent job in inlining function calls.
>
> I look forward to your replies.
>
> The links:
>
> [1] http://en.wikipedia.org/wiki/Transport_triggered_architecture
> [2] http://tce.cs.tut.fi/papers/TCE_Overview.pdf
>
> Best regards,
> -- 
> Pekka Jääskeläinen,
> Researcher at Tampere Univ. of Technology, Finland.
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Pekka Jääskeläinen

2007-Nov-06 05:54 UTC

head link

[LLVMdev] allocating registers less "sparingly"

Evan Cheng wrote:> What you want is to remove anti-dependency before post-allocation  
> scheduling. Alas, there is nothing in current LLVM implementation  
> that deals with this. There are quite a few papers on this topic,  
> e.g. http://citeseer.ist.psu.edu/calland97removal.html
Thank you for the pointer, I'll take a look when I have the time again.

I was kind of hoping we could have avoided the unnecessary anti-deps
the first place, for example, by means of a more sane distribution of
registers in the register allocator, but I guess this kind of pass
that removes them after allocation would be fine too.
> I am also very interested in this. Do you plan to contribute your  
> work back to the community? :-)
I certainly hope we could produce something we can contribute, I
cannot give any promises at this point though :)

-- 
--PJ

Pertti Kellomäki

2007-Nov-06 11:40 UTC

head link

[LLVMdev] allocating registers less "sparingly"

Evan Cheng wrote
 > What you want is to remove anti-dependency before post-allocation
 > scheduling. Alas, there is nothing in current LLVM implementation
 > that deals with this. There are quite a few papers on this topic,
 > e.g. http://citeseer.ist.psu.edu/calland97removal.html

Much of the work (like the one you cited) deals with arrays,
and the dependences are "real" in the sense that they exist
in the original program. Dealing with antidependences introduced
by register allocation is different in that the variables involved
are scalars. The dependences may also be an artefact of the choices
made by the register allocator, that is, the dependences may not
be present in the original program.

What this means is that the problem should hopefully be simpler
as only scalars are involved. But I guess it might also mean
reverse-engineering some information that was lost when the
register allocator decided to place distinct variables in the same
physical register.
-- 
Pertti

Christopher Lamb

2007-Nov-06 17:09 UTC

head link

[LLVMdev] allocating registers less "sparingly"

On Nov 5, 2007, at 2:55 AM, Pekka Jääskeläinen wrote:
> Hello LLVM people,
>
> Our customizable TTA target [1] is capable of having plenty of  
> registers
> and register file ports to improve instruction level parallelism and
> reduce spills. It's totally up to the designer of the particular TTA
> processor how much the processor has registers and register file  
> resources
> along with other TTA components.
>
> We have ported LLVM 2.1 to produce an intermediate TTA program  
> format with
> registers allocated (it adapts to the architecture's register  
> files) but without
> operations bound to target's function units, nor scheduled to  
> instructions.
> The operation binding and the final scheduling is done as a post  
> pass in
> our TTA-framework called TCE [2] (not yet released to public).
>
> The problem is that the register allocators in LLVM seem to reuse  
> registers
> too much, that is, allocate them "too sparingly". This leads to
false
> dependencies which unnecessarily restrict scheduling freedom in our  
> post pass
> instruction scheduler. That is, even if the target has 64  
> registers, LLVM
> might only use, say 13, and have multiple writes to the same  
> register in
> the same basic block. Is there some easy way to tune this behavior  
> so it
> reuses registers only if it really has to (when it runs out of new  
> ones)?
I'm only passingly familiar with the register allocator in LLVM, but  
I'd think that you could tweak it's allocation policy to allocate out  
of, say, sequential registers and only reuse registers with dead  
values once it runs out of completely empty ones. Is this what you want?

--
Christopher Lamb
> I can understand that for many architectures without static  
> scheduling etc.
> the "allocate registers sparingly"-approach is the best one as it
> might lead
> to less context data to be saved in calls, etc., but in our case  
> the calls are
> not usually the bottleneck as our target programs are at the moment  
> mainly DSP
> kernels and multimedia codecs which are usually loop-oriented. In  
> addition,
> LLVM seems to do an excellent job in inlining function calls.
>
> I look forward to your replies.
>
> The links:
>
> [1] http://en.wikipedia.org/wiki/Transport_triggered_architecture
> [2] http://tce.cs.tut.fi/papers/TCE_Overview.pdf
>
> Best regards,
> -- 
> Pekka Jääskeläinen,
> Researcher at Tampere Univ. of Technology, Finland.
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20071106/97fa2684/attachment.html>

Pekka Jääskeläinen

2007-Nov-06 18:09 UTC

head link

[LLVMdev] allocating registers less "sparingly"

On 11/06/2007 07:09 PM, Christopher Lamb wrote:> I'm only passingly familiar with the register allocator in LLVM, but
I'd
> think that you could tweak it's allocation policy to allocate out of, 
> say, sequential registers and only reuse registers with dead values once 
> it runs out of completely empty ones. Is this what you want?
Exactly. I was hoping there was some generic framework to tweak the
"allocation policy" so it would apply for all register allocators, but
I suppose there is none at the moment.

In addition, for our case, it could help to "flush" all live values to
memory
before entering a long-running loop (which does not use the flushed
values) to free more registers for it. This would be useful also for
software pipelining which increases register pressure.

I'll see what I can do. I hopefully get more time to work on this after
the new year.

Greets,

-- 
Pekka

Evan Cheng

2007-Nov-06 18:40 UTC

head link

[LLVMdev] allocating registers less "sparingly"

On Nov 6, 2007, at 9:09 AM, Christopher Lamb wrote:
>
>
> I'm only passingly familiar with the register allocator in LLVM, but  
> I'd think that you could tweak it's allocation policy to allocate  
> out of, say, sequential registers and only reuse registers with dead  
> values once it runs out of completely empty ones. Is this what you  
> want?
That will help. But it's not really a clean solution. A heuristic  
based solution is never going to achieve the kind of balanced usage  
that's desired.

On the other hand, repairing anti-dependencies is a well understood  
problem and if it's implemented will bring a lot more benefits (e.g.  
post allocation scheduling).

Evan
>
> --
> Christopher Lamb
>
>> I can understand that for many architectures without static  
>> scheduling etc.
>> the "allocate registers sparingly"-approach is the best one
as it
>> might lead
>> to less context data to be saved in calls, etc., but in our case  
>> the calls are
>> not usually the bottleneck as our target programs are at the moment  
>> mainly DSP
>> kernels and multimedia codecs which are usually loop-oriented. In  
>> addition,
>> LLVM seems to do an excellent job in inlining function calls.
>>
>> I look forward to your replies.
>>
>> The links:
>>
>> [1] http://en.wikipedia.org/wiki/Transport_triggered_architecture
>> [2] http://tce.cs.tut.fi/papers/TCE_Overview.pdf
>>
>> Best regards,
>> -- 
>> Pekka Jääskeläinen,
>> Researcher at Tampere Univ. of Technology, Finland.
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20071106/d2fd74ed/attachment.html>

Seemingly Similar Threads

Search for more seemingly similar threads

llvm dev - Nov 2007 - [LLVMdev] allocating registers less "sparingly"

[LLVMdev] allocating registers less "sparingly"

[LLVMdev] allocating registers less "sparingly"

[LLVMdev] allocating registers less "sparingly"

[LLVMdev] allocating registers less "sparingly"

[LLVMdev] allocating registers less "sparingly"

[LLVMdev] allocating registers less "sparingly"

[LLVMdev] allocating registers less "sparingly"

Seemingly Similar Threads