Hello LLVM people, Our customizable TTA target [1] is capable of having plenty of registers and register file ports to improve instruction level parallelism and reduce spills. It's totally up to the designer of the particular TTA processor how much the processor has registers and register file resources along with other TTA components. We have ported LLVM 2.1 to produce an intermediate TTA program format with registers allocated (it adapts to the architecture's register files) but without operations bound to target's function units, nor scheduled to instructions. The operation binding and the final scheduling is done as a post pass in our TTA-framework called TCE [2] (not yet released to public). The problem is that the register allocators in LLVM seem to reuse registers too much, that is, allocate them "too sparingly". This leads to false dependencies which unnecessarily restrict scheduling freedom in our post pass instruction scheduler. That is, even if the target has 64 registers, LLVM might only use, say 13, and have multiple writes to the same register in the same basic block. Is there some easy way to tune this behavior so it reuses registers only if it really has to (when it runs out of new ones)? I can understand that for many architectures without static scheduling etc. the "allocate registers sparingly"-approach is the best one as it might lead to less context data to be saved in calls, etc., but in our case the calls are not usually the bottleneck as our target programs are at the moment mainly DSP kernels and multimedia codecs which are usually loop-oriented. In addition, LLVM seems to do an excellent job in inlining function calls. I look forward to your replies. The links: [1] http://en.wikipedia.org/wiki/Transport_triggered_architecture [2] http://tce.cs.tut.fi/papers/TCE_Overview.pdf Best regards, -- Pekka Jääskeläinen, Researcher at Tampere Univ. of Technology, Finland.
On Nov 5, 2007, at 2:55 AM, Pekka Jääskeläinen wrote:> Hello LLVM people, > > Our customizable TTA target [1] is capable of having plenty of > registers > and register file ports to improve instruction level parallelism and > reduce spills. It's totally up to the designer of the particular TTA > processor how much the processor has registers and register file > resources > along with other TTA components. > > We have ported LLVM 2.1 to produce an intermediate TTA program > format with > registers allocated (it adapts to the architecture's register > files) but without > operations bound to target's function units, nor scheduled to > instructions. > The operation binding and the final scheduling is done as a post > pass in > our TTA-framework called TCE [2] (not yet released to public). > > The problem is that the register allocators in LLVM seem to reuse > registers > too much, that is, allocate them "too sparingly". This leads to false > dependencies which unnecessarily restrict scheduling freedom in our > post pass > instruction scheduler. That is, even if the target has 64 > registers, LLVM > might only use, say 13, and have multiple writes to the same > register in > the same basic block. Is there some easy way to tune this behavior > so it > reuses registers only if it really has to (when it runs out of new > ones)?What you want is to remove anti-dependency before post-allocation scheduling. Alas, there is nothing in current LLVM implementation that deals with this. There are quite a few papers on this topic, e.g. http://citeseer.ist.psu.edu/calland97removal.html I am also very interested in this. Do you plan to contribute your work back to the community? :-) Evan> > I can understand that for many architectures without static > scheduling etc. > the "allocate registers sparingly"-approach is the best one as it > might lead > to less context data to be saved in calls, etc., but in our case > the calls are > not usually the bottleneck as our target programs are at the moment > mainly DSP > kernels and multimedia codecs which are usually loop-oriented. In > addition, > LLVM seems to do an excellent job in inlining function calls. > > I look forward to your replies. > > The links: > > [1] http://en.wikipedia.org/wiki/Transport_triggered_architecture > [2] http://tce.cs.tut.fi/papers/TCE_Overview.pdf > > Best regards, > -- > Pekka Jääskeläinen, > Researcher at Tampere Univ. of Technology, Finland. > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Evan Cheng wrote:> What you want is to remove anti-dependency before post-allocation > scheduling. Alas, there is nothing in current LLVM implementation > that deals with this. There are quite a few papers on this topic, > e.g. http://citeseer.ist.psu.edu/calland97removal.htmlThank you for the pointer, I'll take a look when I have the time again. I was kind of hoping we could have avoided the unnecessary anti-deps the first place, for example, by means of a more sane distribution of registers in the register allocator, but I guess this kind of pass that removes them after allocation would be fine too.> I am also very interested in this. Do you plan to contribute your > work back to the community? :-)I certainly hope we could produce something we can contribute, I cannot give any promises at this point though :) -- --PJ
Evan Cheng wrote > What you want is to remove anti-dependency before post-allocation > scheduling. Alas, there is nothing in current LLVM implementation > that deals with this. There are quite a few papers on this topic, > e.g. http://citeseer.ist.psu.edu/calland97removal.html Much of the work (like the one you cited) deals with arrays, and the dependences are "real" in the sense that they exist in the original program. Dealing with antidependences introduced by register allocation is different in that the variables involved are scalars. The dependences may also be an artefact of the choices made by the register allocator, that is, the dependences may not be present in the original program. What this means is that the problem should hopefully be simpler as only scalars are involved. But I guess it might also mean reverse-engineering some information that was lost when the register allocator decided to place distinct variables in the same physical register. -- Pertti
On Nov 5, 2007, at 2:55 AM, Pekka Jääskeläinen wrote:> Hello LLVM people, > > Our customizable TTA target [1] is capable of having plenty of > registers > and register file ports to improve instruction level parallelism and > reduce spills. It's totally up to the designer of the particular TTA > processor how much the processor has registers and register file > resources > along with other TTA components. > > We have ported LLVM 2.1 to produce an intermediate TTA program > format with > registers allocated (it adapts to the architecture's register > files) but without > operations bound to target's function units, nor scheduled to > instructions. > The operation binding and the final scheduling is done as a post > pass in > our TTA-framework called TCE [2] (not yet released to public). > > The problem is that the register allocators in LLVM seem to reuse > registers > too much, that is, allocate them "too sparingly". This leads to false > dependencies which unnecessarily restrict scheduling freedom in our > post pass > instruction scheduler. That is, even if the target has 64 > registers, LLVM > might only use, say 13, and have multiple writes to the same > register in > the same basic block. Is there some easy way to tune this behavior > so it > reuses registers only if it really has to (when it runs out of new > ones)?I'm only passingly familiar with the register allocator in LLVM, but I'd think that you could tweak it's allocation policy to allocate out of, say, sequential registers and only reuse registers with dead values once it runs out of completely empty ones. Is this what you want? -- Christopher Lamb> I can understand that for many architectures without static > scheduling etc. > the "allocate registers sparingly"-approach is the best one as it > might lead > to less context data to be saved in calls, etc., but in our case > the calls are > not usually the bottleneck as our target programs are at the moment > mainly DSP > kernels and multimedia codecs which are usually loop-oriented. In > addition, > LLVM seems to do an excellent job in inlining function calls. > > I look forward to your replies. > > The links: > > [1] http://en.wikipedia.org/wiki/Transport_triggered_architecture > [2] http://tce.cs.tut.fi/papers/TCE_Overview.pdf > > Best regards, > -- > Pekka Jääskeläinen, > Researcher at Tampere Univ. of Technology, Finland. > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20071106/97fa2684/attachment.html>
On 11/06/2007 07:09 PM, Christopher Lamb wrote:> I'm only passingly familiar with the register allocator in LLVM, but I'd > think that you could tweak it's allocation policy to allocate out of, > say, sequential registers and only reuse registers with dead values once > it runs out of completely empty ones. Is this what you want?Exactly. I was hoping there was some generic framework to tweak the "allocation policy" so it would apply for all register allocators, but I suppose there is none at the moment. In addition, for our case, it could help to "flush" all live values to memory before entering a long-running loop (which does not use the flushed values) to free more registers for it. This would be useful also for software pipelining which increases register pressure. I'll see what I can do. I hopefully get more time to work on this after the new year. Greets, -- Pekka
On Nov 6, 2007, at 9:09 AM, Christopher Lamb wrote:> > > I'm only passingly familiar with the register allocator in LLVM, but > I'd think that you could tweak it's allocation policy to allocate > out of, say, sequential registers and only reuse registers with dead > values once it runs out of completely empty ones. Is this what you > want?That will help. But it's not really a clean solution. A heuristic based solution is never going to achieve the kind of balanced usage that's desired. On the other hand, repairing anti-dependencies is a well understood problem and if it's implemented will bring a lot more benefits (e.g. post allocation scheduling). Evan> > -- > Christopher Lamb > >> I can understand that for many architectures without static >> scheduling etc. >> the "allocate registers sparingly"-approach is the best one as it >> might lead >> to less context data to be saved in calls, etc., but in our case >> the calls are >> not usually the bottleneck as our target programs are at the moment >> mainly DSP >> kernels and multimedia codecs which are usually loop-oriented. In >> addition, >> LLVM seems to do an excellent job in inlining function calls. >> >> I look forward to your replies. >> >> The links: >> >> [1] http://en.wikipedia.org/wiki/Transport_triggered_architecture >> [2] http://tce.cs.tut.fi/papers/TCE_Overview.pdf >> >> Best regards, >> -- >> Pekka Jääskeläinen, >> Researcher at Tampere Univ. of Technology, Finland. >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20071106/d2fd74ed/attachment.html>
Seemingly Similar Threads
- [LLVMdev] allocating registers less "sparingly"
- [LLVMdev] allocating registers less "sparingly"
- [LLVMdev] [PATCH] increase the max number of physical registers
- [LLVMdev] Master's thesis: Retargetable Compiler Backend for Transport Triggered Architectures
- [LLVMdev] [PATCH] increase the max number of physical registers