Hello folks, I'm sure this has been asked many times, but is there current work on decreasing the time taken by the DAG-based instruction selector, or the other phases of llc? I am just beginning to dive into LLVM, and I am interested in compile-time reductions that do not reduce code quality dramatically. For example, simply switching on "-fast-isel" (roughly 17% drop in code quality for some of my benchmarks) or "-regalloc=local/fast" (roughly 2x slower for some of my benchmarks) is, unfortunately, not an acceptable option. I noticed that a semi-recent commit added the "STATISTIC(NumDAGIselRetries,"Number of times dag isel has to try another path");" as well as a sorting change in DAGISelEmitter.cpp. This suggests to me that there is work being done. Are there other similar efforts, but perhaps aiming for bigger gains, that I should be aware of? From the other end, are there efforts to make the fast-path produce better quality code? Some small things I've noticed: - LegalizeVectors might not make a single change over an entire input bc file, yet it can take about 6% of the ISel time. Identifying that this can be skipped ahead of time may help? - We can take the Changed flag returned by LegalizeVectors, and pass it on to Legalize so that it can decide not to AssignTopologicalOrder again, since it was already done in the LegalizeVectors pass. However, this is a negligible reduction: 0.2 seconds shaved off a 30+ second compile =). - BuildSchedGraph() can take around 30% of the X% of time taken by Instruction Scheduling. How different is the scheduler graph from the selection graph? - Just as a random guess that this would not affect code quality terribly, I tried dynamically switching on -fast-isel for basic blocks of size one. This can cover about 10% of the basic blocks. The code-quality drop on a few benchmarks is neglible, but for spec's 483.xalancbmk it is quite dramatic. Also, any pointers to documentation, or description of the algorithm used by the Select() -> SelectCode() -> SelectCodeCommon() pass would be very helpful. Sorry for the long email, and many thanks in advance! - Jan Voung -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100514/16e1c9ea/attachment.html>
On May 14, 2010, at 11:24 AM, Jan Voung wrote:> Hello folks, > > I'm sure this has been asked many times, but is there current work on decreasing the time taken by the DAG-based instruction selector, or the other phases of llc? I am just beginning to dive into LLVM, and I am interested in compile-time reductions that do not reduce code quality dramatically. For example, simply switching on "-fast-isel" (roughly 17% drop in code quality for some of my benchmarks) or "-regalloc=local/fast" (roughly 2x slower for some of my benchmarks) is, unfortunately, not an acceptable option. > > I noticed that a semi-recent commit added the "STATISTIC(NumDAGIselRetries,"Number of times dag isel has to try another path");" as well as a sorting change in DAGISelEmitter.cpp. This suggests to me that there is work being done. Are there other similar efforts, but perhaps aiming for bigger gains, that I should be aware of? From the other end, are there efforts to make the fast-path produce better quality code?\The specific work you mention was aimed at reducing isel's own code size. Previously, isel used a huge mountain of generated code which took a long time to compile, and consumed a large amount of memory. The new isel fixes these problems, but underneath it's using the same conceptual algorithm, and I believe is about the same speed. SelectionDAG performance is important. It hasn't seen a lot of attention recently, as the "-O0" use case has been where most of the compile-time attention has gone. But speeding up SelectionDAG isel would also be useful for many different purposes.> > Some small things I've noticed: > > - LegalizeVectors might not make a single change over an entire input bc file, yet it can take about 6% of the ISel time. Identifying that this can be skipped ahead of time may help?Sounds reasonable. Perhaps there could be a SelectionDAG-wide flag which indicates whether any vectors are present. Another thing that could be done is avoiding running DAGCombine when it isn't needed. I believe there are a few cases where the code just reruns DAGCombine even if the preceding Legalize pass doesn't make any changes. Longer term, I've also heard some ideas tossed around of merging LegalizeDAG.cpp and now also LegalizeVectorsOps.cpp with the new LegalizeTypes infrastructure so that everything would run as a single pass, which would greatly reduce the amount of "Walk and hash all the nodes" work that these passes currently do. Another thing to look at is the underlying FoldingSet data structure. Right now, node lookup requires a lot of recomputation of hash data.> - We can take the Changed flag returned by LegalizeVectors, and pass it on to Legalize so that it can decide not to AssignTopologicalOrder again, since it was already done in the LegalizeVectors pass. However, this is a negligible reduction: 0.2 seconds shaved off a 30+ second compile =).Ok.> - BuildSchedGraph() can take around 30% of the X% of time taken by Instruction Scheduling. How different is the scheduler graph from the selection graph?The scheduling graph is constructed using the selection graph information, but it does different things. Constants and other nodes which don't correspond to instructions are excluded, nodes which must be scheduled together are merged, extra edges can be added, the schedule graph doesn't do automatic CSE on edge changes, etc. The schedule graph also has data to track node height and depth, and other information used by the scheduler. Dan
Jakob Stoklund Olesen
2010-May-17 21:52 UTC
[LLVMdev] selection dag speedups / llc speedups
On May 14, 2010, at 11:24 AM, Jan Voung wrote:> I'm sure this has been asked many times, but is there current work on decreasing the time taken by the DAG-based instruction selector, or the other phases of llc? I am just beginning to dive into LLVM, and I am interested in compile-time reductions that do not reduce code quality dramatically. For example, simply switching on "-fast-isel" (roughly 17% drop in code quality for some of my benchmarks) or "-regalloc=local/fast" (roughly 2x slower for some of my benchmarks) is, unfortunately, not an acceptable option.The fast and local register allocators are meant to be used on unoptimized code, a 'Debug build'. While they do work on optimized code, they do not give good results. Their primary goal is compile time, not code quality. /jakob
> The fast and local register allocators are meant to be used on unoptimized code, a 'Debug build'. While they do work on optimized code, they do not give good results. Their primary goal is compile time, not code quality.Yes, we have a somewhat uncommon use case. It is fine to spend time optimizing bitcode (LTO is a OK), but we want to make the final IL -> Executable translation as fast as possible.> /jakobCheers, -- Rafael Ávila de Espíndola
Possibly Parallel Threads
- [LLVMdev] selection dag speedups / llc speedups
- [LLVMdev] selection dag speedups / llc speedups
- [LLVMdev] selection dag speedups / llc speedups
- [LLVMdev] [PATCH] Add new phase to legalization to handle vector operations
- [LLVMdev] selection dag speedups / llc speedups