thr3ads.net - llvm dev - [LLVMdev] selection dag speedups / llc speedups [May 2010]

If this information is useful, please help other people find it:
Share via:

Jan Voung

2010-May-14 18:24 UTC

[LLVMdev] selection dag speedups / llc speedups

Hello folks,

I'm sure this has been asked many times, but is there current work on
decreasing the time taken by the DAG-based instruction selector, or the
other phases of llc? I am just beginning to dive into LLVM, and I am
interested in compile-time reductions that do not reduce code quality
dramatically. For example, simply switching on "-fast-isel" (roughly
17%
drop in code quality for some of my benchmarks) or
"-regalloc=local/fast"
(roughly 2x slower for some of my benchmarks) is, unfortunately, not an
acceptable option.

I noticed that a semi-recent commit added the
"STATISTIC(NumDAGIselRetries,"Number of times dag isel has to try
another
path");" as well as a sorting change in DAGISelEmitter.cpp. This
suggests to
me that there is work being done. Are there other similar efforts, but
perhaps aiming for bigger gains, that I should be aware of? From the other
end, are there efforts to make the fast-path produce better quality code?

Some small things I've noticed:

 - LegalizeVectors might not make a single change over an entire input bc
file, yet it can take about 6% of the ISel time. Identifying that this can
be skipped ahead of time may help?
 - We can take the Changed flag returned by LegalizeVectors, and pass it on
to Legalize so that it can decide not to AssignTopologicalOrder again, since
it was already done in the LegalizeVectors pass. However, this is a
negligible reduction: 0.2 seconds shaved off a 30+ second compile =).
 - BuildSchedGraph() can take around 30% of the X% of time taken by
Instruction Scheduling. How different is the scheduler graph from the
selection graph?
 - Just as a random guess that this would not affect code quality
terribly, I tried dynamically switching on -fast-isel for basic blocks of
size one. This can cover about 10% of the basic blocks. The code-quality
drop on a few benchmarks is neglible, but for spec's 483.xalancbmk it is
quite dramatic.

Also, any pointers to documentation, or description of the algorithm used by
the Select() -> SelectCode() -> SelectCodeCommon() pass would be very
helpful.

Sorry for the long email, and many thanks in advance!

- Jan Voung
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100514/16e1c9ea/attachment.html>

Dan Gohman

2010-May-17 20:53 UTC

head link

[LLVMdev] selection dag speedups / llc speedups

On May 14, 2010, at 11:24 AM, Jan Voung wrote:
> Hello folks,
> 
> I'm sure this has been asked many times, but is there current work on
decreasing the time taken by the DAG-based instruction selector, or the other
phases of llc? I am just beginning to dive into LLVM, and I am interested in
compile-time reductions that do not reduce code quality dramatically. For
example, simply switching on "-fast-isel" (roughly 17% drop in code
quality for some of my benchmarks) or "-regalloc=local/fast" (roughly
2x slower for some of my benchmarks) is, unfortunately, not an acceptable
option.
> 
> I noticed that a semi-recent commit added the
"STATISTIC(NumDAGIselRetries,"Number of times dag isel has to try
another path");" as well as a sorting change in DAGISelEmitter.cpp.
This suggests to me that there is work being done. Are there other similar
efforts, but perhaps aiming for bigger gains, that I should be aware of? From
the other end, are there efforts to make the fast-path produce better quality
code?\
The specific work you mention was aimed at reducing isel's own code
size. Previously, isel used a huge mountain of generated code which took
a long time to compile, and consumed a large amount of memory.  The
new isel fixes these problems, but underneath it's using the same
conceptual algorithm, and I believe is about the same speed.

SelectionDAG performance is important. It hasn't seen a lot of
attention recently, as the "-O0" use case has been where most of
the compile-time attention has gone. But speeding up SelectionDAG
isel would also be useful for many different purposes.
> 
> Some small things I've noticed:
> 
>  - LegalizeVectors might not make a single change over an entire input bc
file, yet it can take about 6% of the ISel time. Identifying that this can be
skipped ahead of time may help?
Sounds reasonable. Perhaps there could be a SelectionDAG-wide flag which
indicates whether any vectors are present.

Another thing that could be done is avoiding running DAGCombine when
it isn't needed. I believe there are a few cases where the code just
reruns DAGCombine even if the preceding Legalize pass doesn't make
any changes.

Longer term, I've also heard some ideas tossed around of merging
LegalizeDAG.cpp and now also LegalizeVectorsOps.cpp with the new
LegalizeTypes infrastructure so that everything would run as a single
pass, which would greatly reduce the amount of "Walk and hash all the
nodes" work that these passes currently do.

Another thing to look at is the underlying FoldingSet data structure.
Right now, node lookup requires a lot of recomputation of hash data.
>  - We can take the Changed flag returned by LegalizeVectors, and pass it on
to Legalize so that it can decide not to AssignTopologicalOrder again, since it
was already done in the LegalizeVectors pass. However, this is a negligible
reduction: 0.2 seconds shaved off a 30+ second compile =).
Ok.
>  - BuildSchedGraph() can take around 30% of the X% of time taken by
Instruction Scheduling. How different is the scheduler graph from the selection
graph?
The scheduling graph is constructed using the selection graph information,
but it does different things. Constants and other nodes which don't
correspond
to instructions are excluded, nodes which must be scheduled together are
merged, extra edges can be added, the schedule graph doesn't do automatic
CSE on edge changes, etc.  The schedule graph also has data to track
node height and depth, and other information used by the scheduler.

Dan

Jakob Stoklund Olesen

2010-May-17 21:52 UTC

head link

[LLVMdev] selection dag speedups / llc speedups

On May 14, 2010, at 11:24 AM, Jan Voung wrote:
> I'm sure this has been asked many times, but is there current work on
decreasing the time taken by the DAG-based instruction selector, or the other
phases of llc? I am just beginning to dive into LLVM, and I am interested in
compile-time reductions that do not reduce code quality dramatically. For
example, simply switching on "-fast-isel" (roughly 17% drop in code
quality for some of my benchmarks) or "-regalloc=local/fast" (roughly
2x slower for some of my benchmarks) is, unfortunately, not an acceptable
option.
The fast and local register allocators are meant to be used on unoptimized code,
a 'Debug build'. While they do work on optimized code, they do not give
good results. Their primary goal is compile time, not code quality.

/jakob

Rafael Espindola

2010-May-18 04:09 UTC

head link

[LLVMdev] selection dag speedups / llc speedups

> The fast and local register allocators are meant to be used on unoptimized
code, a 'Debug build'. While they do work on optimized code, they do not
give good results. Their primary goal is compile time, not code quality.
Yes, we have a somewhat uncommon use case. It is fine to spend time
optimizing bitcode (LTO is a OK), but we want to make the final IL ->
Executable translation as fast as possible.
> /jakob
Cheers,
-- 
Rafael Ávila de Espíndola

Reasonably Related Threads

Search for more reasonably related threads

llvm dev - May 2010 - [LLVMdev] selection dag speedups / llc speedups

[LLVMdev] selection dag speedups / llc speedups

[LLVMdev] selection dag speedups / llc speedups

[LLVMdev] selection dag speedups / llc speedups

[LLVMdev] selection dag speedups / llc speedups

Reasonably Related Threads