thr3ads.net - llvm dev - [llvm-dev] Best way to globally schedule MachineInstrs [Jul 2020]

Denis Steckelmacher via llvm-dev

2020-Jul-12 15:29 UTC

[llvm-dev] Best way to globally schedule MachineInstrs

Hello,

It's probably not the first time this question has been asked, but I had no
luck with my Google searches and tours of the LLVM code-base.

I would like to, in the safest, most correct and most general way possible,
schedule MachineInstrs across basic blocks. The idea is that in an in-order
issue but out-of-order retire machine (common in many open-source FPGA-based
microprocessors, which are in-order but have several execution units), it is
important to consume the result of an instruction as late as possible. For
instance, "1+(1/x)" should issue the division as early as possible, but wait
as long as possible before adding 1 to the result of the division.
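
To make the motivation concrete, here is a minimal sketch of a toy in-order,
single-issue pipeline. It is not LLVM code: ToyInstr, simulateInOrder and the
latencies used with it are hypothetical, chosen only to show why the consumer
of a long-latency result should be placed late.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One instruction in a toy in-order, single-issue machine (hypothetical model).
struct ToyInstr {
  int Latency;                   // cycles until the result becomes available
  std::vector<std::size_t> Srcs; // indices of earlier instructions it reads
};

// Issues instructions strictly in program order, one per cycle at most.
// An instruction stalls until all of its sources are ready.
// Returns the cycle at which the last result becomes available.
int simulateInOrder(const std::vector<ToyInstr> &Prog) {
  std::vector<int> Ready(Prog.size(), 0);
  int NextIssue = 0; // earliest cycle at which the next instruction may issue
  int Finish = 0;
  for (std::size_t I = 0; I < Prog.size(); ++I) {
    int Issue = NextIssue;
    for (std::size_t Src : Prog[I].Srcs) {
      assert(Src < I && "sources must come earlier in program order");
      Issue = std::max(Issue, Ready[Src]); // stall on an unfinished producer
    }
    Ready[I] = Issue + Prog[I].Latency;
    NextIssue = Issue + 1; // a stall here also delays everything behind it
    Finish = std::max(Finish, Ready[I]);
  }
  return Finish;
}
```

With an assumed 10-cycle division, a 1-cycle add that reads it, and four
independent 1-cycle instructions, placing the add right after the division
makes the four independent instructions wait behind the stalled add. Placing
the add last lets them issue while the division is still in flight.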

All the scheduling approaches that I've found in LLVM seem to work on basic
blocks. More precisely, they work on scheduling regions, which are sub-portions
of basic blocks. I have found the LLVM bitcode Sink pass, which moves
instructions down to later basic blocks, and the MachineSink pass, which does
the same for MachineInstrs. They address my problem, as long as I mark "fdiv"
as not sinkable in TargetInstrInfo::shouldSink. However, they seem to sink
sinkable instructions as far as possible, without much reasoning about how far
they should be sunk, what impact that would have on register pressure, etc. Am
I correct? I also find it hard to maintain scheduling information that is split
between .td files, TargetSubtargetInfo::adjustSchedDependency, and
TargetInstrInfo::shouldSink.

So, is there something that I missed somewhere, and that could allow me to,
preferably in one place, describe how various long-dependency instructions
should be scheduled across basic blocks? Things like "fsin should be
emitted as early as possible" and "consumers of sink can wait after
<that whole loop> before being emitted"?

Best regards,
Denis

Florian Hahn via llvm-dev

2020-Jul-13 10:03 UTC

[llvm-dev] Best way to globally schedule MachineInstrs

Hi,

> On Jul 12, 2020, at 16:29, Denis Steckelmacher via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hello,
>
> It's probably not the first time this question is asked, but I got no luck
> with my Google searches and tours of the LLVM code-base.
>
> I would like to, in the safest, most correct and most general way possible,
> schedule MachineInstrs across basic blocks. The idea is that in an in-order
> issue but out-of-order retire machine (common in many open-source FPGA-based
> microprocessors, that are in-order but have several execution units), it is
> important to consume the result of instructions as late as possible. For
> instance, "1+(1/x)" should issue the division as early as possible,
> but wait as much as possible before having to add 1 to the result of the
> division.
>
> All the scheduling approaches that I've found in LLVM seem to work on basic
> blocks. More precisely, they work on scheduling regions, that are
> sub-portions of basic blocks. I have found the LLVM bitcode Sink pass that
> moves instructions down to later basic blocks, and the MachineSink pass, that
> does the same but for MachineInstrs. They address my problem, as long as I
> mark "fdiv" to be not-sinkable in TargetInstrInfo::shouldSink. However, they
> seem to sink sinkable instructions as much as possible, without much
> reasoning about how far they should be sunk, what the impact on register
> pressure it would have, etc. Am I correct?

Yes, scheduling currently only works in sub-portions of basic blocks and it
looks like the sinking passes try to sink any instruction they can to their
successors, mostly independent of the impact on latency/resource usage and
register pressure.

> I also find it difficult to maintain to have some scheduling info in .td
> files, some in TargetSubtargetInfo::adjustSchedDependency, and some in
> TargetInstrInfo::shouldSink.

The .td files contain static information about the instructions available for a
target (latency, resource usage). Additional hooks like adjustSchedDependency
and shouldSink let you make decisions and adjustments based on a specific
MachineInstr or SUnit. There you have access to the containing block, concrete
operands and more, so you can base decisions on more information than the .td
files can express.

> So, is there something that I missed somewhere, and that could allow me to,
> preferably in one place, describe how various long-dependency instructions
> should be scheduled across basic blocks? Things like "fsin should be
> emitted as early as possible" and "consumers of sink can wait after
> <that whole loop> before being emitted"?

Extending the scheduler to work across scheduling boundaries is probably a
relatively big project and if you are mostly interested in adjusting the
location of a few instructions, adding better cost modeling to the sinking
passes might be a good first step.
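
As a rough illustration of what such cost modeling could weigh, here is a
minimal standalone sketch. It is not the MachineSink implementation and uses no
LLVM API: SinkCandidate, sinkCost, pickSinkTarget, and the two weights are
hypothetical names and parameters invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A possible destination block for one instruction (hypothetical toy model).
struct SinkCandidate {
  double Freq;        // execution frequency relative to the current block (1.0)
  int PressureDelta;  // extra live registers along the path if sunk here
  int ExposedLatency; // cycles of latency that can no longer be hidden
};

// Lower is better. Frequency scales the per-execution cost, and the register
// pressure penalty is added on top.
double sinkCost(const SinkCandidate &C, double PressureWeight,
                double LatencyWeight) {
  assert(C.Freq >= 0.0 && "frequency must be non-negative");
  return C.Freq * (1.0 + LatencyWeight * C.ExposedLatency) +
         PressureWeight * C.PressureDelta;
}

// Returns the index of the cheapest candidate, or -1 if leaving the
// instruction where it is (cost 1.0 in this model) is at least as cheap.
int pickSinkTarget(const std::vector<SinkCandidate> &Cands,
                   double PressureWeight, double LatencyWeight) {
  double BestCost = 1.0; // cost of not sinking at all
  int Best = -1;
  for (std::size_t I = 0; I < Cands.size(); ++I) {
    double Cost = sinkCost(Cands[I], PressureWeight, LatencyWeight);
    if (Cost < BestCost) {
      BestCost = Cost;
      Best = static_cast<int>(I);
    }
  }
  return Best;
}
```

The point of the sketch is only that sinking becomes a choice among candidates,
including "do not sink", rather than "sink as far as legality allows".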

For that, MachineTraceMetrics might be helpful
(https://llvm.org/doxygen/classllvm_1_1MachineTraceMetrics.html). It generates
traces representing plausible sequences of executed basic blocks passing through
a given block and computes resource usage/latencies through the trace. See the
MachineCombiner as an example user. To estimate register pressure through the
trace, RegPressureTracker
(https://llvm.org/doxygen/classllvm_1_1RegPressureTracker.html) may be helpful.
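
For intuition about what a register pressure estimate along a trace involves,
here is a minimal standalone sketch of a bottom-up liveness walk over a
straight-line sequence of instructions. It does not use RegPressureTracker or
any other LLVM API. TraceInstr and maxLiveAlongTrace are hypothetical, values
are plain integer ids, and nothing is assumed live out of the trace.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// One instruction of a flattened trace (hypothetical toy model).
struct TraceInstr {
  int Def;               // id of the value defined, or -1 if none
  std::vector<int> Uses; // ids of the values read
};

// Walks the trace from the last instruction to the first and returns the
// largest number of values that are live at the same time.
int maxLiveAlongTrace(const std::vector<TraceInstr> &Trace) {
  std::set<int> Live; // values live just after the current instruction
  std::size_t MaxLive = 0;
  for (std::size_t I = Trace.size(); I-- > 0;) {
    const TraceInstr &MI = Trace[I];
    if (MI.Def >= 0) {
      Live.insert(MI.Def); // count the def even if nothing reads it later
      MaxLive = std::max(MaxLive, Live.size());
      Live.erase(MI.Def);  // above its def, the value is not live
    }
    for (int Use : MI.Uses) {
      assert(Use >= 0 && "value ids must be non-negative");
      Live.insert(Use);    // a use keeps the value live above this point
    }
    MaxLive = std::max(MaxLive, Live.size());
  }
  return static_cast<int>(MaxLive);
}
```

Delaying the consumer of a division lengthens the live range of the division
result, so a sinking heuristic could compare this number before and after a
candidate move.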

Cheers,
Florian
