Nagurne, James via llvm-dev
2020-Oct-01 20:02 UTC
[llvm-dev] MachinePipeliner and irregular loops
I'm in the process of reading and understanding the background and
implementation of the MachinePipeliner's swing modulo scheduling algorithm.
In short, my question is: "Does the swing modulo scheduling algorithm work
for simple classes of irregular loops such as 'while (x != null) { x =
x->next; }' ". This is an important loop shape to optimize in
embedded benchmarks like coremark (core_list_join, reverse, find, etc.). Since
the loop control is often load -> conditional branch, targets with high
latency loads can often see performance boosts when the load and branch are
pipelined to hide the load latency.
I understand the ModuloScheduleExpander would need some heavy modification, as
it relies on helpers like "createTripCountGreaterCondition" and
"adjustTripCount", both of which are nonsensical in an irregular loop.
I think that's a separate question at the moment. There are other issues
with irregular loops that I'm aware of, such operations sensitive to
overexecuction like loads/stores, but there are well-known solutions for those
problems, especially on targets that support instruction predication in
hardware.
Our team has investigated pipelining irregular loops before, and found good
results with some relatively straightforward additions to our internal
implementation : https://dl.acm.org/doi/abs/10.1145/384196.384216
One common theme that seems to exist in all the SMS papers' examples and in
the implementation is that the scheduling DAG does not consider the branch in
the schedule. In fact, in the implementation, if the successor of any
instruction in the DAG happens to be the branch (usually also in an SUnit object
as ExitSU), the algorithm ends up crashing llc with a null-pointer dereference.
I first encountered this when dealing with a regular loop of the form:
%loop.body:
%3 = PHI %1, %loop.preheader, %2, %loop.body
...
%2 = decrement %3
branch-conditional (%2 > 0)
When calculating ALAP, the decrement's successor (the branch) is not
considered, and we end up trying to access an element in ScheduleInfo that
doesn't exist.
I worked around this problem quite simply: I removed the PHI and decrement,
hiding it inside of a pseudo-branch instruction with side effects.
Now, however, I'm dealing with an irregular loop, and have run into the same
problem with no obvious solution.
If the base algorithm should work for some types of single-block irregular
loops, I'd be more than happy to investigate and/or work with someone to
make the changes and upstream them.
Regards,
J.B. Nagurne
Code Generation
Texas Instruments
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20201001/a340cca5/attachment.html>