Nagurne, James via llvm-dev
2021-Sep-09 18:11 UTC
[llvm-dev] Looking for some history on MachinePipeliner
I've been tinkering with the MachinePipeliner for my team's downstream
target, and have run into a particular issue with node ordering in the SMS
algorithm. In particular, I've noticed that in some cases, I'm receiving
an 'Invalid node order' warning in the debug. I've tracked this down
to how 'computeNodeOrder' is implemented.
computeNodeOrder's algorithm seems to be relatively untouched from the
initial commit:
https://reviews.llvm.org/D16829
However, the algorithm is very different from those found in the 3 papers upon
which the implementation is based:
https://github.com/llvm/llvm-project/blob/f40bba48a593d4297a6d0b4d1534082858b1d0ac/llvm/include/llvm/CodeGen/MachinePipeliner.h#L18
In the implementation:
https://github.com/llvm/llvm-project/blob/f40bba48a593d4297a6d0b4d1534082858b1d0ac/llvm/lib/CodeGen/MachinePipeliner.cpp#L1877
The order seems to be:
* If pred_L is a subset of the NodeSet, BottomUp and add subset to work list
* If succ_L is s subset of the NodeSet, TopDown and add subset to work list
* If intersect(succ_L, NodeSet) is non-empty, TopDown and add intersect to
work list
* If there is only 1 NodeSet, BottomUp and add nodes at the bottom of the
schedule (Succs.size() == 0) to work list
* Otherwise, BottomUp and add node with the highest ASAP to work list
In the papers (Using Tanya Lattner's master's
thesis<https://llvm.org/pubs/2005-06-17-LattnerMSThesis.html> as the
example), the order is:
* If intersect(pred_L, NodeSet) is non-empty, BottomUp and add subset to
work list
* If intersect(succ_L, NodeSet) is non-empty, TopDown and add subset to work
list
* Otherwise, BottomUp and add node with the highest ASAP to work list
Interestingly, when I implement the ordering from the paper, the node order for
my loop is valid.
This difference in implementation is not mentioned or commented on in the
review, so I'm wondering if there are any documents or emails I may have
missed in my search that would shed some light. Certainly, this ordering is just
a heuristic, so I can understand that the current incarnation just resulted in
better performance for the targets utilizing it, but I'm curious if there
was something more to the decision.
Regards,
J.B. Nagurne
Code Generation
Texas Instruments
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210909/38898e09/attachment.html>
Cahoon, Brendon via llvm-dev
2021-Sep-09 21:14 UTC
[llvm-dev] Looking for some history on MachinePipeliner
[Public]
It's been a while since I've looked at the pipeliner... The lack of
discussion and history is because the pipeliner was developed downsteam for a
long period of time before it was upstreamed.
Three of the cases are from Figure 6 in the original paper by Llosa and others.
* If pred_L is a subset of the NodeSet, BottomUp and add subset to work list
* If succ_L is s subset of the NodeSet, TopDown and add subset to work list
* Otherwise, BottomUp and add node with the highest ASAP to work list
I'm pretty sure, at various points, we tried intersection for pred_L and
succ_L, but I don't think that gave us very good results.
The two additional cases,
* If intersect(succ_L, NodeSet) is non-empty, TopDown and add intersect to
work list
* If there is only 1 NodeSet, BottomUp and add nodes at the bottom of the
schedule (Succs.size() == 0) to work list
These were added to improve node ordering and the generated schedule, which were
identified when analyzing benchmarks. For the first case, if that heuristic is
removed, then one of the tests, swp-conv3x3-nested.ll fails. That was an
important kernel to pipeline, and it doesn't pipeline without that
heuristic.
The SMS algorithm tries to limit the cases when a node has both successors and
predecessors scheduled already. This is a concern for node-sets with recurrences
(because there is one node that will have successors and predecessors scheduled
already). In loops that contain many node-sets that have recurrences, finding a
valid ordering may be difficult. The challenge with the original three cases is
that the fallback is to do a bottom-up schedule, and it starts with a single
node only. We noticed that it's important to choose the correct direction
and it's important to identify the initial set of nodes used (to start going
top-down or bottom-up).
For the first case, with the intersection, we don't want to go bottom-up
with a single node. Instead, starting top-down is better because the current
node-set contains successors of nodes scheduled already. Furthermore, we want
to start with the set of nodes. If we start with a single node going top-down,
then the algorithm ends up switching eventually to bottom-up and creating a case
where a node has a successor and predecessor scheduled already.
In the second case, with only 1 node-set, all the nodes in the basic block
belong to a single node-set. With the fall back, the algorithm goes bottom-up
with a single node, and it may be difficult to choose the best one to start
with. This heuristic chooses multiple nodes to start with so that bottom-up
traversal occurs as a wave over a set. That is, rather than going up, and then
down, etc. It helped to prime the traversal with multiple nodes rather than a
single node. (Unfortunately, there is no test case that fails if this heuristic
is removed).
Let me know if you have any questions. In your case, are the two new heuristics
the reason why your example doesn't schedule?
Thanks,
Brendon
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Nagurne,
James via llvm-dev
Sent: Thursday, September 9, 2021 1:12 PM
To: 'llvm-dev at lists.llvm.org' <llvm-dev at lists.llvm.org>
Cc: Jayaraj, Ajay <ajayj at ti.com>
Subject: [llvm-dev] Looking for some history on MachinePipeliner
[CAUTION: External Email]
I've been tinkering with the MachinePipeliner for my team's downstream
target, and have run into a particular issue with node ordering in the SMS
algorithm. In particular, I've noticed that in some cases, I'm receiving
an 'Invalid node order' warning in the debug. I've tracked this down
to how 'computeNodeOrder' is implemented.
computeNodeOrder's algorithm seems to be relatively untouched from the
initial commit:
https://reviews.llvm.org/D16829<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Freviews.llvm.org%2FD16829&data=04%7C01%7CBrendon.Cahoon%40amd.com%7C0e9df2f911644a1bd86008d973bd538c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637668079257051157%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=NwJ1b72vIA0GilZSKyKeN7ffDy4y2%2FYR9XBbPCAdnMA%3D&reserved=0>
However, the algorithm is very different from those found in the 3 papers upon
which the implementation is based:
https://github.com/llvm/llvm-project/blob/f40bba48a593d4297a6d0b4d1534082858b1d0ac/llvm/include/llvm/CodeGen/MachinePipeliner.h#L18<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fllvm%2Fllvm-project%2Fblob%2Ff40bba48a593d4297a6d0b4d1534082858b1d0ac%2Fllvm%2Finclude%2Fllvm%2FCodeGen%2FMachinePipeliner.h%23L18&data=04%7C01%7CBrendon.Cahoon%40amd.com%7C0e9df2f911644a1bd86008d973bd538c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637668079257061113%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PNUKsATxYSr69GCNAGQlLexEobC1gQfgxyBhQ%2FvAwc8%3D&reserved=0>
In the implementation:
https://github.com/llvm/llvm-project/blob/f40bba48a593d4297a6d0b4d1534082858b1d0ac/llvm/lib/CodeGen/MachinePipeliner.cpp#L1877<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fllvm%2Fllvm-project%2Fblob%2Ff40bba48a593d4297a6d0b4d1534082858b1d0ac%2Fllvm%2Flib%2FCodeGen%2FMachinePipeliner.cpp%23L1877&data=04%7C01%7CBrendon.Cahoon%40amd.com%7C0e9df2f911644a1bd86008d973bd538c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637668079257061113%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=%2Bygn6Cycn02tUeLps4pWF%2FNqk6J9%2BYEcso6d4IUmcW8%3D&reserved=0>
The order seems to be:
* If pred_L is a subset of the NodeSet, BottomUp and add subset to work list
* If succ_L is s subset of the NodeSet, TopDown and add subset to work list
* If intersect(succ_L, NodeSet) is non-empty, TopDown and add intersect to
work list
* If there is only 1 NodeSet, BottomUp and add nodes at the bottom of the
schedule (Succs.size() == 0) to work list
* Otherwise, BottomUp and add node with the highest ASAP to work list
In the papers (Using Tanya Lattner's master's
thesis<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fllvm.org%2Fpubs%2F2005-06-17-LattnerMSThesis.html&data=04%7C01%7CBrendon.Cahoon%40amd.com%7C0e9df2f911644a1bd86008d973bd538c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637668079257071072%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=8b8m1J797n7xRi0HyBE%2BpX%2FfJmcL2OqauVLmzVj4GEM%3D&reserved=0>
as the example), the order is:
* If intersect(pred_L, NodeSet) is non-empty, BottomUp and add subset to
work list
* If intersect(succ_L, NodeSet) is non-empty, TopDown and add subset to work
list
* Otherwise, BottomUp and add node with the highest ASAP to work list
Interestingly, when I implement the ordering from the paper, the node order for
my loop is valid.
This difference in implementation is not mentioned or commented on in the
review, so I'm wondering if there are any documents or emails I may have
missed in my search that would shed some light. Certainly, this ordering is just
a heuristic, so I can understand that the current incarnation just resulted in
better performance for the targets utilizing it, but I'm curious if there
was something more to the decision.
Regards,
J.B. Nagurne
Code Generation
Texas Instruments
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210909/da3530ca/attachment.html>