Ramanarayanan, Ramshankar
2015-Jan-15 21:06 UTC
[LLVMdev] proof of concept for a loop fusion pass (WIP)
Hi,
We are proposing a loop fusion pass that tries to proactive fuse loops across
function call boundaries and arbitrary control flow.
http://reviews.llvm.org/D7008
With this pass, we get 103 loop fusions in SPECCPU INT 2006 462.libquantum with
rate performance improving close to 2.5X in x86 (results from AMD A10-6700).
I took some liberties in patching up some of the code in
ScalarEvolution/DependenceAnalysis/GlobalsModRef/Clang options/ and also
adjusted the IPO/LTO pass managers. I would need to do a better job there but I
would like to put this out as WIP for high level reviews. At this time standard
loop fusion test cases may not work (cases where control dep are same or loops
are truly adjacent etc). I would be doing that as well.
Appreciate suggestions and other help.
The pass initially attempts to inline calls which have loops in them. Following
inlining, a separate scalar "main" pass tries to fuse loops. It is not
necessary for loops to be adjacent or have the same control dependence.
Specifically a control dependence closure is computed and loops that share a
control dep closure prefix are taken as candidates for fusion, in case there are
no other loops in a certain path between them. A loop graph is built with edges
between loops which share the control dependence closure prefix. A recursive
application of fusion follows on the graph from "bottom-up".
During fusion, program slices are taken based on the control flow closures of
the two loops being fused. Example: Suppose two loops A and B are going to be
fused. They share the same control dependence prefix, but their suffices vary.
The suffices are used to duplicate Control flow paths that pass through both
loops. Specifically paths from the first block in the control-dep closure suffix
of the first loop, through the first loop's exit and following into the
suffix of the second loop through the second loop's exit on to the common
post-dominator of either loop's exit. The number of paths grows
exponentially. At this time some heuristics are used when exploring for paths.
Using profile based heuristics is a better idea.
Also function unswitching helps by cleaning up control flow.
Example:
if (a & b) {
if (a & c) {
if (t & 0xF) {
for (i=0; i < size; i++) {
if (A[i] & d) {
A[i] |= (1 << t);
}
}
}
}
}
if (b & d) {
if (a & d) {
if (t & 0xA) {
for (i=0; i < size; i++) {
if (A[i] & d) {
A[i] |= (1 << t);
}
}
}
}
}
After fusion we will have:
if (a&b && a&c && t&0xF && b&d
&& a&d t&0xA) {
for (i=0; i < size; i++) {
if (A[i] & d) {
A[i] |= (1 << t);
}
if (A[i] & d) {
A[i] |= (1 << t);
}
}
} else {
// original code
if (a & b) {
if (a & c) {
if (t & 0xF) {
for (i=0; i < size; i++) {
if (A[i] & d) {
A[i] |= (1 << t);
}
}
}
}
}
if (b & d) {
if (a & d) {
if (t & 0xA) {
for (i=0; i < size; i++) {
if (A[i] & d) {
A[i] |= (1 << t);
}
}
}
}
}
}
Thanks,
Ramshankar
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150115/03b2e79d/attachment.html>