Jesper Antonsson via llvm-dev
2018-Sep-12 12:58 UTC
[llvm-dev] Reassociate lose parallelism
I compile this very simple C program:

    #define T unsigned
    T foo(T a, T b, T c, T d) { return (a+b)+(c+d); }

Before reassociate, the first two adds in the IR are independent of each other, so they can be executed in parallel:

    entry:
      %add = add i16 %a, %b
      %add1 = add i16 %c, %d
      %add2 = add i16 %add, %add1
      ret i16 %add2

After reassociate, the adds have been serialized:

    entry:
      %add1 = add i16 %b, %a
      %add = add i16 %add1, %c
      %add2 = add i16 %add, %d
      ret i16 %add2

It seems to me that RewriteExprTree() does this, and it contains this comment:

    // Not the last operation. The left-hand side will be a sub-expression
    // while the right-hand side will be the current element of Ops.

So I gather the serialization is a result of this algorithm. Now, my question is whether the reassociate pass is supposed to care about the depth of expression trees, or whether a conscious tradeoff has been made not to care.

(I made a quick hack to bail out of RewriteExprTree() if the depth of the original expression would increase. The hack kicked in a few times in our benchmark suite, with a clear improvement in one benchmark, and another benchmark being better in unweighted cycles but worse in loop-weighted cycles.)

Regards,
Jesper
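To make the depth concern in the message concrete, here is a minimal sketch in C. It is not LLVM code and not the hack mentioned above (which the message does not show). All function names are hypothetical, and the bail-out check only assumes that the rewritten expression is a left-leaning chain, as in the IR after reassociate.

```c
#include <assert.h>

/* Shape before reassociate in the example: (a+b) and (c+d) are independent. */
unsigned foo_balanced(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (a + b) + (c + d);
}

/* Shape after reassociate: every add depends on the previous one. */
unsigned foo_serial(unsigned a, unsigned b, unsigned c, unsigned d) {
    return ((b + a) + c) + d;
}

/* Critical path, in adds, of a left-leaning chain over n operands. */
unsigned chain_depth(unsigned n) {
    assert(n >= 1);
    return n - 1;
}

/* Critical path, in adds, of a balanced tree over n operands: ceil(log2(n)). */
unsigned balanced_depth(unsigned n) {
    assert(n >= 1);
    unsigned depth = 0;
    while (n > 1) {
        n = (n + 1) / 2; /* one level of pairwise adds halves the operand count */
        depth++;
    }
    return depth;
}

/* Hypothetical bail-out test: nonzero if rewriting num_operands operands
 * as a chain would be deeper than the original expression. */
int rewrite_increases_depth(unsigned original_depth, unsigned num_operands) {
    return chain_depth(num_operands) > original_depth;
}
```

Both shapes compute the same value because unsigned addition wraps and is associative, which is why the rewrite is legal. For the four-operand example, the original tree has depth 2 while the chain has depth 3, so rewrite_increases_depth(2, 4) reports that the rewrite would lengthen the critical path.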