David Goldblatt via llvm-dev
2021-Jun-02 02:56 UTC
[llvm-dev] Improving codegen for not-quite-constant-folding
Hi all, Some Facebook colleagues and I are looking at poor codegen for the C++20 spaceship operator (i.e. operator <=>, which performs 3-way comparisons), when used to implement simple comparisons (e.g. operator<). A simple example is at https://gcc.godbolt.org/z/qhn6cf7nv, which desugars into something like https://gcc.godbolt.org/z/4z4e1jb6T. By comparison, a hand-written operator< would probably look like one of https://gcc.godbolt.org/z/zn1en8fGW, or https://gcc.godbolt.org/z/6oY5exTa8, or https://gcc.godbolt.org/z/j6734zWYz. A reduced example with the same issue is here: https://gcc.godbolt.org/z/6E1KxE83x. Broadly, a tree of select/phi nodes with constants at the leaves isn't subject to constant folding, even though the reduced example could be rewritten as something like: https://gcc.godbolt.org/z/356qr6rKK (this saves a multiply, and in "real" examples enables further simplifications). GCC's GVN/PRE pass does a better job here with -O2 -ftree-partial-pre ("partial pre" is not a typo; it's "partial partial redundancy elimination", in addition to regular partial redundancy elimination), or -O3 (which turns on -ftree-partial-pre). -ftree-partial-pre tracks partial anticipation of expressions (those that are evaluated on some path to exit) on a per-BB level, and inserts the necessary phis to ensure that a partially anticipated expression is available in a temporary if temporaries with the same value number are available in all a BB's predecessors. It does constant folding after phi translation before checking for availability, and regards all constants as always available. After phi-translating the result into each branch of the ternary operators, we get constants, so the optimization kicks in and we can replace each leaf constant with the result of multiplying it by 123. A simpler approach that could catch cases like this would be to find instances of trees in which the leaves are constants, and the non-leaf nodes are phis or selects. If the root of such a tree is the input into an operation we could otherwise constant fold (if that root were a constant), we constant-fold each leaf of the tree instead. This would miss some optimization opportunities that the GCC approach would catch, however. The former feels tricky to implement on top of GVN or NewGVN (and, it's unclear that it's always a net improvement). The latter seems reasonable, although it would require a one-off pass targeted at this relatively narrow case, which feels heavyweight. My inclination is to attempt the simpler approach, but wanted a sanity check that this seems reasonable and is not better done elsewhere (or with a different approach entirely). Best, David
Philip Reames via llvm-dev
2021-Jul-07 17:26 UTC
[llvm-dev] Improving codegen for not-quite-constant-folding
There's a bit of prior work in this area inspired by the java icmp and lcmp bytecode semantics which are fairly similar to C++ space ship operators. See matchThreeWayIntCompare in InstCombine. This recognizes the select form after all the control flow has been eliminated. We relied on jump threading and simplify cfg to eliminate the control flow, so that doesn't do anything for the question you posed. Your example also involves multiple element comparisons which complicates the matching significantly. (I suspect you know this, just stating it for the record.) On the partial PRE, fitting that into the existing logic in GVN wouldn't be terrible. However, I really question the profitability of this transform in general. It's effectively hoisting an instruction into path where it didn't previously execute. Doing that without regards to relative frequencies can be quite painful. You could also try to PRE the condition back into the predecessors. We explicitly avoid that today due to profitability problems, but maybe you can find a sub-case which is profitable and enable it selectively? Another observation is that we appear to not be canonicalizing the operands of the selects. In the optimized IR, I see: %10 = icmp eq i64 %1, %3, !dbg !42 %11 = icmp ult i64 %1, %3, !dbg !44 %12 = select i1 %11, i8 -1, i8 1, !dbg !44 %13 = select i1 %10, i8 0, i8 %12, !dbg !44 br label %14, !dbg !44 14: ; preds = %6, %9 %15 = phi i8 [ %8, %6 ], [ %13, %9 ], !dbg !33 %16 = icmp slt i8 %15, 0, !dbg !45 What's interesting here is that 1 and 0 are indistinguishable. Not sure what would happen if we rewrote the i8 1 to be i8 0, but maybe that unlocks some progress? Actually, I think it does. Once that's done, anding the two conditions becomes more obviously profitable. Maybe we could back propagate constant ranges and canonicalize constants in some way? (Actually, now that I write that, why can't simplify demanded bits do exactly that?!) Philip On 6/1/21 7:56 PM, David Goldblatt via llvm-dev wrote:> Hi all, > > Some Facebook colleagues and I are looking at poor codegen for the > C++20 spaceship operator (i.e. operator <=>, which performs 3-way > comparisons), when used to implement simple comparisons (e.g. > operator<). A simple example is at > https://gcc.godbolt.org/z/qhn6cf7nv, which desugars into something > like https://gcc.godbolt.org/z/4z4e1jb6T. By comparison, a > hand-written operator< would probably look like one of > https://gcc.godbolt.org/z/zn1en8fGW, or > https://gcc.godbolt.org/z/6oY5exTa8, or > https://gcc.godbolt.org/z/j6734zWYz. > > A reduced example with the same issue is here: > https://gcc.godbolt.org/z/6E1KxE83x. Broadly, a tree of select/phi > nodes with constants at the leaves isn't subject to constant folding, > even though the reduced example could be rewritten as something like: > https://gcc.godbolt.org/z/356qr6rKK (this saves a multiply, and in > "real" examples enables further simplifications). > > GCC's GVN/PRE pass does a better job here with -O2 -ftree-partial-pre > ("partial pre" is not a typo; it's "partial partial redundancy > elimination", in addition to regular partial redundancy elimination), > or -O3 (which turns on -ftree-partial-pre). -ftree-partial-pre tracks > partial anticipation of expressions (those that are evaluated on some > path to exit) on a per-BB level, and inserts the necessary phis to > ensure that a partially anticipated expression is available in a > temporary if temporaries with the same value number are available in > all a BB's predecessors. It does constant folding after phi > translation before checking for availability, and regards all > constants as always available. After phi-translating the result into > each branch of the ternary operators, we get constants, so the > optimization kicks in and we can replace each leaf constant with the > result of multiplying it by 123. > > A simpler approach that could catch cases like this would be to find > instances of trees in which the leaves are constants, and the non-leaf > nodes are phis or selects. If the root of such a tree is the input > into an operation we could otherwise constant fold (if that root were > a constant), we constant-fold each leaf of the tree instead. This > would miss some optimization opportunities that the GCC approach would > catch, however. > > The former feels tricky to implement on top of GVN or NewGVN (and, > it's unclear that it's always a net improvement). The latter seems > reasonable, although it would require a one-off pass targeted at this > relatively narrow case, which feels heavyweight. My inclination is to > attempt the simpler approach, but wanted a sanity check that this > seems reasonable and is not better done elsewhere (or with a different > approach entirely). > > Best, > David > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev