Björn Pettersson A via llvm-dev
2021-Dec-14 17:56 UTC
[llvm-dev] Shouldn't DAG combines be applied in topological order?
I've been trying to understand why I do not get the expected output in some
lit tests.
After some debugging I found out that DAG combines aren't always applied in
topological order. Making it hard to predict the result, and I figure also
making it hard to get a specific behavior.
Consider the example here: https://godbolt.org/z/x8PEhsnqf
The debug log show this:
.... cut ....
Legalized selection DAG: %bb.0 'test:'
SelectionDAG has 15 nodes:
t0: ch = EntryToken
t2: i32,ch = CopyFromReg t0, Register:i32 %0
t3: i16 = truncate t2
t5: i32,ch = CopyFromReg t0, Register:i32 %1
t6: i16 = truncate t5
t17: i16 = X86ISD::FSHL t3, t6, Constant:i8<6>
t10: i16 = and t17, Constant:i16<-32>
t13: ch,glue = CopyToReg t0, Register:i16 $ax, t10
t14: ch = X86ISD::RET_FLAG t13, TargetConstant:i32<0>, Register:i16 $ax,
t13:1
Combining: t17: i16 = X86ISD::FSHL t3, t6, Constant:i8<6>
Combining: t15: i8 = Constant<6>
Combining: t14: ch = X86ISD::RET_FLAG t13, TargetConstant:i32<0>,
Register:i16 $ax, t13:1
Combining: t13: ch,glue = CopyToReg t0, Register:i16 $ax, t10
Combining: t10: i16 = and t17, Constant:i16<-32>
.... cut ....
So when the DAG combiner starts to combine things after having legalized the
fshl (in this case into using X86ISD::FSHL) it will do combine on X86ISD::FSHL
before doing combine on the AND node that is using the result of the funnel
shift.
In my target the fshl expanded to a pattern involving and OR operation, and I
wanted to do trigger a fold based on the outer AND, but instead the OR is
combined first, resulting in a pattern that is much harder to combine when
eventually triggering the DAG combine on the AND.
I'm well aware that I probably need/want to implement that more complicated
fold as well to be less depending on how the input to ISel looks like (and in
which order nodes are legalized/combined), but the result here was a bit
unexpected to me.
If I insert a call to
DAG.AssignTopologicalOrder();
in the beginning of DAGCombiner::Run I get the result I want. And I also noticed
up to 7% speedup in some downstream benchmarks.
However, I guess such a change might impact compile time slightly. And lots of
upstream lit tests would need to have their test checks updated.
Is it well known that we do not sort the worklist at the beginning of the
DAGCombiner::Run, or should this be considered as a bug?
Regards,
Björn