Björn Pettersson A via llvm-dev
2022-Jan-25 20:04 UTC
[llvm-dev] DAGCombiner, chains and mergeConsecutiveStores?
Hi llvm-dev, We got the following problem in our OOT target. During DAG combining we have a DAG like this: SelectionDAG has 16 nodes: t0: ch = EntryToken t4: i16 = add nuw GlobalAddress:i16<%str0* @g0> 0, Constant:i16<2> t7: ch = store<(store (s32) into `i32* getelementptr inbounds (%str0, %str0* @g0, i32 0, i32 1)`, align 1, !tbaa !1)> t0, Constant:i32<0>, t4, undef:i16 t8: i16,ch = load<(load (s16) from `%str1** undef`)> t7, undef:i16, undef:i16 t9: i16 = add nuw t8, Constant:i16<2> t10: i16,ch = load<(load (s16) from %ir.next1.i)> t7, t9, undef:i16 t11: i32,ch = load<(load (s32) from %ir.val.i1, align 1, !tbaa !7)> t7, t10, undef:i16 t17: ch = store<(store (s32) into `i32* getelementptr inbounds (%str0, %str0* @g0, i32 0, i32 0)`, align 1)> t11:1, Constant:i32<0>, GlobalAddress:i16<%str0* @g0> 0, undef:i16 t19: ch = TokenFactor t17, t8:1, t10:1 t15: ch,glue = CopyToReg t19, Register:i32 $a0_32, t11 t16: ch = PHXISD::RETURN t15, Register:i32 $a0_32, t15:1 Then when visiting the "load (s32)" load a better chain is found and a few combines later we get this DAG instead: SelectionDAG has 16 nodes: t0: ch = EntryToken t4: i16 = add nuw GlobalAddress:i16<%str0* @g0> 0, Constant:i16<2> t7: ch = store<(store (s32) into `i32* getelementptr inbounds (%str0, %str0* @g0, i32 0, i32 1)`, align 1, !tbaa !1)> t0, Constant:i32<0>, t4, undef:i16 t8: i16,ch = load<(load (s16) from `%str1** undef`)> t7, undef:i16, undef:i16 t9: i16 = add nuw t8, Constant:i16<2> t10: i16,ch = load<(load (s16) from %ir.next1.i)> t7, t9, undef:i16 t22: ch = store<(store (s32) into `i32* getelementptr inbounds (%str0, %str0* @g0, i32 0, i32 0)`, align 1)> t20:1, Constant:i32<0>, GlobalAddress:i16<%str0* @g0> 0, undef:i16 t25: ch = TokenFactor t8:1, t10:1, t22 t15: ch,glue = CopyToReg t25, Register:i32 $a0_32, t20 t20: i32,ch = load<(load (s32) from %ir.val.i1, align 1, !tbaa !7)> t0, t10, undef:i16 t16: ch = PHXISD::RETURN t15, Register:i32 $a0_32, t15:1 Notice how the chain for the "load (s32)" now points at the entry token. I figure TBAA indicated that there was no alias with the store so the we did not need the chain dependency between that load and the t7 store. However, there is still an ordering between the store and the load, since the t20 load use t10 as a pointer, and t10 has a chain dependency to t7. My questions is: Is it correct/legal to update the chain like that, or is the chain supposed to show the memory ordering (without the need to also analyze the mix of dependencies given both other operands and the chain)? If the above is a correct transform, then I think we got a problem with mergeConsecutiveStores (that we started to see downstream after https://reviews.llvm.org/D116895). When finding the merge candidates (getStoreMergeCandidates) only chains are followed, not considering other operand dependencies. And later when making sure we do not cause cycles in the DAG (in checkMergeStoreCandidatesForDependencies) it is assumed that chains already have been analyzed, so at that stage only other operands are analyzed and chains are skipped. Given the example above we have: t22 depends on t20 (chain dep) t20 depends on t10 (non-chain dep) t10 depends on t7 (chain dep) and t9+t8 (via non-chain dep) t7 depends on t0 (chain dep) Merging the t7 and t22 stores would result in a cycle. To figure that out I think one need to analyze both chain and non-chain dependency at the same time, not in two separate analyses. The particular problem we see can be solved by letting checkMergeStoreCandidatesForDependencies also analyse the chain operands. That of course depends on the legality of the chain rewrites that I asked questions about above (but if that transform isn't legal then I figure that checkMergeStoreCandidatesForDependencies would be kind of redundant). /Björn