Jeff Bush
2013-Jul-01 06:14 UTC
[LLVMdev] Convert the result of a vector comparison into a scalar bit mask?
When LLVM does a comparison of two vectors, in this case with 16 elements, the returned type of setcc is v16i1. The architecture I'm targeting allows storing the result of a vector comparison as a bit mask in a scalar register, but I'm having trouble converting the result of setcc into a value that is usable there. For example, if I try to AND together masks that are the results of two comparisons, instruction selection fails because the operand types are v16i1 and no instruction can deal with that type. I don't want to have to modify every instruction to be aware of v16i1 as a data type (which doesn't seem right anyway). Ideally, I could just tell the backend to treat the result of a vector setcc as an i32. I've tried a number of things, including:

- Using setOperationAction for SETCC with Promote, setting the promoted type to i32. This asserts internally because the legalizer tries to do a sext operation on the result, which is incompatible.

- Using a custom lowering action to wrap the setcc in a combination of BITCAST/ZERO_EXTEND nodes (which I could match and eliminate in the instruction pattern). However, those DAG nodes get removed during one of the passes, and the result type is still v16i1.

So, my question is: what is the proper way to convert the result of a vector comparison into a scalar bit mask?
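For reference, the first attempt was roughly this, in my TargetLowering subclass's constructor (a trimmed sketch, not the exact code):

    // Sketch: ask the legalizer to promote the v16i1 setcc result to
    // i32. This is the variant that asserts, since integer promotion
    // sign-extends the value rather than packing a vector of bits
    // into a scalar.
    setOperationAction(ISD::SETCC, MVT::v16i1, Promote);
    AddPromotedToType(ISD::SETCC, MVT::v16i1, MVT::i32);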
Jeff Bush
2013-Jul-02 16:12 UTC
[LLVMdev] Convert the result of a vector comparison into a scalar bit mask?
On Sun, Jun 30, 2013 at 11:14 PM, Jeff Bush <jeffbush001 at gmail.com> wrote:
> When LLVM does a comparison of two vectors, in this case with 16
> elements, the returned type of setcc is v16i1. The architecture I'm
> targeting allows storing the result of a vector comparison as a bit
> mask in a scalar register, but I'm having trouble converting the
> result of setcc into a value that is usable there. For example, if I
> try to AND together masks that are the results of two comparisons,
> instruction selection fails because the operand types are v16i1 and
> no instruction can deal with that type. I don't want to have to
> modify every instruction to be aware of v16i1 as a data type (which
> doesn't seem right anyway). Ideally, I could just tell the backend
> to treat the result of a vector setcc as an i32. I've tried a number
> of things, including:
>
> - Using setOperationAction for SETCC with Promote, setting the
>   promoted type to i32. This asserts internally because the
>   legalizer tries to do a sext operation on the result, which is
>   incompatible.
>
> - Using a custom lowering action to wrap the setcc in a combination
>   of BITCAST/ZERO_EXTEND nodes (which I could match and eliminate in
>   the instruction pattern). However, those DAG nodes get removed
>   during one of the passes, and the result type is still v16i1.

After some thought, I realize that the second approach doesn't work because the operation would be applied to each element in the vector (thus the result is still a vector), and there doesn't appear to be a promotion type that will pack a vector. I tried adding a lowering that transforms SETCC into a custom node that returns a scalar:

    SDValue VectorProcTargetLowering::LowerSETCC(SDValue Op,
                                                 SelectionDAG &DAG) const {
      return DAG.getNode(SPISD::VECTOR_COMPARE, Op.getDebugLoc(), MVT::i32,
                         Op.getOperand(0), Op.getOperand(1),
                         Op.getOperand(2));
    }

    def veccmp : SDNode<"SPISD::VECTOR_COMPARE",
      SDTypeProfile<1, 1, [SDTCisInt<0>, SDTCisSameAs<1, 2>,
                           SDTCisVec<1>]>>;

And changing the pattern that matches vector comparisons to:

    [(set i32:$dst, (veccmp v16i32:$a, v16i32:$b, condition))]

Unfortunately, this ends up tripping an assert:

    Assertion failed: (Op.getValueType().getScalarType().getSizeInBits()
    == BitWidth && "Mask size mismatches value type size!"), function
    SimplifyDemandedBits, file
    llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp, line 357.

(Another variant of this would be to keep the setcc and wrap it with a custom node 'PACK_VECTOR' that takes v16i1 as a parameter and returns i32 as a result.)

I'm not sure if I'm on the right track with this approach or not. Since I'm exposing this with a builtin in clang anyway (there is no other way to do this in C that I know of), I could just punt entirely and use an intrinsic to expose packed vector comparisons. But that doesn't seem like the right thing to do.
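One more observation, rereading the veccmp definition above (this may or may not be related to the assert): the SDTypeProfile declares a single operand, but the node is created with three and the pattern matches three. The stock SDTSetCC profile in TargetSelectionDAG.td declares three operands and types the condition code as OtherVT, so presumably the definition should look more like:

    // Sketch: 1 result, 3 operands (lhs, rhs, condition code),
    // mirroring SDTSetCC but with vector comparison operands.
    def veccmp : SDNode<"SPISD::VECTOR_COMPARE",
      SDTypeProfile<1, 3, [SDTCisInt<0>, SDTCisSameAs<1, 2>,
                           SDTCisVec<1>, SDTCisVT<3, OtherVT>]>>;

Whether that alone makes SimplifyDemandedBits happy I don't know, but the operand-count mismatch between the profile and the pattern seems worth ruling out first.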