How should I go about matching floating-point addsub-like vector instructions? My first inclination is to write something which matches build_vector 1.0, -1.0, and then use that in combination with a match on fadd, but that does not seem to work. I think this is because BUILD_VECTOR cannot ever be "Legal", and so it is always turned into a constant load before instruction selection. Specifically, in SDValue SelectionDAGLegalize::LegalizeOp(SDValue Op): case ISD::BUILD_VECTOR: // A weird case: legalization for BUILD_VECTOR never legalizes the // operands! // FIXME: This really sucks... changing it isn't semantically incorrect, // but it massively pessimizes the code for floating-point BUILD_VECTORs // because ConstantFP operands get legalized into constant pool loads // before the BUILD_VECTOR code can see them. It doesn't usually bite, // though, because BUILD_VECTORS usually get lowered into other nodes // which get legalized properly. SimpleFinishLegalizing = false; break; and then: case ISD::BUILD_VECTOR: switch (TLI.getOperationAction(ISD::BUILD_VECTOR, Node->getValueType(0))) { default: assert(0 && "This action is not supported yet!"); case TargetLowering::Custom: Tmp3 = TLI.LowerOperation(Result, DAG); if (Tmp3.getNode()) { Result = Tmp3; break; } // FALLTHROUGH case TargetLowering::Expand: Result = ExpandBUILD_VECTOR(Result.getNode()); break; } break; (so there is not even branch for TargetLowering::Legal). Maybe I'm just missing something. Thanks in advance, Hal -- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory
On Oct 17, 2011, at 3:40 PM, Hal Finkel wrote:> How should I go about matching floating-point addsub-like vector > instructions? My first inclination is to write something which matches > build_vector 1.0, -1.0, and then use that in combination with a match on > fadd, but that does not seem to work. I think this is because > BUILD_VECTOR cannot ever be "Legal", and so it is always turned into a > constant load before instruction selection.Trying to keep a vector producer this naive about the target instruction set is awkward. If the vector producer doesn't know whether the target has an addsub or subadd instruction, how is it to know the best way to expand complex multiplication? It's likely to get suboptimal code in many cases. On the other hand, a producer that knows that the target has certain instructions could pretty easily just use appropriate intrinsics that can be mapped directly to the desired instructions. This way, there's no need for it to play pictionary with the backend, drawing out its desired semantics in terms of primitive operations and expecting codegen to rediscover what was meant by pattern-matching. Dan
On Mon, 2011-10-17 at 17:33 -0700, Dan Gohman wrote:> On Oct 17, 2011, at 3:40 PM, Hal Finkel wrote: > > > How should I go about matching floating-point addsub-like vector > > instructions? My first inclination is to write something which matches > > build_vector 1.0, -1.0, and then use that in combination with a match on > > fadd, but that does not seem to work. I think this is because > > BUILD_VECTOR cannot ever be "Legal", and so it is always turned into a > > constant load before instruction selection. > > Trying to keep a vector producer this naive about the target instruction > set is awkward. If the vector producer doesn't know whether the target > has an addsub or subadd instruction, how is it to know the best way to > expand complex multiplication? It's likely to get suboptimal code in many > cases. > > On the other hand, a producer that knows that the target has certain > instructions could pretty easily just use appropriate intrinsics that can > be mapped directly to the desired instructions. This way, there's no need > for it to play pictionary with the backend, drawing out its desired > semantics in terms of primitive operations and expecting codegen to > rediscover what was meant by pattern-matching. >In general, I agree with you. It is always questionable how far to attempt to go with idiom recognition. However, some kind of combination add/subtract is a common vector operation, and so I seem to be left with three choices: 1. Decompose the operation and then attempt to recognize the decomposition. 2. Add an additional LLVM instruction. 3. Add a number of target-specific special cases into the higher-level code. I am not sure which is better, but I'd prefer to stay away from choice (3) as much as practical in favor of one of the first two options. Would you support adding some kind of vector_faddsub LLVM instruction? Also, there is a precedent, in some sense, for choices (1) and (2) in how fneg %a is serialized (as fsub -0.0, %a). Perhaps we could recognize fadd (fmul <-1.0, 1.0>), %b and turn it into something else for instruction selection in a similar way. -Hal> Dan >-- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory
Hi Hal, you should probably add a target specific DAG combine that synthesizes the appropriate target instruction. This is how I handled x86 horizontal add (see the FHADD X86 opcode). If it turns out that the same thing is useful for other targets then it can be generalized later. Ciao, Duncan. On 10/18/11 00:40, Hal Finkel wrote:> How should I go about matching floating-point addsub-like vector > instructions? My first inclination is to write something which matches > build_vector 1.0, -1.0, and then use that in combination with a match on > fadd, but that does not seem to work. I think this is because > BUILD_VECTOR cannot ever be "Legal", and so it is always turned into a > constant load before instruction selection. > > Specifically, in SDValue SelectionDAGLegalize::LegalizeOp(SDValue Op): > > case ISD::BUILD_VECTOR: > // A weird case: legalization for BUILD_VECTOR never legalizes the > // operands! > // FIXME: This really sucks... changing it isn't semantically incorrect, > // but it massively pessimizes the code for floating-point BUILD_VECTORs > // because ConstantFP operands get legalized into constant pool loads > // before the BUILD_VECTOR code can see them. It doesn't usually bite, > // though, because BUILD_VECTORS usually get lowered into other nodes > // which get legalized properly. > SimpleFinishLegalizing = false; > break; > > and then: > > case ISD::BUILD_VECTOR: > switch (TLI.getOperationAction(ISD::BUILD_VECTOR, > Node->getValueType(0))) { > default: assert(0&& "This action is not supported yet!"); > case TargetLowering::Custom: > Tmp3 = TLI.LowerOperation(Result, DAG); > if (Tmp3.getNode()) { > Result = Tmp3; > break; > } > // FALLTHROUGH > case TargetLowering::Expand: > Result = ExpandBUILD_VECTOR(Result.getNode()); > break; > } > break; > (so there is not even branch for TargetLowering::Legal). Maybe I'm just > missing something. > > Thanks in advance, > Hal >
Duncan, Thanks for the suggestion; I'll look at it. -Hal On Tue, 2011-10-18 at 10:15 +0200, Duncan Sands wrote:> Hi Hal, you should probably add a target specific DAG combine that > synthesizes the appropriate target instruction. This is how I > handled x86 horizontal add (see the FHADD X86 opcode). If it turns > out that the same thing is useful for other targets then it can be > generalized later. > > Ciao, Duncan. > > On 10/18/11 00:40, Hal Finkel wrote: > > How should I go about matching floating-point addsub-like vector > > instructions? My first inclination is to write something which matches > > build_vector 1.0, -1.0, and then use that in combination with a match on > > fadd, but that does not seem to work. I think this is because > > BUILD_VECTOR cannot ever be "Legal", and so it is always turned into a > > constant load before instruction selection. > > > > Specifically, in SDValue SelectionDAGLegalize::LegalizeOp(SDValue Op): > > > > case ISD::BUILD_VECTOR: > > // A weird case: legalization for BUILD_VECTOR never legalizes the > > // operands! > > // FIXME: This really sucks... changing it isn't semantically incorrect, > > // but it massively pessimizes the code for floating-point BUILD_VECTORs > > // because ConstantFP operands get legalized into constant pool loads > > // before the BUILD_VECTOR code can see them. It doesn't usually bite, > > // though, because BUILD_VECTORS usually get lowered into other nodes > > // which get legalized properly. > > SimpleFinishLegalizing = false; > > break; > > > > and then: > > > > case ISD::BUILD_VECTOR: > > switch (TLI.getOperationAction(ISD::BUILD_VECTOR, > > Node->getValueType(0))) { > > default: assert(0&& "This action is not supported yet!"); > > case TargetLowering::Custom: > > Tmp3 = TLI.LowerOperation(Result, DAG); > > if (Tmp3.getNode()) { > > Result = Tmp3; > > break; > > } > > // FALLTHROUGH > > case TargetLowering::Expand: > > Result = ExpandBUILD_VECTOR(Result.getNode()); > > break; > > } > > break; > > (so there is not even branch for TargetLowering::Legal). Maybe I'm just > > missing something. > > > > Thanks in advance, > > Hal > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory 1-630-252-0023 hfinkel at anl.gov