In ToT, LowerVECTOR_SHUFFLE for x86 has this code: if (X86::isUNPCKLMask(SVOp)) getTargetShuffleNode(getUNPCKLOpcode(VT) dl, VT, V1, V2, DAG); why would this not be: if (X86::isUNPCKLMask(SVOp)) return SVOp; I'm trying to add support for VUNPCKL and am getting into trouble because the existing code ends up creating: VUNPCKLPS load load which is badness come selection time. Legalize doesn't get a chance to look below the target shuffle node to see that there are two memory operands. Back in the 2.7 days, we used to just return the shuffle as is if it was already legal. Why the change to create a target node? Will there be problems if I change the code to return the shuffle as-is? Thanks! -Dave
David Greene <dag at cray.com> writes:> In ToT, LowerVECTOR_SHUFFLE for x86 has this code: > > if (X86::isUNPCKLMask(SVOp)) > getTargetShuffleNode(getUNPCKLOpcode(VT) dl, VT, V1, V2, DAG); > > why would this not be: > > if (X86::isUNPCKLMask(SVOp)) > return SVOp;Ok, I discovered that Bruno did this in revisions 112934, 112942 and 113020 but the logs don't really make clear why. I did figure out that I needed new SDNode defs for VUNPCKLPSY and VUNPCKLPDY and corresponding patterns. Once I added them everything started working. I found this all very confusing because it appears there are now two ways to match certain shuffle instructions in .td files: one through the traditional shuffle operators like unpckl and shufp and another through these special X86* operators. This is reflected in X86InstrSSE.td: "Traditional": defm VUNPCKLPS: sse12_unpack_interleave<0x14, unpckl, v4f32, memopv4f32, VR128, f128mem, "unpcklps\t{$src2, $src1, $dst|$dst, $src1, $src2}", SSEPackedSingle>, VEX_4V; "New-style": def : Pat<(v4f32 (X86Unpcklps VR128:$src1, (memopv4f32 addr:$src2))), (VUNPCKLPSrm VR128:$src1, addr:$src2)>, Requires<[HasAVX]>; I think these are basically the same pattern. What's the purpose of these special operators and patterns? -Dave
It is inefficient and error-prone to recognize legal shuffles and then have isel repeat the process. For example, if the DAG combiner changes a shuffle in between legalization and isel, it may stop being legal and break isel. By legalizing to target-specific DAG nodes, we avoid that possibility and also make it much easier to match the shuffles during isel. On Feb 25, 2011, at 6:01 PM, David A. Greene wrote:> David Greene <dag at cray.com> writes: > >> In ToT, LowerVECTOR_SHUFFLE for x86 has this code: >> >> if (X86::isUNPCKLMask(SVOp)) >> getTargetShuffleNode(getUNPCKLOpcode(VT) dl, VT, V1, V2, DAG); >> >> why would this not be: >> >> if (X86::isUNPCKLMask(SVOp)) >> return SVOp; > > Ok, I discovered that Bruno did this in revisions 112934, 112942 and > 113020 but the logs don't really make clear why. I did figure out that > I needed new SDNode defs for VUNPCKLPSY and VUNPCKLPDY and corresponding > patterns. Once I added them everything started working. > > I found this all very confusing because it appears there are now two > ways to match certain shuffle instructions in .td files: one through the > traditional shuffle operators like unpckl and shufp and another through > these special X86* operators. > > This is reflected in X86InstrSSE.td: > > "Traditional": > > defm VUNPCKLPS: sse12_unpack_interleave<0x14, unpckl, v4f32, memopv4f32, > VR128, f128mem, "unpcklps\t{$src2, $src1, $dst|$dst, $src1, $src2}", > SSEPackedSingle>, VEX_4V; > "New-style": > > def : Pat<(v4f32 (X86Unpcklps VR128:$src1, (memopv4f32 addr:$src2))), > (VUNPCKLPSrm VR128:$src1, addr:$src2)>, Requires<[HasAVX]>; > > I think these are basically the same pattern. > > What's the purpose of these special operators and patterns? > > -Dave > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Apparently Analagous Threads
- [LLVMdev] X86 LowerVECTOR_SHUFFLE Question
- [LLVMdev] X86 LowerVECTOR_SHUFFLE Question
- [LLVMdev] RFC: AVX Pattern Specification [LONG]
- [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
- [LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits