Alexandros Lamprineas via llvm-dev
2018-Apr-26 13:51 UTC
[llvm-dev] [Constant Folder, InstCombine, ARM, AArch64] Question about constant folding of vector load
Hello,

There is a particular code sequence I would like to optimize at the IR level. I'd like to turn an Arm/AArch64 table lookup intrinsic that takes a constant vector mask into a shufflevector instruction:

vtbl1(V,mask) ~> shufflevector(V,undef,mask)

The reason is that if the mask is {7,6,5,4,3,2,1,0}, the backend will generate rev64 instructions instead.

If the mask comes from a vld1 of a global constant, I could constant-fold the load to allow the above instruction combining.

My question is: does constant folding of vld1 seem a good thing to do in the general case, as a standalone transformation, or only when its result is used as a mask for a table lookup?

Alexandros
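A minimal portable C sketch of the semantics involved, assuming an 8 x i8 table and mask. The helper names (`vtbl1_model`, `shuffle_model`, `can_fold_tbl1_to_shuffle`, `is_rev64_mask`) are hypothetical scalar models, not LLVM or arm_neon.h APIs. The point it illustrates: VTBL1 writes 0 for any index of 8 or more, whereas shufflevector(V,undef,mask) would select from the undef operand for indices 8 to 15, so the rewrite only preserves the result when every mask index is below 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of VTBL1 on 8 x i8: each result byte is table[idx],
   or 0 when the index is out of range (>= 8). */
static void vtbl1_model(const uint8_t table[8], const uint8_t idx[8],
                        uint8_t out[8]) {
    for (int i = 0; i < 8; ++i)
        out[i] = idx[i] < 8 ? table[idx[i]] : 0;
}

/* Model of shufflevector(V, undef, mask) for masks that only select
   from V; indices 8..15 would select the undef operand, so this model
   requires every index to be in range. */
static void shuffle_model(const uint8_t v[8], const uint8_t mask[8],
                          uint8_t out[8]) {
    for (int i = 0; i < 8; ++i) {
        assert(mask[i] < 8);
        out[i] = v[mask[i]];
    }
}

/* The rewrite keeps VTBL1 semantics only if no index is out of range. */
static bool can_fold_tbl1_to_shuffle(const uint8_t mask[8]) {
    for (int i = 0; i < 8; ++i)
        if (mask[i] >= 8)
            return false;
    return true;
}

/* {7,6,5,4,3,2,1,0} reverses the bytes of the 64-bit vector, the
   pattern the backend can select as rev64. */
static bool is_rev64_mask(const uint8_t mask[8]) {
    static const uint8_t rev[8] = {7, 6, 5, 4, 3, 2, 1, 0};
    return memcmp(mask, rev, sizeof rev) == 0;
}
```

With the reversing mask, both models produce V with its bytes reversed; with a mask containing 8, VTBL1 yields 0 in that lane and the guard rejects the fold.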
Friedman, Eli via llvm-dev
2018-Apr-26 19:22 UTC
[llvm-dev] [Constant Folder, InstCombine, ARM, AArch64] Question about constant folding of vector load
On 4/26/2018 6:51 AM, Alexandros Lamprineas via llvm-dev wrote:
> Hello,
>
> There is a particular code sequence I would like to optimize at the IR
> level.
>
> I'd like to turn an Arm/AArch64 table lookup intrinsic that takes a
> constant vector mask into a shufflevector instruction:
>
> vtbl1(V,mask) ~> shufflevector(V,undef,mask)
>
> The reason is that if the mask is {7,6,5,4,3,2,1,0}, then the backend
> will generate rev64 instructions instead.
>
> If the mask comes from a vld1 of a global constant I could fold it to
> allow the above instruction combining.
>
> My question is, does the constant folding of the vld1 seem a good
> thing to do in the general case, as a standalone transformation, or
> only when used as a mask for a table lookup?

Yes, constant-folding vld1 seems like a good idea. Actually, we should probably just lower the NEON vld1 intrinsics to an LLVM "load" (which would give us constant-folding for free), but that would be more work to make sure it doesn't have any unexpected effects.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
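To make the preconditions of folding a vld1 concrete, here is a toy C model, not LLVM code: `toy_global`, `toy_vld1` and `fold_vld1` are hypothetical names invented for this sketch, and the 16-byte initializer size is an arbitrary assumption. A load can be replaced by the global's initializer only when the global is immutable (declared `constant`) and the 8-byte access lies entirely inside that initializer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy stand-in for an IR global: only globals marked
   constant have an initializer that is guaranteed at run time. */
struct toy_global {
    bool is_constant;
    uint8_t init[16];
};

/* Hypothetical toy stand-in for a vld1 of 8 x i8 from global + offset. */
struct toy_vld1 {
    const struct toy_global *src;
    size_t byte_offset;
};

/* Fold the load to a constant vector if that is safe. Returns false
   when the global may change at run time or the 8-byte access does not
   lie entirely inside the initializer. */
static bool fold_vld1(const struct toy_vld1 *ld, uint8_t out[8]) {
    assert(ld != NULL && out != NULL);
    if (ld->src == NULL || !ld->src->is_constant)
        return false;
    if (ld->byte_offset > sizeof ld->src->init - 8)
        return false;
    memcpy(out, ld->src->init + ld->byte_offset, 8);
    return true;
}
```

Once the mask is a known constant, a check like the one in the VTBL1 discussion above can decide whether the table lookup becomes a shufflevector. Lowering vld1 to an ordinary load, as suggested, would let LLVM's existing load folding apply these checks instead of a dedicated routine.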
Reasonably Related Threads
- [LLVMdev] Unaligned vector memory access for ARM/NEON.