thr3ads.net - llvm dev - [llvm-dev] [Constant Folder, InstCombine, ARM, AArch64] Question about constant folding of vector load [Apr 2018]


Alexandros Lamprineas via llvm-dev

2018-Apr-26 13:51 UTC

[llvm-dev] [Constant Folder, InstCombine, ARM, AArch64] Question about constant folding of vector load

Hello,


There is a particular code sequence I would like to optimize at the IR level.

I'd like to turn an Arm/AArch64 table lookup intrinsic that takes a constant
vector mask into a shufflevector instruction:

vtbl1(V,mask) ~> shufflevector(V,undef,mask)


The reason is that if the mask is {7,6,5,4,3,2,1,0}, then the backend will
generate rev64 instructions instead.
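To make the proposed rewrite concrete, here is a toy model of the two operations on 8-lane byte vectors, written in Python rather than LLVM IR. The function names are hypothetical and only mirror the semantics under discussion. It relies on the table lookup returning 0 for an out-of-range index, which is why the rewrite is only valid when every mask index selects a lane of the table.

```python
# Toy model of the two operations on 8-lane byte vectors (plain Python lists).
# Function names are hypothetical; this is not LLVM's API.

def vtbl1(table, mask):
    """Table lookup: each lane gets table[index], or 0 if the index is out of range."""
    return [table[i] if 0 <= i < len(table) else 0 for i in mask]

def shufflevector(v1, v2, mask):
    """Each index selects from v1 followed by v2; v2=None stands in for undef."""
    n = len(v1)
    out = []
    for i in mask:
        if i < n:
            out.append(v1[i])
        elif v2 is not None:
            out.append(v2[i - n])
        else:
            out.append(None)  # undef lane: no defined value
    return out

def can_rewrite_as_shuffle(mask, lanes=8):
    """The rewrite is only safe when every index selects a lane of the table."""
    return all(0 <= i < lanes for i in mask)

V = [10, 11, 12, 13, 14, 15, 16, 17]
reverse_mask = [7, 6, 5, 4, 3, 2, 1, 0]

if can_rewrite_as_shuffle(reverse_mask):
    # Both forms reverse the eight bytes, which is the pattern rev64 implements.
    print(shufflevector(V, None, reverse_mask))
```

With a mask such as {0,1,2,3,4,5,6,8}, the lookup would produce 0 in the last lane while the shuffle would read an undef lane, so the sketch refuses the rewrite in that case.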

If the mask comes from a vld1 of a global constant I could fold it to allow the
above instruction combining.

My question is: does constant folding of the vld1 seem like a good thing to do
in the general case, as a standalone transformation, or only when its result is
used as a mask for a table lookup?

Alexandros

Friedman, Eli via llvm-dev

2018-Apr-26 19:22 UTC



On 4/26/2018 6:51 AM, Alexandros Lamprineas via llvm-dev wrote:
>
> Hello,
>
>
> There is a particular code sequence I would like to optimize at the IR 
> level.
>
> I'd like to turn an Arm/AArch64 table lookup intrinsic that takes a 
> constant vector mask into a shufflevector instruction:
>
> vtbl1(V,mask) ~> shufflevector(V,undef,mask)
>
>
> The reason is that if the mask is {7,6,5,4,3,2,1,0}, then the backend 
> will generate rev64 instructions instead.
>
> If the mask comes from a vld1 of a global constant I could fold it to 
> allow the above instruction combining.
>
> My question is, does the constant folding of the vld1 seem a good 
> thing to do in the general case, as a standalone transformation, or 
> only when used as a mask for a table lookup?
>
Yes, constant-folding vld1 seems like a good idea.

Actually, we should probably just lower the NEON vld1 intrinsics to an 
LLVM "load" (which would give us constant-folding for free), but that 
would be more work to make sure it doesn't have any unexpected effects.
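As a rough illustration of what folding a load from a global constant means, here is a toy model in Python. The "module" dictionary and the `fold_load` helper are hypothetical and are not LLVM's API; the sketch only shows the condition under which a load can be replaced by the global's initializer.

```python
# Toy model: a "module" maps global names to (is_constant, initializer).
# All names here are hypothetical; this is not LLVM's API.

def fold_load(module, global_name):
    """Return the initializer if the global is constant and defined, else None (no fold)."""
    entry = module.get(global_name)
    if entry is None:
        return None
    is_constant, initializer = entry
    if not is_constant or initializer is None:
        return None
    return list(initializer)

module = {
    "mask": (True, [7, 6, 5, 4, 3, 2, 1, 0]),  # constant global with an initializer
    "buf": (False, [0] * 8),                   # mutable: contents may change at run time
}

folded = fold_load(module, "mask")
if folded is not None:
    # The load can be replaced by this constant vector, so a later
    # combine sees the mask values directly.
    print(folded)
```

Only the constant global folds; the mutable one is left alone because its contents are not known at compile time.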

-Eli


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

