Pete Couperus via llvm-dev
2016-Nov-16 00:22 UTC
[llvm-dev] InstCombine question on combineLoadToOperationType
Hello,

Context: We have a backend where v32i1 is a Legal type, but the storage for
v32i1 is not 32-bits/uses a different instruction sequence.

We ran into an issue because combineLoadToOperationType changed v32i1 loads
into i32 loads, so a sequence like:

    define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
      %a = load <32 x i1>, <32 x i1>* %A
      store <32 x i1> %a, <32 x i1>* %B
      ret void
    }

Is transformed to:

    define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
      %1 = bitcast <32 x i1>* %A to i32*
      %a1 = load i32, i32* %1, align 4
      %2 = bitcast <32 x i1>* %B to i32*
      store i32 %a1, i32* %2, align 4
      ret void
    }

This looks to be intentional.

Is there a way to specify in the data-layout that v32i1 storage is not
32-bits?
Absent that, is there any other reliable way to retain the original vector
loads/store without just disabling this part of InstCombine?
Or is it the backend's responsibility to try and work with this?

Thanks!
Pete
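A note for readers following along: a likely reason the rewrite is treated as safe is that, at the IR level, <32 x i1> and i32 are both 32 bits wide, so a bitcast between them is legal and a load that only feeds a store can presumably be re-typed without changing which bytes are copied. The sketch below uses the same typed-pointer syntax as the message above. The function name @as_mask is hypothetical, and this illustrates the size equivalence only, not the pass's actual logic.

```llvm
; Hypothetical illustration: i32 and <32 x i1> have the same bit width,
; so the IR permits a direct bitcast between them.
define <32 x i1> @as_mask(i32 %bits) {
  %m = bitcast i32 %bits to <32 x i1>   ; legal because both types are 32 bits
  ret <32 x i1> %m
}
```

On a target whose in-memory form of v32i1 is not a packed 32-bit word, this size equivalence is the assumption that the backend described above does not satisfy.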
Friedman, Eli via llvm-dev
2016-Nov-16 19:23 UTC
[llvm-dev] InstCombine question on combineLoadToOperationType
On 11/15/2016 4:22 PM, Pete Couperus via llvm-dev wrote:
> Hello,
>
> Context: We have a backend where v32i1 is a Legal type, but the
> storage for v32i1 is not 32-bits/uses a different instruction sequence.
>
> We ran into an issue because combineLoadToOperationType changed v32i1
> loads into i32 loads, so a sequence like:
>
> define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
>   %a = load <32 x i1>, <32 x i1>* %A
>   store <32 x i1> %a, <32 x i1>* %B
>   ret void
> }
>
> Is transformed to:
>
> define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
>   %1 = bitcast <32 x i1>* %A to i32*
>   %a1 = load i32, i32* %1, align 4
>   %2 = bitcast <32 x i1>* %B to i32*
>   store i32 %a1, i32* %2, align 4
>   ret void
> }
>
> This looks to be intentional.
>
> Is there a way to specify in the data-layout that v32i1 storage is not
> 32-bits?

No, not at the moment. You could propose something, but you'd probably have
a hard time convincing anyone it's necessary; nobody has cared about this
for a very long time.

> Absent that, is there any other reliable way to retain the original
> vector loads/store without just disabling this part of InstCombine?

No, and you'll run into other problems (e.g. alias analysis) if the data
layout lies about the size of a load or store.

> Or is it the backend’s responsibility to try and work with this?

Where are these loads coming from? x86 without AVX512 doesn't have any
convenient way to generate code for a <32 x i1> store, but it doesn't matter
because frontends don't generate <N x i1> loads and stores.

If you have a frontend which is generating loads and stores like this, you
could probably change it to use some other sequence (like a
platform-specific intrinsic, or some sequence involving sext/trunc).

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Pete Couperus via llvm-dev
2016-Nov-17 16:28 UTC
[llvm-dev] InstCombine question on combineLoadToOperationType
> On 11/15/2016 4:22 PM, Pete Couperus via llvm-dev wrote:
>> Hello,
>>
>> Context: We have a backend where v32i1 is a Legal type, but the
>> storage for v32i1 is not 32-bits/uses a different instruction sequence.
>>
>> We ran into an issue because combineLoadToOperationType changed v32i1
>> loads into i32 loads, so a sequence like:
>>
>> define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
>>   %a = load <32 x i1>, <32 x i1>* %A
>>   store <32 x i1> %a, <32 x i1>* %B
>>   ret void
>> }
>>
>> Is transformed to:
>>
>> define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
>>   %1 = bitcast <32 x i1>* %A to i32*
>>   %a1 = load i32, i32* %1, align 4
>>   %2 = bitcast <32 x i1>* %B to i32*
>>   store i32 %a1, i32* %2, align 4
>>   ret void
>> }
>>
>> This looks to be intentional.
>>
>> Is there a way to specify in the data-layout that v32i1 storage is not
>> 32-bits?
>
> No, not at the moment. You could propose something, but you'd probably
> have a hard time convincing anyone it's necessary; nobody has cared about
> this for a very long time.
>
>> Absent that, is there any other reliable way to retain the original
>> vector loads/store without just disabling this part of InstCombine?
>
> No, and you'll run into other problems (e.g. alias analysis) if the data
> layout lies about the size of a load or store.
>
>> Or is it the backend’s responsibility to try and work with this?
>
> Where are these loads coming from? x86 without AVX512 doesn't have any
> convenient way to generate code for a <32 x i1> store, but it doesn't
> matter because frontends don't generate <N x i1> loads and stores.
>
> If you have a frontend which is generating loads and stores like this,
> you could probably change it to use some other sequence (like a
> platform-specific intrinsic, or some sequence involving sext/trunc).

We do have a frontend that can generate <32 x i1> loads/stores, though it is
rare for these to be inst-combined into i32 loads/stores as they are here
(these were only illustrative examples).
I’m trying to decide on the best way to remedy this, and this information
and these suggestions help.

Thanks!
Pete
Mehdi Amini via llvm-dev
2016-Nov-17 22:10 UTC
[llvm-dev] InstCombine question on combineLoadToOperationType
> On Nov 16, 2016, at 11:23 AM, Friedman, Eli via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 11/15/2016 4:22 PM, Pete Couperus via llvm-dev wrote:
>> Hello,
>>
>> Context: We have a backend where v32i1 is a Legal type, but the storage
>> for v32i1 is not 32-bits/uses a different instruction sequence.
>> We ran into an issue because combineLoadToOperationType changed v32i1
>> loads into i32 loads, so a sequence like:
>> define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
>>   %a = load <32 x i1>, <32 x i1>* %A
>>   store <32 x i1> %a, <32 x i1>* %B
>>   ret void
>> }
>>
>> Is transformed to:
>> define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
>>   %1 = bitcast <32 x i1>* %A to i32*
>>   %a1 = load i32, i32* %1, align 4
>>   %2 = bitcast <32 x i1>* %B to i32*
>>   store i32 %a1, i32* %2, align 4
>>   ret void
>> }
>>
>> This looks to be intentional.
>> Is there a way to specify in the data-layout that v32i1 storage is not
>> 32-bits?
>
> No, not at the moment. You could propose something, but you'd probably
> have a hard time convincing anyone it's necessary; nobody has cared about
> this for a very long time.
>
>> Absent that, is there any other reliable way to retain the original
>> vector loads/store without just disabling this part of InstCombine?
>
> No, and you'll run into other problems (e.g. alias analysis) if the data
> layout lies about the size of a load or store.
>
>> Or is it the backend’s responsibility to try and work with this?
>
> Where are these loads coming from? x86 without AVX512 doesn't have any
> convenient way to generate code for a <32 x i1> store, but it doesn't
> matter because frontends don't generate <N x i1> loads and stores.
>
> If you have a frontend which is generating loads and stores like this,
> you could probably change it to use some other sequence (like a
> platform-specific intrinsic, or some sequence involving sext/trunc).

Why not just generate the code with the proper storage?
If <32 x i1> values are used where the storage is <32 x i8> (for example),
it seems like a bad idea to lie to the IR and hide it with a
platform-specific intrinsic, right? I fear this would cause other problems
down the line in the optimizer.

Mehdi
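The two suggestions made in this thread (a sequence involving sext/trunc, and storage with a type such as <32 x i8>) can be combined in a sketch like the one below. The byte-per-lane layout, the function name @bits_bytes, and the choice of sext over zext are all assumptions for illustration; the thread does not say what the real storage format of the backend is.

```llvm
; Hypothetical variant of @bits: memory holds <32 x i8> (one byte per lane,
; an assumed layout), and <32 x i1> exists only as an SSA value.
define void @bits_bytes(<32 x i8>* %A, <32 x i8>* %B) {
  %wide = load <32 x i8>, <32 x i8>* %A
  %mask = trunc <32 x i8> %wide to <32 x i1>   ; keep the low bit of each lane
  ; ... operations on %mask would go here ...
  %out = sext <32 x i1> %mask to <32 x i8>     ; each lane becomes 0 or -1
  store <32 x i8> %out, <32 x i8>* %B
  ret void
}
```

Because the loads and stores here use the type that actually matches the storage, the sizes the optimizer sees agree with what is in memory, and no <32 x i1> load remains to be re-typed as an i32 load.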