Liu Xin
2013-Oct-25 03:02 UTC
[LLVMdev] Is there pass to break down <4 x float> to scalars
Hi, LLVM community,

I am writing some code by hand in LLVM IR. For simplicity, I write it using <4 x float>. I have now found that some stores to individual elements are useless.

For example, if I store {0.0, 1.0, 2.0, 3.0} to a <4 x float> %a, maybe only %a.xy is live in my program. Our target doesn't have SIMD instructions, which means we have to lower each vector operation to several scalar instructions. As far as I can tell, LLVM doesn't have DSE (dead store elimination) in codegen, right?

Is there a pass which can break down vector operations into scalars?

thanks,
--lx
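To make the question concrete, here is a minimal sketch of the effect being asked for. It is not LLVM code and uses no LLVM API: `ScalarStore` and `scalarizeLiveLanes` are hypothetical names, and the liveness mask is assumed to be given rather than computed. It only shows that once a <4 x float> store is split into one store per lane, the stores to dead lanes can simply be dropped.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <vector>

// Hypothetical model of one scalar store produced from a vector store.
struct ScalarStore {
    std::size_t lane;  // index of the vector element (0 = x, 1 = y, ...)
    float value;       // value stored to that element
};

// Split a 4-element vector store into per-lane scalar stores and keep
// only the lanes marked live (bit i of `live` corresponds to lane i).
std::vector<ScalarStore> scalarizeLiveLanes(const std::array<float, 4>& vec,
                                            const std::bitset<4>& live) {
    std::vector<ScalarStore> stores;
    for (std::size_t lane = 0; lane < vec.size(); ++lane) {
        if (live.test(lane)) {
            stores.push_back({lane, vec[lane]});
        }
    }
    return stores;
}
```

With the values {0.0, 1.0, 2.0, 3.0} and only lanes x and y live, this keeps two scalar stores instead of four.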
Richard Sandiford
2013-Oct-25 10:06 UTC
[LLVMdev] Is there pass to break down <4 x float> to scalars
Liu Xin <navy.xliu at gmail.com> writes:
> Is there a pass which can break down vector operation to scalars?

I wanted the same thing for SystemZ, which doesn't have vectors, in order to improve the llvmpipe code. FWIW, here's what I have locally.

It is able to decompose loads and stores, but I found in the llvmpipe case that this made things worse with TBAA, because DAGCombiner::GatherAllAliases has some fairly strict limits. So I disabled that by default; use -decompose-vector-load-store to reenable.

The main motivation for SystemZ was instead to get InstCombine to rewrite things like scalarised selects.

I haven't submitted it yet because it's less of a win than the TBAA DAGCombiner patch I posted, so I didn't want to distract from that. It would also need some TargetTransformInfo hooks to decide which vectors should be decomposed.

Thanks,
Richard

Attachment: decompose-vectors.diff (text/x-patch, 32835 bytes)
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131025/ce62d95a/attachment.bin>
Renato Golin
2013-Oct-25 12:53 UTC
[LLVMdev] Is there pass to break down <4 x float> to scalars
On 25 October 2013 11:06, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote:
> I wanted the same thing for SystemZ, which doesn't have vectors,
> in order to improve the llvmpipe code.

Hi Richard,

This is a nice patch. I was wondering how hard it'd be to do that, and it seems that you're catching lots of corner cases. My interest is also in converting odd vectors into scalars, but in order to convert them back into CPU vectors, say from OpenCL to NEON code.

> It would also need some TargetTransformInfo hooks to decide which
> vectors should be decomposed.

If I got it right, this may not be necessary, or it may even be harmful. Say you decide that <4 x i32> vectors should be left alone, so that your pass only scalarises the others. But when the vectorizer runs again (to try to use CPU vector instructions), it might not match the scalarised version with the vector, and you end up with data movement between scalar and vector pipelines, which normally slows down CPUs (at least in ARM's case). Also, problematic cases like <5 x i32> could be better split into 3+2 pairs, rather than 4+1.

If you scalarise everything, then the vectorizers will have a better chance of spotting patterns and vectorising the whole lot, then based on target transform info.

Is that what you had in mind?

cheers,
--renato
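The 3+2 versus 4+1 remark can be illustrated with a small sketch. This is plain C++, not part of the patch or of any LLVM API; `splitGreedy` and `splitBalanced` are hypothetical helper names, and `maxWidth` stands in for whatever vector width a target would report. Whether the balanced split is actually faster depends on the target, as discussed above.

```cpp
#include <vector>

// Greedy split: fill each piece up to maxWidth before starting the next,
// so 5 lanes with maxWidth 4 become 4 + 1.
std::vector<unsigned> splitGreedy(unsigned lanes, unsigned maxWidth) {
    std::vector<unsigned> parts;
    if (maxWidth == 0) return parts;  // nothing sensible to do
    while (lanes > 0) {
        unsigned width = lanes < maxWidth ? lanes : maxWidth;
        parts.push_back(width);
        lanes -= width;
    }
    return parts;
}

// Balanced split: use the same number of pieces as the greedy split, but
// make their widths differ by at most one, so 5 lanes become 3 + 2.
std::vector<unsigned> splitBalanced(unsigned lanes, unsigned maxWidth) {
    std::vector<unsigned> parts;
    if (lanes == 0 || maxWidth == 0) return parts;
    unsigned count = (lanes + maxWidth - 1) / maxWidth;  // ceil(lanes / maxWidth)
    unsigned base = lanes / count;
    unsigned extra = lanes % count;  // the first `extra` pieces get one more lane
    for (unsigned i = 0; i < count; ++i) {
        parts.push_back(base + (i < extra ? 1u : 0u));
    }
    return parts;
}
```

Both functions return the same number of pieces; only the distribution of lanes differs.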
Liu Xin
2013-Oct-25 13:49 UTC
[LLVMdev] Is there pass to break down <4 x float> to scalars
Hi, Richard,

I think we are solving the same problem. I am working on a shader language too. I am not satisfied with the current binaries because vector operations are left as they are by llvm opt. The GLSL shading language has an operation called "swizzle", which selects sub-components of a vector. If a shader only uses components "xy" of a vec4, it's certainly wasteful to generate 4 operations on a scalar processor.

I think a good solution for LLVM would be in codegen. Many compilers have a codegen optimizer. A DSE would be good enough.

Which patch about TBAA did you post? Do you have another solution besides decompose-vectors?

thanks,
--lx

On Fri, Oct 25, 2013 at 6:06 PM, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote:
> I haven't submitted it yet because it's less of a win than the TBAA
> DAGCombiner patch I posted, so I didn't want to distract from that.