thr3ads.net - llvm dev - [LLVMdev] Is there pass to break down <4 x float> to scalars [Oct 2013]


Liu Xin

2013-Oct-25 03:02 UTC

[LLVMdev] Is there pass to break down <4 x float> to scalars

Hi, LLVM community,

I have written some code by hand in LLVM IR. For simplicity, I wrote it using
<4 x float>. Now I have found that some of the element stores are useless.

For example, if I store {0.0, 1.0, 2.0, 3.0} to a <4 x float> %a, maybe
only %a.xy is live in my program. Our target doesn't have SIMD
instructions, which means we have to lower each vector operation to several
scalar instructions. I found that LLVM doesn't have DSE in codegen, right?

Is there a pass which can break down vector operations into scalars?
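
As a rough illustration of what is being asked for, the sketch below is a toy Python model, not LLVM code. It assumes a straight-line program in SSA form where every value is a 4-lane vector; the tuple encoding and the names `scalarise` and `eliminate_dead` are invented for this example. Once each vector operation is split into per-lane scalar operations, an ordinary backward liveness sweep can drop the lanes that are never read.

```python
# Toy model (not LLVM): a "program" is a list of (dest, op, operands) tuples.
# Every value is assumed to be a 4-lane vector; each dest is defined once.

LANES = "xyzw"

def scalarise(program):
    """Rewrite each 4-wide op as four independent per-lane scalar ops."""
    scalar_ops = []
    for dest, op, operands in program:
        for i, lane in enumerate(LANES):
            # Named operands become per-lane names; literal vectors are indexed.
            lane_operands = tuple(
                f"{o}.{lane}" if isinstance(o, str) else o[i] for o in operands
            )
            scalar_ops.append((f"{dest}.{lane}", op, lane_operands))
    return scalar_ops

def eliminate_dead(scalar_ops, live_out):
    """Backward liveness sweep: keep only ops whose result is needed."""
    live = set(live_out)
    kept = []
    for dest, op, operands in reversed(scalar_ops):
        if dest in live:
            kept.append((dest, op, operands))
            live.update(o for o in operands if isinstance(o, str))
    kept.reverse()
    return kept

# a = <0.0, 1.0, 2.0, 3.0>; b = a + a; only b.xy is used afterwards.
program = [
    ("a", "const", ((0.0, 1.0, 2.0, 3.0),)),
    ("b", "fadd", ("a", "a")),
]
scalar_ops = scalarise(program)
kept = eliminate_dead(scalar_ops, live_out={"b.x", "b.y"})
```

In this model the two vector operations become eight scalar operations, and only the four that feed `b.x` and `b.y` survive the sweep. A real pass would also have to handle memory operations, control flow, and shuffles, which this sketch ignores.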


thanks,
--lx

Richard Sandiford

2013-Oct-25 10:06 UTC


Liu Xin <navy.xliu at gmail.com> writes:
> Hi, LLVM community,
>
> I write some code in hand using LLVM IR. for simplicity, I write them in <4
> x float>. now I found some stores for elements are useless.
>
> for example, If I store {0.0, 1.0, 2.0, 3.0} to a <4 x float> %a. maybe
> only %a.xy is alive in my program.  our target doesn't feature SIMD
> instruction, which means we have to lower vector to many  scalar
> instructions. I found llvm doesn't have DSE in codegen , right?
>
> Is there a pass which can break down vector operation to scalars?

I wanted the same thing for SystemZ, which doesn't have vectors,
in order to improve the llvmpipe code.  FWIW, here's what I have locally.

It is able to decompose loads and stores, but I found in the llvmpipe case
that this made things worse with TBAA, because DAGCombiner::GatherAllAliases
has some fairly strict limits.  So I disabled that by default; use
-decompose-vector-load-store to reenable.

The main motivation for SystemZ was instead to get InstCombine to rewrite
things like scalarised selects.

I haven't submitted it yet because it's less of a win than the TBAA
DAGCombiner patch I posted, so I didn't want to distract from that.
It would also need some TargetTransformInfo hooks to decide which
vectors should be decomposed.

Thanks,
Richard

-------------- next part --------------
A non-text attachment was scrubbed...
Name: decompose-vectors.diff
Type: text/x-patch
Size: 32835 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131025/ce62d95a/attachment.bin>

Renato Golin

2013-Oct-25 12:53 UTC


On 25 October 2013 11:06, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote:
> I wanted the same thing for SystemZ, which doesn't have vectors,
> in order to improve the llvmpipe code.

Hi Richard,

This is a nice patch. I was wondering how hard it'd be to do that, and it
seems that you're catching lots of corner cases.

My interest is also due to converting odd vectors into scalars, but to
convert them again to CPU vectors, say from OpenCL to NEON code.

> It would also need some TargetTransformInfo hooks to decide which
> vectors should be decomposed.

If I got it right, this may not be necessary, or it may even be harmful.

Say you decide that <4 x i32> vectors should be left alone, so that your
pass only scalarises the others. But when the vectorizer runs again (to
try to use CPU vector instructions), it might not match the scalarised
code with the remaining vectors, and you end up with data movement between
the scalar and vector pipelines, which normally slows down CPUs (at least in
ARM's case). Also, problematic cases like <5 x i32> could be better split
into 3+2 rather than 4+1.
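
To make the 3+2 versus 4+1 point concrete, here is a small Python sketch of the two ways of splitting an odd lane count. The function names and the `max_width` parameter are hypothetical, and the code only computes chunk sizes; whether the balanced split is actually faster depends on the target. It assumes `num_lanes` is at least 1.

```python
import math

def greedy_split(num_lanes, max_width):
    """Fill full-width chunks first, leaving a small remainder at the end."""
    chunks = []
    remaining = num_lanes
    while remaining > 0:
        chunks.append(min(remaining, max_width))
        remaining -= chunks[-1]
    return chunks

def balanced_split(num_lanes, max_width):
    """Use the same number of chunks, but spread the lanes as evenly as possible."""
    count = math.ceil(num_lanes / max_width)
    base, extra = divmod(num_lanes, count)
    # The first `extra` chunks take one additional lane each.
    return [base + 1] * extra + [base] * (count - extra)

greedy = greedy_split(5, 4)      # [4, 1]
balanced = balanced_split(5, 4)  # [3, 2]
```

Both strategies use two chunks for five lanes at a maximum width of four; they differ only in how the lanes are distributed across those chunks.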

If you scalarise everything, then the vectorizers will have a better chance
of spotting patterns and vectorising the whole lot, this time based on
target transform info.

Is that what you had in mind?

cheers,
--renato

Liu Xin

2013-Oct-25 13:49 UTC


Hi, Richard,

I think we are solving the same problem. I am working on a shader language
too. I am not satisfied with the current binaries because the vector
operations are kept as they are by LLVM's opt.

The GLSL shading language has an operation called "swizzle", which selects
sub-components of a vector. If a shader only uses the components "xy" of a
vec4, it is certainly wasteful to generate 4 operations on a scalar
processor.
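
As a minimal sketch of the swizzle idea, the Python below models a vec4 as a 4-tuple and works out which lanes a set of swizzle patterns actually reads. This is an illustrative model only, not GLSL or LLVM code; `swizzle`, `live_lanes`, and `LANE_INDEX` are names made up for the example, and only the "xyzw" component names are handled.

```python
LANE_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(vec, pattern):
    """Select components of a 4-tuple by name, e.g. swizzle(v, "xy")."""
    return tuple(vec[LANE_INDEX[c]] for c in pattern)

def live_lanes(patterns):
    """Return the sorted lane indices read by any of the given swizzles."""
    return sorted({LANE_INDEX[c] for p in patterns for c in p})

v = (0.0, 1.0, 2.0, 3.0)
xy = swizzle(v, "xy")
used = live_lanes(["xy", "yx"])
```

Here only lanes 0 and 1 of `v` are ever read, so on a scalar target the work that produces lanes 2 and 3 is a candidate for removal.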

I think a good solution for LLVM would be in codegen. Many compilers have a
codegen-level optimizer. A DSE would be good enough.

Which posted patch about TBAA do you mean? Do you have yet another solution
besides decompose-vectors?


thanks,
--lx



