thr3ads.net - llvm dev - [llvm-dev] Opportunity to split store of shuffled vector. [Sep 2019]

Qiu Chaofan via llvm-dev

2019-Sep-26 09:53 UTC

[llvm-dev] Opportunity to split store of shuffled vector.

Hi there,

I notice that LLVM seems to always generate vector instructions for
vector operations in C, even if they are just simple stores:

void foo(vector int* c) {
  (*c)[0] = 1;
  (*c)[1] = 2;
}

%0 = load <4 x i32>, <4 x i32>* %c, align 16
%vecins1 = shufflevector <4 x i32> <i32 1, i32 2, i32 undef, i32 undef>, <4 x i32> %0, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
store <4 x i32> %vecins1, <4 x i32>* %c, align 16

But GCC generates two direct stores to the element addresses, just as
it would for arrays, which should be better on PowerPC. (Some other
platforms would benefit as well.) So we could transform the IR above to:

%0 = getelementptr inbounds <4 x i32>, <4 x i32>* %c, i64 0, i64 0
store i32 1, i32* %0, align 4
%1 = getelementptr <4 x i32>, <4 x i32>* %c, i64 0, i64 1
store i32 2, i32* %1, align 4

This could be an optimization opportunity, and I guess we could do it
in InstCombine. But I'm not sure whether there is a better place for
it, since what it does is essentially the inverse of vectorization.
There may also be other concerns that I have not noticed.
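
One way to phrase the precondition for such a split is sketched below in C. This is only an illustration, not code from the thread or from LLVM: `lanes_to_store` is a hypothetical helper, and it assumes the pattern is exactly a store of `shufflevector(constant, load P, mask)` back to the same address `P`, with no intervening memory writes and no volatile or atomic accesses. Undef mask elements are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LANES 4

/* Hypothetical helper, not an LLVM API. Models the pattern
 *   store (shufflevector CONST, (load P), MASK), P
 * and decides, lane by lane, whether a scalar store is needed.
 * Mask values 0..LANES-1 select from CONST; LANES..2*LANES-1 select
 * from the loaded vector. Returns false when a loaded value would move
 * to a different lane, which plain constant stores cannot express. */
static bool lanes_to_store(const int mask[LANES], bool needs_store[LANES]) {
    for (size_t i = 0; i < LANES; ++i) {
        assert(mask[i] >= 0 && mask[i] < 2 * LANES);
        if (mask[i] < LANES) {
            needs_store[i] = true;   /* lane comes from the constant vector */
        } else if (mask[i] == LANES + (int)i) {
            needs_store[i] = false;  /* lane is written back unchanged */
        } else {
            return false;            /* loaded value changes lanes: give up */
        }
    }
    return true;
}
```

For the mask `<0, 1, 6, 7>` above, this marks lanes 0 and 1 as needing a scalar store and lanes 2 and 3 as untouched, which matches the two-store form proposed in this message.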

Looking forward to any comments. Thanks.

Regards,
Qiu Chaofan

Florian Hahn via llvm-dev

2019-Sep-26 11:15 UTC

Hi,

> On Sep 26, 2019, at 10:53, Qiu Chaofan via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi there,
> 
> I notice that LLVM seems to always generate vector instructions for
> vector operations in C, even it's just simple stores:
> 
> void foo(vector int* c) {
>  (*c)[0] = 1;
>  (*c)[1] = 2;
> }
>

I may be missing something obvious, but what is `vector` defined as here? Can
you provide a buildable example?

> %0 = load <4 x i32>, <4 x i32>* %c, align 16
> %vecins1 = shufflevector <4 x i32> <i32 1, i32 2, i32 undef, i32 undef>, <4 x i32> %0, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
> store <4 x i32> %vecins1, <4 x i32>* %c, align 16
>

For some reason, we load 4 elements from %c and write the last 2 elements back
unchanged. This causes sub-optimal codegen here. We could do a better job at
dropping the writes of unchanged elements. But from the original code, it is not
immediately obvious to me why we generate them in the first place. Maybe we
could avoid generating them?
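
To make the lane behaviour concrete, here is a small C model of the quoted IR. It is a sketch for illustration only: `shuffle4` and `foo_model` are hypothetical names, and the `undef` lanes of the constant vector are replaced by 0 as a stand-in. Mask values 0 to 3 select from the first operand and 4 to 7 from the second, so the mask `<0, 1, 6, 7>` takes lanes 2 and 3 from the loaded vector and stores them back unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Model of a 4-lane shufflevector: mask values 0..3 pick from a,
 * values 4..7 pick from b. */
static void shuffle4(const int a[4], const int b[4],
                     const int mask[4], int out[4]) {
    for (size_t i = 0; i < 4; ++i) {
        assert(mask[i] >= 0 && mask[i] < 8);
        out[i] = mask[i] < 4 ? a[mask[i]] : b[mask[i] - 4];
    }
}

/* Model of the emitted sequence: load all lanes, blend, store all lanes. */
static void foo_model(int c[4]) {
    /* Lanes 2 and 3 are undef in the IR; 0 is only a placeholder here. */
    const int constants[4] = {1, 2, 0, 0};
    const int mask[4] = {0, 1, 6, 7};
    int loaded[4], result[4];

    for (size_t i = 0; i < 4; ++i) loaded[i] = c[i];  /* load <4 x i32>  */
    shuffle4(constants, loaded, mask, result);        /* shufflevector   */
    for (size_t i = 0; i < 4; ++i) c[i] = result[i];  /* store <4 x i32> */
}
```

Calling `foo_model` on `{10, 20, 30, 40}` leaves `{1, 2, 30, 40}`: all four lanes are written, but only the first two change.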

Cheers,
Florian

Qiu Chaofan via llvm-dev

2019-Sep-27 02:59 UTC

> I may be missing something obvious, but what is `vector` defined as here?
> Can you provide a buildable example?

Sorry, I should have provided a cross-platform version using the
frontend's generic vector extension :) `vector int` is a PowerPC vector
extension, which is enabled when the target is a PowerPC platform.
The example below should compile on any platform:

    typedef float v4sf __attribute__ ((vector_size(16)));

    void foo(v4sf *a) {
      (*a)[0] = 1;
      (*a)[3] = 2;
    }

And we get IR with the same pattern as before:

    %0 = load <4 x float>, <4 x float>* %a, align 16
    %vecins1 = shufflevector <4 x float> <float 1.000000e+00, float undef, float undef, float 2.000000e+00>, <4 x float> %0, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
    store <4 x float> %vecins1, <4 x float>* %a, align 16
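
For anyone who wants to check the behaviour rather than the IR, a minimal harness around the example might look like the sketch below. It assumes a compiler that supports the GCC/Clang `vector_size` extension with element subscripting; `run_foo` is a hypothetical helper added only for testing, and it copies through `memcpy` to avoid aliasing questions.

```c
#include <assert.h>
#include <string.h>

typedef float v4sf __attribute__((vector_size(16)));

/* Sanity check on the extension type: four float lanes in 16 bytes. */
static_assert(sizeof(v4sf) == 4 * sizeof(float), "v4sf must hold 4 floats");

/* The function from the example above. */
static void foo(v4sf *a) {
    (*a)[0] = 1;
    (*a)[3] = 2;
}

/* Hypothetical test helper: runs foo on a copy of in[] and
 * writes the resulting lanes to out[]. */
static void run_foo(const float in[4], float out[4]) {
    v4sf v;
    memcpy(&v, in, sizeof v);
    foo(&v);
    memcpy(out, &v, sizeof v);
}
```

Lanes 1 and 2 should come back unchanged whichever form the compiler emits, so this checks behaviour only. Seeing the IR itself requires something like `clang -O2 -S -emit-llvm`, and the exact output will vary with the LLVM version.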

Regards,
Qiu Chaofan

