thr3ads.net - llvm dev - [llvm-dev] generate vectorized code [Mar 2016]

Rail Shafigulin via llvm-dev

2016-Mar-18 21:17 UTC

[llvm-dev] generate vectorized code

On Fri, Mar 18, 2016 at 2:03 PM, Rail Shafigulin <rail at esenciatech.com>
wrote:
> On Fri, Mar 18, 2016 at 1:53 PM, Mehdi Amini <mehdi.amini at apple.com>
> wrote:
>
>>
>> On Mar 18, 2016, at 1:47 PM, Rail Shafigulin <rail at esenciatech.com>
>> wrote:
>>
>>> Yes this IR does not build or shuffle any vector. Try to write a function
>>> that takes 8 ints and a pointer to a <4xi32>, builds two vectors with the 8
>>> ints,
>>>
>>
>> This might sound like a dumb question, but how does one build a vector of
>> ints out of regular ints in IR?
>>
>>
>> See: http://llvm.org/docs/LangRef.html#vector-operations
>>
>> In short, the IR has "insertelement", which maps to "INSERT_VECTOR_ELT"
>> in SDAG and "extractelement", which maps to "EXTRACT_VECTOR_ELT" in SDAG.
>>
>> I usually find good examples by grepping in the lit tests. Another way is
>> to write the function in clang, and run it with -O3 -emit-llvm -S to get a
>> good starting point.
>>
> I tried using clang test.c -O3 -emit-llvm -S, but I didn't see any
> insertelement or extractelement instructions. How does one trigger
> vector operations?
>
> Below is the test.c file. It seemed to me like a good candidate for
> vectorization, however nothing happened. I would really appreciate it if you
> could point me in the right direction with respect to vector generation.
>
> Any help is appreciated.
>
>
>>
>> --
>> Mehdi
>>
>
>
>
> --
> Rail Shafigulin
> Software Engineer
> Esencia Technologies
>
Forgot to attach a C file. Here it is:

#define N 32

int main () {

  int  a[N], b[N];
  int c[N];

  for (int i = 0; i < N; ++i)
       c[i] = a[i] + b[i];

  int sum=0;
  for (int i = 0; i < N; ++i)
       sum += c[i];

  return sum;
}


-- 
Rail Shafigulin
Software Engineer
Esencia Technologies

Mehdi Amini via llvm-dev

2016-Mar-18 21:24 UTC

[llvm-dev] generate vectorized code

> On Mar 18, 2016, at 2:17 PM, Rail Shafigulin <rail at esenciatech.com> wrote:
>
> On Fri, Mar 18, 2016 at 2:03 PM, Rail Shafigulin <rail at esenciatech.com> wrote:
> On Fri, Mar 18, 2016 at 1:53 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>> On Mar 18, 2016, at 1:47 PM, Rail Shafigulin <rail at esenciatech.com> wrote:
>>
>> Yes this IR does not build or shuffle any vector. Try to write a function
>> that takes 8 ints and a pointer to a <4xi32>, builds two vectors with the
>> 8 ints,
>>
>> This might sound like a dumb question, but how does one build a vector of
>> ints out of regular ints in IR?
>
> See: http://llvm.org/docs/LangRef.html#vector-operations
>
> In short, the IR has "insertelement", which maps to "INSERT_VECTOR_ELT" in
> SDAG and "extractelement", which maps to "EXTRACT_VECTOR_ELT" in SDAG.
>
> I usually find good examples by grepping in the lit tests. Another way is to
> write the function in clang, and run it with -O3 -emit-llvm -S to get a good
> starting point.
>
> I tried using clang test.c -O3 -emit-llvm -S, but I didn't see any
> insertelement or extractelement instructions. How does one trigger vector
> operations?
>
> Below is the test.c file. It seemed to me like a good candidate for
> vectorization, however nothing happened. I would really appreciate it if you
> could point me in the right direction with respect to vector generation.
I see vectorization happening on this example (see below).

> 
> Any help is appreciated.
>  
> 
> -- 
> Mehdi
> 
> 
> 
> -- 
> Rail Shafigulin
> Software Engineer 
> Esencia Technologies
> 
> Forgot to attach a C file. Here it is:
> 
> #define N 32
> 
> int main () {
> 
>   int  a[N], b[N];
>   int c[N];
> 
>   for (int i = 0; i < N; ++i)
>        c[i] = a[i] + b[i];
> 
>   int sum=0;
>   for (int i = 0; i < N; ++i)
>        sum += c[i];
> 
>   return sum;
> }
> 
This will be vectorized without any insertelement. Here are a few lines
extracted from the output of clang on this code:

  %wide.load8.6 = load <4 x i32>* %48, align 16, !tbaa !2
  %49 = add nsw <4 x i32> %wide.load8.6, %wide.load.6
  %50 = getelementptr inbounds [32 x i32]* %c, i64 0, i64 24
  %51 = bitcast i32* %50 to <4 x i32>*
  store <4 x i32> %49, <4 x i32>* %51, align 16, !tbaa !2
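
For readers more comfortable in C than in IR, the wide load / add / store pattern in the lines above can be sketched with the GCC/Clang vector extension. This is only an illustration of what the vectorized loop does per group of four elements, not what the vectorizer literally produces. The names v4si, add_by_fours and check_add_by_fours are made up for this sketch, and the inputs are initialized here, which test.c does not do.

```c
#include <assert.h>
#include <string.h>

/* Four 32-bit ints in one 16-byte vector (GCC/Clang extension, not ISO C). */
typedef int v4si __attribute__((vector_size(16)));

/* Add n ints four at a time. Assumes n is a multiple of 4. */
void add_by_fours(const int *a, const int *b, int *c, int n) {
    for (int i = 0; i < n; i += 4) {
        v4si va, vb, vc;
        memcpy(&va, a + i, sizeof va);  /* wide load of a[i..i+3] */
        memcpy(&vb, b + i, sizeof vb);  /* wide load of b[i..i+3] */
        vc = va + vb;                   /* one 4-lane add */
        memcpy(c + i, &vc, sizeof vc);  /* wide store to c[i..i+3] */
    }
}

/* Self-check against the scalar loop; call it from main(). */
void check_add_by_fours(void) {
    int a[32], b[32], c[32];
    for (int i = 0; i < 32; ++i) {
        a[i] = i;
        b[i] = 100 - i;
    }
    add_by_fours(a, b, c, 32);
    for (int i = 0; i < 32; ++i)
        assert(c[i] == a[i] + b[i]);
}
```

No vector is built from scalars anywhere in this sketch, which matches the absence of insertelement in the IR above.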

Because you didn't write the example as I described it (i.e. taking integers,
doing some arithmetic on them, and writing the results to contiguous memory),
the vectorizer is able to load vectors directly from memory, operate on them,
and store the results. For example, try the following C code:

void foo (int a1, int a2, int a3, int a4, int b1, int b2, int b3, int b4,
          int *res) {
  res[0] = a1 + b1 * 2;
  res[1] = a2 + b2 * 2;
  res[2] = a3 + b3 * 2;
  res[3] = a4 + b4 * 2;
}


That's for the clang part; you can look at the vectorizer lit tests for
examples of IR before and after vectorization.
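
As a point of comparison, here is a hand-written sketch of the same computation as foo, with the two vectors built explicitly from the eight scalars. It uses the GCC/Clang vector extension. I would expect Clang to lower an initializer like `{ a1, a2, a3, a4 }` to a chain of insertelement instructions, but that is an assumption to verify with -emit-llvm on your own version. The names v4si, foo_by_hand and check_foo_by_hand are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Four 32-bit ints in one 16-byte vector (GCC/Clang extension, not ISO C). */
typedef int v4si __attribute__((vector_size(16)));

/* Same results as foo, but with both vectors built by hand. */
void foo_by_hand(int a1, int a2, int a3, int a4,
                 int b1, int b2, int b3, int b4, int *res) {
    v4si va = { a1, a2, a3, a4 };  /* build a vector from four scalars */
    v4si vb = { b1, b2, b3, b4 };
    v4si two = { 2, 2, 2, 2 };
    v4si vr = va + vb * two;       /* element-wise multiply, then add */
    memcpy(res, &vr, sizeof vr);   /* store four results contiguously */
}

/* Self-check; call it from main(). */
void check_foo_by_hand(void) {
    int res[4];
    foo_by_hand(1, 2, 3, 4, 10, 20, 30, 40, res);
    assert(res[0] == 21 && res[1] == 42 && res[2] == 63 && res[3] == 84);
}
```

Compiling this with clang -O0 -emit-llvm -S and searching the output for insertelement is a quick way to see how a vector is assembled from regular ints.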

-- 
Mehdi

Rail Shafigulin via llvm-dev

2016-Mar-18 21:37 UTC

[llvm-dev] generate vectorized code

>
> I see vectorization happening on this example (see below).
>
>
>
>> Any help is appreciated.
>>
>>
>>>
>>> --
>>> Mehdi
>>>
>>
>>
>>
>> --
>> Rail Shafigulin
>> Software Engineer
>> Esencia Technologies
>>
>
> Forgot to attach a C file. Here it is:
>
> #define N 32
>
> int main () {
>
>   int  a[N], b[N];
>   int c[N];
>
>   for (int i = 0; i < N; ++i)
>        c[i] = a[i] + b[i];
>
>   int sum=0;
>   for (int i = 0; i < N; ++i)
>        sum += c[i];
>
>   return sum;
> }
>
>
> This will be vectorized without any insertelement, here is a few lines
> extracted from the output of clang on this code:
>
>   %wide.load8.6 = load <4 x i32>* %48, align 16, !tbaa !2
>   %49 = add nsw <4 x i32> %wide.load8.6, %wide.load.6
>   %50 = getelementptr inbounds [32 x i32]* %c, i64 0, i64 24
>   %51 = bitcast i32* %50 to <4 x i32>*
>   store <4 x i32> %49, <4 x i32>* %51, align 16, !tbaa !2
>
Hmm... It didn't work for me. Maybe because I'm running an older version of
clang, 3.5 to be exact. For now I'm stuck with it and can't switch to a
newer version.

> Because you didn't write the example as I described it, i.e. taking
> integer, doing a few arithmetic and writing result to contiguous memory,
> the vectorizer will be able to load directly vectors from memory, operates
> on them, and store the results. For example try with the following C code:
>
> void foo (int a1, int a2, int a3, int a4, int b1, int b2, int b3, int b4,
> int *res) {
>   res[0] = a1 + b1 * 2;
>   res[1] = a2 + b2 * 2;
>   res[2] = a3 + b3 * 2;
>   res[3] = a4 + b4 * 2;
> }
>
>
> That's for the clang part, you can look at the vectorizer lit test to have
> examples of IR before/after vectorization.
>
> --
> Mehdi
>
I misunderstood you. I thought you asked me to create an IR with insertelement
in it. I'm going to try your example and see what happens.


-- 
Rail Shafigulin
Software Engineer
Esencia Technologies

Rail Shafigulin via llvm-dev

2016-Mar-18 21:43 UTC

[llvm-dev] generate vectorized code

Just out of curiosity, how did you know that your foo code would produce
vectorized code? I tried code similar to yours without any multiplication
and no vectors were generated.

-- 
Rail Shafigulin
Software Engineer
Esencia Technologies

Mehdi Amini via llvm-dev

2016-Mar-18 21:52 UTC

[llvm-dev] generate vectorized code

> On Mar 18, 2016, at 2:43 PM, Rail Shafigulin <rail at esenciatech.com> wrote:
>
> Just out of curiosity how did you know that your foo code will produce
> vectorized code?

I read the source code for the SLP Vectorizer ;)
(Other than looking at unit tests, this is another good way of learning how
LLVM works.)

> I tried code similar to yours without any multiplication and no vectors
> were generated.

It is a matter of cost model: there need to be a few arithmetic instructions
to balance the cost of building a vector.
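
To try the cost-model effect yourself, the two variants can be placed side by side. This is a sketch: foo is the function from earlier in the thread, while foo_add_only and check_foo_variants are hypothetical names for the add-only variant described above and a small self-check. Whether either function is actually vectorized depends on the clang version and the target, so treat the outcome as something to observe rather than a guarantee.

```c
#include <assert.h>

/* Version from the thread: one multiply and one add per element. */
void foo(int a1, int a2, int a3, int a4,
         int b1, int b2, int b3, int b4, int *res) {
    res[0] = a1 + b1 * 2;
    res[1] = a2 + b2 * 2;
    res[2] = a3 + b3 * 2;
    res[3] = a4 + b4 * 2;
}

/* Hypothetical add-only variant: less arithmetic per element, so the cost
   of building the vectors may outweigh the gain. */
void foo_add_only(int a1, int a2, int a3, int a4,
                  int b1, int b2, int b3, int b4, int *res) {
    res[0] = a1 + b1;
    res[1] = a2 + b2;
    res[2] = a3 + b3;
    res[3] = a4 + b4;
}

/* Self-check; call it from main(). */
void check_foo_variants(void) {
    int r[4];
    foo(1, 2, 3, 4, 10, 20, 30, 40, r);
    assert(r[0] == 21 && r[1] == 42 && r[2] == 63 && r[3] == 84);
    foo_add_only(1, 2, 3, 4, 10, 20, 30, 40, r);
    assert(r[0] == 11 && r[1] == 22 && r[2] == 33 && r[3] == 44);
}
```

Compile with clang -O3 -emit-llvm -S and look for `<4 x i32>` in each function body to see which one the SLP vectorizer picked up.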

-- 
Mehdi


