thr3ads.net - llvm dev - [llvm-dev] [LLVMdev] LLVM loop vectorizer [Feb 2016]

If this information is useful, please help other people find it:
Share via:

RCU

2015-Jul-08 17:01 UTC

[LLVMdev] LLVM loop vectorizer

Hello.
I am trying to vectorize a CSR SpMV (sparse matrix vector multiplication)
procedure
but the LLVM loop vectorizer is not able to handle such code.
I am using cland and llvm version 3.4 (on Ubuntu 12.10). I use the
-fvectorize option
with clang and -loop-vectorize with opt-3.4 .
The CSR SpMV function is inspired from
http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
(I can provide the exact code samples used).

Basically the problem is the loop vectorizer does NOT work with if inside
loop (be it
2 nested loops or a modification of SpMV I did with just 1 loop - I can provide
the exact
code) changing the value of the accumulator z. I can sort of understand why LLVM
isn't
able to vectorize the code.
However, at http://llvm.org/docs/Vectorizers.html#if-conversion it is
written:
<<The Loop Vectorizer is able to "flatten" the IF
statement in the code and
generate a single stream of instructions.
The Loop Vectorizer supports any control flow in the innermost
loop.
The innermost loop may contain complex nesting of IFs, ELSEs and
even GOTOs.>>
Could you please tell me what are these lines exactly trying to say.

Could you please tell me what algorithm is the LLVM loop vectorizer using
(maybe the
algorithm is described in a paper) - I currently found only 2 presentations on
this topic:
http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf
.

Thank you very much,
Alex

Michael Zolotukhin

2015-Jul-08 18:17 UTC

head link

[LLVMdev] LLVM loop vectorizer

Hi Alex,

Example from the link you provided looks like this:
for (i=0; i<M; i++ ){ 
    z[i]=0;
    for (ckey=row_ptr[i]; ckey<row_ptr[i+1]; ckey++) {
      z[i] += data[ckey]*x[colind[ckey]];
    }              
  }
Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside the
innermost loop.

But anyway, here vectorizer might have following troubles:
1) iteration count of the innermost loop is unknown.
2) Gather accesses ( a[b[i]] ). With AVX512 set of instructions it’s possible to
generate efficient code for such case, but a) I think it’s not supported yet, b)
if this ISA isn’t available, then vectorized code would need to ‘manually’
gather scalar values to vector, which might be slow (and thus, vectorizer might
decide to leave the code scalar).

And here is a list of papers vectorizer is based on:
// The reduction-variable vectorization is based on the paper:
//  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
//
// Variable uniformity checks are inspired by:
//  Karrenberg, R. and Hack, S. Whole Function Vectorization.
//
// The interleaved access vectorization is based on the paper:
//  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of Interleaved
//  Data for SIMD
//
// Other ideas/concepts are from:
//  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
//
//  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua.  An Evaluation of
//  Vectorizing Compilers.
And probably, some of the parts are written from scratch with no reference to a
paper.

The presentations you found are a good starting point, but while they’re still
good from getting basics of the vectorizer, they are a bit outdated now in a
sense that a lot of new features has been added since then (and bugs fixed:) ).
Also, I’d recommend trying a newer LLVM version - I don’t think it’ll handle the
example above, but it would be much more convenient to investigate why the loop
isn’t vectorized and fix vectorizer if we figure out how.

Best regards,
Michael
> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com> wrote:
> 
>  Hello.
>    I am trying to vectorize a CSR SpMV (sparse matrix vector
multiplication) procedure but the LLVM loop vectorizer is not able to handle
such code.
>    I am using cland and llvm version 3.4 (on Ubuntu 12.10). I use the
-fvectorize option with clang and -loop-vectorize with opt-3.4 .
>    The CSR SpMV function is inspired from
http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
(I can provide the exact code samples used).
> 
>    Basically the problem is the loop vectorizer does NOT work with if
inside loop (be it 2 nested loops or a modification of SpMV I did with just 1
loop - I can provide the exact code) changing the value of the accumulator z. I
can sort of understand why LLVM isn't able to vectorize the code.
>    However, at http://llvm.org/docs/Vectorizers.html#if-conversion it is
written:
>            <<The Loop Vectorizer is able to "flatten" the
IF statement in the code and generate a single stream of instructions.
>              The Loop Vectorizer supports any control flow in the innermost
loop.
>              The innermost loop may contain complex nesting of IFs, ELSEs
and even GOTOs.>>
>     Could you please tell me what are these lines exactly trying to say.
> 
>    Could you please tell me what algorithm is the LLVM loop vectorizer
using (maybe the algorithm is described in a paper) - I currently found only 2
presentations on this topic:
http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf
.
> 
>  Thank you very much,
>      Alex
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150708/2188ff84/attachment.html>

Mikhail Zolotukhin via llvm-dev

2015-Sep-17 23:49 UTC

head link

[llvm-dev] [LLVMdev] LLVM loop vectorizer

Resending to new llvm-dev...
> On Sep 17, 2015, at 4:48 PM, Michael Zolotukhin <mzolotukhin at
apple.com> wrote:
> 
> Hi Alex,
> 
>> On Sep 17, 2015, at 4:04 PM, Alex Susu <alex.e.susu at gmail.com
<mailto:alex.e.susu at gmail.com>> wrote:
>> 
>>   Hello, Michael,
>>     Thank you for the answer.
>>     I think I found the reason why the code does not get vectorized at
all (none of the loops get vectorized): LLVM's LoopVectorizer does NOT work
well with the inner loop doing reduction on float, even if using -ffast-math.
> Yes, LLVM Vectorizer has some problems with float reductions, but there has
been some work done on this recently (and more to come). Also, diagnostics are
much better than they were before. So, I tried the following case (which is very
similar to yours) with a recent compiler:
> $ cat red.c
> float foo(float *A, float *B, int *C, int N) {
>   int i = 0;
>   float r = 0.0;
>   for (i = 0; i < N; i++) {
>     r += A[i] * B[C[i]];
>   }
>   return r;
> }
> 
> It's not vectorized without '-ffast-math':
> $ bin/clang -O3 red.c -Rpass=loop-vectorize -S
-Rpass-analysis=loop-vectorize
> red.c:6:7: remark: loop not vectorized: cannot prove it is safe to reorder
floating-point operations; allow reordering by specifying '#pragma clang
loop vectorize(enable)' before the loop or by providing the compiler option
'-ffast-math'.
>       [-Rpass-analysis=loop-vectorize]
>     r += A[i] * B[C[i]];
>       ^
> 
> but even with '-ffast-math' the cost heuristic says that
vectorization isn't beneficial:
> $ bin/clang -O3 red.c -Rpass=loop-vectorize -S
-Rpass-analysis=loop-vectorize -ffast-math
> red.c:6:10: remark: the cost-model indicates that vectorization is not
beneficial [-Rpass-analysis=loop-vectorize]
>     r += A[i] * B[C[i]];
>          ^
> red.c:6:10: remark: the cost-model indicates that interleaving is not
beneficial [-Rpass-analysis=loop-vectorize]
> 
> When I overrode the cost model by adding the suggested pragma before the
loop, I finally got the loop vectorized:
> $ bin/clang -O3 red.c -Rpass=loop-vectorize -S
-Rpass-analysis=loop-vectorize
> red.c:6:10: remark: vectorized loop (vectorization width: 2, interleaved
count: 2) [-Rpass=loop-vectorize]
>     r += A[i] * B[C[i]];
>          ^
> 
> BTW, I don't expect much gains from vectorization of this particular
loop. Expression B[C[i]] requires a lot of auxiliary instructions in vector
version, so it mitigates most of the gains from vectorization.
> 
> 
>>     I attach sample code that shows this - however, the float
NON-vectorization is not always happening - mabye it's some memory
corruption in the LLVM LoopVectorizer, which sometimes results in a bad state.
> Do you mean that sometimes you got the float loop vectorized? If that’s so,
it sounds really strange..
> 
>> 
>>     Let me know if I can provide more info.
>>     I'd like to mention I'm using the LLVM built from the
repository - downloaded on Jul 14th, 2015.
> If you really want to vectorize this loop, I'd suggest updating LLVM
and using pragma. If you have any questions, I'd be glad to help with them,
if I can.
> 
> Best regards,
> Michael
> 
>> 
>> 
>>   Best regards,
>>     Alex
>> 
>> 
>> 
>> On Wed, Jul 8, 2015 at 9:17 PM, Michael Zolotukhin <mzolotukhin at
apple.com <mailto:mzolotukhin at apple.com>> wrote:
>> Hi Alex,
>> 
>> Example from the link you provided looks like this:
>> for (i=0; i<M; i++ ){ 
>>     z[i]=0;
>>     for (ckey=row_ptr[i]; ckey<row_ptr[i+1]; ckey++) {
>>       z[i] += data[ckey]*x[colind[ckey]];
>>     }              
>>   }
>> Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside
the innermost loop.
>> 
>> But anyway, here vectorizer might have following troubles:
>> 1) iteration count of the innermost loop is unknown.
>> 2) Gather accesses ( a[b[i]] ). With AVX512 set of instructions it’s
possible to generate efficient code for such case, but a) I think it’s not
supported yet, b) if this ISA isn’t available, then vectorized code would need
to ‘manually’ gather scalar values to vector, which might be slow (and thus,
vectorizer might decide to leave the code scalar).
>> 
>> And here is a list of papers vectorizer is based on:
>> // The reduction-variable vectorization is based on the paper:
>> //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>> //
>> // Variable uniformity checks are inspired by:
>> //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>> //
>> // The interleaved access vectorization is based on the paper:
>> //  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of
Interleaved
>> //  Data for SIMD
>> //
>> // Other ideas/concepts are from:
>> //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
>> //
>> //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua.  An
Evaluation of
>> //  Vectorizing Compilers.
>> And probably, some of the parts are written from scratch with no
reference to a paper.
>> 
>> The presentations you found are a good starting point, but while
they’re still good from getting basics of the vectorizer, they are a bit
outdated now in a sense that a lot of new features has been added since then
(and bugs fixed:) ). Also, I’d recommend trying a newer LLVM version - I don’t
think it’ll handle the example above, but it would be much more convenient to
investigate why the loop isn’t vectorized and fix vectorizer if we figure out
how.
>> 
>> Best regards,
>> Michael
>> 
>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com
<mailto:alex.e.susu at gmail.com>> wrote:
>>> 
>>>  Hello.
>>>    I am trying to vectorize a CSR SpMV (sparse matrix vector
multiplication) procedure but the LLVM loop vectorizer is not able to handle
such code.
>>>    I am using cland and llvm version 3.4 (on Ubuntu 12.10). I use
the -fvectorize option with clang and -loop-vectorize with opt-3.4 .
>>>    The CSR SpMV function is inspired from
http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
<http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp>
(I can provide the exact code samples used).
>>> 
>>>    Basically the problem is the loop vectorizer does NOT work with
if inside loop (be it 2 nested loops or a modification of SpMV I did with just 1
loop - I can provide the exact code) changing the value of the accumulator z. I
can sort of understand why LLVM isn't able to vectorize the code.
>>>    However, at http://llvm.org/docs/Vectorizers.html#if-conversion
<http://llvm.org/docs/Vectorizers.html#if-conversion> it is written:
>>>            <<The Loop Vectorizer is able to
"flatten" the IF statement in the code and generate a single stream of
instructions.
>>>              The Loop Vectorizer supports any control flow in the
innermost loop.
>>>              The innermost loop may contain complex nesting of IFs,
ELSEs and even GOTOs.>>
>>>     Could you please tell me what are these lines exactly trying to
say.
>>> 
>>>    Could you please tell me what algorithm is the LLVM loop
vectorizer using (maybe the algorithm is described in a paper) - I currently
found only 2 presentations on this topic:
http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf
<http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf> and
https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf
<https://urldefense.proofpoint.com/v2/url?u=https-3A__archive.fosdem.org_2014_schedule_event_llvmautovec_attachments_audio_321_export_events_attachments_llvmautovec_audio_321_AutoVectorizationLLVM.pdf&d=BQMFaQ&c=eEvniauFctOgLOKGJOplqw&r=ygVmcuuQ1MUhRUoJm-IgPtgjmvM0byfjlHDg99vufEI&m=wB_Hvsma5X84froglc2I4UFz1HGlCGpZpM7nxKKO_B8&s=pi6FRG39lC4JeMV5Onu3CZkX34HTM_85dUhLBBr4dKI&e=>.
>>> 
>>>  Thank you very much,
>>>      Alex
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>       
http://llvm.cs.uiuc.edu
<https://urldefense.proofpoint.com/v2/url?u=http-3A__llvm.cs.uiuc.edu&d=BQMFaQ&c=eEvniauFctOgLOKGJOplqw&r=ygVmcuuQ1MUhRUoJm-IgPtgjmvM0byfjlHDg99vufEI&m=wB_Hvsma5X84froglc2I4UFz1HGlCGpZpM7nxKKO_B8&s=FRPFoePA5qDz4sI9_rCux2_hmquZMsQkZdiz5BsCUmI&e=>
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
<https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.cs.uiuc.edu_mailman_listinfo_llvmdev&d=BQMFaQ&c=eEvniauFctOgLOKGJOplqw&r=ygVmcuuQ1MUhRUoJm-IgPtgjmvM0byfjlHDg99vufEI&m=wB_Hvsma5X84froglc2I4UFz1HGlCGpZpM7nxKKO_B8&s=Dd31y32iLFcIPCQ4SwWGWfg1Fc0g5dGL6xRRJxnXaIY&e=>
>> 
>> 
>> <test_case_LoopVectorizer.zip>
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150917/ad063e8a/attachment.html>

Alex Susu via llvm-dev

2016-Feb-15 14:42 UTC

head link

[llvm-dev] [LLVMdev] LLVM loop vectorizer

Hello, Michael.
     I come back to this older email.

     I am trying to implement coalescing/collapsing of nested loops. This would
be clearly
beneficial for the loop vectorizer, also.
     I'm normally planning to start modifying the LLVM loop vectorizer to
add loop
coalescing of the LLVM language.

     Are you aware of a similar effort on loop coalescing in LLVM (maybe even a
different
LLVM pass, not related to the LLVM loop vectorizer)?

   Thank you,
     Alex

On 7/9/2015 10:38 AM, RCU wrote:>
>
> With best regards,
>      Alex Susu
>
> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>> Hi Alex,
>>
>> Example from the link you provided looks like this:
>>
>> |for  (i=0;  i<M;  i++  ){
>>      z[i]=0;
>>      for  (ckey=row_ptr[i];  ckey<row_ptr[i+1];  ckey++)  {
>>        z[i]  +=  data[ckey]*x[colind[ckey]];
>>      }
>>    }|
>>
>> Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside
the innermost loop.
> I tried to simplify this code in the hope the loop vectorizer can take care
of it better:
> I linearized...
>
>> But anyway, here vectorizer might have following troubles:
>> 1) iteration count of the innermost loop is unknown.
>> 2) Gather accesses ( a[b[i]] ). With AVX512 set of instructions it’s
possible to generate
>> efficient code for such case, but a) I think it’s not supported yet, b)
if this ISA isn’t
>> available, then vectorized code would need to ‘manually’ gather scalar
values to vector,
>> which might be slow (and thus, vectorizer might decide to leave the
code scalar).
>>
>> And here is a list of papers vectorizer is based on:
>> // The reduction-variable vectorization is based on the paper:
>> //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>> //
>> // Variable uniformity checks are inspired by:
>> //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>> //
>> // The interleaved access vectorization is based on the paper:
>> //  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of
Interleaved
>> //  Data for SIMD
>> //
>> // Other ideas/concepts are from:
>> //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
>> //
>> //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua.  An
Evaluation of
>> //  Vectorizing Compilers.
>> And probably, some of the parts are written from scratch with no
reference to a paper.
>>
>> The presentations you found are a good starting point, but while
they’re still good from
>> getting basics of the vectorizer, they are a bit outdated now in a
sense that a lot of new
>> features has been added since then (and bugs fixed:) ). Also, I’d
recommend trying a newer
>> LLVM version - I don’t think it’ll handle the example above, but it
would be much more
>> convenient to investigate why the loop isn’t vectorized and fix
vectorizer if we figure
>> out how.
>>
>> Best regards,
>> Michael
>>
>
>    Thanks for the papers - these appear to be written in the header of the
file
> implementing the loop vect. tranformation (found at
>
"where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
).
>
>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com
<mailto:alex.e.susu at gmail.com>>
>>> wrote:
>>>
>>>  Hello.
>>>    I am trying to vectorize a CSR SpMV (sparse matrix vector
multiplication) procedure
>>> but the LLVM loop vectorizer is not able to handle such code.
>>>    I am using cland and llvm version 3.4 (on Ubuntu 12.10). I use
the -fvectorize option
>>> with clang and -loop-vectorize with opt-3.4 .
>>>    The CSR SpMV function is inspired from
>>>
http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
>>>
>>> (I can provide the exact code samples used).
>>>
>>>    Basically the problem is the loop vectorizer does NOT work with
if inside loop (be it
>>> 2 nested loops or a modification of SpMV I did with just 1 loop - I
can provide the
>>> exact code) changing the value of the accumulator z. I can sort of
understand why LLVM
>>> isn't able to vectorize the code.
>>>    However, at http://llvm.org/docs/Vectorizers.html#if-conversion
it is written:
>>>            <<The Loop Vectorizer is able to
"flatten" the IF statement in the code and
>>> generate a single stream of instructions.
>>>              The Loop Vectorizer supports any control flow in the
innermost loop.
>>>              The innermost loop may contain complex nesting of IFs,
ELSEs and even
>>> GOTOs.>>
>>>     Could you please tell me what are these lines exactly trying to
say.
>>>
>>>    Could you please tell me what algorithm is the LLVM loop
vectorizer using (maybe the
>>> algorithm is described in a paper) - I currently found only 2
presentations on this
>>> topic:
http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
>>>
https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf
>>>
>>> .
>>>
>>>  Thank you very much,
>>>      Alex
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>

RCU via llvm-dev

2016-Feb-15 14:44 UTC

head link

[llvm-dev] [LLVMdev] LLVM loop vectorizer

Hello, Michael.
     I come back to this older email. Sorry if you receive it again.

     I am trying to implement coalescing/collapsing of nested loops. This would
be clearly
beneficial for the loop vectorizer, also.
     I'm normally planning to start modifying the LLVM loop vectorizer to
add loop
coalescing of the LLVM language.

     Are you aware of a similar effort on loop coalescing in LLVM (maybe even a
different
LLVM pass, not related to the LLVM loop vectorizer)?

   Thank you,
     Alex

On 7/9/2015 10:38 AM, RCU wrote:>
>
> With best regards,
>      Alex Susu
>
> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>> Hi Alex,
>>
>> Example from the link you provided looks like this:
>>
>> |for  (i=0;  i<M;  i++  ){
>>      z[i]=0;
>>      for  (ckey=row_ptr[i];  ckey<row_ptr[i+1];  ckey++)  {
>>        z[i]  +=  data[ckey]*x[colind[ckey]];
>>      }
>>    }|
>>
>> Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside
the innermost loop.
> I tried to simplify this code in the hope the loop vectorizer can take care
of it better:
> I linearized...
>
>> But anyway, here vectorizer might have following troubles:
>> 1) iteration count of the innermost loop is unknown.
>> 2) Gather accesses ( a[b[i]] ). With AVX512 set of instructions it’s
possible to generate
>> efficient code for such case, but a) I think it’s not supported yet, b)
if this ISA isn’t
>> available, then vectorized code would need to ‘manually’ gather scalar
values to vector,
>> which might be slow (and thus, vectorizer might decide to leave the
code scalar).
>>
>> And here is a list of papers vectorizer is based on:
>> // The reduction-variable vectorization is based on the paper:
>> //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>> //
>> // Variable uniformity checks are inspired by:
>> //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>> //
>> // The interleaved access vectorization is based on the paper:
>> //  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of
Interleaved
>> //  Data for SIMD
>> //
>> // Other ideas/concepts are from:
>> //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
>> //
>> //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua.  An
Evaluation of
>> //  Vectorizing Compilers.
>> And probably, some of the parts are written from scratch with no
reference to a paper.
>>
>> The presentations you found are a good starting point, but while
they’re still good from
>> getting basics of the vectorizer, they are a bit outdated now in a
sense that a lot of new
>> features has been added since then (and bugs fixed:) ). Also, I’d
recommend trying a newer
>> LLVM version - I don’t think it’ll handle the example above, but it
would be much more
>> convenient to investigate why the loop isn’t vectorized and fix
vectorizer if we figure
>> out how.
>>
>> Best regards,
>> Michael
>>
>
>    Thanks for the papers - these appear to be written in the header of the
file
> implementing the loop vect. tranformation (found at
>
"where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
).
>
>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com
<mailto:alex.e.susu at gmail.com>>
>>> wrote:
>>>
>>>  Hello.
>>>    I am trying to vectorize a CSR SpMV (sparse matrix vector
multiplication) procedure
>>> but the LLVM loop vectorizer is not able to handle such code.
>>>    I am using cland and llvm version 3.4 (on Ubuntu 12.10). I use
the -fvectorize option
>>> with clang and -loop-vectorize with opt-3.4 .
>>>    The CSR SpMV function is inspired from
>>>
http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
>>>
>>> (I can provide the exact code samples used).
>>>
>>>    Basically the problem is the loop vectorizer does NOT work with
if inside loop (be it
>>> 2 nested loops or a modification of SpMV I did with just 1 loop - I
can provide the
>>> exact code) changing the value of the accumulator z. I can sort of
understand why LLVM
>>> isn't able to vectorize the code.
>>>    However, at http://llvm.org/docs/Vectorizers.html#if-conversion
it is written:
>>>            <<The Loop Vectorizer is able to
"flatten" the IF statement in the code and
>>> generate a single stream of instructions.
>>>              The Loop Vectorizer supports any control flow in the
innermost loop.
>>>              The innermost loop may contain complex nesting of IFs,
ELSEs and even
>>> GOTOs.>>
>>>     Could you please tell me what are these lines exactly trying to
say.
>>>
>>>    Could you please tell me what algorithm is the LLVM loop
vectorizer using (maybe the
>>> algorithm is described in a paper) - I currently found only 2
presentations on this
>>> topic:
http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
>>>
https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf
>>>
>>> .
>>>
>>>  Thank you very much,
>>>      Alex
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>

Gerolf Hoflehner via llvm-dev

2016-Feb-17 22:03 UTC

head link

[llvm-dev] [LLVMdev] LLVM loop vectorizer

Hi,

What is the performance gain you expect? Can you share modified source(s) or IR?
It might be good to have such 'test cases’ that can show potential
performance suites eg. as a separate test suite. And this might be a good start.

Since you are looking at a loop transformation you might find some inspiration
from Adam’s loop distribution pass http://reviews.llvm.org/D8831
<http://reviews.llvm.org/D8831>

Good luck!

Gerolf
> On Feb 15, 2016, at 6:44 AM, RCU via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
>  Hello, Michael.
>    I come back to this older email. Sorry if you receive it again.
> 
>    I am trying to implement coalescing/collapsing of nested loops. This
would be clearly beneficial for the loop vectorizer, also.
>    I'm normally planning to start modifying the LLVM loop vectorizer to
add loop coalescing of the LLVM language.
> 
>    Are you aware of a similar effort on loop coalescing in LLVM (maybe even
a different LLVM pass, not related to the LLVM loop vectorizer)?
> 
>  Thank you,
>    Alex
> 
> On 7/9/2015 10:38 AM, RCU wrote:
>> 
>> 
>> With best regards,
>>     Alex Susu
>> 
>> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>>> Hi Alex,
>>> 
>>> Example from the link you provided looks like this:
>>> 
>>> |for  (i=0;  i<M;  i++  ){
>>>     z[i]=0;
>>>     for  (ckey=row_ptr[i];  ckey<row_ptr[i+1];  ckey++)  {
>>>       z[i]  +=  data[ckey]*x[colind[ckey]];
>>>     }
>>>   }|
>>> 
>>> Is it the loop you are trying to vectorize? I don’t see any ‘if’
inside the innermost loop.
>> I tried to simplify this code in the hope the loop vectorizer can take
care of it better:
>> I linearized...
>> 
>>> But anyway, here vectorizer might have following troubles:
>>> 1) iteration count of the innermost loop is unknown.
>>> 2) Gather accesses ( a[b[i]] ). With AVX512 set of instructions
it’s possible to generate
>>> efficient code for such case, but a) I think it’s not supported
yet, b) if this ISA isn’t
>>> available, then vectorized code would need to ‘manually’ gather
scalar values to vector,
>>> which might be slow (and thus, vectorizer might decide to leave the
code scalar).
>>> 
>>> And here is a list of papers vectorizer is based on:
>>> // The reduction-variable vectorization is based on the paper:
>>> //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>>> //
>>> // Variable uniformity checks are inspired by:
>>> //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>>> //
>>> // The interleaved access vectorization is based on the paper:
>>> //  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of
Interleaved
>>> //  Data for SIMD
>>> //
>>> // Other ideas/concepts are from:
>>> //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years
later.
>>> //
>>> //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua.  An
Evaluation of
>>> //  Vectorizing Compilers.
>>> And probably, some of the parts are written from scratch with no
reference to a paper.
>>> 
>>> The presentations you found are a good starting point, but while
they’re still good from
>>> getting basics of the vectorizer, they are a bit outdated now in a
sense that a lot of new
>>> features has been added since then (and bugs fixed:) ). Also, I’d
recommend trying a newer
>>> LLVM version - I don’t think it’ll handle the example above, but it
would be much more
>>> convenient to investigate why the loop isn’t vectorized and fix
vectorizer if we figure
>>> out how.
>>> 
>>> Best regards,
>>> Michael
>>> 
>> 
>>   Thanks for the papers - these appear to be written in the header of
the file
>> implementing the loop vect. tranformation (found at
>>
"where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
).
>> 
>>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com
<mailto:alex.e.susu at gmail.com>>
>>>> wrote:
>>>> 
>>>> Hello.
>>>>   I am trying to vectorize a CSR SpMV (sparse matrix vector
multiplication) procedure
>>>> but the LLVM loop vectorizer is not able to handle such code.
>>>>   I am using cland and llvm version 3.4 (on Ubuntu 12.10). I
use the -fvectorize option
>>>> with clang and -loop-vectorize with opt-3.4 .
>>>>   The CSR SpMV function is inspired from
>>>>
http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
>>>> 
>>>> (I can provide the exact code samples used).
>>>> 
>>>>   Basically the problem is the loop vectorizer does NOT work
with if inside loop (be it
>>>> 2 nested loops or a modification of SpMV I did with just 1 loop
- I can provide the
>>>> exact code) changing the value of the accumulator z. I can sort
of understand why LLVM
>>>> isn't able to vectorize the code.
>>>>   However, at
http://llvm.org/docs/Vectorizers.html#if-conversion it is written:
>>>>           <<The Loop Vectorizer is able to
"flatten" the IF statement in the code and
>>>> generate a single stream of instructions.
>>>>             The Loop Vectorizer supports any control flow in
the innermost loop.
>>>>             The innermost loop may contain complex nesting of
IFs, ELSEs and even
>>>> GOTOs.>>
>>>>    Could you please tell me what are these lines exactly trying
to say.
>>>> 
>>>>   Could you please tell me what algorithm is the LLVM loop
vectorizer using (maybe the
>>>> algorithm is described in a paper) - I currently found only 2
presentations on this
>>>> topic:
http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
>>>>
https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf
>>>> 
>>>> .
>>>> 
>>>> Thank you very much,
>>>>     Alex
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160217/6102480b/attachment.html>

Mikhail Zolotukhin via llvm-dev

2016-Feb-18 00:17 UTC

head link

[llvm-dev] [LLVMdev] LLVM loop vectorizer

Hi Alex,

I'm not aware of efforts on loop coalescing in LLVM, but probably polly can
do something like this. Also, one related thought: it might be worth making it a
separate pass, not a part of loop vectorizer. LLVM already has several
'utility' passes (e.g. loop rotation), which primarily aims at enabling
other passes.

Thanks,
Michael
> On Feb 15, 2016, at 6:44 AM, RCU <alex.e.susu at gmail.com> wrote:
> 
>  Hello, Michael.
>    I come back to this older email. Sorry if you receive it again.
> 
>    I am trying to implement coalescing/collapsing of nested loops. This
would be clearly beneficial for the loop vectorizer, also.
>    I'm normally planning to start modifying the LLVM loop vectorizer to
add loop coalescing of the LLVM language.
> 
>    Are you aware of a similar effort on loop coalescing in LLVM (maybe even
a different LLVM pass, not related to the LLVM loop vectorizer)?
> 
>  Thank you,
>    Alex
> 
> On 7/9/2015 10:38 AM, RCU wrote:
>> 
>> 
>> With best regards,
>>     Alex Susu
>> 
>> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>>> Hi Alex,
>>> 
>>> Example from the link you provided looks like this:
>>> 
>>> |for  (i=0;  i<M;  i++  ){
>>>     z[i]=0;
>>>     for  (ckey=row_ptr[i];  ckey<row_ptr[i+1];  ckey++)  {
>>>       z[i]  +=  data[ckey]*x[colind[ckey]];
>>>     }
>>>   }|
>>> 
>>> Is it the loop you are trying to vectorize? I don’t see any ‘if’
inside the innermost loop.
>> I tried to simplify this code in the hope the loop vectorizer can take
care of it better:
>> I linearized...
>> 
>>> But anyway, here vectorizer might have following troubles:
>>> 1) iteration count of the innermost loop is unknown.
>>> 2) Gather accesses ( a[b[i]] ). With AVX512 set of instructions
it’s possible to generate
>>> efficient code for such case, but a) I think it’s not supported
yet, b) if this ISA isn’t
>>> available, then vectorized code would need to ‘manually’ gather
scalar values to vector,
>>> which might be slow (and thus, vectorizer might decide to leave the
code scalar).
>>> 
>>> And here is a list of papers vectorizer is based on:
>>> // The reduction-variable vectorization is based on the paper:
>>> //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>>> //
>>> // Variable uniformity checks are inspired by:
>>> //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>>> //
>>> // The interleaved access vectorization is based on the paper:
>>> //  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of
Interleaved
>>> //  Data for SIMD
>>> //
>>> // Other ideas/concepts are from:
>>> //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years
later.
>>> //
>>> //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua.  An
Evaluation of
>>> //  Vectorizing Compilers.
>>> And probably, some of the parts are written from scratch with no
reference to a paper.
>>> 
>>> The presentations you found are a good starting point, but while
they’re still good from
>>> getting basics of the vectorizer, they are a bit outdated now in a
sense that a lot of new
>>> features has been added since then (and bugs fixed:) ). Also, I’d
recommend trying a newer
>>> LLVM version - I don’t think it’ll handle the example above, but it
would be much more
>>> convenient to investigate why the loop isn’t vectorized and fix
vectorizer if we figure
>>> out how.
>>> 
>>> Best regards,
>>> Michael
>>> 
>> 
>>   Thanks for the papers - these appear to be written in the header of
the file
>> implementing the loop vect. tranformation (found at
>>
"where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
).
>> 
>>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com
<mailto:alex.e.susu at gmail.com> <mailto:alex.e.susu at gmail.com
<mailto:alex.e.susu at gmail.com>>>
>>>> wrote:
>>>> 
>>>> Hello.
>>>>   I am trying to vectorize a CSR SpMV (sparse matrix vector
multiplication) procedure
>>>> but the LLVM loop vectorizer is not able to handle such code.
>>>>   I am using cland and llvm version 3.4 (on Ubuntu 12.10). I
use the -fvectorize option
>>>> with clang and -loop-vectorize with opt-3.4 .
>>>>   The CSR SpMV function is inspired from
>>>>
http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
<http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp>
>>>> 
>>>> (I can provide the exact code samples used).
>>>> 
>>>>   Basically the problem is the loop vectorizer does NOT work
with if inside loop (be it
>>>> 2 nested loops or a modification of SpMV I did with just 1 loop
- I can provide the
>>>> exact code) changing the value of the accumulator z. I can sort
of understand why LLVM
>>>> isn't able to vectorize the code.
>>>>   However, at
http://llvm.org/docs/Vectorizers.html#if-conversion
<http://llvm.org/docs/Vectorizers.html#if-conversion> it is written:
>>>>           <<The Loop Vectorizer is able to
"flatten" the IF statement in the code and
>>>> generate a single stream of instructions.
>>>>             The Loop Vectorizer supports any control flow in
the innermost loop.
>>>>             The innermost loop may contain complex nesting of
IFs, ELSEs and even
>>>> GOTOs.>>
>>>>    Could you please tell me what are these lines exactly trying
to say.
>>>> 
>>>>   Could you please tell me what algorithm is the LLVM loop
vectorizer using (maybe the
>>>> algorithm is described in a paper) - I currently found only 2
presentations on this
>>>> topic:
http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf
<http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf> and
>>>>
https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf
<https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf>
>>>> 
>>>> .
>>>> 
>>>> Thank you very much,
>>>>     Alex
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
<mailto:LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>>
http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/>
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160217/85a2b034/attachment.html>

Renato Golin via llvm-dev

2016-Jun-04 15:57 UTC

head link

[llvm-dev] [LLVMdev] LLVM loop vectorizer

On 18 September 2015 at 00:49, Mikhail Zolotukhin via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> red.c:6:10: remark: the cost-model indicates that interleaving is not
> beneficial [-Rpass-analysis=loop-vectorize]
Try using:

#pragma clang loop vectorize(enable) interleave(enable)

just before the loop. It should force vectorization even if the cost
model doesn't like it.

This is not a final solution, but may give you a hint at what the
vectortizer thought it could do, and why it wasn't beneficial.

Try running benchmarks with the pragma on and off and see if it really
wasn't.

If the vectorized code is better, the cost model was wrong, and we
should update it.

If the vectorized code is worse, and you know of a better way, please
fill a bug in Bugzilla[1] so that we can find what the vectorizer can
do to make it work.

Of course, if you can change the vectorizer yourself and submit
patches, that'd be even better. :)

Please, provide the bug (or the patch proposals) with as much
information on costs and architecture as you can.

cheers,
--renato

[1] https://llvm.org/bugs/

Seemingly Similar Threads

Search for more maybe matching threads

llvm dev - Feb 2016 - [LLVMdev] LLVM loop vectorizer

[LLVMdev] LLVM loop vectorizer

[LLVMdev] LLVM loop vectorizer

[llvm-dev] [LLVMdev] LLVM loop vectorizer

[llvm-dev] [LLVMdev] LLVM loop vectorizer

[llvm-dev] [LLVMdev] LLVM loop vectorizer

[llvm-dev] [LLVMdev] LLVM loop vectorizer

[llvm-dev] [LLVMdev] LLVM loop vectorizer

[llvm-dev] [LLVMdev] LLVM loop vectorizer

Seemingly Similar Threads