thr3ads.net - llvm dev - [llvm-dev] [LLVMdev] LLVM loop vectorizer

Mikhail Zolotukhin via llvm-dev

2016-Jun-04 01:28 UTC

[llvm-dev] [LLVMdev] LLVM loop vectorizer

Hi Alex,

I think the changes you want are actually not vectorizer related. The vectorizer
just uses data provided by other passes.

What you probably want is to look into the routine Loop::getStartLoc() (see
lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are
welcome:)

Thanks,
Michael
> On Jun 3, 2016, at 6:13 PM, Alex Susu <alex.e.susu at gmail.com> wrote:
>
> Hello.
>   Mikhail, I come back to this older thread.
>   I need to make a few changes to LoopVectorize.cpp.
>
>   One of them is related to figuring out the exact C source line and column
> number of the loops being vectorized. I've noticed that a recent version of
> LoopVectorize.cpp prints imprecise debug info for vectorized loops such as,
> for example, the location of a character of an assignment statement inside
> the respective loop.
>   It would help me a lot in my project to find the exact C source line and
> column number of the first and last character of the loop being vectorized
> (an imprecise location would make my life more complicated).
>   Is this feasible? Or are there limitations at the level of clang in
> retrieving the exact C source line and column number of the beginning and
> end of a loop (it can include indent chars before and after the loop)?
>   (I've seen other examples with imprecise locations, such as the "Reading
> diagnostics" chapter in the book
> https://books.google.ro/books?isbn=1782166939 .)
>
>   Note: to be able to retrieve the debug info from the C source file we need
> to run clang with -Rpass* options, as discussed before. Otherwise, if we run
> clang first, then opt on the resulting .ll file which runs LoopVectorize, we
> lose the C source file debug info (DebugLoc class, etc.) and obtain the debug
> info from the .ll file. An example:
>       clang -O3 3better.c -arch=mips -ffast-math -Rpass=debug -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -S -emit-llvm -fvectorize -mllvm -debug -mllvm -force-vector-width=16 -save-temps
>
> Thank you,
>   Alex
>
> On 2/18/2016 2:17 AM, Mikhail Zolotukhin wrote:
>> Hi Alex,
>>
>> I'm not aware of efforts on loop coalescing in LLVM, but probably Polly can
>> do something like this. Also, one related thought: it might be worth making
>> it a separate pass, not a part of the loop vectorizer. LLVM already has
>> several 'utility' passes (e.g. loop rotation), which primarily aim at
>> enabling other passes.
>>
>> Thanks, Michael
>>
>>> On Feb 15, 2016, at 6:44 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>
>>> Hello, Michael. I come back to this older email. Sorry if you receive it
>>> again.
>>>
>>> I am trying to implement coalescing/collapsing of nested loops. This would
>>> clearly be beneficial for the loop vectorizer as well. I'm planning to
>>> start by modifying the LLVM loop vectorizer to add loop coalescing at the
>>> LLVM language level.
>>>
>>> Are you aware of a similar effort on loop coalescing in LLVM (maybe even a
>>> different LLVM pass, not related to the LLVM loop vectorizer)?
>>>
>>> Thank you, Alex
>>>
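For readers unfamiliar with the term: loop coalescing (collapsing) rewrites a nest of loops as a single loop over the combined iteration space. A minimal sketch in C, assuming a rectangular m x n nest with no dependences between iterations. The function names are made up for illustration; this is not code from LLVM or Polly.

```c
#include <stddef.h>

/* Original form: a two-level rectangular loop nest. */
void scale_nested(double *a, size_t m, size_t n, double f) {
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            a[i * n + j] *= f;
}

/* Coalesced form: one loop over all m*n iterations.
 * The original indices are recovered from the single counter k. */
void scale_coalesced(double *a, size_t m, size_t n, double f) {
    for (size_t k = 0; k < m * n; k++) {
        size_t i = k / n;   /* row index of the original nest */
        size_t j = k % n;   /* column index of the original nest */
        a[i * n + j] *= f;
    }
}
```

The single loop has a trip count of m*n, which gives a vectorizer one long iteration space to work with instead of many short inner loops.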
>>> On 7/9/2015 10:38 AM, RCU wrote:
>>>>
>>>>
>>>> With best regards, Alex Susu
>>>>
>>>> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>>>>> Hi Alex,
>>>>>
>>>>> Example from the link you provided looks like this:
>>>>>
>>>>> for (i = 0; i < M; i++) {
>>>>>     z[i] = 0;
>>>>>     for (ckey = row_ptr[i]; ckey < row_ptr[i+1]; ckey++) {
>>>>>         z[i] += data[ckey] * x[colind[ckey]];
>>>>>     }
>>>>> }
>>>>>
>>>>> Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside
>>>>> the innermost loop.
>>>> I tried to simplify this code in the hope the loop vectorizer can take
>>>> care of it better: I linearized...
>>>>
>>>>> But anyway, here the vectorizer might have the following troubles:
>>>>> 1) The iteration count of the innermost loop is unknown.
>>>>> 2) Gather accesses ( a[b[i]] ). With the AVX512 instruction set it’s
>>>>>    possible to generate efficient code for such a case, but a) I think
>>>>>    it’s not supported yet, b) if this ISA isn’t available, then the
>>>>>    vectorized code would need to ‘manually’ gather scalar values into a
>>>>>    vector, which might be slow (and thus, the vectorizer might decide to
>>>>>    leave the code scalar).
>>>>>
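For reference, the quoted loop can be written out as a complete C function. This is a sketch under assumed types (double values, int indices) and an assumed function name, spmv_csr; the array names follow the quoted snippet. The comments mark the two properties listed above.

```c
#include <stddef.h>

/* z = A * x for a matrix A stored in CSR form.
 * row_ptr has m + 1 entries; data and colind hold the nonzeros. */
void spmv_csr(size_t m, const int *row_ptr, const int *colind,
              const double *data, const double *x, double *z) {
    for (size_t i = 0; i < m; i++) {
        z[i] = 0.0;
        /* The trip count row_ptr[i+1] - row_ptr[i] is only known at run time. */
        for (int ckey = row_ptr[i]; ckey < row_ptr[i + 1]; ckey++) {
            /* x[colind[ckey]] is an indexed (gather) load. */
            z[i] += data[ckey] * x[colind[ckey]];
        }
    }
}
```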
>>>>> And here is a list of papers the vectorizer is based on:
>>>>>
>>>>> // The reduction-variable vectorization is based on the paper:
>>>>> //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>>>>> //
>>>>> // Variable uniformity checks are inspired by:
>>>>> //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>>>>> //
>>>>> // The interleaved access vectorization is based on the paper:
>>>>> //  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of Interleaved
>>>>> //  Data for SIMD
>>>>> //
>>>>> // Other ideas/concepts are from:
>>>>> //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
>>>>> //
>>>>> //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
>>>>> //  Vectorizing Compilers.
>>>>>
>>>>> And probably, some of the parts are written from scratch with no
>>>>> reference to a paper.
>>>>>
>>>>> The presentations you found are a good starting point, but while they’re
>>>>> still good for getting the basics of the vectorizer, they are a bit
>>>>> outdated now in the sense that a lot of new features have been added
>>>>> since then (and bugs fixed:) ). Also, I’d recommend trying a newer LLVM
>>>>> version - I don’t think it’ll handle the example above, but it would be
>>>>> much more convenient to investigate why the loop isn’t vectorized and
>>>>> fix the vectorizer if we figure out how.
>>>>>
>>>>> Best regards, Michael
>>>>>
>>>>
>>>> Thanks for the papers - these appear to be written in the header of the
>>>> file implementing the loop vect. transformation (found at
>>>> "where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp ).
>>>>
>>>>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>>>
>>>>>> Hello. I am trying to vectorize a CSR SpMV (sparse matrix vector
>>>>>> multiplication) procedure but the LLVM loop vectorizer is not able to
>>>>>> handle such code. I am using clang and llvm version 3.4 (on Ubuntu
>>>>>> 12.10). I use the -fvectorize option with clang and -loop-vectorize
>>>>>> with opt-3.4 . The CSR SpMV function is inspired from
>>>>>> http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
>>>>>> (I can provide the exact code samples used).
>>>>>>
>>>>>> Basically the problem is that the loop vectorizer does NOT work with an
>>>>>> if inside the loop (be it 2 nested loops or a modification of SpMV I
>>>>>> did with just 1 loop - I can provide the exact code) changing the value
>>>>>> of the accumulator z. I can sort of understand why LLVM isn't able to
>>>>>> vectorize the code. However, at
>>>>>> http://llvm.org/docs/Vectorizers.html#if-conversion it is written:
>>>>>> <<The Loop Vectorizer is able to "flatten" the IF statement in the code
>>>>>> and generate a single stream of instructions. The Loop Vectorizer
>>>>>> supports any control flow in the innermost loop. The innermost loop may
>>>>>> contain complex nesting of IFs, ELSEs and even GOTOs.>> Could you
>>>>>> please tell me what exactly these lines are trying to say?
>>>>>>
>>>>>> Could you please tell me what algorithm the LLVM loop vectorizer is
>>>>>> using (maybe the algorithm is described in a paper) - I currently found
>>>>>> only 2 presentations on this topic:
>>>>>> http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
>>>>>> https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf
>>>>>> .
>>>>>>
>>>>>> Thank you very much, Alex
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
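
The if-conversion passage quoted from the Vectorizers documentation can be illustrated with a small C sketch. Both functions are illustrative and not taken from LLVM. The second one shows, at source level, the kind of branch-free form (a "single stream of instructions") that if-conversion aims for, with the condition turned into a select.

```c
#include <stddef.h>

/* Loop with control flow in the body. */
int sum_positive_branchy(const int *a, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0)
            sum += a[i];
    }
    return sum;
}

/* Same computation after "flattening" the if: every iteration executes the
 * same statements, and the condition only selects which value is added. */
int sum_positive_flat(const int *a, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        int addend = (a[i] > 0) ? a[i] : 0;  /* select instead of branch */
        sum += addend;
    }
    return sum;
}
```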

Alex Susu via llvm-dev

2016-Jun-13 19:22 UTC

[llvm-dev] [LLVMdev] LLVM loop vectorizer - changing vectorized code

Hello, Mikhail.
     I'm planning to do source-to-source transformation for loop vectorization.
     Basically I want to generate C (C++) code from C (C++) source code:
       - the code that is not vectorized remains the same - this would be simple
         to achieve if we can obtain precisely the source location of each
         statement;
       - for the code that gets vectorized, I want to translate the sequential
         parts into C code and generate SIMD intrinsics for my SIMD processor
         where the vectorizer would normally generate vector instructions.
     I started looking at InnerLoopVectorizer::vectorize() and
InnerLoopVectorizer::createEmptyLoop(). Generating C/C++ code (with the help of
LLVM intrinsics) instead of LLVM code is not trivial, but it should be
reasonably simple to achieve.
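
To make the target of such a source-to-source translation concrete, here is a rough sketch in plain C of the shape the output might take: a main loop that processes VF elements per step, followed by a scalar remainder loop. Everything here is assumed for illustration: the vec4f type, the vec4f_load/vec4f_add/vec4f_store helpers standing in for the intrinsics of a SIMD processor, and VF = 4. It is not what InnerLoopVectorizer emits.

```c
#include <stddef.h>

#define VF 4  /* assumed vectorization factor */

/* Hypothetical stand-ins for target SIMD intrinsics, emulated in scalar C. */
typedef struct { float lane[VF]; } vec4f;

static vec4f vec4f_load(const float *p) {
    vec4f v;
    for (int l = 0; l < VF; l++) v.lane[l] = p[l];
    return v;
}

static vec4f vec4f_add(vec4f a, vec4f b) {
    vec4f r;
    for (int l = 0; l < VF; l++) r.lane[l] = a.lane[l] + b.lane[l];
    return r;
}

static void vec4f_store(float *p, vec4f v) {
    for (int l = 0; l < VF; l++) p[l] = v.lane[l];
}

/* c[i] = a[i] + b[i], assuming c does not overlap a or b. */
void add_arrays(float *c, const float *a, const float *b, size_t n) {
    size_t i = 0;
    size_t n_vec = n - (n % VF);          /* iterations covered by full vectors */
    for (; i < n_vec; i += VF)            /* vector body */
        vec4f_store(c + i, vec4f_add(vec4f_load(a + i), vec4f_load(b + i)));
    for (; i < n; i++)                    /* scalar remainder */
        c[i] = a[i] + b[i];
}
```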

     Would you advise such an approach as the one described above?  I guess
doing this as a Clang phase (working on the source code) is not really a bad
idea either, since I would have better control over the source code, but I
would need to reimplement the loop vectorizer algorithm that is currently
implemented on LLVM code.

   Thank you,
     Alex


C Bergström via llvm-dev

2016-Jun-13 19:31 UTC

[llvm-dev] [LLVMdev] LLVM loop vectorizer - changing vectorized code

On Tue, Jun 14, 2016 at 3:22 AM, Alex Susu via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>   Hello, Mikhail.
>     I'm planning to do source-to-source transformation for loop
> vectorization.
>     Basically I want to generate C (C++) code from C (C++) source code:
>       - the code that is not vectorized remains the same - this would be
> simple to achieve if we can obtain precisely the source location of each
> statement;
>       - the code that gets vectorized I want to translate in C code the
> parts that are sequential and generate SIMD intrinsics for my SIMD processor
> where normally it would generate vector instructions.
>      I started looking at InnerLoopVectorizer::vectorize() and
> InnerLoopVectorizer::createEmptyLoop(). Not generating LLVM code but C/C++
> code (with the help of LLVM intrinsics) is not trivial, but it should be
> reasonably simple to achieve.
>
>     Would you advise for such an operation as the one described above?  I
> guess doing this as a Clang phase (working on the source code) is not really
> a bad idea either, since I would have better control on source code, but I
> would need to reimplement the loop vectorizer algorithm that is currently
> implemented on LLVM code.

Vectorization is a coordination between high-level optimizations like
loop-level stuff and low-level target stuff. If you are still at the
source level, how do you plan to handle the actual lowering? In that
case you'll still always be at the mercy of another piece, which may
or may not be able to handle what you've done. (In theory your
transformation could be correct, but the backend might just not handle it.)

Having said this - why not actually work on fixing the root of the
"problem" - that being the actual LLVM passes which aren't doing what
you need? This would also likely be more robust, and you can maintain
control over the whole experiment (compilation flow).

I get really annoyed when reviewing papers from academics who have
used source-to-source because they thought it was "easier". Short-term
shortcuts aren't likely to produce novel results.

Mehdi Amini via llvm-dev

2016-Jun-13 19:34 UTC

[llvm-dev] [LLVMdev] LLVM loop vectorizer - changing vectorized code

> On Jun 13, 2016, at 12:22 PM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>  Hello, Mikhail.
>    I'm planning to do source-to-source transformation for loop vectorization.
>    Basically I want to generate C (C++) code from C (C++) source code:
>      - the code that is not vectorized remains the same - this would be
>        simple to achieve if we can obtain precisely the source location of
>        each statement;
>      - the code that gets vectorized I want to translate in C code the parts
>        that are sequential and generate SIMD intrinsics for my SIMD processor
>        where normally it would generate vector instructions.
>     I started looking at InnerLoopVectorizer::vectorize() and
> InnerLoopVectorizer::createEmptyLoop(). Not generating LLVM code but C/C++
> code (with the help of LLVM intrinsics) is not trivial, but it should be
> reasonably simple to achieve.
>
>    Would you advise for such an operation as the one described above?  I
> guess doing this as a Clang phase (working on the source code) is not really
> a bad idea either, since I would have better control on source code, but I
> would need to reimplement the loop vectorizer algorithm that is currently
> implemented on LLVM code.

Some related work: http://llvm.org/devmtg/2013-04/krzikalla-slides.pdf

-- 
Mehdi



Mikhail Zolotukhin via llvm-dev

2016-Jun-13 23:42 UTC

[llvm-dev] [LLVMdev] LLVM loop vectorizer - changing vectorized code

Hi Alex,

> On Jun 13, 2016, at 12:22 PM, Alex Susu <alex.e.susu at gmail.com> wrote:
>
>  Hello, Mikhail.
>    I'm planning to do source-to-source transformation for loop vectorization.

Could you please share your reasoning on why you need to do it source-to-source?
While I recognize that there might be external reasons to do it, I do think that
working on IR is much easier.

>    Basically I want to generate C (C++) code from C (C++) source code:
>      - the code that is not vectorized remains the same - this would be
>        simple to achieve if we can obtain precisely the source location of
>        each statement;

If you work completely in the front-end, without generating IR, then yes, it's
probably true. But the most complicated part would be to check whether
vectorization is legal. Even in IR it's not a trivial task - if you want the
same level of error-proofness as we have now, I'm afraid you'll end up with
just another IR internal to your transformation. For one, think about how you
would handle memory aliasing.
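
A small C sketch of why aliasing matters for legality. The function names and the width of 4 are assumptions for illustration, not anything from LLVM. When dst and src overlap so that each iteration reads what the previous one wrote, processing four elements at a time (all loads before all stores) changes the result.

```c
#include <stddef.h>

/* Scalar reference: an iteration may read a value written by an earlier one
 * if dst and src overlap. */
void shift_add_scalar(int *dst, const int *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1;
}

/* Naive "vectorized" version with width 4: loads a whole chunk, then stores it.
 * This is only equivalent to the scalar loop when dst and src do not overlap. */
void shift_add_chunked(int *dst, const int *src, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        int tmp[4];
        for (int l = 0; l < 4; l++) tmp[l] = src[i + l] + 1;  /* all loads first */
        for (int l = 0; l < 4; l++) dst[i + l] = tmp[l];      /* then all stores */
    }
    for (; i < n; i++)  /* scalar remainder */
        dst[i] = src[i] + 1;
}
```

With buf = {0, 0, 0, 0, 0}, calling shift_add_scalar(buf + 1, buf, 4) leaves {0, 1, 2, 3, 4}, while shift_add_chunked(buf + 1, buf, 4) leaves {0, 1, 1, 1, 1}. A legality check has to either prove the pointers do not overlap or guard the vector loop with a runtime overlap test.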

If you do lower to IR first, then there is no "the code that is not
vectorized remains the same" - it's already mutated by previous passes
anyway. E.g. what if the loop was distributed/unrolled before vectorization?
>      - the code that gets vectorized I want to translate in C code the
parts that are sequential and generate SIMD intrinsics for my SIMD processor
where normally it would generate vector instructions.
>     I started looking at InnerLoopVectorizer::vectorize() and
InnerLoopVectorizer::createEmptyLoop(). Not generating LLVM code but C/C++ code
(with the help of LLVM intrinsics) is not trivial, but it should be reasonably
simple to achieve.What you suggest here is like writing a C backend and teach it to generate
intrinsics for vector code (such backend existed some time ago btw). It should
be doable, but I wouldn't call it simple:)
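
As a rough illustration of the kind of output such a source-to-source step would have to produce, here is a sketch in C. The `vec4f` type and the `vec4f_load`, `vec4f_add`, and `vec4f_store` helpers are hypothetical stand-ins for the intrinsics of a real SIMD target, emulated here with plain loops so the example compiles anywhere. The sketch assumes `c` does not partially overlap `a` or `b`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-lane vector type and helpers, standing in for the
 * intrinsics of a real SIMD target. */
typedef struct { float lane[4]; } vec4f;

static vec4f vec4f_load(const float *p) {
    vec4f v;
    for (int k = 0; k < 4; k++) v.lane[k] = p[k];
    return v;
}

static void vec4f_store(float *p, vec4f v) {
    for (int k = 0; k < 4; k++) p[k] = v.lane[k];
}

static vec4f vec4f_add(vec4f a, vec4f b) {
    vec4f r;
    for (int k = 0; k < 4; k++) r.lane[k] = a.lane[k] + b.lane[k];
    return r;
}

/* Original loop, as it would appear in the input source. */
void add_scalar(float *c, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* What a source-to-source vectorizer might emit: a vector body that
 * handles 4 elements per iteration, followed by a scalar remainder loop. */
void add_vectorized(float *c, const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        vec4f_store(c + i, vec4f_add(vec4f_load(a + i), vec4f_load(b + i)));
    for (; i < n; i++)  /* scalar epilogue for the last n % 4 elements */
        c[i] = a[i] + b[i];
}
```

Even in this trivial case the generated code needs a remainder loop and a no-overlap assumption, which hints at why the full task is not simple.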

Thanks,
Michael

>    Would you advise for such an operation as the one described above?  I
> guess doing this as a Clang phase (working on the source code) is not really a
> bad idea either, since I would have better control on source code, but I would
> need to reimplement the loop vectorizer algorithm that is currently implemented
> on LLVM code.
> 
>  Thank you,
>    Alex
> 
> On 6/4/2016 4:28 AM, Mikhail Zolotukhin wrote:
>> Hi Alex,
>> 
>> I think the changes you want are actually not vectorizer related. Vectorizer just uses
>> data provided by other passes.
>> 
>> What you probably might want is to look into routine Loop::getStartLoc() (see
>> lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are welcome:)
>> 
>> Thanks, Michael
>> 
>>> On Jun 3, 2016, at 6:13 PM, Alex Susu <alex.e.susu at gmail.com> wrote:
>>> 
>>> Hello. Mikhail, I come back to this older thread. I need to do a few changes to
>>> LoopVectorize.cpp.
>>> 
>>> One of them is related to figuring out the exact C source line and column number of
>>> the loops being vectorized. I've noticed that a recent version of LoopVectorize.cpp
>>> prints imprecise debug info for vectorized loops such as, for example, the location
>>> of a character of an assignment statement inside the respective loop. It would help
>>> me a lot in my project to find the exact C source line and column number of the first
>>> and last character of the loop being vectorized. (imprecise location would make my
>>> life more complicated). Is this feasible? Or are there limitations at the level of
>>> clang of retrieving the exact C source line and column number location of the
>>> beginning and end of a loop (it can include indent chars before and after the loop)?
>>> (I've seen other examples with imprecise location such as the "Reading diagnostics"
>>> chapter in the book https://books.google.ro/books?isbn=1782166939 .)
>>> 
>>> Note: to be able to retrieve the debug info from the C source file we require to run
>>> clang with -Rpass* options, as discussed before. Otherwise, if we run clang first,
>>> then opt on the resulting .ll file which runs LoopVectorize, we lose the C source
>>> file debug info (DebugLoc class, etc) and obtain the debug info from the .ll file. An
>>> example: clang -O3 3better.c -arch=mips -ffast-math -Rpass=debug
>>> -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -S -emit-llvm -fvectorize -mllvm
>>> -debug -mllvm -force-vector-width=16 -save-temps
>>> 
>>> Thank you, Alex
>>> 
>>> 
>>> 
>>> On 2/18/2016 2:17 AM, Mikhail Zolotukhin wrote:
>>>> Hi Alex,
>>>> 
>>>> I'm not aware of efforts on loop coalescing in LLVM, but probably polly can do
>>>> something like this. Also, one related thought: it might be worth making it a
>>>> separate pass, not a part of loop vectorizer. LLVM already has several 'utility'
>>>> passes (e.g. loop rotation), which primarily aims at enabling other passes.
>>>> 
>>>> Thanks, Michael
>>>> 
>>>>> On Feb 15, 2016, at 6:44 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>> 
>>>>> Hello, Michael. I come back to this older email. Sorry if you receive it again.
>>>>> 
>>>>> I am trying to implement coalescing/collapsing of nested loops. This would be
>>>>> clearly beneficial for the loop vectorizer, also. I'm normally planning to start
>>>>> modifying the LLVM loop vectorizer to add loop coalescing of the LLVM language.
>>>>> 
>>>>> Are you aware of a similar effort on loop coalescing in LLVM (maybe even a
>>>>> different LLVM pass, not related to the LLVM loop vectorizer)?
>>>>> 
>>>>> Thank you, Alex
>>>>> 
>>>>> On 7/9/2015 10:38 AM, RCU wrote:
>>>>>> 
>>>>>> 
>>>>>> With best regards, Alex Susu
>>>>>> 
>>>>>> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>>>>>>> Hi Alex,
>>>>>>> 
>>>>>>> Example from the link you provided looks like this:
>>>>>>> 
>>>>>>> for (i = 0; i < M; i++) {
>>>>>>>     z[i] = 0;
>>>>>>>     for (ckey = row_ptr[i]; ckey < row_ptr[i+1]; ckey++) {
>>>>>>>         z[i] += data[ckey] * x[colind[ckey]];
>>>>>>>     }
>>>>>>> }
>>>>>>> 
>>>>>>> Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside the
>>>>>>> innermost loop.
>>>>>> I tried to simplify this code in the hope the loop vectorizer can take care of
>>>>>> it better: I linearized...
>>>>>> 
>>>>>>> But anyway, here vectorizer might have following troubles:
>>>>>>> 1) iteration count of the innermost loop is unknown.
>>>>>>> 2) Gather accesses ( a[b[i]] ). With AVX512 set of instructions it’s possible to
>>>>>>>    generate efficient code for such case, but a) I think it’s not supported yet,
>>>>>>>    b) if this ISA isn’t available, then vectorized code would need to ‘manually’
>>>>>>>    gather scalar values to vector, which might be slow (and thus, vectorizer might
>>>>>>>    decide to leave the code scalar).
>>>>>>> 
>>>>>>> And here is a list of papers vectorizer is based on:
>>>>>>> // The reduction-variable vectorization is based on the paper:
>>>>>>> //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>>>>>>> //
>>>>>>> // Variable uniformity checks are inspired by:
>>>>>>> //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>>>>>>> //
>>>>>>> // The interleaved access vectorization is based on the paper:
>>>>>>> //  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of Interleaved
>>>>>>> //  Data for SIMD
>>>>>>> //
>>>>>>> // Other ideas/concepts are from:
>>>>>>> //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
>>>>>>> //
>>>>>>> //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
>>>>>>> //  Vectorizing Compilers.
>>>>>>> And probably, some of the parts are written from scratch with no reference to a paper.
>>>>>>> 
>>>>>>> The presentations you found are a good starting point, but while they’re
>>>>>>> still good from getting basics of the vectorizer, they are a bit outdated now
>>>>>>> in a sense that a lot of new features has been added since then (and bugs
>>>>>>> fixed:) ). Also, I’d recommend trying a newer LLVM version - I don’t think
>>>>>>> it’ll handle the example above, but it would be much more convenient to
>>>>>>> investigate why the loop isn’t vectorized and fix vectorizer if we figure out
>>>>>>> how.
>>>>>>> 
>>>>>>> Best regards, Michael
>>>>>>> 
>>>>>> 
>>>>>> Thanks for the papers - these appear to be written in the header of the file
>>>>>> implementing the loop vect. transformation (found at
>>>>>> "where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>>>> ).
>>>>>> 
>>>>>>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hello. I am trying to vectorize a CSR SpMV (sparse matrix vector
>>>>>>>> multiplication) procedure but the LLVM loop vectorizer is not able to
>>>>>>>> handle such code. I am using clang and llvm version 3.4 (on Ubuntu 12.10).
>>>>>>>> I use the -fvectorize option with clang and -loop-vectorize with opt-3.4 .
>>>>>>>> The CSR SpMV function is inspired from
>>>>>>>> http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
>>>>>>>> (I can provide the exact code samples used).
>>>>>>>> 
>>>>>>>> Basically the problem is the loop vectorizer does NOT work with if inside
>>>>>>>> loop (be it 2 nested loops or a modification of SpMV I did with just 1 loop
>>>>>>>> - I can provide the exact code) changing the value of the accumulator z. I
>>>>>>>> can sort of understand why LLVM isn't able to vectorize the code. However,
>>>>>>>> at http://llvm.org/docs/Vectorizers.html#if-conversion it is written: <<The
>>>>>>>> Loop Vectorizer is able to "flatten" the IF statement in the code and
>>>>>>>> generate a single stream of instructions. The Loop Vectorizer supports any
>>>>>>>> control flow in the innermost loop. The innermost loop may contain complex
>>>>>>>> nesting of IFs, ELSEs and even GOTOs.>> Could you please tell me what are
>>>>>>>> these lines exactly trying to say.
>>>>>>>> 
>>>>>>>> Could you please tell me what algorithm is the LLVM loop vectorizer using
>>>>>>>> (maybe the algorithm is described in a paper) - I currently found only 2
>>>>>>>> presentations on this topic:
>>>>>>>> http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
>>>>>>>> https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf
>>>>>>>> .
>>>>>>>> 
>>>>>>>> Thank you very much, Alex
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>> 
>> 
>>
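
For reference, the CSR loop discussed in the quoted 2015 messages can be written as a self-contained C function. This is only a sketch: the function name `spmv_csr`, the parameter types, and the local accumulator `sum` (used instead of updating `z[i]` directly) are assumptions, not code from the thread. The comments mark the two properties named in the quoted reply as obstacles for the vectorizer.

```c
#include <assert.h>
#include <stddef.h>

/* CSR sparse matrix-vector product z = A * x for a matrix with m rows.
 * row_ptr has m + 1 entries; data and colind hold the nonzero values
 * and their column indices. */
void spmv_csr(size_t m, const size_t *row_ptr, const size_t *colind,
              const float *data, const float *x, float *z) {
    for (size_t i = 0; i < m; i++) {
        /* CSR invariant: row pointers are non-decreasing. */
        assert(row_ptr[i] <= row_ptr[i + 1]);
        float sum = 0.0f;
        /* Obstacle 1: the trip count row_ptr[i+1] - row_ptr[i] is only
         * known at run time and differs from row to row. */
        for (size_t ckey = row_ptr[i]; ckey < row_ptr[i + 1]; ckey++) {
            /* Obstacle 2: x[colind[ckey]] is an indirect (gather) load. */
            sum += data[ckey] * x[colind[ckey]];
        }
        z[i] = sum;
    }
}
```

For the 2x2 matrix with rows (1, 2) and (0, 3), the inputs are `row_ptr = {0, 2, 3}`, `colind = {0, 1, 1}`, and `data = {1, 2, 3}`.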
