Adam Nemet via llvm-dev

2016-Jun-07 22:29 UTC

[llvm-dev] [LLVMdev] LLVM loop vectorizer

Hi Alex,

This has been very recently fixed by Hal.  See http://reviews.llvm.org/rL270771

Adam
> On Jun 4, 2016, at 3:13 AM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>  Hello.
>    Mikhail, I come back to this older thread.
>    I need to do a few changes to LoopVectorize.cpp.
>
>    One of them is related to figuring out the exact C source line and column
> number of the loops being vectorized. I've noticed that a recent version of
> LoopVectorize.cpp prints imprecise debug info for vectorized loops such as,
> for example, the location of a character of an assignment statement inside
> the respective loop.
>    It would help me a lot in my project to find the exact C source line and
> column number of the first and last character of the loop being vectorized.
> (imprecise location would make my life more complicated).
>    Is this feasible? Or are there limitations at the level of clang of
> retrieving the exact C source line and column number location of the
> beginning and end of a loop (it can include indent chars before and after
> the loop)?
>    (I've seen other examples with imprecise location such as the
> "Reading diagnostics" chapter in the book
> https://books.google.ro/books?isbn=1782166939 .)
>
>    Note: to be able to retrieve the debug info from the C source file we
> require to run clang with -Rpass* options, as discussed before. Otherwise, if
> we run clang first, then opt on the resulting .ll file which runs
> LoopVectorize, we lose the C source file debug info (DebugLoc class, etc) and
> obtain the debug info from the .ll file. An example:
>        clang -O3 3better.c -arch=mips -ffast-math -Rpass=debug -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -S -emit-llvm -fvectorize -mllvm -debug -mllvm -force-vector-width=16 -save-temps
>
>  Thank you,
>    Alex
>
> On 2/18/2016 2:17 AM, Mikhail Zolotukhin wrote:
>> Hi Alex,
>>
>> I'm not aware of efforts on loop coalescing in LLVM, but probably polly can do
>> something like this. Also, one related thought: it might be worth making it a
>> separate pass, not a part of loop vectorizer. LLVM already has several
>> 'utility' passes (e.g. loop rotation), which primarily aim at enabling other
>> passes.
>>
>> Thanks, Michael
>>
>>> On Feb 15, 2016, at 6:44 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>
>>> Hello, Michael. I come back to this older email. Sorry if you receive it again.
>>>
>>> I am trying to implement coalescing/collapsing of nested loops. This would be
>>> clearly beneficial for the loop vectorizer, also. I'm normally planning to
>>> start modifying the LLVM loop vectorizer to add loop coalescing of the LLVM
>>> language.
>>>
>>> Are you aware of a similar effort on loop coalescing in LLVM (maybe even a
>>> different LLVM pass, not related to the LLVM loop vectorizer)?
>>>
>>> Thank you, Alex
>>>
>>> On 7/9/2015 10:38 AM, RCU wrote:
>>>>
>>>> With best regards, Alex Susu
>>>>
>>>> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>>>>> Hi Alex,
>>>>>
>>>>> Example from the link you provided looks like this:
>>>>>
>>>>> for (i = 0; i < M; i++) {
>>>>>   z[i] = 0;
>>>>>   for (ckey = row_ptr[i]; ckey < row_ptr[i+1]; ckey++) {
>>>>>     z[i] += data[ckey] * x[colind[ckey]];
>>>>>   }
>>>>> }
>>>>>
>>>>> Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside the
>>>>> innermost loop.
>>>> I tried to simplify this code in the hope the loop vectorizer can take care of it
>>>> better: I linearized...
>>>>
>>>>> But anyway, here vectorizer might have following troubles:
>>>>> 1) iteration count of the innermost loop is unknown.
>>>>> 2) Gather accesses ( a[b[i]] ). With AVX512 set of instructions it’s possible
>>>>> to generate efficient code for such case, but a) I think it’s not supported
>>>>> yet, b) if this ISA isn’t available, then vectorized code would need to
>>>>> ‘manually’ gather scalar values to vector, which might be slow (and thus,
>>>>> vectorizer might decide to leave the code scalar).
>>>>>
>>>>> And here is a list of papers vectorizer is based on:
>>>>> // The reduction-variable vectorization is based on the paper:
>>>>> //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>>>>> //
>>>>> // Variable uniformity checks are inspired by:
>>>>> //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>>>>> //
>>>>> // The interleaved access vectorization is based on the paper:
>>>>> //  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of Interleaved
>>>>> //  Data for SIMD
>>>>> //
>>>>> // Other ideas/concepts are from:
>>>>> //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
>>>>> //
>>>>> //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
>>>>> //  Vectorizing Compilers.
>>>>> And probably, some of the parts are written from scratch with no reference
>>>>> to a paper.
>>>>>
>>>>> The presentations you found are a good starting point, but while they’re
>>>>> still good for getting the basics of the vectorizer, they are a bit outdated
>>>>> now in a sense that a lot of new features have been added since then (and
>>>>> bugs fixed :) ). Also, I’d recommend trying a newer LLVM version - I don’t
>>>>> think it’ll handle the example above, but it would be much more convenient
>>>>> to investigate why the loop isn’t vectorized and fix vectorizer if we figure
>>>>> out how.
>>>>>
>>>>> Best regards, Michael
>>>>>
>>>>
>>>> Thanks for the papers - these appear to be written in the header of the file
>>>> implementing the loop vect. transformation (found at
>>>> "where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp ).
>>>>
>>>>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>>>
>>>>>> Hello. I am trying to vectorize a CSR SpMV (sparse matrix vector
>>>>>> multiplication) procedure but the LLVM loop vectorizer is not able to handle
>>>>>> such code. I am using clang and llvm version 3.4 (on Ubuntu 12.10). I use the
>>>>>> -fvectorize option with clang and -loop-vectorize with opt-3.4 . The CSR SpMV
>>>>>> function is inspired from
>>>>>> http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
>>>>>> (I can provide the exact code samples used).
>>>>>>
>>>>>> Basically the problem is the loop vectorizer does NOT work with if inside loop
>>>>>> (be it 2 nested loops or a modification of SpMV I did with just 1 loop - I can
>>>>>> provide the exact code) changing the value of the accumulator z. I can sort of
>>>>>> understand why LLVM isn't able to vectorize the code. However, at
>>>>>> http://llvm.org/docs/Vectorizers.html#if-conversion it is written: <<The Loop
>>>>>> Vectorizer is able to "flatten" the IF statement in the code and generate a
>>>>>> single stream of instructions. The Loop Vectorizer supports any control flow in
>>>>>> the innermost loop. The innermost loop may contain complex nesting of IFs,
>>>>>> ELSEs and even GOTOs.>> Could you please tell me what are these lines exactly
>>>>>> trying to say.
>>>>>>
>>>>>> Could you please tell me what algorithm is the LLVM loop vectorizer using
>>>>>> (maybe the algorithm is described in a paper) - I currently found only 2
>>>>>> presentations on this topic:
>>>>>> http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
>>>>>> https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf .
>>>>>>
>>>>>> Thank you very much, Alex
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu    http://llvm.cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
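
For reference, the CSR SpMV loop quoted earlier in this thread can be written as a small self-contained function. This is only a sketch for experimenting with the vectorizer: the name `spmvCsr`, the use of `std::vector`, and the scalar accumulator `acc` are assumptions made here, not code from the thread or from the linked Stack Overflow question. It makes the two obstacles Michael mentions visible: the inner trip count `rowPtr[i + 1] - rowPtr[i]` is only known at run time, and `x[colInd[ckey]]` is a gather access.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: computes z = A * x for a matrix A stored in CSR form.
// rowPtr has one entry per row plus one; data and colInd hold the non-zeros.
std::vector<double> spmvCsr(const std::vector<int> &rowPtr,
                            const std::vector<int> &colInd,
                            const std::vector<double> &data,
                            const std::vector<double> &x) {
  if (rowPtr.empty())
    return {};
  const std::size_t rows = rowPtr.size() - 1;
  std::vector<double> z(rows, 0.0);
  for (std::size_t i = 0; i < rows; ++i) {
    // Assumption: accumulate in a local scalar instead of updating z[i]
    // on every iteration, so the inner loop is a plain sum reduction.
    double acc = 0.0;
    // The trip count of this loop depends on the data in rowPtr.
    for (int ckey = rowPtr[i]; ckey < rowPtr[i + 1]; ++ckey)
      acc += data[ckey] * x[colInd[ckey]]; // indirect (gather) load from x
    z[i] = acc;
  }
  return z;
}
```

Whether the inner loop is actually vectorized depends on the LLVM version, the target and the flags used; the -Rpass-analysis=loop-vectorize option mentioned above is one way to see what the vectorizer reports for it.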

Alex Susu via llvm-dev

2016-Aug-12 09:43 UTC

[llvm-dev] [LLVMdev] LLVM loop vectorizer - start and end locations

Hello.
     Hal, Adam, thank you very much for the fix mentioned. I ran an opt built
with this fix and I got the precise start loop location.
     I am interested in getting both exact start and end locations for a loop
in order to replace the loop with a different content in the source file
(basically perform a rather non-standard source-to-source transformation).

     I've tried to compute the end location for the loop by "parsing" the file
(looking at each character) at least from the start location, but this can be
quite complex for nested blocks in the loop, etc.

     Also, I've tried to get more information from the LLVM IR instructions:
        - Loop::getUniqueExitBlock()::front()::getDebugLoc() returns the first
          statement after the loop. But, in the case of a nested loop the first
          (and last) statement after the loop is the "increment" statement in
          the outer enclosing loop. So, even if for simple loops
          getUniqueExitBlock etc looks promising, this is still not great.
        - I also iterated through all the statements of all basic-blocks of the
          loop (used Loop::block_begin() and block_end(), etc). From these I can
          choose the min and max locations. This is not great either because the
          loops can contain comments before the final "}" of the loop (if there
          is one) and this would result in imprecise end location - most
          importantly the "}" of the loop basically does not have a
          corresponding LLVM IR instruction. Of course "parsing" to the right of
          the max location found above for an uncommented "}" is not very
          difficult.
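
     The character-level scan described above could be sketched roughly as
follows. This is only an illustration and not something that exists in LLVM or
Clang: findClosingBrace is a hypothetical helper, it assumes the loop body is a
braced block, and it ignores preprocessor directives, raw string literals and
backslash line continuations. It skips comments and string/character literals,
so braces inside them are not counted.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the offset of the '}' that closes the first
// '{' found at or after `start`, or std::string::npos if there is none.
std::size_t findClosingBrace(const std::string &src, std::size_t start) {
  enum class State { Code, LineComment, BlockComment, String, Char };
  State state = State::Code;
  int depth = 0; // number of currently open '{' in real code
  for (std::size_t i = start; i < src.size(); ++i) {
    const char c = src[i];
    const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
    switch (state) {
    case State::Code:
      if (c == '/' && next == '/') {
        state = State::LineComment;
        ++i;
      } else if (c == '/' && next == '*') {
        state = State::BlockComment;
        ++i;
      } else if (c == '"') {
        state = State::String;
      } else if (c == '\'') {
        state = State::Char;
      } else if (c == '{') {
        ++depth;
      } else if (c == '}') {
        if (depth == 0)
          return std::string::npos; // '}' seen before any '{': give up
        if (--depth == 0)
          return i; // this brace closes the outermost block
      }
      break;
    case State::LineComment:
      if (c == '\n')
        state = State::Code;
      break;
    case State::BlockComment:
      if (c == '*' && next == '/') {
        state = State::Code;
        ++i;
      }
      break;
    case State::String:
      if (c == '\\')
        ++i; // skip the escaped character
      else if (c == '"')
        state = State::Code;
      break;
    case State::Char:
      if (c == '\\')
        ++i; // skip the escaped character
      else if (c == '\'')
        state = State::Code;
      break;
    }
  }
  return std::string::npos;
}
```

     The start offset would have to be computed from the line and column that
the vectorizer reports; a loop whose body is a single statement without braces
needs separate handling.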

     I could also try to get more information from the AST of Clang while being
in the opt tool, but I don't know how to read it - maybe I could use Libtooling.
Do you have an idea here?
     Can I get the corresponding AST node from an LLVM IR instruction? (Here I
got an interesting pointer:
http://clang-developers.42468.n3.nabble.com/Matching-Clang-s-AST-nodes-to-the-LLVM-IR-instructions-they-produced-td3665037.html ,
but maybe it is outdated)

     Thank you,
       Alex



Hal Finkel via llvm-dev

2016-Aug-12 22:52 UTC

[llvm-dev] [LLVMdev] LLVM loop vectorizer - start and end locations

Hi Alex,

If you want to get both the starting and ending locations, I think your best bet
is to enhance Clang to insert into the loop metadata, not just the location of
the start of the loop, but also the location of the end of the loop. Then you
can grab that in the backend.

What's your use case for this exactly?

 -Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
