Mikhail Zolotukhin via llvm-dev
2016-Feb-18 00:17 UTC
[llvm-dev] [LLVMdev] LLVM loop vectorizer
Hi Alex,

I'm not aware of any efforts on loop coalescing in LLVM, but Polly can probably do something like this. Also, one related thought: it might be worth making it a separate pass rather than part of the loop vectorizer. LLVM already has several 'utility' passes (e.g. loop rotation) whose primary aim is to enable other passes.

Thanks,
Michael

> On Feb 15, 2016, at 6:44 AM, RCU <alex.e.susu at gmail.com> wrote:
>
> Hello, Michael.
> I am coming back to this older email. Sorry if you receive it again.
>
> I am trying to implement coalescing/collapsing of nested loops. This would clearly be beneficial for the loop vectorizer as well.
> My current plan is to start by modifying the LLVM loop vectorizer to add loop coalescing for LLVM code.
>
> Are you aware of a similar effort on loop coalescing in LLVM (maybe even a different LLVM pass, not related to the LLVM loop vectorizer)?
>
> Thank you,
> Alex
>
> On 7/9/2015 10:38 AM, RCU wrote:
>>
>> With best regards,
>> Alex Susu
>>
>> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>>> Hi Alex,
>>>
>>> Example from the link you provided looks like this:
>>>
>>> for (i=0; i<M; i++ ){
>>>   z[i]=0;
>>>   for (ckey=row_ptr[i]; ckey<row_ptr[i+1]; ckey++) {
>>>     z[i] += data[ckey]*x[colind[ckey]];
>>>   }
>>> }
>>>
>>> Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside the innermost loop.
>> I tried to simplify this code in the hope the loop vectorizer can take care of it better:
>> I linearized...
>>
>>> But anyway, here the vectorizer might have the following troubles:
>>> 1) The iteration count of the innermost loop is unknown.
>>> 2) Gather accesses ( a[b[i]] ). With the AVX512 instruction set it’s possible to generate
>>> efficient code for such a case, but a) I think it’s not supported yet, b) if this ISA isn’t
>>> available, then the vectorized code would need to ‘manually’ gather scalar values into a vector,
>>> which might be slow (and thus the vectorizer might decide to leave the code scalar).
>>>
>>> And here is a list of papers the vectorizer is based on:
>>> // The reduction-variable vectorization is based on the paper:
>>> // D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>>> //
>>> // Variable uniformity checks are inspired by:
>>> // Karrenberg, R. and Hack, S. Whole Function Vectorization.
>>> //
>>> // The interleaved access vectorization is based on the paper:
>>> // Dorit Nuzman, Ira Rosen and Ayal Zaks. Auto-Vectorization of Interleaved
>>> // Data for SIMD
>>> //
>>> // Other ideas/concepts are from:
>>> // A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
>>> //
>>> // S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
>>> // Vectorizing Compilers.
>>> And probably, some of the parts are written from scratch with no reference to a paper.
>>>
>>> The presentations you found are a good starting point, but while they’re still good for
>>> getting the basics of the vectorizer, they are a bit outdated now, in the sense that a lot of new
>>> features have been added since then (and bugs fixed :) ). Also, I’d recommend trying a newer
>>> LLVM version - I don’t think it’ll handle the example above, but it would be much more
>>> convenient to investigate why the loop isn’t vectorized and fix the vectorizer if we figure
>>> out how.
>>>
>>> Best regards,
>>> Michael
>>>
>>
>> Thanks for the papers - these appear to be written in the header of the file
>> implementing the loop vect. transformation (found at
>> "where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp ).
>>
>>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>
>>>> Hello.
>>>> I am trying to vectorize a CSR SpMV (sparse matrix vector multiplication) procedure
>>>> but the LLVM loop vectorizer is not able to handle such code.
>>>> I am using clang and llvm version 3.4 (on Ubuntu 12.10). I use the -fvectorize option
>>>> with clang and -loop-vectorize with opt-3.4 .
>>>> The CSR SpMV function is inspired from
>>>> http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
>>>> (I can provide the exact code samples used).
>>>>
>>>> Basically the problem is that the loop vectorizer does NOT work with an if inside the loop (be it
>>>> 2 nested loops or a modification of SpMV I did with just 1 loop - I can provide the
>>>> exact code) changing the value of the accumulator z. I can sort of understand why LLVM
>>>> isn't able to vectorize the code.
>>>> However, at http://llvm.org/docs/Vectorizers.html#if-conversion it is written:
>>>> <<The Loop Vectorizer is able to "flatten" the IF statement in the code and
>>>> generate a single stream of instructions.
>>>> The Loop Vectorizer supports any control flow in the innermost loop.
>>>> The innermost loop may contain complex nesting of IFs, ELSEs and even
>>>> GOTOs.>>
>>>> Could you please tell me what exactly these lines are trying to say?
>>>>
>>>> Could you please tell me what algorithm the LLVM loop vectorizer is using (maybe the
>>>> algorithm is described in a paper) - I currently found only 2 presentations on this
>>>> topic: http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
>>>> https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf .
>>>>
>>>> Thank you very much,
>>>> Alex
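To make the loop-coalescing idea in this exchange concrete, here is a minimal C sketch. The first function follows the nested CSR SpMV loop quoted in the thread. The second is a hypothetical single-loop variant written for illustration only: it is not Alex's linearized code (which the thread does not show), and the names spmv_nested and spmv_flat, the double element type, and the requirement that row_ptr[0] == 0 are assumptions.

```c
/* Nested-loop CSR SpMV, following the loop quoted in the thread.
 * Row i owns the nonzeros data[row_ptr[i]] .. data[row_ptr[i+1] - 1]. */
void spmv_nested(int M, const int *row_ptr, const int *colind,
                 const double *data, const double *x, double *z)
{
    for (int i = 0; i < M; i++) {
        z[i] = 0.0;
        /* The inner trip count, row_ptr[i+1] - row_ptr[i], is only known
         * at run time. */
        for (int ckey = row_ptr[i]; ckey < row_ptr[i + 1]; ckey++) {
            /* x[colind[ckey]] is an indirect (gather) load. */
            z[i] += data[ckey] * x[colind[ckey]];
        }
    }
}

/* Hypothetical coalesced variant: a single pass over all nonzeros that
 * advances the row index whenever ckey crosses a row boundary.
 * Assumes row_ptr[0] == 0 and that row_ptr is non-decreasing. */
void spmv_flat(int M, const int *row_ptr, const int *colind,
               const double *data, const double *x, double *z)
{
    for (int i = 0; i < M; i++)
        z[i] = 0.0;

    int row = 0;
    for (int ckey = 0; ckey < row_ptr[M]; ckey++) {
        /* A loop rather than a single 'if', so that empty rows are skipped. */
        while (ckey >= row_ptr[row + 1])
            row++;
        z[row] += data[ckey] * x[colind[ckey]];
    }
}
```

Both functions should compute the same z. Note that the flat version trades the unknown inner trip count for a row index carried from one iteration to the next and an indirect store to z[row], so it is not obviously easier to vectorize; the gather on x remains in either form.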
Hello.
Mikhail, I am coming back to this older thread.
I need to make a few changes to LoopVectorize.cpp.

One of them is related to figuring out the exact C source line and column number of the loops being vectorized. I've noticed that a recent version of LoopVectorize.cpp prints imprecise debug info for vectorized loops such as, for example, the location of a character of an assignment statement inside the respective loop.
It would help me a lot in my project to find the exact C source line and column number of the first and last character of the loop being vectorized (an imprecise location would make my life more complicated).
Is this feasible? Or are there limitations at the level of clang on retrieving the exact C source line and column number of the beginning and end of a loop (it can include indent chars before and after the loop)?
(I've seen other examples with imprecise location, such as the "Reading diagnostics" chapter in the book https://books.google.ro/books?isbn=1782166939 .)

Note: to be able to retrieve the debug info from the C source file we need to run clang with the -Rpass* options, as discussed before. Otherwise, if we run clang first, then opt on the resulting .ll file which runs LoopVectorize, we lose the C source file debug info (DebugLoc class, etc) and obtain the debug info from the .ll file. An example:
    clang -O3 3better.c -arch=mips -ffast-math -Rpass=debug -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -S -emit-llvm -fvectorize -mllvm -debug -mllvm -force-vector-width=16 -save-temps

Thank you,
Alex
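For readers who want to try the -Rpass remarks mentioned in Alex's message, a small C test case along the following lines may help. The file name remarks_demo.c and the function scale_add are made up for illustration; the command in the comment is a reduced form of the one in the message, and the location the remark reports can differ between clang versions.

```c
/* remarks_demo.c (hypothetical file name)
 *
 * One possible way to ask clang for vectorizer remarks on this file:
 *
 *   clang -O3 -c remarks_demo.c \
 *         -Rpass=loop-vectorize \
 *         -Rpass-missed=loop-vectorize \
 *         -Rpass-analysis=loop-vectorize
 *
 * Each remark is printed with a file:line:column location, which is the
 * location whose precision the thread is asking about.
 */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, int n)
{
    /* A simple counted loop with unit-stride accesses and no aliasing
     * (thanks to restrict); a typical candidate for the loop vectorizer. */
    for (int i = 0; i < n; i++)
        out[i] = 2.0f * a[i] + b[i];
}
```

Comparing the line and column in the remark with the position of the `for` keyword is a quick way to check how precise the reported loop location is for a given compiler build.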
Mikhail Zolotukhin via llvm-dev
2016-Jun-04 01:28 UTC
[llvm-dev] [LLVMdev] LLVM loop vectorizer
Hi Alex,

I think the changes you want are actually not vectorizer related. The vectorizer just uses data provided by other passes. What you probably want is to look into the routine Loop::getStartLoc() (see lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are welcome :)

Thanks,
Michael
Hi Alex,

This has been very recently fixed by Hal. See http://reviews.llvm.org/rL270771

Adam