similar to: LoopVectorize fails to vectorize code with condition on reduction

2018 Jul 07
2
LoopVectorize fails to vectorize more complex loops
Hello. Could you please tell me why the first loop of the following program (also maybe the commented loop) doesn't get vectorized with LoopVectorize (from a recent LLVM build from the SVN repository from Jun 2018)? typedef short TYPE; TYPE data[1400][1200]; void kernel_covariance(int m, int n, TYPE mean[1200]) { int i, j, k; for (j = 0; j < m; j++) { mean[j] =
2013 Apr 03
2
[LLVMdev] Packed instructions generated by LoopVectorize?
Hi, I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are generated when input arrays are integer, but not when they are float or double. If I modify the float example in http://llvm.org/docs/Vectorizers.html by adding restrict to the input arrays packed instructions are generated. Although it should not be required I tried
2013 Apr 03
0
[LLVMdev] Packed instructions generated by LoopVectorize?
Hi Tyler, Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating point operations. Thanks, Nadav On Apr 3, 2013, at 10:29 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> wrote: > Hi, > > I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are
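The dot-product case from this thread can be sketched in C as follows. The function and parameter names are illustrative, not taken from the original test case. The float accumulator `sum` is the reduction variable: vectorizing it means keeping several partial sums in vector lanes and combining them after the loop, which changes the order of the floating-point additions. That reordering is why the reply suggests -ffast-math.

```c
#include <stddef.h>

/* Illustrative dot product over two float arrays (names are hypothetical).
 * 'restrict' promises the arrays do not alias. Vectorizing the loop means
 * splitting 'sum' into per-lane partial sums that are added together after
 * the loop, i.e. reordering the floating-point additions. */
float dot(const float *restrict a, const float *restrict b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

With clang, `clang -O2 -ffast-math -Rpass=loop-vectorize -c dot.c` reports when the loop is vectorized, and `-Rpass-missed=loop-vectorize` reports when it is not. On the LLVM versions discussed here, dropping -ffast-math should leave the float reduction scalar, while an integer version of the same loop does not need the flag.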
2013 Apr 04
1
[LLVMdev] Packed instructions generated by LoopVectorize?
Thanks, that did it! Are there any plans to enable the loop vectorizer by default? From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Wednesday, April 03, 2013 13:33 PM To: Nowicki, Tyler Cc: LLVM Developers Mailing List Subject: Re: Packed instructions generated by LoopVectorize? Hi Tyler, Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating
2017 Apr 14
2
Separate LoopVectorize LLVM pass
Hello. I am trying to create my own LoopVectorize.cpp pass as a separate pass from the LLVM trunk, as described in http://llvm.org/docs/CMake.html#embedding-llvm-in-your-project. Did anybody try something like this? I added close to the end of the .cpp file: /* this line seems to be required - it allows to run this pass as an embedded pass by giving opt -my-loop-vectorize
2017 Jun 20
3
LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions
On 06/20/2017 03:26 AM, Hal Finkel wrote: > Hi, Adrien, Hello Hal! Thanks for your answer! > Thanks for reporting this. I recommend that you file a bug report at > https://bugs.llvm.org/ Will do! > Whenever I see reports of missed optimization opportunities in the face > of ptrtoint/inttoptr, my first question is: why are these instructions > present in the first place? At
2014 Mar 18
4
[LLVMdev] E = L->begin() in LoopVectorize
Hi, I'm studying loop vectorizer. I don't understand the code yet. But it looks not right to assign L->begin() to E. Is it a typo? Thanks, Liang diff --git a/lib/Transforms/Vectorize/LoopVectorize.cpp b/lib/Transforms/Vectorize/LoopVectorize.cpp index 435c005..87b5d79 100644 --- a/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/lib/Transforms/Vectorize/LoopVectorize.cpp @@
2014 Mar 18
2
[LLVMdev] E = L->begin() in LoopVectorize
Looking at it now, curious why no tests failed. On Tue, Mar 18, 2014 at 2:48 PM, Jim Grosbach <grosbach at apple.com> wrote: > Almost certainly, yes. Nice catch! > > > On Mar 18, 2014, at 2:38 PM, Liang Wang <netcasper at gmail.com> wrote: > > > Hi, > > > > I'm studying loop vectorizer. I don't understand the code yet. But > > it
2016 Aug 21
2
LoopVectorize module - some possible enhancements
Hello, Michael, I'd like to ask if we can enhance the LoopVectorize LLVM module (I am currently using a version from Jul 2016). More exactly: - do you envision to support in the near future LLVM IR gather and scatter intrinsics (as described at http://llvm.org/docs/LangRef.html#llvm-masked-gather-intrinsics and scatter)? I see you have defined some methods that should
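For reference, the kind of loop that needs the gather intrinsics mentioned above is one with an indirect, data-dependent index. The C sketch below is a hypothetical example, not code from the thread. The load `B[idx[i]]` does not touch consecutive addresses, so a vectorized version has to gather the `B` values, which LLVM IR can express with `llvm.masked.gather` (using an all-true mask when the loop has no condition). Whether LoopVectorize actually emits a gather depends on the target and its cost model.

```c
#include <stddef.h>

/* Hypothetical indirect-load loop: out[i] is stored consecutively, but
 * B[idx[i]] is read from data-dependent addresses, so a vector version
 * would need a gather for the B loads. */
void gather_add(float *restrict out, const float *restrict B,
                const int *restrict idx, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = B[idx[i]] + 1.0f;
}
```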
2018 Feb 08
2
[RFC] Make LoopVectorize Aware of SLP Operations
Hi, On 08/02/2018 04:22, Caballero, Diego wrote: > Hi Florian! > > This proposal sounds pretty exciting! Integrating SLP-aware loop vectorization (or the other way around) and SLP into the VPlan framework is definitely aligned with the long term vision and we would prefer this approach to the LoopReroll and InstCombine alternatives that you mentioned. We prefer a generic implementation
2018 Feb 08
0
[RFC] Make LoopVectorize Aware of SLP Operations
Hi Florian! This proposal sounds pretty exciting! Integrating SLP-aware loop vectorization (or the other way around) and SLP into the VPlan framework is definitely aligned with the long term vision and we would prefer this approach to the LoopReroll and InstCombine alternatives that you mentioned. We prefer a generic implementation that can handle complicated cases to something ad-hoc for some
2017 Jun 17
5
LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions
Hello all, There is a missing vectorization opportunity issue with clang 4.0 with the file attached. Indeed, when compiled with -O2, the "op_distance" function get vectorized, but not the "op" one. For information, this test case has been reduced from a file generated by the Pythran compiler (https://github.com/serge-sans-paille/pythran). If we take a look at the generated
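The attached Pythran-derived file is not reproduced here. As a hypothetical illustration of the pattern named in the subject, the C sketch below contrasts a loop whose induction variable is an integer address, cast back to a pointer on every iteration (which appears as ptrtoint/inttoptr in IR), with the equivalent loop that uses a pointer induction variable. Whether the casts survive until LoopVectorize runs depends on the frontend and on earlier passes, so treat this as a starting point for a reduced test rather than a guaranteed reproducer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the induction variable 'it' is an integer address that is
 * converted back to a pointer on each iteration (ptrtoint/inttoptr in IR). */
void scale_via_int(float *p, size_t n, float k) {
    uintptr_t it = (uintptr_t)p;
    uintptr_t end = (uintptr_t)(p + n);
    for (; it != end; it += sizeof(float)) {
        float *q = (float *)it;
        *q *= k;
    }
}

/* The same computation with a plain pointer induction variable. */
void scale_via_ptr(float *p, size_t n, float k) {
    for (float *q = p, *end = p + n; q != end; ++q)
        *q *= k;
}
```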
2018 Feb 06
2
[RFC] Make LoopVectorize Aware of SLP Operations
Hello, We would like to propose making LoopVectorize aware of SLP operations, to improve the generated code for loops operating on struct fields or doing complex math. At the moment, LoopVectorize uses interleaving to vectorize loops that operate on values loaded/stored from consecutive addresses: vector loads/stores are generated to combine consecutive loads/stores and then shufflevector
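As a hypothetical example of the loops this RFC targets ("loops operating on struct fields or doing complex math"), consider an element-wise complex multiply over an array of structs. The type and function names are illustrative. The fields `re` and `im` sit at consecutive addresses, so with the interleaving strategy described above, LoopVectorize would load both fields with wide vector loads, separate them with shufflevector, and recombine them before the vector stores.

```c
#include <stddef.h>

/* Hypothetical complex type: 're' and 'im' are adjacent in memory, so the
 * loop below accesses memory with stride 2 per field (an interleave group). */
typedef struct { float re, im; } cplx;

/* Element-wise complex multiplication: out[i] = a[i] * b[i]. */
void cmul(cplx *restrict out, const cplx *restrict a,
          const cplx *restrict b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i].re = a[i].re * b[i].re - a[i].im * b[i].im;
        out[i].im = a[i].re * b[i].im + a[i].im * b[i].re;
    }
}
```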
2018 Jul 24
2
[LoopVectorizer] Improving the performance of dot product reduction loop
On 07/24/2018 02:58 AM, Nema, Ashutosh wrote: > *From:* Hal Finkel <hfinkel at anl.gov> > *Sent:* Tuesday, July 24, 2018 5:05 AM > *To:* Craig Topper <craig.topper at gmail.com>; hideki.saito at intel.com; estotzer at ti.com; Nemanja Ivanovic <nemanja.i.ibm at gmail.com>; Adam Nemet <anemet at apple.com>; graham.hunter at
2016 Feb 18
3
[LLVMdev] LLVM loop vectorizer
Hi Alex, I'm not aware of efforts on loop coalescing in LLVM, but probably polly can do something like this. Also, one related thought: it might be worth making it a separate pass, not a part of loop vectorizer. LLVM already has several 'utility' passes (e.g. loop rotation), which primarily aims at enabling other passes. Thanks, Michael > On Feb 15, 2016, at 6:44 AM, RCU
2016 Jun 04
4
[LLVMdev] LLVM loop vectorizer
Hi Alex, I think the changes you want are actually not vectorizer related. Vectorizer just uses data provided by other passes. What you probably might want is to look into routine Loop::getStartLoc() (see lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are welcome:) Thanks, Michael > On Jun 3, 2016, at 6:13 PM, Alex Susu <alex.e.susu at gmail.com> wrote: >
2016 Jun 07
2
[LLVMdev] LLVM loop vectorizer
Hi Alex, This has been very recently fixed by Hal. See http://reviews.llvm.org/rL270771 Adam > On Jun 4, 2016, at 3:13 AM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hello. > Mikhail, I come back to this older thread. > I need to do a few changes to LoopVectorize.cpp. > > One of them is related to figuring out the exact C source line
2018 Jul 23
2
[LoopVectorizer] Improving the performance of dot product reduction loop
On 07/23/2018 06:23 PM, Hal Finkel via llvm-dev wrote: > > On 07/23/2018 05:22 PM, Craig Topper wrote: >> Hello all, >> >> This code https://godbolt.org/g/tTyxpf is a dot product reduction >> loop multiplying sign extended 16-bit values to produce a 32-bit >> accumulated result. The x86 backend is currently not able to optimize >> it as well as gcc and icc.
2018 Jul 23
3
[LoopVectorizer] Improving the performance of dot product reduction loop
Hello all, This code https://godbolt.org/g/tTyxpf is a dot product reduction loop multiplying sign extended 16-bit values to produce a 32-bit accumulated result. The x86 backend is currently not able to optimize it as well as gcc and icc. The IR we are getting from the loop vectorizer has several v8i32 adds and muls inside the loop. These are fed by v8i16 loads and sexts from v8i16 to v8i32. The
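The godbolt source is not reproduced in this excerpt. A hypothetical C loop matching the description (16-bit inputs sign-extended to 32 bits, multiplied, and accumulated into a 32-bit sum) looks like the sketch below. On x86, pmaddwd, which multiplies pairs of signed 16-bit values and adds adjacent 32-bit products, is a natural fit for this pattern, in contrast to the separate v8i32 multiplies and adds described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop matching the thread's description: each int16_t input
 * is sign-extended to 32 bits, the products are summed into an int32_t. */
int32_t dot_i16(const int16_t *a, const int16_t *b, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}
```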
2016 Aug 01
2
LLVM Loop vectorizer - 2 vector.body blocks appear
Hello. Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the beginning of July 2016) I ran the following piece of C code: void foo(long *A, long *B, long *C, long N) { for (long i = 0; i < N; ++i) { C[i] = A[i] + B[i]; } } The vectorized LLVM program I obtain contains 2 vector.body blocks - one named
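The loop quoted in this message is self-contained and can be compiled as-is to reproduce the IR. Because `A`, `B`, and `C` are not declared `restrict`, the vectorizer cannot assume the arrays are disjoint and may add runtime overlap checks, keeping the original scalar loop for the fallback and remainder iterations. Note that clang release builds discard value names by default, so block names such as vector.body may only appear in the emitted IR when value names are kept (for example with -fno-discard-value-names).

```c
/* The loop from the message, unchanged apart from comments:
 * element-wise addition of two arrays of long into a third. */
void foo(long *A, long *B, long *C, long N) {
    for (long i = 0; i < N; ++i) {
        C[i] = A[i] + B[i];
    }
}
```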