thr3ads.net - similar to: "[LLVMdev] LLVM on ARM A15"

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] LLVM on ARM A15"

2013 Jan 04

[LLVMdev] LLVM on ARM A15

Hi Renato, The tests below are target independent and should pass. Did you build the x86 backend ? I think that the problem is that opt uses the triple from the LL file to initialize the backends and the cost model, and if you don't have the x86 backend then the tests fail. I will fix the tests shortly. Thanks, Nadav On Jan 4, 2013, at 3:45 PM, Renato Golin <renato.golin at

Separate LoopVectorize LLVM pass

2017 Apr 14

Separate LoopVectorize LLVM pass

Hello. I am trying to create my own LoopVectorize.cpp pass as a separate pass from the LLVM trunk, as described in http://llvm.org/docs/CMake.html#embedding-llvm-in-your-project. Did anybody try something like this? I added close to the end of the .cpp file: /* this line seems to be required - it allows to run this pass as an embedded pass by giving opt -my-loop-vectorize

[LLVMdev] E = L->begin() in LoopVectorize

2014 Mar 18

[LLVMdev] E = L->begin() in LoopVectorize

Hi, I'm studying loop vectorizer. I don't understand the code yet. But it looks not right to assign L->begin() to E. Is it a typo? Thanks, Liang diff --git a/lib/Transforms/Vectorize/LoopVectorize.cpp b/lib/Transforms/Vectorize/LoopVectorize.cpp index 435c005..87b5d79 100644 --- a/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/lib/Transforms/Vectorize/LoopVectorize.cpp @@

[LLVMdev] E = L->begin() in LoopVectorize

2014 Mar 18

[LLVMdev] E = L->begin() in LoopVectorize

Looking at it now, curious why no tests failed. On Tue, Mar 18, 2014 at 2:48 PM, Jim Grosbach <grosbach at apple.com> wrote: > Almost certainly, yes. Nice catch! > > > On Mar 18, 2014, at 2:38 PM, Liang Wang <netcasper at gmail.com> wrote: > > > Hi, > > > > I'm studying loop vectorizer. I don't understand the code yet. But > > it

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 22

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Thank you for information. I’ll build clang without the hack and re-run the benchmark tomorrow. -Evgeny From: Sanjay Patel [mailto:spatel at rotateright.com] Sent: Sunday, January 22, 2017 8:00 PM To: Evgeny Astigeevich Cc: llvm-dev; nd Subject: Re: [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines > Do you mean to

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 23

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Confirm there is no change in IR if the hack is disabled in the sources. David wrote that these instructions are created by SCEV. Are other targets affected by the changes, e.g. X86? Kind regards, Evgeny Astigeevich Senior Compiler Engineer Compilation Tools ARM From: Sanjay Patel [mailto:spatel at rotateright.com] Sent: Sunday, January 22, 2017 10:45 PM To: Evgeny Astigeevich Cc: llvm-dev; nd

[RFC] Make LoopVectorize Aware of SLP Operations

2018 Feb 06

[RFC] Make LoopVectorize Aware of SLP Operations

Hello, We would like to propose making LoopVectorize aware of SLP operations, to improve the generated code for loops operating on struct fields or doing complex math. At the moment, LoopVectorize uses interleaving to vectorize loops that operate on values loaded/stored from consecutive addresses: vector loads/stores are generated to combine consecutive loads/stores and then shufflevector

LLVM Loop vectorizer - 2 vector.body blocks appear

2016 Aug 01

LLVM Loop vectorizer - 2 vector.body blocks appear

Hello. Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the beginning of July 2016) I ran the following piece of C code: void foo(long *A, long *B, long *C, long N) { for (long i = 0; i < N; ++i) { C[i] = A[i] + B[i]; } } The vectorized LLVM program I obtain contains 2 vector.body blocks - one named

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 24

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Hi Sanjay, Thank you for your analysis. It’s interesting why the x86 machine is not affected. Maybe the x86 backend is smarter than the AArch64 backend, or it might be micro-architectural differences. I don’t mind to keep the changes on trunk. What I’d like to see is who will/should be involved in solving the issue. What kind of help/support is needed? Should we (ARM Compilation Tools) start

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 24

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

On Tue, Jan 24, 2017 at 1:20 PM, Sanjay Patel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > I started looking at the log files that you attached, and I'm confused. > The code that is supposedly causing the perf regression is created by the > loop vectorizer, right? Except the bad code is not in the "vector.body", so > is there something peculiar about

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 22

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Hi Sanjay, The benchmark source file: http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/Benchmarks/Shootout/sieve.c?view=markup Clang options used to produce the initial IR: clang -DNDEBUG -O3 -DNDEBUG -mcpu=cortex-a53 -fomit-frame-pointer -O3 -DNDEBUG -w -Werror=date-time -c sieve.c -S -emit-llvm -mllvm -disable-llvm-optzns --target=aarch64-arm-linux Opt options: opt -O3

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 24

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

> On Jan 23, 2017, at 3:48 PM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > All targets are likely affected in some way by the icmp+shl fold introduced with r292492. It's a basic pattern that occurs in lots of code. Did you see any perf wins on your targets with this commit? > > Sadly, it is also likely that many (all?) targets are negatively

LoopVectorize module - some possible enhancements

2016 Aug 21

LoopVectorize module - some possible enhancements

Hello, Michael, I'd like to ask if we can enhance the LoopVectorize LLVM module (I am currently using a version from Jul 2016). More exactly: - do you envision to support in the near future LLVM IR gather and scatter intrinsics (as described at http://llvm.org/docs/LangRef.html#llvm-masked-gather-intrinsics and scatter)? I see you have defined some methods that should

[LLVMdev] Decouple LoopVectorizer from O3

2013 Apr 11

[LLVMdev] Decouple LoopVectorizer from O3

Hi Nadav, I tried your suggestion by changing the condition to : 189 if (LoopVectorize && OptLevel >= 0) 190 MPM.add(createLoopVectorizePass()); and compiled. Then I used the following command: opt -mtriple=x86_64-linux-gnu -vectorize-loops -vectorizer-min-trip-count=6 -debug-only=loop-vectorize -O1-S -o example1_vect.s example1.s where example1.s is IR generated by clang -S

LoopVectorize fails to vectorize code with condition on reduction

2018 Jun 11

LoopVectorize fails to vectorize code with condition on reduction

Hello. I'm not able to vectorize this simple C loop doing basically what could be called predicated sum-reduction: #define NMAX 1000 int colOccupied[NMAX]; void Func(int N) { int numSol = 0; for (int c = 0; c < N; c++) { if (colOccupied[c] == 0) numSol++; } return numSol; } The compiler

6 separate instances of static getPointerOperand(). Time to consolidate?

2018 Feb 06

6 separate instances of static getPointerOperand(). Time to consolidate?

LLVM friends, I'm currently trying to make LoopVectorizationLegality class in Transform/Vectorize/LoopVectorize.cpp more modular and eventually move it to Analysis directory tree. It uses several file scope helper functions that do not really belong to LoopVectorize. Let me start from getPointerOperand(). Within LLVM, there are five other similar functions defined. I think it's time to

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 04

[LLVMdev] Packed instructions generaetd by LoopVectorize?

Thanks, that did it! Are there any plans to enable the loop vectorizer by default? From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Wednesday, April 03, 2013 13:33 PM To: Nowicki, Tyler Cc: LLVM Developers Mailing List Subject: Re: Packed instructions generaetd by LoopVectorize? Hi Tyler, Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating

[LLVMdev] Decouple LoopVectorizer from O3

2013 Apr 11

[LLVMdev] Decouple LoopVectorizer from O3

Done. Best, Anadi. On Thu, Apr 11, 2013 at 7:01 AM, Nadav Rotem <nrotem at apple.com> wrote: > Hi Anadi, > > Yes, this is a bug in the loop vectorizer. The loop vectorizer expects only > one loop counter (integer with step=1). There is no reason why we should > not handle the case below, and it should be easy to fix. Interestingly > enough if you reverse the order of

6 separate instances of static getPointerOperand(). Time to consolidate?

2018 Feb 06

6 separate instances of static getPointerOperand(). Time to consolidate?

What LoopVectorize.cpp has are the following. Each function may have to have a separate consolidation discussion. I'm bringing up getpointerOperand() since I actually found multiple instances defined/used. DependenceAnalysis.cpp has isLoadOrStore(). LoopAccessAnalysis.cpp has getAddressSpaceOperand(). I'm sure there are others that might be worth discussing within this thread or a follow

[LLVMdev] Decouple LoopVectorizer from O3

2013 Apr 11

[LLVMdev] Decouple LoopVectorizer from O3

Hi Anadi, Yes, this is a bug in the loop vectorizer. The loop vectorizer expects only one loop counter (integer with step=1). There is no reason why we should not handle the case below, and it should be easy to fix. Interestingly enough if you reverse the order of iterations and count from SIZE to zero, the loop vectorizer would vectorize it. If you open a bugzilla report and assign it to me

similar to: [LLVMdev] LLVM on ARM A15