thr3ads.net - similar to: "LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target"

Displaying 20 results from an estimated 4000 matches similar to: "LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target"

LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

2020 Jul 16

LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

Tried a bunch of them there (x86-64, haswell, znver2) and they all defaulted to 4-wide - haswell additionally caused some extra loop unrolling but still with 8-wide pows. Cheers, -Neil. On Thu, Jul 16, 2020 at 2:39 PM Roman Lebedev <lebedev.ri at gmail.com> wrote: > Did you specify the target CPU the code should be optimized for? > For clang that is -march=native/znver2/... /

LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

2020 Jul 16

LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

So for us we use SLEEF to actually implement the libcalls (LLVM intrinsics) that LLVM by default would generate - and since SLEEF has highly optimal 8-wide pow, optimized for AVX and AVX2, we really want to use that. So we would not see 4/8 libcalls and instead see 1 call to something that lights up the ymm registers. I guess the problem then is that the default expectation is that pow would be

Help with SROA throwing away no-alias information

2020 Jan 17

Help with SROA throwing away no-alias information

I'm having an issue where SROA will throw away no-alias information on some loads after inlining, because the loads are derived from a store to an alloca which can be removed after inlining. The pointers that were originally stored into the alloca do *not *have any aliasing information - the only context that allowed me to assert aliasing was that the inlined-function guaranteed it to be so.

[RFC] `opt-out` attribute list for intrinsics

2020 Jun 24

[RFC] `opt-out` attribute list for intrinsics

Hi all, A while back we started annotating intrinsics with new attributes ( https://reviews.llvm.org/D65377) After some discussion it was decided it would be good to have an `opt-out` attribute list for intrinsics. Some attributes that can be added to the list could be: nosync, nofree, nounwind, willreturn For now, there are 2 approaches: 1. Filtering opt-out attributes in tablegen source (

loop unrolling introduces conditional branch

2015 Aug 20

loop unrolling introduces conditional branch

Hi, I want to use loop unrolling pass, however, I find that loop unrolling will introduces conditional branch at end of every "unrolled" part. For example, consider the following code *void foo( int n, int array_x[])* *{* * for (int i=0; i < n; i++)* * array_x[i] = i; * *}* Then I use this command "opt-3.5 try.bc -mem2reg -loops -loop-simplify -loop-rotate -lcssa

[LLVMdev] Preserving NSW/NUW bits

2014 Sep 02

[LLVMdev] Preserving NSW/NUW bits

David/All, Just a quick question about NSW/NUW bits, if you've got a second. I noticed you've been doing a little work on this as of late. I have a bit of code that looks like the following: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %2 = add i64 %indvars.iv.next, -1 %tmp = trunc i64 %2 to i32 %cmp = icmp slt i32 %tmp, %0 br i1 %cmp, label %for.body, label

loop unrolling introduces conditional branch

2015 Aug 22

loop unrolling introduces conditional branch

Hi, Mehdi, For example, I have this very simple source code: void foo( int n, int array_x[]) { for (int i=0; i < n; i++) array_x[i] = i; } After I use "clang -emit-llvm -o bc_from_clang.bc -c try.cc", I get bc_from_clang.bc. With my code (using LLVM IRbuilder API), I get bc_from_api.bc. Attachment please find thse two files. I also past the IR here.

[LLVMdev] Loops Prevent Function Pointer Inlining?

2014 Sep 24

[LLVMdev] Loops Prevent Function Pointer Inlining?

I've CC'ed Chad Rosier as I think this behaviour is a side-effect of his revert of IndVarSimplify.cpp (git c6b1a7e577a0b9e9cff9f9b7ac35a2cde7c448d8, SVN 217962). The change basically makes the IndVar pass change: ; <label>:4 ; preds = %6, %0 %i.0 = phi i32 [ 0, %0 ], [ %11, %6 ] %5 = icmp eq i32 %i.0, 0 br i1 %5, label %6, label %17 To:

MemorySSA question

2017 Dec 19

MemorySSA question

Hi, I am new to MemorySSA and wanted to understand its capabilities. Hence I wrote the following program (test.c): int N; void test(int *restrict a, int *restrict b, int *restrict c, int *restrict d, int *restrict e) { int i; for (i = 0; i < N; i = i + 5) { a[i] = b[i] + c[i]; } for (i = 0; i < N - 5; i = i + 5) { e[i] = a[i] * d[i]; } } I compiled this program using

loop unrolling introduces conditional branch

2015 Aug 20

loop unrolling introduces conditional branch

Hi Xiangyang, The algorithm for loop unrolling was changed post-3.5 to do more what you'd expect. If you use 3.6 or 3.7 you'll likely get better results. Cheers, James On Thu, 20 Aug 2015 at 18:09 Philip Reames via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On 08/20/2015 07:38 AM, Xiangyang Guo via llvm-dev wrote: > > Hi, > > I want to use loop unrolling

loop unrolling introduces conditional branch

2015 Aug 22

loop unrolling introduces conditional branch

Thanks for your point that out. I just add DataLayout in my code such as "mod->setDataLayout("e-m:e-i64:64-f80:128-n8:16:32:64-S128");", still no luck. I'm really confused about this. Do I need to add more passes before -loop-unroll? On Sat, Aug 22, 2015 at 11:36 AM, Mehdi Amini <mehdi.amini at apple.com> wrote: > > On Aug 22, 2015, at 7:27 AM, Xiangyang

[RFC] A nofree (and nosynch) function attribute: Mixing dereferenceable and delete

2018 Jul 11

[RFC] A nofree (and nosynch) function attribute: Mixing dereferenceable and delete

Hi, everyone, I'd like to propose adding a nofree function attribute to indicate that a function does not, directly or indirectly, call a memory-deallocation function (e.g., free, C++'s operator delete). Clang/LLVM can currently misoptimize functions that: 1. Have a reference argument. 2. Free the memory backing the object to which the reference is bound during the function's

loop unrolling introduces conditional branch

2015 Aug 22

loop unrolling introduces conditional branch

Hi, I just tried llvm-3.8 (LLVM SVN Repository). With this version, -fno-rtti can help me to compile my code and -irce can help me to do a better job for loop unrolling. However, I still have one question. If I use Clang to compile a piece of c++ code to .bc and then use 'opt -loop-rotate -loop-unroll -irce', I can get what I want. I mean, there is no conditional branch at the end of each

[LLVMdev] -indvars issues?

2012 Mar 08

[LLVMdev] -indvars issues?

Hi, Is the -indvars pass functional? I've done some small test to check it, but this fails to canonicalize: > int *x; > int *y; > int i; > ... > for (i = 1; i < 100; i+=2) { > x[i] = y[i] + 3; > } The IR produced after -indvars: > br label %for.cond > > for.cond: ; preds = %for.inc, %entry > %indvars.iv =

loop unrolling introduces conditional branch

2015 Aug 21

loop unrolling introduces conditional branch

There's been some recent noise on the mailing list about requiring -fno-rtti; http://lists.llvm.org/pipermail/llvm-dev/2015-August/089010.html Could that be it? On Sat, Aug 22, 2015 at 12:21 AM, Xiangyang Guo via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Hi, James and Philip, Thanks for your help. > > Based on your advice, I downloaded llvm-3.7. However, with this new

Instrument intrinsic invalid

2020 Jul 19

Instrument intrinsic invalid

Hi, I try to use llvm-dis to disassemble the result after opt, my pass will add a intrinsic after the load instruction, like following: bool fpscan::ldAddMetadata (Instruction *Inst, StringRef c) { std::vector<Metadata *> dataTuples; // Add metadata in list dataTuples.push_back(MDString::get(Inst->getContext(), c)); MDNode* N = MDNode::get(Inst->getContext(),

MemorySSA question

2017 Dec 19

MemorySSA question

On Tue, Dec 19, 2017 at 9:10 AM, Siddharth Bhat via llvm-dev < llvm-dev at lists.llvm.org> wrote: > I could be entirely wrong, but from my understanding of memorySSA, each > def defines an "abstract heap state" which has the coarsest possible > definition - any write will be modelled as a "new heap state". > This is true for def-def relationships, but

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

2013 Jun 26

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

Sent from my iPhone... On Jun 25, 2013, at 8:14 AM, Hal Finkel <hfinkel at anl.gov> wrote: > ----- Original Message ----- >> >> >> >> On Jun 24, 2013, at 4:24 PM, Hal Finkel < hfinkel at anl.gov > wrote: >> >> >> >> >> Indvars should ideally preserve NSW flags whenever possible. However, >> we don't want to

Canonicalize induction variables

2016 Aug 25

Canonicalize induction variables

But even for a very simple loop: int test1 (int *x, int *y, int *z, int k) { int sum = 0; for (int i = 10; i < k; i++) { z[i] = x[i] / y[i]; } return sum; } The initial value of induction variable is not zero after compiling with -O3 -mcpu=power8 x.cpp -S -c -emit-llvm -fno-unroll-loops (see bottom of the email for IR) Also I can write somewhat more complicated loop where step

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

2013 Jun 25

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

----- Original Message ----- > > > > On Jun 24, 2013, at 4:24 PM, Hal Finkel < hfinkel at anl.gov > wrote: > > > > > Indvars should ideally preserve NSW flags whenever possible. However, > we don't want to rely on SCEV to preserve them. SCEV expressions are > implicitly reassociated and uniqued in a flow-insensitive universe > independent of the

similar to: LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target