thr3ads.net - similar to: "extra loads in nested for-loop"

Displaying 20 results from an estimated 300 matches similar to: "extra loads in nested for-loop"

2016 Nov 17

Loop invariant not being optimized

I've got an example where I think that there should be some loop-invariant optimization happening, but it's not. Here's the C code: #define DIM 8 #define UNROLL_DIM DIM typedef double InArray[DIM][DIM]; __declspec(noalias) void f1( InArray c, const InArray a, const InArray b ) { #pragma clang loop unroll_count(UNROLL_DIM) for( int i=0;i<DIM;i++) #pragma clang loop

Loop invariant not being optimized

2016 Nov 18

Loop invariant not being optimized

I tried changing 'noalias' to 'restrict' in the code and I get: fma.c:17:12: warning: 'restrict' attribute only applies to return values that are pointers It seems like 'noalias' would be the correct attribute here, from the article you linked: "if a function is annotated as noalias, the optimizer can assume that, in addition to the parameters themselves,

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 18

[LLVMdev] [Vectorization] Mis match in code generated

Hi, I am trying to understand LLVM vectorization implementation and was looking into both loop and SLP vectorization. test case 1: *int foo(int *a) {int sum = 0,i;for(i=0; i<16; i++) sum += a[i];return sum;}* This code is vectorized by loop vectorizer where we calculate scalar loop cost as 4 and vector loop cost as 2. Since vector loop cost is less and above reduction is legal to

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 18

[LLVMdev] [Vectorization] Mis match in code generated

Hi Nadav, Thanks for the quick reply !! Ok, so as of now we are lacking capability to handle flat large reductions. I did go through function vectorizeChainsInBlock() (line number 2862). In this function, we try to vectorize if we have phi nodes in the IR (several if's check for phi nodes) i.e we try to construct tree that starts at chains. Any pointers on how to join multiple trees? I

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 19

[LLVMdev] [Vectorization] Mis match in code generated

Hi Arnold, Thanks for your reply. I tried test case as suggested by you. *void foo(int *a, int *sum) {*sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];}* so that it has a 'store' in its IR. *IR before vectorization :*target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple =

[LLVMdev] [Vectorization] Mis match in code generated

2014 Nov 10

[LLVMdev] [Vectorization] Mis match in code generated

Hi Suyog, Thanks for looking at this. This has recently got itself onto my TODO list too. > I am not sure how much all this will improve the code quality for horizontal reduction > (donno how frequently such pattern of horizontal reduction from same array occurs in real world/SPECS). Actually the main loop of 470.lbm can be SLP vectorized like this. We have three parts to it: A fully

Incompatible type assertion from llvm-tblgen

2016 Oct 04

Incompatible type assertion from llvm-tblgen

On Wed, Sep 28, 2016 at 12:54 PM, Krzysztof Parzyszek < kparzysz at codeaurora.org> wrote: > On 9/28/2016 2:44 PM, Phil Tomson wrote: > >> And map it to a load.idx instruction with the following semantics: >> load.idx r1,r2,r3,SIZE r1 <- mem[r2 + (r3 << sizeof(operand))] >> >> That somehow the pattern matching dag fragment would need to be

Invoke loop vectorizer

2016 Aug 12

Invoke loop vectorizer

Hi Daniel, I increased the size of your test to be 128 but -stats still shows no loop optimized... Xiaochu On Aug 12, 2016 11:11 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote: > It's not possible to know that A and B don't alias in this example. It's > almost certainly not profitable to add a runtime check given the size of > the loop. > > >

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

2015 Jan 13

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

Hi folks, Moving the discussion to llvm.dev. None of the changes we talked earlier help. Find attached the C source code that you can use to reproduce the issue. clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast -fno-math-errno test.c -S -o test.s -mllvm -debug-only=licm LICM hoisting to while.body.lr.ph: %21 = load double** %arrayidx8, align 8, !tbaa !5 LICM hoisting to

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

The SLP vectorizer apparently did something in the prologue of the function (where storing of arguments on the stack happens) which then got eliminated later on (since I don't see any vector instructions in the final IR). Below the debug output of the SLP pass: Args: opt -O1 -vectorize-slp -debug loop.ll -S SLP: Analyzing blocks in _Z3barmmPfS_S_. SLP: Found 2 stores to vectorize. SLP:

[LLVMdev] Expected behavior of calling bitcasted functions?

2013 May 30

[LLVMdev] Expected behavior of calling bitcasted functions?

Hi, I'm not sure what the expected behavior of calling a bitcasted function is. Suppose you have a case like this (which you get on the source level from attribute alias): @alias_f32 = alias bitcast (i32 (i32)* @func_i32 to float (float)*) define internal i32 @func_i32(i32 %v) noinline nounwind { entry: ret i32 %v } define void @bitcast_alias_scalar(float* noalias %source, float* noalias

Invoke loop vectorizer

2016 Aug 12

Invoke loop vectorizer

I'm not compiling it to x86. Should loop optimizer something independent of the target? If so, should the vectorized code on IR level? On Aug 12, 2016 11:39 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote: > cat > test.c > > #define SIZE 128 > > void bar(int *restrict A, int* restrict B,int K) { > > #pragma clang loop vectorize(enable)

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

2015 Jan 14

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

Can you send me actual LLVM IR or a preprocessed source from using -E? I don't have a machine handy that has headers that target that arch. On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin <dberlin at dberlin.org> wrote: > Anything other than noalias or mustalias should be getting passed down the > stack, so either that is not happening or CFL aa is giving better answers > and

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

Well, they are not directly consecutive. They are consecutive with a constant offset or stride: ir1 = ir0 + 4 If I rewrite the function in this form void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) { const std::uint64_t inner = 4; for (std::uint64_t i = start ; i < end ; ++i ) { const std::uint64_t ir0 = (

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

The debug messages are misleading. They should read “trying to vectorize a list of …”; The problem is that the SCEV analysis is unable to detect that C[ir0] and C[ir1] are consecutive. Is this loop from an important benchmark ? Thanks, Nadav On Oct 30, 2013, at 11:13 AM, Frank Winter <fwinter at jlab.org> wrote: > The SLP vectorizer apparently did something in the prologue of the

Extended $ function called $$

2005 Nov 24

Extended $ function called $$

This code lets you use standard CSS selectors to get an array of elements. For example, $$("#container div.myElements") would return all subelements of #container that are divs and are of the class myElements. I submitted similar code a while back to the email address for the prototype library but never got a reply. Thought I''d post this in the hopes that some others will find

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

2015 Jan 14

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

On 13 January 2015 at 22:11, Daniel Berlin <dberlin at dberlin.org> wrote: > This is caused by CFLAA returning PartialAlias for a query that BasicAA > can prove is NoAlias. > One of them is wrong. Which one? I'm not sure from your description that this is a chaining issue. PartialAlias doesn't chain and isn't supposed to, it's a final answer just like NoAlias and

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

I ran the BB vectorizer as I guess this is the SLP vectorizer. BBV: using target information BBV: fusing loop #1 for for.body in _Z3barmmPfS_S_... BBV: found 2 instructions with candidate pairs BBV: found 0 pair connections. BBV: done! However, this was run on the unrolled loop (I guess). Here is the IR printed by 'opt': entry: %cmp9 = icmp ult i64 %start, %end br i1 %cmp9, label

Question about instcombine pass.

2018 Feb 27

Question about instcombine pass.

Hello, Everyone. I have a question about llvm's "Combine redundant instructions(instcombine)" pass. I have tested instcombine pass by writing the following three test cases. But, CASE3 is not optimized as I expected. Is this behavior expected? The version of llvm is: clang version 5.0.1 (tags/RELEASE_501/final 325232) Option of clang command is: clang -O1 a.c -S -emit-llvm

[LLVMdev] Expected behavior of calling bitcasted functions?

2013 May 30

[LLVMdev] Expected behavior of calling bitcasted functions?

Hello, This is an interesting example. Whenever I see strange things like this, I use opt's -lint. In this case, opt -lint reports: Undefined behavior: Call return type mismatches callee return type %call = call float @alias_f32(float %tmp2) #1 You'll get a similar report when the parameter types mismatch. Pete On Wed, May 29, 2013 at 5:40 PM, Arsenault, Matthew <

similar to: extra loads in nested for-loop