similar to: extra loads in nested for-loop

Displaying 20 results from an estimated 300 matches similar to: "extra loads in nested for-loop"

2016 Nov 17
2
Loop invariant not being optimized
I've got an example where I think that there should be some loop-invariant optimization happening, but it's not. Here's the C code: #define DIM 8 #define UNROLL_DIM DIM typedef double InArray[DIM][DIM]; __declspec(noalias) void f1( InArray c, const InArray a, const InArray b ) { #pragma clang loop unroll_count(UNROLL_DIM) for( int i=0;i<DIM;i++) #pragma clang loop
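The snippet above is cut off before the loop body, so the kernel itself is not visible here. As a rough sketch only, assuming a matrix-multiply-style body (the function names `matmul_naive` and `matmul_hoisted` are hypothetical, and the MSVC-style `__declspec(noalias)` annotation is left out), the kind of extra load and store such a question is usually about looks like this. In the first version the address of `c[i][j]` is invariant in the `k` loop, but unless the compiler can prove that `c` does not alias `a` or `b`, it may reload and store `c[i][j]` on every iteration. The second version does the promotion by hand.

```c
#define DIM 8

typedef double InArray[DIM][DIM];

/* Hypothetical kernel (not taken from the thread): c = a * b.
 * c[i][j] is read and written inside the innermost loop. */
void matmul_naive(InArray c, const InArray a, const InArray b)
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            c[i][j] = 0.0;
            for (int k = 0; k < DIM; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}

/* Same computation, with the loop-invariant location kept in a local
 * accumulator so that the store happens once per (i, j) pair. */
void matmul_hoisted(InArray c, const InArray a, const InArray b)
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            double acc = 0.0;
            for (int k = 0; k < DIM; k++)
                acc += a[i][k] * b[k][j];
            c[i][j] = acc;
        }
}
```

The two functions agree whenever `c` does not overlap `a` or `b`; if it does overlap, they can differ, which is why the compiler needs aliasing information before it can make this change on its own.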
2016 Nov 18
2
Loop invariant not being optimized
I tried changing 'noalias' to 'restrict' in the code and I get: fma.c:17:12: warning: 'restrict' attribute only applies to return values that are pointers It seems like 'noalias' would be the correct attribute here, from the article you linked: "if a function is annotated as noalias, the optimizer can assume that, in addition to the parameters themselves,
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi, I am trying to understand LLVM vectorization implementation and was looking into both loop and SLP vectorization. test case 1: *int foo(int *a) {int sum = 0,i;for(i=0; i<16; i++) sum += a[i];return sum;}* This code is vectorized by loop vectorizer where we calculate scalar loop cost as 4 and vector loop cost as 2. Since vector loop cost is less and above reduction is legal to
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Nadav, Thanks for the quick reply! OK, so as of now we lack the capability to handle large flat reductions. I did go through the function vectorizeChainsInBlock() (line number 2862). In this function, we try to vectorize if we have phi nodes in the IR (several ifs check for phi nodes), i.e. we try to construct a tree that starts at chains. Any pointers on how to join multiple trees? I
2014 Sep 19
3
[LLVMdev] [Vectorization] Mis match in code generated
Hi Arnold, Thanks for your reply. I tried test case as suggested by you. *void foo(int *a, int *sum) {*sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];}* so that it has a 'store' in its IR. *IR before vectorization :*target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple =
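For reference, the two test cases quoted in this thread can be written out side by side. This is a minimal sketch: both are called `foo` in the original messages, so the names `foo_loop` and `foo_flat` are introduced here only to let them coexist in one file. The first is the loop reduction and the second is the flat 16-term sum that writes through a pointer so that the IR contains a store.

```c
/* Loop form of the reduction (first test case in the thread). */
int foo_loop(int *a)
{
    int sum = 0;
    for (int i = 0; i < 16; i++)
        sum += a[i];
    return sum;
}

/* Flat form of the same reduction; the result is stored through
 * `sum`, so the generated IR ends in a store instruction. */
void foo_flat(int *a, int *sum)
{
    *sum = a[0] + a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7]
         + a[8] + a[9] + a[10] + a[11] + a[12] + a[13] + a[14] + a[15];
}
```

Both compute the same value for any 16-element input; the difference discussed in the thread is in which vectorizer, loop or SLP, gets a chance to handle each form.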
2014 Nov 10
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Suyog, Thanks for looking at this. This has recently got itself onto my TODO list too. > I am not sure how much all this will improve the code quality for horizontal reduction > (donno how frequently such pattern of horizontal reduction from same array occurs in real world/SPECS). Actually the main loop of 470.lbm can be SLP vectorized like this. We have three parts to it: A fully
2016 Oct 04
2
Incompatible type assertion from llvm-tblgen
On Wed, Sep 28, 2016 at 12:54 PM, Krzysztof Parzyszek < kparzysz at codeaurora.org> wrote: > On 9/28/2016 2:44 PM, Phil Tomson wrote: > >> And map it to a load.idx instruction with the following semantics: >> load.idx r1,r2,r3,SIZE r1 <- mem[r2 + (r3 << sizeof(operand))] >> >> That somehow the pattern matching dag fragment would need to be
2016 Aug 12
2
Invoke loop vectorizer
Hi Daniel, I increased the size of your test to be 128 but -stats still shows no loop optimized... Xiaochu On Aug 12, 2016 11:11 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote: > It's not possible to know that A and B don't alias in this example. It's > almost certainly not profitable to add a runtime check given the size of > the loop. > > >
2015 Jan 13
2
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Hi folks, Moving the discussion to llvm.dev. None of the changes we talked about earlier help. Find attached the C source code that you can use to reproduce the issue. clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast -fno-math-errno test.c -S -o test.s -mllvm -debug-only=licm LICM hoisting to while.body.lr.ph: %21 = load double** %arrayidx8, align 8, !tbaa !5 LICM hoisting to
2013 Oct 30
0
[LLVMdev] loop vectorizer
The SLP vectorizer apparently did something in the prologue of the function (where storing of arguments on the stack happens) which then got eliminated later on (since I don't see any vector instructions in the final IR). Below the debug output of the SLP pass: Args: opt -O1 -vectorize-slp -debug loop.ll -S SLP: Analyzing blocks in _Z3barmmPfS_S_. SLP: Found 2 stores to vectorize. SLP:
2013 May 30
3
[LLVMdev] Expected behavior of calling bitcasted functions?
Hi, I'm not sure what the expected behavior of calling a bitcasted function is. Suppose you have a case like this (which you get on the source level from attribute alias): @alias_f32 = alias bitcast (i32 (i32)* @func_i32 to float (float)*) define internal i32 @func_i32(i32 %v) noinline nounwind { entry: ret i32 %v } define void @bitcast_alias_scalar(float* noalias %source, float* noalias
2016 Aug 12
4
Invoke loop vectorizer
I'm not compiling it to x86. Should the loop optimizer be something independent of the target? If so, should the vectorized code appear at the IR level? On Aug 12, 2016 11:39 AM, "Daniel Berlin" <dberlin at dberlin.org> wrote: > cat > test.c > > #define SIZE 128 > > void bar(int *restrict A, int* restrict B,int K) { > > #pragma clang loop vectorize(enable)
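The quoted test file is truncated right after the pragma, so its loop body is not known. The following is only a sketch of what a complete file of that shape might look like: the signature, `SIZE`, and the pragma come from the quoted message, while the statement inside the loop is a placeholder assumption chosen for illustration.

```c
#define SIZE 128

/* `restrict` tells the compiler that A and B do not overlap, so no
 * runtime alias check is needed before vectorizing the loop. */
void bar(int *restrict A, int *restrict B, int K)
{
#pragma clang loop vectorize(enable)
    for (int i = 0; i < SIZE; i++)
        A[i] += B[i] + K; /* placeholder body, not from the thread */
}
```

With clang, building such a file with optimization enabled and `-Rpass=loop-vectorize` prints a remark when the loop is vectorized, which can be easier to read than the `-stats` output. Compilers that do not recognize the pragma ignore it, possibly with a warning.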
2015 Jan 14
2
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Can you send me actual LLVM IR or a preprocessed source from using -E? I don't have a machine handy that has headers that target that arch. On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin <dberlin at dberlin.org> wrote: > Anything other than noalias or mustalias should be getting passed down the > stack, so either that is not happening or CFL aa is giving better answers > and
2013 Oct 30
0
[LLVMdev] loop vectorizer
Well, they are not directly consecutive. They are consecutive with a constant offset or stride: ir1 = ir0 + 4 If I rewrite the function in this form void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) { const std::uint64_t inner = 4; for (std::uint64_t i = start ; i < end ; ++i ) { const std::uint64_t ir0 = (
2013 Oct 30
2
[LLVMdev] loop vectorizer
The debug messages are misleading. They should read “trying to vectorize a list of …”; The problem is that the SCEV analysis is unable to detect that C[ir0] and C[ir1] are consecutive. Is this loop from an important benchmark ? Thanks, Nadav On Oct 30, 2013, at 11:13 AM, Frank Winter <fwinter at jlab.org> wrote: > The SLP vectorizer apparently did something in the prologue of the
2005 Nov 24
1
Extended $ function called $$
This code lets you use standard CSS selectors to get an array of elements. For example, $$("#container div.myElements") would return all subelements of #container that are divs and are of the class myElements. I submitted similar code a while back to the email address for the prototype library but never got a reply. Thought I'd post this in the hopes that some others will find
2015 Jan 14
3
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
On 13 January 2015 at 22:11, Daniel Berlin <dberlin at dberlin.org> wrote: > This is caused by CFLAA returning PartialAlias for a query that BasicAA > can prove is NoAlias. > One of them is wrong. Which one? I'm not sure from your description that this is a chaining issue. PartialAlias doesn't chain and isn't supposed to, it's a final answer just like NoAlias and
2013 Oct 30
0
[LLVMdev] loop vectorizer
I ran the BB vectorizer as I guess this is the SLP vectorizer. BBV: using target information BBV: fusing loop #1 for for.body in _Z3barmmPfS_S_... BBV: found 2 instructions with candidate pairs BBV: found 0 pair connections. BBV: done! However, this was run on the unrolled loop (I guess). Here is the IR printed by 'opt': entry: %cmp9 = icmp ult i64 %start, %end br i1 %cmp9, label
2018 Feb 27
0
Question about instcombine pass.
Hello, Everyone. I have a question about llvm's "Combine redundant instructions(instcombine)" pass. I have tested instcombine pass by writing the following three test cases. But, CASE3 is not optimized as I expected. Is this behavior expected? The version of llvm is: clang version 5.0.1 (tags/RELEASE_501/final 325232) Option of clang command is: clang -O1 a.c -S -emit-llvm
2013 May 30
0
[LLVMdev] Expected behavior of calling bitcasted functions?
Hello, This is an interesting example. Whenever I see strange things like this, I use opt's -lint. In this case, opt -lint reports: Undefined behavior: Call return type mismatches callee return type %call = call float @alias_f32(float %tmp2) #1 You'll get a similar report when the parameter types mismatch. Pete On Wed, May 29, 2013 at 5:40 PM, Arsenault, Matthew <