thr3ads.net - similar to: "[LLVMdev] [Vectorization] Mis match in code generated"

Displaying 20 results from an estimated 300 matches similar to: "[LLVMdev] [Vectorization] Mis match in code generated"

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 18

[LLVMdev] [Vectorization] Mis match in code generated

Hi Nadav, Thanks for the quick reply !! Ok, so as of now we are lacking capability to handle flat large reductions. I did go through function vectorizeChainsInBlock() (line number 2862). In this function, we try to vectorize if we have phi nodes in the IR (several if's check for phi nodes) i.e we try to construct tree that starts at chains. Any pointers on how to join multiple trees? I

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 19

[LLVMdev] [Vectorization] Mis match in code generated

Hi Arnold, Thanks for your reply. I tried test case as suggested by you. *void foo(int *a, int *sum) {*sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];}* so that it has a 'store' in its IR. *IR before vectorization :*target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple =

[LLVMdev] [Vectorization] Mis match in code generated

2014 Nov 10

[LLVMdev] [Vectorization] Mis match in code generated

Hi Suyog, Thanks for looking at this. This has recently got itself onto my TODO list too. > I am not sure how much all this will improve the code quality for horizontal reduction > (donno how frequently such pattern of horizontal reduction from same array occurs in real world/SPECS). Actually the main loop of 470.lbm can be SLP vectorized like this. We have three parts to it: A fully

LIBCLC with LLVM 3.9 Trunk

2016 Apr 08

LIBCLC with LLVM 3.9 Trunk

It's not clear what is actually wrong from your original message, I think you need to give some more information as to what you are doing: Example source, what target GPU, compiler error messages or other evidence of "it's wrong" (llvm IR, disassembly, etc) ... -- Mats On 8 April 2016 at 09:55, Liu Xin via llvm-dev <llvm-dev at lists.llvm.org> wrote: > I built it

[LLVMdev] loop vectorizer and storing to uniform addresses

2013 Nov 08

[LLVMdev] loop vectorizer and storing to uniform addresses

I changed the input C to using a 64 bit type for the loop index (this eliminates 'sext' instructions in the IR) Here the IR produced with clang -O0 define float @foo(i64 %start, i64 %end, float* %A) #0 { entry: %start.addr = alloca i64, align 8 %end.addr = alloca i64, align 8 %A.addr = alloca float*, align 8 %sum = alloca [4 x float], align 16 %i = alloca i64, align 8

[LLVMdev] loop vectorizer and storing to uniform addresses

2013 Nov 08

[LLVMdev] loop vectorizer and storing to uniform addresses

On 7 November 2013 17:18, Frank Winter <fwinter at jlab.org> wrote: > LV: We don't allow storing to uniform addresses > This is triggering because it didn't recognize as a reduction variable during the canVectorizeInstrs() but did recognize that sum[q] is loop invariant in canVectorizeMemory(). I'm guessing the nested loop was unrolled because of the low trip-count, and

[LLVMdev] loop vectorizer and storing to uniform addresses

2013 Nov 08

[LLVMdev] loop vectorizer and storing to uniform addresses

I am trying my luck on this global reduction kernel: float foo( int start , int end , float * A ) { float sum[4] = {0.,0.,0.,0.}; for (int i = start ; i < end ; ++i ) { for (int q = 0 ; q < 4 ; ++q ) sum[q] += A[i*4+q]; } return sum[0]+sum[1]+sum[2]+sum[3]; } LV: Checking a loop in "foo" LV: Found a loop: for.cond1 LV: Found an induction variable. LV: We

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2012 Jan 26

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

On Thu, Jan 26, 2012 at 3:19 PM, Hal Finkel <hfinkel at anl.gov> wrote: > On Thu, 2012-01-26 at 15:12 -0600, Sebastian Pop wrote: >> On Thu, Jan 26, 2012 at 2:49 PM, Hal Finkel <hfinkel at anl.gov> wrote: >> > Thanks! Did you compile with any non-default flags other than -mllvm >> > -vectorize? >> >> I used -O3 and -vectorize, no other non-default

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2012 Jan 26

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

On Thu, 2012-01-26 at 15:12 -0600, Sebastian Pop wrote: > On Thu, Jan 26, 2012 at 2:49 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > Thanks! Did you compile with any non-default flags other than -mllvm > > -vectorize? > > I used -O3 and -vectorize, no other non-default flags. If I run clang -O3 -mllvm -vectorize -S -emit-llvm -o test.ll test.c then I get no

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2012 Jan 26

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

On Thu, Jan 26, 2012 at 3:41 PM, Hal Finkel <hfinkel at anl.gov> wrote: > On Thu, 2012-01-26 at 15:36 -0600, Sebastian Pop wrote: >> arm-none-linux-gnueabi > > Indeed, adding -ccc-host-triple arm-none-linux-gnueabi I also get Minor remark: please use -target instead of -ccc-host-triple that is now deprecated. Thanks for looking at this testcase. Sebastian -- Qualcomm

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2012 Jan 26

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

On Thu, 2012-01-26 at 15:36 -0600, Sebastian Pop wrote: > arm-none-linux-gnueabi Indeed, adding -ccc-host-triple arm-none-linux-gnueabi I also get vectorization (even though I don't get vectorization when targeting x86_64). I'll let you know what I find. -Hal -- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory

extra loads in nested for-loop

2016 Jul 11

extra loads in nested for-loop

I was looking at the code generated from the following c code and noticed extra loads in the inner-loop of these nested for-loops: #define DIM 8 #define UNROLL_DIM DIM typedef double InArray[DIM][DIM]; void f1( InArray c, InArray a, InArray b ) { #pragma clang loop unroll_count(UNROLL_DIM) for( int i=0;i<DIM;i++) #pragma clang loop unroll_count(UNROLL_DIM) for( int

[LLVMdev] How can I remove these redundant copy between registers?

2015 May 21

[LLVMdev] How can I remove these redundant copy between registers?

Hi, I've been working on a Blackfin backend (llvm-3.6.0) based on the previous one that was removed in llvm-3.1. llc generates codes like this: 29 p1 = r2; 30 r5 = [p1]; 31 p1 = r2; 32 r6 = [p1 + 4]; 33 r5 = r6 + r5; 34 r6 = [p0 + -4]; 35 r5 *= r6; 36 p1 = r2; 37 r6 = [p1 + 8]; 38 p1 = r2; p1 and r2 are in different register classes. A p*

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

2015 Jan 14

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

Inline - George > On Jan 14, 2015, at 10:49 AM, Daniel Berlin <dberlin at dberlin.org> wrote: > > > >> On Tue, Jan 13, 2015 at 11:26 PM, Nick Lewycky <nlewycky at google.com> wrote: >>> On 13 January 2015 at 22:11, Daniel Berlin <dberlin at dberlin.org> wrote: >>> This is caused by CFLAA returning PartialAlias for a query that BasicAA can

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

2015 Jan 15

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

Yes. I've attached an updated patch that does the following: 1. Fixes the partialalias of globals/arguments 2. Enables partialalias for cases where nothing has been unified to a global/argument 3. Fixes that select was unifying the condition to the other pieces (the condition does not need to be processed :P). This was causing unnecessary aliasing. 4. Adds a regression test to

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

2015 Jan 14

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

Oh, sorry, i didn't rebase it when i changed the fix, you would have had to apply the first on top of the second. Here is one against HEAD On Wed, Jan 14, 2015 at 12:32 PM, Ana Pazos <apazos at codeaurora.org> wrote: > Daniel, your patch does not apply cleanly. Are you on the tip? > > The code I see there is no line if (QueryResult == MayAlias|| QueryResult == PartialAlias)

[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful

2012 May 04

[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful

Hi Preston, On Fri, May 4, 2012 at 9:12 AM, Preston Briggs <preston.briggs at gmail.com> wrote: > > which produces > > %arrayidx24 = getelementptr inbounds [100 x [100 x i64]]* %A, i64 > %arrayidx21.sum, i64 %add1411, i64 %add > store i64 0, i64* %arrayidx24, align 8 > {{{(5 + ((3 + %n) * %n)),+,(2 * %n * %n)}<%for.cond1.preheader>,+,(4 *

[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful

2012 May 04

[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful

Is there any chance of replacing/extending the GEP instruction? As noted in the GEP FAQ, GEPs don't support variable-length arrays; when the front ends have to support VLAs, they linearize the subscript expressions, throwing away information. The FAQ suggests that folks interested in writing an analysis that understands array indices (I'm thinking of dependence analysis) should be

Incompatible type assertion from llvm-tblgen

2016 Oct 04

Incompatible type assertion from llvm-tblgen

On Wed, Sep 28, 2016 at 12:54 PM, Krzysztof Parzyszek < kparzysz at codeaurora.org> wrote: > On 9/28/2016 2:44 PM, Phil Tomson wrote: > >> And map it to a load.idx instruction with the following semantics: >> load.idx r1,r2,r3,SIZE r1 <- mem[r2 + (r3 << sizeof(operand))] >> >> That somehow the pattern matching dag fragment would need to be

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 24

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

On Tue, Jan 24, 2017 at 1:20 PM, Sanjay Patel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > I started looking at the log files that you attached, and I'm confused. > The code that is supposedly causing the perf regression is created by the > loop vectorizer, right? Except the bad code is not in the "vector.body", so > is there something peculiar about

similar to: [LLVMdev] [Vectorization] Mis match in code generated