similar to: [LLVMdev] [Vectorization] Mis match in code generated

Displaying 20 results from an estimated 300 matches similar to: "[LLVMdev] [Vectorization] Mis match in code generated"

2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Nadav, Thanks for the quick reply !! Ok, so as of now we are lacking capability to handle flat large reductions. I did go through function vectorizeChainsInBlock() (line number 2862). In this function, we try to vectorize if we have phi nodes in the IR (several if's check for phi nodes) i.e we try to construct tree that starts at chains. Any pointers on how to join multiple trees? I
2014 Sep 19
3
[LLVMdev] [Vectorization] Mis match in code generated
Hi Arnold, Thanks for your reply. I tried test case as suggested by you. *void foo(int *a, int *sum) {*sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];}* so that it has a 'store' in its IR. *IR before vectorization :*target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple =
2014 Nov 10
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Suyog, Thanks for looking at this. This has recently got itself onto my TODO list too. > I am not sure how much all this will improve the code quality for horizontal reduction > (donno how frequently such pattern of horizontal reduction from same array occurs in real world/SPECS). Actually the main loop of 470.lbm can be SLP vectorized like this. We have three parts to it: A fully
2016 Apr 08
2
LIBCLC with LLVM 3.9 Trunk
It's not clear what is actually wrong from your original message, I think you need to give some more information as to what you are doing: Example source, what target GPU, compiler error messages or other evidence of "it's wrong" (llvm IR, disassembly, etc) ... -- Mats On 8 April 2016 at 09:55, Liu Xin via llvm-dev <llvm-dev at lists.llvm.org> wrote: > I built it
2013 Nov 08
1
[LLVMdev] loop vectorizer and storing to uniform addresses
I changed the input C to using a 64 bit type for the loop index (this eliminates 'sext' instructions in the IR) Here the IR produced with clang -O0 define float @foo(i64 %start, i64 %end, float* %A) #0 { entry: %start.addr = alloca i64, align 8 %end.addr = alloca i64, align 8 %A.addr = alloca float*, align 8 %sum = alloca [4 x float], align 16 %i = alloca i64, align 8
2013 Nov 08
0
[LLVMdev] loop vectorizer and storing to uniform addresses
On 7 November 2013 17:18, Frank Winter <fwinter at jlab.org> wrote: > LV: We don't allow storing to uniform addresses > This is triggering because it didn't recognize as a reduction variable during the canVectorizeInstrs() but did recognize that sum[q] is loop invariant in canVectorizeMemory(). I'm guessing the nested loop was unrolled because of the low trip-count, and
2013 Nov 08
3
[LLVMdev] loop vectorizer and storing to uniform addresses
I am trying my luck on this global reduction kernel: float foo( int start , int end , float * A ) { float sum[4] = {0.,0.,0.,0.}; for (int i = start ; i < end ; ++i ) { for (int q = 0 ; q < 4 ; ++q ) sum[q] += A[i*4+q]; } return sum[0]+sum[1]+sum[2]+sum[3]; } LV: Checking a loop in "foo" LV: Found a loop: for.cond1 LV: Found an induction variable. LV: We
2012 Jan 26
0
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, Jan 26, 2012 at 3:19 PM, Hal Finkel <hfinkel at anl.gov> wrote: > On Thu, 2012-01-26 at 15:12 -0600, Sebastian Pop wrote: >> On Thu, Jan 26, 2012 at 2:49 PM, Hal Finkel <hfinkel at anl.gov> wrote: >> > Thanks! Did you compile with any non-default flags other than -mllvm >> > -vectorize? >> >> I used -O3 and -vectorize, no other non-default
2012 Jan 26
3
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, 2012-01-26 at 15:12 -0600, Sebastian Pop wrote: > On Thu, Jan 26, 2012 at 2:49 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > Thanks! Did you compile with any non-default flags other than -mllvm > > -vectorize? > > I used -O3 and -vectorize, no other non-default flags. If I run clang -O3 -mllvm -vectorize -S -emit-llvm -o test.ll test.c then I get no
2012 Jan 26
0
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, Jan 26, 2012 at 3:41 PM, Hal Finkel <hfinkel at anl.gov> wrote: > On Thu, 2012-01-26 at 15:36 -0600, Sebastian Pop wrote: >> arm-none-linux-gnueabi > > Indeed, adding -ccc-host-triple arm-none-linux-gnueabi I also get Minor remark: please use -target instead of -ccc-host-triple that is now deprecated. Thanks for looking at this testcase. Sebastian -- Qualcomm
2012 Jan 26
2
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, 2012-01-26 at 15:36 -0600, Sebastian Pop wrote: > arm-none-linux-gnueabi Indeed, adding -ccc-host-triple arm-none-linux-gnueabi I also get vectorization (even though I don't get vectorization when targeting x86_64). I'll let you know what I find. -Hal -- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory
2016 Jul 11
2
extra loads in nested for-loop
I was looking at the code generated from the following c code and noticed extra loads in the inner-loop of these nested for-loops: #define DIM 8 #define UNROLL_DIM DIM typedef double InArray[DIM][DIM]; void f1( InArray c, InArray a, InArray b ) { #pragma clang loop unroll_count(UNROLL_DIM) for( int i=0;i<DIM;i++) #pragma clang loop unroll_count(UNROLL_DIM) for( int
2015 May 21
2
[LLVMdev] How can I remove these redundant copy between registers?
Hi, I've been working on a Blackfin backend (llvm-3.6.0) based on the previous one that was removed in llvm-3.1. llc generates codes like this: 29 p1 = r2; 30 r5 = [p1]; 31 p1 = r2; 32 r6 = [p1 + 4]; 33 r5 = r6 + r5; 34 r6 = [p0 + -4]; 35 r5 *= r6; 36 p1 = r2; 37 r6 = [p1 + 8]; 38 p1 = r2; p1 and r2 are in different register classes. A p*
2015 Jan 14
4
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Inline - George > On Jan 14, 2015, at 10:49 AM, Daniel Berlin <dberlin at dberlin.org> wrote: > > > >> On Tue, Jan 13, 2015 at 11:26 PM, Nick Lewycky <nlewycky at google.com> wrote: >>> On 13 January 2015 at 22:11, Daniel Berlin <dberlin at dberlin.org> wrote: >>> This is caused by CFLAA returning PartialAlias for a query that BasicAA can
2015 Jan 15
2
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Yes. I've attached an updated patch that does the following: 1. Fixes the partialalias of globals/arguments 2. Enables partialalias for cases where nothing has been unified to a global/argument 3. Fixes that select was unifying the condition to the other pieces (the condition does not need to be processed :P). This was causing unnecessary aliasing. 4. Adds a regression test to
2015 Jan 14
3
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Oh, sorry, i didn't rebase it when i changed the fix, you would have had to apply the first on top of the second. Here is one against HEAD On Wed, Jan 14, 2015 at 12:32 PM, Ana Pazos <apazos at codeaurora.org> wrote: > Daniel, your patch does not apply cleanly. Are you on the tip? > > The code I see there is no line if (QueryResult == MayAlias|| QueryResult == PartialAlias)
2012 May 04
0
[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful
Hi Preston, On Fri, May 4, 2012 at 9:12 AM, Preston Briggs <preston.briggs at gmail.com> wrote: > > which produces > > %arrayidx24 = getelementptr inbounds [100 x [100 x i64]]* %A, i64 > %arrayidx21.sum, i64 %add1411, i64 %add > store i64 0, i64* %arrayidx24, align 8 > {{{(5 + ((3 + %n) * %n)),+,(2 * %n * %n)}<%for.cond1.preheader>,+,(4 *
2012 May 04
3
[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful
Is there any chance of replacing/extending the GEP instruction? As noted in the GEP FAQ, GEPs don't support variable-length arrays; when the front ends have to support VLAs, they linearize the subscript expressions, throwing away information. The FAQ suggests that folks interested in writing an analysis that understands array indices (I'm thinking of dependence analysis) should be
2016 Oct 04
2
Incompatible type assertion from llvm-tblgen
On Wed, Sep 28, 2016 at 12:54 PM, Krzysztof Parzyszek < kparzysz at codeaurora.org> wrote: > On 9/28/2016 2:44 PM, Phil Tomson wrote: > >> And map it to a load.idx instruction with the following semantics: >> load.idx r1,r2,r3,SIZE r1 <- mem[r2 + (r3 << sizeof(operand))] >> >> That somehow the pattern matching dag fragment would need to be
2017 Jan 24
3
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
On Tue, Jan 24, 2017 at 1:20 PM, Sanjay Patel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > I started looking at the log files that you attached, and I'm confused. > The code that is supposedly causing the perf regression is created by the > loop vectorizer, right? Except the bad code is not in the "vector.body", so > is there something peculiar about