Displaying 20 results from an estimated 300 matches similar to: "[LLVMdev] [Vectorization] Mis match in code generated"
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Nadav,
Thanks for the quick reply !!
Ok, so as of now we are lacking capability to handle flat large reductions.
I did go through function vectorizeChainsInBlock() (line number 2862). In
this function,
we try to vectorize if we have phi nodes in the IR (several if's check for
phi nodes) i.e we try to
construct tree that starts at chains.
Any pointers on how to join multiple trees? I
2014 Sep 19
3
[LLVMdev] [Vectorization] Mis match in code generated
Hi Arnold,
Thanks for your reply.
I tried test case as suggested by you.
*void foo(int *a, int *sum) {*sum =
a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];}*
so that it has a 'store' in its IR.
*IR before vectorization :*target datalayout =
"e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
target triple =
2014 Nov 10
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Suyog,
Thanks for looking at this. This has recently got itself onto my TODO list
too.
> I am not sure how much all this will improve the code quality for
horizontal reduction
> (donno how frequently such pattern of horizontal reduction from same
array occurs in real world/SPECS).
Actually the main loop of 470.lbm can be SLP vectorized like this. We have
three parts to it: A fully
2016 Apr 08
2
LIBCLC with LLVM 3.9 Trunk
It's not clear what is actually wrong from your original message, I think
you need to give some more information as to what you are doing: Example
source, what target GPU, compiler error messages or other evidence of "it's
wrong" (llvm IR, disassembly, etc) ...
--
Mats
On 8 April 2016 at 09:55, Liu Xin via llvm-dev <llvm-dev at lists.llvm.org>
wrote:
> I built it
2013 Nov 08
1
[LLVMdev] loop vectorizer and storing to uniform addresses
I changed the input C to using a 64 bit type for the loop index (this
eliminates 'sext' instructions in the IR)
Here the IR produced with clang -O0
define float @foo(i64 %start, i64 %end, float* %A) #0 {
entry:
%start.addr = alloca i64, align 8
%end.addr = alloca i64, align 8
%A.addr = alloca float*, align 8
%sum = alloca [4 x float], align 16
%i = alloca i64, align 8
2013 Nov 08
0
[LLVMdev] loop vectorizer and storing to uniform addresses
On 7 November 2013 17:18, Frank Winter <fwinter at jlab.org> wrote:
> LV: We don't allow storing to uniform addresses
>
This is triggering because it didn't recognize as a reduction variable
during the canVectorizeInstrs() but did recognize that sum[q] is loop
invariant in canVectorizeMemory().
I'm guessing the nested loop was unrolled because of the low trip-count,
and
2013 Nov 08
3
[LLVMdev] loop vectorizer and storing to uniform addresses
I am trying my luck on this global reduction kernel:
float foo( int start , int end , float * A )
{
float sum[4] = {0.,0.,0.,0.};
for (int i = start ; i < end ; ++i ) {
for (int q = 0 ; q < 4 ; ++q )
sum[q] += A[i*4+q];
}
return sum[0]+sum[1]+sum[2]+sum[3];
}
LV: Checking a loop in "foo"
LV: Found a loop: for.cond1
LV: Found an induction variable.
LV: We
2012 Jan 26
0
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, Jan 26, 2012 at 3:19 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> On Thu, 2012-01-26 at 15:12 -0600, Sebastian Pop wrote:
>> On Thu, Jan 26, 2012 at 2:49 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> > Thanks! Did you compile with any non-default flags other than -mllvm
>> > -vectorize?
>>
>> I used -O3 and -vectorize, no other non-default
2012 Jan 26
3
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, 2012-01-26 at 15:12 -0600, Sebastian Pop wrote:
> On Thu, Jan 26, 2012 at 2:49 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > Thanks! Did you compile with any non-default flags other than -mllvm
> > -vectorize?
>
> I used -O3 and -vectorize, no other non-default flags.
If I run clang -O3 -mllvm -vectorize -S -emit-llvm -o test.ll test.c
then I get no
2012 Jan 26
0
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, Jan 26, 2012 at 3:41 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> On Thu, 2012-01-26 at 15:36 -0600, Sebastian Pop wrote:
>> arm-none-linux-gnueabi
>
> Indeed, adding -ccc-host-triple arm-none-linux-gnueabi I also get
Minor remark: please use -target instead of -ccc-host-triple that is
now deprecated.
Thanks for looking at this testcase.
Sebastian
--
Qualcomm
2012 Jan 26
2
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, 2012-01-26 at 15:36 -0600, Sebastian Pop wrote:
> arm-none-linux-gnueabi
Indeed, adding -ccc-host-triple arm-none-linux-gnueabi I also get
vectorization (even though I don't get vectorization when targeting
x86_64). I'll let you know what I find.
-Hal
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
2016 Jul 11
2
extra loads in nested for-loop
I was looking at the code generated from the following c code and noticed
extra loads in the inner-loop of these nested for-loops:
#define DIM 8
#define UNROLL_DIM DIM
typedef double InArray[DIM][DIM];
void f1( InArray c, InArray a, InArray b ) {
#pragma clang loop unroll_count(UNROLL_DIM)
for( int i=0;i<DIM;i++)
#pragma clang loop unroll_count(UNROLL_DIM)
for( int
2015 May 21
2
[LLVMdev] How can I remove these redundant copy between registers?
Hi,
I've been working on a Blackfin backend (llvm-3.6.0) based on the previous
one that was removed in llvm-3.1.
llc generates codes like this:
29 p1 = r2;
30 r5 = [p1];
31 p1 = r2;
32 r6 = [p1 + 4];
33 r5 = r6 + r5;
34 r6 = [p0 + -4];
35 r5 *= r6;
36 p1 = r2;
37 r6 = [p1 + 8];
38 p1 = r2;
p1 and r2 are in different register classes.
A p*
2015 Jan 14
4
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Inline
- George
> On Jan 14, 2015, at 10:49 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>
>
>> On Tue, Jan 13, 2015 at 11:26 PM, Nick Lewycky <nlewycky at google.com> wrote:
>>> On 13 January 2015 at 22:11, Daniel Berlin <dberlin at dberlin.org> wrote:
>>> This is caused by CFLAA returning PartialAlias for a query that BasicAA can
2015 Jan 15
2
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Yes.
I've attached an updated patch that does the following:
1. Fixes the partialalias of globals/arguments
2. Enables partialalias for cases where nothing has been unified to a
global/argument
3. Fixes that select was unifying the condition to the other pieces (the
condition does not need to be processed :P). This was causing unnecessary
aliasing.
4. Adds a regression test to
2015 Jan 14
3
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Oh, sorry, i didn't rebase it when i changed the fix, you would have had to
apply the first on top of the second.
Here is one against HEAD
On Wed, Jan 14, 2015 at 12:32 PM, Ana Pazos <apazos at codeaurora.org> wrote:
> Daniel, your patch does not apply cleanly. Are you on the tip?
>
> The code I see there is no line if (QueryResult == MayAlias|| QueryResult == PartialAlias)
2012 May 04
0
[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful
Hi Preston,
On Fri, May 4, 2012 at 9:12 AM, Preston Briggs <preston.briggs at gmail.com> wrote:
>
> which produces
>
> %arrayidx24 = getelementptr inbounds [100 x [100 x i64]]* %A, i64
> %arrayidx21.sum, i64 %add1411, i64 %add
> store i64 0, i64* %arrayidx24, align 8
> {{{(5 + ((3 + %n) * %n)),+,(2 * %n * %n)}<%for.cond1.preheader>,+,(4 *
2012 May 04
3
[LLVMdev] Extending GetElementPointer, or Premature Linearization Considered Harmful
Is there any chance of replacing/extending the GEP instruction?
As noted in the GEP FAQ, GEPs don't support variable-length arrays; when
the front ends have to support VLAs, they linearize the subscript
expressions, throwing away information. The FAQ suggests that folks
interested in writing an analysis that understands array indices (I'm
thinking of dependence analysis) should be
2016 Oct 04
2
Incompatible type assertion from llvm-tblgen
On Wed, Sep 28, 2016 at 12:54 PM, Krzysztof Parzyszek <
kparzysz at codeaurora.org> wrote:
> On 9/28/2016 2:44 PM, Phil Tomson wrote:
>
>> And map it to a load.idx instruction with the following semantics:
>> load.idx r1,r2,r3,SIZE r1 <- mem[r2 + (r3 << sizeof(operand))]
>>
>> That somehow the pattern matching dag fragment would need to be
2017 Jan 24
3
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
On Tue, Jan 24, 2017 at 1:20 PM, Sanjay Patel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
> I started looking at the log files that you attached, and I'm confused.
> The code that is supposedly causing the perf regression is created by the
> loop vectorizer, right? Except the bad code is not in the "vector.body", so
> is there something peculiar about