thr3ads.net - similar to: "[LLVMdev] alloca scalarization with dynamic indexing into vectors"

Displaying 20 results from an estimated 500 matches similar to: "[LLVMdev] alloca scalarization with dynamic indexing into vectors"

[LLVMdev] [Patch][RFC] Change R600 data layout

2013 Dec 31

[LLVMdev] [Patch][RFC] Change R600 data layout

Hi, I've prepared patches for both LLVM and Clang to change the datalayout for R600. This may seem like a bold move, but I think it is warranted. R600/SI is a strange architecture in that it uses 64bit pointers but does not support 64 bit arithmetic except for load/store operations that roughly map onto getelementptr. The current datalayout for r600 includes n32:64, which is odd

[LLVMdev] Address space extension

2013 Aug 11

[LLVMdev] Address space extension

Hello Micah, I first apologize for the mail length, but I think that using an example would be better to clarify the case and the objections. > [Micah Villmow] In the case of OpenCL, you can't correctly use the standard C calling convention and still be OpenCL compliant, the C calling convention is too permissive. The second you use OpenCL, you are using an OpenCL specific calling

Vectorizing remainder loop

2018 Jul 29

Vectorizing remainder loop

Hello, I m working on a hardware with very large vector width till v2048. Now when I vectorize using llvm default vectorizer maximum 2047 iterations are scalar remainder loop. These are not vectorized by llvm which increases the cost. However these should be vectorized using next available vector width I.e v1024, v512, v256, v128, v64, v32, v16, v8, v4..... The issue of scalar remainder loop has

[LLVMdev] better code for IV

2014 Feb 19

[LLVMdev] better code for IV

Hi Andrew, The issue below refers to LSR, so I'll appreciate your feedback. It also refers to instruction combining and might impact backends other than X86, so if you know of others that might be interested you are more than welcome to add them. Thanks, Anat _____________________________________________ From: Shemer, Anat Sent: Tuesday, February 18, 2014 15:07 To: 'llvmdev at

[LLVMdev] Vectorizing global struct pointers

2013 Feb 05

[LLVMdev] Vectorizing global struct pointers

Hi all, One of the reasons the Livermore Loops couldn't be vectorized is that it was using global structures to hold the arrays. Today, I'm investigating why is that so and how to fix it. My investigation brought me to LoopVectorizationLegality::canVectorizeMemory(): if (WriteObjects.count(*it)) { DEBUG(dbgs() << "LV: Found a possible read/write reorder:"

[LLVMdev] Metadata/Value split has landed

2014 Dec 10

[LLVMdev] Metadata/Value split has landed

> On 2014 Dec 10, at 08:40, Tom Stellard <tom at stellard.net> wrote: > > On Tue, Dec 09, 2014 at 09:22:16PM -0800, Duncan P. N. Exon Smith wrote: >> The `Metadata`/`Value` split (PR21532) landed in r223802 -- at least, the >> C++ side of it. This was a rocky day, but I suppose that's what I get >> for failing to stage the change in smaller pieces. >>

MemorySSA question

2017 Dec 19

MemorySSA question

Hi, I am new to MemorySSA and wanted to understand its capabilities. Hence I wrote the following program (test.c): int N; void test(int *restrict a, int *restrict b, int *restrict c, int *restrict d, int *restrict e) { int i; for (i = 0; i < N; i = i + 5) { a[i] = b[i] + c[i]; } for (i = 0; i < N - 5; i = i + 5) { e[i] = a[i] * d[i]; } } I compiled this program using

[LLVMdev] Address space extension

2013 Aug 10

[LLVMdev] Address space extension

> -----Original Message----- > From: Michele Scandale [mailto:michele.scandale at gmail.com] > Sent: Saturday, August 10, 2013 6:29 AM > To: Micah Villmow > Cc: LLVM Developers Mailing List > Subject: Re: [LLVMdev] Address space extension > > On 08/10/2013 02:47 PM, Micah Villmow wrote: > > Michele, > > The information you are trying to gather is fundamentally

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 18

[LLVMdev] [Vectorization] Mis match in code generated

Hi Nadav, Thanks for the quick reply !! Ok, so as of now we are lacking capability to handle flat large reductions. I did go through function vectorizeChainsInBlock() (line number 2862). In this function, we try to vectorize if we have phi nodes in the IR (several if's check for phi nodes) i.e we try to construct tree that starts at chains. Any pointers on how to join multiple trees? I

[LLVMdev] Vectorizing global struct pointers

2013 Feb 05

[LLVMdev] Vectorizing global struct pointers

If I understand you correctly, conceptually you want two different objects to be returned for Foo.bl and Foo.al? Here is my take on this (take this with a grain of salt, Dan is the expert on this): http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds LLVM's semantic allows for arrays to be accessed out of bounds - this allows you to walk from the first

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 19

[LLVMdev] [Vectorization] Mis match in code generated

Hi Arnold, Thanks for your reply. I tried test case as suggested by you. *void foo(int *a, int *sum) {*sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];}* so that it has a 'store' in its IR. *IR before vectorization :*target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple =

[LLVMdev] [Vectorization] Mis match in code generated

2014 Sep 18

[LLVMdev] [Vectorization] Mis match in code generated

Hi, I am trying to understand LLVM vectorization implementation and was looking into both loop and SLP vectorization. test case 1: *int foo(int *a) {int sum = 0,i;for(i=0; i<16; i++) sum += a[i];return sum;}* This code is vectorized by loop vectorizer where we calculate scalar loop cost as 4 and vector loop cost as 2. Since vector loop cost is less and above reduction is legal to

[LLVMdev] Vectorizing alloca instructions

2013 Oct 24

[LLVMdev] Vectorizing alloca instructions

Hi Tom, Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA. Thanks, Nadav On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote: > Hi, > >

[LLVMdev] loop vectorizer says Bad stride

2013 Oct 28

[LLVMdev] loop vectorizer says Bad stride

Verifying function running passes ... LV: Checking a loop in "bar" LV: Found a loop: L0 LV: Found an induction variable. LV: We need to do 0 pointer comparisons. LV: Checking memory dependencies LV: Bad stride - Not an AddRecExpr pointer %13 = getelementptr float* %arg2, i32 %1 SCEV: ((4 * (sext i32 {(256 + %arg0),+,1}<nw><%L0> to i64)) + %arg2) LV: Src Scev: {((4 * (sext

[LLVMdev] [Vectorization] Mis match in code generated

2014 Nov 10

[LLVMdev] [Vectorization] Mis match in code generated

Hi Suyog, Thanks for looking at this. This has recently got itself onto my TODO list too. > I am not sure how much all this will improve the code quality for horizontal reduction > (donno how frequently such pattern of horizontal reduction from same array occurs in real world/SPECS). Actually the main loop of 470.lbm can be SLP vectorized like this. We have three parts to it: A fully

MemorySSA question

2017 Dec 19

MemorySSA question

On Tue, Dec 19, 2017 at 9:10 AM, Siddharth Bhat via llvm-dev < llvm-dev at lists.llvm.org> wrote: > I could be entirely wrong, but from my understanding of memorySSA, each > def defines an "abstract heap state" which has the coarsest possible > definition - any write will be modelled as a "new heap state". > This is true for def-def relationships, but

[LLVMdev] Vectorizer using Instruction, not opcodes

2013 Feb 04

[LLVMdev] Vectorizer using Instruction, not opcodes

The loop vectorized does not estimate the cost of vectorization by looking at the IR you list below. It does not vectorize and then run the CostAnalysis pass. It estimates the cost itself before it even performs the vectorization. The way it works is that it looks at all the scalar instructions and asks: What is the cost if I execute the scalar instruction as a vector instruction. Therefore, it

[LLVMdev] Loop vectorizer dosen't find loop bounds

2013 Oct 28

[LLVMdev] Loop vectorizer dosen't find loop bounds

I am trying to vectorize the function void bar(float *c, float *a, float *b) { const int width = 256; for (int i = 0 ; i < 256 ; ++i ) { c[ i ] = a[ i ] + b[ i ]; c[ width + i ] = a[ width + i ] + b[ width + i ]; } } using the following commands clang -emit-llvm -S loop.c opt loop.ll -O3 -debug-only=loop-vectorize -S -o - LV: Checking a loop in

Canonicalize induction variables

2016 Aug 25

Canonicalize induction variables

But even for a very simple loop: int test1 (int *x, int *y, int *z, int k) { int sum = 0; for (int i = 10; i < k; i++) { z[i] = x[i] / y[i]; } return sum; } The initial value of induction variable is not zero after compiling with -O3 -mcpu=power8 x.cpp -S -c -emit-llvm -fno-unroll-loops (see bottom of the email for IR) Also I can write somewhat more complicated loop where step

[LLVMdev] loop vectorizer says Bad stride

2013 Oct 28

[LLVMdev] loop vectorizer says Bad stride

Frank, It looks like the loop vectorizer is unable to tell that the two stores in your code never overlap. This is probably because of the sign-extend in your code. Can you extend the indices to 64bit ? Thanks, Nadav On Oct 28, 2013, at 1:38 PM, Frank Winter <fwinter at jlab.org> wrote: > Verifying function > running passes ... > LV: Checking a loop in "bar" > LV:

similar to: [LLVMdev] alloca scalarization with dynamic indexing into vectors