search for: canvectorizememori

Displaying 20 results from an estimated 22 matches for "canvectorizememori".

Did you mean: canvectorizememory
2013 Feb 05
3
[LLVMdev] Vectorizing global struct pointers
Hi all, One of the reasons the Livermore Loops couldn't be vectorized is that it was using global structures to hold the arrays. Today, I'm investigating why is that so and how to fix it. My investigation brought me to LoopVectorizationLegality::canVectorizeMemory(): if (WriteObjects.count(*it)) { DEBUG(dbgs() << "LV: Found a possible read/write reorder:"
2013 Nov 08
0
[LLVMdev] loop vectorizer and storing to uniform addresses
On 7 November 2013 17:18, Frank Winter <fwinter at jlab.org> wrote: > LV: We don't allow storing to uniform addresses > This is triggering because it didn't recognize as a reduction variable during the canVectorizeInstrs() but did recognize that sum[q] is loop invariant in canVectorizeMemory(). I'm guessing the nested loop was unrolled because of the low trip-count, and
2013 Jan 29
2
[LLVMdev] Apparent indeterminism in PreVerifier
Hi Sergei, "addRuntimeCheck" inserts code that checks that two or more arrays are disjoint. I looked at the code and it looks fine. We generate PHIs in the order that they appear in a vector. The values are inserted in 'canVectorizeMemory', which also looks fine. Please let me know if you think I missed something. Thanks, Nadav On Jan 29, 2013, at 8:48 AM, Sergei Larin
2013 Nov 08
3
[LLVMdev] loop vectorizer and storing to uniform addresses
I am trying my luck on this global reduction kernel: float foo( int start , int end , float * A ) { float sum[4] = {0.,0.,0.,0.}; for (int i = start ; i < end ; ++i ) { for (int q = 0 ; q < 4 ; ++q ) sum[q] += A[i*4+q]; } return sum[0]+sum[1]+sum[2]+sum[3]; } LV: Checking a loop in "foo" LV: Found a loop: for.cond1 LV: Found an induction variable. LV: We
2013 Jan 25
4
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
On 01/25/2013 09:56 AM, Nadav Rotem wrote: > Thanks for checking the Loop Vectorizer, I am interested in hearing your > feedback. The Loop Vectorizer does not fit here. OpenCL vectorization is > completely different because the language itself is data-parallel. You > don't need all of the legality checks that the loop vectorizer has. I'm aware of this and it was my point in
2013 Feb 05
0
[LLVMdev] Vectorizing global struct pointers
If I understand you correctly, conceptually you want two different objects to be returned for Foo.bl and Foo.al? Here is my take on this (take this with a grain of salt, Dan is the expert on this): http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds LLVM's semantic allows for arrays to be accessed out of bounds - this allows you to walk from the first
2013 Oct 21
5
[LLVMdev] First attempt at recognizing pointer reduction
Hi Nadav, Arnold, I managed to find some time to work on the pointer reduction, and I got a patch that can make "canVectorize()" pass. Basically what I do is to teach AddReductionVar() about pointers, saying they don't really have an exit instructions, and that (maybe) the final store is a good candidate (is it?). This makes it recognize the writes and reads, but then
2015 Mar 20
2
[LLVMdev] RFC: Loop versioning for LICM
> On Mar 19, 2015, at 9:46 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote: > > Thanks Adam for your reply. > > From: Adam Nemet [mailto:anemet at apple.com <mailto:anemet at apple.com>] > Sent: Friday, March 20, 2015 3:23 AM > To: Nema, Ashutosh > Cc: Hal Finkel; Philip Reames; llvmdev at cs.uiuc.edu <mailto:llvmdev at cs.uiuc.edu> > Subject:
2013 Jan 29
0
[LLVMdev] Apparent indeterminism in PreVerifier
Nadav, Thanks for the quick response. By now I am convinced that the given loop ends up vectorized with enough difference to cause bad things later on, but I have not found the exact cause yet. To continue with my work I'll have to simply turn off vectorization for now, but I will come back and investigate. Again, there is some indeterminism in order of PHIs processing somewhere. I'll
2013 Nov 08
1
[LLVMdev] loop vectorizer and storing to uniform addresses
I changed the input C to using a 64 bit type for the loop index (this eliminates 'sext' instructions in the IR) Here the IR produced with clang -O0 define float @foo(i64 %start, i64 %end, float* %A) #0 { entry: %start.addr = alloca i64, align 8 %end.addr = alloca i64, align 8 %A.addr = alloca float*, align 8 %sum = alloca [4 x float], align 16 %i = alloca i64, align 8
2013 Jan 29
1
[LLVMdev] Apparent indeterminism in PreVerifier
Is there a test case that you can share ? On Jan 29, 2013, at 9:24 AM, Sergei Larin <slarin at codeaurora.org> wrote: > Nadav, > > Thanks for the quick response. By now I am convinced that the given loop > ends up vectorized with enough difference to cause bad things later on, but > I have not found the exact cause yet. To continue with my work I'll have to >
2015 Mar 19
2
[LLVMdev] RFC: Loop versioning for LICM
Hi Ashutosh, > On Mar 16, 2015, at 9:06 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote: > > Hi Adam, > > From: Adam Nemet [mailto:anemet at apple.com <mailto:anemet at apple.com>] > Sent: Wednesday, March 11, 2015 10:48 AM > To: Nema, Ashutosh > Cc: llvmdev at cs.uiuc.edu <mailto:llvmdev at cs.uiuc.edu> > Subject: Re: [LLVMdev] RFC: Loop
2015 Mar 24
3
[LLVMdev] RFC: Loop versioning for LICM
> On Mar 20, 2015, at 8:02 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote: > > > Yes, this is what I was proposing above and here ;): > Thanks Adam it’s for confirming J NP :). > > > No, not hasLoopInvariantStore but hasAccessToLoopInvariantAddress. > Its only for invariant stores[not loads], Using ‘hasLoopInvariantStore’ (or a name with invariant store)
2013 Oct 21
0
[LLVMdev] First attempt at recognizing pointer reduction
Renato, This looks like the right direction. Did you run it on the LLVM test suite to check if it finds new loops to vectorize ? Thanks, Nadav On Oct 21, 2013, at 8:23 AM, Renato Golin <renato.golin at linaro.org> wrote: > Hi Nadav, Arnold, > > I managed to find some time to work on the pointer reduction, and I got a patch that can make "canVectorize()" pass. >
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
----- Original Message ----- > From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi> > To: "Nadav Rotem" <nrotem at apple.com> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Friday, January 25, 2013 5:35:16 AM > Subject: Re: [LLVMdev] LoopVectorizer in OpenCL C work group autovectorization > > On
2013 Feb 05
3
[LLVMdev] Vectorizing global struct pointers
----- Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > If I understand you correctly, conceptually you want two different objects to be returned for Foo.bl and Foo.al? > > Here is my take on this (take this with a grain of salt, Dan is the expert on this): > > http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds > >
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi Pekka, > How I see it, the data parallel input simply makes the vectorizer's job > easier (skip some of the legality checks) while reusing most of the > implementation (e.g. cost estimation, unrolling decisions, the > vector instruction formation itself, predication/if-conversion, > speculative execution+blend, etc.). > What you need is outer loop vectorization while
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi Pekka, > Hi, > > I started to play with the LoopVectorizer of LLVM trunk > on the work-item loops produced by pocl's OpenCL C > kernel compiler, in hopes of implementing multi-work-item > work group autovectorization in a modular manner. > Thanks for checking the Loop Vectorizer, I am interested in hearing your feedback. The Loop Vectorizer does not fit here.
2013 Jan 29
0
[LLVMdev] Apparent indeterminism in PreVerifier
Nadav, As I peel this onion, it looks like you might know something about InnerLoopVectorizer::addRuntimeCheck. What does it do, and can it be causing the below described issue? Could resuming somehow (indeterministically) switch the order of PHIs in the original code? Thanks a lot. Sergei. --- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
2013 Jan 24
3
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi, I started to play with the LoopVectorizer of LLVM trunk on the work-item loops produced by pocl's OpenCL C kernel compiler, in hopes of implementing multi-work-item work group autovectorization in a modular manner. The vectorizer seems to refuse to vectorize the loop if it sees multiple writes to the same memory object within the same iteration. In case of parallel loops such as the