search for: canvectorizememory

Displaying 20 results from an estimated 22 matches for "canvectorizememory".

2013 Feb 05
[LLVMdev] Vectorizing global struct pointers
Hi all, One of the reasons the Livermore Loops couldn't be vectorized is that it was using global structures to hold the arrays. Today, I'm investigating why that is so and how to fix it. My investigation brought me to LoopVectorizationLegality::canVectorizeMemory(): if (WriteObjects.count(*it)) { DEBUG(dbgs() << "LV: Found a possible read/write reorder:" << **it <<"\n"); return false; } In the first pass, it registers all underlying objects for writes, then it does it again fo...
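The two-pass check described above can be modeled in a few lines. This is a hypothetical sketch, not LLVM code: `Access` and `hasPossibleReadWriteReorder` are invented names, each access is reduced to a string naming its underlying object, and the second pass is assumed to walk the reads (the snippet is cut off at that point). It shows why arrays kept inside one global struct are rejected: every member array has the struct itself as its underlying object, so a read from one member and a write to another look like accesses to the same object.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one memory access in the loop body: only the name
// of its underlying object (a global array, or a whole global struct) is kept.
struct Access {
  std::string underlyingObject;
  bool isWrite;
};

// Toy version of the two-pass logic: pass 1 records every object that is
// written; pass 2 bails out if any read touches one of those objects.
bool hasPossibleReadWriteReorder(const std::vector<Access> &accesses) {
  std::set<std::string> writeObjects;
  for (const Access &a : accesses)
    if (a.isWrite)
      writeObjects.insert(a.underlyingObject);

  for (const Access &a : accesses)
    if (!a.isWrite && writeObjects.count(a.underlyingObject))
      return true;  // the "possible read/write reorder" case
  return false;
}
```

With two separate globals `X` (read) and `Y` (written) the check passes; with members `g.x` and `g.y`, both mapping to the underlying object `g`, it reports a possible reorder even though the member arrays never overlap.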
2013 Nov 08
[LLVMdev] loop vectorizer and storing to uniform addresses
On 7 November 2013 17:18, Frank Winter <fwinter at jlab.org> wrote: > LV: We don't allow storing to uniform addresses > This is triggering because it didn't recognize it as a reduction variable during the canVectorizeInstrs() but did recognize that sum[q] is loop invariant in canVectorizeMemory(). I'm guessing the nested loop was unrolled because of the low trip-count, and removed, so it ended up as: float foo( int start , int end , float * A ) { float sum[4] = {0.,0.,0.,0.}; for (int i = start ; i < end ; ++i ) { sum[0] += A[i*4+0]; sum[1] += A[i*4+1]; sum[2] +=...
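One way to sidestep this, sketched below under the assumption that the inner trip count is always 4, is to keep the partial sums in scalar locals rather than in the array `sum`. Scalar locals whose address is never taken are promoted to registers, so the vectorizer would see four add reductions instead of stores to the loop-invariant addresses `&sum[0]` through `&sum[3]`. `foo_scalar` is a hypothetical name; whether a particular LLVM version then vectorizes the loop is not established here, and floating-point reductions are generally only vectorized when reassociation is allowed (for example with `-ffast-math`).

```cpp
#include <cassert>

// Same computation as the unrolled loop above, but the four partial sums
// live in scalar locals instead of the array sum[4], so no store to a
// loop-invariant address remains inside the loop.
float foo_scalar(int start, int end, const float *A) {
  float s0 = 0.f, s1 = 0.f, s2 = 0.f, s3 = 0.f;
  for (int i = start; i < end; ++i) {
    s0 += A[i * 4 + 0];
    s1 += A[i * 4 + 1];
    s2 += A[i * 4 + 2];
    s3 += A[i * 4 + 3];
  }
  return s0 + s1 + s2 + s3;
}
```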
2013 Jan 29
[LLVMdev] Apparent indeterminism in PreVerifier
Hi Sergei, "addRuntimeCheck" inserts code that checks that two or more arrays are disjoint. I looked at the code and it looks fine. We generate PHIs in the order that they appear in a vector. The values are inserted in 'canVectorizeMemory', which also looks fine. Please let me know if you think I missed something. Thanks, Nadav On Jan 29, 2013, at 8:48 AM, Sergei Larin <slarin at codeaurora.org> wrote: > Nadav, > > As I peel this onion, it looks like you might know something about > InnerLoopVectorizer::...
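The disjointness test described here can be illustrated with plain integers standing in for addresses. In this hypothetical sketch (`PtrRange` and `allDisjoint` are invented names, not LLVM API) each pointer is summarized by the half-open byte range it may touch in the loop, and two ranges conflict when each one starts before the other ends. For simplicity it checks every pair; two read-only ranges cannot conflict, so a real check can skip those pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of the bytes one pointer may touch in the loop:
// the half-open address range [start, end).
struct PtrRange {
  std::uintptr_t start;
  std::uintptr_t end;
};

// Returns true only if no two ranges overlap, i.e. the vectorized loop is
// safe to run; otherwise execution must fall back to the scalar loop.
bool allDisjoint(const std::vector<PtrRange> &ranges) {
  for (std::size_t i = 0; i < ranges.size(); ++i)
    for (std::size_t j = i + 1; j < ranges.size(); ++j) {
      bool overlap = ranges[i].start < ranges[j].end &&
                     ranges[j].start < ranges[i].end;
      if (overlap)
        return false;
    }
  return true;
}
```

Adjacent ranges such as [0, 16) and [16, 32) count as disjoint, because the end bound is exclusive.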
2013 Nov 08
[LLVMdev] loop vectorizer and storing to uniform addresses
I am trying my luck on this global reduction kernel: float foo( int start , int end , float * A ) { float sum[4] = {0.,0.,0.,0.}; for (int i = start ; i < end ; ++i ) { for (int q = 0 ; q < 4 ; ++q ) sum[q] += A[i*4+q]; } return sum[0]+sum[1]+sum[2]+sum[3]; } LV: Checking a loop in "foo" LV: Found a loop: for.cond1 LV: Found an induction variable. LV: We
2013 Jan 25
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...uld be simply not done and the parallelization can be attempted using some other method (e.g. pure unrolling), like usual. get_local_id is converted to regular iteration variables (local id space x, y,z) in the wiloop. I played yesterday a bit by kludge-hacking the LoopVectorizer code to skip the canVectorizeMemory() check for these wiloop constructs and it managed to vectorize a kernel as expected. > You need to implement something like Whole Function Vectorization > (http://dl.acm.org/citation.cfm?id=2190061). The loop vectorizer can't > help you here. Ralf Karrenberg open sourced his implemen...
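To make the "wiloop" idea concrete, here is a minimal sketch of a work-group loop, assuming a 3-D local size and a kernel body with no barriers. `kernelBody` and `runWorkGroup` are hypothetical names, not pocl code; the loop induction variables `x`, `y`, `z` take the place of `get_local_id(0..2)`. Under the OpenCL execution model the iterations are independent, which is the guarantee the poster wanted the vectorizer to rely on instead of re-deriving it in `canVectorizeMemory()`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical OpenCL-style kernel body written as a plain C++ function.
// In kernel source the ids would come from get_local_id(0..2).
void kernelBody(std::size_t x, std::size_t y, std::size_t z,
                std::size_t sx, std::size_t sy,
                const float *in, float *out) {
  std::size_t id = (z * sy + y) * sx + x;  // flattened work-item index
  out[id] = in[id] * 2.0f;
}

// Work-item loop: one iteration per work-item of a sx * sy * sz group.
// With no barrier in the body, iterations do not depend on each other.
void runWorkGroup(std::size_t sx, std::size_t sy, std::size_t sz,
                  const float *in, float *out) {
  for (std::size_t z = 0; z < sz; ++z)
    for (std::size_t y = 0; y < sy; ++y)
      for (std::size_t x = 0; x < sx; ++x)
        kernelBody(x, y, z, sx, sy, in, out);
}
```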
2013 Feb 05
[LLVMdev] Vectorizing global struct pointers
...ro.org> wrote: > Hi all, > > One of the reasons the Livermore Loops couldn't be vectorized is that it was using global structures to hold the arrays. Today, I'm investigating why that is so and how to fix it. > > My investigation brought me to LoopVectorizationLegality::canVectorizeMemory(): > > if (WriteObjects.count(*it)) { > DEBUG(dbgs() << "LV: Found a possible read/write reorder:" > << **it <<"\n"); > return false; > } > > In the first pass, it registers all underlying objects f...
2013 Oct 21
[LLVMdev] First attempt at recognizing pointer reduction
...atch that can make "canVectorize()" pass. Basically what I do is to teach AddReductionVar() about pointers, saying they don't really have an exit instruction, and that (maybe) the final store is a good candidate (is it?). This makes it recognize the writes and reads, but then "canVectorizeMemory()" bails because it can't find the array bounds. This will be my next step, but I'd like to know if I'm in the right direction, with the right assumptions about the reduction detection, before proceeding into more complicated bits. An example of the debug output I get for my earli...
2015 Mar 20
[LLVMdev] RFC: Loop versioning for LICM
...&Strides) { > 1029 if (isUniform(Ptr)) { > 1030 hasLoopInvariantStore = true; > 1031 } > > So later optimization can use this information in their legality analysis and make specific actions. > i.e. LoopVectorizer: > > 4002 bool LoopVectorizationLegality::canVectorizeMemory() { > 4008 if (LAI->hasLoopInvariantStore) { > 4009 emitAnalysis(VectorizationReport() > 4010 << "write to a loop invariant address could not be vectorized"); > 4011 DEBUG(dbgs() << "LV: We don't allow storing to uniform addresses\n"...
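The `hasLoopInvariantStore` flag in the quoted patch can be pictured with a toy address model. This sketch assumes every store address is affine in the induction variable, `base + stride * i`, and treats `stride == 0` as "uniform"; LLVM itself uses its own loop-invariance analysis, and `StoreAddr` is a hypothetical type. In the `sum[q]` kernel from the earlier results, each `sum[0] += ...` stores to the same address on every iteration, which is exactly the stride-0 case.

```cpp
#include <cassert>
#include <vector>

// Hypothetical affine description of a store address inside the loop:
// address(i) = base + stride * i, where i is the induction variable.
struct StoreAddr {
  long base;
  long stride;
};

// Toy version of the flag: a store is to a loop-invariant ("uniform")
// address when its address does not change with i, i.e. stride == 0.
bool hasLoopInvariantStore(const std::vector<StoreAddr> &stores) {
  for (const StoreAddr &s : stores)
    if (s.stride == 0)
      return true;
  return false;
}
```

A consumer such as the loop vectorizer would then only read the flag in its legality check, rather than re-scanning the stores itself.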
2013 Jan 29
[LLVMdev] Apparent indeterminism in PreVerifier
...indeterminism in PreVerifier > > Hi Sergei, > > "addRuntimeCheck" inserts code that checks that two or more arrays are > disjoint. I looked at the code and it looks fine. We generate PHIs in > the order that they appear in a vector. The values are inserted in > 'canVectorizeMemory', which also looks fine. Please let me know if you > think I missed something. > > Thanks, > Nadav > > On Jan 29, 2013, at 8:48 AM, Sergei Larin <slarin at codeaurora.org> > wrote: > > > Nadav, > > > > As I peel this onion, it looks like you m...
2013 Nov 08
[LLVMdev] loop vectorizer and storing to uniform addresses
...> <mailto:fwinter at jlab.org>> wrote: > > LV: We don't allow storing to uniform addresses > > > This is triggering because it didn't recognize it as a reduction variable > during the canVectorizeInstrs() but did recognize that sum[q] is loop > invariant in canVectorizeMemory(). > > I'm guessing the nested loop was unrolled because of the low > trip-count, and removed, so it ended up as: > > float foo( int start , int end , float * A ) > { > float sum[4] = {0.,0.,0.,0.}; > for (int i = start ; i < end ; ++i ) { > sum[0] += A[i...
2013 Jan 29
[LLVMdev] Apparent indeterminism in PreVerifier
...>> >> Hi Sergei, >> >> "addRuntimeCheck" inserts code that checks that two or more arrays are >> disjoint. I looked at the code and it looks fine. We generate PHIs in >> the order that they appear in a vector. The values are inserted in >> 'canVectorizeMemory', which also looks fine. Please let me know if you >> think I missed something. >> >> Thanks, >> Nadav >> >> On Jan 29, 2013, at 8:48 AM, Sergei Larin <slarin at codeaurora.org> >> wrote: >> >>> Nadav, >>> >>>...
2015 Mar 19
[LLVMdev] RFC: Loop versioning for LICM
Hi Ashutosh, > On Mar 16, 2015, at 9:06 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote: > > Hi Adam, > > From: Adam Nemet [mailto:anemet at apple.com <mailto:anemet at apple.com>] > Sent: Wednesday, March 11, 2015 10:48 AM > To: Nema, Ashutosh > Cc: llvmdev at cs.uiuc.edu <mailto:llvmdev at cs.uiuc.edu> > Subject: Re: [LLVMdev] RFC: Loop
2015 Mar 24
[LLVMdev] RFC: Loop versioning for LICM
...&Strides) { > 1029 if (isUniform(Ptr)) { > 1030 hasLoopInvariantStore = true; > 1031 } > > So later optimization can use this information in their legality analysis and make specific actions. > i.e. LoopVectorizer: > > 4002 bool LoopVectorizationLegality::canVectorizeMemory() { > 4008 if (LAI->hasLoopInvariantStore) { > 4009 emitAnalysis(VectorizationReport() > 4010 << "write to a loop invariant address could not be vectorized"); > 4011 DEBUG(dbgs() << "LV: We don't allow storing to uniform addresses\n"...
2013 Oct 21
[LLVMdev] First attempt at recognizing pointer reduction
..."canVectorize()" pass. > > Basically what I do is to teach AddReductionVar() about pointers, saying they don't really have an exit instruction, and that (maybe) the final store is a good candidate (is it?). > > This makes it recognize the writes and reads, but then "canVectorizeMemory()" bails because it can't find the array bounds. This will be my next step, but I'd like to know if I'm in the right direction, with the right assumptions about the reduction detection, before proceeding into more complicated bits. > > An example of the debug output I get fo...
2013 Jan 25
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...n can be attempted using some > other > method (e.g. pure unrolling), like usual. > > get_local_id is converted to regular iteration variables (local id > space x, > y,z) in the wiloop. > > I played yesterday a bit by kludge-hacking the LoopVectorizer code to > skip the canVectorizeMemory() check for these wiloop constructs and > it > managed to vectorize a kernel as expected. Based on this experience, can you propose some metadata that would allow this to happen (so that the LoopVectorizer would be generally useful for POCL)? I suspect this same metadata might be useful in o...
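As a point of comparison for the metadata question, later Clang releases let source code assert this kind of iteration independence with `#pragma clang loop vectorize(assume_safety)`, which tells the loop vectorizer it may ignore possible loop-carried memory dependencies. The sketch below (hypothetical function name `scaleWorkItems`) is not what this 2013 thread proposed; it only shows the shape of such an annotation on a work-item-style loop. The pragma is guarded so that non-Clang compilers never see it.

```cpp
#include <cassert>

// Work-item-style loop whose iterations are independent by construction.
// Under Clang, the pragma asks the vectorizer to assume the memory accesses
// are safe rather than proving it from the accesses themselves.
void scaleWorkItems(float *out, const float *in, int n) {
#if defined(__clang__)
#pragma clang loop vectorize(assume_safety)
#endif
  for (int i = 0; i < n; ++i)
    out[i] = in[i] * 2.0f;
}
```

Such an annotation only transfers responsibility to the programmer (or to a front end like an OpenCL kernel compiler that knows the loop is parallel); if iterations do depend on each other, the vectorized code may be wrong.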
2013 Feb 05
[LLVMdev] Vectorizing global struct pointers
...> Hi all, > > > > One of the reasons the Livermore Loops couldn't be vectorized is that it was using global structures to hold the arrays. Today, I'm investigating why that is so and how to fix it. > > > > My investigation brought me to LoopVectorizationLegality::canVectorizeMemory(): > > > > if (WriteObjects.count(*it)) { > > DEBUG(dbgs() << "LV: Found a possible read/write reorder:" > > << **it <<"\n"); > > return false; > > } > > > > In the first pass, it registers all underlying objects...
2013 Jan 25
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...llelization can be attempted using some other > method (e.g. pure unrolling), like usual. > > get_local_id is converted to regular iteration variables (local id space x, > y,z) in the wiloop. > > I played yesterday a bit by kludge-hacking the LoopVectorizer code to > skip the canVectorizeMemory() check for these wiloop constructs and it > managed to vectorize a kernel as expected. > >> You need to implement something like Whole Function Vectorization >> (http://dl.acm.org/citation.cfm?id=2190061). The loop vectorizer can't >> help you here. Ralf Karrenberg ope...
2013 Jan 25
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi Pekka, > Hi, > > I started to play with the LoopVectorizer of LLVM trunk > on the work-item loops produced by pocl's OpenCL C > kernel compiler, in hopes of implementing multi-work-item > work group autovectorization in a modular manner. > Thanks for checking the Loop Vectorizer, I am interested in hearing your feedback. The Loop Vectorizer does not fit here.
2013 Jan 29
[LLVMdev] Apparent indeterminism in PreVerifier
Nadav, As I peel this onion, it looks like you might know something about InnerLoopVectorizer::addRuntimeCheck. What does it do, and can it be causing the below described issue? Could resuming somehow (indeterministically) switch the order of PHIs in the original code? Thanks a lot. Sergei. --- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
2013 Jan 24
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi, I started to play with the LoopVectorizer of LLVM trunk on the work-item loops produced by pocl's OpenCL C kernel compiler, in hopes of implementing multi-work-item work group autovectorization in a modular manner. The vectorizer seems to refuse to vectorize the loop if it sees multiple writes to the same memory object within the same iteration. In case of parallel loops such as the
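A small example of the pattern described here, assuming a loop whose body writes the same buffer twice per iteration (`interleavePairs` is a hypothetical name). Iteration `i` only touches `out[2*i]` and `out[2*i+1]`, so no two iterations write the same element and the loop is parallel; a legality check that rejects any object written more than once per iteration would still refuse it.

```cpp
#include <cassert>

// Hypothetical parallel loop that writes the same object twice per
// iteration. Iteration i writes only out[2*i] and out[2*i+1], so the
// write sets of different iterations never overlap.
void interleavePairs(float *out, const float *re, const float *im, int n) {
  for (int i = 0; i < n; ++i) {
    out[2 * i] = re[i];      // first write to `out`
    out[2 * i + 1] = im[i];  // second write to the same object
  }
}
```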