similar to: [LLVMdev] Loop Vectorizer Update

Displaying 20 results from an estimated 20000 matches similar to: "[LLVMdev] Loop Vectorizer Update"

2014 Dec 13
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
So IMO, if we modify the VF calculation for targets/subtargets using TTI where higher VF is supported The vectorizer’s scope will become wider. Did/do you foresee any issue with this? Thanks, Shahid From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Saturday, December 13, 2014 2:47 AM To: Shahid, Asghar-ahmad Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Vectorization factor limitation in
2013 Jan 03
1
[LLVMdev] Does loop vectorizer inquire about target's SIMD capabilities?
Hi Nadav, On Thu, Jan 3, 2013 at 1:53 PM, Nadav Rotem <nrotem at apple.com> wrote: > Hi Akira! > > > > > Does the current loop vectorizer inquire about the SIMD capabilities of > the target architecture when it decides whether it is profitable to > vectorize a loop? > > Yes, it uses a cost model to determine the profitability of vectorization. > At the
2013 Jan 03
0
[LLVMdev] Does loop vectorizer inquire about target's SIMD capabilities?
Hi Akira! > > Does the current loop vectorizer inquire about the SIMD capabilities of the target architecture when it decides whether it is profitable to vectorize a loop? Yes, it uses a cost model to determine the profitability of vectorization. At the moment only x86 provides the necessary hooks that are needed for calculating the costs. We may need to change the cost defaults to
2014 Dec 11
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
Hi Nadav/Devs I am exploring Loop Vectorizer to vectorize i8 scalar operations into 8xi8 vector operation. I was expecting the Loop Vectorizer to analyze the profitability for vectorization factor(VF) of 8, However it is not doing so due to the widest type calculation done for the blocks inside the loop. May be I am missing something, however, I am curious to know why Loop Vectorizer limits the
2013 Jan 03
3
[LLVMdev] Does loop vectorizer inquire about target's SIMD capabilities?
Nadav (or anyone who is familiar with the loop vectorizer), Does the current loop vectorizer inquire about the SIMD capabilities of the target architecture when it decides whether it is profitable to vectorize a loop? I am asking this because I would like to have loop vectorization disabled for targets that don't support SIMD instructions (for example, standard mips32). Loop vectorization
2012 Oct 08
3
[LLVMdev] LLVM Loop Vectorizer (Nadav Rotem)
On 10/08/2012 06:02 AM, Nadav Rotem wrote: > Hi Javed, > > Developing a good loop vectorizer takes several years. The work on the GCC vectorizer began in 2004, and they spent several years improving and optimizing their vectorizer. They started by vectorizing simple loops, and added features that they needed in order to vectorize additional loops that were important for them. They
2013 Jul 23
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
Hi, Sorry for the delay in response. I measured the code size change and noticed small changes in both directions for individual programs. I found a 30k binary size growth for the entire testsuite + SPEC. I attached an updated performance report that includes both compile time and performance measurements. Thanks, Nadav On Jul 14, 2013, at 10:55 PM, Nadav Rotem <nrotem at apple.com>
2013 Jul 27
1
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Nadav, Okay. 1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2 2. If the day comes when the selectiondag store vectorizer handles more than pairs, and does so better, is anyone really going to remember this random 3 exists in the other vectorizer? I would posit, based on experience, the answer is
2012 Oct 08
0
[LLVMdev] LLVM Loop Vectorizer (Nadav Rotem)
Hi Javed, Developing a good loop vectorizer takes several years. The work on the GCC vectorizer began in 2004, and they spent several years improving and optimizing their vectorizer. They started by vectorizing simple loops, and added features that they needed in order to vectorize additional loops that were important for them. They started with a single-block loops, and later they added
2013 Nov 15
0
[LLVMdev] Limit loop vectorizer to SSE
----- Original Message ----- > From: "Arnold Schwaighofer" <aschwaighofer at apple.com> > To: "Joshua Klontz" <josh.klontz at gmail.com> > Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu> > Sent: Friday, November 15, 2013 4:05:53 PM > Subject: Re: [LLVMdev] Limit loop vectorizer to SSE > > > Something like: > > index
2013 Oct 30
0
[LLVMdev] loop vectorizer
I ran the BB vectorizer as I guess this is the SLP vectorizer. BBV: using target information BBV: fusing loop #1 for for.body in _Z3barmmPfS_S_... BBV: found 2 instructions with candidate pairs BBV: found 0 pair connections. BBV: done! However, this was run on the unrolled loop (I guess). Here is the IR printed by 'opt': entry: %cmp9 = icmp ult i64 %start, %end br i1 %cmp9, label
2013 Nov 15
2
[LLVMdev] Limit loop vectorizer to SSE
Yes, I was just about to send out: DL->getABITypeAlignment(ScalarDataTy); The question is: “… ABI alignment for the target …" is that getPrefTypeAlignment or getABITypeAlignment I would have thought the latter. On Nov 15, 2013, at 4:12 PM, Hal Finkel <hfinkel at anl.gov> wrote: > ----- Original Message ----- >> From: "Arnold Schwaighofer"
2013 Oct 30
3
[LLVMdev] loop vectorizer
On 30 October 2013 09:25, Nadav Rotem <nrotem at apple.com> wrote: > The access pattern to arrays a and b is non-linear. Unrolled loops are > usually handled by the SLP-vectorizer. Are ir0 and ir1 consecutive for all > values for i ? > Based on his list of values, it seems that the induction stride is linear within each block of 4 iterations, but it's not a clear
2013 Nov 15
4
[LLVMdev] Limit loop vectorizer to SSE
Something like: index 6db7f68..68564cb 100644 --- a/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -1208,6 +1208,8 @@ void InnerLoopVectorizer::vectorizeMemoryInstruction(Instr Type *DataTy = VectorType::get(ScalarDataTy, VF); Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand(); unsigned Alignment = LI ?
2012 Oct 17
0
[LLVMdev] Loop vectorizer
Hi Nadav, Do you have any small write-up of current design of loop vectorizer?. If so, can you please send it across?. I want to see if there are dependencies such as unrolling for the vectorization. In the design we may also have to consider BB vectorizer and loop vectorizer working well together with no ambiguous requirements/dependencies. Regards, Shivaram -----Original Message-----
2013 Oct 30
0
[LLVMdev] loop vectorizer
Well, they are not directly consecutive. They are consecutive with a constant offset or stride: ir1 = ir0 + 4 If I rewrite the function in this form void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) { const std::uint64_t inner = 4; for (std::uint64_t i = start ; i < end ; ++i ) { const std::uint64_t ir0 = (
2012 Oct 05
0
[LLVMdev] LLVM Loop Vectorizer
----- Original Message ----- > From: "Nadav Rotem" <nrotem at apple.com> > To: "llvmdev at cs.uiuc.edu Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Friday, October 5, 2012 1:14:47 AM > Subject: [LLVMdev] LLVM Loop Vectorizer > > Hi, > > We are starting to work on an LLVM loop vectorizer. There's number of > different projects that
2013 Jan 03
2
[LLVMdev] Does loop vectorizer inquire about target's SIMD capabilities?
On 3 January 2013 21:53, Nadav Rotem <nrotem at apple.com> wrote: > > I am asking this because I would like to have loop vectorization > disabled for targets that don't support SIMD instructions (for example, > standard mips32). > > Loop vectorization bloats the code size and prolongs compilation time > without any improvement to performance for such targets. >
2012 Oct 05
1
[LLVMdev] LLVM Loop Vectorizer
Why not just have a hook into the TargetInstrInfo to query for the cost of an instruction? This is already used in many places throughout the optimizers. > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Das, Dibyendu > Sent: Friday, October 05, 2012 2:00 AM > To: Nadav Rotem; llvmdev at cs.uiuc.edu Mailing
2013 Nov 15
0
[LLVMdev] Limit loop vectorizer to SSE
Nadav, I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to vectorization. I can't tell if it's the loop vectorizer or the codegen at fault, but the alignment assumption seems to sneak in somewhere. v/r, Josh [1]