similar to: Loop vectorizer doesn't try to align vectors on preferred vector alignment

Displaying 20 results from an estimated 3000 matches similar to: "Loop vectorizer doesn't try to align vectors on preferred vector alignment"

2018 Apr 12
0
Loop vectorizer doesn't try to align vectors on preferred vector alignment
On Thu, Apr 12, 2018, 8:40 AM Benoit Meister <meister at reservoir.com> wrote: > Thank you, Ayal! And thanks for the quote, Mehdi. I believe it says that > it would be a normal thing for the Loop Vectorizer to conform to the > backend's preferred alignment constraints as given by the datalayout. > > On Thu, Apr 12, 2018, 3:24 AM Zaks, Ayal <ayal.zaks at intel.com>
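For context, the "preferred vector alignment" discussed in this thread comes from the module's DataLayout. A minimal standalone sketch, assuming an LLVM of roughly this era (where the alignment queries still return a plain unsigned) and an x86-64-style layout string chosen only for illustration:

    #include "llvm/IR/DataLayout.h"
    #include "llvm/IR/DerivedTypes.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/Support/raw_ostream.h"
    using namespace llvm;

    int main() {
      LLVMContext Ctx;
      // Layout string is an illustrative x86-64 example, not taken from the thread.
      DataLayout DL("e-m:e-i64:64-f80:128-n8:16:32:64-S128");
      Type *VecTy = VectorType::get(Type::getFloatTy(Ctx), 4); // <4 x float>
      // ABI alignment is what the target requires; preferred alignment is what
      // the datalayout says the target would like (the two can differ).
      errs() << "ABI align:  " << DL.getABITypeAlignment(VecTy) << "\n";
      errs() << "Pref align: " << DL.getPrefTypeAlignment(VecTy) << "\n";
      return 0;
    }
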
2013 Nov 15
4
[LLVMdev] Limit loop vectorizer to SSE
Something like: index 6db7f68..68564cb 100644 --- a/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -1208,6 +1208,8 @@ void InnerLoopVectorizer::vectorizeMemoryInstruction(Instr Type *DataTy = VectorType::get(ScalarDataTy, VF); Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand(); unsigned Alignment = LI ?
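The snippet above is cut off before the lines the patch adds; going by the follow-ups later in the thread (which settle on DL->getABITypeAlignment(ScalarDataTy)), one plausible shape of the change is to fall back to the ABI alignment of the scalar element type when the load/store carries no explicit alignment. A hedged sketch only; getMemInstAlignment is a made-up helper name, not the actual patch:

    #include "llvm/IR/DataLayout.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // Hypothetical helper: if the memory instruction has no explicit alignment
    // (reported as 0), use the ABI alignment of the scalar type instead.
    static unsigned getMemInstAlignment(const DataLayout &DL, LoadInst *LI,
                                        StoreInst *SI, Type *ScalarDataTy) {
      unsigned Alignment = LI ? LI->getAlignment() : SI->getAlignment();
      if (Alignment == 0)
        Alignment = DL.getABITypeAlignment(ScalarDataTy);
      return Alignment;
    }
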
2017 Mar 14
10
[Proposal][RFC] Epilog loop vectorization
Summarizing the discussion on the implementation approaches. We discussed two approaches: first, running 'InnerLoopVectorizer' again on the epilog loop immediately after vectorizing the original loop, within the same vectorization pass; second, re-running the vectorization pass and limiting the vectorization factor of the epilog loop by metadata. <Approach-2> Challenges with
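For the metadata-based approach, the usual vehicle for capping the vectorization factor of a specific loop is the llvm.loop.vectorize.width hint in the loop's llvm.loop metadata. A sketch of attaching such a hint; addVectorizeWidthHint is a hypothetical helper written for illustration, not code from the proposal:

    #include "llvm/IR/Constants.h"
    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Metadata.h"
    using namespace llvm;

    // Hypothetical helper: attach !{!"llvm.loop.vectorize.width", i32 Width}
    // to the loop whose latch terminator is LatchTerm.
    static void addVectorizeWidthHint(Instruction *LatchTerm, unsigned Width) {
      LLVMContext &Ctx = LatchTerm->getContext();
      Metadata *WidthOps[] = {
          MDString::get(Ctx, "llvm.loop.vectorize.width"),
          ConstantAsMetadata::get(
              ConstantInt::get(Type::getInt32Ty(Ctx), Width))};
      MDNode *Hint = MDNode::get(Ctx, WidthOps);
      // A loop ID node's first operand is a self-reference; the hints follow.
      MDNode *LoopID = MDNode::getDistinct(Ctx, {nullptr, Hint});
      LoopID->replaceOperandWith(0, LoopID);
      LatchTerm->setMetadata("llvm.loop", LoopID);
    }
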
2016 Sep 25
5
RFC: New intrinsics masked.expandload and masked.compressstore
Hi Elena, Technically speaking, this seems straightforward. I wonder, however, how target-independent this is in a practical sense; will there be an efficient lowering when targeting any other ISA? I don't want to get into the territory where, because the vectorizer is supposed to be architecture independent, we need to add target-independent intrinsics for all
2013 Nov 15
0
[LLVMdev] Limit loop vectorizer to SSE
----- Original Message ----- > From: "Arnold Schwaighofer" <aschwaighofer at apple.com> > To: "Joshua Klontz" <josh.klontz at gmail.com> > Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu> > Sent: Friday, November 15, 2013 4:05:53 PM > Subject: Re: [LLVMdev] Limit loop vectorizer to SSE > > > Something like: > > index
2017 Mar 14
2
[Proposal][RFC] Epilog loop vectorization
On 03/14/2017 11:58 AM, Michael Kuperstein wrote: > I'm still not sure about this, for a few reasons: > > 1) I'd like to try to treat epilogue loops the same way regardless of > whether the main loop was vectorized by hand or automatically. So if > someone hand-wrote an avx-512 16-wide loop, with alias checks, and we > decide it's profitable to vectorize the
2013 Nov 15
2
[LLVMdev] Limit loop vectorizer to SSE
Yes, I was just about to send out: DL->getABITypeAlignment(ScalarDataTy); The question is: does "… ABI alignment for the target …" mean getPrefTypeAlignment or getABITypeAlignment? I would have thought the latter. On Nov 15, 2013, at 4:12 PM, Hal Finkel <hfinkel at anl.gov> wrote: > ----- Original Message ----- >> From: "Arnold Schwaighofer"
2016 Aug 21
2
LoopVectorize module - some possible enhancements
Hello, Michael, I'd like to ask if we can enhance the LoopVectorize LLVM module (I am currently using a version from Jul 2016). More exactly: - do you envision to support in the near future LLVM IR gather and scatter intrinsics (as described at http://llvm.org/docs/LangRef.html#llvm-masked-gather-intrinsics and scatter)? I see you have defined some methods that should
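For reference, the access pattern those intrinsics target is an indexed (non-contiguous) load or store. A tiny illustrative example (names are placeholders) of a loop whose read of B could, in principle, be lowered to llvm.masked.gather on targets that support it:

    // Gather-style access: B is read through an index array, so consecutive
    // iterations touch non-consecutive addresses.
    void gather_like(float *A, const float *B, const int *Idx, int N) {
      for (int i = 0; i < N; ++i)
        A[i] = B[Idx[i]];
    }
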
2017 Mar 14
2
[Proposal][RFC] Epilog loop vectorization
On 03/14/2017 11:21 AM, Adam Nemet wrote: > >> On Mar 14, 2017, at 6:00 AM, Nema, Ashutosh <Ashutosh.Nema at amd.com >> <mailto:Ashutosh.Nema at amd.com>> wrote: >> >> Summarizing the discussion on the implementation approaches. >> Discussed about two approaches, first running 'InnerLoopVectorizer' >> again on the epilog loop immediately after
2017 Feb 28
3
[Proposal][RFC] Epilog loop vectorization
I have tried running both gvn and newgvn, but it did not help in hoisting the alias checks; please check, maybe I have missed something. <TestCase> void foo (char *A, char *B, char *C, int len) { int i = 0; for (i=0 ; i< len; i++) A[i] = B[i] + C[i]; } <Command> $ opt -O3 -gvn test.ll -o test.opt.ll $ opt -O3 -newgvn test.ll -o test.opt.ll "test.ll" is attached, it
2017 Mar 14
4
[Proposal][RFC] Epilog loop vectorization
On 03/14/2017 12:11 PM, Adam Nemet wrote: > >> On Mar 14, 2017, at 9:49 AM, Hal Finkel <hfinkel at anl.gov >> <mailto:hfinkel at anl.gov>> wrote: >> >> >> On 03/14/2017 11:21 AM, Adam Nemet wrote: >>> >>>> On Mar 14, 2017, at 6:00 AM, Nema, Ashutosh <Ashutosh.Nema at amd.com >>>> <mailto:Ashutosh.Nema at
2016 Sep 19
2
RFC: New intrinsics masked.expandload and masked.compressstore
Hi all, AVX-512 ISA introduces new vector instructions VCOMPRESS and VEXPAND in order to allow vectorization of the following loops with two specific types of cross-iteration dependencies: Compress: for (int i=0; i<N; ++i) If (t[i]) *A++ = expr; Expand: for (i=0; i<N; ++i) If (t[i]) X[i] = *A++; else
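Restated as compilable code for clarity (the element types and the expr array are placeholders; the else branch of the expand example is cut off in the snippet above):

    // Compress: conditionally append elements to a densely packed output.
    void compress(const char *t, int *A, const int *expr, int N) {
      for (int i = 0; i < N; ++i)
        if (t[i])
          *A++ = expr[i];
    }

    // Expand: conditionally consume elements from a densely packed input.
    void expand(const char *t, int *X, const int *A, int N) {
      for (int i = 0; i < N; ++i)
        if (t[i])
          X[i] = *A++;
    }
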
2016 Sep 26
2
RFC: New intrinsics masked.expandload and masked.compressstore
How would this work in this case? The result would need to affect the legality and cost of the memory instruction. From your poster, it looks like we're talking about loops with constructs like this: for (i = 0; i < N; i++) { if (topVal > b[i]) { *dst = a[i]; dst++; } } Is this loop vectorizable at all without these constructs? Good
2016 Jun 04
4
[LLVMdev] LLVM loop vectorizer
Hi Alex, I think the changes you want are actually not vectorizer related. Vectorizer just uses data provided by other passes. What you probably might want is to look into routine Loop::getStartLoc() (see lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are welcome:) Thanks, Michael > On Jun 3, 2016, at 6:13 PM, Alex Susu <alex.e.susu at gmail.com> wrote: >
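For anyone following that pointer, a minimal sketch of what querying the start location looks like from a pass, assuming a Loop* obtained from LoopInfo (printLoopStartLoc is a made-up helper name):

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/IR/DebugLoc.h"
    #include "llvm/Support/raw_ostream.h"
    using namespace llvm;

    // Print the source location LoopInfo associates with a loop, if any.
    static void printLoopStartLoc(Loop *L) {
      DebugLoc Loc = L->getStartLoc();
      if (Loc)
        Loc.print(errs());            // e.g. "file.c:12:3"
      else
        errs() << "<no debug location>";
      errs() << "\n";
    }
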
2013 Nov 15
0
[LLVMdev] Limit loop vectorizer to SSE
Nadav, I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to vectorization. I can't tell if it's the loop vectorizer or the codegen at fault, but the alignment assumption seems to sneak in somewhere. v/r, Josh [1]
2020 Feb 24
4
ORC JIT Weekly #6 -- General initializer support and JITLink optimizations
Hi All, The general initializer support patch has landed (see 85fb997659b plus follow up fixes). Some quick background: Until now ORC, like MCJIT, has handled static initializer discovery by searching for llvm.global_ctors and llvm.global_dtors arrays in the IR added to the JIT. This approach suffers from several drawbacks: 1) It provides no built-in support for other program representations:
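For background on the arrays mentioned above: a file-scope C++ object with a non-trivial constructor is the classic way such an entry gets created; the front end emits an initializer function and lists it in llvm.global_ctors, which is what the old discovery scheme scanned for. A small illustrative example:

    #include <cstdio>

    struct Greeter {
      // The call to this constructor ends up in an initializer function that
      // the front end registers in the module's llvm.global_ctors array.
      Greeter() { std::puts("constructed before main"); }
    };

    static Greeter TheGreeter; // static storage duration triggers registration

    int main() { return 0; }
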
2016 Aug 01
2
LLVM Loop vectorizer - 2 vector.body blocks appear
Hello. Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the beginning of July 2016) I ran the following piece of C code: void foo(long *A, long *B, long *C, long N) { for (long i = 0; i < N; ++i) { C[i] = A[i] + B[i]; } } The vectorized LLVM program I obtain contains 2 vector.body blocks - one named
2013 Oct 14
3
[LLVMdev] A weird, reproducable problem with MCJIT
I switched my Common Lisp compiler to use MCJIT on the weekend and ran into a weird problem compiling one particular function. It crashes with an EXC_BAD_ACCESS error in MCJIT::finalizeObject when calling processFDE. The weird part is that the function does not appear to do anything special and I've whittled it down to the minimum size that still causes the crash. If I remove even one
2013 Oct 14
0
[LLVMdev] A weird, reproducable problem with MCJIT
Hi Christian, Thanks for sharing this. Yaron Keren has been investigating some problems in the EH frame registration code recently, and I think this may be related. It at least sounds similar to the type of variations in behavior based on code size that Yaron was seeing. -Andy -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of
2016 Nov 30
5
[RFC] Enable "#pragma omp declare simd" in the LoopVectorizer
Dear all, I have just created a couple of differential reviews to enable the vectorisation of loops that have function calls to routines marked with “#pragma omp declare simd”. They can be (re)viewed here: * https://reviews.llvm.org/D27249 * https://reviews.llvm.org/D27250 The current implementation allows the loop vectorizer to generate vector code for source file as: #pragma omp declare
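The snippet is cut off right after the pragma; a typical source file of the kind described (function and variable names here are illustrative, not taken from the patches) looks like this, compiled with OpenMP support enabled (e.g. -fopenmp):

    // "#pragma omp declare simd" asks the compiler to emit vector variants of
    // bar in addition to the scalar one, so the loop vectorizer can call a
    // vector variant from the vectorized loop in foo.
    #pragma omp declare simd
    float bar(float x) { return x * x + 1.0f; }

    void foo(float *A, const float *B, int N) {
      for (int i = 0; i < N; ++i)
        A[i] = bar(B[i]); // candidate for a call to a vector variant of bar
    }
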