thr3ads.net - similar to: "[LLVMdev] parallel loop awareness to the LoopVectorizer"

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 28

On 01/28/2013 06:51 PM, Hal Finkel wrote:
> Is this sufficient to implement #pragma ivdep in clang?

I'm not completely sure of this: "Note: The proven dependencies that prevent vectorization are not ignored, only assumed dependencies are ignored."

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 28

----- Original Message -----
> From: "Nadav Rotem" <nrotem at apple.com>
> To: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Monday, January 28, 2013 10:45:36 AM
> Subject: Re: [LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer
>
> Hi Pekka,

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 28

On 01/28/2013 09:23 PM, Redmond, Paul wrote:
> If ivdep are the semantics you're going for I'd use that.

Fine, except I prefer not to include 'v' in it. Vectorization is merely one way to parallelize the loop. How does llvm.loop.ignore_assumed_deps sound?

--
--Pekka

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 28

It sounds like a good idea to move the method into Loop. Is there a naming scheme for metadata? I think llvm.loop.* would be helpful for loop-specific metadata. As for parallel, I think it is a little too generic. If ivdep are the semantics you're going for I'd use that.

paul

On 2013-01-28, at 12:03 PM, Pekka Jääskeläinen wrote:
> On 01/28/2013 06:45 PM, Nadav Rotem wrote:
>>

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 28

On 01/28/2013 07:36 PM, Pekka Jääskeläinen wrote:
> If there's a "yes" from the analyzer it still prevents the vectorization.
> So, sort of a softened programmer-friendlier version of the semantics.

That said, I cannot think of a case where it would *harm* if the dependency analyzer, if it can actually prove a dependency, serializes the code. Thus, the same metadata can be

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 28

On 01/28/2013 06:45 PM, Nadav Rotem wrote:
> I am okay with this patch, assuming that you follow the review of Tobias
> and Renato and provide a separate patch for the min-iter-count and a few
> test cases.

OK. Any opinions on the location of the isParallelLoop() check? Shall I put it in Loop so it is more widely accessible? I.e. Loop->isParallel().

--
Pekka

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 28

Hi Pekka,

I am okay with this patch, assuming that you follow the review of Tobias and Renato and provide a separate patch for the min-iter-count and a few test cases. I think that it would be a good idea to start a new thread and to discuss the best way to annotate loops in LLVM.

Thanks,
Nadav

On Jan 28, 2013, at 5:49 AM, Pekka Jääskeläinen <pekka.jaaskelainen at tut.fi> wrote:
>

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 28

Hi Renato,

On 01/28/2013 03:22 PM, Renato Golin wrote:
> This seems an awfully specific check on a generic part of the code... If

True. Perhaps the check is better encapsulated, e.g., in the Loop class? Or, if there's such a thing as a loop-carried data dependency analyzer, the correct place could be there, as a trivial "no deps" analysis.

> this metadata standard in any

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 28

On 28 January 2013 11:58, Pekka Jääskeläinen <pekka.jaaskelainen at tut.fi> wrote:
> Attached is a patch which uses a simple "parallel_loop" metadata attached
> to the loop branch instruction in the loop latch for skipping cross-iteration
> memory dependency checking in the LoopVectorizer. This was briefly discussed
> in the email thread "LoopVectorizer

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 29

Hi Tobias,

On 01/29/2013 10:51 AM, Tobias Grosser wrote:
> Is the meta data now still valid or how do we ensure the invalid meta data is
> removed?

It seems it's not valid anymore. Good catch. I was asking for these transformation cases earlier. Probably there are more not thought of yet.

> I have the feeling it may be necessary to link the loop as well as the accesses
> for

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 29

On 01/28/2013 12:58 PM, Pekka Jääskeläinen wrote:
> Hi,
>
> Attached is a patch which uses a simple "parallel_loop" metadata attached
> to the loop branch instruction in the loop latch for skipping cross-iteration
> memory dependency checking in the LoopVectorizer. This was briefly discussed
> in the email thread "LoopVectorizer in OpenCL C work group

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 28

Hi,

Attached is a patch which uses a simple "parallel_loop" metadata attached to the loop branch instruction in the loop latch for skipping cross-iteration memory dependency checking in the LoopVectorizer. This was briefly discussed in the email thread "LoopVectorizer in OpenCL C work group autovectorization". It also converts the "min iteration count to vectorize"

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 30

On 01/29/2013 07:58 PM, Nadav Rotem wrote:
>
> On Jan 29, 2013, at 12:51 AM, Tobias Grosser <tobias at grosser.es> wrote:
>
>>
>> # ignore assumed dependences.
>> for (i = 0; i < 4; i++) {
>>   tmp1 = A[3i+1];
>>   tmp2 = A[3i+2];
>>   tmp3 = tmp1 + tmp2;
>>   A[3i] = tmp3;
>> }
>>
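The loop quoted in this message can be checked for cross-iteration memory dependences by brute force. The following is only a minimal Python sketch of that idea, not LLVM code: the helper names `run_loop`, `accesses`, and `has_cross_iteration_conflict` are hypothetical, and the array contents are made up for illustration. Each iteration i reads A[3i+1] and A[3i+2] and writes A[3i], so under these assumptions no iteration should touch an element that another iteration writes.

```python
from itertools import permutations


def run_loop(a, order):
    """Execute the quoted loop body over the iterations in the given order."""
    a = list(a)  # work on a copy so the caller's data is untouched
    for i in order:
        tmp1 = a[3 * i + 1]
        tmp2 = a[3 * i + 2]
        a[3 * i] = tmp1 + tmp2
    return a


def accesses(i):
    """Return (reads, writes) as sets of array indices touched by iteration i."""
    return {3 * i + 1, 3 * i + 2}, {3 * i}


def has_cross_iteration_conflict(n):
    """True if two distinct iterations touch the same element and one of them writes it."""
    for i in range(n):
        reads_i, writes_i = accesses(i)
        for j in range(i + 1, n):
            reads_j, writes_j = accesses(j)
            if writes_i & (reads_j | writes_j) or writes_j & reads_i:
                return True
    return False


N = 4  # trip count of the quoted loop
A = [float(k) for k in range(3 * N)]  # illustrative data only

sequential = run_loop(A, range(N))
# If no dependence is carried across iterations, every execution order agrees.
order_independent = all(
    run_loop(A, p) == sequential for p in permutations(range(N))
)
conflict = has_cross_iteration_conflict(N)
```

A check like this only covers one concrete trip count and access pattern; it says nothing about what happens after a later transformation changes the memory accesses, which is the concern raised in the thread.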

[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

2013 Jan 29

On Jan 29, 2013, at 12:51 AM, Tobias Grosser <tobias at grosser.es> wrote:
>
> # ignore assumed dependences.
> for (i = 0; i < 4; i++) {
>   tmp1 = A[3i+1];
>   tmp2 = A[3i+2];
>   tmp3 = tmp1 + tmp2;
>   A[3i] = tmp3;
> }
>
> Now I apply for whatever reason a partial reg2mem transformation.
>
> float tmp3[1];
>
> # ignore assumed

[LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM

2019 Aug 15

The ivdep pragma is designed to do exactly what the name states - ignore vector dependencies. Cray Research first implemented this in 1978 in their CFT compiler, and has supported it since. This pragma is typically used by application developers who want vectorized code when the compiler cannot automatically determine safety; it is not equivalent to the OpenMP SIMD pragma in that the compiler is

[LLVMdev] inefficient code generation for 128-bit->256-bit typecast intrinsics

2013 Apr 09

Hello,

LLVM generates two additional instructions for 128->256 bit typecasts (e.g. _mm256_castsi128_si256()) to clear out the upper 128 bits of the YMM register corresponding to the source XMM register.

    vxorps xmm2,xmm2,xmm2
    vinsertf128 ymm0,ymm2,xmm0,0x0

Most of the industry-standard C/C++ compilers (GCC, Intel's compiler, Visual Studio compiler) don't generate any extra moves

[LLVMdev] Predicated Vector Operations

2013 May 09

> I'm not sure I understand the full impact of this example, and I would like to.
>
> What are the desired memory model semantics for a masked store? Specifically, let me suppose a simplified vector model of <2 x i64> on an i64-word-size platform.
>

Hi Chandler,

I brought the example in this email thread to show that the optimizations that we currently have won't

[RFC] Enable "#pragma omp declare simd" in the LoopVectorizer

2016 Dec 12

On 12 December 2016 at 16:49, Francesco Petrogalli <Francesco.Petrogalli at arm.com> wrote:
> I am not sure I understand here. In my patch, all I am doing is “vector
> symbol awareness generation”. There are no globals that are generated in
> the final object file, it is just the TargetLibraryInfoImpl that is being
> populated with the info needed by the vectorizer.

The

[RFC] Enable "#pragma omp declare simd" in the LoopVectorizer

2016 Dec 12

Xinmin,

Allow me to share a couple of comments about what Renato is saying.

On 08/12/2016 22:08, "Renato Golin" <renato.golin at linaro.org> wrote:
>I'm still unsure how the simplistic mangling we have today will work
>around the multiple versions we could have with NEON (and in the
>future, SVE) without polluting the mangling quite a lot (have you seen

[LLVMdev] LLVM Loop Vectorizer puzzle

2013 May 23

On Thu, May 23, 2013 at 12:02 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
> On May 23, 2013, at 8:52 AM, "Redmond, Paul" <paul.redmond at intel.com>
> wrote:
>
>
> !0 = metadata !{ metadata !1, metadata !2 }
> !1 = metadata !{ metadata !"llvm.loop.parallel" }
> !2 = metadata !{ metadata !"llvm.vectorization.vector_width", i32 8