search for: parallel_for

Displaying 19 results from an estimated 19 matches for "parallel_for".

2013 Jan 25
4
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...rious > to see how what you come up with might apply to other use cases as well. OK, attached is the first quick attempt towards this. I'm not proposing committing this, but would like to get comments to possibly move towards something committable. It simply looks for a metadata named 'parallel_for' in any of the instructions in the loop's header and assumes the loop is a parallel one if such is found. This metadata is added by the pocl's wiloops generation routine. It passes the pocl test suite when enabled but probably cannot vectorize many kernels (at least) due to the missing...
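As a rough illustration of the detection described in this excerpt, here is a minimal sketch assuming the LLVM C++ API; the metadata name 'parallel_for' is taken from the message, but the helper itself is hypothetical and not the attached patch:

// Minimal sketch, not the attached patch: returns true if any instruction in
// the loop header carries metadata named "parallel_for", the marker the
// message above says pocl's work-item-loop generation adds.
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

static bool isMarkedParallel(const Loop *L) {
  const BasicBlock *Header = L->getHeader();
  for (const Instruction &I : *Header)
    if (I.getMetadata("parallel_for")) // marker found on some instruction
      return true;                     // => assume the loop is parallel
  return false;
}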
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Pekka, I am in favor of adding metadata to control different aspects of vectorization, mainly for supporting user-level pragmas [1] but also for DSLs. Before we start adding metadata to the IR we need to define the semantics of the tags. "Parallel_for" is too general. We also want to control vectorization factor, unroll factor, cost model, etc. Doug Gregor suggested adding the metadata to the branch instruction of the latch block in the loop. My main concern is that your approach for vectorizing OpenCL is wrong. OpenCL was designed for...
2013 Jan 25
2
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
> I am in favor of adding metadata to control different aspects of > vectorization, mainly for supporting user-level pragmas [1] but also for > DSLs. Before we start adding metadata to the IR we need to define the > semantics of the tags. "Parallel_for" is too general. We also want to control > vectorization factor, unroll factor, cost model, etc. These are used to control *how* the loops are parallelized. The generic "parallel_for" lets the compiler try to make the actual parallelization decisions based on the target (aim f...
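To make the distinction concrete (a generic "this loop is parallel" marker versus hints that control *how* it is transformed), here is a hedged sketch of one possible encoding and how a pass could read it; the hint names and the operand layout are assumptions for illustration, not an interface that existed at the time of this thread:

// Sketch only: a loop-ID metadata node whose operands mix a bare marker
// ("parallel_for") with name/value tuning hints ("vectorize_width",
// "unroll_count"). All three strings and the layout are hypothetical.
#include "llvm/IR/Constants.h"
#include "llvm/IR/Metadata.h"
#include <optional>

using namespace llvm;

struct LoopHints {
  bool Parallel = false;
  std::optional<unsigned> VectorizeWidth;
  std::optional<unsigned> UnrollCount;
};

static LoopHints readHints(const MDNode *LoopID) {
  LoopHints H;
  // Operand 0 is conventionally a self-reference; the real hints follow it.
  for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i) {
    const MDNode *Hint = dyn_cast<MDNode>(LoopID->getOperand(i));
    if (!Hint || Hint->getNumOperands() == 0)
      continue;
    const MDString *Name = dyn_cast<MDString>(Hint->getOperand(0));
    if (!Name)
      continue;
    if (Name->getString() == "parallel_for") {
      H.Parallel = true;                               // the generic marker
    } else if (Hint->getNumOperands() > 1) {
      auto *CI = mdconst::dyn_extract<ConstantInt>(Hint->getOperand(1));
      if (!CI)
        continue;
      if (Name->getString() == "vectorize_width")      // "how" hints
        H.VectorizeWidth = static_cast<unsigned>(CI->getZExtValue());
      else if (Name->getString() == "unroll_count")
        H.UnrollCount = static_cast<unsigned>(CI->getZExtValue());
    }
  }
  return H;
}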
2013 Jan 29
0
[LLVMdev] [PATCH] parallel loop metadata
...n't have any objections. I think the only requirement is that the semantics are clearly defined. Personally I think this metadata should be used to guide the vectorizer only. I'm not sure how it will be used in the context of OpenMP or OpenCL. For OpenMP I assume you'd add this metadata to parallel_for loops. At what point do you insert the runtime calls? Does LLVM need to know how to target each runtime? paul
2013 Jan 29
5
[LLVMdev] [PATCH] parallel loop metadata
On 01/29/2013 08:22 PM, Dan Gohman wrote: > "Ignore assumed dependencies" is shaky semantics. I haven't seen anything > explicitly spell out which dependencies a compiler is guaranteed to detect. > How can users use such a directive safely in a loop which has dependencies? > I understand that this is what icc's documentation says, but I'm wondering > if
2013 Jan 29
1
[LLVMdev] [PATCH] parallel loop metadata
...nk the only requirement is that the > semantics are clearly defined. > > Personally I think this metadata should be used to guide the > vectorizer only. I'm not sure how it will be used in the context of > OpenMP or OpenCL. For OpenMP I assume you'd add this metadata to > parallel_for loops. At what point do you insert the runtime calls? > Does LLVM need to know how to target each runtime? I think that the general direction being pursued for OpenMP is to do frontend lowering (outlining). However, we'd still want iteration-independence metadata to attach to the outlined l...
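For readers unfamiliar with the outlining approach mentioned here, the sketch below shows the general shape in plain C++ rather than IR; rt_parallel_for is an invented stand-in for a runtime entry point (a real frontend would emit calls to its target runtime, e.g. the OpenMP runtime), and where exactly such calls get inserted is precisely the open question in the quoted exchange:

// Conceptual sketch of frontend outlining for a parallel_for loop.
// "rt_parallel_for" is a made-up runtime entry point used only for
// illustration; it is not an OpenMP (or any other) runtime API.
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical runtime: split [0, n) across hardware threads and invoke the
// outlined body on each chunk.
static void rt_parallel_for(std::size_t n,
                            void (*body)(std::size_t, std::size_t, void *),
                            void *ctx) {
  unsigned workers = std::max(1u, std::thread::hardware_concurrency());
  std::size_t chunk = (n + workers - 1) / workers;
  std::vector<std::thread> pool;
  for (unsigned w = 0; w < workers; ++w) {
    std::size_t begin = w * chunk, end = std::min(n, begin + chunk);
    if (begin >= end)
      break;
    pool.emplace_back(body, begin, end, ctx);
  }
  for (auto &t : pool)
    t.join();
}

// The loop body, outlined into its own function by the (hypothetical) frontend.
static void outlinedBody(std::size_t begin, std::size_t end, void *ctx) {
  float *A = static_cast<float *>(ctx);
  for (std::size_t i = begin; i < end; ++i)
    A[i] = static_cast<float>(i) * 2.0f; // stand-in for f(i)
}

int main() {
  std::vector<float> A(100);
  // Original (pseudo)code: parallel_for(i = 0; i < 100; i++) A[i] = f(i);
  rt_parallel_for(A.size(), outlinedBody, A.data());
}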
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
----- Original Message ----- > From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>, "Nadav Rotem" <nrotem at apple.com> > Sent: Friday, January 25, 2013 8:14:57 AM > Subject: Re: [LLVMdev] LoopVectorizer in OpenCL
2013 Jan 25
2
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
On 01/25/2013 04:00 PM, Hal Finkel wrote: > Based on this experience, can you propose some metadata that would allow > this to happen (so that the LoopVectorizer would be generally useful for > POCL)? I suspect this same metadata might be useful in other contexts (such > as implementing iteration-independence pragmas). I cannot yet. In this hack I simply changed LoopVectorizer to
2013 Jan 31
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...Pekka Jääskeläinen wrote: >> I am in favor of adding metadata to control different aspects of >> vectorization, mainly for supporting user-level pragmas [1] but also for >> DSLs. Before we start adding metadata to the IR we need to define the >> semantics of the tags. "Parallel_for" is too general. We also want to >> control >> vectorization factor, unroll factor, cost model, etc. > > These are used to control *how* the loops are parallelized. > The generic "parallel_for" lets the compiler try to make the actual > parallelization deci...
2013 Jan 28
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...ou come up with might apply to other use cases as well. > > OK, attached is the first quick attempt towards this. I'm not > proposing committing this, but would like to get comments > to possibly move towards something committable. > > It simply looks for a metadata named 'parallel_for' in any of the > instructions in the loop's header and assumes the loop is a parallel > one if such is found. Aren't all loops in OpenCL parallel? Or are you planning to inline non-OpenCL code into your OpenCL code before running the vectorizer? If not, just have the vectorizer...
2013 Jan 30
2
[LLVMdev] [PATCH] parallel loop metadata
...equirement is that the semantics are clearly defined. I think that's very important :-) | Personally I think this metadata should be used to guide the vectorizer only. I'm not sure how it will be used in the context of OpenMP or OpenCL. For OpenMP I assume you'd add this metadata to | parallel_for loops. At what point do you insert the runtime calls? Does LLVM need to know how to target each runtime? I think there are two use cases: where a human programmer has written an annotation (in some source language variant) where it represents a "best guess" which the compiler might dec...
2020 Apr 12
3
LLVM multithreading support
On Apr 12, 2020, at 2:23 PM, Eli Friedman <efriedma at quicinc.com> wrote: > > Yes, the llvm::Smart* family of locks still exist. But very few places are using them outside of MLIR; it’s more common to just use plain std::mutex. > > That said, I don’t think it’s really a good idea to use them, even if they were fixed to work as designed. It’s not composable: the boolean
2013 Jan 28
6
[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer
...ion". It also converts the "min iteration count to vectorize" to a parameter so this can be controlled from the command line. Comments welcomed. Thanks in advance, -- Pekka -------------- next part -------------- A non-text attachment was scrubbed... Name: llvm-3.3-loopvectorizer-parallel_for-metadata-detection.patch Type: text/x-patch Size: 2049 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130128/f580afe7/attachment.bin>
2015 Mar 11
9
[LLVMdev] On LLD performance
...x. On the other hand, on Windows (and IIRC on Darwin), we move all library files to the end of the input file list and group them together, so it's effective. It improves single-thread performance. r231454 <http://reviews.llvm.org/rL231454> is to apply relocations in parallel in the writer using parallel_for. Because the number of relocations is usually pretty large, and each application is independent, you basically get a linear performance gain by using threads. Previously it took about 2 seconds to apply 10M relocations (the number in the commit message is wrong) on my machine, but it's now 300 milli...
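A hedged sketch of the pattern described (each relocation application is independent, so the work can be split across threads without locking on the hot path); the Relocation record and the fixup are placeholders, not LLD's actual data structures or its parallel_for utility:

// Sketch only: apply independent relocations in parallel. The Relocation
// struct and applyOne() are placeholders; real linker relocations have more
// fields and relocation-type-specific fixups.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

struct Relocation {
  std::uint64_t offset; // where in the output buffer to patch
  std::uint64_t value;  // placeholder for the computed target value
};

static void applyOne(std::uint8_t *buf, const Relocation &r) {
  std::memcpy(buf + r.offset, &r.value, sizeof(r.value)); // placeholder fixup
}

static void applyAll(std::uint8_t *buf,
                     const std::vector<Relocation> &relocs) {
  unsigned workers = std::max(1u, std::thread::hardware_concurrency());
  std::size_t chunk = (relocs.size() + workers - 1) / workers;
  std::vector<std::thread> pool;
  for (unsigned w = 0; w < workers; ++w) {
    std::size_t begin = w * chunk;
    std::size_t end = std::min(relocs.size(), begin + chunk);
    if (begin >= end)
      break;
    // Each chunk touches only its own relocations, so no synchronization is
    // needed (assuming the patched regions do not overlap).
    pool.emplace_back([buf, &relocs, begin, end] {
      for (std::size_t i = begin; i < end; ++i)
        applyOne(buf, relocs[i]);
    });
  }
  for (auto &t : pool)
    t.join();
}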
2017 Mar 08
2
(no subject)
On 03/08/2017 12:44 PM, Johannes Doerfert wrote: > I don't know who pointed it out first but Mehdi made me aware of it at > CGO. I try to explain it shortly. > > Given the following situation (in pseudo code): > > alloc A[100]; > parallel_for(i = 0; i < 100; i++) > A[i] = f(i); > > acc = 1; > for(i = 0; i < 100; i++) > acc = acc * A[i]; > > Afaik, with your parallel regions there won't be a CFG loop for the > parallel initialization, right? Instead some intrinsics that annotate > the...
2017 Mar 08
2
(no subject)
..., Vikram Sadanand <vadve at illinois.edu>; TB Schardl <neboat at mit.edu>; acjacob at us.ibm.com Subject: Re: I don't know who pointed it out first but Mehdi made me aware of it at CGO. I try to explain it shortly. Given the following situation (in pseudo code): alloc A[100]; parallel_for(i = 0; i < 100; i++) A[i] = f(i); acc = 1; for(i = 0; i < 100; i++) acc = acc * A[i]; Afaik, with your parallel regions there won't be a CFG loop for the parallel initialization, right? Instead some intrinsics that annotate the parallel region and then one initialization. I...
2017 Mar 08
3
[RFC][PIR] Parallel LLVM IR -- Stage 0 --
...t;; TB Schardl <neboat at mit.edu>; > acjacob at us.ibm.com > Subject: Re: > > I don't know who pointed it out first but Mehdi made me aware of it at CGO. I try to explain it shortly. > > Given the following situation (in pseudo code): > > alloc A[100]; > parallel_for(i = 0; i < 100; i++) > A[i] = f(i); > > acc = 1; > for(i = 0; i < 100; i++) > acc = acc * A[i]; > > Afaik, with your parallel regions there won't be a CFG loop for the parallel initialization, right? Instead some intrinsics that annotate the parallel r...
2017 Mar 08
2
[RFC][PIR] Parallel LLVM IR -- Stage 0 --
...om >>> Subject: Re: >>> >>> I don't know who pointed it out first but Mehdi made me aware of it at CGO. I try to explain it shortly. >>> >>> Given the following situation (in pseudo code): >>> >>> alloc A[100]; >>> parallel_for(i = 0; i < 100; i++) >>> A[i] = f(i); >>> >>> acc = 1; >>> for(i = 0; i < 100; i++) >>> acc = acc * A[i]; >>> >>> Afaik, with your parallel regions there won't be a CFG loop for the parallel initialization, r...
2017 Mar 08
4
(no subject)
".... the problem Mehdi pointed out regarding the missed initializations of array elements, did you comment on that one yet?" What is the initializations of array elements question? I don't remember this question. Please refresh my memory. Thanks. I thought Mehdi's question is more about what are attributes needed for these IR-annotation for other LLVM pass to understand and