similar to: [LLVMdev] Loop-specific optimizations

Displaying 20 results from an estimated 6000 matches similar to: "[LLVMdev] Loop-specific optimizations"

2013 Apr 03 · 0 replies · [LLVMdev] Loop-specific optimizations
Hi Tim, we at Saarland University are working on something similar to what you are describing. In principle, we extend Clang with an attribute that allows specifying which transformation phases should be run on the annotated construct (currently functions, compound statements, or loops) and in what order. Will you be at the LLVM Euro Conference? We will have a lightning talk and poster on the …
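Upstream Clang already ships a narrower cousin of such annotations: the #pragma clang loop hints, which request specific transformations on the loop that follows. A minimal sketch of that stock mechanism (not the Saarland attribute, which additionally orders whole phases):

    // Stock Clang loop pragmas: per-loop transformation hints.
    void scale(float *a, float s, int n) {
    #pragma clang loop vectorize(enable) interleave_count(2) unroll_count(4)
      for (int i = 0; i < n; ++i)
        a[i] *= s;
    }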
2013 Apr 05 · 2 replies · [LLVMdev] Loop-specific optimizations
Hi Ralf,
> we at Saarland University are working on something similar to what you are describing. In principle, we extend Clang with an attribute that allows specifying which transformation phases should be run on the annotated construct (currently functions, compound statements, or loops) and in what order.
That definitely sounds interesting. Do you add these attributes to …
2013 Apr 06 · 0 replies · [LLVMdev] Loop-specific optimizations
Hi Tim,
On 05.04.2013 11:48, Tim Besard wrote:
>> we at Saarland University are working on something similar to what you are describing. In principle, we extend Clang with an attribute that allows specifying which transformation phases should be run on the annotated construct (currently functions, compound statements, or loops) and in what order.
> That …
2013 Apr 06 · 1 reply · [LLVMdev] Loop-specific optimizations
Hi Ralf,
> I don't think that the lightning talks will be videotaped since they are only 5 minutes long - but I may be wrong. There are also no proceedings, and we don't have anything ready except some examples.
The lightning talks will be videotaped. We would also like to put any slides on the conference web-page (as well as integrating them into the video). If people …
2013 Apr 13 · 2 replies · [LLVMdev] Using llvm Metadata inside llc
The project I am working on is to use the LLVM toolchain for embedded CGRA processors. This, however, poses some restrictions on block formation, because modulo scheduling is applied in a later stage. For this reason the idea was to create custom pragmas that generate metadata and attach it to the branches of loops we wanted to map onto a CGRA module. It is quite similar to the loop parallel …
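Attaching custom markers to loop branches usually means adding an entry under the branch's !llvm.loop metadata and reading it back in a late pass. Below is a rough C++ sketch against the stock LLVM API; the "cgra.map" string is a hypothetical marker name, not something upstream defines:

    #include "llvm/IR/Function.h"
    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/Metadata.h"
    using namespace llvm;

    // Return true if some branch in F carries the (hypothetical)
    // "cgra.map" marker inside its !llvm.loop metadata.
    static bool hasCGRAMapMarker(Function &F) {
      for (BasicBlock &BB : F)
        for (Instruction &I : BB) {
          auto *BI = dyn_cast<BranchInst>(&I);
          if (!BI)
            continue;
          MDNode *LoopID = BI->getMetadata("llvm.loop");
          if (!LoopID)
            continue;
          // Operand 0 is the self-reference; the rest are hint nodes.
          for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i)
            if (auto *Hint = dyn_cast<MDNode>(LoopID->getOperand(i)))
              if (Hint->getNumOperands() > 0)
                if (auto *S = dyn_cast<MDString>(Hint->getOperand(0)))
                  if (S->getString() == "cgra.map")
                    return true;
        }
      return false;
    }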
2017 Mar 08 · 5 replies · (no subject)
Subject: Re: [llvm-dev] [RFC][PIR] Parallel LLVM IR -- Stage 0 -- IR extension
In-Reply-To: <20170224221713.GA931 at arch-linux-jd.home>
Ping. PS. Are there actually people interested in this? We will continue working anyway, but it might not make sense to put it on reviews and announce it on the ML if nobody cares. On 02/24, …
2012 Oct 31 · 3 replies · [LLVMdev] : Predication on SIMD architectures and LLVM
Hi all, I am working on a CGRA backend (something like a 2D VLIW), and we also absolutely need predication. I extended the IfConversion pass to allow it to be executed multiple times and to predicate already-predicated code, which is necessary to predicate code with nested conditional statements. At this point, we support OR, AND, and conditional predicates (see Scott Mahlke's papers on this …
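For readers new to predication: the effect of if-converting nested conditionals can be mimicked in scalar code, where the inner predicate is the AND of its condition with the enclosing predicate, and every assignment becomes a select guarded by its predicate. A hand-written C++ illustration of that idea (not the IfConversion pass output):

    // Before:              After if-conversion (predicated form):
    //   if (c1) {            p1 = c1;
    //     x = a;             p2 = c1 && c2;  // nested predicate = AND
    //     if (c2) y = b;     x = p1 ? a : x; // guarded assignment
    //   }                    y = p2 ? b : y;
    int x = 0, y = 0;
    void step(bool c1, bool c2, int a, int b) {
      bool p1 = c1;
      bool p2 = c1 && c2;   // predicate of the nested branch
      x = p1 ? a : x;       // executes "for real" only when p1 holds
      y = p2 ? b : y;
    }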
2017 Mar 08 · 4 replies · (no subject)
".... the problem Mehdi pointed out regarding the missed initializations of array elements, did you comment on that one yet?" What is the initializations of array elements question? I don't remember this question. Please refresh my memory. Thanks. I thought Mehdi's question is more about what are attributes needed for these IR-annotation for other LLVM pass to understand and
2017 Mar 08 · 3 replies · (no subject)
A quick update: we have been looking through all LLVM passes to identify the impact of IR-region annotations and their interaction with the rest of LoopOpt and ScalarOpt, e.g. the interaction with vectorization when you have schedule(simd:guided, 64), and which common properties the optimizer needs to know about IR-region annotations. We have our implementation working at O0, O1, O2, and O3.
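For context, schedule(simd:guided, 64) uses OpenMP 4.5's simd schedule modifier, which rounds each guided chunk up to a multiple of the SIMD width so chunks stay vectorizable. A minimal C++ example of the construct under discussion (the loop body is invented):

    // OpenMP 4.5: the simd modifier keeps guided chunk sizes a multiple
    // of the simd width, so each chunk remains fully vectorizable.
    void saxpy(float *y, const float *x, float a, int n) {
    #pragma omp parallel for simd schedule(simd:guided, 64)
      for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
    }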
2017 Mar 08 · 3 replies · (no subject)
> On Mar 8, 2017, at 10:55 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> On Mar 8, 2017, at 5:36 AM, Johannes Doerfert <doerfert at cs.uni-saarland.de> wrote:
>> Subject: Re: [llvm-dev] [RFC][PIR] Parallel LLVM IR -- Stage 0 -- IR extension …
2018 Jun 21 · 2 replies · NVPTX - Reordering load instructions
Hi all, I'm looking into the performance difference of a benchmark compiled with NVCC vs NVPTX (coming from Julia, not CUDA C) and I'm seeing a significant difference due to PTX instruction ordering. The relevant source code consists of two nested loops that get fully unrolled, doing some basic arithmetic with values loaded from shared memory:
> #define BLOCK_SIZE 16
> …
2017 Jan 28 · 3 replies · [RFC][PIR] Parallel LLVM IR -- Stage 0 -- IR extension
Dear all, This RFC proposes three new LLVM IR instructions to express high-level parallel constructs in a simple, low-level fashion. For this first stage we prepared two commits that add the proposed instructions and a pass to lower them to obtain sequential IR. Both patches have been uploaded for review [1, 2]. The latter patch is very simple, and the former consists of almost only mechanical …
2017 Mar 08 · 3 replies · [RFC][PIR] Parallel LLVM IR -- Stage 0 --
I assume the referring case is something like below, right?
#pragma omp parallel num_threads(n)
{
  #pragma omp critical
  { x = x + 1; }
}
If that is the case, the programmer is already writing code that is not "serial equivalent". Our representation for the parallelizer is
%t = @llvm.region.entry()["omp.parallel"(), …
2017 Mar 08 · 2 replies · (no subject)
On 03/08/2017 12:44 PM, Johannes Doerfert wrote:
> I don't know who pointed it out first, but Mehdi made me aware of it at CGO. I'll try to explain it briefly.
>
> Given the following situation (in pseudo code):
>
>   alloc A[100];
>   parallel_for(i = 0; i < 100; i++)
>     A[i] = f(i);
>
>   acc = 1;
>   for(i = 0; i < 100; i++)
>     acc = acc * …
2017 Mar 08 · 2 replies · (no subject)
The IR-region annotation we proposed is as below; there is no @llvm.parallel.for.iterator() and no change to the loop CFG.
alloc A[100];
%t = call token @llvm.region.entry()["parallel.for"()]
for(i = 0; i < 100; i++) { a[i] = f(i); }
@llvm.region.exit(%t)() ["end.parallel.for"()]
Xinmin
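A pass consuming this representation could locate such regions through the stock operand-bundle API. A rough C++ sketch, keeping in mind that @llvm.region.entry/@llvm.region.exit are the proposed intrinsics, not upstream LLVM:

    #include "llvm/ADT/SmallVector.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/InstrTypes.h"
    using namespace llvm;

    // Collect calls carrying a "parallel.for" operand bundle, i.e. the
    // proposed @llvm.region.entry markers of the IR-region annotation scheme.
    static SmallVector<CallBase *, 4> findParallelForEntries(Function &F) {
      SmallVector<CallBase *, 4> Entries;
      for (BasicBlock &BB : F)
        for (Instruction &I : BB)
          if (auto *CB = dyn_cast<CallBase>(&I))
            if (CB->getOperandBundle("parallel.for"))
              Entries.push_back(CB); // its token result links entry to exit
      return Entries;
    }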
2017 Mar 08 · 2 replies · [RFC][PIR] Parallel LLVM IR -- Stage 0 --
> On Mar 8, 2017, at 11:50 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> On 03/08/2017 01:24 PM, Tian, Xinmin wrote:
>> I assume the referring case is something like below, right?
>>
>> #pragma omp parallel num_threads(n)
>> {
>>   #pragma omp critical
>>   { x = x + 1; }
>> }
2018 Jun 21 · 2 replies · NVPTX - Reordering load instructions
We already have a pass that vectorizes loads and stores in NVPTX and AMDGPU. I'm not at my laptop and forget the exact filename, but it's called the load-store vectorizer. I think the question is, why is the LSV not vectorizing this code? I think the answer is, LLVM can't tell that the loads are aligned. Ptxas can, but only because it's (apparently) doing vectorization *after* it resolves the …
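One source-level way to hand LLVM the missing alignment fact is __builtin_assume_aligned, after which the load-store vectorizer may legally merge adjacent loads. A minimal C++ sketch (the kernel shape is invented; only the builtin is the point):

    // Promise the compiler that 'shared' is 16-byte aligned, so four
    // consecutive float loads may be merged into one vector load.
    float sum4(const float *shared, int base) {
      const float *p =
          static_cast<const float *>(__builtin_assume_aligned(shared, 16));
      return p[base] + p[base + 1] + p[base + 2] + p[base + 3];
    }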
2011 Jun 10 · 3 replies · Test if data uniformly distributed (newbie)
Hello, I have a bunch of files, each containing 300 data points with values from 0 to 1 which also sum to 1 (I don't think the last element is relevant, though). In addition, each data point is annotated as an "a" or a "b". I would like to know in which files (if any) the data is uniformly distributed. I used Google and found out that a Kolmogorov-Smirnov or a Chi-square …
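For reference, the one-sample Kolmogorov-Smirnov statistic against Uniform(0,1) is easy to compute directly. A C++ sketch, using the usual large-sample 5% critical value of about 1.36/sqrt(n):

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // One-sample KS test of x (values in [0,1]) against Uniform(0,1):
    // D is the largest gap between the empirical CDF and F(t) = t.
    bool looksUniform(std::vector<double> x) {
      std::sort(x.begin(), x.end());
      const double n = static_cast<double>(x.size());
      double D = 0.0;
      for (size_t i = 0; i < x.size(); ++i) {
        D = std::max(D, (i + 1) / n - x[i]); // ECDF step above the CDF
        D = std::max(D, x[i] - i / n);       // CDF above the ECDF step
      }
      return D < 1.36 / std::sqrt(n); // ~5% critical value for large n
    }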
2012 Sep 13 · 5 replies · [LLVMdev] [OT] Control Flow Graph(CFG) into Abstract Syntax Tree(AST)
Hi, I know most compilers go from AST to CFG. I am writing a decompiler, so I was wondering if anyone knew of any documents describing how best to get from CFG to AST. The decompiler project is open source. https://github.com/jcdutton/libbeauty The decompiler already contains a disassembler and a virtual machine resulting in an annotated CFG. It uses information gained from using a virtual …
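One common starting point for CFG-to-AST recovery is structural analysis: repeatedly pattern-match region shapes such as the if-then-else diamond and collapse each match into a single AST node. A toy C++ sketch of just the match step, with invented node types (not libbeauty's):

    // Toy CFG node: up to two successors (hypothetical, for illustration).
    struct CFGNode {
      const CFGNode *succ[2] = {nullptr, nullptr};
      bool isConditional() const { return succ[0] && succ[1]; }
    };

    // Match the diamond "if (c) A else B; join": both successors are
    // straight-line nodes that fall through to the same join block.
    // On a match, {n, A, B} would collapse into one AST if-else node.
    bool matchIfThenElse(const CFGNode *n, const CFGNode *&join) {
      if (!n->isConditional())
        return false;
      const CFGNode *a = n->succ[0], *b = n->succ[1];
      if (a->isConditional() || b->isConditional())
        return false;
      if (a->succ[0] && a->succ[0] == b->succ[0]) {
        join = a->succ[0];
        return true;
      }
      return false;
    }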
2012 Nov 01 · 0 replies · [LLVMdev] : Predication on SIMD architectures and LLVM
On Wed, Oct 31, 2012 at 09:13:43PM +0100, Bjorn De Sutter wrote:
> Hi all,
>
> I am working on a CGRA backend (something like a 2D VLIW), and we also absolutely need predication. I extended the IfConversion pass to allow it to be executed multiple times and to predicate already predicated code. This is necessary to predicate code with nested conditional statements. At this point, we …