similar to: [LLVMdev] Loop-specific optimizations

Displaying 20 results from an estimated 6000 matches similar to: "[LLVMdev] Loop-specific optimizations"

2013 Apr 03
0
[LLVMdev] Loop-specific optimizations
Hi Tim, we at Saarland University are working on something similar to what you are describing. In principle, we extend Clang with an attribute that lets you specify which transformation phases should be run on the annotated construct (currently functions, compound statements, or loops), and in what order. Will you be at the LLVM Euro Conference? We will have a lightning talk and poster on the
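The excerpt does not give the attribute's actual spelling, so the following C++ sketch only illustrates the idea; it reuses Clang's generic annotate attribute as the vehicle, and the "opt_phases:..." string and its interpretation are invented for the example.

    // Hypothetical sketch: run the listed phases, in order, on this
    // function only. The attribute payload is made up; only
    // __attribute__((annotate(...))) itself is real Clang syntax.
    __attribute__((annotate("opt_phases:loop-unroll,licm")))
    void kernel(float *a, const float *b, int n) {
      for (int i = 0; i < n; i++)
        a[i] += b[i];
    }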
2013 Apr 05
2
[LLVMdev] Loop-specific optimizations
Hi Ralf, > we at Saarland University are working on something similar to what you > are describing. In principle, we extend Clang with an attribute that > lets you specify which transformation phases should be run on the > annotated construct (currently functions, compound statements, or loops), > and in what order. That definitely sounds interesting. Do you add these attributes to
2013 Apr 06
0
[LLVMdev] Loop-specific optimizations
Hi Tim, On 05.04.2013 11:48, Tim Besard wrote: >> we at Saarland University are working on something similar to what you >> are describing. In principle, we extend Clang with an attribute that >> lets you specify which transformation phases should be run on the >> annotated construct (currently functions, compound statements, or loops), >> and in what order. > That
2013 Apr 06
1
[LLVMdev] Loop-specific optimizations
Hi Ralf, > I don't think that the lightning talks will be videotaped since they are only 5 > minutes long - but I may be wrong. There are also no proceedings, and we don't > have anything ready except some examples. The lightning talks will be videotaped. We would also like to put any slides on the conference web page and to integrate them into the video. If people
2013 Apr 13
2
[LLVMdev] Using llvm Metadata inside llc
The project I am working on uses the llvm toolchain for embedded CGRA processors. This, however, poses some restrictions on block formation, because modulo scheduling is applied in a later stage. For this reason the idea was to create custom pragmas to generate metadata and attach it to the branches of the loops we want to map onto a CGRA module. It is very similar to the loop parallell
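For comparison, Clang's existing loop pragmas already follow the pattern described here: the pragma below is lowered to !llvm.loop metadata attached to the loop's backedge branch, where later passes can read it. A custom CGRA pragma could plausibly reuse the same mechanism; the loop itself is a made-up example.

    // Real Clang syntax: the pragma becomes !llvm.loop metadata on the
    // branch instruction of this loop.
    void scale(float *a, int n) {
      #pragma clang loop vectorize(enable) unroll_count(4)
      for (int i = 0; i < n; i++)
        a[i] *= 2.0f;
    }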
2017 Mar 08
5
(no subject)
Re: [llvm-dev] [RFC][PIR] Parallel LLVM IR -- Stage 0 -- IR extension. Ping. PS. Are there actually people interested in this? We will continue working anyway, but it might not make sense to put it on reviews and announce it on the ML if nobody cares. On 02/24,
2012 Oct 31
3
[LLVMdev] : Predication on SIMD architectures and LLVM
Hi all, I am working on a CGRA backend (something like a 2D VLIW), and we also absolutely need predication. I extended the IfConversion pass so it can be run multiple times and can predicate already-predicated code. This is necessary to predicate code with nested conditional statements. At this point, we support OR, AND, and conditional predicates (see Scott Mahlke's papers on this
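A small C++ sketch of what predicating a nested conditional means: every statement gets a guarding predicate, nesting composes guards with AND, and the else path uses the negation. The shape is illustrative, not taken from the poster's backend.

    int nested(bool p1, bool p2, int x) {
      // Original control flow:
      //   if (p1) { if (p2) x += 1; else x -= 1; } else x = 0;
      x = (p1 && p2)  ? x + 1 : x;  // guard: p1 AND p2
      x = (p1 && !p2) ? x - 1 : x;  // guard: p1 AND (NOT p2)
      x = !p1         ? 0     : x;  // guard: NOT p1
      return x;
    }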
2017 Mar 08
4
(no subject)
".... the problem Mehdi pointed out regarding the missed initializations of array elements, did you comment on that one yet?" What is the initializations of array elements question? I don't remember this question. Please refresh my memory. Thanks. I thought Mehdi's question is more about what are attributes needed for these IR-annotation for other LLVM pass to understand and
2017 Mar 08
3
(no subject)
A quick update: we have been going through all LLVM passes to identify the impact of "IR-region annotation" and its interaction issues with the rest of LoopOpt and ScalarOpt, e.g. the interaction with vectorization when you have schedule(simd:guided, 64), and what common properties the optimizer needs to know about IR-region annotations. We have our implementation working at O0, O1, O2, and O3.
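The schedule clause mentioned above, written out as OpenMP source on a made-up loop. With the simd modifier (OpenMP 4.5), chunk sizes are kept to multiples of the simd width, which is exactly the kind of property a vectorizer consuming IR-region annotations has to be able to see.

    void axpy(float *y, const float *x, float a, int n) {
      #pragma omp parallel for simd schedule(simd:guided, 64)
      for (int i = 0; i < n; i++)
        y[i] += a * x[i];
    }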
2018 Jun 21
2
NVPTX - Reordering load instructions
Hi all, I'm looking into the performance difference of a benchmark compiled with NVCC vs NVPTX (coming from Julia, not CUDA C) and I'm seeing a significant difference due to PTX instruction ordering. The relevant source code consists of two nested loops that get fully unrolled, doing some basic arithmetic with values loaded from shared memory: > #define BLOCK_SIZE 16
2017 Mar 08
3
(no subject)
> On Mar 8, 2017, at 10:55 AM, Mehdi Amini <mehdi.amini at apple.com> wrote: >> On Mar 8, 2017, at 5:36 AM, Johannes Doerfert <doerfert at cs.uni-saarland.de> wrote: >> Subject: Re: [llvm-dev] [RFC][PIR] Parallel LLVM IR -- Stage 0 -- IR extension
2017 Jan 28
3
[RFC][PIR] Parallel LLVM IR -- Stage 0 -- IR extension
Dear all, This RFC proposes three new LLVM IR instructions to express high-level parallel constructs in a simple, low-level fashion. For this first stage we prepared two commits that add the proposed instructions and a pass to lower them to obtain sequential IR. Both patches have been uploaded for review [1, 2]. The latter patch is very simple and the former consists of almost only mechanical
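The RFC pairs the new instructions with a pass that lowers them back to sequential IR. The source-level intuition behind that lowering, as a sketch only (parallel_for here is pseudo code standing in for the proposed instructions, not a real construct):

    // parallel_for (i = 0; i < n; i++) body(i);   // parallel form
    // lowers to the ordinary sequential loop:
    void lowered(int n, void (*body)(int)) {
      for (int i = 0; i < n; i++)
        body(i);
    }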
2017 Mar 08
3
[RFC][PIR] Parallel LLVM IR -- Stage 0 --
I assume the referring case is something like below, right?

#pragma omp parallel num_threads(n)
{
  #pragma omp critical
  { x = x + 1; }
}

If that is the case, the programmer is already writing code that is not "serial equivalent". Our representation for the parallelizer is %t = @llvm.region.entry()["omp.parallel"(),
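A self-contained C++/OpenMP version of the quoted example. With num_threads(n) the critical section executes once per thread, so x is incremented n times, while the same body run sequentially increments it once; that is the sense in which the program is not "serial equivalent" to begin with.

    #include <cstdio>

    int main() {
      const int n = 4;
      int x = 0;
      #pragma omp parallel num_threads(n)
      {
        #pragma omp critical
        { x = x + 1; }
      }
      std::printf("x = %d\n", x); // 4 when built with OpenMP, 1 without
      return 0;
    }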
2017 Mar 08
2
(no subject)
On 03/08/2017 12:44 PM, Johannes Doerfert wrote:
> I don't know who pointed it out first, but Mehdi made me aware of it at CGO. I'll try to explain it briefly.
>
> Given the following situation (in pseudo code):
>
>   alloc A[100];
>   parallel_for(i = 0; i < 100; i++)
>     A[i] = f(i);
>
>   acc = 1;
>   for(i = 0; i < 100; i++)
>     acc = acc *
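The quoted pseudo code, made concrete in C++ (the truncated reduction is completed as acc * A[i], which the context implies; f is a stand-in). The issue under discussion is whether passes that see only the serial-equivalent IR can still reason correctly about the initialization of A performed inside the parallel region.

    #include <vector>

    int f(int i) { return i + 1; }

    int example() {
      std::vector<int> A(100);
      for (int i = 0; i < 100; i++)   // parallel_for in the original
        A[i] = f(i);

      int acc = 1;
      for (int i = 0; i < 100; i++)
        acc = acc * A[i];
      return acc;
    }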
2017 Mar 08
2
(no subject)
The IR-region annotation we proposed is as below; there is no @llvm.parallel.for.iterator() and no change to the loop CFG.

  alloc A[100];
  %t = call token @llvm.region.entry()["parallel.for"()]
  for(i = 0; i < 100; i++) {
    a[i] = f(i);
  }
  @llvm.region.exit(%t)() ["end.parallel.for"()]

Xinmin
2017 Mar 08
2
[RFC][PIR] Parallel LLVM IR -- Stage 0 --
> On Mar 8, 2017, at 11:50 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> On 03/08/2017 01:24 PM, Tian, Xinmin wrote:
>> I assume the referring case is something like below, right?
>>
>> #pragma omp parallel num_threads(n)
>> {
>>   #pragma omp critical
>>   { x = x + 1; }
>> }
2018 Jun 21
2
NVPTX - Reordering load instructions
We already have a pass that vectorizes loads and stores in nvptx and amdgpu. Not at my laptop, I forget the exact filename, but it's called load-store vectorizer. I think the question is, why is LSV not vectorizing this code? I think the answer is, llvm can't tell that the loads are aligned. Ptxas can, but only because it's (apparently) doing vectorization *after* it resolves the
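One way a frontend can hand LLVM the alignment fact the LoadStoreVectorizer is missing is to assert it explicitly. __builtin_assume_aligned is a real Clang/GCC builtin; whether it is applicable to this particular Julia benchmark is an assumption.

    float sum4(const float *p) {
      const float *q =
          static_cast<const float *>(__builtin_assume_aligned(p, 16));
      // With 16-byte alignment known, the four adjacent loads below are
      // a candidate for a single vectorized (v4f32) load.
      return q[0] + q[1] + q[2] + q[3];
    }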
2011 Jun 10
3
Test if data uniformly distributed (newbie)
Hello, I have a bunch of files, each containing 300 data points with values from 0 to 1 which also sum to 1 (I don't think the last element is relevant, though). In addition, each data point is annotated as an "a" or a "b". I would like to know in which files (if any) the data is uniformly distributed. I used Google and found out that a Kolmogorov-Smirnov or a Chi-square
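A minimal sketch of the chi-square goodness-of-fit idea for files like these, written in C++ for consistency with the rest of this page (in R itself, chisq.test or ks.test would be the natural tools). The bin count of 10 is an arbitrary illustrative choice.

    #include <algorithm>
    #include <vector>

    double chi_square_uniform(const std::vector<double>& data, int bins = 10) {
      std::vector<int> observed(bins, 0);
      for (double v : data)   // values assumed to lie in [0, 1]
        observed[std::min(static_cast<int>(v * bins), bins - 1)]++;
      const double expected = static_cast<double>(data.size()) / bins;
      double stat = 0.0;
      for (int o : observed)
        stat += (o - expected) * (o - expected) / expected;
      return stat; // compare against a chi-square quantile, bins-1 dof
    }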
2012 Sep 13
5
[LLVMdev] [OT] Control Flow Graph(CFG) into Abstract Syntax Tree(AST)
Hi, I know most compilers go from AST to CFG. I am writing a decompiler, so I was wondering if anyone knew of any documents describing how best to get from CFG to AST. The decompiler project is open source. https://github.com/jcdutton/libbeauty The decompiler already contains a disassembler and a virtual machine resulting in an annotated CFG. It uses information gained from using a virtual
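A sketch of one structuring step such a decompiler performs when rebuilding an AST from a CFG: a block ending in a conditional branch whose two single-exit arms rejoin at the same block can be emitted as an if-then-else node. The types are illustrative, not taken from libbeauty.

    struct Block {
      Block *succ_taken = nullptr;  // conditional-branch target
      Block *succ_fall = nullptr;   // fall-through successor
    };

    // Diamond pattern: cond -> {a, b}, and both arms fall through to
    // the same join block.
    bool is_if_then_else(const Block &cond) {
      const Block *a = cond.succ_taken, *b = cond.succ_fall;
      return a && b &&
             a->succ_taken == nullptr && b->succ_taken == nullptr &&
             a->succ_fall != nullptr && a->succ_fall == b->succ_fall;
    }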
2012 Nov 01
0
[LLVMdev] : Predication on SIMD architectures and LLVM
On Wed, Oct 31, 2012 at 09:13:43PM +0100, Bjorn De Sutter wrote: > Hi all, > > I am working on a CGRA backend (something like a 2D VLIW), and we also absolutely need predication. I extended the IfConversion pass so it can be run multiple times and can predicate already-predicated code. This is necessary to predicate code with nested conditional statements. At this point, we