thr3ads.net - similar to: "[RFC] Refinement of convergent semantics"

[RFC] Refinement of convergent semantics

2015 Sep 22

Hi Jingyue, I consider it a very important element of the design of convergent that it does not require baseline LLVM to contain a definition of uniformity, which would itself pull in a definition of SIMT/SPMD, warps, threads, etc. The intention is that it should be a conservative (but hopefully not too conservative) approximation, and that implementations of specific GPU programming models

[RFC] Refinement of convergent semantics

2015 Sep 14

> On Sep 14, 2015, at 12:15 PM, Philip Reames <listmail at philipreames.com> wrote: > > On 09/04/2015 01:25 PM, Owen Anderson via llvm-dev wrote: >> Hi all, >> >> In light of recent discussions regarding updating passes to respect convergent semantics, and whether or not it is sufficient for barriers, I would like to propose a change in convergent semantics that

[LLVMdev] RFC: Convergent attribute

2015 Aug 14

Hi Mehdi, My reading of it is that if you have a convergent instruction A, it is legal to duplicate it to instruction B if (assuming B is after A in program flow) A dominates B and B post-dominates A. James On Fri, 14 Aug 2015 at 08:32 Mehdi Amini via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On Aug 13, 2015, at 9:43 PM, Owen Anderson via llvm-dev < > llvm-dev at
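The condition James describes (A dominates B and B post-dominates A) can be sketched on a toy control-flow graph. This is only an illustration of that reading, not LLVM code: the function names `dominators`, `post_dominators`, and `may_duplicate`, the block names, and the dictionary-based CFG are all hypothetical, and the dominator sets are computed with a simple iterative dataflow rather than LLVM's DominatorTree.

```python
def dominators(succ, entry):
    """Return {node: set of nodes dominating it} via iterative dataflow."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, outs in succ.items():
        for s in outs:
            pred[s].add(n)
    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            preds = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*preds) if preds else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def post_dominators(succ, exit_node):
    """Post-dominators are dominators of the reversed CFG, rooted at the exit."""
    rev = {n: [] for n in succ}
    for n, outs in succ.items():
        for s in outs:
            rev[s].append(n)
    return dominators(rev, exit_node)


def may_duplicate(succ, entry, exit_node, a, b):
    """True if block a dominates block b and b post-dominates a."""
    a_dominates_b = a in dominators(succ, entry)[b]
    b_postdominates_a = b in post_dominators(succ, exit_node)[a]
    return a_dominates_b and b_postdominates_a


# Hypothetical diamond CFG: entry -> (then | else) -> merge -> exit
cfg = {
    "entry": ["then", "else"],
    "then": ["merge"],
    "else": ["merge"],
    "merge": ["exit"],
    "exit": [],
}
```

On this diamond, `may_duplicate(cfg, "entry", "exit", "entry", "merge")` holds because every path through `entry` also reaches `merge`, whereas a pair such as `entry` and `then` fails: `then` is conditionally executed and so does not post-dominate `entry`.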

[LLVMdev] RFC: Convergent attribute

2015 Aug 14

Hi Jingyue, Convergent is not intended to prevent inlining. It’s tricky to formalize this inter-procedurally, but the intended interpretation is that a convergent operation cannot be moved either into or out of a conditionally executed region. Normal inlining would not violate that. I would imagine that it would make sense to use a combination of convergent and noduplicate for barrier-like

[LLVMdev] RFC: Convergent attribute

2015 May 13

Below is a proposal for a new "convergent" intrinsic attribute and MachineInstr property, needed for correctly modeling many SPMD/SIMT programming models in LLVM. Comments and feedback welcome. —Owen In order to make LLVM more suitable for programming models variously called SPMD and SIMT, we would like to propose a new intrinsic and MachineInstr annotation called

[LLVMdev] RFC: Convergent attribute

2015 May 14

Why is this a regalloc problem? I assume in the example below the "r0" is somehow forced by the ABI? Because otherwise moving the texture2d operation into the branch wouldn't matter as long as we assign different registers to the two branches and use a technique like lib/Target/R600/SIFixSGPRLiveRanges.cpp. - Matthias > On May 13, 2015, at 6:00 PM, Philip Reames <listmail at

CFLAA

2016 Aug 25

(and sys::cas_flag that STATISTIC uses is a uint32 ...) On Thu, Aug 25, 2016 at 9:54 AM, Daniel Berlin <dberlin at dberlin.org> wrote: > Okay, dumb question: > Are you really getting negative numbers in the second column? > > 526,766 -136 mem2reg # PHI nodes inserted > > http://llvm.org/docs/doxygen/html/PromoteMemoryToRegister_8cpp_source.html >

CFLAA

2016 Aug 25

I gathered aggregate statistics reported by “-stats” over the ~400 test files. The following table summarizes the impact. The first column is the sum where the new analysis is enabled, the second column is the delta from baseline where no CFL alias analysis is performed. I am not experienced enough to know which of these are “good” or “bad” indicators. -david 72,250 685 SLP

[LLVMdev] Autotuning parameters/heuristics within LLVM

2014 Oct 02

Hi, I am planning to begin a project to explore the space of tuning LLVM internals in an effort to increase performance. I am wondering if anyone can point me to any parameterizations, heuristics, or priority functions within LLVM that can be tuned/adjusted. So far, I'm considering BranchProbabilityInfo and InlineCost. Does anyone have any other suggestions? Thanks, Robert

[LLVMdev] Why should we have the LoopPass and LoopPassManager? Can we get rid of this complexity?

2014 Jan 22

On Wed, Jan 22, 2014 at 12:33 AM, Andrew Trick <atrick at apple.com> wrote: > > There appear to be two chunks of "functionality" provided by loop passes: > > > > 1) A worklist of loops to process. This is very rarely used: > > 1.1) LoopSimplify and LoopUnswitch add loops to the queue. > > I’m making this up without much thought, but we may benefit

[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence

2015 Jan 24

In our experience, as Owen also suggests, a pragma or a language extension can be avoided by a combination of static and dynamic analysis. We prefer this approach in our compiler ;) Regards, Vinod On Sat, Jan 24, 2015 at 12:09 AM, Owen Anderson <resistor at mac.com> wrote: > Hi Jingyue, > > Have you considered using dynamic uniformity checks? In my experience you > can

[RFC] Adding thread group semantics to LangRef (motivated by GPUs)

2019 Jan 31

Strongly agree with Mehdi. I am also not really sure what the proposal is at this point, so it's hard to comment further. > There are a number of questions that I have. Do we need better machine descriptions so that various resources can be considered? Do we need the capability to reason about the machine state for the cross-lane operations to enable more optimizations? Are intrinsics the

question about unrolling loops with convergent instructions

2018 Jan 11

I have a loop with convergent instructions with a loop count of 1024. I use a pragma to set the unroll count to 32. However, the loop was unrolled by 512, which results in a very long compilation time. In tryToUnrollLoop, there is // If the loop contains a convergent operation, the prelude we'd add // to do the first few instructions before we hit the unrolled loop // is unsafe -- it

[LLVMdev] GSoC Proposal: Profiling Enhancements

2012 Apr 05

Hello Everyone, Before I get started I just want to sincerely apologise for not getting feedback on this earlier. I've had an extremely busy week as I was presenting a paper at the CGO conference. If anyone is able to provide feedback in such a short time-frame then it will be gratefully received. If not, then I just hope the work described sounds useful. I have already submitted

[LLVMdev] Proposal: pragma for branch divergence

2015 Jan 24

*Hi, I am considering a language extension to Clang for optimizing GPU programs. This extension will allow the compiler to use different optimization strategies for divergent and non-divergent branches (to be explained below). We have observed significant performance gain by leveraging this proposed extension, so I want to discuss it here to see how the community likes/dislikes the idea. I will

RFC: (Co-)Convergent functions and uniform function parameters

2016 Oct 24

Hi all, Some brain-storming on an issue with SPMD/SIMT backend support where I think some additional IR attributes would be useful. Sorry for the somewhat long mail; the short version of my current thinking is that I would like to have the following: 1) convergent: a call to a function with this attribute cannot be moved to have additional control dependencies; i.e., moving it from A to B is

RFC: (Co-)Convergent functions and uniform function parameters

2016 Oct 24

On 24.10.2016 21:54, Mehdi Amini wrote: >> On Oct 24, 2016, at 12:38 PM, Nicolai Hähnle via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> Some brain-storming on an issue with SPMD/SIMT backend support where I think some additional IR attributes would be useful. Sorry for the somewhat long mail; the short version of my current thinking is that I would like to have the following:

RFC: Extending optimization reporting

2019 May 08

Hi Adam, Thanks for your input. If I understand correctly, you’re saying that we can handle the loop versioning issue by explicitly identifying new loops as they are created. So, the unswitching optimization, for example, would report that it unswitched loop-0 at source location X, creating loop-1 and loop-2, and then later the vectorizer would report that it was unable to vectorize loop-1 at

CFLAA

2016 Aug 25

(Adding "LLVM Dev") My variant is up as https://reviews.llvm.org/D23876 -david From: George Burgess IV <george.burgess.iv at gmail.com<mailto:george.burgess.iv at gmail.com>> Date: Wednesday, August 24, 2016 at 3:17 PM To: David Callahan <dcallahan at fb.com<mailto:dcallahan at fb.com>> Subject: Re: CFLAA Hi! > I see there is on going work with alias

[LLVMdev] [RFC] Heuristic for complete loop unrolling

2015 Jan 23

Hi devs, Recently I came across an interesting testcase that LLVM failed to optimize well. The test does some image processing, and as a part of it, it traverses all the pixels and computes some value based on the adjacent pixels. So, the hot part looks like this: for(y = 0..height) { for (x = 0..width) { val = 0 for (j = 0..5) { for (i = 0..5) { val += img[x+i,y+j] *