similar to: [LLVMdev] Proposal: pragma for branch divergence

Displaying 20 results from an estimated 9000 matches similar to: "[LLVMdev] Proposal: pragma for branch divergence"

[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence

2015 Jan 24

2

[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence

In our experience, as Owen also suggests, a pragma or a language extension can be avoided by a combination of static and dynamic analysis. We prefer this approach in our compiler ;) Regards, Vinod On Sat, Jan 24, 2015 at 12:09 AM, Owen Anderson <resistor at mac.com> wrote: > Hi Jingyue, > > Have you considered using dynamic uniformity checks? In my experience you > can

[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence

2015 Jan 25

2

[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence

Hi Owen and Vinod, Thanks for sharing the paper! I like the idea a lot. Regarding the paper itself, Vinod, are the consensual branches (e.g., cbranch.ifnone) you mentioned in the paper publicly available in PTX ISA? Owen, could you explain more on the approach of using branch-if-none instructions in your mind? I believe you have lots of great insights, but I don't see how cbranch.ifnone

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

2

[CUDA/NVPTX] is inlining __syncthreads allowed?

I'm using 7.0. I am attaching the reduced example. nvcc sync.cu -arch=sm_35 -ptx gives // .globl _Z3foov .visible .entry _Z3foov( ) { .reg .pred %p<2>; .reg .s32 %r<3>; mov.u32 %r1, %tid.x; and.b32 %r2, %r1, 1; setp.eq.b32 %p1, %r2, 1; @!%p1 bra BB7_2; bra.uni

[RFC] Refinement of convergent semantics

2015 Sep 22

2

[RFC] Refinement of convergent semantics

Hi Jingyue, I consider it a very important element of the design of convergent that it does not require baseline LLVM to contain a definition of uniformity, which would itself pull in a definition of SIMT/SPMD, warps, threads, etc. The intention is that it should be a conservative (but hopefully not too conservative) approximation, and that implementations of specific GPU programming models

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 16

3

[LLVMdev] Attaching range metadata to IntrinsicInst

Hi, The range metadata can only be attached to LoadInst for now. I am considering extending its usage to IntrinsicInst so that the frontend can annotate the range of the return value of an intrinsic call. e.g., %a = call i32 @llvm.xxx(), !range !0 !0 = metadata !{ i32 0, i23 1024 } The motivation behind this extension is some optimizations we are working on for CUDA programs. Some special

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 17

5

[LLVMdev] Attaching range metadata to IntrinsicInst

Chandler Carruth wrote: > This seems fine to me, but I'd like to make sure it looks OK to Nick as > well. I strongly prefer baking in knowledge about the intrinsics themselves into the passes if possible. Metadata will always be secondary. Separately, should value tracking look use range metadata when it's available? Absolutely. I think it should apply to all CallInst not just

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 17

4

[LLVMdev] Attaching range metadata to IntrinsicInst

On 17 June 2014 06:41, Eli Bendersky <eliben at google.com> wrote: > On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky <nicholas at mxc.ca> wrote: > >> Chandler Carruth wrote: >> >>> This seems fine to me, but I'd like to make sure it looks OK to Nick as >>> well. >>> >> >> I strongly prefer baking in knowledge about the

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 17

3

[LLVMdev] Attaching range metadata to IntrinsicInst

On Tue, Jun 17, 2014 at 2:33 PM, Jingyue Wu <jingyue at google.com> wrote: > Hi Eric, > > In the IR, besides "target datalayout" and "target triple", we have a > special "target cpu" string which is set by the Clang front-end according to > its -target-cpu flag. We also write a Module::getTargetCPU() method to > retrieve this string from the

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 17

2

[LLVMdev] Attaching range metadata to IntrinsicInst

Eh? How do you envision this? -eric On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at google.com> wrote: > Hi Nick, > > That makes sense. I think a main issue here is that the ranges of these PTX > special registers (e.g., threadIdx.x) depend on -target-cpu which is only > visible to clang and llc. Would you mind we specify "target cpu" in the IR > similar

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

3

[CUDA/NVPTX] is inlining __syncthreads allowed?

Hi Justin, Is a compiler allowed to inline a function that calls __syncthreads? I saw nvcc does that, but not sure it's valid though. For example, void foo() { __syncthreads(); } if (threadIdx.x % 2 == 0) { ... foo(); } else { ... foo(); } Before inlining, all threads meet at one __syncthreads(). After inlining if (threadIdx.x % 2 == 0) { ... __syncthreads(); } else { ...

[LLVMdev] [NVPTX] Eliminate common sub-expressions in a group of similar GEPs

2014 Apr 19

4

[LLVMdev] [NVPTX] Eliminate common sub-expressions in a group of similar GEPs

Hi, We wrote an optimization that eliminates common sub-expressions in a group of similar GEPs for the NVPTX backend. It speeds up some of our benchmarks by up to 20%, which convinces us to try to upstream it. Here's a brief description of why we wrote this optimization, what we did, and how we did it. Loops in CUDA programs are often extensively unrolled by programmers and compilers,

[LLVMdev] RFC: Convergent attribute

2015 Aug 14

2

[LLVMdev] RFC: Convergent attribute

Hi Jingyue, Convergent is not intended to prevent inlining. It’s tricky to formalize this inter-procedurally, but the intended interpretation is that a convergent operation cannot be move either into or out of a conditionally executed region. Normal inlining would not violate that. I would imagine that it would make sense to use a combination of convergent and noduplicate for barrier-like

[LLVMdev] [NVPTX] Eliminate common sub-expressions in a group of similar GEPs

2014 Apr 21

2

[LLVMdev] [NVPTX] Eliminate common sub-expressions in a group of similar GEPs

Hi Hal, Thanks for your comments! I'm inlining my responses below. Jingyue On Sat, Apr 19, 2014 at 6:38 AM, Hal Finkel <hfinkel at anl.gov> wrote: > Jingyue, > > I can't speak for the NVPTX backend, but I think this looks useful as an > (optional) target-independent pass. A few thoughts: > > - Running GVN tends to be pretty expensive; have you tried EarlyCSE

[LLVMdev] RFC: Convergent attribute

2015 Aug 14

2

[LLVMdev] RFC: Convergent attribute

Hi Mehdi, My reading of it is that if you have a convergent instruction A, it is legal to duplicate it to instruction B if (assuming B is after A in program flow) A dominates B and B post-dominates A. James On Fri, 14 Aug 2015 at 08:32 Mehdi Amini via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On Aug 13, 2015, at 9:43 PM, Owen Anderson via llvm-dev < > llvm-dev at

[RFC] Refinement of convergent semantics

2015 Sep 04

9

[RFC] Refinement of convergent semantics

Hi all, In light of recent discussions regarding updating passes to respect convergent semantics, and whether or not it is sufficient for barriers, I would like to propose a change in convergent semantics that should resolve a lot of the identified problems regarding loop unrolling, loop unswitching, etc. Credit to John McCall for talking this over with me and seeding the core ideas. Today,

[RFC] Adding thread group semantics to LangRef (motivated by GPUs)

2018 Dec 19

5

[RFC] Adding thread group semantics to LangRef (motivated by GPUs)

Hi all, LLVM needs a solution to the long-standing problem that the IR is unable to express certain semantics expected by high-level programming languages that target GPUs. Solving this issue is necessary both for upstream use of LLVM as a compiler backend for GPUs and for correctly supporting LLVM IR <-> SPIR-V roundtrip translation. It may also be useful for compilers targeting

[SPIR/PTX] Divergence analysis for BasicBlocks

2017 Jul 14

2

[SPIR/PTX] Divergence analysis for BasicBlocks

Hello, It seems to me that our current DivergenceAnalysis does not save which BasicBlocks may suffer from divergent control. Am I correct? I want to modify our DivergenceAnalysis to add a "bool isControlDivergent(BasicBlock*) const" method and save in the divergence propagator the basicblock that are divergent. I am not sure that is entirely correct, if you have input on that please

[SPIR/PTX] Divergence analysis for BasicBlocks

2017 Jul 21

2

[SPIR/PTX] Divergence analysis for BasicBlocks

Hello, Yes? Where is allActive defined, I couldn't find it. Basically, a BB is control divergent if it's execution depends on a branch that itself depends on a divergent ssa value. On Fri, Jul 21, 2017 at 4:13 PM, Zaks, Ayal <ayal.zaks at intel.com> wrote: > What would be the definition of “isControlDivergent(BasicBlock*)”; the > complementary of “allActive(BasicBlock*)” –

[LLVMdev] RFC: Convergent attribute

2015 May 13

8

[LLVMdev] RFC: Convergent attribute

Below is a proposal for a new "convergent" intrinsic attribute and MachineInstr property, needed for correctly modeling many SPMD/SIMT programming models in LLVM. Comments and feedback welcome. —Owen In order to make LLVM more suitable for programming models variously called SPMD and SIMT, we would like to propose a new intrinsic and MachineInstr annotation called

instrumenting device code with gpucc

2016 Mar 05

2

instrumenting device code with gpucc

On Fri, Mar 4, 2016 at 5:50 PM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com> wrote: > Hi Jingyue, > > My name is Yuanfeng Peng, I'm a PhD student at UPenn. I'm sorry to bother > you, but I'm having trouble with gpucc in my project, and I would be really > grateful for your help! > > Currently we're trying to instrument CUDA code using LLVM 3.9, and