Displaying 20 results from an estimated 9000 matches similar to: "[LLVMdev] Proposal: pragma for branch divergence"
2015 Jan 24
2
[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence
In our experience, as Owen also suggests, a pragma or a language extension
can be avoided by a combination of static and dynamic analysis. We prefer
this approach in our compiler ;)
Regards,
Vinod
On Sat, Jan 24, 2015 at 12:09 AM, Owen Anderson <resistor at mac.com> wrote:
> Hi Jingyue,
>
> Have you considered using dynamic uniformity checks? In my experience you
> can
2015 Jan 25
2
[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence
Hi Owen and Vinod,
Thanks for sharing the paper! I like the idea a lot. Regarding the paper
itself, Vinod, are the consensual branches (e.g., cbranch.ifnone) you
mentioned in the paper publicly available in the PTX ISA?
Owen, could you explain more about the approach you have in mind for using
branch-if-none instructions? I believe you have lots of great insights, but I
don't see how cbranch.ifnone
2015 Aug 21
2
[CUDA/NVPTX] is inlining __syncthreads allowed?
I'm using 7.0. I am attaching the reduced example.
nvcc sync.cu -arch=sm_35 -ptx
gives
// .globl _Z3foov
.visible .entry _Z3foov(
)
{
.reg .pred %p<2>;
.reg .s32 %r<3>;
mov.u32 %r1, %tid.x;
and.b32 %r2, %r1, 1;
setp.eq.b32 %p1, %r2, 1;
@!%p1 bra BB7_2;
bra.uni
2015 Sep 22
2
[RFC] Refinement of convergent semantics
Hi Jingyue,
I consider it a very important element of the design of convergent that it does not require baseline LLVM to contain a definition of uniformity, which would itself pull in a definition of SIMT/SPMD, warps, threads, etc. The intention is that it should be a conservative (but hopefully not too conservative) approximation, and that implementations of specific GPU programming models
2014 Jun 16
3
[LLVMdev] Attaching range metadata to IntrinsicInst
Hi,
The range metadata can only be attached to LoadInst for now. I am
considering extending its usage to IntrinsicInst so that the frontend can
annotate the range of the return value of an intrinsic call. e.g.,
%a = call i32 @llvm.xxx(), !range !0
!0 = metadata !{ i32 0, i32 1024 }
The motivation behind this extension is some optimizations we are working
on for CUDA programs. Some special
2014 Jun 17
5
[LLVMdev] Attaching range metadata to IntrinsicInst
Chandler Carruth wrote:
> This seems fine to me, but I'd like to make sure it looks OK to Nick as
> well.
I strongly prefer baking in knowledge about the intrinsics themselves
into the passes if possible. Metadata will always be secondary.
Separately, should value tracking use range metadata when it's
available? Absolutely.
I think it should apply to all CallInst not just
2014 Jun 17
4
[LLVMdev] Attaching range metadata to IntrinsicInst
On 17 June 2014 06:41, Eli Bendersky <eliben at google.com> wrote:
> On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky <nicholas at mxc.ca> wrote:
>
>> Chandler Carruth wrote:
>>
>>> This seems fine to me, but I'd like to make sure it looks OK to Nick as
>>> well.
>>>
>>
>> I strongly prefer baking in knowledge about the
2014 Jun 17
3
[LLVMdev] Attaching range metadata to IntrinsicInst
On Tue, Jun 17, 2014 at 2:33 PM, Jingyue Wu <jingyue at google.com> wrote:
> Hi Eric,
>
> In the IR, besides "target datalayout" and "target triple", we have a
> special "target cpu" string which is set by the Clang front-end according to
> its -target-cpu flag. We also write a Module::getTargetCPU() method to
> retrieve this string from the
2014 Jun 17
2
[LLVMdev] Attaching range metadata to IntrinsicInst
Eh? How do you envision this?
-eric
On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at google.com> wrote:
> Hi Nick,
>
> That makes sense. I think a main issue here is that the ranges of these PTX
> special registers (e.g., threadIdx.x) depend on -target-cpu which is only
> visible to clang and llc. Would you mind if we specified "target cpu" in the IR
> similar
2015 Aug 21
3
[CUDA/NVPTX] is inlining __syncthreads allowed?
Hi Justin,
Is a compiler allowed to inline a function that calls __syncthreads? I saw
nvcc does that, but I'm not sure it's valid. For example,
void foo() {
  __syncthreads();
}
if (threadIdx.x % 2 == 0) {
  ...
  foo();
} else {
  ...
  foo();
}
Before inlining, all threads meet at one __syncthreads(). After inlining
if (threadIdx.x % 2 == 0) {
  ...
  __syncthreads();
} else {
  ...
2014 Apr 19
4
[LLVMdev] [NVPTX] Eliminate common sub-expressions in a group of similar GEPs
Hi,
We wrote an optimization that eliminates common sub-expressions in a group
of similar GEPs for the NVPTX backend. It speeds up some of our benchmarks
by up to 20%, which convinces us to try to upstream it. Here's a brief
description of why we wrote this optimization, what we did, and how we did
it.
Loops in CUDA programs are often extensively unrolled by programmers and
compilers,
2015 Aug 14
2
[LLVMdev] RFC: Convergent attribute
Hi Jingyue,
Convergent is not intended to prevent inlining. It’s tricky to formalize this inter-procedurally, but the intended interpretation is that a convergent operation cannot be moved either into or out of a conditionally executed region. Normal inlining would not violate that.
I would imagine that it would make sense to use a combination of convergent and noduplicate for barrier-like
2014 Apr 21
2
[LLVMdev] [NVPTX] Eliminate common sub-expressions in a group of similar GEPs
Hi Hal,
Thanks for your comments! I'm inlining my responses below.
Jingyue
On Sat, Apr 19, 2014 at 6:38 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> Jingyue,
>
> I can't speak for the NVPTX backend, but I think this looks useful as an
> (optional) target-independent pass. A few thoughts:
>
> - Running GVN tends to be pretty expensive; have you tried EarlyCSE
2015 Aug 14
2
[LLVMdev] RFC: Convergent attribute
Hi Mehdi,
My reading of it is that if you have a convergent instruction A, it is
legal to duplicate it to instruction B if (assuming B is after A in program
flow) A dominates B and B post-dominates A.
James
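The condition James describes can be sketched as a check on a toy CFG. This is only an illustration of the stated rule (A dominates B and B post-dominates A), not LLVM's implementation: the Graph type and the helpers reverseGraph, dominators, and mayDuplicateTo are hypothetical names invented here, and the sketch assumes a single exit block and that every block is reachable from the entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A toy CFG (hypothetical type): succ[i] lists the successors of block i.
using Graph = std::vector<std::vector<int>>;

// Reverse every edge. Dominators of the reversed graph, rooted at the exit
// block, are the post-dominators of the original graph.
Graph reverseGraph(const Graph &g) {
  Graph r(g.size());
  for (std::size_t u = 0; u < g.size(); ++u)
    for (int v : g[u])
      r[v].push_back(static_cast<int>(u));
  return r;
}

// Iterative dominator sets: dom[b][a] is true when block a dominates block b.
// Assumes every block is reachable from root.
std::vector<std::vector<bool>> dominators(const Graph &g, int root) {
  const std::size_t n = g.size();
  const Graph preds = reverseGraph(g);
  std::vector<std::vector<bool>> dom(n, std::vector<bool>(n, true));
  dom[root].assign(n, false);
  dom[root][root] = true;
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 0; b < n; ++b) {
      if (static_cast<int>(b) == root)
        continue;
      // Intersect the dominator sets of all predecessors, then add b itself.
      std::vector<bool> next(n, true);
      for (int p : preds[b])
        for (std::size_t i = 0; i < n; ++i)
          next[i] = next[i] && dom[p][i];
      next[b] = true;
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}

// The rule as stated above: A dominates B and B post-dominates A.
bool mayDuplicateTo(const Graph &g, int entry, int exit, int a, int b) {
  const auto dom = dominators(g, entry);
  const auto pdom = dominators(reverseGraph(g), exit);
  return dom[b][a] && pdom[a][b];
}
```

For a diamond CFG {{1, 2}, {3}, {3}, {}} (block 0 branches to 1 and 2, which both fall through to 3), the check accepts the pair (0, 3) and rejects (0, 1), because block 1 sits inside the conditionally executed region.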
On Fri, 14 Aug 2015 at 08:32 Mehdi Amini via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Aug 13, 2015, at 9:43 PM, Owen Anderson via llvm-dev <
> llvm-dev at
2015 Sep 04
9
[RFC] Refinement of convergent semantics
Hi all,
In light of recent discussions regarding updating passes to respect convergent semantics, and whether or not it is sufficient for barriers, I would like to propose a change in convergent semantics that should resolve a lot of the identified problems regarding loop unrolling, loop unswitching, etc. Credit to John McCall for talking this over with me and seeding the core ideas.
Today,
2018 Dec 19
5
[RFC] Adding thread group semantics to LangRef (motivated by GPUs)
Hi all,
LLVM needs a solution to the long-standing problem that the IR is unable
to express certain semantics expected by high-level programming
languages that target GPUs.
Solving this issue is necessary both for upstream use of LLVM as a
compiler backend for GPUs and for correctly supporting LLVM IR <->
SPIR-V roundtrip translation. It may also be useful for compilers
targeting
2017 Jul 14
2
[SPIR/PTX] Divergence analysis for BasicBlocks
Hello,
It seems to me that our current DivergenceAnalysis does not save which
BasicBlocks may suffer from divergent control. Am I correct?
I want to modify our DivergenceAnalysis to add a "bool
isControlDivergent(BasicBlock*) const" method and save in the divergence
propagator the basic blocks that are divergent. I am not sure that is
entirely correct, if you have input on that please
2017 Jul 21
2
[SPIR/PTX] Divergence analysis for BasicBlocks
Hello,
Yes? Where is allActive defined? I couldn't find it.
Basically, a BB is control divergent if its execution depends on a branch
that itself depends on a divergent SSA value.
On Fri, Jul 21, 2017 at 4:13 PM, Zaks, Ayal <ayal.zaks at intel.com> wrote:
> What would be the definition of “isControlDivergent(BasicBlock*)”; the
> complementary of “allActive(BasicBlock*)” –
2015 May 13
8
[LLVMdev] RFC: Convergent attribute
Below is a proposal for a new "convergent" intrinsic attribute and MachineInstr property, needed for correctly modeling many SPMD/SIMT programming models in LLVM. Comments and feedback welcome.
—Owen
In order to make LLVM more suitable for programming models variously called SPMD
and SIMT, we would like to propose a new intrinsic and MachineInstr annotation
called
2016 Mar 05
2
instrumenting device code with gpucc
On Fri, Mar 4, 2016 at 5:50 PM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com>
wrote:
> Hi Jingyue,
>
> My name is Yuanfeng Peng, I'm a PhD student at UPenn. I'm sorry to bother
> you, but I'm having trouble with gpucc in my project, and I would be really
> grateful for your help!
>
> Currently we're trying to instrument CUDA code using LLVM 3.9, and