similar to: RFC: (Co-)Convergent functions and uniform function parameters

Displaying 20 results from an estimated 900 matches similar to: "RFC: (Co-)Convergent functions and uniform function parameters"

2016 Oct 24
2
RFC: (Co-)Convergent functions and uniform function parameters
On 24.10.2016 21:54, Mehdi Amini wrote: >> On Oct 24, 2016, at 12:38 PM, Nicolai Hähnle via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> Some brain-storming on an issue with SPMD/SIMT backend support where I think some additional IR attributes would be useful. Sorry for the somewhat long mail; the short version of my current thinking is that I would like to have the following:
2016 Oct 24
2
RFC: (Co-)Convergent functions and uniform function parameters
> On Oct 24, 2016, at 4:15 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote: > > On 25.10.2016 01:11, Nicolai Hähnle wrote: >> On 24.10.2016 21:54, Mehdi Amini wrote: >>>> On Oct 24, 2016, at 12:38 PM, Nicolai Hähnle via llvm-dev >>>> <llvm-dev at lists.llvm.org> wrote: >>>> Some brain-storming on an issue with SPMD/SIMT backend
2016 Oct 26
3
RFC: (Co-)Convergent functions and uniform function parameters
On 25.10.2016 16:28, Nicolai Hähnle wrote: > But I fear that this path leads to eternal fuzziness. Let me try a > completely different approach to define what we need by augmenting the > semantics of IR with "divergence tokens". In addition to its usual > value, every IR value carries a "divergence set" of divergence tokens. > > The basic rule is: the
2013 Nov 29
1
texelFetch sampler1/2DArray on nv50 gallium
Hi Christoph/nouveau folks, I've noticed that the piglit test "texelFetch" fails on my nv98 for sampler1DArray and sampler2DArray. I nuked the logic in nv50_ir_lowering_nv50.cpp:604 that does the f32 -> u32 conversion, and it seems to be passing now. TBH, I have no clue how the parameters are passed around and what that a[] is, but it seems like it's a u32 to begin with? (Or
2015 May 13
8
[LLVMdev] RFC: Convergent attribute
Below is a proposal for a new "convergent" intrinsic attribute and MachineInstr property, needed for correctly modeling many SPMD/SIMT programming models in LLVM. Comments and feedback welcome. —Owen In order to make LLVM more suitable for programming models variously called SPMD and SIMT, we would like to propose a new intrinsic and MachineInstr annotation called
2015 Aug 14
2
[LLVMdev] RFC: Convergent attribute
Hi Jingyue, Convergent is not intended to prevent inlining. It’s tricky to formalize this inter-procedurally, but the intended interpretation is that a convergent operation cannot be moved either into or out of a conditionally executed region. Normal inlining would not violate that. I would imagine that it would make sense to use a combination of convergent and noduplicate for barrier-like
2015 May 14
2
[LLVMdev] RFC: Convergent attribute
Why is this a regalloc problem? I assume in the example below the "r0" is somehow forced by the ABI? Because otherwise moving the texture2d operation into the branch wouldn't matter as long as we assign different registers to the two branches and use a technique like lib/Target/R600/SIFixSGPRLiveRanges.cpp. - Matthias > On May 13, 2015, at 6:00 PM, Philip Reames <listmail at
2015 Aug 14
2
[LLVMdev] RFC: Convergent attribute
Hi Mehdi, My reading of it is that if you have a convergent instruction A, it is legal to duplicate it to instruction B if (assuming B is after A in program flow) A dominates B and B post-dominates A. James On Fri, 14 Aug 2015 at 08:32 Mehdi Amini via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On Aug 13, 2015, at 9:43 PM, Owen Anderson via llvm-dev < > llvm-dev at
2015 Sep 22
2
[RFC] Refinement of convergent semantics
Hi Jingyue, I consider it a very important element of the design of convergent that it does not require baseline LLVM to contain a definition of uniformity, which would itself pull in a definition of SIMT/SPMD, warps, threads, etc. The intention is that it should be a conservative (but hopefully not too conservative) approximation, and that implementations of specific GPU programming models
2016 Oct 31
0
RFC: (Co-)Convergent functions and uniform function parameters
(I work on CUDA / PTX.) For one thing I'm in favor of having fewer annotations rather than more, so if we can do this in a reasonable way without introducing the notion of co-convergent calls, I think that would be a win. The one convergent annotation is difficult enough for the GPU folks to grok and then keep in cache, and everyone who works on llvm has to pay the cost of keeping their
2015 Sep 04
9
[RFC] Refinement of convergent semantics
Hi all, In light of recent discussions regarding updating passes to respect convergent semantics, and whether or not it is sufficient for barriers, I would like to propose a change in convergent semantics that should resolve a lot of the identified problems regarding loop unrolling, loop unswitching, etc. Credit to John McCall for talking this over with me and seeding the core ideas. Today,
2018 Dec 19
5
[RFC] Adding thread group semantics to LangRef (motivated by GPUs)
Hi all, LLVM needs a solution to the long-standing problem that the IR is unable to express certain semantics expected by high-level programming languages that target GPUs. Solving this issue is necessary both for upstream use of LLVM as a compiler backend for GPUs and for correctly supporting LLVM IR <-> SPIR-V roundtrip translation. It may also be useful for compilers targeting
2018 Mar 07
1
TLD instruction usage in non-linked sampler mode
Hi Andy, Thanks for checking! I do see an issue on Tesla as well (at least G92, and I believe someone else reported on a GT215 or GT218). However I haven't confirmed that it's the identical issue to what I see on Fermi with quite as much certainty as what I've checked on a GF108. (For the G92, the texture buffer object test fails in the same way it does on Fermi, but there could be
2018 Mar 02
2
TLD instruction usage in non-linked sampler mode
Hello, This question is in the context of Tesla / Fermi generations, which have explicit bindings for textures / samplers. It might also apply to Kepler+, not quite as sure due to the bindless nature. I've been trying to understand how the TLD operation works (which is used to implement texelFetch in GLSL). It does not appear that the op takes an explicit sampler id at all (unlike all the
2011 Dec 14
2
[LLVMdev] Changes to the PTX calling conventions
Hi all, On 12/13/2011 10:50 PM, Justin Holewinski wrote: > You mean having no calling convention for device functions, and a new, common > calling convention for kernels? I think this might make sense. One major issue with OpenCL C (and I suppose CUDA) kernels some fail to see is that the functions are "directly callable" (just by choosing the correct calling convention) in
2015 Jan 11
2
[PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
On Sun, Jan 11, 2015 at 5:08 PM, Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> wrote: > > > On 11.01.2015 22:54, Ilia Mirkin wrote: >> >> On Sun, Jan 11, 2015 at 4:40 PM, Tobias Klausmann >> <tobias.johannes.klausmann at mni.thm.de> wrote: >>> >>> Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, >>>
2004 Jul 29
3
extracting the t-statistic: just the numbers, please
Hi, there I am quite sure there is an easy answer to this, but I am unsure how to gather a bunch of t-statistics in an organized format. I am trying to generate a list of t-statistics for a randomization routine. If I try to collect a bunch of t-statistics from a run, this is what happens: > M <- 10 ; simt <- NULL > for(i in 1:M) + { + perm<-sample(site,replace=F) + +
2015 Sep 14
2
[RFC] Refinement of convergent semantics
> On Sep 14, 2015, at 12:15 PM, Philip Reames <listmail at philipreames.com> wrote: > > On 09/04/2015 01:25 PM, Owen Anderson via llvm-dev wrote: >> Hi all, >> >> In light of recent discussions regarding updating passes to respect convergent semantics, and whether or not it is sufficient for barriers, I would like to propose a change in convergent semantics that
2011 Dec 14
0
[LLVMdev] Changes to the PTX calling conventions
2011/12/14 Pekka Jääskeläinen <pekka.jaaskelainen at tut.fi> > Hi all, > > On 12/13/2011 10:50 PM, Justin Holewinski wrote: > > You mean having no calling convention for device functions, and a new, > common > > calling convention for kernels? > > I think this might make sense. > To be clear, I do like the idea of using the default calling convention for
2013 Jan 25
4
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
On 01/25/2013 09:56 AM, Nadav Rotem wrote: > Thanks for checking the Loop Vectorizer, I am interested in hearing your > feedback. The Loop Vectorizer does not fit here. OpenCL vectorization is > completely different because the language itself is data-parallel. You > don't need all of the legality checks that the loop vectorizer has. I'm aware of this and it was my point in