thr3ads.net - search: "__syncthreads"

Displaying 14 results from an estimated 14 matches for "__syncthreads".

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

Hi Justin,

Is a compiler allowed to inline a function that calls __syncthreads? I saw nvcc does that, but not sure it's valid though. For example,

    void foo() {
      __syncthreads();
    }

    if (threadIdx.x % 2 == 0) {
      ...
      foo();
    } else {
      ...
      foo();
    }

Before inlining, all threads meet at one __syncthreads(). After inlining

    if (threadIdx.x % 2 == 0) {
      ...
      __syncthrea...

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

...; duplicate the barrier. Which nvcc shows this behavior?
>
> Adding Vinod and Yuan for comment.
>
> > On Aug 21, 2015, at 4:24 PM, Jingyue Wu <jingyue at google.com> wrote:
> >
> > Hi Justin,
> >
> > Is a compiler allowed to inline a function that calls __syncthreads? I saw nvcc does that, but not sure it's valid though. For example,
> >
> > void foo() {
> >   __syncthreads();
> > }
> >
> > if (threadIdx.x % 2 == 0) {
> >   ...
> >   foo();
> > } else {
> >   ...
> >   foo();
> > }...

__syncthreads() intrinsic seems to be missing from __clang_cuda_device_functions.h

2020 May 22

Hi! I see the `__syncthreads_{and|or|count}()` CUDA intrinsics listed in https://github.com/llvm/llvm-project/blob/master/clang/lib/Headers/__clang_cuda_device_functions.h but the vanilla `__syncthreads()` intrinsic seems to be missing. Is this an oversight, or is it omitted for a specific reason? Thank you!

[LLVMdev] CUDA front-end (CUDA to LLVM IR)

2015 Apr 08

Hi, I wanted to ask whether there is an ongoing effort (or an already established tool) to convert CUDA kernels (that use CUDA-specific intrinsics, e.g., threadIdx.x, __syncthreads(), ...) to LLVM IR. I am aware that I can do this for OpenCL with the help of libclc, but I cannot find anything similar for CUDA. Thanks

[LLVMdev] CUDA front-end (CUDA to LLVM IR)

2015 Apr 08

...2015-04-08 19:01 GMT+02:00 Ahmed ElTantawy <ahmede at ece.ubc.ca>:
>> Hi,
>>
>> I wanted to ask whether there is an ongoing effort (or an already established tool) to convert CUDA kernels (that use CUDA-specific intrinsics, e.g., threadIdx.x, __syncthreads(), ...) to LLVM IR. I am aware that I can do this for OpenCL with the help of libclc, but I cannot find anything similar for CUDA.
>>
>> Thanks

Missing TargetPrefix for NVVM intrinsics

2016 Jul 01

...There seem to be three prefixes that are used in this file. About 900 are int_nvvm_*, 30 are int_ptx_*, and 1 is int_cuda. It isn't clear to me if this inconsistency is intentional or warranted - should these all be named int_nvvm_*? Is there a good reason to differentiate int_ptx_*? Why does __syncthreads map to int_cuda_syncthreads, rather than int_nvvm_syncthreads? I'm probably going to go ahead and add the TargetPrefix to the nvvm intrinsics, but I'm not familiar enough with NVPTX to know what to do with the others. Thanks, -- Justin

[LLVMdev] RFC: Convergent attribute

2015 Aug 14

...xample, if a function containing a convergent instruction is called at two call sites, inlining the function produces two convergent instructions. Neither of the two is control equivalent to the original, but combined they are in some sense. I came across this while thinking about whether __syncthreads in CUDA should be tagged "convergent". Right now, it's tagged as noduplicate, so inlining and loop unrolling are disallowed. But I think noduplicate is too strong for the semantics of convergent. Jingyue On Wed, May 13, 2015 at 1:17 PM, Owen Anderson <resistor...

[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

2012 May 01

> ...t seems to only be possible to use the bar.sync 0 instruction. Unless this is being removed for PTX 3.0, the spec (and the PTX backend) support using bar.sync {0..15}. The old PTX intrinsic also supports a non-zero integer operand.

The NVVM intrinsic is there to implement CUDA's __syncthreads(). The old intrinsic is still exposed. I'll see about adding a proper NVVM intrinsic.

> * I guess this raises the question of whether or not it's actually worthwhile retaining compatibility with the old backend. I converted my Jet compiler to use NVVM intrinsics and st...

[LLVMdev] RFC: Convergent attribute

2015 Aug 14

...function containing a convergent instruction is called at two call sites, inlining the function produces two convergent instructions. Neither of the two is control equivalent to the original, but combined they are in some sense. I came across this while thinking about whether __syncthreads in CUDA should be tagged "convergent". Right now, it's tagged as noduplicate, so inlining and loop unrolling are disallowed. But I think noduplicate is too strong for the semantics of convergent. Jingyue On Wed, May 13, 2015 at 1:17 PM, Owen Anderson...

[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

2012 May 02

> ...s this is being removed for PTX 3.0, the spec (and the PTX backend) support using bar.sync {0..15}. The old PTX intrinsic also supports a non-zero integer operand.

The NVVM intrinsic is there to implement CUDA's __syncthreads(). The old intrinsic is still exposed. I'll see about adding a proper NVVM intrinsic.

> * I guess this raises the question of whether or not it's actually worthwhile retaining compatibility with the...

[LLVMdev] Behaviour of NVPTX intrinsic

2014 Sep 30

I have written test.ll as below and ran 'opt' on it as "opt -std-compile-opts test.ll -S -o -". But the output shows that there is code motion around the barrier intrinsics.

test.ll
-------
; ModuleID = 'test.bc'

define void @test(i16* %I_0, i16* %I_1, i16* %I_2, i16* %I_3, i16* %O_0) {
entry:
  %T_0 = load volatile i16* %I_0
  %T_1 = load volatile i16* %I_1
  %T_2 =

[LLVMdev] RFC: Convergent attribute

2015 May 13

Below is a proposal for a new "convergent" intrinsic attribute and MachineInstr property, needed for correctly modeling many SPMD/SIMT programming models in LLVM. Comments and feedback welcome. —Owen In order to make LLVM more suitable for programming models variously called SPMD and SIMT, we would like to propose a new intrinsic and MachineInstr annotation called

[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

2012 Apr 29

Justin,

Firstly, this is great! It seems to be so much further forward in terms of features

[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

2012 Apr 27

Thanks for the feedback! The attached patch addresses the style issues that have been found.

From: Jim Grosbach [mailto:grosbach at apple.com]
Sent: Wednesday, April 25, 2012 2:22 PM
To: Justin Holewinski
Cc: llvm-commits at cs.uiuc.edu; llvmdev at cs.uiuc.edu; Vinod Grover
Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend

Hi Justin,

Cool stuff, to be sure. Excited to see this. As a
