thr3ads.net - search: "bb7

Displaying 2 results from an estimated 2 matches for "bb7_2".

Did you mean: bb0_2

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

[CUDA/NVPTX] is inlining __syncthreads allowed?

...u -arch=sm_35 -ptx gives // .globl _Z3foov .visible .entry _Z3foov( ) { .reg .pred %p<2>; .reg .s32 %r<3>; mov.u32 %r1, %tid.x; and.b32 %r2, %r1, 1; setp.eq.b32 %p1, %r2, 1; @!%p1 bra BB7_2; bra.uni BB7_1; BB7_1: bar.sync 0; bra.uni BB7_3; BB7_2: bar.sync 0; BB7_3: ret; } As you see, bar.sync is duplicated. On Fri, Aug 21, 2015 at 1:56 PM, Justin Holewinski <jholewinski at nvidia.com> wrote: > That’s...

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

[CUDA/NVPTX] is inlining __syncthreads allowed?

Hi Justin, Is a compiler allowed to inline a function that calls __syncthreads? I saw nvcc does that, but not sure it's valid though. For example, void foo() { __syncthreads(); } if (threadIdx.x % 2 == 0) { ... foo(); } else { ... foo(); } Before inlining, all threads meet at one __syncthreads(). After inlining if (threadIdx.x % 2 == 0) { ... __syncthreads(); } else { ...

search for: bb7_2