2015 Aug 21
2
[CUDA/NVPTX] is inlining __syncthreads allowed?
...u -arch=sm_35 -ptx
gives
// .globl _Z3foov
.visible .entry _Z3foov(
)
{
    .reg .pred %p<2>;
    .reg .s32 %r<3>;

    mov.u32 %r1, %tid.x;
    and.b32 %r2, %r1, 1;
    setp.eq.b32 %p1, %r2, 1;
    @!%p1 bra BB7_2;
    bra.uni BB7_1;
BB7_1:
    bar.sync 0;
    bra.uni BB7_3;
BB7_2:
    bar.sync 0;
BB7_3:
    ret;
}
As you can see, bar.sync is duplicated: one copy in BB7_1 and one in BB7_2.
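For contrast, here is a minimal sketch (not from this thread) of the uncontroversial shape, where the barrier sits after the divergent region so the whole block meets at a single bar.sync. The names hoisted_barrier, run_hoisted_barrier and kThreads are hypothetical, and the even/odd stores merely stand in for the branch-specific work.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical block size, chosen only for illustration.
constexpr int kThreads = 64;

// Each thread does branch-specific work, then every thread in the block
// reaches the same single barrier placed after the if/else.
__global__ void hoisted_barrier(int *out) {
  __shared__ int buf[kThreads];
  const int tid = threadIdx.x;
  if (tid % 2 == 0) {
    buf[tid] = tid;   // stand-in for the even-thread work
  } else {
    buf[tid] = -tid;  // stand-in for the odd-thread work
  }
  __syncthreads();    // one barrier, reached uniformly by the block
  // Read the neighbour's slot; safe only because of the barrier above.
  out[tid] = buf[(tid + 1) % kThreads];
}

// Host helper: launches one block and copies the result back.
// Returns an empty vector if a CUDA call fails.
std::vector<int> run_hoisted_barrier() {
  std::vector<int> host(kThreads, 0);
  int *dev = nullptr;
  if (cudaMalloc(&dev, kThreads * sizeof(int)) != cudaSuccess) return {};
  hoisted_barrier<<<1, kThreads>>>(dev);
  cudaError_t err = cudaMemcpy(host.data(), dev, kThreads * sizeof(int),
                               cudaMemcpyDeviceToHost);
  cudaFree(dev);
  if (err != cudaSuccess) return {};
  return host;
}
```

Calling run_hoisted_barrier() from main and inspecting the returned vector is enough to try it. Whether the duplicated form is equally valid is exactly the open question in this thread, so this sketch only shows the form that does not depend on the answer.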
On Fri, Aug 21, 2015 at 1:56 PM, Justin Holewinski <jholewinski at nvidia.com>
wrote:
> That’s...
2015 Aug 21
3
[CUDA/NVPTX] is inlining __syncthreads allowed?
Hi Justin,
Is a compiler allowed to inline a function that calls __syncthreads? I saw
that nvcc does this, but I'm not sure it's valid. For example,
void foo() {
    __syncthreads();
}

if (threadIdx.x % 2 == 0) {
    ...
    foo();
} else {
    ...
    foo();
}
Before inlining, all threads meet at one __syncthreads(). After inlining:
if (threadIdx.x % 2 == 0) {
    ...
    __syncthreads();
} else {
    ...