thr3ads.net - llvm dev - [llvm-dev] [CUDA/NVPTX] is inlining _

If this information is useful, please help other people find it:
Share via:

Jingyue Wu via llvm-dev

2015-Aug-21 20:24 UTC

[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?

Hi Justin,

Is a compiler allowed to inline a function that calls __syncthreads? I saw
nvcc does that, but not sure it's valid though. For example,

void foo() {
  __syncthreads();
}

if (threadIdx.x % 2 == 0) {
  ...
  foo();
} else {
  ...
  foo();
}

Before inlining, all threads meet at one __syncthreads(). After inlining

if (threadIdx.x % 2 == 0) {
  ...
  __syncthreads();
} else {
  ...
  __syncthreads();
}

The __syncthreads call is duplicated, and it's no longer guaranteed that
all threads can meet one __syncthreads().

Any thoughts?

Jingyue
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150821/196c4b79/attachment.html>

Justin Holewinski via llvm-dev

2015-Aug-21 20:56 UTC

head link

[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?

That’s an interesting case.  AFAIK, inlining should be restricted here for the
reason you mention.  Inlining should only be valid if it doesn’t duplicate the
barrier.  Which nvcc shows this behavior?

Adding Vinod and Yuan for comment.
> On Aug 21, 2015, at 4:24 PM, Jingyue Wu <jingyue at google.com>
wrote:
> 
> Hi Justin, 
> 
> Is a compiler allowed to inline a function that calls __syncthreads? I saw
nvcc does that, but not sure it's valid though. For example,
> 
> void foo() {
>   __syncthreads();
> }
> 
> if (threadIdx.x % 2 == 0) {
>   ...
>   foo();
> } else {
>   ...
>   foo();
> }
> 
> Before inlining, all threads meet at one __syncthreads(). After inlining
> 
> if (threadIdx.x % 2 == 0) {
>   ...
>   __syncthreads();
> } else {
>   ...
>   __syncthreads();
> }
> 
> The __syncthreads call is duplicated, and it's no longer guaranteed
that all threads can meet one __syncthreads().
> 
> Any thoughts? 
> 
> Jingyue

-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may
contain
confidential information.  Any unauthorized review, use, disclosure or
distribution
is prohibited.  If you are not the intended recipient, please contact the sender
by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------

David Majnemer via llvm-dev

2015-Aug-21 21:00 UTC

head link

[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?

Perhaps it is semantics preserving so long as the __syncthreads callsite is
marked noduplicate?

https://github.com/llvm-mirror/llvm/blob/896f064a4900458e3fb245ad3f6fc9e7a3d8c8cd/lib/Analysis/InlineCost.cpp#L1284

On Fri, Aug 21, 2015 at 1:56 PM, Justin Holewinski via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> That’s an interesting case.  AFAIK, inlining should be restricted here for
> the reason you mention.  Inlining should only be valid if it doesn’t
> duplicate the barrier.  Which nvcc shows this behavior?
>
> Adding Vinod and Yuan for comment.
>
> > On Aug 21, 2015, at 4:24 PM, Jingyue Wu <jingyue at google.com>
wrote:
> >
> > Hi Justin,
> >
> > Is a compiler allowed to inline a function that calls __syncthreads? I
> saw nvcc does that, but not sure it's valid though. For example,
> >
> > void foo() {
> >   __syncthreads();
> > }
> >
> > if (threadIdx.x % 2 == 0) {
> >   ...
> >   foo();
> > } else {
> >   ...
> >   foo();
> > }
> >
> > Before inlining, all threads meet at one __syncthreads(). After
inlining
> >
> > if (threadIdx.x % 2 == 0) {
> >   ...
> >   __syncthreads();
> > } else {
> >   ...
> >   __syncthreads();
> > }
> >
> > The __syncthreads call is duplicated, and it's no longer
guaranteed that
> all threads can meet one __syncthreads().
> >
> > Any thoughts?
> >
> > Jingyue
>
>
>
>
-----------------------------------------------------------------------------------
> This email message is for the sole use of the intended recipient(s) and
> may contain
> confidential information.  Any unauthorized review, use, disclosure or
> distribution
> is prohibited.  If you are not the intended recipient, please contact the
> sender by
> reply email and destroy all copies of the original message.
>
>
-----------------------------------------------------------------------------------
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150821/3713242b/attachment.html>

Jingyue Wu via llvm-dev

2015-Aug-21 22:11 UTC

head link

[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?

I'm using 7.0. I am attaching the reduced example.

nvcc sync.cu -arch=sm_35 -ptx

gives

        // .globl       _Z3foov
.visible .entry _Z3foov(

)
{
        .reg .pred      %p<2>;
        .reg .s32       %r<3>;


        mov.u32         %r1, %tid.x;
        and.b32         %r2, %r1, 1;
        setp.eq.b32     %p1, %r2, 1;
        @!%p1 bra       BB7_2;
        bra.uni         BB7_1;

BB7_1:
        bar.sync        0;
        bra.uni         BB7_3;

BB7_2:
        bar.sync        0;

BB7_3:
        ret;
}

As you see, bar.sync is duplicated.

On Fri, Aug 21, 2015 at 1:56 PM, Justin Holewinski <jholewinski at
nvidia.com>
wrote:
> That’s an interesting case.  AFAIK, inlining should be restricted here for
> the reason you mention.  Inlining should only be valid if it doesn’t
> duplicate the barrier.  Which nvcc shows this behavior?
>
> Adding Vinod and Yuan for comment.
>
> > On Aug 21, 2015, at 4:24 PM, Jingyue Wu <jingyue at google.com>
wrote:
> >
> > Hi Justin,
> >
> > Is a compiler allowed to inline a function that calls __syncthreads? I
> saw nvcc does that, but not sure it's valid though. For example,
> >
> > void foo() {
> >   __syncthreads();
> > }
> >
> > if (threadIdx.x % 2 == 0) {
> >   ...
> >   foo();
> > } else {
> >   ...
> >   foo();
> > }
> >
> > Before inlining, all threads meet at one __syncthreads(). After
inlining
> >
> > if (threadIdx.x % 2 == 0) {
> >   ...
> >   __syncthreads();
> > } else {
> >   ...
> >   __syncthreads();
> > }
> >
> > The __syncthreads call is duplicated, and it's no longer
guaranteed that
> all threads can meet one __syncthreads().
> >
> > Any thoughts?
> >
> > Jingyue
>
>
>
>
-----------------------------------------------------------------------------------
> This email message is for the sole use of the intended recipient(s) and
> may contain
> confidential information.  Any unauthorized review, use, disclosure or
> distribution
> is prohibited.  If you are not the intended recipient, please contact the
> sender by
> reply email and destroy all copies of the original message.
>
>
-----------------------------------------------------------------------------------
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150821/5538a63f/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sync.cu
Type: application/octet-stream
Size: 273 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150821/5538a63f/attachment.obj>

Apparently Analagous Threads

Search for more possibly parallel threads

llvm dev - Aug 2015 - [CUDA/NVPTX] is inlining __syncthreads allowed?

[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?

[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?

[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?

[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?

Apparently Analagous Threads