Jingyue Wu via llvm-dev
2015-Aug-21 20:24 UTC
[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?
Hi Justin, Is a compiler allowed to inline a function that calls __syncthreads? I saw nvcc does that, but not sure it's valid though. For example, void foo() { __syncthreads(); } if (threadIdx.x % 2 == 0) { ... foo(); } else { ... foo(); } Before inlining, all threads meet at one __syncthreads(). After inlining if (threadIdx.x % 2 == 0) { ... __syncthreads(); } else { ... __syncthreads(); } The __syncthreads call is duplicated, and it's no longer guaranteed that all threads can meet one __syncthreads(). Any thoughts? Jingyue -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150821/196c4b79/attachment.html>
Justin Holewinski via llvm-dev
2015-Aug-21 20:56 UTC
[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?
That’s an interesting case. AFAIK, inlining should be restricted here for the reason you mention. Inlining should only be valid if it doesn’t duplicate the barrier. Which nvcc shows this behavior? Adding Vinod and Yuan for comment.> On Aug 21, 2015, at 4:24 PM, Jingyue Wu <jingyue at google.com> wrote: > > Hi Justin, > > Is a compiler allowed to inline a function that calls __syncthreads? I saw nvcc does that, but not sure it's valid though. For example, > > void foo() { > __syncthreads(); > } > > if (threadIdx.x % 2 == 0) { > ... > foo(); > } else { > ... > foo(); > } > > Before inlining, all threads meet at one __syncthreads(). After inlining > > if (threadIdx.x % 2 == 0) { > ... > __syncthreads(); > } else { > ... > __syncthreads(); > } > > The __syncthreads call is duplicated, and it's no longer guaranteed that all threads can meet one __syncthreads(). > > Any thoughts? > > Jingyue----------------------------------------------------------------------------------- This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. -----------------------------------------------------------------------------------
David Majnemer via llvm-dev
2015-Aug-21 21:00 UTC
[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?
Perhaps it is semantics preserving so long as the __syncthreads callsite is marked noduplicate? https://github.com/llvm-mirror/llvm/blob/896f064a4900458e3fb245ad3f6fc9e7a3d8c8cd/lib/Analysis/InlineCost.cpp#L1284 On Fri, Aug 21, 2015 at 1:56 PM, Justin Holewinski via llvm-dev < llvm-dev at lists.llvm.org> wrote:> That’s an interesting case. AFAIK, inlining should be restricted here for > the reason you mention. Inlining should only be valid if it doesn’t > duplicate the barrier. Which nvcc shows this behavior? > > Adding Vinod and Yuan for comment. > > > On Aug 21, 2015, at 4:24 PM, Jingyue Wu <jingyue at google.com> wrote: > > > > Hi Justin, > > > > Is a compiler allowed to inline a function that calls __syncthreads? I > saw nvcc does that, but not sure it's valid though. For example, > > > > void foo() { > > __syncthreads(); > > } > > > > if (threadIdx.x % 2 == 0) { > > ... > > foo(); > > } else { > > ... > > foo(); > > } > > > > Before inlining, all threads meet at one __syncthreads(). After inlining > > > > if (threadIdx.x % 2 == 0) { > > ... > > __syncthreads(); > > } else { > > ... > > __syncthreads(); > > } > > > > The __syncthreads call is duplicated, and it's no longer guaranteed that > all threads can meet one __syncthreads(). > > > > Any thoughts? > > > > Jingyue > > > > ----------------------------------------------------------------------------------- > This email message is for the sole use of the intended recipient(s) and > may contain > confidential information. Any unauthorized review, use, disclosure or > distribution > is prohibited. If you are not the intended recipient, please contact the > sender by > reply email and destroy all copies of the original message. > > ----------------------------------------------------------------------------------- > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150821/3713242b/attachment.html>
Jingyue Wu via llvm-dev
2015-Aug-21 22:11 UTC
[llvm-dev] [CUDA/NVPTX] is inlining __syncthreads allowed?
I'm using 7.0. I am attaching the reduced example. nvcc sync.cu -arch=sm_35 -ptx gives // .globl _Z3foov .visible .entry _Z3foov( ) { .reg .pred %p<2>; .reg .s32 %r<3>; mov.u32 %r1, %tid.x; and.b32 %r2, %r1, 1; setp.eq.b32 %p1, %r2, 1; @!%p1 bra BB7_2; bra.uni BB7_1; BB7_1: bar.sync 0; bra.uni BB7_3; BB7_2: bar.sync 0; BB7_3: ret; } As you see, bar.sync is duplicated. On Fri, Aug 21, 2015 at 1:56 PM, Justin Holewinski <jholewinski at nvidia.com> wrote:> That’s an interesting case. AFAIK, inlining should be restricted here for > the reason you mention. Inlining should only be valid if it doesn’t > duplicate the barrier. Which nvcc shows this behavior? > > Adding Vinod and Yuan for comment. > > > On Aug 21, 2015, at 4:24 PM, Jingyue Wu <jingyue at google.com> wrote: > > > > Hi Justin, > > > > Is a compiler allowed to inline a function that calls __syncthreads? I > saw nvcc does that, but not sure it's valid though. For example, > > > > void foo() { > > __syncthreads(); > > } > > > > if (threadIdx.x % 2 == 0) { > > ... > > foo(); > > } else { > > ... > > foo(); > > } > > > > Before inlining, all threads meet at one __syncthreads(). After inlining > > > > if (threadIdx.x % 2 == 0) { > > ... > > __syncthreads(); > > } else { > > ... > > __syncthreads(); > > } > > > > The __syncthreads call is duplicated, and it's no longer guaranteed that > all threads can meet one __syncthreads(). > > > > Any thoughts? > > > > Jingyue > > > > ----------------------------------------------------------------------------------- > This email message is for the sole use of the intended recipient(s) and > may contain > confidential information. Any unauthorized review, use, disclosure or > distribution > is prohibited. If you are not the intended recipient, please contact the > sender by > reply email and destroy all copies of the original message. > > ----------------------------------------------------------------------------------- >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150821/5538a63f/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: sync.cu Type: application/octet-stream Size: 273 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150821/5538a63f/attachment.obj>
Reasonably Related Threads
- [CUDA/NVPTX] is inlining __syncthreads allowed?
- [LLVMdev] [NVPTX] Eliminate common sub-expressions in a group of similar GEPs
- [LLVMdev] [NVPTX] Eliminate common sub-expressions in a group of similar GEPs
- NVPTX - Reordering load instructions
- [LLVMdev] CUDA front-end (CUDA to LLVM IR)