From: Justin Holewinski [mailto:justin.holewinski at gmail.com]
Sent: Tuesday, December 13, 2011 10:50 AM
To: Villmow, Micah
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Changes to the PTX calling conventions

On Tue, Dec 13, 2011 at 12:54 PM, Villmow, Micah <Micah.Villmow at amd.com> wrote:

From: Justin Holewinski [mailto:justin.holewinski at gmail.com]
Sent: Tuesday, December 13, 2011 9:48 AM
To: Villmow, Micah
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Changes to the PTX calling conventions

On Tue, Dec 13, 2011 at 11:25 AM, Villmow, Micah <Micah.Villmow at amd.com> wrote:

Currently, PTX has its own calling conventions where they are split into kernel/device. The AMDIL backend requires very similar calling conventions and I was wondering if we could change the calling conventions from PTX_* to something more generic?

Maybe just Kernel/Device? Or would it be preferable to add a new calling convention that is unique for each target, even though it duplicates functionality?

I don't see any reason why a generic calling convention would not work. We could do something like cl_device/cl_kernel. I hate to introduce OpenCL terms into a back-end where OpenCL is just one consumer, but it does map cleanly to the architecture model. Or perhaps something more generic like gpu_device/gpu_global.

[Villmow, Micah] Yeah, but this should apply to more than just GPUs. For example, AMD's OpenCL CPU implementation could utilize the calling conventions, along with projects like ocelot that have the device-only vs host/device differentiation. Maybe just device/host is good enough?

Device/host just seems vague. Maybe we could create a set of specific conventions, one set for OpenCL: cl_device/cl_kernel, and another set for general accelerators, e.g. accel_device/accel_global.

[Villmow, Micah] Yeah, that is true. What about leaving the calling convention alone for 'device' and just having a calling convention for 'kernel' (i.e. functions callable from another device)? The normal calling conventions handle calls from the same device, but there is no calling convention that handles functions that are callable from a separate device. This would handle the CPU/GPU and accelerator cases. That, I believe, is the fundamental difference between the two calling conventions that OpenCL uses.

Thanks,
Micah

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

--
Thanks,
Justin Holewinski
On Tue, Dec 13, 2011 at 3:37 PM, Villmow, Micah <Micah.Villmow at amd.com> wrote:

> [Villmow, Micah] Yeah, that is true. What about leaving the calling
> convention alone for 'device' and just having a calling convention for
> 'kernel' (i.e. functions callable from another device)? The normal calling
> conventions handle calls from the same device, but there is no calling
> convention that handles functions that are callable from a separate device.
> This would handle the CPU/GPU and accelerator cases. That, I believe, is the
> fundamental difference between the two calling conventions that OpenCL uses.

You mean having no calling convention for device functions, and a new, common calling convention for kernels?

While this would work in practice, my issue with this approach is that it goes against the LLVM reference:

"ccc" - The C calling convention: This calling convention (the default if no other calling convention is specified) matches the target C calling conventions. This calling convention supports varargs function calls and tolerates some mismatch in the declared prototype and implemented declaration of the function (as does normal C).

Our devices do not really have a "C calling convention," so the default does not make much sense. However, I have no objection to modifying the documentation to state that the C calling convention is the default for targets that support that convention.

--
Thanks,
Justin Holewinski
Peter Collingbourne
2011-Dec-13 21:42 UTC
[LLVMdev] Changes to the PTX calling conventions
On Tue, Dec 13, 2011 at 03:50:28PM -0500, Justin Holewinski wrote:

> You mean having no calling convention for device functions, and a new,
> common calling convention for kernels?
>
> While this would work in practice, my issue with this approach is that it
> goes against the LLVM reference:
>
> "ccc" - The C calling convention: This calling convention (the default if
> no other calling convention is specified) matches the target C calling
> conventions. This calling convention supports varargs function calls and
> tolerates some mismatch in the declared prototype and implemented
> declaration of the function (as does normal C).
>
> Our devices do not really have a "C calling convention," so the default
> does not make much sense. However, I have no objection to modifying the
> documentation to state that the C calling convention is the default for
> targets that support that convention.

I think that we should have the default calling convention map to *something* on every target. On PTX, the ptx_device calling convention makes sense.

The reason I would like the C calling convention to be supported is that it allows us to write callable generic functions in LLVM bitcode. In libclc I had to write identity wrappers for each target [1] around functions implemented in LLVM bitcode just to be able to call them, and it would be much more convenient if I didn't have to do this for every target.

Thanks,
--
Peter

[1] http://git.pcc.me.uk/?p=~peter/libclc.git;a=blob;f=ptx/lib/integer/add_sat.ll;h=9b8311cfb9ce9158f5359fec0d864e37e132b701;hb=3637bed2ea414a417e8e0d8d0d22e29cb0d3767e
Hi all,

On 12/13/2011 10:50 PM, Justin Holewinski wrote:

> You mean having no calling convention for device functions, and a new, common
> calling convention for kernels?

I think this might make sense.

One major issue with OpenCL C (and, I suppose, CUDA) kernels that some fail to see is that the functions are "directly callable" (just by choosing the correct calling convention) in general only on SIMT/SPMD-style machines (like NVIDIA's and, I suppose, AMD's GPUs).

For MIMD CPU architectures (with possible SIMD/vector extensions) you need to transform the kernel function into a "work group function" so that it retains its parallel work item semantics whenever the kernel is to be called with more than one parallel work item.

The transformation is not completely trivial due to the work group (WG) barrier semantics. You can have barriers inside for-loops, conditional blocks, etc., which makes it a more difficult compilation problem than "just adding a loop around the whole WI kernel function". Converting the "single WI kernel semantics" to work group functions statically, while avoiding threads for WI execution, is the main point of complexity the pocl project [1] has to go through.

For OpenCL compilation I think it's common to inline everything into the kernel functions, so the "device functions" usually just disappear. This makes sense for SIMT and also when you vectorize across the WIs of a WG, or in general want to improve the DLP/ILP of the kernel. That said, you might not want to fully inline on all targets (e.g. on a CPU with SIMD + OoOE you might want to reduce the icache footprint and not inline).

Therefore, the kernel functions in this sense are different from the device functions, and at least the metadata that marks the kernels is still needed. In pocl, OpenCL compilation is now enabled for all (CPU) targets supported by LLVM, depending solely on the kernel metadata. If *only* the kernel functions are marked with this calling convention, the kernel metadata might not be needed. But you still might need the calling convention for the device functions if you assume they are not always inlined.

[1] https://launchpad.net/pocl

Best regards,
--
--Pekka
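
The remark above that a barrier makes this harder than "just adding a loop around the whole WI kernel function" can be illustrated with a small sketch. It is not taken from pocl: the names (`reverse_naive`, `reverse_wg`, `before_barrier`, `after_barrier`, `WG_SIZE`) are hypothetical, the kernel is a toy that reverses one work group's data through local memory, and it only covers the easy case of a single barrier at the top level of the kernel. Barriers inside loops or conditionals, as mentioned above, need considerably more work.

```c
#include <stddef.h>

#define WG_SIZE 4 /* hypothetical fixed work-group size */

/* Kernel region before the barrier, for the work-item with local id lid. */
static void before_barrier(size_t lid, const int *in, int *wg_local)
{
    wg_local[lid] = in[lid];
}

/* Kernel region after the barrier: reads a slot written by another work-item. */
static void after_barrier(size_t lid, const int *wg_local, int *out)
{
    out[lid] = wg_local[WG_SIZE - 1 - lid];
}

/* Naive conversion: one loop around the whole work-item body.
 * Early work-items read slots that later work-items have not written yet,
 * so the barrier semantics are lost. */
void reverse_naive(const int *in, int *out)
{
    int wg_local[WG_SIZE] = {0};
    for (size_t lid = 0; lid < WG_SIZE; ++lid) {
        before_barrier(lid, in, wg_local);
        after_barrier(lid, wg_local, out);
    }
}

/* Work-group function: the body is split at the barrier and each region
 * gets its own loop over all work-items, so every write made before the
 * barrier is complete before any read after it. */
void reverse_wg(const int *in, int *out)
{
    int wg_local[WG_SIZE];
    for (size_t lid = 0; lid < WG_SIZE; ++lid)
        before_barrier(lid, in, wg_local);
    /* The barrier sits here: all work-items have finished the first region. */
    for (size_t lid = 0; lid < WG_SIZE; ++lid)
        after_barrier(lid, wg_local, out);
}
```

Called on the input {1, 2, 3, 4}, `reverse_wg` produces the reversed group, while `reverse_naive` leaves stale values in the first half of the output because those work-items run their post-barrier code too early.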