thr3ads.net - llvm dev - [LLVMdev] Changes to the PTX calling conventions [Dec 2011]

Villmow, Micah

2011-Dec-13 20:37 UTC

[LLVMdev] Changes to the PTX calling conventions

From: Justin Holewinski [mailto:justin.holewinski at gmail.com]
Sent: Tuesday, December 13, 2011 10:50 AM
To: Villmow, Micah
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Changes to the PTX calling conventions

On Tue, Dec 13, 2011 at 12:54 PM, Villmow, Micah <Micah.Villmow at amd.com> wrote:

From: Justin Holewinski [mailto:justin.holewinski at gmail.com]
Sent: Tuesday, December 13, 2011 9:48 AM
To: Villmow, Micah
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Changes to the PTX calling conventions

On Tue, Dec 13, 2011 at 11:25 AM, Villmow, Micah <Micah.Villmow at amd.com> wrote:
Currently, PTX has its own calling conventions, which are split into
kernel/device.
The AMDIL backend requires very similar calling conventions, and I was
wondering if we could change the calling conventions from PTX_* to something
more generic?

Maybe just Kernel/Device? Or would it be preferable to add a new calling
convention that is unique for each target, even though it duplicates
functionality?

I don't see any reason why a generic calling convention would not work.  We
could do something like cl_device/cl_kernel.  I hate to introduce OpenCL terms
into a back-end where OpenCL is just one consumer, but it does map cleanly to
the architecture model.  Or perhaps something more generic like
gpu_device/gpu_global.

[Villmow, Micah] Yeah, but this should apply to more than just GPUs. For
example, AMD's OpenCL CPU implementation could utilize the calling
conventions, along with projects like ocelot that have the device-only vs
host/device differentiation. Maybe just device/host is good enough?

Device/host just seems vague.  Maybe we could create a set of specific
conventions, one set for OpenCL: cl_device/cl_kernel, and another set for
general accelerators, e.g. accel_device/accel_global.

[Villmow, Micah] Yeah, that is true. What about leaving the calling convention
alone for 'device' and just having a calling convention for 'kernel' (i.e.
functions callable from another device)? The normal calling conventions handle
calls from the same device, but there is no calling convention that handles
functions that are callable from a separate device. This would handle the
CPU/GPU and accelerator cases. That, I believe, is the fundamental difference
between the two calling conventions that OpenCL uses.

Thanks,
Micah

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu
http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Justin Holewinski

2011-Dec-13 20:50 UTC

[LLVMdev] Changes to the PTX calling conventions

On Tue, Dec 13, 2011 at 3:37 PM, Villmow, Micah <Micah.Villmow at amd.com> wrote:
> [Villmow, Micah] Yeah, that is true. What about leaving the calling
> convention alone for 'device' and just having a calling convention for
> 'kernel' (i.e. functions callable from another device)? The normal calling
> conventions handle calls from the same device, but there is no calling
> convention that handles functions that are callable from a separate device.
> This would handle the CPU/GPU and accelerator cases. That, I believe, is the
> fundamental difference between the two calling conventions that OpenCL uses.

You mean having no calling convention for device functions, and a new,
common calling convention for kernels?

While this would work in practice, my issue with this approach is that it
goes against the LLVM reference:

*"ccc" - The C calling convention*: This calling convention (the default if
no other calling convention is specified) matches the target C calling
conventions. This calling convention supports varargs function calls and
tolerates some mismatch in the declared prototype and implemented
declaration of the function (as does normal C).

Our devices do not really have a "C calling convention," so the default
does not make much sense.  However, I have no objection to modifying the
documentation to state that the C calling convention is the default for
targets that support that convention.

-- 

Thanks,

Justin Holewinski

Peter Collingbourne

2011-Dec-13 21:42 UTC

[LLVMdev] Changes to the PTX calling conventions

On Tue, Dec 13, 2011 at 03:50:28PM -0500, Justin Holewinski wrote:
> You mean having no calling convention for device functions, and a new,
> common calling convention for kernels?
>
> While this would work in practice, my issue with this approach is that it
> goes against the LLVM reference:
>
> *"ccc" - The C calling convention*: This calling convention (the default if
> no other calling convention is specified) matches the target C calling
> conventions. This calling convention supports varargs function calls and
> tolerates some mismatch in the declared prototype and implemented
> declaration of the function (as does normal C).
>
> Our devices do not really have a "C calling convention," so the default
> does not make much sense.  However, I have no objection to modifying the
> documentation to state that the C calling convention is the default for
> targets that support that convention.

I think that we should have the default calling convention map
to *something* on every target.  On PTX, the ptx_device calling
convention makes sense.

The reason I would like the C calling convention to be supported is
that it allows us to write callable generic functions in LLVM bitcode.
In libclc I had to write identity wrappers for each target [1] around
functions implemented in LLVM bitcode just to be able to call them,
and it would be much more convenient if I didn't have to do this for
every target.

Thanks,
-- 
Peter

[1] http://git.pcc.me.uk/?p=~peter/libclc.git;a=blob;f=ptx/lib/integer/add_sat.ll;h=9b8311cfb9ce9158f5359fec0d864e37e132b701;hb=3637bed2ea414a417e8e0d8d0d22e29cb0d3767e

Pekka Jääskeläinen

2011-Dec-14 07:47 UTC

[LLVMdev] Changes to the PTX calling conventions

Hi all,

On 12/13/2011 10:50 PM, Justin Holewinski wrote:
> You mean having no calling convention for device functions, and a new,
> common calling convention for kernels?

I think this might make sense.

One major issue with OpenCL C (and I suppose CUDA) kernels that some
fail to see is that the functions are "directly callable"
(just by choosing the correct calling convention) in general only on
SIMT/SPMD-style machines (like NVIDIA's and, I suppose, AMD's GPUs).

For MIMD CPU architectures (possibly with SIMD/vector extensions)
you need to transform the kernel function into a "work group function"
so that it retains its parallel work item semantics whenever the kernel is
to be called with more than one parallel work item.
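
As a minimal sketch of the simplest case (a kernel without barriers), the
work group function can be a plain loop over the work item IDs around the
original kernel body. The names vec_add_work_item and vec_add_work_group and
the explicit local_id parameter are hypothetical and only serve the
illustration; this is not the code that pocl or any other implementation
actually generates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-work-item kernel body. On a SIMT device the hardware
 * would supply the work item ID; here it is passed in explicitly. */
void vec_add_work_item(size_t local_id, const float *a, const float *b,
                       float *out)
{
    out[local_id] = a[local_id] + b[local_id];
}

/* Hypothetical work group function for a CPU target: a single call runs
 * every work item of the group, one after another. */
void vec_add_work_group(size_t local_size, const float *a, const float *b,
                        float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t id = 0; id < local_size; ++id)
        vec_add_work_item(id, a, b, out);
}
```

The host side then calls vec_add_work_group once per work group instead of
launching one thread per work item.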

The transformation is not completely trivial due to the work
group (WG) barrier semantics. You can have barriers inside for-loops,
conditional blocks, etc. which makes it a more difficult compilation
problem than "just adding a loop around the whole WI kernel function".
Converting the "single WI kernel semantics" to work group
functions statically, while avoiding threads for WI execution,
is the main source of complexity that the pocl project [1] has to
deal with.
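
To illustrate why the barrier matters, here is a hand-written sketch of a
kernel that reverses the elements of its work group through local scratch
memory. The function names, MAX_LOCAL_SIZE and the zero-initialized scratch
array are assumptions made for the example. It covers only a single
top-level barrier; barriers inside loops or conditionals, which are the hard
part mentioned above, are not handled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCAL_SIZE 64 /* assumed upper bound on the work group size */

/* The single-work-item kernel being modeled, in OpenCL-like pseudocode:
 *
 *     scratch[lid] = in[lid];
 *     barrier(CLK_LOCAL_MEM_FENCE);
 *     out[lid] = scratch[local_size - 1 - lid];
 */

/* Naive version: one loop around the whole body. This ignores the barrier,
 * so a work item may read a scratch slot that has not been written yet. */
void reverse_naive_loop(size_t local_size, const int *in, int *out)
{
    int scratch[MAX_LOCAL_SIZE] = {0}; /* stands in for __local memory */
    assert(local_size <= MAX_LOCAL_SIZE);
    for (size_t lid = 0; lid < local_size; ++lid) {
        scratch[lid] = in[lid];
        out[lid] = scratch[local_size - 1 - lid];
    }
}

/* Barrier-aware version: the body is split at the barrier, and each region
 * gets its own loop over all work items. */
void reverse_work_group(size_t local_size, const int *in, int *out)
{
    int scratch[MAX_LOCAL_SIZE] = {0};
    assert(local_size <= MAX_LOCAL_SIZE);

    /* Region before the barrier, executed for every work item. */
    for (size_t lid = 0; lid < local_size; ++lid)
        scratch[lid] = in[lid];

    /* The barrier is the boundary between the two loops. */

    /* Region after the barrier, executed for every work item. */
    for (size_t lid = 0; lid < local_size; ++lid)
        out[lid] = scratch[local_size - 1 - lid];
}
```

With a work group of four items and input {1, 2, 3, 4}, the split version
produces {4, 3, 2, 1}, while the naive loop produces {0, 0, 2, 1} because the
first two work items read scratch slots before they are filled.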

For OpenCL compilation I think it's common to inline everything into
the kernel functions, so the "device functions" usually just disappear.
This makes sense for SIMT and also when you do vectorization across
WIs of a WG, or in general want to improve the DLP/ILP of the kernel.
That said, you might not want to fully inline with all targets
(e.g. with a CPU with SIMD + OoOE you might want to reduce the icache
footprint and not inline).

Therefore, the kernel functions in this sense are different from the
device functions, and at least the metadata that marks the kernels is
still needed. In pocl, OpenCL compilation is now enabled for all
(CPU) targets supported by LLVM, depending solely on the kernel metadata.
If *only* the kernel functions are marked with this calling
convention, the kernel metadata might not be needed. But you still
might need the calling convention for the device functions if you
cannot assume that they always get inlined.

[1] https://launchpad.net/pocl

Best regards,
--
--Pekka
