thr3ads.net - llvm dev - [LLVMdev] Changes to the PTX calling conventions [Dec 2011]

Pekka Jääskeläinen

2011-Dec-14 07:47 UTC

[LLVMdev] Changes to the PTX calling conventions

Hi all,

On 12/13/2011 10:50 PM, Justin Holewinski wrote:
> You mean having no calling convention for device functions, and a new,
> common calling convention for kernels?

I think this might make sense.

One major issue with OpenCL C (and I suppose CUDA) kernels that some
fail to see is that the functions are "directly callable"
(just by choosing the correct calling convention) in general only on
SIMT/SPMD-style machines (like NVIDIA's and I suppose AMD's GPUs).

For MIMD CPU architectures (possibly with SIMD/vector extensions)
you need to transform the kernel function to a "work group function"
so it retains its parallel work item semantics whenever the kernel is
to be called with more than one work item.

The transformation is not completely trivial due to the work
group (WG) barrier semantics. You can have barriers inside for-loops,
conditional blocks, etc. which makes it a more difficult compilation
problem than "just adding a loop around the whole WI kernel function".
Converting the "single WI kernel semantics" to work group
functions statically while avoiding threads for WI execution
is the main point of complexity the pocl project [1] has to go
through.
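
As a rough illustration of the simplest case (a sketch only, not pocl's
actual implementation), a kernel with a single unconditional barrier can
be turned into a work group function by splitting it at the barrier and
looping over the work items in each region. The kernel, the name
rotate_wg and the WG_SIZE value are all hypothetical. Barriers inside
loops or conditionals need considerably more analysis than this.

```c
#define WG_SIZE 4  /* hypothetical work-group size */

/* Original single-work-item kernel, in OpenCL C terms (shown for reference,
 * not compiled here):
 *
 *   __kernel void rotate(__global int *out, __global const int *in) {
 *       __local int tmp[WG_SIZE];
 *       size_t i = get_local_id(0);
 *       tmp[i] = in[i] * 2;
 *       barrier(CLK_LOCAL_MEM_FENCE);
 *       out[i] = tmp[(i + 1) % WG_SIZE];
 *   }
 */

/* Work group function: one ordinary C call executes every work item. */
void rotate_wg(int *out, const int *in) {
    int tmp[WG_SIZE];  /* stands in for the __local buffer */

    /* Region before the barrier, executed for all work items. */
    for (int i = 0; i < WG_SIZE; ++i)
        tmp[i] = in[i] * 2;

    /* The barrier becomes the boundary between the two loops: every
     * work item has finished the first region before any starts the second. */

    /* Region after the barrier, executed for all work items. */
    for (int i = 0; i < WG_SIZE; ++i)
        out[i] = tmp[(i + 1) % WG_SIZE];
}
```

Wrapping the whole kernel body in one loop would be wrong here: work item
0 would read tmp[1] before work item 1 had written it.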

For OpenCL compilation I think it's common to inline everything into
the kernel functions so the "device functions" usually just disappear.
This makes sense for SIMT and also when you do vectorization across
WIs of a WG, or in general want to improve the DLP/ILP of the kernel.
That said, you might not want to fully inline with all targets
(e.g. with a CPU with SIMD + OoOE you might want to reduce the icache
footprint and not inline).

Therefore, the kernel functions in this sense are different from the
device functions and at least the metadata that marks the kernels is
still needed. In pocl the OpenCL compilation is now enabled for all
(CPU) targets supported by LLVM solely depending on the kernel metadata.
In case *only* the kernel functions are marked with this calling
convention, the kernel metadata might not be needed. But you might
still need the calling convention for the device functions if you
assume they are not always inlined.

[1] https://launchpad.net/pocl

Best regards,
--
--Pekka

Justin Holewinski

2011-Dec-14 12:41 UTC

2011/12/14 Pekka Jääskeläinen <pekka.jaaskelainen at tut.fi>
> Hi all,
>
> On 12/13/2011 10:50 PM, Justin Holewinski wrote:
> > You mean having no calling convention for device functions, and a new,
> > common calling convention for kernels?
>
> I think this might make sense.
>
To be clear, I do like the idea of using the default calling convention for
device functions.  My hesitation comes from the LLVM specification, which
says the default calling convention is the C calling convention, which
supports varargs.  If the spec is changed to make the supported features of
the C calling convention dependent on the target, then I'm fine with this.

Do any core LLVM devs have issues with this?

>
> One major issue with OpenCL C (and I suppose CUDA) kernels that some
> fail to see is that the functions are "directly callable"
> (just by choosing the correct calling convention) in general only on
> SIMT/SPMD-style machines (like NVIDIA's and I suppose AMD's GPUs).
>
> For MIMD CPU architectures (possibly with SIMD/vector extensions)
> you need to transform the kernel function to a "work group function"
> so it retains its parallel work item semantics whenever the kernel is
> to be called with more than one work item.
>
> The transformation is not completely trivial due to the work
> group (WG) barrier semantics. You can have barriers inside for-loops,
> conditional blocks, etc. which makes it a more difficult compilation
> problem than "just adding a loop around the whole WI kernel function".
> Converting the "single WI kernel semantics" to work group
> functions statically while avoiding threads for WI execution
> is the main point of complexity the pocl project [1] has to go
> through.
>
> For OpenCL compilation I think it's common to inline everything into
> the kernel functions so the "device functions" usually just disappear.
> This makes sense for SIMT and also when you do vectorization across
> WIs of a WG, or in general want to improve the DLP/ILP of the kernel.
> That said, you might not want to fully inline with all targets
> (e.g. with a CPU with SIMD + OoOE you might want to reduce the icache
> footprint and not inline).
>
> Therefore, the kernel functions in this sense are different from the
> device functions and at least the metadata that marks the kernels is
> still needed.  In pocl the OpenCL compilation is now enabled for all
> (CPU) targets supported by LLVM solely depending on the kernel metadata.
> In case *only* the kernel functions are marked with this calling
> convention, the kernel metadata might not be needed. But you might
> still need the calling convention for the device functions if you
> assume they are not always inlined.
>
We absolutely cannot rely on inlining.  An OpenCL front-end is only one
possible consumer of the PTX back-end, and general PTX supports recursion
which cannot always be inlined.

I would favor calling conventions over metadata for the simple reason that
this maps more cleanly to the device model.  Device and kernel functions
are represented differently in PTX, including (sometimes) the way
parameters are passed.

>
> [1] https://launchpad.net/pocl
>
> Best regards,
> --
> --Pekka
>


-- 

Thanks,

Justin Holewinski

Pekka Jääskeläinen

2011-Dec-14 12:54 UTC

On 12/14/2011 02:41 PM, Justin Holewinski wrote:
> I would favor calling conventions over metadata for the simple reason
> that this maps more cleanly to the device model.  Device and kernel
> functions are represented differently in PTX, including (sometimes) the
> way parameters are passed.

For the record, marking the kernels with "calling conventions" instead
of metadata is also fine for the pocl use case. It's enough if there is a
way to differentiate OpenCL C kernels from the "device functions", for the
reason I discussed in the previous email. That is, from the pocl point of
view we just need a way to pick the "host-callable" kernel functions, as
they need special treatment before they can be called (like a C function).
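
To make the "pick the kernels" step concrete, here is a minimal sketch
of selecting kernel functions by a calling-convention tag. The enum, the
struct and select_kernels are hypothetical stand-ins for illustration;
they are not LLVM's calling convention identifiers or any pocl API.

```c
#include <stddef.h>

/* Hypothetical tags, not LLVM's real calling convention IDs. */
enum call_conv { CC_DEVICE, CC_KERNEL };

/* Hypothetical per-function record a tool might keep. */
struct func_info {
    const char *name;
    enum call_conv cc;
};

/* Stores pointers to the kernel entries of funcs[0..n) into out, in order,
 * and returns how many were found. out must have room for n pointers. */
size_t select_kernels(const struct func_info *funcs, size_t n,
                      const struct func_info **out) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (funcs[i].cc == CC_KERNEL)
            out[count++] = &funcs[i];
    }
    return count;
}
```

Only the functions returned here would need the work group
transformation; everything tagged CC_DEVICE stays an ordinary callee.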

BTW, what about other OpenCL data like required_wg_size, which affects
the possible "kernel treatment" in pocl and can be converted to some
special instructions (I suppose) for the SIMT targets? Currently only the
TCE target in Clang adds metadata for the required_wg_size kernel
attribute (as we need it in "offline compilation"), but IMHO that could be
useful in general, as default metadata (to enable its support in pocl
for all targets, for example).
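
One way a known work group size can affect the kernel treatment on a CPU
target, sketched under assumptions: if the kernel carries OpenCL C's
reqd_work_group_size attribute, the work item loop gets a compile-time
trip count, which a compiler can unroll or vectorize more easily. The
function names and the value 8 are hypothetical.

```c
#include <stddef.h>

/* Hypothetical value, as if the kernel were declared with
 * __attribute__((reqd_work_group_size(8, 1, 1))). */
#define REQD_WG_SIZE 8

/* Generic work group function: the local size is known only at run time. */
void scale_wg_generic(float *data, float k, size_t local_size) {
    for (size_t i = 0; i < local_size; ++i)
        data[i] *= k;
}

/* Specialized work group function: the trip count is a constant, so the
 * loop bound needs no run-time argument or remainder handling. */
void scale_wg_fixed(float *data, float k) {
    for (size_t i = 0; i < REQD_WG_SIZE; ++i)
        data[i] *= k;
}
```

Both versions compute the same result when local_size equals
REQD_WG_SIZE; the difference is only in what the compiler knows.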

-- 
Pekka
