thr3ads.net - llvm dev - [LLVMdev] Function-level metadata for OpenCL (was Re: OpenCL support) [Jan 2011]


Peter Collingbourne

2010-Dec-24 23:49 UTC

[LLVMdev] Function-level metadata for OpenCL (was Re: OpenCL support)

On Tue, Dec 21, 2010 at 07:17:40PM -0000, Anton Lokhmotov wrote:
> > From: Peter Collingbourne [mailto:peter at pcc.me.uk]
> > Sent: 20 December 2010 20:11
> > As with __local variables, it may be that "kernelness" cannot be
> > represented in a standard form in LLVM.  For example on a CPU a
> > kernel function may have an additional parameter which is a pointer to
> > __local memory space, which would not be necessary on GPUs.  Then in
> > fact you would use a standard calling convention on a CPU.
> >
> > But for GPUs, I think using the calling convention is appropriate.
> > If we standardise the calling convention number, this can be the
> > default behaviour.
>
> I don't think we want LLVM-IR coming from an OpenCL C frontend to be
> different for GPU and CPU targets. In my view, the frontend should be
> parameterised by only two (more or less) parameters: bitness (32/64) and
> endianness (little/big).

Not only sizes but alignment requirements will change between
platforms.  Also, what about __local on CPU?

> How one can even guarantee e.g. that a calling
> convention for NVIDIA GPUs is appropriate for ATI GPUs?

We have full control over the target code generators.  There's nothing
stopping us defining a specific constant representing the 'kernel'
calling convention and harmonising the GPU targets to use that
calling convention.

Thanks,
-- 
Peter
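
To make the "__local on CPU" point concrete, here is a minimal C sketch of
what a CPU lowering with an extra pointer parameter might look like.  The
names sum4_cpu and local_tmp are hypothetical and not taken from any
implementation discussed in this thread; the sketch assumes one work-group
of four work-items executed serially on the host.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C source being modelled (shown only as a comment):
 *
 *   kernel void sum4(global int *out) {
 *       local int tmp[4];
 *       ...
 *   }
 */

/* Hypothetical CPU lowering: the __local array becomes a trailing
 * pointer parameter that the runtime must supply at launch time. */
void sum4_cpu(int *out, int *local_tmp) {
    assert(out != NULL && local_tmp != NULL);

    /* Each simulated work-item writes its own slot of "local" memory. */
    for (int i = 0; i < 4; i++)
        local_tmp[i] = i + 1;

    /* One work-item then reduces the shared slots into the result. */
    int total = 0;
    for (int i = 0; i < 4; i++)
        total += local_tmp[i];
    *out = total;
}
```

A GPU target would not need the second parameter, which is why the two
targets could end up with different function signatures for the same
kernel.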

Villmow, Micah

2011-Jan-03 18:52 UTC

Sorry for the late reply, as I have been on vacation for a while.

One method which I haven't seen mentioned is to separate out the kernel
semantics from the function definition.

All the kernel attribute does is specify that this function is an entry point to
the device from the host. So why not create a separate entry point that is
callable only by the host, and have every call from the device go to the
original function?

For example, you have two functions and one calls the other:

kernel foo() {
}
kernel bar() {
  foo();
}

If you separate the kernel entry point from the function body, then handling
this becomes easy.

You end up with four functions:

kernel foo_kernel() {
 foo();
}

foo() {
}

kernel bar_kernel() {
 bar();
}

bar(){
 foo();
}

Then the issue is no longer a compilation problem, but just an entry point
runtime issue. Instead of calling foo(), the runtime just calls foo_kernel()
which handles all of the kernel setup issues and then calls the function body
itself.

This removes the need to have any metadata nodes in the IR and allows the kernel
function to handle any device-specific setup, such as __local variables,
id/group calculations, and memory offsets, without impacting the performance of
a kernel calling another kernel.

Micah
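
The entry-point/body split described above can be sketched in plain C.
This is only an illustration under stated assumptions: launch_ctx,
kernel_setup, and the two counters are hypothetical stand-ins for whatever
device-specific setup a real runtime would perform, and the bodies are
named foo_body/bar_body here only to keep them distinct from the entry
points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-launch state that an entry point prepares. */
typedef struct {
    size_t global_id;  /* work-item id computed during setup */
    int   *local_mem;  /* stand-in for __local storage */
} launch_ctx;

static int setup_calls = 0;  /* how many times kernel setup ran */
static int foo_calls   = 0;  /* how many times foo's body ran */

/* Stand-in for device-specific setup (ids, __local, offsets). */
static void kernel_setup(launch_ctx *ctx, size_t id, int *scratch) {
    assert(ctx != NULL);
    ctx->global_id = id;
    ctx->local_mem = scratch;
    setup_calls++;
}

/* Function bodies: callable from device code, no setup cost. */
static void foo_body(launch_ctx *ctx) {
    (void)ctx;
    foo_calls++;
}

static void bar_body(launch_ctx *ctx) {
    foo_body(ctx);  /* "kernel calls kernel" is now an ordinary call */
}

/* Entry points: the only functions the host runtime launches. */
void foo_kernel(size_t id, int *scratch) {
    launch_ctx ctx;
    kernel_setup(&ctx, id, scratch);
    foo_body(&ctx);
}

void bar_kernel(size_t id, int *scratch) {
    launch_ctx ctx;
    kernel_setup(&ctx, id, scratch);
    bar_body(&ctx);
}
```

Launching bar_kernel runs the setup once and then reaches foo's body
through a direct call, so the inner call pays no setup cost.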

Peter Collingbourne

2011-Jan-04 19:51 UTC

On Mon, Jan 03, 2011 at 12:52:02PM -0600, Villmow, Micah wrote:
> Sorry for the late reply, as I have been on vacation for a while.
>
> One method which I haven't seen mentioned is to separate out the kernel
> semantics from the function definition.
>
> All the kernel attribute does is specify that this function is an entry
> point to the device from the host. So why not create a separate entry
> point that is callable only by the host, and have every call from the
> device go to the original function?
>
> For example, you have two functions and one calls the other:
>
> kernel foo() {
> }
> kernel bar() {
>   foo();
> }
>
> If you separate the kernel entry point from the function body, then
> handling this becomes easy.
>
> You end up with four functions:
>
> kernel foo_kernel() {
>  foo();
> }
>
> foo() {
> }
>
> kernel bar_kernel() {
>  bar();
> }
>
> bar(){
>  foo();
> }
>
> Then the issue is no longer a compilation problem, but just an entry point
> runtime issue. Instead of calling foo(), the runtime just calls foo_kernel()
> which handles all of the kernel setup issues and then calls the function
> body itself.
>
> This removes the need to have any metadata nodes in the IR and allows the
> kernel function to handle any device-specific setup, such as __local
> variables, id/group calculations, and memory offsets, without impacting
> the performance of a kernel calling another kernel.

I like this idea.  I think that the entry point should keep its
original name though, while we rename the body, because the fact that
we factor out the function body seems like an implementation detail.

To a certain extent it also removes the need to attach metadata for
reqd_work_group_size etc. at the function level (if required by the
target), since this information can be attached to intrinsic calls
within the entry point.  Example:

define void @foo() nounwind {
entry:
  call void @llvm.opencl.reqd.work.group.size(i32 4, i32 1, i32 1)
  ; .. other setup ..
  call void @foo_kernel()
  ret void
}

define internal void @foo_kernel() nounwind {
entry:
  ; ... body ...
  ret void
}

These intrinsics wouldn't necessarily expand to target code directly,
but would be used to generate something appropriate for the target, in
a similar fashion to the debug metadata intrinsics.  Also, because the
entry point is called only by the host, it is never inlined into
another function.  Keeping the metadata in the entry point therefore
guarantees that no more than one such intrinsic call appears within a
function even if the inliner is used, allowing code generators to
simply search for uses of the @llvm.opencl.reqd.work.group.size (or
whatever) intrinsic to create a mapping from functions to attributes.

Thanks,
-- 
Peter
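
The "search for uses of the intrinsic" step can be sketched with a toy
model in C.  Nothing below uses the LLVM API: call_site, wg_attr, and
collect_wg_sizes are hypothetical names, and the intrinsic itself is only
the one proposed in this thread.  The sketch assumes call sites are
available as a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Name of the intrinsic proposed in the thread (not an existing one). */
#define WG_INTRINSIC "llvm.opencl.reqd.work.group.size"

/* Toy model of a call instruction inside some function. */
typedef struct {
    const char *caller;   /* function containing the call */
    const char *callee;   /* function being called */
    int         args[3];  /* constant arguments, if any */
} call_site;

/* One entry of the resulting function-to-attribute mapping. */
typedef struct {
    const char *function;
    int         size[3];
} wg_attr;

/* Scans all call sites for uses of the intrinsic and records one
 * attribute per calling function.  Returns the number of entries
 * written, or -1 if a function contains more than one such call or
 * the output array is too small. */
int collect_wg_sizes(const call_site *calls, size_t n,
                     wg_attr *out, size_t cap) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(calls[i].callee, WG_INTRINSIC) != 0)
            continue;  /* not a use of the intrinsic */
        for (size_t j = 0; j < count; j++)
            if (strcmp(out[j].function, calls[i].caller) == 0)
                return -1;  /* breaks the one-call-per-function rule */
        if (count == cap)
            return -1;
        out[count].function = calls[i].caller;
        memcpy(out[count].size, calls[i].args, sizeof out[count].size);
        count++;
    }
    return (int)count;
}
```

The duplicate check is where the one-call-per-function guarantee matters:
if inlining could copy the intrinsic call into another function, the
mapping would no longer be unambiguous.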
