thr3ads.net - llvm dev - [LLVMdev] Example for usage of LLVM/Clang/libclc [Feb 2015]


Ahmed ElTantawy

2015-Feb-03 23:35 UTC

[LLVMdev] Example for usage of LLVM/Clang/libclc

Hi,

My goal is to use Clang/LLVM/libclc to compile an OpenCL kernel and
generate PTX code from it. I have already done this, but I am not sure
whether the PTX I am generating is correct (that is, whether it is what is
supposed to be generated).

For example, currently:

In OpenCL: get_global_id(0)   translates to
In LLVM:   %call = tail call i32 @get_global_id(i32 0)   which translates to
In PTX:

        // .globl       blur2d
.func  (.param .b32 func_retval0) get_global_id
(
        .param .b32 get_global_id_param_0
)
;

        mov.u32         %r2, 0;
        .param .b32 param0;
        st.param.b32    [param0+0], %r2;
        .param .b32 retval0;
        call.uni (retval0),
        get_global_id,
        (
        param0
        );




Is this what is supposed to happen, or is something wrong? I ask because
get_global_id is only declared and called here, with no body, which does not
make much sense to me, and I am not sure whether the libclc definitions were
used at all.

If this is not correct, any idea what the correct conversion should look like?

Thanks,

Ahmed ElTantawy

2015-Feb-05 06:09 UTC


Just as an update: I figured out that Clang was not properly linked against
the libclc library. I now use a slightly modified version of the script in
libclc/compile-test.sh:

clang -target nvptx-unknown-nvcl -S -emit-llvm -O4 \
  -Iptx-nvidiacl/include -Igeneric/include -include clc/clc.h \
  -Xclang -mlink-bitcode-file -Xclang nvptx--nvidiacl/lib/builtins.opt.bc \
  -Dcl_clang_storage_class_specifiers -Dcl_khr_fp64 "$@"


This works, but it produces LLVM IR for every OpenCL builtin implemented by
libclc along with the kernel I am interested in. Is there a way to avoid this
and produce LLVM IR only for the kernel I need?
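
On the original question: once the libclc builtins are linked in, the call to get_global_id should no longer appear as a bare external declaration. It is expected to reduce to arithmetic on the work-group index, the work-group size, and the local index (on NVPTX these correspond to the %ctaid, %ntid, and %tid special registers). A minimal sketch of that arithmetic in Python, following the OpenCL definition of a global ID; the function and parameter names are illustrative and are not taken from libclc's source:

```python
def global_id(group_id: int, local_size: int, local_id: int,
              global_offset: int = 0) -> int:
    """Global work-item index along one dimension (OpenCL definition).

    Assumes uniform work-group sizes; global_offset defaults to zero,
    which is the common case when no offset is passed at enqueue time.
    """
    if not 0 <= local_id < local_size:
        raise ValueError("local_id must lie in [0, local_size)")
    return global_offset + group_id * local_size + local_id


# Work-group 2, 64 work-items per group, local index 5.
example = global_id(group_id=2, local_size=64, local_id=5)
print(example)  # 133
```

If the generated PTX still contains a call to an undefined get_global_id, as in the first message, that suggests the builtin library was not linked in.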


Dan Liew

2015-Feb-05 13:50 UTC


Hi,
> which works but it produces LLVM IR code for all OpenCL intrinsics
> implemented by libclc along with the kernel I am interested in, is there a
> possibility to avoid this ? and only produce the llvm code for the kernel
> required ?

Give every function apart from the kernel entry points internal linkage,
then run global dead code elimination (it should remove most of the unused
functions).
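
The idea behind this can be sketched as a reachability walk over the call graph: once only the kernel entry points are externally visible, any function that no entry point reaches through calls is dead and can be dropped. The following Python sketch is a simplified model only (real global DCE also tracks non-call references such as address-taken functions and global initializers), and the module contents and the helper name are hypothetical:

```python
from collections import deque


def surviving_functions(call_graph, entry_points):
    """Return the set of functions reachable from the kernel entry points.

    call_graph maps each function name to the names it calls. Everything
    outside entry_points is treated as internal, so it survives only if
    some entry point reaches it through a chain of calls.
    """
    live = set()
    worklist = deque(name for name in entry_points if name in call_graph)
    while worklist:
        fn = worklist.popleft()
        if fn in live:
            continue
        live.add(fn)
        worklist.extend(call_graph.get(fn, ()))
    return live


# Hypothetical module: one kernel plus a few library builtins linked in.
module = {
    "blur2d": ["get_global_id"],
    "get_global_id": ["get_group_id", "get_local_size", "get_local_id"],
    "get_group_id": [],
    "get_local_size": [],
    "get_local_id": [],
    "sin": [],
    "cos": [],
}

print(sorted(surviving_functions(module, ["blur2d"])))
```

In this model sin and cos are dropped because nothing reachable from blur2d calls them, while the helpers behind get_global_id are kept.
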

You can use the opt tool to do this.

For example, if your kernel entry points were foo and bar, you could run:

$ opt -internalize -internalize-public-api-list=foo,bar -globaldce \
      your_program.bc > transformed_program.bc

Hope that helps.
