thr3ads.net - search: "get_global

2016 Sep 12

2

builtins name mangling in SPIR 2.0

...According to the SPIR 2.0 spec[1], the names of OpenCL builtins are mangled. However, when I compile OpenCL code with Clang 3.9 with the "spir64-unknown-unknown" target, Clang generates IR without mangling the builtins, e.g. for: __kernel void input_zip_int(__global int *in0) { *in0 = get_global_id(0); } clang generates: define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_qual !6 { entry: %call = tail call spir_func i128 @get_global_...

[LLVMdev] Example for usage of LLVM/Clang/libclc

2015 Feb 03

2

Hi, my goal is to use Clang/LLVM/libclc to compile an OpenCL kernel and eventually generate PTX code. I have already done this, but I am not sure whether the PTX code I am generating is correct (that is, the code that is supposed to be generated). For example, currently, in OpenCL: get_global_id(0) translates to, in LLVM: %call = tail call i32 @get_global_id(i32 0) which translates to, in PTX: // .globl blur2d .func (.param .b32 func_retval0) get_global_id ( .param .b32 get_global_id_param_0 ) ; mov.u32 %r2, 0; .param .b32 param0;...

builtins name mangling in SPIR 2.0

2016 Sep 12

2

Thanks a lot. On Mon, Sep 12, 2016 at 1:42 PM, Liu, Yaxun (Sam) <Yaxun.Liu at amd.com> wrote: > If you use the default header file under clang/lib/Headers/opencl-c.h, > get_global_id will be mangled. > > > > If you want to declare get_global_id in your own header, add > __attribute__((overloadable)), then it will be mangled. > > > > Sam > > > > *From:* Hongbin Zheng [mailto:etherzhhb at gmail.com] > *Sent:* Monday, September 12, 2016 4:...
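
The reply above says that get_global_id is mangled when it is declared with __attribute__((overloadable)). As a rough illustration of what the mangled symbol looks like, here is a minimal sketch in plain C. It assumes the Itanium-style scheme ("_Z", then the length of the name, then the name, then one code per parameter type, with "j" for unsigned int) and a single uint parameter. The helper name mangle_uint_builtin is hypothetical and is not part of Clang, libclc, or SPIR.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Clang or SPIR API): build the Itanium-style
 * mangled name for a builtin that takes one `uint` argument.
 * Layout assumed here: "_Z" + <name length> + <name> + "j",
 * where "j" is the Itanium code for unsigned int.
 * Returns 0 on success, -1 if the output buffer is too small. */
int mangle_uint_builtin(const char *name, char *out, size_t out_size)
{
    int written = snprintf(out, out_size, "_Z%zu%sj", strlen(name), name);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Under these assumptions, calling mangle_uint_builtin("get_global_id", buf, sizeof buf) fills buf with _Z13get_global_idj, which is the kind of symbol one would expect in the IR in place of the plain @get_global_id once the declaration is overloadable.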

builtins name mangling in SPIR 2.0

2016 Sep 16

2

...PM, Hongbin Zheng <etherzhhb at gmail.com> wrote: Thanks a lot. On Mon, Sep 12, 2016 at 1:42 PM, Liu, Yaxun (Sam) <Yaxun.Liu at amd.com> wrote: If you use the default header file under clang/lib/Headers/opencl-c.h, get_global_id will be mangled. If you want to declare get_global_id in your own header, add __attribute__((overloadable)), then it will be mangled. Sam From: Hongbin Zheng [mailto:etherzhhb at gmail.com] Sent: Monday, September 12, 2016 4:21 PM To: cfe-dev at lists.llvm.or...

builtins name mangling in SPIR 2.0

2016 Sep 18

2

...PM, Hongbin Zheng <etherzhhb at gmail.com> wrote: Thanks a lot. On Mon, Sep 12, 2016 at 1:42 PM, Liu, Yaxun (Sam) <Yaxun.Liu at amd.com> wrote: If you use the default header file under clang/lib/Headers/opencl-c.h, get_global_id will be mangled. If you want to declare get_global_id in your own header, add __attribute__((overloadable)), then it will be mangled. Sam From: Hongbin Zheng [mailto:etherzhhb at gmail.com] Sent: Monday, September 12, 2016 4:21 PM To: cfe-dev at lists.llvm.or...

[LLVMdev] How to decide whether a function is executed or not

2014 May 20

2

On 20 May 2014 16:08, RICHARD STUCKEY <richard.stuckey at virgin.net> wrote: > Consider a function which contains an infinite loop: any algorithm which > could determine whether that function is called or not would effectively be > an algorithm that could determine whether the program containing that > function halts or not. Equally, deciding whether the function contains an

[LLVMdev] [cfe-dev] SPIR Review Status: after Introduction and 32bits vs. 64bits discussions

2012 Sep 19

0

...? If we ignore the issue of size_t inside structs, I don't see the problem with deciding that size_t is 64bits, even on 32bit systems. The only place that I saw that size_t was used, in user code, is in the get_global_id() family of functions (and other APIs which require offsets). A target-specific compiler optimization can reduce the bit width of the get_global_id (and friends) back to 32bits and propagate this, if needed. Answer: we are discussing th...

[LLVMdev] SPIR Review Status: after Introduction and 32bits vs. 64bits discussions

2012 Sep 14

2

...? If we ignore the issue of size_t inside structs, I don't see the problem with deciding that size_t is 64bits, even on 32bit systems. The only place that I saw that size_t was used, in user code, is in the get_global_id() family of functions (and other APIs which require offsets). A target-specific compiler optimization can reduce the bit width of the get_global_id (and friends) back to 32bits and propagate this, if needed. Answer: we are discussing th...

[LLVMdev] SPIR: Answers to the issues raised so far

2012 Sep 27

0

...e? If we ignore the issue of size_t inside structs, I don't see the problem with deciding that size_t is 64bits, even on 32bit systems. The only place that I saw that size_t was used, in user code, is in the get_global_id() family of functions (and other APIs which require offsets). A target-specific compiler optimization can reduce the bit width of the get_global_id (and friends) back to 32bits and propagate this, if needed. Answer: First and most impo...

[AMDGPU] non-hsa intrinsic with hsa target

2016 Mar 05

2

...BCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into the libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code segment, define linkonce_odr i32 @get_global_id(i32 %dim) #5 { entry: switch i32 %dim, label %get_local_id.exit [ i32 0, label %get_group_id.exit.thread i32 1, label %get_group_id.exit.thread22 i32 2, label %get_group_id.exit.thread24 ] get_group_id.exit.thread: ; preds = %entry %x.i = tail call i32 @ll...

[LLVMdev] SPIR Portability Discussion

2012 Sep 12

0

...these types as opaque types, and defines "builtin" functions to handle them. If we ignore the issue of size_t inside structs, I don't see the problem with deciding that size_t is 64bits, even on 32bit systems. The only place that I saw that size_t was used, in user code, is in the get_global_id() family of functions (and other APIs which require offsets). A target-specific compiler optimization can reduce the bit width of the get_global_id (and friends) back to 32bits and propagate this, if needed.

[AMDGPU] non-hsa intrinsic with hsa target

2016 Mar 05

2

Dear Developers, I compiled an OpenCL kernel before (in November of last year) like __kernel void g(__global float* array) { array[get_global_id(0)] = 1; } with libclc, which would originally use intrinsics like llvm.r600.read.local.size.x(). I executed the generated object file with one version of the hsa-runtime [1] provided by Mr. Stellard; when there was more than one workgroup, the output of the program wasn't correct at tha...

[LLVMdev] Is llc broken for Cortex-A9 + neon ?

2012 Jun 25

2

...considering following .ll file ; ModuleID = 'vect3x.ll' target triple = "armv7-none-linux-gnueabi" define arm_aapcscc void @test_hi_char8(i8* %.T0351, <8 x i8>* nocapture %srcA, <4 x i8>* nocapture %dst) noinline { L.entry: %0 = tail call arm_aapcscc i32 (...)* @get_global_id(i8* %.T0351, i32 0) %1 = bitcast <8 x i8>* %srcA to <4 x i8>* %2 = getelementptr <4 x i8>* %1, i32 0, i32 4 %3 = bitcast i8* %2 to <4 x i8>* %4 = shl i32 %0, 3 %5 = getelementptr <4 x i8>* %3, i32 0, i32 %4 %6 = bitcast i8* %5 to <4 x i8>* %7 = load...

Some llvm questions (for tgsi backend)

2016 Jan 11

4

...kepler card, 3 has been tested with pocl. My goal for this week is to get the tgsi backend to produce code which I can copy and paste into 2 and then have it working on a kepler card. The test program looks like this: __kernel void test_kern(__global uint *vals, __global uint *buf) { uint id = get_global_id(0); buf[32 * id] -= vals[id]; } The llvm ir looks like this: bin/clang -x cl -c -emit-llvm -target tgsi-- -include /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl ; ModuleID = '/home/hans/foo.cl' target datalayout = "E-p:32:32-i64:64:64-f32:32:32-n32" target...

[LLVMdev] Is llc broken for Cortex-A9 + neon ?

2012 Jun 25

0

...> > ; ModuleID = 'vect3x.ll' > target triple = "armv7-none-linux-gnueabi" > > define arm_aapcscc void @test_hi_char8(i8* %.T0351, <8 x i8>* nocapture %srcA, <4 x i8>* nocapture %dst) noinline { > L.entry: > %0 = tail call arm_aapcscc i32 (...)* @get_global_id(i8* %.T0351, i32 0) > %1 = bitcast <8 x i8>* %srcA to <4 x i8>* > %2 = getelementptr <4 x i8>* %1, i32 0, i32 4 > %3 = bitcast i8* %2 to <4 x i8>* > %4 = shl i32 %0, 3 > %5 = getelementptr <4 x i8>* %3, i32 0, i32 %4 > %6 = bitcast i8* %5 to ...

Some llvm questions (for tgsi backend)

2016 Jan 12

1

...get >> the tgsi backend to produce code which I can copy >> and paste into 2 and then have it working on a kepler card. >> >> The test program looks like this: >> >> __kernel void test_kern(__global uint *vals, __global uint *buf) >> { >> uint id = get_global_id(0); >> >> buf[32 * id] -= vals[id]; >> } >> >> The llvm ir looks like this: >> >> bin/clang -x cl -c -emit-llvm -target tgsi-- -include /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl >> >> ; ModuleID = '/home/hans/foo.cl...

[LLVMdev] Valid debug information being deleted by DAGCombiner

2011 Apr 15

0

...; preds = %get_local_id.exit %10 = phi i32 [ %8, %get_local_id.exit ] ; <i32> [#uses=1] br label %11 ; <label>:11 ; preds = %9 %12 = phi i32 [ %10, %9 ] ; <i32> [#uses=1] br label %get_global_id.exit get_global_id.exit: ; preds = %11 %13 = phi i32 [ %12, %11 ] ; <i32> [#uses=4] call void @llvm.dbg.value(metadata !{i32 %13}, i64 0, metadata !27), !dbg !28 However, a similar sequence of IR 'works', IF that sequence...

[LLVMdev] Valid debug information being deleted by DAGCombiner

2011 Apr 15

2

...; preds = %get_local_id.exit %10 = phi i32 [ %8, %get_local_id.exit ] ; <i32> [#uses=1] br label %11 ; <label>:11 ; preds = %9 %12 = phi i32 [ %10, %9 ] ; <i32> [#uses=1] br label %get_global_id.exit get_global_id.exit: ; preds = %11 %13 = phi i32 [ %12, %11 ] ; <i32> [#uses=4] call void @llvm.dbg.value(metadata !{i32 %13}, i64 0, metadata !27), !dbg !28 However, a similar sequence of IR 'works', IF that sequence...

[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization

2013 Jan 25

4

...ocessors (many GPUs) this step is not always necessary as they can input the single kernel instructions and do the "spreading" on the fly. We have a different method to generate the WG functions for such targets. > Moreover, OpenCL has lots of language-specific APIs such as > "get_global_id" and builtin function calls, and without knowledge of these > calls it is impossible to vectorize OpenCL. In pocl the whole kernel is "flattened", that is, the processed kernel code does not usually have function calls. Well, printf() and some intrinsic calls might be exceptions....

Some llvm questions (for tgsi backend)

2016 Jan 11

0

...l. My goal for this week is to get > the tgsi backend to produce code which I can copy > and paste into 2 and then have it working on a kepler card. > > The test program looks like this: > > __kernel void test_kern(__global uint *vals, __global uint *buf) > { > uint id = get_global_id(0); > > buf[32 * id] -= vals[id]; > } > > The llvm ir looks like this: > > bin/clang -x cl -c -emit-llvm -target tgsi-- -include /usr/share/pocl/include/_kernel.h -o ~/foo.ir -x cl -S ~/foo.cl > > ; ModuleID = '/home/hans/foo.cl' > target datalayout = ...
