search for: __kernel

Displaying 20 results from an estimated 46 matches for "__kernel".

Did you mean: __kernel_
2010 Dec 16
4
[LLVMdev] Function-level metadata for OpenCL (was Re: OpenCL support)
...t's why we propose to use metadata for this purpose. Does anyone have a > better idea? I agree that metadata should be used for function qualifiers; a prerequisite being support for non-discardable function-level metadata, which would need to be added to LLVM. I'm undecided on whether __kernel should also be represented by metadata; there is precedent (PTX backend) for using the calling convention. I do have a concern though with the semantics of the inliner when it needs to inline a function with metadata. One possibility would be to discard the callee's metadata, or somehow merge...
2010 Dec 17
0
[LLVMdev] [cfe-dev] Function-level metadata for OpenCL (was Re: OpenCL support)
...data for this purpose. Does anyone have > a > > better idea? > > I agree that metadata should be used for function qualifiers; > a prerequisite being support for non-discardable function-level > metadata, which would need to be added to LLVM. I'm undecided on > whether __kernel should also be represented by metadata; there is > precedent (PTX backend) for using the calling convention. > Being discardable is a design point of metadata. You might add something else to support this, but it won't be metadata. Why are you trying to preserve "kernel"-ness...
2010 Dec 20
1
[LLVMdev] [cfe-dev] Function-level metadata for OpenCL (was Re: OpenCL support)
...ycky <nlewycky at google.com> wrote: > Being discardable is a design point of metadata. You might add something > else to support this, but it won't be metadata. > Why are you trying to preserve "kernel"-ness into the LLVM IR? What > semantics does it have? What does __kernel actually mean to the optimizers > and code generator? > Could you just make __kernel mean "externally visible" and undecorated > functions be "linkonce_odr"? If that's not enough, could you swing it around > and maintain single named metadata node with a list of...
2010 Dec 17
0
[LLVMdev] Function-level metadata for OpenCL (was Re: OpenCL support)
...e metadata for this purpose.  Does anyone have a >> better idea? > > I agree that metadata should be used for function qualifiers; > a prerequisite being support for non-discardable function-level > metadata, which would need to be added to LLVM.  I'm undecided on > whether __kernel should also be represented by metadata; there is > precedent (PTX backend) for using the calling convention. > > I do have a concern though with the semantics of the inliner when it > needs to inline a function with metadata.  One possibility would be to > discard the callee's me...
2016 Sep 12
2
builtins name mangling in SPIR 2.0
...> > According to the SPIR 2.0 spec[1], the name of OpenCL builtins are mangled. > > > > However, when I compile OpenCl code with Clang 3.9 with the > "spir64-unknown-unknown" target, Clang generates IR without mangling the > builtins, e.g. for: > > > > __kernel void input_zip_int(__global int *in0) { > > *in0 = get_global_id(0); > > } > > > > clang generates: > > > > define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) > local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4...
2010 Dec 20
6
[LLVMdev] Function-level metadata for OpenCL (was Re: OpenCL support)
...it to be discardable. In particular, if the metadata is attached to a function, the only case I can think of where an optimiser needs to touch the metadata is if a function with metadata is inlined. And as I mentioned in my previous mail I don't think this will be any trouble for OpenCL. The __kernel attribute isn't the only attribute we need to preserve. There are also: __attribute__((vec_type_hint(type))) __attribute__((work_group_size_hint(X, Y, Z))) __attribute__((reqd_work_group_size(X, Y, Z))) which provide hints to the code generator regarding the specific work load of a particula...
2016 Sep 16
2
builtins name mangling in SPIR 2.0
...angled to _Z6printfPrU3AS2cz, while in clang's opencl-c.h[2], printf does not have the overload attribute: int printf(__constant const char* st, ...); (and it is different from the standard, which is printf(restrict __constant char *, ...)) I try the following code: #include <opencl-c.h> __kernel void vadd(__global const int* a, __global const int* b, __global int* c) { printf("aaaaa"); } and get a printf that is not mangled: %call = tail call spir_func i32 (i8 addrspace(2)*, ...) @printf(i8 addrspace(2)* getelementptr inbounds ([6 x i8], [6 x i8] addrspace(2)* @.str, i64 0,...
2011 Oct 15
0
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
...ivate memory using barrier synchronisation, but since there is no > way to queue a memory fence across __private memory (only __local and > __global), any access to that memory would invoke undefined behaviour. > For example, consider the following (2 work-items in a work-group): > > __kernel void foo() { > int x = 0; > int *__local p; > if (get_local_id(0) == 0) p = &x; > barrier(CLK_LOCAL_MEM_FENCE); > if (get_local_id(0) == 1) *p = 1; > barrier(CLK_LOCAL_MEM_FENCE); > // what is the value of x in work-item 0 here? > } > > The value of x at the...
2016 Sep 12
2
builtins name mangling in SPIR 2.0
Hi all, According to the SPIR 2.0 spec[1], the name of OpenCL builtins are mangled. However, when I compile OpenCl code with Clang 3.9 with the "spir64-unknown-unknown" target, Clang generates IR without mangling the builtins, e.g. for: __kernel void input_zip_int(__global int *in0) { *in0 = get_global_id(0); } clang generates: define spir_kernel void @input_zip_int(i32 addrspace(1)* nocapture %in0) local_unnamed_addr #0 !kernel_arg_addr_space !3 !kernel_arg_access_qual !4 !kernel_arg_type !5 !kernel_arg_base_type !5 !kernel_arg_type_q...
2011 Dec 13
0
[LLVMdev] AMD IL Code Generator Backend for OpenCL
...cast ([1 x i8] addrspace(2)* @sgv to i8*), i8* bitcast ([1 x i8] addrspace(2)* @fgv to i8*), i8* bitcast ([0 x i8*]* @lvgv to i8*), i32 0 }], section "llvm.metadata" define void @__OpenCL_foo_kernel() nounwind readnone { entry: ret void } Here's my attempt.. I'm guessing that __kernel in OpenCL triggers your annotation, but when I try to use annotate("doesNothing") there's no sign of "doesNothing" in the resulting IR. Thus, my attempt with the function pointers. %0 still has a different form, however. What do "sgv" and "fgv" stand fo...
2011 Oct 14
2
[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces
...shares pointers to __private memory using barrier synchronisation, but since there is no way to queue a memory fence across __private memory (only __local and __global), any access to that memory would invoke undefined behaviour. For example, consider the following (2 work-items in a work-group): __kernel void foo() { int x = 0; int *__local p; if (get_local_id(0) == 0) p = &x; barrier(CLK_LOCAL_MEM_FENCE); if (get_local_id(0) == 1) *p = 1; barrier(CLK_LOCAL_MEM_FENCE); // what is the value of x in work-item 0 here? } The value of x at the comment is undefined, because no fence...
2011 Nov 14
2
[LLVMdev] PTX backend fatal error
Hi everybody, I am testing the PTX backend using the OpenCL NVIDIA SDK benchmarks. Compiling the Histogram64.cl program I get a several backend errors. I isolated one of them in the following kernel program: __kernel void kernel_function(__global int *input) { __local char localArray[16]; for(unsigned int index = 0; index < 16; ++index) localArray[index] = 0; input[0] = localArray[get_local_id(0)]; } fatal error: error in backend: Cannot select: 0x5810cc0: i32,ch = load 0x57fa148,...
2016 Sep 18
2
builtins name mangling in SPIR 2.0
...angled to _Z6printfPrU3AS2cz, while in clang's opencl-c.h[2], printf does not have the overload attribute: int printf(__constant const char* st, ...); (and it is different from the standard, which is printf(restrict __constant char *, ...)) I try the following code: #include <opencl-c.h> __kernel void vadd(__global const int* a, __global const int* b, __global int* c) { printf("aaaaa"); } and get a printf that is not mangled: %call = tail call spir_func i32 (i8 addrspace(2)*, ...) @printf(i8 addrspace(2)* getelementptr inbounds ([6 x i8], [6 x i8] addrspace(2)* @.str, i64 0,...
2011 Dec 13
2
[LLVMdev] AMD IL Code Generator Backend for OpenCL
We are working on getting the documentation cleaned up to the point where it can be released. If you look at the test cases, you can infer what needs to be done. Basically since this is targeted for OpenCL, we annotate OpenCL kernels slightly different than normal functions and that is what causes the code to be generated. That being said, on my list of things to do is fix this so that any
2009 Oct 07
3
[LLVMdev] Instructions that cannot be duplicated
...r is that if a barrier exists inside of control flow, every thread in a work-group must execute the barrier instruction(6.11.9). However, in this simple CL code: #pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics: enable #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics: enable __kernel void KMeansMapReduceAtomic(const int num_attributes, const int num_objects, __global int* delta_d ) { __local int clusterCount[256]; __local int sTemp[1]; // amd opencl needed this to be a...
2013 Dec 31
2
[LLVMdev] [PATCH] R600 - Fix zero extend of i1
Hi, When trying to compile a trivial opencl kernel such as: __kernel void if_eq(__global int * out, int arg0, int arg1){ out[0] = arg0==arg1?0:1; } Clang generates IR like: %1 = icmp eq i32 %arg0, %arg1 %. = zext i1 %1 to i32 This eventually crashes ISel on R600. Attached patch adds a selector so it will compile. Regards, Jon Pry jonpry at gmail.c...
2013 Jun 07
2
[LLVMdev] How to generate constant memory for ptx code by NVPTX?
Hello, I work on compiling OpenCL kernel to PTX code by clang and NVPTX with libclc. I have a kernel that contains constant variable declared in file scope like this: constant one_f = 1.0f; __kernel void test( ...){ ... } Then it is compiled to llvm-ir: @one_f = addrspace(4) const float 1.000000e+00, align 4 define void test(...){ ... } Finally ptx: .visible .global .align 4 .f32 one_f = 0f3F800000; .entry test( ...) { ... } one_f is placed in global memory and load it by ld.globa...
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Dear Developers, I compiled a OpenCL kernel before (on Nov. last year) like __kernel void g(__global float* array) { array[get_global_id(0)] = 1; } with libclc, which would originally use the instrinsics like llvm.r600.read.local.size.x(). I executed the generated object file with one version of the hsa-runtime [1] provided by Mr. Stellard, when there was more than one workgrou...
2011 Nov 14
0
[LLVMdev] PTX backend fatal error
...o Magni <alberto.magni86 at gmail.com>wrote: > Hi everybody, > > I am testing the PTX backend using the OpenCL NVIDIA SDK benchmarks. > Compiling the Histogram64.cl program I get a several backend errors. > > I isolated one of them in the following kernel program: > > __kernel void kernel_function(__global int *input) { > __local char localArray[16]; > for(unsigned int index = 0; index < 16; ++index) > localArray[index] = 0; > input[0] = localArray[get_local_id(0)]; > } > > fatal error: error in backend: Cannot select: > 0x58...
2011 Nov 14
1
[LLVMdev] PTX backend fatal error
...i <alberto.magni86 at gmail.com<mailto:alberto.magni86 at gmail.com>> wrote: Hi everybody, I am testing the PTX backend using the OpenCL NVIDIA SDK benchmarks. Compiling the Histogram64.cl program I get a several backend errors. I isolated one of them in the following kernel program: __kernel void kernel_function(__global int *input) { __local char localArray[16]; for(unsigned int index = 0; index < 16; ++index) localArray[index] = 0; input[0] = localArray[get_local_id(0)]; } fatal error: error in backend: Cannot select: 0x5810cc0: i32,ch = load 0x57fa148, 0x...