search for: eltantawy

Displaying 6 results from an estimated 6 matches for "eltantawy".

2016 Jan 20
4
Executing OpenMP 4.0 code on Nvidia's GPU
Hi Arpith, That is exactly what it is :). My bad, I thought I had copied the libraries to where LIBRARY_PATH points, but apparently they were copied to the wrong destination. Thanks a lot. On Wed, Jan 20, 2016 at 4:51 AM, Arpith C Jacob <acjacob at us.ibm.com> wrote: > Hi Ahmed, > > nvlink is unable to find the GPU OMP runtime library in its path. Does > LIBRARY_PATH point to
2015 Apr 08
5
[LLVMdev] CUDA front-end (CUDA to LLVM IR)
Hi, I wanted to ask whether there is an ongoing effort (or an already established tool) to convert CUDA kernels (that use CUDA-specific intrinsics, e.g., threadIdx.x, __syncthreads(), ...) to LLVM IR. I am aware that I can do this for OpenCL with the help of libclc, but I cannot find anything similar for CUDA. Thanks
2015 Apr 08
2
[LLVMdev] CUDA front-end (CUDA to LLVM IR)
On Wed, Apr 8, 2015 at 10:12 AM, Dmitry Mikushin <dmitry at kernelgen.org> wrote: > A tool of this kind here: https://github.com/apc-llc/nvcc-llvm-ir > > 2015-04-08 19:01 GMT+02:00 Ahmed ElTantawy <ahmede at ece.ubc.ca>: > >> Hi, >> >> I wanted to ask whether there is ongoing effort (or an already >> established tool) that enables to convert CUDA kernels (that uses CUDA >> specific intrinsics, e.g., threadIdx.x, __syncthreads(), ...) to LLVM IR. I >> ...
2015 Feb 03
2
[LLVMdev] Example for usage of LLVM/Clang/libclc
Hi, My goal is to use Clang/LLVM/libclc to compile an OpenCL kernel and eventually generate PTX code. I have already done this, but I am not sure whether the PTX code I am generating is correct (that is, the code that is supposed to be generated). For example, currently, In OpenCL: get_global_id(0) translates to In LLVM: %call = tail call i32 @get_global_id(i32 0) which translates to In PTX:
2015 Jun 19
2
[LLVMdev] Performance impact of different optimization passes
Hi, I was wondering if there is a paper or a technical report that documents the performance impact of the different optimization passes on some set of benchmarks. Is something like this available? Best regards, Ahmed
2016 Mar 23
0
__sync_synchronize() crashes when compiling OpenMP to a GPU target
Hi, I get this error when compiling code that contains "__sync_synchronize()": fatal error: error in backend: Cannot select: 0x85ddfb0: ch = AtomicFence 0x85fd8d8, 0x85c7890, 0x85dd9e8 [ORD=4] [ID=27]example.c:378:13 0x85c7890: i64 = Constant<7> [ID=5]example.c:378:13 0x85dd9e8: i64 = Constant<1> [ID=6]example.c:378:13 I believe it should be equivalent to