search for: lcuda

Displaying 8 results from an estimated 8 matches for "lcuda".

2012 May 08
0
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
...be implemented in an external LLVM-IR optimization pass. clang -Xclang -load -Xclang CUDAGenerator.so file.c -O3 -mllvm -offload-cuda The very same should work for Pure, dragonegg and basically any compiler based on LLVM. So I do not want to change clang at all (except of possibly linking to -lcuda). The llvm.codegen intrinsic allows this, without requiring changes to any of the external tools. It works both when outputting assembly, enabling direct object file emission and it even works in the mc-jit. All alternatives proposed so far, are way more complex and require significant changes...
2012 May 08
2
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
Justin Holewinski <justin.holewinski at gmail.com> writes: > I believe the point Tobias is trying to make is that he wants to > retain the ability to pipe modules between tools and not worry about > the modules ever hitting disk, e.g. > > opt -load GPUOptimizer.so -gpu-opt | llc -march=x86 > where the module coming in to opt is just unoptimized host code, and the module
2012 May 08
2
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
Tobias Grosser <tobias at grosser.es> writes: > The very same should work for Pure, dragonegg and basically any > compiler based on LLVM. So I do not want to change clang at all > (except of possibly linking to -lcuda). Why is this a requirement? I think it's completely unrealistic to expect to be able to do this without driver changes. If you don't want to change clang, then you'll need an external wrapper driver to call all the relevant tools. -Dave
2016 Mar 05
2
instrumenting device code with gpucc
...fat binary to the host code using ld. Clang does step 2-4 by invoking subcommands. Therefore, you can use "clang -###" to dump all the subcommands, and then find the ones for step 2-4. For example, $ clang++ -### -O3 axpy.cu -I/usr/local/cuda/samples/common/inc -L/usr/local/cuda/lib64 -lcudart_static -lcuda -ldl -lrt -pthread --cuda-gpu-arch=sm_35 clang version 3.9.0 (http://llvm.org/git/clang.git 4ce165e39e7b185e394aa713d9adffd920288988) (http://llvm.org/git/llvm.git 2550ef485b6f9668bb7a4daa7ab276b6501492df) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/loca...
2019 Nov 13
2
AMDGPU and math functions
There certainly is support; after all AMD supports both OpenCL and HIP (a dialect of C++ very close to CUDA). AMD device libraries (in bitcode form) are installed when ROCm ( https://rocm.github.io/ ) is installed. AMD device libraries are mostly written in (OpenCL) C and open source at https://github.com/RadeonOpenCompute/ROCm-Device-Libs . They are configured by linking in a number of tiny
2012 May 09
0
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
On 05/08/2012 09:15 PM, dag at cray.com wrote: > Tobias Grosser <tobias at grosser.es> writes: > >> The very same should work for Pure, dragonegg and basically any >> compiler based on LLVM. So I do not want to change clang at all >> (except of possibly linking to -lcuda). > > Why is this a requirement? I think it's completely unrealistic to > expect to be able to do this without driver changes. It is very convenient for users and it is perfectly possible with the llvm.codegen intrinsic. So why going for something less comfortable and more complica...
2012 May 11
1
[LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
...l LLVM-IR optimization pass. > > clang -Xclang -load -Xclang CUDAGenerator.so file.c -O3 -mllvm -offload-cuda > > The very same should work for Pure, dragonegg and basically any compiler > based on LLVM. So I do not want to change clang at all (except of > possibly linking to -lcuda). Ok, that *is* an interesting use case. It would be great for LLVM to support this kind of thing. We're clearly not set up for it out of the box right now. On May 8, 2012, at 2:08 AM, Tobias Grosser wrote: > In terms of the complexity. The only alternative proposal I have heard > of...
2016 Mar 10
4
instrumenting device code with gpucc
...g does step 2-4 by invoking subcommands. Therefore, you can use >> "clang -###" to dump all the subcommands, and then find the ones for step >> 2-4. For example, >> >> $ clang++ -### -O3 axpy.cu -I/usr/local/cuda/samples/common/inc >> -L/usr/local/cuda/lib64 -lcudart_static -lcuda -ldl -lrt -pthread >> --cuda-gpu-arch=sm_35 >> >> clang version 3.9.0 (http://llvm.org/git/clang.git >> 4ce165e39e7b185e394aa713d9adffd920288988) (http://llvm.org/git/llvm.git >> 2550ef485b6f9668bb7a4daa7ab276b6501492df) >> Target: x86_64-unknown-...