Displaying 8 results from an estimated 8 matches for "lcuda".
2012 May 08 (0 replies) - [LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
...be
implemented in an external LLVM-IR optimization pass.
clang -Xclang -load -Xclang CUDAGenerator.so file.c -O3 -mllvm -offload-cuda
The very same should work for Pure, dragonegg and basically any compiler
based on LLVM. So I do not want to change clang at all (except for
possibly linking to -lcuda).
The llvm.codegen intrinsic allows this without requiring changes to any
of the external tools. It works when outputting assembly, it enables
direct object file emission, and it even works in the MCJIT. All
alternatives proposed so far are way more complex and require
significant changes...
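As a rough illustration of the "works for any LLVM-based compiler" claim above, the same plugin-based flow driven from dragonegg instead of clang might look like the sketch below; CUDAGenerator.so and -offload-cuda are the hypothetical plugin and flag from the example above, and dragonegg's IR-emission flag spelling may vary between releases.
# Hedged sketch only, not from the thread: CUDAGenerator.so and
# -offload-cuda are hypothetical placeholders.
gcc -fplugin=dragonegg.so -fplugin-arg-dragonegg-emit-ir -S file.c -o - \
  | opt -load CUDAGenerator.so -offload-cuda -o - \
  | llc -filetype=obj -o file.o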
2012 May 08 (2 replies) - [LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
Justin Holewinski <justin.holewinski at gmail.com> writes:
> I believe the point Tobias is trying to make is that he wants to
> retain the ability to pipe modules between tools and not worry about
> the modules ever hitting disk, e.g.
>
> opt -load GPUOptimizer.so -gpu-opt | llc -march=x86
> where the module coming in to opt is just unoptimized host code, and the module
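Spelled out end to end, that in-memory pipeline might look roughly like the following sketch; GPUOptimizer.so and -gpu-opt are the hypothetical plugin and pass name from the quote, and nothing is written to disk until the final object file.
# Hedged sketch only, not from the thread: the module travels through
# pipes the whole way; plugin and pass names are hypothetical.
clang -O0 -emit-llvm -c host.c -o - \
  | opt -load GPUOptimizer.so -gpu-opt -o - \
  | llc -march=x86-64 -filetype=obj -o host.o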
2012 May 08 (2 replies) - [LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
Tobias Grosser <tobias at grosser.es> writes:
> The very same should work for Pure, dragonegg and basically any
> compiler based on LLVM. So I do not want to change clang at all
> (except for possibly linking to -lcuda).
Why is this a requirement? I think it's completely unrealistic to
expect to be able to do this without driver changes.
If you don't want to change clang, then you'll need an external wrapper
driver to call all the relevant tools.
-Dave
2016 Mar 05 (2 replies) - instrumenting device code with gpucc
...fat binary to the host code using ld.
Clang does steps 2-4 by invoking subcommands. Therefore, you can use "clang
-###" to dump all the subcommands, and then find the ones for steps 2-4. For
example,
$ clang++ -### -O3 axpy.cu -I/usr/local/cuda/samples/common/inc
-L/usr/local/cuda/lib64 -lcudart_static -lcuda -ldl -lrt -pthread
--cuda-gpu-arch=sm_35
clang version 3.9.0 (http://llvm.org/git/clang.git
4ce165e39e7b185e394aa713d9adffd920288988) (http://llvm.org/git/llvm.git
2550ef485b6f9668bb7a4daa7ab276b6501492df)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/loca...
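One hedged way to act on that output is to emit the device-side IR separately, run an instrumentation pass over it with opt, and regenerate PTX; --cuda-device-only availability depends on the clang version, and MyGPUInstrument.so / -my-gpu-instrument are hypothetical names, not part of the thread.
# Hedged sketch only: flag support depends on the clang version, and the
# instrumentation plugin/pass names are hypothetical.
clang++ -O3 --cuda-device-only --cuda-gpu-arch=sm_35 -emit-llvm -S axpy.cu -o axpy.dev.ll
opt -load MyGPUInstrument.so -my-gpu-instrument -S axpy.dev.ll -o axpy.inst.ll
llc -march=nvptx64 -mcpu=sm_35 axpy.inst.ll -o axpy.ptx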
2019 Nov 13 (2 replies) - AMDGPU and math functions
There certainly is support; after all, AMD supports both OpenCL and HIP (a dialect of C++ very close to CUDA).
AMD device libraries (in bitcode form) are installed when ROCm ( https://rocm.github.io/ ) is installed.
AMD device libraries are mostly written in (OpenCL) C and open source at https://github.com/RadeonOpenCompute/ROCm-Device-Libs . They are configured by linking in a number of tiny...
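A hedged sketch of what that linking step can look like follows; the bitcode install path and the exact set of oclc_* control libraries vary between ROCm releases.
# Hedged sketch only: paths and control-library names vary by ROCm version.
llvm-link kernel.ll \
  /opt/rocm/amdgcn/bitcode/ocml.bc \
  /opt/rocm/amdgcn/bitcode/oclc_wavefrontsize64_on.bc \
  -o kernel.linked.bc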
2012 May 09 (0 replies) - [LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
On 05/08/2012 09:15 PM, dag at cray.com wrote:
> Tobias Grosser<tobias at grosser.es> writes:
>
>> The very same should work for Pure, dragonegg and basically any
>> compiler based on LLVM. So I do not want to change clang at all
>> (except for possibly linking to -lcuda).
>
> Why is this a requirement? I think it's completely unrealistic to
> expect to be able to do this without driver changes.
It is very convenient for users and it is perfectly possible with the
llvm.codegen intrinsic. So why go for something less comfortable and
more complica...
2012 May 11 (1 reply) - [LLVMdev] [PATCH][RFC] Add llvm.codegen Intrinsic To Support Embedded LLVM IR Code Generation
...l LLVM-IR optimization pass.
>
> clang -Xclang -load -Xclang CUDAGenerator.so file.c -O3 -mllvm -offload-cuda
>
> The very same should work for Pure, dragonegg and basically any compiler
> based on LLVM. So I do not want to change clang at all (except for
> possibly linking to -lcuda).
Ok, that *is* an interesting use case. It would be great for LLVM to support this kind of thing. We're clearly not set up for it out of the box right now.
On May 8, 2012, at 2:08 AM, Tobias Grosser wrote:
> In terms of the complexity. The only alternative proposal I have heard
> of...
2016 Mar 10 (4 replies) - instrumenting device code with gpucc
...g does steps 2-4 by invoking subcommands. Therefore, you can use
>> "clang -###" to dump all the subcommands, and then find the ones for steps
>> 2-4. For example,
>>
>> $ clang++ -### -O3 axpy.cu -I/usr/local/cuda/samples/common/inc
>> -L/usr/local/cuda/lib64 -lcudart_static -lcuda -ldl -lrt -pthread
>> --cuda-gpu-arch=sm_35
>>
>> clang version 3.9.0 (http://llvm.org/git/clang.git
>> 4ce165e39e7b185e394aa713d9adffd920288988) (http://llvm.org/git/llvm.git
>> 2550ef485b6f9668bb7a4daa7ab276b6501492df)
>> Target: x86_64-unknown-...