thr3ads.net - similar to: "cuda-memcheck to debug CUDA-enabled R packages"

Displaying 20 results from an estimated 10000 matches similar to: "cuda-memcheck to debug CUDA-enabled R packages"

[CUDA] Lost debug information when compiling CUDA code

2017 Jun 14

[CUDA] Lost debug information when compiling CUDA code

Hi, I needed to debug some CUDA code in my project; however, although I used -g when compiling the source code, no source-level information is available in cuda-gdb or cuda-memcheck. Specifically, below is what I did: 1) For a CUDA file a.cu, generate IR files: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c a.cu; 2) Instrument the device code a-cuda-nvptx64-nvidia-cuda-sm_35.bc (generated

Debug info for CUDA code

2018 Nov 30

Debug info for CUDA code

Hi all, I found this http://lists.llvm.org/pipermail/llvm-dev/2017-November/118871.html when googling about compiling CUDA code using llvm. Is it still the case that one can't step into CUDA kernel code compiled by llvm in cuda-gdb? I'm using clang 7.0. Thanks, Char -------------- next part -------------- An HTML attachment was scrubbed... URL:

[LLVMdev] [LV] possible `vector.memcheck` regression when using `llvm.loop` and `llvm.mem.parallel_loop_access`

2015 Mar 19

[LLVMdev] [LV] possible `vector.memcheck` regression when using `llvm.loop` and `llvm.mem.parallel_loop_access`

Adam, Please find the attached test case (run with ToT opt -O3). As you can see, `y_body` successfully is vectorized, though %33 and %46 are deemed MayAlias despite their exclusive use in loads ands stores marked with `llvm.mem.parallel_loop_access`. Many Thanks, Josh On Thu, Mar 19, 2015 at 12:55 PM, Adam Nemet <anemet at apple.com> wrote: > > > On Mar 19, 2015, at 9:43 AM,

[LLVMdev] [LV] possible `vector.memcheck` regression when using `llvm.loop` and `llvm.mem.parallel_loop_access`

2015 Mar 19

[LLVMdev] [LV] possible `vector.memcheck` regression when using `llvm.loop` and `llvm.mem.parallel_loop_access`

It seems that at some point in the not-so-distant-past that the loop vectorizer gained the ability to vectorize loops without explicit `llvm.loop` & `llvm.mem.parallel_loop_access` metadata. While that's awesome, there seems to be a regression in that `llvm.mem.parallel_loop_access` metadata doesn't make it into the alias analysis, and therefore a `vector.memcheck` basic block is

Debug info for CUDA code

2018 Dec 14

Debug info for CUDA code

Are you planning to release this as soon as it's ready or you want to make it into a major release? Is it possible to let me know (maybe by replying to this thread) once the code is ready? I know sometimes it takes a while to get things in the major release. I greatly appreciate your work on this! Thanks, Char 在 2018-12-15 05:19:50，"Alexey Bataev" <a.bataev at outlook.com>

Debug info for CUDA code

2018 Dec 14

Debug info for CUDA code

Hi Alex, Eric and Valentin, Thanks for the information. I don't mean to push this but I'm in desperate need of debugging some cuda code. I'm not familiar with the llvm internal but it sounds like there's at least line info now, right? If so, can you point me to a branch of llvm that can help tracing the bug down to certain line of code. I believe my bug is simply a write/read

Debug info for CUDA code

2018 Dec 04

Debug info for CUDA code

Adding Alexey here who has been driving this effort in llvm. There are about 5 patches waiting on my review: -: https://reviews.llvm.org/D54320 -: https://reviews.llvm.org/D46189 -: https://reviews.llvm.org/D51554 -: https://reviews.llvm.org/D46061 -: https://reviews.llvm.org/D45784 After which I think we're good. -eric On Mon, Dec 3, 2018 at 6:29 PM Valentin Churavy via

cuda __shfl_sync problem

2020 Sep 24

cuda __shfl_sync problem

Hi, First of all, i'm not sure if i should be posting this here or in cfe-dev, but here it goes. In order to instrument CUDA kernels i first generate device IR with: clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc I also have a library that contains the instrumentation stubs for which i generate IR similarly and i link it with the device IR

Status of CUDA 11 support

2020 Jul 30

Status of CUDA 11 support

Hi, I work in a large CUDA codebase and use Clang to build some of our CUDA code to improve compilation speed. We're planning to upgrade to CUDA 11 soon, and it appears that CUDA 11 is not yet supported in LLVM. >From the LLVM commits history, I can see that work on CUDA 11 has started. Is this currently being worked on? What is the remaining work left? And is any help needed to finish

cuda cross compiling issue for target aarch64-linux-androideabi

2018 Mar 23

cuda cross compiling issue for target aarch64-linux-androideabi

+Artem Belevich <tra at google.com> On Fri, Mar 23, 2018 at 7:53 PM Bharath Bhoopalam via llvm-dev < llvm-dev at lists.llvm.org> wrote: > I was wondering if anyone has encountered this issue when cross compiling > cuda on Nvidia TX2 running android. > > The error is > In file included from <built-in>:1: > In file included from >

cuda __shfl_sync problem

2020 Sep 25

cuda __shfl_sync problem

Do you mean in llc? Because i don't see such an option i'm afraid. ~George On 24-09-2020 20:54, Johannes Doerfert wrote: > Not that I am an expert but it looks like it defaults to the minimal > PTX version that supports the compute capability. You might be able to > choose PTX 6.0 though. > > ~ Johannes > > > On 9/24/20 1:02 PM, George K via llvm-dev wrote:

CUDA compilation "No available targets are compatible with this triple." problem

2017 Aug 02

CUDA compilation "No available targets are compatible with this triple." problem

Yes, I followed the guide. The same error showed up: >clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 -L/usr/local/cuda/lib64 -I/usr/local/cuda/include -lcudart_static -ldl -lrt -pthread error: unable to create target: 'No available targets are compatible with this triple.' ________________________________ From: Kevin Choi <code.kchoi at gmail.com> Sent: Wednesday, August 2,

llvm/cuda: Indentify kernel functions and optimizations

2016 Dec 21

llvm/cuda: Indentify kernel functions and optimizations

Hi, I am trying to instrument CUDA kernel functions only (llvm-3.9.0). Is there a way to identify cuda kernel functions? I see that in llvm IR for CUDA has nvvm annotations section, where kernel functions are identified for NVPTX usage. I can parse the whole IR for this kernel metadata and then proceed, but this is very clumsy. Other way is to work with cuda-device-only IR. But then I am not

[LLVMdev] Valgrind memcheck errors in llvm

2011 Feb 24

[LLVMdev] Valgrind memcheck errors in llvm

I have ran under valgrind memcheck the process using libLLVM-2.9.so (rev.126022) and got several errors: ==24227== Invalid read of size 1 ==24227== at 0x40274C9: memcpy (mc_replace_strmem.c:497) ==24227== by 0x40D5B84: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (in

JIT compiling CUDA source code

2020 Nov 19

JIT compiling CUDA source code

Sound right now like you are emitting an LLVM module? The best strategy is probably to use to emit a PTX module and then pass that to the CUDA driver. This is what we do on the Julia side in CUDA.jl. Nvidia has a somewhat helpful tutorial on this at https://github.com/NVIDIA/cuda-samples/blob/c4e2869a2becb4b6d9ce5f64914406bf5e239662/Samples/vectorAdd_nvrtc/vectorAdd.cpp and

llvm/cuda: Indentify kernel functions and optimizations

2016 Dec 21

llvm/cuda: Indentify kernel functions and optimizations

https://github.com/llvm-mirror/llvm/blob/652375a8cc49615de31fd9d424753795059185b6/lib/Target/NVPTX/NVPTXUtilities.h#L58 Does this solve your problem? On Wed, Dec 21, 2016 at 2:29 PM, Gurunath Kadam via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Hi, > > I am trying to instrument CUDA kernel functions only (llvm-3.9.0). > > Is there a way to identify cuda kernel

cuda cross compiling issue for target aarch64-linux-androideabi

2018 Mar 23

cuda cross compiling issue for target aarch64-linux-androideabi

I was wondering if anyone has encountered this issue when cross compiling cuda on Nvidia TX2 running android. The error is In file included from <built-in>:1: In file included from prebuilts/clang/host/linux-x86/clang-4667116/lib64/clang/7.0.1/include/__clang_cuda_runtime_wrapper.h:219: ../cuda/targets/aarch64-linux-androideabi/include/math_functions.hpp:3477:19: error: no matching function

CUDA tools?

2017 Oct 05

CUDA tools?

vychytraly . wrote: > On Thu, Oct 5, 2017 at 9:51 PM, <m.roth at 5-cent.us> wrote: >> >> So, kmod-nvidia installed. Trouble is, I have no tool to test it. And my >> user might need nvcc, which, of course, is only provided by the NVidia >> CUDA, which won't install, because it conflicts with kmod-nvidia. >> >> Has *anyone* dealt with this? If so,

CUDA compilation "No available targets are compatible with this triple." problem

2017 Aug 02

CUDA compilation "No available targets are compatible with this triple." problem

Hi, I have trouble compiling CUDA code with Clang. The following is a command I tried: > clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 --cuda-path=/usr/local/cuda The error message is error: unable to create target: 'No available targets are compatible with this triple.' The info of the LLVM I'm using is as follows: > lang++ --version clang version 6.0.0

[LLVMdev] CUDA front-end (CUDA to LLVM IR)

2015 Apr 08

[LLVMdev] CUDA front-end (CUDA to LLVM IR)

On Wed, Apr 8, 2015 at 10:12 AM, Dmitry Mikushin <dmitry at kernelgen.org> wrote: > A tool of this kind here: https://github.com/apc-llc/nvcc-llvm-ir > > 2015-04-08 19:01 GMT+02:00 Ahmed ElTantawy <ahmede at ece.ubc.ca>: > >> Hi, >> >> I wanted to ask whether there is ongoing effort (or an already >> established tool) that enables to convert CUDA

similar to: cuda-memcheck to debug CUDA-enabled R packages