thr3ads.net - similar to: "RFC: Debug info for Cuda"

Displaying 20 results from an estimated 7000 matches similar to: "RFC: Debug info for Cuda"

2017 Nov 06

Debug info for Cuda

06.11.2017 14:56, Robinson, Paul пишет: >> Hi everybody, >> As you know, Cuda/NVPTX target has very limited support of the debug >> info in Clang/LLVM. Currently, LLVM supports only emission of the line >> numbers debug info. >> This is caused by limitations of the Cuda/NVPTX codegen. Clang/LLVM >> translates the source code to LLVM IR, which is then lowered to

Debug info for Cuda

2017 Nov 08

Debug info for Cuda

Nobody blames ptxas. I'm not saying that these are the troubles, I'm just saying that it has some features and we have some problems to be solved. But lack of labels, label arithmetics in DWARF sections is the real problem, because LLVM actively uses it in DWARF sections Best regards, Alexey Bataev 8 нояб. 2017 г., в 5:35, Madhur Amilkanthwar <madhur13490 at

[LLVMdev] [NVPTX] CUDA inline PTX asm definitions scoping "{" "}" is broken

2012 Jul 10

[LLVMdev] [NVPTX] CUDA inline PTX asm definitions scoping "{" "}" is broken

Hi, Looks like "{" and "}" are lost when trying to use the combination of Clang and NVPTX, which may result into clash of definitions of the function-scope and asm-scope. Here is an example: > cat test.cu __attribute__((device)) __attribute__((nv_linkonce_odr)) __inline__ int __any(int a) { int result; asm __volatile__ ("{ \n\t" ".reg .pred

[GPUCC] how to remove _ZL21__nvvm_reflect_anchorv() automatically?

2016 Apr 07

[GPUCC] how to remove _ZL21__nvvm_reflect_anchorv() automatically?

Hi, I needed to compile a cuda source file (say, a.cu) into IR (a.bc), and then merge a.bc with another bitcode file (b.bc, compiled from b.cu). So I used *llvm-link a.bc b.bc -o c.bc* However, I noticed that an internal function '* _ZL21__nvvm_reflect_anchorv() *' is defined in both a.bc & b.bc, and when merging these two files, one of the two definitions was renamed to

[LLVMdev] [NVPTX] CUDA inline PTX asm definitions scoping "{" "}" is broken

2012 Jul 10

[LLVMdev] [NVPTX] CUDA inline PTX asm definitions scoping "{" "}" is broken

Dmitry, You might be better served by filing this as a bug (http://llvm.org/bugs/). Please include a test case and the steps to reproduce (i.e., what you've provided below). Chad On Jul 10, 2012, at 3:15 PM, Dmitry N. Mikushin wrote: > Hi, > > Looks like "{" and "}" are lost when trying to use the combination of Clang and NVPTX, which may result into clash of

[GPUCC] how to remove _ZL21__nvvm_reflect_anchorv() automatically?

2016 Apr 08

[GPUCC] how to remove _ZL21__nvvm_reflect_anchorv() automatically?

Yeah, '.' is the direct reason for the ptxas failure here. I'm curious, however, about what the purpose of nvvm_reflect_anchorv() is here, and why does the front-end always generate this function? Since the current PTX emission doesn't mangle dots, it would be a reasonable workaround for me to prevent the front-end from generating this function in the first place. Is there any

[LLVMdev] [NVPTX] CUDA inline PTX asm definitions scoping "{" "}" is broken

2012 Jul 10

[LLVMdev] [NVPTX] CUDA inline PTX asm definitions scoping "{" "}" is broken

Yes, sure, good idea, because might be also Clang-related. http://llvm.org/bugs/show_bug.cgi?id=13322 2012/7/11 Chad Rosier <mcrosier at apple.com> > Dmitry, > You might be better served by filing this as a bug (http://llvm.org/bugs/). > Please include a test case and the steps to reproduce (i.e., what you've > provided below). > > Chad > > On Jul 10, 2012,

[GPUCC] how to remove _ZL21__nvvm_reflect_anchorv() automatically?

2016 Apr 09

[GPUCC] how to remove _ZL21__nvvm_reflect_anchorv() automatically?

David's change makes nvvm_reflect_anchor unnecessary. The issue with dots in names generated by llvm still needs to be fixed. On Apr 9, 2016 8:32 AM, "Jingyue Wu" <jingyue at google.com> wrote: > Artem, > > With David's http://reviews.llvm.org/rL265060, do you think > __nvvm_reflect_anchor is still necessary? > > On Fri, Apr 8, 2016 at 9:37 AM, Yuanfeng

[LLVMdev] [NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it

2012 Jun 12

[LLVMdev] [NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it

Dear LLVM NVPTX maintainers, Just to have the issue recorded, I don't know how important it is: clang generates linkonce_odr out of __inline__, and NVPTX generates .weak out of linkonce_odr (how it happens - a big question, btw, because I can't find anything related in NVPTX asm printer - does it chain to some other printer?), and finally ptxas (both 4.2 and 5) fails to compile it to

NVPTX - Reordering load instructions

2018 Jun 21

NVPTX - Reordering load instructions

Hi all, I'm looking into the performance difference of a benchmark compiled with NVCC vs NVPTX (coming from Julia, not CUDA C) and I'm seeing a significant difference due to PTX instruction ordering. The relevant source code consists of two nested loops that get fully unrolled, doing some basic arithmetic with values loaded from shared memory: > #define BLOCK_SIZE 16 > >

[LLVMdev] [NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it

2012 Jun 13

[LLVMdev] [NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it

On Tue, Jun 12, 2012 at 6:11 PM, Dmitry N. Mikushin <maemarcus at gmail.com>wrote: > Dear LLVM NVPTX maintainers, > > Just to have the issue recorded, I don't know how important it is: > > clang generates linkonce_odr out of __inline__, and NVPTX generates .weak > out of linkonce_odr (how it happens - a big question, btw, because I can't > find anything related

Guidance on working with the NVIDIA GPU back-end

2019 Dec 16

Guidance on working with the NVIDIA GPU back-end

Hi all, I'm primarily a hardware person but would like to do some compiler-architecture co-design research. Are there any good references for the NVPTX backend? I'd like to change that backend to have a limited number of physical registers rather than an unlimited number of virtual ones (for more realistic modeling in a uarch simulator). Being able to do register allocation and other

NVPTX - Reordering load instructions

2018 Jun 21

NVPTX - Reordering load instructions

We already have a pass that vectorizes loads and stores in nvptx and amdgpu. Not at my laptop, I forget the exact filename, but it's called load-store vectorizer. I think the question is, why is LSV not vectorizing this code? I think the answer is, llvm can't tell that the loads are aligned. Ptxas can, but only because it's (apparently) doing vectorization *after* it reesolves the

Debug info for CUDA code

2019 Feb 27

Debug info for CUDA code

Hi Alexey, I submitted the bug report to nvidia. While they are working on it, can you share some insight in what could potentially cause this? I just want to get a sense if such a bug require significant amount of work to fix, which can help me make some decision moving forward with my project. Thanks, Char At 2019-02-27 03:19:02, "Alexey Bataev" <a.bataev at outlook.com>

instrumenting device code with gpucc

2016 Mar 05

instrumenting device code with gpucc

On Fri, Mar 4, 2016 at 5:50 PM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com> wrote: > Hi Jingyue, > > My name is Yuanfeng Peng, I'm a PhD student at UPenn. I'm sorry to bother > you, but I'm having trouble with gpucc in my project, and I would be really > grateful for your help! > > Currently we're trying to instrument CUDA code using LLVM 3.9, and

Debug info for CUDA code

2020 Jan 15

Debug info for CUDA code

Hi Alexey, Almost a year has passed and Nvidia finally fixes the ptxas issue in CUDA 10.2 according to: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-compiler-resolved-issues However, I can not yet use it with llvm 9.0.0 release because CUDA 10.2 is not supported yet. Is there other branches of the llvm repo that supports CUDA 10.2 now? Or do I need to wait for llvm 10

Debug info for CUDA code

2019 Feb 26

Debug info for CUDA code

Hi Alexey, Just want to make sure I understand what you said because I'm not familiar with the llvm pipeline, it's this line: /net/gs/vol3/software/modules-sw/cuda/10.0/Linux/RHEL6/x86_64/bin/ptxas" -m64 -g --dont-merge-basicblocks --return-at-end -v --gpu-name sm_75 --output-file /tmp/60663577.1.login.q/testparticles-4fd988.o /tmp/60663577.1.login.q/testparticles-1d20c4.s that

Debug info for CUDA code

2019 Mar 11

Debug info for CUDA code

Hi Alexey, Is there any option for clang to turn on debug for the host code only but not the device code? I've been using something like -ggdb3 -O0 but this generate debug info for both host and device. I'm trying to work around the aforementioned ptxas bug. Thanks, Char At 2019-02-28 02:09:54, "Alexey Bataev" <a.bataev at outlook.com> wrote: Hi Char, it looks like

Debug info for CUDA code

2019 Feb 26

Debug info for CUDA code

Hi Alexey, Thanks for the great work! The version I checked out works most of the time. But I do encounter crashes sometimes. I can't file a bug report on https://bugs.llvm.org/ because I don't have an account. I sent an email to bugs-admin at lists.llvm.org for an account already but I haven't heard back. Meanwhile, can you take a look at the issue? I'm attaching the bug report

[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU

2013 Mar 01

[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU

I'm building this with llvm-c, and accessing these intrinsics via calling the intrinsic as if it were a function. class F_SREG<string OpStr, NVPTXRegClass regclassOut, Intrinsic IntOp> : NVPTXInst<(outs regclassOut:$dst), (ins), OpStr, [(set regclassOut:$dst, (IntOp))]>; def INT_PTX_SREG_TID_X : F_SREG<"mov.u32 \t$dst, %tid.x;",

similar to: RFC: Debug info for Cuda