search for: nvptx64

Displaying 20 results from an estimated 81 matches for "nvptx64".

2014 Oct 24
3
[LLVMdev] IndVar widening in IndVarSimplify causing performance regression on GPU programs
...to 40%) on some internal CUDA benchmarks (a reduced example presented below). The root cause of this regression seems to be that IndVarSimplify widens induction variables assuming arithmetic on wider integer types is as cheap as on narrower ones. However, this assumption is wrong at least for the NVPTX64 target. Although the NVPTX64 target supports 64-bit arithmetic, since the actual NVIDIA GPU typically has only 32-bit integer registers, one 64-bit arithmetic operation typically ends up as two machine instructions taking care of the low 32 bits and the high 32 bits respectively. I haven't looked at...
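To make the cost argument in the snippet above concrete, here is a minimal C++ sketch of how one 64-bit addition decomposes into two 32-bit additions plus a carry when only 32-bit registers are available. The type `U64Parts` and the helpers `add64_via_32`, `split`, and `join` are hypothetical names invented for this illustration; this is not the NVPTX backend's actual lowering, only the arithmetic idea behind it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration type: a 64-bit value held in two 32-bit halves.
struct U64Parts {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Split a 64-bit value into its low and high 32-bit halves.
U64Parts split(std::uint64_t v) {
    return { static_cast<std::uint32_t>(v), static_cast<std::uint32_t>(v >> 32) };
}

// Reassemble the two halves into a single 64-bit value.
std::uint64_t join(U64Parts p) {
    return (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
}

// One 64-bit add expressed as two 32-bit adds: the low halves first,
// then the high halves plus the carry out of the low add.
U64Parts add64_via_32(U64Parts a, U64Parts b) {
    U64Parts r;
    r.lo = a.lo + b.lo;                              // wraps modulo 2^32
    std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;   // wrapped => carry out
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

A widened induction variable pays this two-step cost on every increment, which is one way to read the regression described in the thread.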
2017 Jun 14
4
[CUDA] Lost debug information when compiling CUDA code
...r, although I used -g when compiling the source code, no source-level information is available in cuda-gdb or cuda-memcheck. Specifically, below is what I did: 1) For a CUDA file a.cu, generate IR files: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c a.cu; 2) Instrument the device code a-cuda-nvptx64-nvidia-cuda-sm_35.bc (generated in the previous step), inserting a call to a hook function before each device memory access. The hook function is defined in another file, b.cu. Let's say we get a file named intrumented-a-device.bc after this step; 3) Generate IR files for b.cu: clang++ -g -em...
2012 Nov 09
0
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Duncan, You're right, global variables use preferred alignment. And - yes, preferred alignment in this case is bigger: 8 instead of 4. NVIDIA's prop. compiler gives 4. However, since CUDA 5.0 ptx modules are linkable with each other, I think alignments for externally visible functions and data should all follow ABI rules. Is there a guide on making tests? I have ~5 pending patches
2012 Nov 10
0
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Perhaps "compatibility" is the wrong term to use here. For now, I would like to "match" what the vendor compiler does. I don't think using preferred alignment would hurt anything in terms of correctness, but I need to go through the entire back-end to see what effects it could have on performance (e.g. adding extra padding increases local memory usage). It could be a
2012 Nov 09
2
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Dmitry, > I'm attaching a patch that should fix the issue mentioned above. It > simply makes the same check seen in the same file for global > variables: > > emitPTXAddressSpace(PTy->getAddressSpace(), O); > if (GVar->getAlignment() == 0) > O << " .align " << (int) TD->getPrefTypeAlignment(ETy); > else > O
2012 Nov 09
0
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
...0) O << " .align " << (int) TD->getPrefTypeAlignment(ETy); else O << " .align " << GVar->getAlignment(); Could you please review and commit? Do you think it needs a test case? Thanks, - D. dmikushin at hp2:~/forge/align0> llc -march=nvptx64 -mcpu=sm_20 align0.ll -o - // // Generated by LLVM NVPTX Back-End // .version 3.1 .target sm_20 .address_size 64 // .globl __internal_dsmul .visible .func __internal_dsmul( .param .b64 __internal_dsmul_param_0, .param .align 4 .b8 __internal_dsmul_param_1[8], .param .align 4 .b8 __internal_d...
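The fallback in the quoted patch (use the type's alignment when no explicit alignment is set, instead of printing `.align 0`) can be sketched outside LLVM as follows. This is only an illustration under stated assumptions: `param_decl` and `Float2` are hypothetical names, the alignment comes from C++ `alignof` on the host rather than from LLVM's data layout, and the thread's question of preferred versus ABI alignment is not modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two-float aggregate passed by value in the thread.
struct Float2 {
    float x;
    float y;
};

// Format a PTX byte-array parameter declaration.
// An explicit alignment of 0 means "not specified", so fall back to the
// type's alignment rather than emitting the invalid ".align 0".
std::string param_decl(const std::string& name,
                       unsigned explicit_align,
                       unsigned type_align,
                       unsigned size_bytes) {
    const unsigned align = (explicit_align == 0) ? type_align : explicit_align;
    return ".param .align " + std::to_string(align) + " .b8 " + name +
           "[" + std::to_string(size_bytes) + "]";
}
```

On common host ABIs `alignof(Float2)` is 4 and `sizeof(Float2)` is 8, which lines up with the `.param .align 4 .b8 __internal_dsmul_param_1[8]` form shown in the llc output above.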
2012 Nov 10
2
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Justin, On 09/11/12 22:49, Justin Holewinski wrote: > Test cases exist under test/CodeGen/NVPTX (name changed in May). I've deleted the empty PTX directory. Now that I'm > back at NVIDIA, I'm going to be running through the bugzilla issues (thanks > Dmitry for the reports!). I have practically the exact same patch here in my > queue. :) > > In this case, I
2012 Jun 27
2
[LLVMdev] [NVPTX] Backend failure in LegalizeDAG due to unimplemented expand in target lowering
Dear LLVM, I'm trying to understand why the attached IR code works for the x86_64 target and fails for nvptx64 because of an unimplemented expand during target lowering. Any ideas? Just change the target triple to x86_64-unknown-unknown, and the same IR code can be successfully codegen-ed for x86_64. Thanks, - Dima. dmikushin at dmikushin-desktop:~/Desktop$ gdb ~/sandbox/bin/llc GNU gdb (Ubuntu/Linaro...
2013 Dec 09
0
[LLVMdev] PTX generation examples?
...mpiler. The output will be a string buffer that contains the PTX, which you can load into the CUDA runtime. As for determining if PTX support is compiled into the LLVM binary you are using, you could register all targets and then check if you can create a Target for the "nvptx" or "nvptx64" triple: InitializeAllTargets(); InitializeAllTargetMCs(); InitializeAllAsmPrinters(); InitializeAllAsmParsers(); std::string Err; const Target *Tgt = TargetRegistry::lookupTarget("nvptx64", Err); if (Tgt) { // nvptx target is available } else { // nvptx tar...
2013 Dec 06
2
[LLVMdev] PTX generation examples?
OK, fine -- an example of MCJIT that sets up for PTX JIT would also be helpful. On Dec 6, 2013, at 12:32 PM, Eli Bendersky <eliben at google.com> wrote: > > You'll have to switch to MCJIT for this purpose. Legacy JIT doesn't emit PTX. > > Eli -- Larry Gritz lg at larrygritz.com
2012 Jul 11
2
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
...getelementptr inbounds %struct.float2* %agg.result, i64 0, i32 1 store float %15, float* %agg.result.1, align 4 ret void } declare ptx_device float @llvm.nvvm.add.rn.f(float, float) nounwind readnone declare ptx_device float @llvm.nvvm.mul.rn.f(float, float) nounwind readnone > llc -march=nvptx64 -mcpu=sm_20 test.ll -o test.ptx > cat test.ptx // // Generated by LLVM NVPTX Back-End // .version 3.0 .target sm_20, texmode_independent .address_size 64 // .globl __internal_dsmul .func __internal_dsmul( .param .b64 __internal_dsmul_param_0, .param .align 0 .b8 __internal_dsmu...
2012 Jun 29
0
[LLVMdev] [NVPTX] Backend failure in LegalizeDAG due to unimplemented expand in target lowering
...e: 2012/6/27 > Subject: [NVPTX] Backend failure in LegalizeDAG due to unimplemented > expand in target lowering > To: LLVM-Dev <llvmdev at cs.uiuc.edu> > > > Dear LLVM, > > I'm trying to understand why the attached IR code works for x86_64 > target and fails for nvptx64, because of unimplemented expand during > the target lowering. Any ideas? > Just change the target triple to x86_64-unknown-unknown, and the same > IR code could be successfully codegen-ed for x86_64. > > Thanks, > - Dima. > > dmikushin at dmikushin-desktop:~/Desktop$ gdb ~/...
2013 Dec 09
1
[LLVMdev] PTX generation examples?
...he output will be a string buffer that contains the PTX, which you can load into the CUDA runtime. > > As for determining if PTX support is compiled into the LLVM binary you are using, you could register all targets and then check if you can create a Target for the "nvptx" or "nvptx64" triple: > > InitializeAllTargets(); > InitializeAllTargetMCs(); > InitializeAllAsmPrinters(); > InitializeAllAsmParsers(); > > std::string Err; > const Target *Tgt = TargetRegistry::lookupTarget("nvptx64", Err); > if (Tgt) { > // nvp...
2013 Mar 21
0
[LLVMdev] UNREACHABLE executed! error while trying to generate PTX
...or these keywords: tests/SemaCUDA/cuda.h If you compile as CUDA (use .cu extension, or "-x cuda") and use this header, you will have basic support. You can invoke clang with something like: $ clang test1.cu -Xclang -fcuda-is-device -I ../src/clang/test/SemaCUDA -Xclang -triple -Xclang nvptx64 -Xclang -target-cpu -Xclang sm_20 -S ... assuming your clang source directory is ../src/clang, you want 64-bit PTX, and your target SM is 2.0. Adjust accordingly. Clang also knows how to map OpenCL to PTX, so you would do something like: $ clang test1.cl -Xclang -triple -Xclang nvptx64 -Xclang...
2012 Nov 09
0
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Test cases exist under test/CodeGen/NVPTX (name changed in May). Now that I'm back at NVIDIA, I'm going to be running through the bugzilla issues (thanks Dmitry for the reports!). I have practically the exact same patch here in my queue. :) In this case, I would prefer ABI alignment for compatibility with the vendor compiler. It should work either way, but I do need to audit the
2013 Mar 20
2
[LLVMdev] UNREACHABLE executed! error while trying to generate PTX
Thanks a lot Justin, I will remove the toolkit header. Just one last question (maybe ;) ). If I do away with the toolkit headers, it says unknown type name '__device__'. Does this function qualifier have an alternative, or can I just do away with it?
2013 Mar 22
2
[LLVMdev] UNREACHABLE executed! error while trying to generate PTX
Well, I tried the command line you gave and I get the following error: clang++ nbody.kernel.cu -Xclang -fcuda-is-device -I/home/upitamba/llvm-3.2.src/tools/clang/test/SemaCUDA/ -Xclang -triple -Xclang nvptx64 -Xclang -target-cpu -Xclang sm_10 -S fatal error: error in backend: Cannot select: 0x334a870: v4f32 = NVPTXISD::MoveParam 0x334a770 [ORD=1] [ID=22] 0x334a770: v4f32 = TargetExternalSymbol'.PARAM0' [ID=1] In function: computeBodyAccel Am I doing anything wrong here? Attached my new nb...
2012 Nov 09
3
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Dmitry, > You're right, global variables use preferred alignment. And - yes, > preferred alignment in this case is bigger: 8 instead of 4. NVIDIA's > prop. compiler gives 4. However, since CUDA 5.0 ptx modules are > linkable with each other, I think alignments for externally visible > functions and data should all follow ABI rules. giving it an alignment of 8 does
2020 Sep 24
2
cuda __shfl_sync problem
...nly -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc I also have a library that contains the instrumentation stubs, for which I generate IR similarly, and I link it with the device IR programmatically with Linker::linkModules(..). Then, after some analysis, I use llc to get PTX: llc device.bc --march=nvptx64 --mcpu=sm_52 --filetype=asm -o device.ptx This works fine, but the problem is that the instrumentation code uses __shfl_sync() and ptxas gives me the following error: ptxas device.ptx, line 1033; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later Now according to http...
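The ptxas message above says `shfl.sync` needs a `.version` directive of 6.0 or later. One way to catch that before invoking ptxas is to read the `.version` line out of the generated PTX text and compare it. The sketch below assumes the PTX is already in a string; `parse_ptx_version` and `ptx_version_at_least` are hypothetical helper names, and nothing here calls LLVM or ptxas.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Find the first ".version MAJOR.MINOR" directive in PTX text.
// Returns true and fills major/minor on success, false otherwise.
bool parse_ptx_version(const std::string& ptx, int& major, int& minor) {
    std::istringstream lines(ptx);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string directive;
        // Skip blank lines, comments, and every other directive.
        if (!(fields >> directive) || directive != ".version") continue;
        char dot = 0;
        return static_cast<bool>(fields >> major >> dot >> minor) && dot == '.';
    }
    return false;
}

// True if the PTX declares a version >= want_major.want_minor.
// PTX without a parsable .version directive is treated as not satisfying it.
bool ptx_version_at_least(const std::string& ptx, int want_major, int want_minor) {
    int major = 0;
    int minor = 0;
    if (!parse_ptx_version(ptx, major, minor)) return false;
    return major > want_major || (major == want_major && minor >= want_minor);
}
```

For example, `ptx_version_at_least(text, 6, 0)` would flag a module whose header declares an older PTX ISA version before the instrumentation code that uses `__shfl_sync()` reaches ptxas.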
2016 May 03
4
Is the CppBackend still supported?
...dian) cpp - C++ backend hexagon - Hexagon mips - Mips mips64 - Mips64 [experimental] mips64el - Mips64el [experimental] mipsel - Mipsel msp430 - MSP430 [experimental] nvptx - NVIDIA PTX 32-bit nvptx64 - NVIDIA PTX 64-bit ppc32 - PowerPC 32 ppc64 - PowerPC 64 ppc64le - PowerPC 64 LE r600 - AMD GPUs HD2XXX-HD6XXX sparc - Sparc sparcel - Sparc LE sparcv9 - Sparc V9 systemz - SystemZ thumb...