thr3ads.net - search: "cubin"

NVPTX Back-end: relocatable device code support for dynamic parallelism

2017 Jun 09

1

... "/tmp/tmpxft_00007040_00000000-13_cuda_id_test.cpp3.i" -o "/tmp/tmpxft_00007040_00000000-6_cuda_id_test.ptx"
#$ ptxas -arch=sm_35 -m64 --compile-only "/tmp/tmpxft_00007040_00000000-6_cuda_id_test.ptx" -o "/tmp/tmpxft_00007040_00000000-14_cuda_id_test.sm_35.cubin"
#$ fatbinary --create="/tmp/tmpxft_00007040_00000000-2_cuda_id_test.fatbin" -64 --cmdline="--compile-only " "--image=profile=sm_35,file=/tmp/tmpxft_00007040_00000000-14_cuda_id_test.sm_35.cubin" "--image=profile=compute_35,file=/tmp/tmpxft_00007040_0000000...
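
The dry-run log above shows the last part of the nvcc pipeline. ptxas assembles the PTX into an sm_35 cubin, and fatbinary packages that cubin, together with a compute_35 image whose file name is cut off in the excerpt, into a .fatbin. The Python sketch below rebuilds only the ptxas and fatbinary argument lists, using flags copied from the log. Some details are assumptions:

- The helper name `ptx_to_fatbin_commands` and the file names are hypothetical.
- The compute_35 image is left out because its path is elided.
- Nothing is executed.

```python
import shlex
from typing import List


def ptx_to_fatbin_commands(ptx: str, cubin: str, fatbin: str,
                           arch: str = "sm_35") -> List[List[str]]:
    """Return argv lists mirroring the ptxas and fatbinary steps of the dry-run log.

    Hypothetical helper: flags are copied from the log excerpt. The compute_35
    image that the real fatbinary command also embeds is omitted because its
    path is truncated in the excerpt.
    """
    ptxas = ["ptxas", f"-arch={arch}", "-m64", "--compile-only", ptx, "-o", cubin]
    fatbinary = [
        "fatbinary",
        f"--create={fatbin}",
        "-64",
        # The shell form --cmdline="--compile-only " becomes this single argument.
        "--cmdline=--compile-only ",
        f"--image=profile={arch},file={cubin}",
    ]
    return [ptxas, fatbinary]


if __name__ == "__main__":
    for argv in ptx_to_fatbin_commands("cuda_id_test.ptx",
                                       "cuda_id_test.sm_35.cubin",
                                       "cuda_id_test.fatbin"):
        # shlex.join (Python 3.8+) quotes the arguments as a shell would need them.
        print(shlex.join(argv))
```

Each argv list could be passed to `subprocess.run(argv, check=True)` on a machine where the CUDA toolkit is installed. Keeping the commands as lists avoids any shell-quoting problems with the `--cmdline` value's trailing space.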

[LLVMdev] [NVPTX] PTXAS - Unimplemented feature: labels as initial values

2012 Jul 18

2

...isible .global .align 4096 .u64 a = _2E_cst12;
.visible .global .align 4096 .b8 _2E_cst12[3] = {97, 97, 0};
.visible .global .align 4096 .b8 _2E_cst2[26] = {85, 115, 97, 103, 101, 58, 32, 37, 115, 32, 60, 110, 120, 62, 32, 60, 110, 121, 62, 32, 60, 110, 122, 62, 10, 0};
> ptxas test.ptx -o test.cubin
ptxas test.ptx, line 10; error : Unimplemented feature: labels as initial values
ptxas test.ptx, line 12; error : Unimplemented feature: labels as initial values
ptxas test.ptx, line 10; error : Label expected for forward reference of '_2E_cst1'
ptxas test.ptx, line 12; error : Labe...
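
In this excerpt, global `a` is initialized with the address of `_2E_cst12`, which is declared only on the following line. ptxas rejects two things here: the address-valued initializer ("labels as initial values") and the forward reference. The first error depends on what ptxas supports. The second can be avoided by emitting each global after every global its initializer names.

The Python sketch below computes such an emission order with a depth-first topological sort. The function name `emission_order` and the dependency map are hypothetical, and the sketch does not make ptxas accept address-valued initializers.

```python
from typing import Dict, List, Set


def emission_order(decl_order: List[str], refs: Dict[str, Set[str]]) -> List[str]:
    """Order globals so every symbol used in an initializer is emitted first.

    decl_order: globals in their original declaration order.
    refs: hypothetical map from a global to the globals its initializer names.
    Raises ValueError if initializers refer to each other in a cycle.
    """
    emitted: List[str] = []
    state: Dict[str, str] = {}  # name -> "visiting" or "done"

    def visit(name: str) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"circular initializer reference involving {name!r}")
        state[name] = "visiting"
        for dep in sorted(refs.get(name, set())):
            visit(dep)  # emit everything this initializer depends on first
        state[name] = "done"
        emitted.append(name)

    for name in decl_order:
        visit(name)
    return emitted


if __name__ == "__main__":
    # Globals from the excerpt: `a` holds the address of _2E_cst12.
    print(emission_order(["a", "_2E_cst12", "_2E_cst2"], {"a": {"_2E_cst12"}}))
    # -> ['_2E_cst12', 'a', '_2E_cst2']
```

A depth-first walk is used instead of `graphlib.TopologicalSorter` because the walk keeps unrelated globals in their original order, which makes the output easy to diff against the unsorted PTX.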

[LLVMdev] [NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it

2012 Jun 12

2

...is: clang generates linkonce_odr out of __inline__, and NVPTX generates .weak out of linkonce_odr (how it happens - a big question, btw, because I can't find anything related in NVPTX asm printer - does it chain to some other printer?), and finally ptxas (both 4.2 and 5) fails to compile it to cubin. Below is the test case:
> cat test3.cu
__inline__ __attribute__((device)) __attribute__((used)) void test()
{
return;
}
> clang -cc1 -emit-llvm -triple ptx64-unknown-unknown -fcuda-is-device test3.cu -o test3.ll
> cat test3.ll
; ModuleID = 'test3.cu'
target datalayout = "...
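
Before handing a PTX file to a ptxas version that rejects `.weak`, it can help to list every `.weak` linking directive in the file. The Python sketch below does that with a simple line-based match. The helper `find_weak_directives` is hypothetical, and the sample PTX in the usage lines is illustrative rather than the actual output from the thread.

```python
import re
from typing import List, Tuple

# A line whose first token is the .weak linking directive (comments are not matched).
WEAK_DIRECTIVE = re.compile(r"^\s*\.weak\b")


def find_weak_directives(ptx_text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, stripped line) for each .weak directive."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(ptx_text.splitlines(), start=1)
        if WEAK_DIRECTIVE.match(line)
    ]


if __name__ == "__main__":
    # Illustrative PTX, not copied from the thread.
    sample = ".version 3.0\n.target sm_20\n.weak .func test()\n{\n\tret;\n}\n"
    print(find_weak_directives(sample))  # -> [(3, '.weak .func test()')]
```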

[LLVMdev] [NVPTX] CUDA inline PTX asm definitions scoping "{" "}" is broken

2012 Jul 10

2

...%r0;
// inline asm
.reg .pred %p1;
.reg .pred %p2;
setp.ne.u32 %p1, %r0, 0;
vote.any.pred %p2, %p1;
selp.s32 %r0, 1, 0, %p2;
// inline asm
st.global.u32 [%SP+4], %r0;
mov.b32 func_retval0, %r0;
ret;
}
> ptxas test.ptx -o test.cubin
ptxas test.ptx, line 33; error : Duplicate definition of variable '%p1'
ptxas test.ptx, line 34; error : Duplicate definition of variable '%p2'
ptxas test.ptx, line 36; error : Instruction 'vote' requires .target sm_12 or higher
ptxas fatal : Ptx assembly aborted due...
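
The duplicate-definition errors arise when the same inline-asm body, with its own `.reg .pred %p1;` and `.reg .pred %p2;` declarations, lands twice in one PTX scope. Wrapping each inline-asm body in `{ }` keeps those declarations local to the block. This is the brace pattern commonly used for temporaries in inline PTX.

The Python sketch below models the check with a simple scope stack and shows that braced copies no longer collide. The helpers `duplicate_regs` and `scoped` are hypothetical and handle only one register per `.reg` line.

```python
import re
from typing import List, Set

# Matches a declaration such as ".reg .pred %p1;" and captures "%p1".
# Simplified: one register per line; no lists or "%r<10>" ranges.
REG_DECL = re.compile(r"^\s*\.reg\s+\.\w+\s+(%\w+)\s*;")


def duplicate_regs(ptx_body: str) -> List[str]:
    """Report registers declared twice within the same { } scope."""
    scopes: List[Set[str]] = [set()]  # names declared in each open scope
    dups: List[str] = []
    for line in ptx_body.splitlines():
        code = line.split("//", 1)[0]  # drop trailing comments
        m = REG_DECL.match(code)
        if m:
            name = m.group(1)
            if name in scopes[-1]:
                dups.append(name)
            scopes[-1].add(name)
        for ch in code:
            if ch == "{":
                scopes.append(set())
            elif ch == "}" and len(scopes) > 1:
                scopes.pop()
    return dups


def scoped(asm_body: str) -> str:
    """Wrap an inline-asm body in braces so its .reg declarations stay local."""
    return "{\n" + asm_body + "\n}"


if __name__ == "__main__":
    body = (".reg .pred %p1;\n.reg .pred %p2;\n"
            "setp.ne.u32 %p1, %r0, 0;\nvote.any.pred %p2, %p1;\n"
            "selp.s32 %r0, 1, 0, %p2;")
    print(duplicate_regs(body + "\n" + body))                  # -> ['%p1', '%p2']
    print(duplicate_regs(scoped(body) + "\n" + scoped(body)))  # -> []
```

The separate `vote` error is unrelated to scoping. It reports that the module's `.target` is older than the sm_12 the instruction requires.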

[LLVMdev] [NVPTX] CUDA inline PTX asm definitions scoping "{" "}" is broken

2012 Jul 10

0

...%p2;
> setp.ne.u32 %p1, %r0, 0;
> vote.any.pred %p2, %p1;
> selp.s32 %r0, 1, 0, %p2;
>
> // inline asm
> st.global.u32 [%SP+4], %r0;
> mov.b32 func_retval0, %r0;
> ret;
> }
>
> > ptxas test.ptx -o test.cubin
> ptxas test.ptx, line 33; error : Duplicate definition of variable '%p1'
> ptxas test.ptx, line 34; error : Duplicate definition of variable '%p2'
> ptxas test.ptx, line 36; error : Instruction 'vote' requires .target sm_12 or higher
> ptxas fatal : Ptx...

[LLVMdev] [NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it

2012 Jun 13

0

...rates linkonce_odr out of __inline__, and NVPTX generates .weak
> out of linkonce_odr (how it happens - a big question, btw, because I can't
> find anything related in NVPTX asm printer - does it chain to some other
> printer?), and finally ptxas (both 4.2 and 5) fails to compile it to cubin.
> Below is the test case:
>
> > cat test3.cu
>
> __inline__ __attribute__((device)) __attribute__((used)) void test()
> {
> return;
> }
>
> > clang -cc1 -emit-llvm -triple ptx64-unknown-unknown -fcuda-is-device
> test3.cu -o test3.ll
> > cat test3.l...

[LLVMdev] [NVPTX] CUDA inline PTX asm definitions scoping "{" "}" is broken

2012 Jul 10

1

....reg .pred %p2;
> setp.ne.u32 %p1, %r0, 0;
> vote.any.pred %p2, %p1;
> selp.s32 %r0, 1, 0, %p2;
>
> // inline asm
> st.global.u32 [%SP+4], %r0;
> mov.b32 func_retval0, %r0;
> ret;
> }
>
> > ptxas test.ptx -o test.cubin
> ptxas test.ptx, line 33; error : Duplicate definition of variable '%p1'
> ptxas test.ptx, line 34; error : Duplicate definition of variable '%p2'
> ptxas test.ptx, line 36; error : Instruction 'vote' requires .target
> sm_12 or higher
> ptxas fatal :...

nv50 microcode/shader format

2007 Nov 11

1

Hello there,
With the use of CUDA and ptx I managed to make a disassembler for at least one of the nv50 microcode/shader formats. This might come in handy for some people, hence I'm posting it here:
http://www.cs.rug.nl/~wladimir/decuda/
I'm not yet sure if pixel/vertex shaders use the same instruction format as the compute shaders used by CUDA, but you'd think so, as nv50 is

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Jul 11

2

...%f1;
add.f32 %f1, %f3, %f0;
add.rn.f32 %f0, %f2, %f1;
sub.f32 %f2, %f2, %f0;
add.rn.f32 %f1, %f2, %f1;
ld.param.u64 %rl0, [__internal_dsmul_param_0];
st.f32 [%rl0], %f1;
st.f32 [%rl0+4], %f0;
ret;
}
> ptxas -arch=sm_20 test.ptx -o ptx.cubin
ptxas test.ptx, line 13; error : Alignment must be a power of two
ptxas test.ptx, line 14; error : Alignment must be a power of two
ptxas fatal : Ptx assembly aborted due to errors
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/piperm...
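
ptxas accepts only positive powers of two as alignments, so a parameter printed with an alignment of 0 is rejected with "Alignment must be a power of two". The sketch below assumes, as the thread title suggests, that the zero stands for an alignment that was simply never specified. Under that assumption, one possible fix is to fall back to the type's ABI alignment before printing.

The Python sketch is not the backend's actual fix. The helpers `is_valid_ptx_align` and `param_align` are hypothetical.

```python
def is_valid_ptx_align(align: int) -> bool:
    """PTX alignments must be positive powers of two; 0 is rejected."""
    return align > 0 and (align & (align - 1)) == 0


def param_align(explicit_align: int, abi_align: int) -> int:
    """Hypothetical fallback: treat 0 as 'unspecified' and use the ABI alignment."""
    align = explicit_align if explicit_align != 0 else abi_align
    if not is_valid_ptx_align(align):
        raise ValueError(f"alignment {align} is not a power of two")
    return align


if __name__ == "__main__":
    print(param_align(0, 4))   # -> 4  (unspecified, so the ABI alignment is used)
    print(param_align(16, 4))  # -> 16 (an explicit alignment wins)
```

The check `align & (align - 1) == 0` works because a power of two has exactly one bit set. The `align > 0` guard is what rules out the zero that ptxas complained about.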

[LLVMdev] Backend vs JIT : GPU

2013 Oct 09

2

Hi guys,
I am trying to understand the OpenCL compilation flow on GPUs in order to develop an OpenCL runtime for new hardware. I understand that the OpenCL compiler is part of the vendor's runtime library, which is the heart of OpenCL. Since an OpenCL kernel is compiled at runtime, at a high level its compilation takes place in two steps:
i. Source code is first converted to intermediate code.
ii.

RFC: Proposing an LLVM subproject for parallelism runtime and support libraries

2016 Mar 09

2

...==================
Platform    Format  Location    Setter
=========== ======= =========== =========================
CUDA        PTX     disk        `AddCudaPtxOnDisk`
CUDA        PTX     memory      `AddCudaPtxInMemory`
CUDA        cubin   disk        `AddCudaCubinOnDisk`
CUDA        cubin   memory      `AddCudaCubinInMemory`
OpenCL      text    disk        `AddOpenCLTextOnDisk`
OpenCL      text    memory      `AddOpenCLTextInMemory`
OpenCL      binary  disk...
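
The table maps each (platform, format, location) combination to a setter name. A client choosing between these setters could look the name up from a table, as in the Python sketch below. Only the setter names come from the table. The names `SETTERS` and `setter_for` are hypothetical, and the OpenCL binary row is omitted because it is truncated in the excerpt.

```python
from typing import Dict, Tuple

# Rows transcribed from the table above; the truncated OpenCL binary row is omitted.
SETTERS: Dict[Tuple[str, str, str], str] = {
    ("CUDA", "PTX", "disk"): "AddCudaPtxOnDisk",
    ("CUDA", "PTX", "memory"): "AddCudaPtxInMemory",
    ("CUDA", "cubin", "disk"): "AddCudaCubinOnDisk",
    ("CUDA", "cubin", "memory"): "AddCudaCubinInMemory",
    ("OpenCL", "text", "disk"): "AddOpenCLTextOnDisk",
    ("OpenCL", "text", "memory"): "AddOpenCLTextInMemory",
}


def setter_for(platform: str, fmt: str, location: str) -> str:
    """Look up the setter name for one (platform, format, location) row."""
    try:
        return SETTERS[(platform, fmt, location)]
    except KeyError:
        raise KeyError(f"no setter listed for {platform}/{fmt}/{location}") from None


if __name__ == "__main__":
    print(setter_for("CUDA", "cubin", "memory"))  # -> AddCudaCubinInMemory
```

The names roughly follow an `Add<Platform><Format><OnDisk|InMemory>` pattern, but the capitalization is irregular ("Cuda" versus "OpenCL"). An explicit lookup table is therefore safer than assembling the names from strings.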

RFC: Proposing an LLVM subproject for parallelism runtime and support libraries

2016 Mar 09

2

...latform    Format  Location    Setter
> =========== ======= =========== =========================
> CUDA        PTX     disk        `AddCudaPtxOnDisk`
> CUDA        PTX     memory      `AddCudaPtxInMemory`
> CUDA        cubin   disk        `AddCudaCubinOnDisk`
> CUDA        cubin   memory      `AddCudaCubinInMemory`
> OpenCL      text    disk        `AddOpenCLTextOnDisk`
> OpenCL      text    memory      `AddOpenCLTextInMemory`
> OpenCL bin...

RFC: Proposing an LLVM subproject for parallelism runtime and support libraries

2016 Mar 10

2

...ormat  Location    Setter
>> =========== ======= =========== =========================
>> CUDA        PTX     disk        `AddCudaPtxOnDisk`
>> CUDA        PTX     memory      `AddCudaPtxInMemory`
>> CUDA        cubin   disk        `AddCudaCubinOnDisk`
>> CUDA        cubin   memory      `AddCudaCubinInMemory`
>> OpenCL      text    disk        `AddOpenCLTextOnDisk`
>> OpenCL      text    memory      `AddOpenCLTextInMemory`
>> Ope...

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Nov 09

0

...d.rn.f32 %f0, %f2, %f1;
> sub.f32 %f2, %f2, %f0;
> add.rn.f32 %f1, %f2, %f1;
> ld.param.u64 %rl0, [__internal_dsmul_param_0];
> st.f32 [%rl0], %f1;
> st.f32 [%rl0+4], %f0;
> ret;
> }
>
>> ptxas -arch=sm_20 test.ptx -o ptx.cubin
> ptxas test.ptx, line 13; error : Alignment must be a power of two
> ptxas test.ptx, line 14; error : Alignment must be a power of two
> ptxas fatal : Ptx assembly aborted due to errors
> -------------- next part --------------
A non-text attachment was scrubbed...
Name: align0.p...

Google’s TensorFlow team would like to contribute MLIR to the LLVM Foundation

2019 Sep 09

3

On Mon, 9 Sep 2019 at 22:22, Chris Lattner <clattner at google.com> wrote:
> Including a bunch of content, eg a full langref doc:
> https://github.com/tensorflow/mlir/blob/master/g3doc/LangRef.md
Thanks Chris, that looks awesome!
This one could perhaps be improved with time:
https://github.com/tensorflow/mlir/blob/master/g3doc/ConversionToLLVMDialect.md
Which I think was Hal's
