thr3ads.net - search: "sm_20"

Displaying 20 results from an estimated 48 matches for "sm_20".

[LLVMdev] [NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it

2012 Jun 12

...:64" target triple = "ptx64-unknown-unknown" @llvm.used = appending global [1 x i8*] [i8* bitcast (void ()* @_Z4testv to i8*)], section "llvm.metadata" define linkonce_odr ptx_device void @_Z4testv() nounwind inlinehint { entry: ret void } > llc -march=nvptx64 -mcpu=sm_20 test3.ll -o test3.ptx > cat test3.ptx // // Generated by LLVM NVPTX Back-End // .version 3.0 .target sm_20, texmode_independent .address_size 64 .weak _Z4testv .func _Z4testv( ) // @_Z4testv { .reg .pred %p<396>; .reg .s16 %rc<39...

instrumenting device code with gpucc

2016 Mar 05

...M 3.9, and I've > written a pass to insert hook functions for certain function calls and > memory accesses. For example, given a CUDA program, say, axpy.cu, I > first compile it with > > clang++ -emit-llvm -c axpy.cu, > > which gives me two bitcode files, axpy.bc and axpy-sm_20.bc. Then I use > opt to load my pass and insert the hook functions to axpy.bc, which works > fine. After inspecting the instrumented axpy.bc, I noticed that the kernel > code was not there; rather, it lived inside axpy-sm_20.bc, so I also load > my pass to instrument axpy-sm_20.bc. ...

[LLVMdev] [NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it

2012 Jun 13

...nknown" > > @llvm.used = appending global [1 x i8*] [i8* bitcast (void ()* @_Z4testv > to i8*)], section "llvm.metadata" > > define linkonce_odr ptx_device void @_Z4testv() nounwind inlinehint { > entry: > ret void > } > > > llc -march=nvptx64 -mcpu=sm_20 test3.ll -o test3.ptx > > cat test3.ptx > > // > // Generated by LLVM NVPTX Back-End > // > > .version 3.0 > .target sm_20, texmode_independent > .address_size 64 > > > .weak _Z4testv > .func _Z4testv( > > )...

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Nov 09

...< " .align " << (int) TD->getPrefTypeAlignment(ETy); else O << " .align " << GVar->getAlignment(); Could you please review and commit? Do you think it needs a test case? Thanks, - D. dmikushin@hp2:~/forge/align0> llc -march=nvptx64 -mcpu=sm_20 align0.ll -o - // // Generated by LLVM NVPTX Back-End // .version 3.1 .target sm_20 .address_size 64 // .globl __internal_dsmul .visible .func __internal_dsmul( .param .b64 __internal_dsmul_param_0, .param .align 4 .b8 __internal_dsmul_param_1[8], .param .align 4 .b8 __internal_dsmul_param_2...

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Jul 11

Hello, FYI, this is a bug: http://llvm.org/bugs/show_bug.cgi?id=13324 When compiling the following code for sm_20, func params are for some reason given with .align 0, which is invalid. The problem does not occur if compiled for sm_10. > cat test.ll ; ModuleID = '__kernelgen_main_module' target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64" target triple = "ptx64-unknown-unkno...

PTX generation from CUDA file for compute capability 1.0 (sm_10)

2016 Jun 02

Hello, When generating the PTX output from a CUDA file (.cu file), the minimum target that is accepted by LLVM is sm_20. But I have a specific requirement to generate PTX output for compute capability 1.0 (sm_10). Is there any previous version of LLVM supporting this? Thank you, Ginu

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Nov 09

Hi Duncan, You're right, global variables use preferred alignment. And - yes, preferred alignment in this case is bigger: 8 instead of 4. NVIDIA's prop. compiler gives 4. However, since CUDA 5.0 ptx modules are linkable with each other, I think alignments for externally visible functions and data should all follow ABI rules. Is there a guide on making tests? I have ~5 pending patches

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Nov 10

Perhaps "compatibility" is the wrong term to use here. For now, I would like to "match" what the vendor compiler does. I don't think using preferred alignment would hurt anything in terms of correctness, but I need to go through the entire back-end to see what effects it could have on performance (e.g. adding extra padding increases local memory usage). It could be a

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Nov 09

Hi Dmitry, > I'm attaching a patch that should fix the issue mentioned above. It > simply makes the same check seen in the same file for global > variables: > > emitPTXAddressSpace(PTy->getAddressSpace(), O); > if (GVar->getAlignment() == 0) > O << " .align " << (int) TD->getPrefTypeAlignment(ETy); > else > O
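The check quoted in this snippet can be illustrated in isolation. The following is a minimal sketch, not the actual patch: `alignDirective` and its two parameters are hypothetical names introduced here, standing in for the back-end's `GVar->getAlignment()` value and a type-derived alignment (preferred or ABI, which the thread is still debating).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the directive printing discussed in the thread;
// this is NOT LLVM's API.
//   explicitAlign: alignment recorded on the symbol, 0 meaning "unspecified".
//   fallbackAlign: alignment derived from the type (assumed to be non-zero).
std::string alignDirective(unsigned explicitAlign, unsigned fallbackAlign) {
    assert(fallbackAlign > 0 && "type-derived alignment must be non-zero");
    // ".align 0" is invalid PTX, so the raw 0 is never printed:
    // fall back to the type-derived value instead.
    unsigned align = (explicitAlign == 0) ? fallbackAlign : explicitAlign;
    return " .align " + std::to_string(align);
}
```

With these assumptions, `alignDirective(0, 4)` yields `" .align 4"`, while an explicit alignment such as `alignDirective(8, 4)` is passed through unchanged as `" .align 8"`.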

PTX generation from CUDA file for compute capability 1.0 (sm_10)

2016 Jun 02

...A by a group of researchers (http://www.ecs.umass.edu/ece/tessier/andryc-fpt13.pdf). Our group has some further research interest in this work. I was working on modifying Clang-LLVM for a couple of months and achieved the required changes. But Clang-LLVM only allows me to generate PTX for sm_20, sm_30, etc. While trying to generate PTX for sm_10, it gave *error: unknown target CPU 'sm_10'* *fatal error: cannot open file '/tmp/shared-395893.s': No such file or directory 1 error generated.* The compilation command used is: clang -Xclang -I$LIBCLC/include/generic -I$LIBCLC/in...

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Nov 10

Hi Justin, On 09/11/12 22:49, Justin Holewinski wrote: > Test cases exist under test/CodeGen/NVPTX (name changed in May). I've deleted the empty PTX directory. Now that I'm > back at NVIDIA, I'm going to be running through the bugzilla issues (thanks > Dmitry for the reports!). I have practically the exact same patch here in my > queue. :) > > In this case, I

[LLVMdev] NVPTX: __iAtomicCAS support ?

2012 May 16

...; preds = %while.cond br label %while.cond while.end: ; preds = %while.cond ret void } declare ptx_device i32 @_Z12__iAtomicCASPiii(i32*, i32, i32) CODEGEN ========= dmikushin@hp2:~> llc < kernelgen_monitor.ll -march=nvptx -mcpu=sm_20 // // Generated by LLVM NVPTX Back-End // .version 3.0 .target sm_20, texmode_independent .address_size 32 .func (.param .b32 func_retval0) _Z12__iAtomicCASPiii ( .param .b32 _Z12__iAtomicCASPiii_param_0, .param .b32 _Z12__iAtomicCASPiii_param_1, .param .b32 _Z12__iAtomicCASPiii_param_2 ) ;...

instrumenting device code with gpucc

2016 Mar 10

...t hook functions for certain function calls and >>> memory accesses. For example, given a CUDA program, say, axpy.cu, I >>> first compile it with >>> >>> clang++ -emit-llvm -c axpy.cu, >>> >>> which gives me two bitcode files, axpy.bc and axpy-sm_20.bc. Then I use >>> opt to load my pass and insert the hook functions to axpy.bc, which works >>> fine. After inspecting the instrumented axpy.bc, I noticed that the kernel >>> code was not there; rather, it lived inside axpy-sm_20.bc, so I also load >>> my pass...

[LLVMdev] NVPTX: __iAtomicCAS support ?

2012 May 16

...cond > > while.end: ; preds = %while.cond > ret void > } > > declare ptx_device i32 @_Z12__iAtomicCASPiii(i32*, i32, i32) > > CODEGEN > ========= > > dmikushin@hp2:~> llc < kernelgen_monitor.ll -march=nvptx -mcpu=sm_20 > // > // Generated by LLVM NVPTX Back-End > // > > .version 3.0 > .target sm_20, texmode_independent > .address_size 32 > > .func (.param .b32 func_retval0) _Z12__iAtomicCASPiii > ( > .param .b32 _Z12__iAtomicCASPiii_param_0, > .param .b32 _Z12__iAtomicCASPiii_param_1, > .param .b32 _Z12__iAtomicCASP...

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Nov 09

Test cases exist under test/CodeGen/NVPTX (name changed in May). Now that I'm back at NVIDIA, I'm going to be running through the bugzilla issues (thanks Dmitry for the reports!). I have practically the exact same patch here in my queue. :) In this case, I would prefer ABI alignment for compatibility with the vendor compiler. It should work either way, but I do need to audit the

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Nov 09

Hi Dmitry, > You're right, global variables use preferred alignment. And - yes, > preferred alignment in this case is bigger: 8 instead of 4. NVIDIA's > prop. compiler gives 4. However, since CUDA 5.0 ptx modules are > linkable with each other, I think alignments for externally visible > functions and data should all follow ABI rules. giving it an alignment of 8 does

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 16

...nsics as special cases. This approach is already taken for the x86_sse42_crc32_64_64 intrinsic. However, this approach may not be elegant because the ranges of these CUDA special registers depend on the GPU compute capability specified by -target-cpu. For instance, blockIdx.x is bounded by 65535 in sm_20 but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is probably discouraged. Therefore, the approach I am considering is to have clang annotate the ranges of these CUDA special registers according to the -target-cpu flag, and have ValueTracking pick the range metadata for optimization. By d...
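The dependence of the range on the target can be sketched as a small lookup. This is only an illustration of the idea in the post, not clang's or ValueTracking's code: `blockIdxXBound` is a hypothetical helper, the two values are the ones quoted above, and treating every target below sm_30 like sm_20 is an assumption made here. The snippet does not say whether the bound is inclusive, so the numbers are taken verbatim.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the bound on blockIdx.x that a front end might
// attach as range metadata, keyed by the numeric part of -target-cpu
// (20 for sm_20, 30 for sm_30, ...).
std::uint32_t blockIdxXBound(unsigned smVersion) {
    assert(smVersion >= 20 && "sm_20 is taken here as the minimum target");
    // Values as quoted in the post: 65535 for sm_20, 2^31-1 for sm_30.
    // Assumption: targets between sm_20 and sm_30 share the sm_20 limit.
    return smVersion < 30 ? 65535u : 2147483647u;
}
```

Keeping this table on the front-end side is what lets the optimizer stay unaware of `-target-cpu`: it would only see the annotated range, not the compute capability that produced it.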

Missing R.h

2011 Feb 25

Hi, I'm trying to install a module - gputools - and keep getting compile time errors about missing R.h Does anyone know where this file can be found? Thanks!

[LLVMdev] UNREACHABLE executed! error while trying to generate PTX

2013 Mar 21

...uda.h If you compile as CUDA (use .cu extension, or "-x cuda") and use this header, you will have basic support. You can invoke clang with something like: $ clang test1.cu -Xclang -fcuda-is-device -I ../src/clang/test/SemaCUDA -Xclang -triple -Xclang nvptx64 -Xclang -target-cpu -Xclang sm_20 -S ... assuming your clang source directory is ../src/clang, you want 64-bit PTX, and your target SM is 2.0. Adjust accordingly. Clang also knows how to map OpenCL to PTX, so you would do something like: $ clang test1.cl -Xclang -triple -Xclang nvptx64 -Xclang -target-cpu -Xclang sm_20 -S On...

[LLVMdev] UNREACHABLE executed! error while trying to generate PTX

2013 Mar 20

Thanks a lot, Justin, I will remove the toolkit header. Just one last question (maybe ;) ). If I do away with the toolkit headers, it says unknown type name '__device__'. Does this function qualifier have an alternative, or can I just do away with it? -- View this message in context: http://llvm.1065342.n5.nabble.com/UNREACHABLE-executed-error-while-trying-to-generate-PTX-tp56026p56093.html
