Displaying 20 results from an estimated 48 matches for "sm_20".
2012 Jun 12
2
[LLVMdev] [NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it
...:64"
target triple = "ptx64-unknown-unknown"
@llvm.used = appending global [1 x i8*] [i8* bitcast (void ()* @_Z4testv to
i8*)], section "llvm.metadata"
define linkonce_odr ptx_device void @_Z4testv() nounwind inlinehint {
entry:
ret void
}
> llc -march=nvptx64 -mcpu=sm_20 test3.ll -o test3.ptx
> cat test3.ptx
//
// Generated by LLVM NVPTX Back-End
//
.version 3.0
.target sm_20, texmode_independent
.address_size 64
.weak _Z4testv
.func _Z4testv(
) // @_Z4testv
{
.reg .pred %p<396>;
.reg .s16 %rc<39...
2016 Mar 05
2
instrumenting device code with gpucc
...M 3.9, and I've
> written a pass to insert hook functions for certain function calls and
> memory accesses. For example, given a CUDA program, say, axpy.cu, I
> first compile it with
>
> clang++ -emit-llvm -c axpy.cu,
>
> which gives me two bitcode files, axpy.bc and axpy-sm_20.bc. Then I use
> opt to load my pass and insert the hook functions to axpy.bc, which works
> fine. After inspecting the instrumented axpy.bc, I noticed that the kernel
> code was not there; rather, it lived inside axpy-sm_20.bc, so I also load
> my pass to instrument axpy-sm_20.bc.
>...
2012 Jun 13
0
[LLVMdev] [NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it
...nknown"
>
> @llvm.used = appending global [1 x i8*] [i8* bitcast (void ()* @_Z4testv
> to i8*)], section "llvm.metadata"
>
> define linkonce_odr ptx_device void @_Z4testv() nounwind inlinehint {
> entry:
> ret void
> }
>
> > llc -march=nvptx64 -mcpu=sm_20 test3.ll -o test3.ptx
> > cat test3.ptx
>
> //
> // Generated by LLVM NVPTX Back-End
> //
>
> .version 3.0
> .target sm_20, texmode_independent
> .address_size 64
>
>
> .weak _Z4testv
> .func _Z4testv(
>
> )...
2012 Nov 09
0
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
...< " .align " << (int) TD->getPrefTypeAlignment(ETy);
else
O << " .align " << GVar->getAlignment();
Could you please review and commit? Do you think it needs a test case?
Thanks,
- D.
dmikushin at hp2:~/forge/align0> llc -march=nvptx64 -mcpu=sm_20 align0.ll -o -
//
// Generated by LLVM NVPTX Back-End
//
.version 3.1
.target sm_20
.address_size 64
// .globl __internal_dsmul
.visible .func __internal_dsmul(
.param .b64 __internal_dsmul_param_0,
.param .align 4 .b8 __internal_dsmul_param_1[8],
.param .align 4 .b8 __internal_dsmul_param_2...
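The patch quoted in this result falls back to a type-derived alignment when the explicit alignment is 0, and the replies in the thread weigh preferred alignment against ABI alignment for that fallback. A minimal standalone sketch of that choice follows. `TypeAlignments`, `Fallback`, `paramAlignment`, and `alignDirective` are hypothetical names invented for illustration; this is not the NVPTX back-end code.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the two alignments a data layout reports for a type.
struct TypeAlignments {
    std::uint32_t abi;        // minimum alignment required by the ABI
    std::uint32_t preferred;  // possibly larger alignment the target prefers
};

// Which type-derived alignment to use when none was given explicitly.
enum class Fallback { ABI, Preferred };

// Returns the value to print after ".align". An explicit alignment of 0 means
// "unspecified", so fall back to a type-derived value instead of emitting 0.
std::uint32_t paramAlignment(std::uint32_t explicitAlign,
                             const TypeAlignments &ty,
                             Fallback policy) {
    if (explicitAlign != 0)
        return explicitAlign;
    return policy == Fallback::ABI ? ty.abi : ty.preferred;
}

// Formats the directive the way the quoted patch streams it out.
std::string alignDirective(std::uint32_t explicitAlign,
                           const TypeAlignments &ty,
                           Fallback policy) {
    return " .align " + std::to_string(paramAlignment(explicitAlign, ty, policy));
}
```

With the figures mentioned in the thread (ABI 4, preferred 8), `paramAlignment(0, {4, 8}, Fallback::ABI)` yields 4 and the `Preferred` policy yields 8; a non-zero explicit alignment is always passed through unchanged.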
2012 Jul 11
2
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hello,
FYI, this is a bug http://llvm.org/bugs/show_bug.cgi?id=13324
When compiling the following code for sm_20, func params are for some reason
given .align 0, which is invalid. The problem does not occur if compiled
for sm_10.
> cat test.ll
; ModuleID = '__kernelgen_main_module'
target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple = "ptx64-unknown-unkno...
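A regression check for this kind of bug could scan the generated PTX for a zero alignment. The helper below is only a sketch under that assumption: `hasZeroAlign` is a hypothetical name, it tokenizes on whitespace rather than parsing PTX, and it is not part of LLVM's test infrastructure.

```cpp
#include <sstream>
#include <string>

// Returns true if any ".align" directive in the PTX text is followed by 0.
// Whitespace tokenization only; this is not a real PTX parser.
bool hasZeroAlign(const std::string &ptx) {
    std::istringstream in(ptx);
    std::string token;
    while (in >> token) {
        if (token != ".align")
            continue;
        unsigned long value = 0;
        if (in >> value) {
            if (value == 0)
                return true;
        } else {
            // Not a number after ".align": reset the stream state and keep scanning.
            in.clear();
        }
    }
    return false;
}
```

Fed the output of `llc`, the function flags a line such as `.param .align 0 .b8 p[8]` and accepts `.param .align 4 .b8 p[8]`.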
2016 Jun 02
5
PTX generation from CUDA file for compute capability 1.0 (sm_10)
Hello,
When generating the PTX output from a CUDA file (.cu file), the minimum target
that is accepted by LLVM is sm_20. But I have a specific requirement to
generate PTX output for compute capability 1.0 (sm_10). Is there any
previous version of LLVM supporting this?
Thank you,
Ginu
2012 Nov 09
0
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Duncan,
You're right, global variables use preferred alignment. And - yes,
preferred alignment in this case is bigger: 8 instead of 4. NVIDIA's
prop. compiler gives 4. However, since CUDA 5.0 ptx modules are
linkable with each other, I think alignments for externally visible
functions and data should all follow ABI rules.
Is there a guide on making tests? I have ~5 pending patches
2012 Nov 10
0
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Perhaps "compatibility" is the wrong term to use here. For now, I would
like to "match" what the vendor compiler does. I don't think using
preferred alignment would hurt anything in terms of correctness, but I need
to go through the entire back-end to see what effects it could have on
performance (e.g. adding extra padding increases local memory usage). It
could be a
2012 Nov 09
2
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Dmitry,
> I'm attaching a patch that should fix the issue mentioned above. It
> simply makes the same check seen in the same file for global
> variables:
>
> emitPTXAddressSpace(PTy->getAddressSpace(), O);
> if (GVar->getAlignment() == 0)
> O << " .align " << (int) TD->getPrefTypeAlignment(ETy);
> else
> O
2016 Jun 02
3
PTX generation from CUDA file for compute capability 1.0 (sm_10)
...A by a
group of researchers (http://www.ecs.umass.edu/ece/tessier/andryc-fpt13.pdf).
Our group has some further research interest in this work. I have been working
on modifying Clang-LLVM for a couple of months and achieved the
required changes. But Clang-LLVM only allows me to generate PTX for
sm_20, sm_30, etc. While trying to generate PTX for sm_10, it gave
*error: unknown target CPU 'sm_10'*
*fatal error: cannot open file '/tmp/shared-395893.s': No such file or directory*
*1 error generated.*
The compilation command used is:
clang -Xclang -I$LIBCLC/include/generic -I$LIBCLC/in...
2012 Nov 10
2
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Justin,
On 09/11/12 22:49, Justin Holewinski wrote:
> Test cases exist under test/CodeGen/NVPTX (name changed in May).
I've deleted the empty PTX directory.
> Now that I'm
> back at NVIDIA, I'm going to be running through the bugzilla issues (thanks
> Dmitry for the reports!). I have practically the exact same patch here in my
> queue. :)
>
> In this case, I
2012 May 16
2
[LLVMdev] NVPTX: __iAtomicCAS support ?
...; preds = %while.cond
br label %while.cond
while.end: ; preds = %while.cond
ret void
}
declare ptx_device i32 @_Z12__iAtomicCASPiii(i32*, i32, i32)
CODEGEN
=========
dmikushin at hp2:~> llc < kernelgen_monitor.ll -march=nvptx -mcpu=sm_20
//
// Generated by LLVM NVPTX Back-End
//
.version 3.0
.target sm_20, texmode_independent
.address_size 32
.func (.param .b32 func_retval0) _Z12__iAtomicCASPiii
(
.param .b32 _Z12__iAtomicCASPiii_param_0,
.param .b32 _Z12__iAtomicCASPiii_param_1,
.param .b32 _Z12__iAtomicCASPiii_param_2
)
;...
2016 Mar 10
4
instrumenting device code with gpucc
...t hook functions for certain function calls and
>>> memory accesses. For example, given a CUDA program, say, axpy.cu, I
>>> first compile it with
>>>
>>> clang++ -emit-llvm -c axpy.cu,
>>>
>>> which gives me two bitcode files, axpy.bc and axpy-sm_20.bc. Then I use
>>> opt to load my pass and insert the hook functions to axpy.bc, which works
>>> fine. After inspecting the instrumented axpy.bc, I noticed that the kernel
>>> code was not there; rather, it lived inside axpy-sm_20.bc, so I also load
>>> my pass...
2012 May 16
0
[LLVMdev] NVPTX: __iAtomicCAS support ?
...cond
>
> while.end: ; preds = %while.cond
> ret void
> }
>
> declare ptx_device i32 @_Z12__iAtomicCASPiii(i32*, i32, i32)
>
> CODEGEN
> =========
>
> dmikushin at hp2:~> llc < kernelgen_monitor.ll -march=nvptx -mcpu=sm_20
> //
> // Generated by LLVM NVPTX Back-End
> //
>
> .version 3.0
> .target sm_20, texmode_independent
> .address_size 32
>
> .func (.param .b32 func_retval0) _Z12__iAtomicCASPiii
> (
> .param .b32 _Z12__iAtomicCASPiii_param_0,
> .param .b32 _Z12__iAtomicCASP...
2012 Nov 09
0
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Test cases exist under test/CodeGen/NVPTX (name changed in May). Now that
I'm back at NVIDIA, I'm going to be running through the bugzilla issues
(thanks Dmitry for the reports!). I have practically the exact same patch
here in my queue. :)
In this case, I would prefer ABI alignment for compatibility with the
vendor compiler. It should work either way, but I do need to audit the
2012 Nov 09
3
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hi Dmitry,
> You're right, global variables use preferred alignment. And - yes,
> preferred alignment in this case is bigger: 8 instead of 4. NVIDIA's
> prop. compiler gives 4. However, since CUDA 5.0 ptx modules are
> linkable with each other, I think alignments for externally visible
> functions and data should all follow ABI rules.
giving it an alignment of 8 does
2014 Jun 16
3
[LLVMdev] Attaching range metadata to IntrinsicInst
...nsics as special cases. This approach is already taken for the
x86_sse42_crc32_64_64 intrinsic. However, this approach may not be elegant
because the ranges of these CUDA special registers depend on the GPU
compute capability specified by -target-cpu. For instance, blockIdx.x is
bounded by 65535 in sm_20 but 2^31-1 in sm_30. Exposing -target-cpu to
ValueTracking is probably discouraged.
Therefore, the approach I am considering is to have clang annotate the
ranges of these CUDA special registers according to the -target-cpu flag,
and have ValueTracking pick the range metadata for optimization. By d...
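The dependence of the range on the compute capability can be sketched as a small lookup. This is an illustration only: `Range`, `blockIdxXRange`, and the integer encoding of the SM version are hypothetical, and the sketch assumes the quoted bounds are the maximum grid x-dimension, so that blockIdx.x is strictly below them.

```cpp
#include <cstdint>

// Half-open range [lo, hi), the convention LLVM range metadata uses.
struct Range {
    std::uint64_t lo;
    std::uint64_t hi;
};

// smVersion is a hypothetical integer encoding: 20 for sm_20, 30 for sm_30.
// Anything below 30 is treated like sm_20 in this sketch.
constexpr Range blockIdxXRange(unsigned smVersion) {
    // Assumption: blockIdx.x < gridDim.x, with gridDim.x at most 65535 on
    // sm_20 and 2^31 - 1 on sm_30, the figures quoted in the message.
    return smVersion >= 30 ? Range{0, (std::uint64_t{1} << 31) - 1}
                           : Range{0, 65535};
}
```

A front end following the proposal would pick the pair for the active -target-cpu and attach it as range metadata, so the optimizer itself never needs to know the target.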
2011 Feb 25
2
Missing R.h
Hi,
I'm trying to install a module - gputools - and keep getting compile
time errors about missing R.h
Does anyone know where this file can be found?
Thanks!
2013 Mar 21
0
[LLVMdev] UNREACHABLE executed! error while trying to generate PTX
...uda.h
If you compile as CUDA (use .cu extension, or "-x cuda") and use this
header, you will have basic support. You can invoke clang with something
like:
$ clang test1.cu -Xclang -fcuda-is-device -I ../src/clang/test/SemaCUDA
-Xclang -triple -Xclang nvptx64 -Xclang -target-cpu -Xclang sm_20 -S
... assuming your clang source directory is ../src/clang, you want 64-bit
PTX, and your target SM is 2.0. Adjust accordingly.
Clang also knows how to map OpenCL to PTX, so you would do something like:
$ clang test1.cl -Xclang -triple -Xclang nvptx64 -Xclang -target-cpu
-Xclang sm_20 -S
On...
2013 Mar 20
2
[LLVMdev] UNREACHABLE executed! error while trying to generate PTX
Thanks a lot Justin,
I will remove the toolkit header. Just one last question (maybe ;)). If I
do away with the toolkit headers, it says unknown type name '__device__'. Does
this function qualifier have an alternative, or can I just do away with it?