search for: sm_30

Displaying 17 results from an estimated 17 matches for "sm_30".

2016 Mar 12 · 2 replies · instrumenting device code with gpucc
> ...I believe the linking issue was caused by nvvm reflection anchors. I
> haven't played with that, but I guess running nvvm-reflect on an IR removes
> the nvvm reflect anchors. After that, you can llvm-link the two bc/ll files.
>
> Another potential issue is that your cuda_hooks-sm_30.ll is unoptimized.
> This could cause the instrumented code to run super slow.
>
> On Fri, Mar 11, 2016 at 9:40 AM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com> wrote:
>
>> Hey Jingyue,
>>
>> Attached are the .ll files. Thanks!
>>
>> yuanfen...
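The steps suggested above can be sketched as shell commands. This is only a sketch: the file names follow the thread, and the exact `nvvm-reflect` pass spelling and pass-manager flags vary across LLVM versions.

```shell
# Run the NVVMReflect pass to resolve/remove the reflection anchors
# in each device-IR module (pass name is version-dependent).
opt -nvvm-reflect -S axpy-sm_30.ll -o axpy-sm_30.reflect.ll
opt -nvvm-reflect -S cuda_hooks-sm_30.ll -o cuda_hooks-sm_30.reflect.ll

# Link the kernel IR with the instrumentation hooks.
llvm-link axpy-sm_30.reflect.ll cuda_hooks-sm_30.reflect.ll -S -o linked.ll

# Optimize the linked module, so the unoptimized hooks don't make
# the instrumented code run super slow.
opt -O2 -S linked.ll -o linked.opt.ll

# Generate PTX for the target GPU.
llc -march=nvptx64 -mcpu=sm_30 linked.opt.ll -o linked.ptx
```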
2016 Mar 10 · 4 replies · instrumenting device code with gpucc
It's hard to tell what is wrong without a concrete example. E.g., what is the program you are instrumenting? What is the definition of the hook function? How did you link that definition with the binary? One thing suspicious to me is that you may have linked the definition of _Cool_MemRead_Hook as a host function instead of a device function. AFAIK, PTX assembly cannot be linked. So, if you
2016 Mar 13 · 2 replies · instrumenting device code with gpucc
..." "-fcxx-exceptions" "-fexceptions" "-fdiagnostics-show-option" "-vectorize-loops" "-vectorize-slp" "-o" "axpy-host.o" "-x" "cuda" "tests/axpy.cu" "-fcuda-include-gpubinary" "axpy-sm_30.fatbin" which, from my understanding, compiles the host code in tests/axpy.cu and link it with axpy-sm_30.fatbin. However, now that I instrumented the IR of the host code (axpy.bc) and did `llc axpy.bc -o axpy.s`, which cmd should I use to link axpy.s with axpy-sm_30.fatbin? I tried to use...
2016 Mar 15 · 2 replies · instrumenting device code with gpucc
...ns" >> "-fexceptions" "-fdiagnostics-show-option" "-vectorize-loops" >> "-vectorize-slp" "-o" "axpy-host.o" "-x" "cuda" "tests/axpy.cu" >> "-fcuda-include-gpubinary" "axpy-sm_30.fatbin" >> >> which, from my understanding, compiles the host code in tests/axpy.cu >> and link it with axpy-sm_30.fatbin. However, now that I instrumented the >> IR of the host code (axpy.bc) and did `llc axpy.bc -o axpy.s`, which cmd >> should I use to link axp...
2014 Jun 16 · 3 replies · [LLVMdev] Attaching range metadata to IntrinsicInst
...es. This approach is already taken for the x86_sse42_crc32_64_64 intrinsic. However, this approach may not be elegant because the ranges of these CUDA special registers depend on the GPU compute capability specified by -target-cpu. For instance, blockIdx.x is bounded by 65535 in sm_20 but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is probably discouraged. Therefore, the approach I am considering is to have clang annotate the ranges of these CUDA special registers according to the -target-cpu flag, and have ValueTracking pick the range metadata for optimization. By doing so, we hide the...
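As a sketch of the proposal above, the clang-emitted annotation might look like this in IR. The intrinsic name is the NVPTX special register for blockIdx.x; the metadata values are hypothetical, derived from the grid-dimension limits quoted in the thread (LLVM `!range` metadata is a half-open interval [lo, hi)):

```llvm
; blockIdx.x read through the NVPTX special-register intrinsic,
; with a range chosen from -target-cpu (sm_30 shown here)
%ctaid = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x(), !range !0

!0 = !{i32 0, i32 2147483647}   ; sm_30: blockIdx.x < 2^31-1
; for sm_20, the same call would instead carry !{i32 0, i32 65535}
```

ValueTracking's computeKnownBits can then use this metadata without ever seeing the -target-cpu flag itself.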
2016 Jun 02 · 3 replies · PTX generation from CUDA file for compute capability 1.0 (sm_10)
...group of researchers (http://www.ecs.umass.edu/ece/tessier/andryc-fpt13.pdf). Our group has some further research interest in this work. I was working on modifying Clang-LLVM for a couple of months and achieved the required changes. But Clang-LLVM only allows me to generate PTX for sm_20, sm_30, etc. While trying to generate PTX for sm_10, it gave:

error: unknown target CPU 'sm_10'
fatal error: cannot open file '/tmp/shared-395893.s': No such file or directory
1 error generated.

The compilation command used is: clang -Xclang -I$LIBCLC/include/generic -I$LIBCLC/include/p...
2014 Jun 17 · 5 replies · [LLVMdev] Attaching range metadata to IntrinsicInst
...or the x86_sse42_crc32_64_64 intrinsic. However, this approach may not be elegant because the ranges of these CUDA special registers depend on the GPU compute capability specified by -target-cpu. For instance, blockIdx.x is bounded by 65535 in sm_20 but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is probably discouraged. Therefore, the approach I am considering is to have clang annotate the ranges of these CUDA special registers according to the -target-cpu flag, and have ValueTracking pick the range metadata fo...
2013 Feb 04 · 1 reply · [LLVMdev] Problem with PTX assembly printing (NVPTX backend)
Hi Nikolaos, the following commands work great for me.

$ clang -S -emit-llvm -target nvptx -x cl -include clc/clctypes.h ../data-types/scalar.cl
$ llc -mcpu=sm_30 scalar.s

You can follow Justin's blog [1]. It helped me a lot to understand where to start.

[1] http://jholewinski.org/blog/llvm-3-0-ptx-backend/

Best, Ankur

On Mon, Feb 4, 2013 at 11:40 PM, Justin Holewinski <justin.holewinski at gmail.com> wrote:

> On Mon, Feb 4, 2013 at 1:09...
2014 Jun 17 · 4 replies · [LLVMdev] Attaching range metadata to IntrinsicInst
...rinsic. However, this approach may not be elegant because the ranges of these CUDA special registers depend on the GPU compute capability specified by -target-cpu. For instance, blockIdx.x is bounded by 65535 in sm_20 but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is probably discouraged. Therefore, the approach I am considering is to have clang annotate the ranges of these CUDA special registers according to the -target-cpu flag, and have...
2014 Jun 17 · 2 replies · [LLVMdev] Attaching range metadata to IntrinsicInst
...ch may not be elegant because the ranges of these CUDA special registers depend on the GPU compute capability specified by -target-cpu. For instance, blockIdx.x is bounded by 65535 in sm_20 but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is probably discouraged. Therefore, the approach I am considering is to have clang annotate the ranges of these CUDA special registers according to the...
2014 Jun 17 · 3 replies · [LLVMdev] Attaching range metadata to IntrinsicInst
...special registers depend on the GPU compute capability specified by -target-cpu. For instance, blockIdx.x is bounded by 65535 in sm_20 but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is probably discouraged. Therefore, the approach I am considering is to have clang annotate the ra...
2016 Jun 02 · 5 replies · PTX generation from CUDA file for compute capability 1.0 (sm_10)
Hello, When generating the PTX output from a CUDA file (.cu file), the minimum target that is accepted by LLVM is sm_20. But I have a specific requirement to generate PTX output for compute capability 1.0 (sm_10). Is there any previous version of LLVM supporting this? Thank you, Ginu
2013 Feb 04 · 0 replies · [LLVMdev] Problem with PTX assembly printing (NVPTX backend)
On Mon, Feb 4, 2013 at 1:09 PM, <nkavv at physics.auth.gr> wrote:

> Hi Justin,
>
>>> Has anyone had similar problems with the NVPTX backend? Shouldn't this
>>> code be linked to the AsmPrinter library for NVPTX (already)?
>>
>> What do you mean by "doesn't work"? The AsmPrinter library really houses
>> the MCInst
2013 Jul 18 · 2 replies · question about Makeconf and nvcc/CUDA
Dear R development: I'm not sure if this is the appropriate list, but it's a start. I would like to put together a package which contains a CUDA program on Windows 7. I believe that it has to do with the Makeconf file in the etc directory. But when I just use the nvcc with the shared option, I can use the dyn.load command, but when I use the is.loaded function, it shows FALSE.
2013 Feb 04 · 3 replies · [LLVMdev] Problem with PTX assembly printing (NVPTX backend)
Hi Justin,

>> Has anyone had similar problems with the NVPTX backend? Shouldn't this
>> code be linked to the AsmPrinter library for NVPTX (already)?
>
> What do you mean by "doesn't work"? The AsmPrinter library really houses
> the MCInst printer, which isn't implemented for NVPTX yet. The older
> assembly printer works just fine.

This is
2016 Aug 01 · 0 replies · [GPUCC] link against libdevice
Hi Justin, Thanks for your response! The clang & llvm I'm using was built from source. Below is the output of compiling with -v. Any suggestions would be appreciated!

clang version 3.9.0 (trunk 270145) (llvm/trunk 270133)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
2016 Aug 01 · 3 replies · [GPUCC] link against libdevice
OK, I see the problem. You were right that we weren't picking up libdevice. CUDA 7.0 only ships with the following libdevice binaries (found in /path/to/cuda/nvvm/libdevice):

libdevice.compute_20.10.bc
libdevice.compute_30.10.bc
libdevice.compute_35.10.bc

If you ask for sm_50 with CUDA 7.0, clang can't find a matching libdevice binary, and it will apparently silently give up and try to
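Given that diagnosis, one way to check and work around it is to list the libdevice binaries your install actually ships and pick a GPU arch that has a match. A sketch, with an illustrative CUDA 7.0 install path:

```shell
# See which compute capabilities this CUDA install provides libdevice for.
ls /usr/local/cuda-7.0/nvvm/libdevice/

# sm_50 has no matching libdevice binary in CUDA 7.0, but sm_35 does,
# so compile for an arch that is covered:
clang++ -O2 --cuda-gpu-arch=sm_35 axpy.cu -o axpy \
    -L/usr/local/cuda-7.0/lib64 -lcudart
```

Upgrading to a newer CUDA toolkit that ships libdevice for the desired compute capability is the other obvious fix.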