thr3ads.net - search: "gpugems3"

Displaying 8 results from an estimated 8 matches for "gpugems3".

[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

2010 Aug 06

...ckend, and I would like to upstream it if possible. This backend is implemented by LLVM's target independent code generator framework; I think this will make it easier to maintain. I have tested this backend to translate a work-efficient parallel scan kernel ( http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html ) into PTX code. The generated PTX code was then executed on real hardware, and the result is correct. So far I have to hack clang to generate bitcode for this backend, but I will try to patch clang to parse CUDA (or OpenCL) while I am upstreaming this backend. I am new to LLV...

[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

2010 Aug 10

...g code generator. > > But I didn't study their code thoroughly, so I might be wrong about this. I haven't had a chance to look at it yet either. >>> I have tested this backend to translate a work-efficient parallel scan >>> kernel ( http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html >>> ) into PTX code. The generated PTX code was then executed on real >>> hardware, and the result is correct. >> >> How much of the LLVM IR does this support? What's missing? > Have to add some intrinsics, calling conventions, and address s...

[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

2010 Aug 09

...ated, can you do a comparison of the two? Perhaps there are holes in each that can be filled by the other. It would be a shame to have two completely different PTX backends. > I have tested this backend to translate a work-efficient parallel scan > kernel ( http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html > ) into PTX code. The generated PTX code was then executed on real > hardware, and the result is correct. How much of the LLVM IR does this support? What's missing? > So far I have to hack clang to generate bitcode for this backend, but > I will try to patch...

[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

2010 Aug 10

...some PTX instruction set features that are hard to exploit if not using code generator. But I didn't study their code thoroughly, so I might be wrong about this. >> I have tested this backend to translate a work-efficient parallel scan >> kernel ( http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html >> ) into PTX code. The generated PTX code was then executed on real >> hardware, and the result is correct. > > How much of the LLVM IR does this support? What's missing? Have to add some intrinsics, calling conventions, and address spaces. I would say th...

[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

2010 Aug 10

...'t study their code thoroughly, so I might be wrong about > this. > > I haven't had a chance to look at it yet either. > > >>> I have tested this backend to translate a work-efficient parallel > scan > >>> kernel ( > http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html > >>> ) into PTX code. The generated PTX code was then executed on real > >>> hardware, and the result is correct. > >> > >> How much of the LLVM IR does this support? What's missing? > > Have to add some intrinsics, calling c...

[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

2010 Aug 07

...> upstream it if possible. This backend is implemented by LLVM's target > independent code generator framework; I think this will make it easier > to maintain. > > I have tested this backend to translate a work-efficient parallel scan > kernel ( http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html > ) into PTX code. The generated PTX code was then executed on real > hardware, and the result is correct. > > So far I have to hack clang to generate bitcode for this backend, but > I will try to patch clang to parse CUDA (or OpenCL) while I am > upstreaming t...

[LLVMdev] Upstream PTX backend that uses target independent code generator if possible

2010 Aug 11

...ly, so I might be wrong about >> this. >> >> I haven't had a chance to look at it yet either. >> >> >>> I have tested this backend to translate a work-efficient parallel >> scan >> >>> kernel ( >> http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html >> >>> ) into PTX code. The generated PTX code was then executed on real >> >>> hardware, and the result is correct. >> >> >> >> How much of the LLVM IR does this support? What's missing? >> > Have to add some i...

[LLD] Linker Relaxation

2017 Jul 11

Here's an example using the gcc toolchain for embedded 32 bit RISC-V (my HiFive1 board): #include <stdio.h> int foo(int i){ if (i < 100){ printf("%d\n", i); } return i; } int main(){ foo(10); return 0; } After compiling to a .o with -O2 -march=RV32IC we get (just looking at foo) 00000000 <foo>: 0: 1141 addi sp,sp,-16
