thr3ads.net - search: "gpuopen"

Displaying 13 results from an estimated 13 matches for "gpuopen".

Can I control HSA config generated by AMDGPU backend?

2018 Sep 05

Finally I kind of modified llvm to generate assembly that can run on AMDGPU pro drivers. One problem is that the code generated by llvm is about 10% slower than that of amdgpu's online compiler. Is there anything I can tune to improve the performance of llvm? Thanks! On Tue, Sep 4, 2018 at 9:23 AM 董昌道 <dongchangdao at gmail.com> wrote: > I am writing a miner of crypto

SPIRV-LLVM as an external tool

2018 Feb 22

...p to unify our effort to make this available as an LLVM component. A number of companies have been involved in the original development of this converter and there are more that have adopted this design in their proprietary or open source toolchains. Mesa and AMD Vulkan driver (https://github.com/GPUOpen-Drivers/AMDVLK) are just some examples of the open source ones. There were a number of threads regarding putting this work upstream in the past years. And due to several conceptual differences it, unfortunately, took us a while to get an agreement on the best integration approach. During this t...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 14

...nted that you need to do this, but >>>>>> I can think of a few concerns/questions. First of all, to implement >>>>>> the prefix scan, we'll need to do a code sequence that looks like >>>>>> this, modified from >>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>>>> v_foo_f32 with the appropriate operation): >>>>>> >>>>>> ; v0 is the input register >>>>>> v_mov_b32 v1, v0 >>>>>> v_foo_f32 v1, v0, v1 row_shr:1 //...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 13

...level shuffle intrinsics implemented that you need to do this, but >>>> I can think of a few concerns/questions. First of all, to implement >>>> the prefix scan, we'll need to do a code sequence that looks like >>>> this, modified from >>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>> v_foo_f32 with the appropriate operation): >>>> >>>> ; v0 is the input register >>>> v_mov_b32 v1, v0 >>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>> v_foo_f32...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 14

...that you need to do >>>>>> this, but I can think of a few concerns/questions. First of all, >>>>>> to implement the prefix scan, we'll need to do a code sequence >>>>>> that looks like this, modified from >>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ >>>>>> (replace >>>>>> v_foo_f32 with the appropriate operation): >>>>>> >>>>>> ; v0 is the input register >>>>>> v_mov_b32 v1, v0 >>>>>> v_foo_f3...

Execute OpenCL

2019 Sep 19

Dear all, After a huge amount of time trying to install LLVM and Clang I could finally do it, so now I'm trying to use these tools to generate bytecode, then apply modular optimizations to it, and then generate an executable to test the result. First, I only want to compile a project and execute it to see how it works, specifically this one:

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

...On the LLVM side, I think that we have most of the AMD-specific low-level shuffle intrinsics implemented that you need to do this, but I can think of a few concerns/questions. First of all, to implement the prefix scan, we'll need to do a code sequence that looks like this, modified from http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace v_foo_f32 with the appropriate operation): ; v0 is the input register v_mov_b32 v1, v0 v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3 v_nop // Add t...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 12

...f the AMD-specific >> low-level shuffle intrinsics implemented that you need to do this, but >> I can think of a few concerns/questions. First of all, to implement >> the prefix scan, we'll need to do a code sequence that looks like >> this, modified from >> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >> v_foo_f32 with the appropriate operation): >> >> ; v0 is the input register >> v_mov_b32 v1, v0 >> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>...

Execute OpenCL

2019 Sep 26

...date optimization > pipeline in there with you own modifications > > > > Personally, I would go with the last option. > > > > > > [1]: https://software.intel.com/en-us/opencl-sdk > > [2]: https://developer.nvidia.com/opencl > > [3]: https://github.com/GPUOpen-LibrariesAndSDKs/OCL-SDK/releases > > [4]: https://developer.arm.com/solutions/graphics/apis/opencl > > [5]: https://www.iwocl.org/resources/opencl-implementations/ > > > > [6]: https://github.com/KhronosGroup/OpenCL-ICD-Loader > > [7]: https://github.com/KhronosGroup/...

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 15

...ut >>>>>>>> I can think of a few concerns/questions. First of all, to implement >>>>>>>> the prefix scan, we'll need to do a code sequence that looks like >>>>>>>> this, modified from >>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>>>>>> v_foo_f32 with the appropriate operation): >>>>>>>> >>>>>>>> ; v0 is the input register >>>>>>>> v_mov_b32 v1, v0 >>>>>>>...

SPIRV-LLVM as an external tool

2018 Feb 21

On 2018-02-21 — 14:55, Tom Stellard via llvm-dev wrote: > On 02/21/2018 12:15 AM, Tomeu Vizoso via llvm-dev wrote: > > Hi, > > > > for a few months already I have been asking around for opinions on how > > people could best work together on Khronos' SPIR-V <-> LLVM-IR converter > > and some consensus seems to have formed. > > > > Most of the

Implementing cross-thread reduction in the AMDGPU backend

2017 Jun 15

...need to do this, but I can think of a few concerns/questions. >>>>>>>>> First of all, to implement the prefix scan, we'll need to do a >>>>>>>>> code sequence that looks like this, modified from >>>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ >>>>>>>>> (replace >>>>>>>>> v_foo_f32 with the appropriate operation): >>>>>>>>> >>>>>>>>> ; v0 is the input register >>>>>>>...

RFC: Adding a staging branch (temporarily) to facilitate upstreaming

2020 Jun 30

On Mon, Jun 29, 2020 at 9:43 PM Mehdi AMINI via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Hey Duncan, > > On Mon, Jun 29, 2020 at 8:28 PM Duncan Exon Smith via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> To facilitate collaboration on an upstreaming effort (see "More context" >> below), we'd like to *push a branch* (with history)
