Displaying 13 results from an estimated 13 matches for "gpuopen".

2018 Sep 05
4
Can I control HSA config generated by AMDGPU backend?
Finally, I kind of modified LLVM to generate assembly that can run on the AMDGPU-PRO drivers. One problem is that the code generated by LLVM is about 10% slower than the code from amdgpu's online compiler. Is there anything I can tune to improve the performance of the LLVM-generated code? Thanks! On Tue, Sep 4, 2018 at 9:23 AM 董昌道 <dongchangdao at gmail.com> wrote: > I am writing a miner of crypto
2018 Feb 22
0
SPIRV-LLVM as an external tool
...p to unify our effort to make this available as an LLVM component. A number of companies have been involved in the original development of this converter and there are more that have adopted this design in their proprietary or open source toolchains. Mesa and the AMD Vulkan driver (https://github.com/GPUOpen-Drivers/AMDVLK) are just some examples of the open source ones. There have been a number of threads about putting this work upstream in the past years. Due to several conceptual differences, it unfortunately took us a while to reach an agreement on the best integration approach. During this t...
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...nted that you need to do this, but >>>>>> I can think of a few concerns/questions. First of all, to implement >>>>>> the prefix scan, we'll need to do a code sequence that looks like >>>>>> this, modified from >>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>>>> v_foo_f32 with the appropriate operation): >>>>>> >>>>>> ; v0 is the input register >>>>>> v_mov_b32 v1, v0 >>>>>> v_foo_f32 v1, v0, v1 row_shr:1 //...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...level shuffle intrinsics implemented that you need to do this, but >>>> I can think of a few concerns/questions. First of all, to implement >>>> the prefix scan, we'll need to do a code sequence that looks like >>>> this, modified from >>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>> v_foo_f32 with the appropriate operation): >>>> >>>> ; v0 is the input register >>>> v_mov_b32 v1, v0 >>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >>>> v_foo_f32...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...that you need to do >>>>>> this, but I can think of a few concerns/questions. First of all, >>>>>> to implement the prefix scan, we'll need to do a code sequence >>>>>> that looks like this, modified from >>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ >>>>>> (replace >>>>>> v_foo_f32 with the appropriate operation): >>>>>> >>>>>> ; v0 is the input register >>>>>> v_mov_b32 v1, v0 >>>>>> v_foo_f3...
2019 Sep 19
2
Execute OpenCL
Dear all, After a huge amount of time trying to install LLVM and Clang, I finally managed to do it, so now I'm trying to use these tools to generate bytecode, apply modular optimizations to it, and then generate an executable to test the result. First, I only want to compile a project and execute it to see how it works, specifically this one:
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...On the LLVM side, I think that we have most of the AMD-specific low-level shuffle intrinsics implemented that you need to do this, but I can think of a few concerns/questions. First of all, to implement the prefix scan, we'll need to do a code sequence that looks like this, modified from http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace v_foo_f32 with the appropriate operation): ; v0 is the input register v_mov_b32 v1, v0 v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3 v_nop // Add t...
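The first three instructions quoted in the snippet above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the backend's implementation: it assumes a DPP row is 16 lanes wide, that `row_shr:n` makes lane i read lane i-n of the same row, and that lanes with no in-row source leave `v1` unchanged. The helper names (`row_shr`, `dpp_op`, `first_three_steps`) are hypothetical, and the remainder of the sequence, which is cut off in the snippet, is deliberately not reproduced.

```python
import operator

ROW = 16  # assumed DPP row width (lanes per row)


def row_shr(src, n):
    """Model `row_shr:n`: lane i reads lane i-n of its own row.

    Lanes whose source would fall outside the row get None.
    """
    out = []
    for lane in range(len(src)):
        pos = lane % ROW  # position of this lane inside its row
        out.append(src[lane - n] if pos >= n else None)
    return out


def dpp_op(op, v0, v1, shift):
    """Model `v_foo_f32 v1, v0, v1 row_shr:shift`.

    Assumption: a lane without an in-row source keeps its old v1 value.
    """
    shifted = row_shr(v0, shift)
    return [v1[i] if s is None else op(s, v1[i]) for i, s in enumerate(shifted)]


def first_three_steps(v0, op=operator.add):
    """Apply v_mov_b32 followed by Instructions 1 to 3 from the snippet."""
    v1 = list(v0)            # v_mov_b32 v1, v0
    for shift in (1, 2, 3):  # Instruction 1, 2, 3
        v1 = dpp_op(op, v0, v1, shift)
    return v1


if __name__ == "__main__":
    # With addition, each lane ends up combining itself with up to
    # three in-row predecessors.
    print(first_three_steps([1] * ROW))
```

Under this model, with addition as the operation, the three steps leave each lane holding the combination of its own input and up to three preceding inputs in the row, which is the building block the later (elided) steps extend to a full row-wide scan.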
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...f the AMD-specific >> low-level shuffle intrinsics implemented that you need to do this, but >> I can think of a few concerns/questions. First of all, to implement >> the prefix scan, we'll need to do a code sequence that looks like >> this, modified from >> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >> v_foo_f32 with the appropriate operation): >> >> ; v0 is the input register >> v_mov_b32 v1, v0 >> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1 >> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2 >>...
2019 Sep 26
3
Execute OpenCL
...date optimization > pipeline in there with you own modifications > > > > Personally, I would go with the last option. > > > > > > [1]: https://software.intel.com/en-us/opencl-sdk > > [2]: https://developer.nvidia.com/opencl > > [3]: https://github.com/GPUOpen-LibrariesAndSDKs/OCL-SDK/releases > > [4]: https://developer.arm.com/solutions/graphics/apis/opencl > > [5]: https://www.iwocl.org/resources/opencl-implementations/ > > > > [6]: https://github.com/KhronosGroup/OpenCL-ICD-Loader > > [7]: https://github.com/KhronosGroup/...
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...ut >>>>>>>> I can think of a few concerns/questions. First of all, to implement >>>>>>>> the prefix scan, we'll need to do a code sequence that looks like >>>>>>>> this, modified from >>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace >>>>>>>> v_foo_f32 with the appropriate operation): >>>>>>>> >>>>>>>> ; v0 is the input register >>>>>>>> v_mov_b32 v1, v0 >>>>>>>...
2018 Feb 21
4
SPIRV-LLVM as an external tool
On 2018-02-21 — 14:55, Tom Stellard via llvm-dev wrote: > On 02/21/2018 12:15 AM, Tomeu Vizoso via llvm-dev wrote: > > Hi, > > > > for a few months already I have been asking around for opinions on how > > people could best work together on Khronos' SPIR-V <-> LLVM-IR converter > > and some consensus seems to have formed. > > > > Most of the
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...need to do this, but I can think of a few concerns/questions. >>>>>>>>> First of all, to implement the prefix scan, we'll need to do a >>>>>>>>> code sequence that looks like this, modified from >>>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ >>>>>>>>> (replace >>>>>>>>> v_foo_f32 with the appropriate operation): >>>>>>>>> >>>>>>>>> ; v0 is the input register >>>>>>>...
2020 Jun 30
10
RFC: Adding a staging branch (temporarily) to facilitate upstreaming
On Mon, Jun 29, 2020 at 9:43 PM Mehdi AMINI via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Hey Duncan, > > On Mon, Jun 29, 2020 at 8:28 PM Duncan Exon Smith via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> To facilitate collaboration on an upstreaming effort (see "More context" >> below), we'd like to *push a branch* (with history)