Displaying 13 results from an estimated 13 matches for "gpuopen".
2018 Sep 05
4
Can I control HSA config generated by AMDGPU backend?
I finally managed to modify LLVM so that it generates assembly that runs on
the AMDGPU-PRO drivers. One problem remains: the code generated by LLVM
is about 10% slower than the code from AMDGPU's online compiler. Is there
anything I can tune to improve the performance of the LLVM-generated code?
Thanks!
On Tue, Sep 4, 2018 at 9:23 AM 董昌道 <dongchangdao at gmail.com> wrote:
> I am writing a miner of crypto
2018 Feb 22
0
SPIRV-LLVM as an external tool
...p
to unify our effort to make this available as an LLVM component. A number of
companies have been involved in the original development of this converter and
there are more that have adopted this design in their proprietary or open source
toolchains. Mesa and the AMD Vulkan driver (https://github.com/GPUOpen-Drivers/AMDVLK)
are just some examples of the open source ones. There were a number of threads
regarding putting this work upstream in the past years. And due to several
conceptual differences it, unfortunately, took us a while to get an agreement
on the best integration approach. During this t...
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...nted that you need to do this, but
>>>>>> I can think of a few concerns/questions. First of all, to implement
>>>>>> the prefix scan, we'll need to do a code sequence that looks like
>>>>>> this, modified from
>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>
>>>>>> ; v0 is the input register
>>>>>> v_mov_b32 v1, v0
>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 //...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...level shuffle intrinsics implemented that you need to do this, but
>>>> I can think of a few concerns/questions. First of all, to implement
>>>> the prefix scan, we'll need to do a code sequence that looks like
>>>> this, modified from
>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>>>> v_foo_f32 with the appropriate operation):
>>>>
>>>> ; v0 is the input register
>>>> v_mov_b32 v1, v0
>>>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>>>> v_foo_f32...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...that you need to do
>>>>>> this, but I can think of a few concerns/questions. First of all,
>>>>>> to implement the prefix scan, we'll need to do a code sequence
>>>>>> that looks like this, modified from
>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
>>>>>> (replace
>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>
>>>>>> ; v0 is the input register
>>>>>> v_mov_b32 v1, v0
>>>>>> v_foo_f3...
2019 Sep 19
2
Execute OpenCL
Dear all,
After a huge amount of time trying to install LLVM and Clang, I could
finally do it, so now I'm trying to use these tools to generate
bytecode, then apply modular optimizations to it, and then generate an
executable to test the result.
First, I only want to compile a project and execute it to see how it works,
specifically this one:
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...On the LLVM side, I think that we have most of the AMD-specific
low-level shuffle intrinsics implemented that you need to do this, but
I can think of a few concerns/questions. First of all, to implement
the prefix scan, we'll need to do a code sequence that looks like
this, modified from
http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
v_foo_f32 with the appropriate operation):
; v0 is the input register
v_mov_b32 v1, v0
v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
v_foo_f32 v1, v0, v1 row_shr:3 // Instruction 3
v_nop // Add t...
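
As a rough illustration of what the three row_shr instructions quoted above compute, here is a small Python model. It is only a sketch under stated assumptions: a row is taken to be 16 lanes, a lane whose shifted source falls outside its row is assumed to skip the update and keep its old v1 value, and the names row_shr, dpp_op and first_three_steps are made up for this example. The rest of the sequence is cut off in the snippet, so it is not modeled here.

```python
ROW = 16  # assumed number of lanes per row


def row_shr(src, n):
    """Per-lane view of src shifted right by n lanes within each row.

    None marks lanes whose source lane would fall outside the row.
    """
    out = []
    for lane in range(len(src)):
        row_start = lane - lane % ROW
        source_lane = lane - n
        out.append(src[source_lane] if source_lane >= row_start else None)
    return out


def dpp_op(op, v0, v1, n):
    """Model of 'v_foo_f32 v1, v0, v1 row_shr:n'.

    Assumption: lanes with an out-of-range source keep their old v1.
    """
    shifted = row_shr(v0, n)
    return [old if s is None else op(s, old) for s, old in zip(shifted, v1)]


def first_three_steps(op, v0):
    """v_mov_b32 v1, v0 followed by Instructions 1 to 3."""
    v1 = list(v0)
    for n in (1, 2, 3):
        v1 = dpp_op(op, v0, v1, n)
    return v1


if __name__ == "__main__":
    add = lambda a, b: a + b
    print(first_three_steps(add, list(range(1, 17))))
```

With addition as the operation, each lane ends up holding the combination of its own input and up to three preceding inputs from the same row, so lanes 0 to 3 already hold their full prefix and later lanes hold a four-element window.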
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...f the AMD-specific
>> low-level shuffle intrinsics implemented that you need to do this, but
>> I can think of a few concerns/questions. First of all, to implement
>> the prefix scan, we'll need to do a code sequence that looks like
>> this, modified from
>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>> v_foo_f32 with the appropriate operation):
>>
>> ; v0 is the input register
>> v_mov_b32 v1, v0
>> v_foo_f32 v1, v0, v1 row_shr:1 // Instruction 1
>> v_foo_f32 v1, v0, v1 row_shr:2 // Instruction 2
>>...
2019 Sep 26
3
Execute OpenCL
...date optimization
> pipeline in there with your own modifications
>
> Personally, I would go with the last option.
>
> [1]: https://software.intel.com/en-us/opencl-sdk
> [2]: https://developer.nvidia.com/opencl
> [3]: https://github.com/GPUOpen-LibrariesAndSDKs/OCL-SDK/releases
> [4]: https://developer.arm.com/solutions/graphics/apis/opencl
> [5]: https://www.iwocl.org/resources/opencl-implementations/
> [6]: https://github.com/KhronosGroup/OpenCL-ICD-Loader
> [7]: https://github.com/KhronosGroup/...
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...ut
>>>>>>>> I can think of a few concerns/questions. First of all, to implement
>>>>>>>> the prefix scan, we'll need to do a code sequence that looks like
>>>>>>>> this, modified from
>>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ (replace
>>>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>>>
>>>>>>>> ; v0 is the input register
>>>>>>>> v_mov_b32 v1, v0
>>>>>>>...
2018 Feb 21
4
SPIRV-LLVM as an external tool
On 2018-02-21 — 14:55, Tom Stellard via llvm-dev wrote:
> On 02/21/2018 12:15 AM, Tomeu Vizoso via llvm-dev wrote:
> > Hi,
> >
> > for a few months already I have been asking around for opinions on how
> > people could best work together on Khronos' SPIR-V <-> LLVM-IR converter
> > and some consensus seems to have formed.
> >
> > Most of the
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...need to do this, but I can think of a few concerns/questions.
>>>>>>>>> First of all, to implement the prefix scan, we'll need to do a
>>>>>>>>> code sequence that looks like this, modified from
>>>>>>>>> http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
>>>>>>>>> (replace
>>>>>>>>> v_foo_f32 with the appropriate operation):
>>>>>>>>>
>>>>>>>>> ; v0 is the input register
>>>>>>>...
2020 Jun 30
10
RFC: Adding a staging branch (temporarily) to facilitate upstreaming
On Mon, Jun 29, 2020 at 9:43 PM Mehdi AMINI via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hey Duncan,
>
> On Mon, Jun 29, 2020 at 8:28 PM Duncan Exon Smith via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> To facilitate collaboration on an upstreaming effort (see "More context"
>> below), we'd like to *push a branch* (with history)