thr3ads.net - llvm dev - [llvm-dev] Can I control HSA config generated by AMDGPU backend? [Sep 2018]

Changdao Dong via llvm-dev

2018-Sep-05 18:17 UTC

[llvm-dev] Can I control HSA config generated by AMDGPU backend?

I finally managed to modify LLVM to generate assembly that runs on the
AMDGPU-PRO driver. One problem is that the code generated by LLVM is about
10% slower than the output of amdgpu's online compiler. Is there anything I
can tune to improve the performance of the LLVM-generated code?

Thanks!

On Tue, Sep 4, 2018 at 9:23 AM 董昌道 <dongchangdao at gmail.com> wrote:
> I am writing a cryptocurrency miner, which most users run with the
> amdgpu driver. I have written a script that translates the metadata of the
> LLVM ISA format into clrxasm format. I also modified the ROCm version of LLVM
> to reorganize the order of the kernel args so that it's compatible with
> clrxasm. It seems to work, and clrxasm seems to support this dispatch kernel
> ptr thing. But it would be nice if I could turn it off. Reading the LLVM code,
> it seems this intrinsic is hard-coded? Hope not.
>
> Regards!
>    Changdao from cell phone
>
> On Sep 4, 2018, at 4:12 AM, Tamazov, Artem <Artem.Tamazov at amd.com> wrote:
>
> ...
>
>
>
> Hi Artem,
>
>
>
> Thanks for replying!
>
> I am working on an OpenCL program that runs with the AMDGPU-PRO driver.
>
> However, the compiler that comes with it doesn't support inline assembly. So
> my plan is to compile my OpenCL code with inline assembly using LLVM to get
> an ISA file, and then generate the binary using clrxasm.
>
> It seems that the calling conventions of LLVM and clrxasm differ in
> some register usage, but I cannot figure out the details from their docs. I
> wonder if LLVM supports turning off "enable_sgpr_dispatch_ptr = 1" so that
> it's compatible with clrxasm.
>
>
>
> Thanks,
>
>   Changdao
>
>
>
> On Mon, Sep 3, 2018 at 5:25 AM Tamazov, Artem <Artem.Tamazov at amd.com> wrote:
>
> Hello,
>
>
>
> Please look into https://llvm.org/docs/AMDGPUUsage.html.
>
>
>
> > My target is amdgpu--amdhsa.
>
>
>
> This means that the kernel(s) are to be executed on HSA compatible
> runtimes such as AMD’s ROCm.
>
>
>
> > ..."enable_sgpr_dispatch_ptr = 1". Can I do something to turn that off
> > in the generated assembly file?
>
> > ...user argument is placed at the first place while hidden arguments
> > like "HiddenGlobalOffsetX" are placed after user arguments.
>
> > Can I change the order of the arguments so that the first argument will
> > be hidden arguments before user arguments?
>
>
>
> If your wishes are met, then the compatibility with HSA will be broken.
>
>
>
> > Can I change the order of the arguments so that the first argument will
> > be hidden arguments before user arguments?
>
> I think this is not normally possible while the target stays at "...-amdhsa".
>
> Also, I think the community would be able to help you more if you explain
> your reasons.
>
>
>
> Regards,
>
> artem
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of* Changdao Dong via llvm-dev
> *Sent:* 31 August 2018 22:15
> *To:* llvm-dev at lists.llvm.org
> *Subject:* [llvm-dev] Can I control HSA config generated by AMDGPU backend?
>
>
>
> I am using LLVM clang to compile my OpenCL code offline into assembly. My
> target is amdgpu--amdhsa. The assembly file generated by clang has the config
> "enable_sgpr_dispatch_ptr = 1". Can I do something to turn that off in
> the generated assembly file? Also, it seems that the order of kernel
> arguments is the reverse of the AMDCL2 convention, i.e. user arguments
> are placed first while hidden arguments like "HiddenGlobalOffsetX" are
> placed after user arguments. Can I change the order of the arguments so
> that hidden arguments come before user arguments?
>
>
>
> Thanks a lot!
>
>
> --
>
> DONG, Changdao
>
> dongchangdao at gmail.com
>
>

Matt Arsenault via llvm-dev

2018-Sep-05 18:31 UTC

> On Sep 5, 2018, at 23:17, Changdao Dong via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I finally managed to modify LLVM to generate assembly that runs on the
> AMDGPU-PRO driver. One problem is that the code generated by LLVM is about
> 10% slower than the output of amdgpu's online compiler. Is there anything I
> can tune to improve the performance of the LLVM-generated code?

This is very dependent on the case you are looking at, so without a specific
example or an ISA comparison between the compilers there's no way to guess.

-Matt

Changdao Dong via llvm-dev

2018-Sep-05 19:26 UTC

The target algorithm is lyra2 and the target kernel is "search2" in
https://github.com/fancyIX/sgminer-phi2-branch/blob/master/kernel/phi2.cl.
The details are implemented in
https://github.com/fancyIX/sgminer-phi2-branch/blob/master/kernel/lyra2mdz.cl
If you have time to take a look at the assembly, I can upload it later
today.

Thanks,
    Changdao

On Wed, Sep 5, 2018 at 11:32 AM Matt Arsenault <arsenm2 at gmail.com> wrote:
>
> On Sep 5, 2018, at 23:17, Changdao Dong via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> > I finally managed to modify LLVM to generate assembly that runs on the
> > AMDGPU-PRO driver. One problem is that the code generated by LLVM is about
> > 10% slower than the output of amdgpu's online compiler. Is there anything I
> > can tune to improve the performance of the LLVM-generated code?
>
> This is very dependent on the case you are looking at, so without a
> specific example or an ISA comparison between the compilers there's no
> way to guess.
>
> -Matt
>

-- 
DONG, Changdao

MP: 1-412-551-2330
dongchangdao at gmail.com <cddong at cmu.edu>

UE US via llvm-dev

2018-Sep-07 03:11 UTC

On Wed, Sep 5, 2018 at 1:17 PM Changdao Dong via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I finally managed to modify LLVM to generate assembly that runs on the
> AMDGPU-PRO driver. One problem is that the code generated by LLVM is about
> 10% slower than the output of amdgpu's online compiler. Is there anything I
> can tune to improve the performance of the LLVM-generated code?
>
> Thanks!
>
> On Tue, Sep 4, 2018 at 9:23 AM 董昌道 <dongchangdao at gmail.com> wrote:
>
>> I am writing a cryptocurrency miner, which most users run with the
>> amdgpu driver. I have written a script that translates the metadata of the
>> LLVM ISA format into clrxasm format.
>>

clrxasm's docs say it only supports GCN devices to begin with, so it seems
like you wouldn't actually want to use the --amdhsa "os" flag (or the
amdgpu target, you'd want amdgcn); that's for things that will be directly
loaded with the HSA API as far as I know. If you felt like it you could
load and execute them with that API instead of the OpenCL one and not mess
around with it further than that. I've never worked with that, so Artem
can probably tell you more if that doesn't explain things. It looks
relatively straightforward.
https://gpuopen.com/rocm-with-harmony-combining-opencl-hcc-hsa-in-a-single-program/

This page https://openwall.info/wiki/john/development/AMD-IL (linked from
another AMD list posting last year about something similar) says that the
following work:

*(i)* Setting the environment variable:
AMD_OCL_BUILD_OPTIONS_APPEND=-save-temps ./Name_of_executable
*(ii)* Using the build options:
In clBuildProgram() specify "-save-temps" in the build option field to
generate IL and ISA.

...and the driver will retain the .isa and .il files, but then you'd still
be left with patching in your changes somehow. If that works it would at
least give you an example of what LLVM is currently generating vs. the
driver so you can compare those and also modify / test assembly changes to
determine if they're worthwhile for whatever issue you're trying to solve.
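
One rough way to make that comparison concrete is to count instruction mnemonics in the two dumps and look at where the counts differ. The following is only a minimal sketch, assuming both ISA files are plain text with one instruction per line and the mnemonic first on the line; real dumps may carry addresses or encodings that need stripping first. The function names are hypothetical and not part of LLVM, clrxasm, or the driver.

```python
import re
from collections import Counter

# Assumption: an instruction line starts with a lowercase mnemonic that contains
# at least one underscore (e.g. "v_add_u32", "s_endpgm") followed by whitespace
# or end of line. Comments, directives, and labels are skipped.
MNEMONIC = re.compile(r"^\s*([a-z][a-z0-9]*(?:_[a-z0-9]+)+)(?=\s|$)")


def mnemonic_counts(isa_text):
    """Count how often each instruction mnemonic appears in an ISA dump."""
    counts = Counter()
    for line in isa_text.splitlines():
        match = MNEMONIC.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_differences(llvm_isa, driver_isa):
    """Return {mnemonic: llvm_count - driver_count} where the counts differ."""
    llvm = mnemonic_counts(llvm_isa)
    driver = mnemonic_counts(driver_isa)
    return {
        name: llvm[name] - driver[name]
        for name in sorted(set(llvm) | set(driver))
        if llvm[name] != driver[name]
    }
```

A positive value means LLVM emitted more of that instruction than the driver did. A histogram like this says nothing about scheduling or register pressure, so it is only a starting point for deciding which parts of the two listings to read side by side.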

If this is an optimization thing, I'd strongly suggest going through the
files as-is and trying to perform some of the OpenCL-level optimizations AMD's
guides suggest. You'd be surprised what removing a couple of conditionals
in often-called loops can do for performance of many things. Looking at
the code, vectorizing / using native OpenCL data types would probably show
some gains as well. Many of them seem to be straight C source conversions
of stuff that was optimized for x86 at some point before SSE2 existed and
promptly forgotten.

Cheers,
-G

UE US via llvm-dev

2018-Sep-07 03:22 UTC

This page https://gpuopen.com/opencl-rocm1-6/ also suggests that inline asm
is supported by the rocm toolchain, and there are example exercises /
solutions here:

https://github.com/HandsOnOpenCL/Exercises-Solutions/tree/master/Solutions

The AMDGPU-PRO driver says it has supported ROCm 1.6 since last year, but it
sounds like inline asm doesn't work with it, so ???

-G


