
Displaying 20 results from an estimated 600 matches similar to: "Can I control HSA config generated by AMDGPU backend?"

2016 Feb 18
2
Bug in X86 assembler?
You can't use mov. I am not an x86 expert, but after some quick googling: recall that immediates are normally restricted to 32 bits. To load a larger constant into a quad register, use movabsq, which takes a full 64-bit immediate as its source [https://www.lri.fr/~filliatr/ens/compil/x86-64.pdf]. Even in your example, the assembler replaces mov with movq, which hints at this:
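To illustrate the point about immediate widths, here is a minimal sketch (assuming GCC/Clang extended inline asm on an x86-64 target; not part of the original reply) showing movabsq taking a full 64-bit immediate:

#include <cstdint>

int main() {
    uint64_t v = 0;
    // movabsq accepts a full 64-bit immediate; most mov forms only take a
    // 32-bit immediate, so a constant this wide needs the movabs encoding.
    asm("movabsq $0x123456789abcdef0, %0" : "=r"(v));
    return v == 0x123456789abcdef0ULL ? 0 : 1;
}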
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Dear Developers, I compiled an OpenCL kernel before (in Nov. last year) like __kernel void g(__global float* array) { array[get_global_id(0)] = 1; } with libclc, which would originally use intrinsics like llvm.r600.read.local.size.x(). I executed the generated object file with one version of the hsa-runtime [1] provided by Mr. Stellard; when there was more than one workgroup, the output
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Hi Mr. Liu, Thanks for your quick reply. I compiled the code with libclc_trunk and linked the bitcode file under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). The linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc) has the following code segment,
2020 May 31
2
LLC crash while handling DEBUG info
Hi- Here is a simple C++ function: ----------- void foo() { } ----------- If the above function is compiled to LLVM IR with the -g flag using the command line `clang++ -g -O0 -S -emit-llvm foo.cpp`, we get the IR below: ----------- ; ModuleID = 'foo.cpp' source_filename = "foo.cpp" target datalayout =
2020 May 31
2
LLC crash while handling DEBUG info
Hi David, If you look at line https://github.com/llvm/llvm-project/blob/master/llvm/lib/IR/Verifier.cpp#L1160, there is an IR verification check which asserts that the compile unit (`unit` field) should be present only in the case of `spFlags = DISPFlagDefinition`. Otherwise, it should *not* be present. In the crash case, `spFlags = DISPFlagOptimized`, so I guess the `unit` field should *not* be present,
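A small sketch of the invariant being discussed (this is not the Verifier.cpp code itself, just a hedged restatement using the LLVM C++ API): the `unit` operand is expected exactly when the subprogram is a definition, independently of whether DISPFlagOptimized is also set.

#include "llvm/IR/DebugInfoMetadata.h"
using namespace llvm;

// Returns true when a DISubprogram satisfies the rule described above:
// definitions carry a DICompileUnit in 'unit'; declarations must not.
// DISPFlagOptimized is a separate bit and does not affect this check.
static bool hasConsistentUnit(const DISubprogram *SP) {
  return SP->isDefinition() == (SP->getUnit() != nullptr);
}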
2020 May 31
2
LLC crash while handling DEBUG info
I am a bit confused - `unit` must be present for definitions, and `optimized` is also a `definition`, so `unit` must be present for `optimized` too. Am I right? Mahesha On Sun, May 31, 2020 at 10:14 PM David Blaikie <dblaikie at gmail.com> wrote: > definition and optimized are orthogonal (a function could be both, or > neither) - one says this DISubprogram describes a function
2020 Jun 01
2
LLC crash while handling DEBUG info
Let's forget about my malformed IR if it is adding confusion here. I mentioned it to ease the conversation, but if it is causing confusion rather than making the discussion flow more easily, then we had better ignore it. The whole trigger for this email thread is that one of our applications is crashing with the stack trace that I mentioned earlier. The crash is during the
2006 May 16
0
[PATCH][SVM][5/5] add hsa for ucode
SVM patch to add a host save area per core for the hypervisor and also for the microcode. The microcode area is not guaranteed to be compatible with the vmcb layout, and therefore requires its own "scratch pad". Consolidate the per-core areas into a single structure. Applies cleanly to 10002. Please apply to xen-unstable.hg. Please apply to xen-3.0-testing.hg.
2017 Dec 05
3
[AMDGPU] Strange results with different address spaces
Hi dev list, I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some strange behavior of the AMDGPU backend. The following IR is generated for a simple address space test that copies from global to shared memory and back to global after a barrier synchronization. The IR is attached as as1.ll. The output is as follows: 0 0 0 0 0 0 0 0 0 0 0 0 0
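For reference, a rough HIP-style C++ sketch of what such a test typically looks like (the PACXX original is not shown in the excerpt, so the names and sizes here are illustrative): stage the data through shared/LDS memory, synchronize, then write it back to global memory.

#include <hip/hip_runtime.h>

// Illustrative kernel: global -> shared -> (barrier) -> global.
// Assumes blockDim.x <= 256.
__global__ void roundtrip(const float *in, float *out) {
  __shared__ float tile[256];                        // LDS, addrspace(3)
  unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
  tile[threadIdx.x] = in[i];                         // global -> shared
  __syncthreads();                                   // barrier
  out[i] = tile[threadIdx.x];                        // shared -> global
}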
2019 Nov 13
2
AMDGPU and math functions
There certainly is support; after all, AMD supports both OpenCL and HIP (a dialect of C++ very close to CUDA). AMD device libraries (in bitcode form) are installed when ROCm ( https://rocm.github.io/ ) is installed. AMD device libraries are mostly written in (OpenCL) C and are open source at https://github.com/RadeonOpenCompute/ROCm-Device-Libs . They are configured by linking in a number of tiny
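To make the linking step concrete: conceptually the flow is the same as NVIDIA's libdevice, in that the kernel module and the device-library bitcode are linked at the IR level. A minimal sketch using the LLVM C++ API (file names here are illustrative; the real ROCm toolchain drives this through clang rather than hand-written code):

#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Linker/Linker.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  SMDiagnostic Err;

  // File names for illustration only.
  std::unique_ptr<Module> Kernel = parseIRFile("kernel.bc", Err, Ctx);
  std::unique_ptr<Module> MathLib = parseIRFile("ocml.amdgcn.bc", Err, Ctx);
  if (!Kernel || !MathLib)
    return 1;

  // Pull the math library's definitions into the kernel module;
  // linkModules returns true on error.
  if (Linker::linkModules(*Kernel, std::move(MathLib)))
    return 1;

  Kernel->print(outs(), nullptr);
  return 0;
}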
2019 Nov 13
2
AMDGPU and math functions
Does anyone know whether there is yet support for math functions in AMD GPU kernels? In the NVIDIA world, they provide the libdevice IR module, which can be linked into an existing module containing the kernel. In other words, they provide all math functions at the IR level. NVIDIA even claims that libdevice is actually device-specific (per compute capability). I was wondering how that is done on the AMD
2016 Feb 17
2
How to define data for X86 assembler?
Hi, Is there any documentation on the syntax accepted by the X86 assembler? I have this code in my .asm file to define data: text db "127.1.1.1 google.lk" But the X86 assembler fails to understand it, with the error: error: unexpected token in argument list text db "127.1.1.1 google.lk" ^ Any ideas on how to fix this problem? I tried to find some
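For reference: db is MASM/NASM syntax, while LLVM's integrated assembler expects GNU-as directives such as .ascii/.asciz/.byte. A minimal sketch of the equivalent data definition (wrapped in file-scope inline asm here only so the snippet is a complete C++ translation unit; in a standalone .s file you would keep just the directives between the quotes):

// The directives below are what would go into the .s file.
asm(R"(
        .data
text:
        .asciz "127.1.1.1 google.lk"
)");

int main() { return 0; }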
2020 Nov 18
2
wasteful cmake defaults
On Wed, Nov 18, 2020 at 12:17 PM, David Chisnall <David.Chisnall at cl.cam.ac.uk> wrote: > On 18/11/2020 18:03, Michael Kruse wrote: > > This is missing probably the largest group: users of clang who compile > > clang themselves (e.g. because their OS does not come with a package > > for clang, or it is too old) > > I think our target for this group should be that
2015 Sep 29
2
OpenCL toolset (for AMD GPU)
On 09/29/2015 04:19 PM, Tom Stellard via llvm-dev wrote: > On Tue, Sep 29, 2015 at 01:20:57PM +0000, Paweł Bylica via llvm-dev wrote: >> Hi LLVM, >> >> I would like to compile an OpenCL kernel for a specific AMD GPU target. Is it >> possible with the current clang/LLVM? >> >> I started by using `clang -x cl` but it looks like at least some OpenCL >>
2018 Feb 05
4
[RFC] Upstreaming PACXX (Programing Accelerators with C++)
Hi LLVM community, after 3 years of development and various talks at LLVM-HPC, EuroLLVM, and other scientific conferences, I want to present my PhD research topic to the lists. The main goal of my research was to develop a single-source programming model, comparable to CUDA or SYCL, for accelerators supported by LLVM (e.g., Nvidia GPUs). PACXX uses Clang as its front-end for code generation and comes with
2018 Feb 05
0
[RFC] Upstreaming PACXX (Programing Accelerators with C++)
Interesting. I do something similar for D, targeting CUDA (via NVPTX) and OpenCL (via my forward-ported fork of Khronos' SPIRV-LLVM) [1], except that all the code generation is done at compile time. The runtime is aided by compile-time reflection so that calling kernels is done by symbol. What kind of performance difference do you see running code that was not developed with the GPU in mind (e.g.
2017 Nov 28
3
storing MBB MCSymbol in custom section
Dear llvm-dev list, I have created my own custom section which is added at the end of the binary during compilation and contains the addresses of all basic blocks. As the final address of a basic block is not known until link time, I collect the MCSymbol* symbol values per BB in a temporary array and then, at the end, emit them (emitSymbolValue) into my custom section within EmitEndOfAsmFile(). I have
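A hedged sketch of that flow (API spellings are from roughly that era of LLVM and may differ by version; the section name and the 8-byte entry size are made up for illustration):

#include "llvm/BinaryFormat/ELF.h"
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/IR/Module.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCStreamer.h"
#include <vector>

// Sketch of an AsmPrinter subclass: remember each basic block's label
// (e.g. MBB.getSymbol(), captured while the block is emitted) and write
// all of them into a dedicated section at the end of the assembly file.
class MyAsmPrinter : public llvm::AsmPrinter {
  std::vector<llvm::MCSymbol *> BBLabels; // filled per-MBB during emission

  void EmitEndOfAsmFile(llvm::Module &M) override {
    llvm::MCSection *Sec = OutContext.getELFSection(
        ".bb_addrs", llvm::ELF::SHT_PROGBITS, llvm::ELF::SHF_ALLOC);
    OutStreamer->SwitchSection(Sec);
    for (llvm::MCSymbol *Sym : BBLabels)
      // Emits a relocation; the linker fills in the final block address.
      OutStreamer->EmitSymbolValue(Sym, /*Size=*/8);
  }
};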
2019 Sep 26
3
Execute OpenCL
Hi Alexey, Your reply has been a great help to me; your way of explaining the different types of compilation is very detailed and easy to understand. Even so, I have a couple of questions. 1) What do you refer to by OpenCL RT? 2) Could you give me some examples of an open-source OpenCL implementation and an updated optimization pipeline? Thank you in advance. Regards On Fri, Sep 20, 2019 at 12:34,
2020 Nov 18
2
wasteful cmake defaults
On Wed, Nov 18, 2020 at 5:32 AM, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote: > In terms of the most useful build configuration to be the default, I > think there are a bunch of users that we need to consider: > > - Developers of LLVM > - Developers of downstream projects that use LLVM > - Package builders > - CI admins. This is missing
2018 Feb 05
1
[RFC] Upstreaming PACXX (Programing Accelerators with C++)
I was going to say, this reminds me of Kai's presentation at Fosdem yesterday. https://fosdem.org/2018/schedule/event/heterogenousd/ It's always good to see the cross-architecture power of LLVM being used in creative ways! :) cheers, --renato On 5 February 2018 at 13:35, Nicholas Wilson via llvm-dev <llvm-dev at lists.llvm.org> wrote: > Interesting. > > I do something