Displaying 20 results from an estimated 9000 matches similar to: "[AMDGPU] Strange results with different address spaces"

2017 Dec 05 (2): [AMDGPU] Strange results with different address spaces
> On Dec 5, 2017, at 13:53, Matt Arsenault <arsenm2 at gmail.com> wrote:
>
>> On Dec 5, 2017, at 02:51, Haidl, Michael via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi dev list,
>>
>> I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some

2017 Dec 06 (2): [AMDGPU] Strange results with different address spaces
> On Dec 6, 2017, at 02:28, Haidl, Michael <michael.haidl at uni-muenster.de> wrote:
>
> The IR goes through a backend-agnostic preparation phase that brings it into SSA form and changes the AS from 0 to 1.

This sounds possibly problematic to me. The IR should be created with the correct address space to begin with. Changing this in the middle sounds suspect.

> After this
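To illustrate the point being made in this reply, here is a minimal hand-written sketch (not from the thread; it assumes the current amdgcn convention where addrspace(1) is global memory and addrspace(0) is flat) of IR that carries the right address space from the start instead of rewriting it in a later pass:

    ; Hypothetical kernel: the argument is emitted directly as a
    ; global (addrspace(1)) pointer rather than being rewritten
    ; from addrspace(0) by a preparation phase.
    target triple = "amdgcn-amd-amdhsa"

    define amdgpu_kernel void @store_one(float addrspace(1)* %out) {
    entry:
      ; If flat access is genuinely wanted, the conversion is made
      ; explicit with addrspacecast instead of mutating pointer types.
      %flat = addrspacecast float addrspace(1)* %out to float*
      store float 1.0, float* %flat
      ret void
    }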

2018 Feb 05 (4): [RFC] Upstreaming PACXX (Programming Accelerators with C++)
Hi LLVM community, after 3 years of development and various talks at LLVM-HPC, EuroLLVM, and other scientific conferences, I want to present my PhD research topic to the list. The main goal of my research was to develop a single-source programming model, on par with CUDA or SYCL, for accelerators supported by LLVM (e.g., Nvidia GPUs). PACXX uses Clang as its front-end for code generation and comes with

2018 Feb 05 (0): [RFC] Upstreaming PACXX (Programming Accelerators with C++)
Interesting. I do something similar for D, targeting CUDA (via NVPTX) and OpenCL (via my forward-ported fork of Khronos’ SPIRV-LLVM) [1], except all the code generation is done at compile time. The runtime is aided by compile-time reflection so that calling kernels is done by symbol. What kind of performance difference do you see running code that was not developed with GPU in mind (e.g.

2018 Feb 05 (1): [RFC] Upstreaming PACXX (Programming Accelerators with C++)
I was going to say, this reminds me of Kai's presentation at FOSDEM yesterday: https://fosdem.org/2018/schedule/event/heterogenousd/ It's always good to see the cross-architecture power of LLVM being used in creative ways! :) cheers, --renato

On 5 February 2018 at 13:35, Nicholas Wilson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Interesting.
>
> I do something

2018 Sep 05 (4): Can I control HSA config generated by AMDGPU backend?
I finally managed to modify LLVM to generate assembly that can run on the AMDGPU Pro drivers. One problem is that the performance of the code generated by LLVM is about 10% slower than amdgpu's online compiler. Is there anything I can do to tune up the performance of LLVM? Thanks!

On Tue, Sep 4, 2018 at 9:23 AM 董昌道 <dongchangdao at gmail.com> wrote:
> I am writing a miner of crypto

2016 Mar 05 (2): [AMDGPU] non-hsa intrinsic with hsa target
Hi Mr. Liu, thanks for your quick reply. I compiled the code with libclc trunk and linked the bitcode file under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code segment,

2016 Mar 05 (2): [AMDGPU] non-hsa intrinsic with hsa target
Dear Developers, I compiled an OpenCL kernel some time ago (in Nov. last year), like __kernel void g(__global float* array) { array[get_global_id(0)] = 1; } with libclc, which originally used intrinsics like llvm.r600.read.local.size.x(). I executed the generated object file with one version of the hsa-runtime [1] provided by Mr. Stellard; when there was more than one workgroup, the output
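For reference, the two generations of intrinsics mentioned in this thread, as they appear at the IR level (the r600 name is taken from the message above; the amdgcn replacement is the form defined in IntrinsicsAMDGPU.td — a sketch, not an exhaustive list):

    ; Old r600-style intrinsic that libclc originally emitted:
    declare i32 @llvm.r600.read.local.size.x()

    ; Newer amdgcn workitem intrinsic that libclc trunk moved to:
    declare i32 @llvm.amdgcn.workitem.id.x()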

2020 Apr 13 (3): Are AMDGPU intrinsics available in LLVM IR?
Hi! I'm trying to figure out how to access the workgroup id from within the LLVM IR language when lowering with the AMDGPU backend. Looking at the 'llvm/include/llvm/IR/IntrinsicsAMDGPU.td' file, there are intrinsics defined to access the workitem index (thread index), but this file lives in 'llvm/include':
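Those intrinsics are callable directly from textual IR; no header from 'llvm/include' is needed. A minimal sketch (assuming the amdgcn-amd-amdhsa triple; the packing of the two ids is purely illustrative):

    target triple = "amdgcn-amd-amdhsa"

    declare i32 @llvm.amdgcn.workgroup.id.x()
    declare i32 @llvm.amdgcn.workitem.id.x()

    define amdgpu_kernel void @ids(i32 addrspace(1)* %out) {
      %wg = call i32 @llvm.amdgcn.workgroup.id.x()
      %wi = call i32 @llvm.amdgcn.workitem.id.x()
      ; Pack both ids into one value, just to use both results.
      %hi = shl i32 %wg, 16
      %packed = or i32 %hi, %wi
      store i32 %packed, i32 addrspace(1)* %out
      ret void
    }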

2019 Nov 13 (2): AMDGPU and math functions
There certainly is support; after all, AMD supports both OpenCL and HIP (a dialect of C++ very close to CUDA). AMD device libraries (in bitcode form) are installed when ROCm (https://rocm.github.io/) is installed. AMD device libraries are mostly written in (OpenCL) C and are open source at https://github.com/RadeonOpenCompute/ROCm-Device-Libs. They are configured by linking in a number of tiny
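To make the linking model concrete: a minimal sketch of a kernel module before the device libraries are linked in. The __ocml_* name follows the OCML naming convention visible in ROCm-Device-Libs; the exact symbol should be verified against that repository:

    target triple = "amdgcn-amd-amdhsa"

    ; Left unresolved here; the definition comes from linking the
    ; ROCm device-library bitcode (OCML) into this module.
    declare float @__ocml_sqrt_f32(float)

    define amdgpu_kernel void @root(float addrspace(1)* %p) {
      %v = load float, float addrspace(1)* %p
      %r = call float @__ocml_sqrt_f32(float %v)
      store float %r, float addrspace(1)* %p
      ret void
    }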

2020 May 31 (2): LLC crash while handling DEBUG info
Hi- Here is the simple C++ function:

-----------
void foo() { }
-----------

Let's say the above function is compiled to LLVM IR with the -g flag using the command line `clang++ -g -O0 -S -emit-llvm foo.cpp`; we get the IR below:

-----------
; ModuleID = 'foo.cpp'
source_filename = "foo.cpp"
target datalayout =

2020 May 31 (2): LLC crash while handling DEBUG info
Hi David, if you look at line https://github.com/llvm/llvm-project/blob/master/llvm/lib/IR/Verifier.cpp#L1160, there is an IR verification check asserting that the compilation unit (`unit` field) must be present only in the case of `spFlags = DISPFlagDefinition`; otherwise, it must *not* be present. In the crash case, `spFlags = DISPFlagOptimized`. So, I guess, the `unit` field should *not* be present,
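Spelled out as IR metadata, the rule the verifier enforces looks roughly like this (a hand-written sketch, not the IR from the crash; the surrounding metadata — !0 compile unit, !1 file, !5 subroutine type — is elided):

    ; A definition: spFlags contains DISPFlagDefinition, so the
    ; `unit` field must reference the compile unit.
    !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1,
            line: 1, type: !5,
            spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)

    ; A declaration: no DISPFlagDefinition, so `unit` must be absent,
    ; even though DISPFlagOptimized is set.
    !6 = !DISubprogram(name: "bar", scope: !1, file: !1, line: 2,
            type: !5, spFlags: DISPFlagOptimized)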

2020 May 31 (2): LLC crash while handling DEBUG info
I am a bit confused - `unit` must be present for definitions, and `optimized` is also a `definition`, so `unit` must be present for `optimized` too. Am I right? Mahesha

On Sun, May 31, 2020 at 10:14 PM David Blaikie <dblaikie at gmail.com> wrote:
> definition and optimized are orthogonal (a function could be both, or
> neither) - one says this DISubprogram describes a function

2020 Jun 01 (2): LLC crash while handling DEBUG info
Let's forget about my malformed IR if it is adding confusion here. I mentioned it to ease the conversation, but if it is causing confusion rather than making the discussion flow more easily, then we had better ignore it. The whole trigger for this email thread is that one of our applications is crashing with the stack trace I mentioned earlier. The crash happens during the

2017 May 08 (2): [OpenCL][AMDGPU] Using AMDGPU generated kernel code for OpenCL
Hello everyone, I was wondering what the correct way of using AMDGPU-generated kernel code for OpenCL is. I am trying to give Polly's GPGPU code generation the ability to run on different GPU devices, such as AMD GPUs. For NVIDIA, I simply retrieve a pre-compiled PTX string from the NVPTX backend and pass that to OpenCL's 'clCreateProgramWithBinary' function. However,

2019 Sep 16 (2): Changing behavior of lit.py's -v flag
Tim Northover via llvm-dev <llvm-dev at lists.llvm.org> writes:
> Hi Varun,
>
> I'm definitely in favour of making -v more useful like this.
>
> On Thu, 12 Sep 2019 at 19:31, Varun Gandhi via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Option 2 (less deviation from status quo):
>> -v: Adopts behavior of -vvv from Option 1. :)
>> -vv: Same

2020 Nov 18 (2): wasteful cmake defaults
On Wed., Nov. 18, 2020 at 12:17, David Chisnall <David.Chisnall at cl.cam.ac.uk> wrote:
> On 18/11/2020 18:03, Michael Kruse wrote:
> > This is missing the probably largest group: Users of clang who compile
> > clang themselves (e.g. because their OS does not come with a package
> > for clang, or is too old)
>
> I think our target for this group should be that

2017 Jun 14 (5): Implementing cross-thread reduction in the AMDGPU backend
On 06/13/2017 07:33 PM, Matt Arsenault wrote:
>
>> On Jun 12, 2017, at 17:23, Tom Stellard <tstellar at redhat.com> wrote:
>>
>> On 06/12/2017 08:03 PM, Connor Abbott wrote:
>>> On Mon, Jun 12, 2017 at 4:56 PM, Tom Stellard <tstellar at redhat.com> wrote:
>>>> On

2019 Nov 13 (2): AMDGPU and math functions
Does anyone know whether there is support yet for math functions in AMD GPU kernels? In the NVIDIA world, they provide the libdevice IR module, which can be linked to an existing module containing the kernel. In other words, they provide all math functions at the IR level. NVIDIA even claims that libdevice is actually device-specific (per compute capability). I was wondering how that is done on the AMD

2015 Jun 08 (2): [LLVMdev] R600 -> AMDGPU rename coming on Friday
Hi, I'm finally going to do the R600->AMDGPU rename this Friday. This is something I originally proposed last July [1], but had to put off in order to avoid creating really bad merge headaches for some users. The only change from my original proposal is that I'll just keep the existing r600 and amdgcn triples rather than adding a new one for amdgpu. If anyone has any strong