similar to: [RFC] Upstreaming PACXX (Programming Accelerators with C++)

Displaying 20 results from an estimated 1000 matches similar to: "[RFC] Upstreaming PACXX (Programming Accelerators with C++)"

2018 Feb 05
0
[RFC] Upstreaming PACXX (Programming Accelerators with C++)
Interesting. I do something similar for D targeting CUDA (via NVPTX) and OpenCL (via my forward-ported fork of Khronos’ SPIRV-LLVM)[1], except all the code generation is done at compile time. The runtime is aided by compile-time reflection so that calling kernels is done by symbol. What kind of performance difference do you see running code that was not developed with GPU in mind (e.g.
2018 Feb 05
1
[RFC] Upstreaming PACXX (Programming Accelerators with C++)
I was going to say, this reminds me of Kai's presentation at Fosdem yesterday. https://fosdem.org/2018/schedule/event/heterogenousd/ It's always good to see the cross-architecture power of LLVM being used in creative ways! :) cheers, --renato On 5 February 2018 at 13:35, Nicholas Wilson via llvm-dev <llvm-dev at lists.llvm.org> wrote: > Interesting. > > I do something
2017 Dec 05
3
[AMDGPU] Strange results with different address spaces
Hi dev list, I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some strange behavior of the AMDGPU backend. The following IR is generated for a simple address space test that copies from global to shared memory and back to global after a barrier synchronization. The IR is attached as as1.ll. The output is as follows: 0 0 0 0 0 0 0 0 0 0 0 0 0
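
The as1.ll attachment is not reproduced here; as a rough CUDA sketch of the pattern the test exercises (copy global to shared, barrier, copy shared to global), the following kernel is a hypothetical reconstruction with illustrative names and sizes, not code from the attachment:

    #include <cstdio>

    // Hypothetical reconstruction of the test pattern described above:
    // global -> shared, barrier synchronization, shared -> global.
    __global__ void addressSpaceRoundTrip(const int *in, int *out) {
        __shared__ int tile[64];      // LDS / "shared" address space
        const int i = threadIdx.x;
        tile[i] = in[i];              // global -> shared
        __syncthreads();              // barrier
        out[i] = tile[i];             // shared -> global
    }

    int main() {
        int h[64];
        for (int i = 0; i < 64; ++i) h[i] = i + 1;
        int *dIn = nullptr, *dOut = nullptr;
        cudaMalloc(&dIn, sizeof h);
        cudaMalloc(&dOut, sizeof h);
        cudaMemcpy(dIn, h, sizeof h, cudaMemcpyHostToDevice);
        addressSpaceRoundTrip<<<1, 64>>>(dIn, dOut);
        cudaMemcpy(h, dOut, sizeof h, cudaMemcpyDeviceToHost);
        for (int i = 0; i < 13; ++i)
            printf("%d ", h[i]);      // all zeros would indicate the bug reported above
        printf("\n");
        cudaFree(dIn);
        cudaFree(dOut);
        return 0;
    }
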
2017 Dec 05
2
[AMDGPU] Strange results with different address spaces
> On Dec 5, 2017, at 13:53, Matt Arsenault <arsenm2 at gmail.com> wrote: > > > >> On Dec 5, 2017, at 02:51, Haidl, Michael via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> Hi dev list, >> >> I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some
2018 Feb 12
1
LLVM Weekly - #215, Feb 12th 2018
LLVM Weekly - #215, Feb 12th 2018 ================================= If you prefer, you can read an HTML version of this email at <http://llvmweekly.org/issue/215>. Welcome to the two hundred and fifteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by [Alex
2018 May 28
0
[RFC] A New Divergence Analysis for LLVM
TL;DR This RFC is a joint effort by Intel and Saarland University to bring the divergence analysis of the Region Vectorizer [1,2,3,4,5] (dubbed the vectorization analysis of RV) to LLVM. The implementation is available on GitHub for feedback [0]. The existing divergence analysis infrastructure in LLVM has conceptual limitations (structured control, SCEV-based). The new analysis resolves bugs
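
For context, a value or branch is divergent when it may differ between threads that execute in lockstep. A minimal CUDA illustration of what such an analysis must detect (hypothetical, not taken from the RV sources):

    __global__ void divergenceExample(int *out, const int *in) {
        int tid = threadIdx.x;        // divergent: differs per thread
        int n   = blockDim.x;         // uniform: same for every thread
        if (tid < n / 2)              // divergent branch: threads disagree
            out[tid] = in[tid] * 2;
        else
            out[tid] = in[tid] + 1;
        // A divergence analysis must mark 'tid' and the branch as divergent
        // so vectorizers and codegen handle the control flow safely.
    }
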
2015 Jan 13
2
[LLVMdev] Emitting IR in older formats (for NVVM)
Since SPIR can be (easily) transformed to NVVM IR, at least for me this helps a lot. Thank you Tobias. -MH On January 12, 2015, Tobias Grosser <tgrosser at inf.ethz.ch> wrote: > On 12.01.2015 05:48, Jonathan Ragan-Kelley wrote: > > This question is specifically motivated by the practical constraints of > > NVVM, but I don't know anywhere better to ask (hopefully, e.g.,
2019 Mar 13
4
[RFC] Late (OpenMP) GPU code "SPMD-zation"
1. You don't need to implement everything in a single patch. The development process is a step-by-step process, where you commit something in small pieces. The code need not be fully functional; you may start from some basic features. Currently it is very hard to review. 2. I rather doubt that it can be reused without changes for AMD etc., especially without being fully tested. The only tested
2019 Mar 13
2
[RFC] Late (OpenMP) GPU code "SPMD-zation"
------------- Best regards, Alexey Bataev 13.03.2019 15:35, Doerfert, Johannes wrote: > > Hi Alexey, > > > thank you for your quick feedback. > > > > There are tooooooo(!) many changes, I don't know who's going to review sooooo big > patch. > > > I can for sure split it into the three components/repositories that are > touched: clang, llvm, and openmp.
2019 Mar 13
3
[RFC] Late (OpenMP) GPU code "SPMD-zation"
Johannes, did you try it on AMD GPUs? If not, I think it might be early to claim it as a general interface for NVidia/AMD GPUs. I'm ok if you want to introduce a basic class for the GPU-specific codegen, but it must be done step by step and thoroughly tested and reviewed. There might be some parts common with NVPTX codegen. You can put the common functions into a base class and remove them from
2015 Jul 06
4
[LLVMdev] SPMD Autovectorizer
Hi, Are there any plans to integrate an autovectorizer for SPMD programs into LLVM? For example, there were previous discussions about integrating the whole function vectorizer (WFV) from Ralf Karrenberg into LLVM. Thanks, Zack
2015 Aug 14
2
[LLVMdev] RFC: Convergent attribute
Hi Jingyue, Convergent is not intended to prevent inlining. It’s tricky to formalize this inter-procedurally, but the intended interpretation is that a convergent operation cannot be moved either into or out of a conditionally executed region. Normal inlining would not violate that. I would imagine that it would make sense to use a combination of convergent and noduplicate for barrier-like
2015 May 13
8
[LLVMdev] RFC: Convergent attribute
Below is a proposal for a new "convergent" intrinsic attribute and MachineInstr property, needed for correctly modeling many SPMD/SIMT programming models in LLVM. Comments and feedback welcome. —Owen In order to make LLVM more suitable for programming models variously called SPMD and SIMT, we would like to propose a new intrinsic and MachineInstr annotation called
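
As a concrete picture of the constraint, assuming the usual mapping of __syncthreads() to a convergent barrier intrinsic, the CUDA kernel below shows the kind of code motion that 'convergent' must forbid (kernel name and sizes are illustrative):

    // Assumes a launch with 256 threads per block and 256 floats in 'data'.
    __global__ void reduceStep(float *data) {
        int tid = threadIdx.x;
        if (tid < 128)
            data[tid] += data[tid + 128];
        // Convergent: every thread of the block must reach this barrier.
        // Sinking it into the conditionally executed region above (or
        // hoisting a barrier out of one) is exactly the transformation
        // the attribute is meant to rule out.
        __syncthreads();
        if (tid < 64)
            data[tid] += data[tid + 64];
    }
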
2015 Aug 14
2
[LLVMdev] RFC: Convergent attribute
Hi Mehdi, My reading of it is that if you have a convergent instruction A, it is legal to duplicate it to instruction B if (assuming B is after A in program flow) A dominates B and B post-dominates A. James On Fri, 14 Aug 2015 at 08:32 Mehdi Amini via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On Aug 13, 2015, at 9:43 PM, Owen Anderson via llvm-dev < > llvm-dev at
2019 Jan 31
2
[RFC] Late (OpenMP) GPU code "SPMD-zation"
Hi Johannes, Thank you for the explanation. I think we need to clarify some details about code generation in Clang today:
2017 Aug 04
3
Status of llvm.experimental.vector.reduce.* intrinsics
I assume smaller types like <4 x i1> are getting zero-extended to, e.g., i8? On 04.08.2017 at 15:58, Amara Emerson wrote: > Actually for mask vectors of i1 values, you don't need to use reductions > at all (although for SVE this is what we'll do). You can instead bitcast > the vector value to an i8/i16/whatever and then compare against zero. > > Amara > > On
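
A host-side C++ model of that trick (an explicit bit-pack stands in for the IR-level bitcast of <8 x i1> to i8; illustrative only, not the backend's actual lowering):

    #include <cstdint>

    // Pack an 8-lane boolean mask into one byte and test it with a single
    // compare against zero, instead of OR-reducing the lanes one by one.
    static bool anyLaneSet(const bool (&mask)[8]) {
        std::uint8_t bits = 0;
        for (int i = 0; i < 8; ++i)
            bits |= static_cast<std::uint8_t>(mask[i]) << i; // models the bitcast
        return bits != 0;
    }
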
2019 Jan 22
7
[RFC] Late (OpenMP) GPU code "SPMD-zation"
Where we are ------------ Currently, when we generate OpenMP target offloading code for GPUs, we use sufficient syntactic criteria to decide between two execution modes: 1) SPMD -- All target threads (in an OpenMP team) run all the code. 2) "Guarded" -- The master thread (of an OpenMP team) runs the user code. If an OpenMP distribute region is encountered,
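
In CUDA terms the two modes correspond roughly to the following kernel shapes (a simplified sketch, not the actual libomptarget code generation):

    // SPMD mode: every thread of the team executes the user code.
    __global__ void spmdMode(int *a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) a[i] = 2 * i;
    }

    // "Guarded" mode: the master thread runs the sequential user code;
    // the other threads wait until parallel work is distributed to them.
    __global__ void guardedMode(int *a, int n) {
        if (threadIdx.x == 0)
            a[0] = 0;                 // sequential part, master only
        __syncthreads();              // workers idle until here
        int i = threadIdx.x;
        if (i < n) a[i] = 2 * i;      // parallel region, all threads
    }
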
2019 Jan 22
3
[RFC] Late (OpenMP) GPU code "SPMD-zation"
We would still know that. We can do exactly the same reasoning as we do now. I think the important question is, how different is the code generated for either mode and can we hide (most of) the differences in the runtime. If I understand you correctly, you say the data sharing code looks very different and the differences cannot be hidden, correct? It would be helpful for me to understand your
2015 Jul 07
2
[LLVMdev] SPMD Autovectorizer
On 07/07/2015 01:32 PM, Renato Golin wrote: > Wouldn't OpenMP account for some of that? At least on a single > machine, could you have both parallel and simd optimisations done on > the same loop? The point of autovectorizing an SPMD program description (e.g. CUDA or OpenCL C) is to produce something like OpenMP parallel loops or SIMD pragmas automatically from the single thread/WI
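
Concretely, such an autovectorizer rewrites the per-work-item body as a work-group loop that existing SIMD machinery can handle. A hand-written sketch of the before/after (illustrative, not WFV output):

    // Single work-item description, as one would write in OpenCL C:
    //     out[gid] = a[gid] + b[gid];
    // The SPMD autovectorizer effectively produces the loop below, which a
    // SIMD pragma or the loop vectorizer can then turn into vector code.
    void workGroup(float *out, const float *a, const float *b, int groupSize) {
    #pragma omp simd
        for (int gid = 0; gid < groupSize; ++gid)
            out[gid] = a[gid] + b[gid];
    }
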
2017 Aug 04
2
Status of llvm.experimental.vector.reduce.* intrinsics
I am currently working on a transformation pass that transforms masked.load and masked.store intrinsics to (hopefully) increase performance on targets where masked.load and masked.store are not legal. To check if the loads and stores are necessary at all, I take the mask for the masked operations and want to reduce it to a single value. vector.reduce.or seemed very handy to do the job. I
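
A scalar model of what such a pass needs (an OR-reduction of the mask for an early out, then a per-lane fallback where the masked intrinsics are not legal); names are hypothetical:

    // Fallback for a masked store on targets without llvm.masked.store:
    // skip everything when no mask bit is set, else store lane by lane.
    void maskedStoreFallback(float *dst, const float *src,
                             const bool *mask, int lanes) {
        bool any = false;
        for (int i = 0; i < lanes; ++i)
            any |= mask[i];           // the OR-reduction asked about above
        if (!any)
            return;                   // early out: the store is not necessary at all
        for (int i = 0; i < lanes; ++i)
            if (mask[i])
                dst[i] = src[i];      // only lanes whose mask bit is set
    }
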