similar to: [RFC] Upstreaming PACXX (Programming Accelerators with C++)

Displaying 20 results from an estimated 1000 matches similar to: "[RFC] Upstreaming PACXX (Programming Accelerators with C++)"

2018 Feb 05
0
[RFC] Upstreaming PACXX (Programming Accelerators with C++)
Interesting. I do something similar for D targeting CUDA (via NVPTX) and OpenCL (via my forward-ported fork of Khronos’ SPIRV-LLVM)[1], except all the code generation is done at compile time. The runtime is aided by compile-time reflection so that calling kernels is done by symbol. What kind of performance difference do you see running code that was not developed with GPU in mind (e.g.
2018 Feb 05
1
[RFC] Upstreaming PACXX (Programming Accelerators with C++)
I was going to say, this reminds me of Kai's presentation at Fosdem yesterday. https://fosdem.org/2018/schedule/event/heterogenousd/ It's always good to see the cross-architecture power of LLVM being used in creative ways! :) cheers, --renato On 5 February 2018 at 13:35, Nicholas Wilson via llvm-dev <llvm-dev at lists.llvm.org> wrote: > Interesting. > > I do something
2017 Dec 05
3
[AMDGPU] Strange results with different address spaces
Hi dev list, I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some strange behavior of the AMDGPU backend. The following IR is generated for a simple address space test that copies from global to shared memory and back to global after a barrier synchronization. The IR is attached as as1.ll. The output is as follows: 0 0 0 0 0 0 0 0 0 0 0 0 0
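
The as1.ll attachment is not reproduced here; as a rough CUDA sketch of the pattern the test exercises (copy global to shared, barrier, copy shared to global), the following kernel is a hypothetical reconstruction with illustrative names and sizes, not code from the attachment:

    #include <cstdio>

    // Hypothetical reconstruction of the test pattern described above:
    // global -> shared, barrier synchronization, shared -> global.
    __global__ void addressSpaceRoundTrip(const int *in, int *out) {
        __shared__ int tile[64];      // LDS / "shared" address space
        const int i = threadIdx.x;
        tile[i] = in[i];              // global -> shared
        __syncthreads();              // barrier
        out[i] = tile[i];             // shared -> global
    }

    int main() {
        int h[64];
        for (int i = 0; i < 64; ++i) h[i] = i + 1;
        int *dIn = nullptr, *dOut = nullptr;
        cudaMalloc(&dIn, sizeof h);
        cudaMalloc(&dOut, sizeof h);
        cudaMemcpy(dIn, h, sizeof h, cudaMemcpyHostToDevice);
        addressSpaceRoundTrip<<<1, 64>>>(dIn, dOut);
        cudaMemcpy(h, dOut, sizeof h, cudaMemcpyDeviceToHost);
        for (int i = 0; i < 13; ++i)
            printf("%d ", h[i]);      // all zeros would indicate the bug reported above
        printf("\n");
        cudaFree(dIn);
        cudaFree(dOut);
        return 0;
    }
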
2017 Dec 05
2
[AMDGPU] Strange results with different address spaces
> On Dec 5, 2017, at 13:53, Matt Arsenault <arsenm2 at gmail.com> wrote: > > > >> On Dec 5, 2017, at 02:51, Haidl, Michael via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> Hi dev list, >> >> I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some
2018 Feb 12
1
LLVM Weekly - #215, Feb 12th 2018
LLVM Weekly - #215, Feb 12th 2018 ================================= If you prefer, you can read an HTML version of this email at <http://llvmweekly.org/issue/215>. Welcome to the two hundred and fifteenth issue of LLVM Weekly, a weekly newsletter (published every Monday) covering developments in LLVM, Clang, and related projects. LLVM Weekly is brought to you by [Alex
2018 May 28
0
[RFC] A New Divergence Analysis for LLVM
TL;DR This RFC is a joint effort by Intel and Saarland University to bring the divergence analysis of the Region Vectorizer [1,2,3,4,5] (dubbed the vectorization analysis of RV) to LLVM. The implementation is available on GitHub for feedback [0]. The existing divergence analysis infrastructure in LLVM has conceptual limitations (structured control, SCEV-based). The new analysis resolves bugs
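
For context, a value or branch is divergent when it may differ between threads that execute in lockstep. A minimal CUDA illustration of what such an analysis must detect (hypothetical, not taken from the RV sources):

    __global__ void divergenceExample(int *out, const int *in) {
        int tid = threadIdx.x;        // divergent: differs per thread
        int n   = blockDim.x;         // uniform: same for every thread
        if (tid < n / 2)              // divergent branch: threads disagree
            out[tid] = in[tid] * 2;
        else
            out[tid] = in[tid] + 1;
        // A divergence analysis must mark 'tid' and the branch as divergent
        // so vectorizers and codegen handle the control flow safely.
    }
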
2015 Jan 13
2
[LLVMdev] Emitting IR in older formats (for NVVM)
Since SPIR can be (easily) transformed to NVVM IR, at least for me this helps a lot. Thank you Tobias. -MH On January 12, 2015, Tobias Grosser <tgrosser at inf.ethz.ch> wrote: > On 12.01.2015 05:48, Jonathan Ragan-Kelley wrote: > > This question is specifically motivated by the practical constraints of > > NVVM, but I don't know anywhere better to ask (hopefully, e.g.,
2019 Mar 13
4
[RFC] Late (OpenMP) GPU code "SPMD-zation"
1. You don't need to implement everything in a single patch. The development process is a step-by-step process, where you commit something in small pieces. The code need not be fully functional; you may start from some basic features. Currently it is very hard to review. 2. I rather doubt that it can be reused without changes for AMD etc., especially without being fully tested. The only tested
2019 Mar 13
2
[RFC] Late (OpenMP) GPU code "SPMD-zation"
------------- Best regards, Alexey Bataev 13.03.2019 15:35, Doerfert, Johannes wrote: > > Hi Alexey, > > > thank you for your quick feedback. > > > > There are tooooooo(!) many changes, I don't know who's going to review sooooo big > patch. > > > I can for sure split it into the three components/repositories that are > touched: clang, llvm, and openmp.
2019 Mar 13
3
[RFC] Late (OpenMP) GPU code "SPMD-zation"
Johannes, did you try it on AMD GPUs? If not, I think it might be early to claim it as a general interface for NVidia/AMD GPUs. I'm ok if you want to introduce a basic class for the GPU-specific codegen, but it must be done step by step and thoroughly tested and reviewed. There might be some parts common with NVPTX codegen. You can put the common functions into a base class and remove them from
2015 Jul 06
4
[LLVMdev] SPMD Autovectorizer
Hi, Are there any plans to integrate an autovectorizer for SPMD programs into LLVM? For example, there were previous discussions about integrating the whole function vectorizer (WFV) from Ralf Karrenberg into LLVM. Thanks, Zack
2015 Aug 14
2
[LLVMdev] RFC: Convergent attribute
Hi Jingyue, Convergent is not intended to prevent inlining. It’s tricky to formalize this inter-procedurally, but the intended interpretation is that a convergent operation cannot be moved either into or out of a conditionally executed region. Normal inlining would not violate that. I would imagine that it would make sense to use a combination of convergent and noduplicate for barrier-like
2015 May 13
8
[LLVMdev] RFC: Convergent attribute
Below is a proposal for a new "convergent" intrinsic attribute and MachineInstr property, needed for correctly modeling many SPMD/SIMT programming models in LLVM. Comments and feedback welcome. —Owen In order to make LLVM more suitable for programming models variously called SPMD and SIMT, we would like to propose a new intrinsic and MachineInstr annotation called
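
As a concrete picture of the constraint, assuming the usual mapping of __syncthreads() to a convergent barrier intrinsic, the CUDA kernel below shows the kind of code motion that 'convergent' must forbid (kernel name and sizes are illustrative):

    // Assumes a launch with 256 threads per block and 256 floats in 'data'.
    __global__ void reduceStep(float *data) {
        int tid = threadIdx.x;
        if (tid < 128)
            data[tid] += data[tid + 128];
        // Convergent: every thread of the block must reach this barrier.
        // Sinking it into the conditionally executed region above (or
        // hoisting a barrier out of one) is exactly the transformation
        // the attribute is meant to rule out.
        __syncthreads();
        if (tid < 64)
            data[tid] += data[tid + 64];
    }
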
2015 Aug 14
2
[LLVMdev] RFC: Convergent attribute
Hi Mehdi, My reading of it is that if you have a convergent instruction A, it is legal to duplicate it to instruction B if (assuming B is after A in program flow) A dominates B and B post-dominates A. James On Fri, 14 Aug 2015 at 08:32 Mehdi Amini via llvm-dev < llvm-dev at lists.llvm.org> wrote: > On Aug 13, 2015, at 9:43 PM, Owen Anderson via llvm-dev < > llvm-dev at
2019 Jan 31
2
[RFC] Late (OpenMP) GPU code "SPMD-zation"
Hi Johannes, Thank you for the explanation. I think we need to clarify some details about code generation in Clang today:
2017 Aug 04
3
Status of llvm.experimental.vector.reduce.* intrinsics
I assume smaller types like <4 x i1> are getting zero-extended to, e.g., i8? On 04.08.2017 at 15:58, Amara Emerson wrote: > Actually for mask vectors of i1 values, you don't need to use reductions > at all (although for SVE this is what we'll do). You can instead bitcast > the vector value to an i8/i16/whatever and then compare against zero. > > Amara > > On
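
A host-side C++ model of that trick (an explicit bit-pack stands in for the IR-level bitcast of <8 x i1> to i8; illustrative only, not the backend's actual lowering):

    #include <cstdint>

    // Pack an 8-lane boolean mask into one byte and test it with a single
    // compare against zero, instead of OR-reducing the lanes one by one.
    static bool anyLaneSet(const bool (&mask)[8]) {
        std::uint8_t bits = 0;
        for (int i = 0; i < 8; ++i)
            bits |= static_cast<std::uint8_t>(mask[i]) << i; // models the bitcast
        return bits != 0;
    }
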
2019 Jan 22
7
[RFC] Late (OpenMP) GPU code "SPMD-zation"
Where we are ------------ Currently, when we generate OpenMP target offloading code for GPUs, we use sufficient syntactic criteria to decide between two execution modes: 1) SPMD -- All target threads (in an OpenMP team) run all the code. 2) "Guarded" -- The master thread (of an OpenMP team) runs the user code. If an OpenMP distribute region is encountered,
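
In CUDA terms the two modes correspond roughly to the following kernel shapes (a simplified sketch, not the actual libomptarget code generation):

    // SPMD mode: every thread of the team executes the user code.
    __global__ void spmdMode(int *a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) a[i] = 2 * i;
    }

    // "Guarded" mode: the master thread runs the sequential user code;
    // the other threads wait until parallel work is distributed to them.
    __global__ void guardedMode(int *a, int n) {
        if (threadIdx.x == 0)
            a[0] = 0;                 // sequential part, master only
        __syncthreads();              // workers idle until here
        int i = threadIdx.x;
        if (i < n) a[i] = 2 * i;      // parallel region, all threads
    }
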
2019 Jan 22
3
[RFC] Late (OpenMP) GPU code "SPMD-zation"
We would still know that. We can do exactly the same reasoning as we do now. I think the important question is, how different is the code generated for either mode and can we hide (most of) the differences in the runtime. If I understand you correctly, you say the data sharing code looks very different and the differences cannot be hidden, correct? It would be helpful for me to understand your
2015 Jul 07
2
[LLVMdev] SPMD Autovectorizer
On 07/07/2015 01:32 PM, Renato Golin wrote: > Wouldn't OpenMP account for some of that? At least on a single > machine, could you have both parallel and simd optimisations done on > the same loop? The point of autovectorizing an SPMD program description (e.g. CUDA or OpenCL C) is to produce something like OpenMP parallel loops or SIMD pragmas automatically from the single thread/WI
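
Concretely, such an autovectorizer rewrites the per-work-item body as a work-group loop that existing SIMD machinery can handle. A hand-written sketch of the before/after (illustrative, not WFV output):

    // Single work-item description, as one would write in OpenCL C:
    //     out[gid] = a[gid] + b[gid];
    // The SPMD autovectorizer effectively produces the loop below, which a
    // SIMD pragma or the loop vectorizer can then turn into vector code.
    void workGroup(float *out, const float *a, const float *b, int groupSize) {
    #pragma omp simd
        for (int gid = 0; gid < groupSize; ++gid)
            out[gid] = a[gid] + b[gid];
    }
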
2017 Aug 04
2
Status of llvm.experimental.vector.reduce.* intrinsics
I am currently working on a transformation pass that transforms masked.load and masked.store intrinsics to (hopefully) increase performance on targets where masked.load and masked.store are not legal. To check if the loads and stores are necessary at all, I take the mask for the masked operations and want to reduce it to a single value. vector.reduce.or seemed very handy to do the job. I
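
A scalar model of what such a pass needs (an OR-reduction of the mask for an early out, then a per-lane fallback where the masked intrinsics are not legal); names are hypothetical:

    // Fallback for a masked store on targets without llvm.masked.store:
    // skip everything when no mask bit is set, else store lane by lane.
    void maskedStoreFallback(float *dst, const float *src,
                             const bool *mask, int lanes) {
        bool any = false;
        for (int i = 0; i < lanes; ++i)
            any |= mask[i];           // the OR-reduction asked about above
        if (!any)
            return;                   // early out: the store is not necessary at all
        for (int i = 0; i < lanes; ++i)
            if (mask[i])
                dst[i] = src[i];      // only lanes whose mask bit is set
    }
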