similar to: Executing OpenMP 4.0 code on Nvidia's GPU

Displaying 20 results from an estimated 3000 matches similar to: "Executing OpenMP 4.0 code on Nvidia's GPU"

2015 Apr 08
5
[LLVMdev] CUDA front-end (CUDA to LLVM IR)
Hi, I wanted to ask whether there is ongoing effort (or an already established tool) that enables converting CUDA kernels (that use CUDA-specific intrinsics, e.g., threadIdx.x, __syncthreads(), ...) to LLVM IR. I am aware that I can do this for OpenCL with the help of libclc, but I cannot find something similar for CUDA. Thanks
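
For context, the kind of input in question, and one way to get LLVM IR from it with clang's native CUDA support (a sketch; the kernel and the GPU arch flag are illustrative assumptions, not from the thread):

    // axpy.cu -- a minimal CUDA kernel using the intrinsics mentioned above.
    __global__ void axpy(float a, const float *x, float *y) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;  // thread-index intrinsics
      y[i] = a * x[i] + y[i];
      __syncthreads();  // barrier, reached uniformly by all threads in the block
    }

    // Emit device-side LLVM IR (flags per clang's CUDA documentation):
    //   clang++ -x cuda --cuda-device-only --cuda-gpu-arch=sm_35 \
    //           -S -emit-llvm axpy.cu -o axpy.ll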
2015 Apr 08
2
[LLVMdev] CUDA front-end (CUDA to LLVM IR)
On Wed, Apr 8, 2015 at 10:12 AM, Dmitry Mikushin <dmitry at kernelgen.org> wrote: > A tool of this kind here: https://github.com/apc-llc/nvcc-llvm-ir > > 2015-04-08 19:01 GMT+02:00 Ahmed ElTantawy <ahmede at ece.ubc.ca>: > >> Hi, >> >> I wanted to ask whether there is ongoing effort (or an already >> established tool) that enables to convert CUDA
2015 Feb 03
2
[LLVMdev] Example for usage of LLVM/Clang/libclc
Hi, My goal is to use Clang/LLVM/libclc to compile an OpenCL kernel and eventually generate PTX code. I already did this, but I am not sure whether the PTX code I am generating is correct (i.e., the one that is supposed to be generated). For example, currently, In OpenCL: get_global_id(0) translates to In LLVM: %call = tail call i32 @get_global_id(i32 0) which translates to In PTX:
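
A sketch of the pipeline being described, assuming libclc's documented usage (the install paths, triple, and flags are assumptions recalled from the libclc README, so check them against your checkout):

    // kernel.cl -- get_global_id(0) stays an external call in the IR until
    // libclc's NVPTX builtins bitcode is linked in; NVPTX codegen then
    // lowers it via reads of the %tid/%ntid/%ctaid PTX registers.
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *c) {
      size_t i = get_global_id(0);
      c[i] = a[i] + b[i];
    }

    // clang -target nvptx--nvidiacl -I$LIBCLC/generic/include \
    //       -include clc/clc.h -Dcl_clang_storage_class_specifiers \
    //       -Xclang -mlink-bitcode-file \
    //       -Xclang $LIBCLC/built_libs/nvptx--nvidiacl.bc \
    //       -S -o kernel.ptx kernel.cl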
2018 Jan 30
3
Disable spilling sub-registers in LLVM
Right Matthias, I am aware that an implementation for storeRegToStackSlot()/loadRegFromStackSlot() is necessary. But these functions receive the physical register that needs to be spilled, and they might receive the sub-register. In this case, using the super-register naively is unsafe (e.g., one might overwrite parts of it). Thus, I think the register allocator/spiller needs to be aware of the
2018 Jan 30
0
Disable spilling sub-registers in LLVM
To make my point clear, I believe an implementation of storeRegToStackSlot()/loadRegFromStackSlot() is not sufficient (as they receive the physical register already). Does this make sense? On 2018-01-30 13:33, ahmede wrote: > Right Matthias, I am aware that an implementation for > storeRegToStackSlot()/loadRegFromStackSlot() is necessary. But these > functions receive the physical
2018 Jan 30
3
Disable spilling sub-registers in LLVM
Hi Quentin, Let me clarify whether I understood this correctly. If the accesses (writes and reads) to sub-registers are always expressed as sub-registers of the super-register's register class (e.g., SuperReg.sub1), then the spilling decision is for the super-register. But if the accesses are in terms of the register class of the sub-registers directly (SubReg), then the spilling decision will
2019 Jan 22
3
[RFC] Late (OpenMP) GPU code "SPMD-zation"
We would still know that. We can do exactly the same reasoning as we do now. I think the important question is, how different is the code generated for either mode and can we hide (most of) the differences in the runtime. If I understand you correctly, you say the data sharing code looks very different and the differences cannot be hidden, correct? It would be helpful for me to understand your
2019 Jan 22
2
[RFC] Late (OpenMP) GPU code "SPMD-zation"
We could still do that in clang, couldn't we? ________________________________ From: Alexey Bataev <a.bataev at outlook.com> Sent: Tuesday, January 22, 2019 12:52:42 PM To: Doerfert, Johannes Rudolf; cfe-dev at lists.llvm.org Cc: openmp-dev at lists.llvm.org; LLVM-Dev; Finkel, Hal J.; Alexey Bataev; Arpith Chacko Jacob Subject: Re:
2015 Jun 09
2
[LLVMdev] Supporting heterogeneous computing in llvm.
Hi Sergos and Samuel, Thanks for the links, I've got it mostly working now. I still have a problem with linking the code. It seems that the clang driver doesn't pass its library search path to nvlink when linking the generated cuda code to the target library, resulting in it not correctly finding libtarget-nvptx.a. Is there some flag or environment variable that I should set here?
2016 Mar 10
2
RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
----- Original Message ----- > From: "Arpith C Jacob" <acjacob at us.ibm.com> > To: llvm-dev at lists.llvm.org > Cc: jhen at google.com, "Hal J. Finkel" <hfinkel at anl.gov> > Sent: Thursday, March 10, 2016 10:38:46 AM > Subject: Re: [llvm-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries > > Hi Jason, >
2019 Jan 22
2
[RFC] Late (OpenMP) GPU code "SPMD-zation"
Could you elaborate on what you are referring to w.r.t. data sharing? What do we currently do in the clang code generation that we could not effectively implement in the runtime, potentially with support of an LLVM pass? Thanks, James ________________________________ From: Alexey Bataev <a.bataev at outlook.com> Sent: Tuesday, January 22, 2019
2019 Jan 23
1
[RFC] Late (OpenMP) GPU code "SPMD-zation"
We are working on OpenMP target offloading for GPUs in Flang and adopting the same code generation strategy, so the proposal affects us. It would be nice to know more details about it, so we can prepare to adapt Flang (if everything goes that way). Have you found a solution for data sharing? How are you going to manage data sharing for SPMD and non-SPMD? From: cfe-dev
2018 Jan 30
0
Disable spilling sub-registers in LLVM
I still think my answer applies: you have to modify storeRegToStackSlot()/loadRegFromStackSlot(). They decide how registers are spilled and reloaded. Nobody is stopping you from using super-register spills/reloads to implement spilling/reloading smaller registers there. - Matthias > On Jan 30, 2018, at 10:21 AM, ahmede <ahmede at ece.ubc.ca> wrote: > > Hi Quentin, > >
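
A minimal sketch of the approach Matthias describes, i.e., deciding inside the target hook itself how a narrow register class is spilled. The MYTGT names, opcode, and register class are hypothetical placeholders, not a real backend; the hook signature matches TargetInstrInfo as of the LLVM releases current in this thread:

    void MYTGTInstrInfo::storeRegToStackSlot(
        MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
        unsigned SrcReg, bool isKill, int FrameIndex,
        const TargetRegisterClass *RC, const TargetRegisterInfo *TRI) const {
      DebugLoc DL = (MI != MBB.end()) ? MI->getDebugLoc() : DebugLoc();
      if (MYTGT::SubRegsRegClass.hasSubClassEq(RC)) {
        // Narrow class: emit a narrow store. Naively widening to the
        // super-register here would be unsafe -- the matching reload
        // would clobber the lanes that were not actually spilled.
        BuildMI(MBB, MI, DL, get(MYTGT::STORE_SUB))
            .addReg(SrcReg, getKillRegState(isKill))
            .addFrameIndex(FrameIndex)
            .addImm(0);
        return;
      }
      // ... full-width register classes spilled as before ...
    }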
2019 Jan 22
7
[RFC] Late (OpenMP) GPU code "SPMD-zation"
Where we are ------------ Currently, when we generate OpenMP target offloading code for GPUs, we use sufficient syntactic criteria to decide between two execution modes: 1) SPMD -- All target threads (in an OpenMP team) run all the code. 2) "Guarded" -- The master thread (of an OpenMP team) runs the user code. If an OpenMP distribute region is encountered,
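
Source-level examples of the two modes described above (a sketch; whether a given region actually compiles to SPMD mode depends on the compiler's analysis, and the array/bounds names are illustrative):

    // 1) SPMD candidate: every device thread executes the loop body.
    #pragma omp target teams distribute parallel for map(tofrom: a[0:N])
    for (int i = 0; i < N; ++i)
      a[i] = 2.0 * a[i];

    // 2) "Guarded" (non-SPMD): the statement before the parallel region
    // must run on the master thread only, so worker threads idle until
    // work is handed to them.
    #pragma omp target map(tofrom: a[0:N])
    {
      double t = a[0];               // master-only, then shared with workers
      #pragma omp parallel for
      for (int i = 0; i < N; ++i)
        a[i] += t;
    }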
2019 Jan 31
2
[RFC] Late (OpenMP) GPU code "SPMD-zation"
Hi Johannes, Thank you for the explanation. I think we need to clarify some details about code generation in Clang today:
2019 May 31
2
[cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.
Yes, this is very similar, but only expressed in terms of clang attributes, which may have different spellings for clang, GCC, C++11, etc. I don't think GCC will implement this as a pragma; they added the simd attribute instead of a pragma. Best regards, Alexey Bataev > On May 31, 2019, at 14:43, Francesco Petrogalli <Francesco.Petrogalli at arm.com> wrote: > > > >> On
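
The two spellings under discussion, side by side (a sketch; the function names are illustrative):

    // OpenMP pragma form:
    #pragma omp declare simd notinbranch
    float scale_pragma(float x);

    // GCC attribute form (the spelling GCC added instead of a pragma):
    __attribute__((simd("notinbranch")))
    float scale_attr(float x);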
2019 Mar 13
4
[RFC] Late (OpenMP) GPU code "SPMD-zation"
1. You don't need to implement everything in a single patch. Development is a step-by-step process in which you commit things in small pieces. The code need not be fully functional; you may start with some basic features. Currently it is very hard to review. 2. I rather doubt that it can be reused without changes for AMD etc., especially without being fully tested. The only tested
2019 May 31
2
[cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.
You can define a clang-specific attribute and later add a GCC alias for it. Best regards, Alexey Bataev > On May 31, 2019, at 13:46, Francesco Petrogalli <Francesco.Petrogalli at arm.com> wrote: > > > >> On May 31, 2019, at 12:38 PM, Alexey Bataev <a.bataev at hotmail.com> wrote: >> >> Francesco, there won't be any duplication. Most of the
2017 Mar 08
3
[RFC][PIR] Parallel LLVM IR -- Stage 0 --
I assume the case being referred to is something like the one below, right? #pragma omp parallel num_threads(n) { #pragma omp critical { x = x + 1; } } If that is the case, the programmer is already writing code that is not "serial equivalent". Our representation for the parallelizer is %t = @llvm.region.entry()["omp.parallel"(),
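
Spelled out as a complete example (the thread count of 4 is an arbitrary choice): executed serially the region increments x once, but under the parallel semantics it increments x once per thread, which is exactly why it is not serial-equivalent.

    #include <stdio.h>
    int main(void) {
      int x = 0;
    #pragma omp parallel num_threads(4)
      {
    #pragma omp critical
        x = x + 1;
      }
      printf("%d\n", x);  // prints 4 (one increment per thread), not 1
      return 0;
    }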
2019 Mar 13
2
[RFC] Late (OpenMP) GPU code "SPMD-zation"
------------- Best regards, Alexey Bataev 13.03.2019 15:35, Doerfert, Johannes wrote: > Hi Alexey, > > thank you for your quick feedback. > > > There are tooooooo(!) many changes, I don't know who's going to review sooooo big a > patch. > > I can for sure split it into the three components/repositories that are > touched: clang, llvm, and openmp.