similar to: __sync_synchronize() crashes when compiling OpenMP to a GPU target

Displaying 20 results from an estimated 1000 matches similar to: "__sync_synchronize() crashes when compiling OpenMP to a GPU target"

2013 Feb 07
1
[LLVMdev] How to set isTarget bit for a complex intrinsic class in tblgen?
Dear all, Let's say there is some complex tblgen intrinsic definition, for example:
class MEMBAR<string StrOp, Intrinsic IntOP> : NVPTXInst<(outs), (ins), StrOp, [(IntOP)]>;
def INT_MEMBAR_CTA : MEMBAR<"membar.cta;", int_nvvm_membar_cta>;
def INT_MEMBAR_GL : MEMBAR<"membar.gl;", int_nvvm_membar_gl>;
def INT_MEMBAR_SYS :
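For context, the patterns above match calls to the corresponding NVVM intrinsics. A minimal IR-level sketch of such calls, assuming the usual int_nvvm_membar_* to llvm.nvvm.membar.* name mapping (illustrative only):

declare void @llvm.nvvm.membar.cta()
declare void @llvm.nvvm.membar.gl()
declare void @llvm.nvvm.membar.sys()

define void @fences() {
entry:
  call void @llvm.nvvm.membar.cta()   ; selected by INT_MEMBAR_CTA, emits "membar.cta;"
  call void @llvm.nvvm.membar.sys()   ; selected by INT_MEMBAR_SYS
  ret void
}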
2014 Sep 30
2
[LLVMdev] Behaviour of NVPTX intrinsic
I wrote test.ll as below and ran 'opt' on it as "opt -std-compile-opts test.ll -S -o -". But the output shows that there is code motion around the barrier intrinsics.
test.ll
-------
; ModuleID = 'test.bc'
define void @test(i16* %I_0, i16* %I_1, i16* %I_2, i16* %I_3, i16* %O_0) {
entry:
  %T_0 = load volatile i16* %I_0
  %T_1 = load volatile i16* %I_1
  %T_2 =
2015 Jan 13
2
[LLVMdev] Emitting IR in older formats (for NVVM)
Since SPIR can be (easily) transformed to NVVM IR, this helps a lot, at least for me. Thank you Tobias. -MH On January 12, 2015, Tobias Grosser <tgrosser at inf.ethz.ch> wrote: > On 12.01.2015 05:48, Jonathan Ragan-Kelley wrote: > > This question is specifically motivated by the practical constraints of > > NVVM, but I don't know anywhere better to ask (hopefully, e.g.,
2015 Jan 12
3
[LLVMdev] Emitting IR in older formats (for NVVM)
This question is specifically motivated by the practical constraints of NVVM, but I don't know anywhere better to ask (hopefully, e.g., @jholewinski is still following), and I believe it concerns general LLVM issues: NVIDIA's libNVVM is built on LLVM 3.2. This means its bitcode and LL text parsers are from that generation. Its interface calls for adding modules as either bitcode
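One concrete symptom of that generation gap is textual-IR syntax drift; for example (illustrative), the same load is spelled differently for a 3.2-era parser and a current one:

  %v = load i32* %p, align 4        ; accepted by LLVM 3.2-era parsers
  %v = load i32, i32* %p, align 4   ; current syntax, with an explicit result type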
2016 Mar 12
2
instrumenting device code with gpucc
Hey Jingyue, Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect anchor didn't go away; ptxas is still complaining about the duplicate definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse the nvvm-reflect pass? Thanks! yuanfeng On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote: > According to the examples you
2016 Jan 20
4
Executing OpenMP 4.0 code on Nvidia's GPU
Hi Arpith, That is exactly what it is :). My bad, I thought I had copied the libraries over to where LIBRARY_PATH points, but apparently they were copied to the wrong destination. Thanks a lot. On Wed, Jan 20, 2016 at 4:51 AM, Arpith C Jacob <acjacob at us.ibm.com> wrote: > Hi Ahmed, > > nvlink is unable to find the GPU OMP runtime library in its path. Does > LIBRARY_PATH point to
2007 Jul 09
0
[LLVMdev] Proposal for atomic and synchronization instructions
Chandler Carruth wrote: > Hello, > > After a fair amount of research and work, I have put together a > concrete proposal for LLVM representations of atomic operations and > synchronization constructs. These aim to provide the minimal > functionality in the IR for representing the hardware constructs that > threading libraries and parallel programming rely on. > >
2016 Mar 13
2
instrumenting device code with gpucc
Hey Jingyue, Thanks for being so responsive! I finally figured out a way to resolve the issue: all I have to do is to use `-only-needed` when merging the device bitcodes with llvm-link. However, since we actually need to instrument the host code as well, I encountered another issue when I tried to glue the instrumented host code and fatbin together. When I only instrumented the device code, I
2012 Nov 09
0
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Dear all, I'm attaching a patch that should fix the issue mentioned above. It simply applies the same check that is already made in the same file for global variables:
emitPTXAddressSpace(PTy->getAddressSpace(), O);
if (GVar->getAlignment() == 0)
  O << " .align " << (int) TD->getPrefTypeAlignment(ETy);
else
  O << " .align " <<
2012 Jul 11
2
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
Hello, FYI, this is a bug: http://llvm.org/bugs/show_bug.cgi?id=13324 When compiling the following code for sm_20, func params are for some reason given .align 0, which is invalid. The problem does not occur when compiling for sm_10.
> cat test.ll
; ModuleID = '__kernelgen_main_module'
target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple =
2016 Mar 15
2
instrumenting device code with gpucc
Hi Jingyue, Sorry to ask again, but how exactly could I glue the fatbin with the instrumented host code? Or does it mean we actually cannot instrument both the host & device code at the same time? Thanks! yuanfeng On Tue, Mar 15, 2016 at 10:09 AM, Jingyue Wu <jingyue at google.com> wrote: > Including fatbin into host code should be done in frontend. > > On Mon, Mar 14, 2016
2016 Jul 01
2
Missing TargetPrefix for NVVM intrinsics
Justins: I noticed that the intrinsics in IntrinsicsNVVM don't specify a TargetPrefix. This seems like a simple omission, so I was going to simply throw a `let TargetPrefix = "nvvm" ` block around them, but this doesn't quite work. There seem to be three prefixes that are used in this file. About 900 are int_nvvm_*, 30 are int_ptx_*, and 1 is int_cuda. It isn't clear to me
2020 Sep 23
2
Information about the number of indices in memory accesses
Hi all, For loads and stores I want to extract information about the number of indices accessed. For instance:
struct S { int X; int *Y; };
__global__ void kernel(int *A, int **B, struct S s) {
  int x = A[..][..]; // -> L: A[..][..]
  int y = *B[2];     // -> L: B[0][2]
  int z = s.Y[..];   // -> L: S.1[..]
  // etc..
}
I am performing some preprocessing on IR to: 1. Move constant
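A hedged sketch of what such accesses typically lower to in IR, which is what the index extraction has to walk (typed-pointer syntax; types and names are illustrative):

%struct.S = type { i32, i32* }

define i32 @example(i32* %A, %struct.S* %s, i64 %i, i64 %j) {
entry:
  %pa = getelementptr inbounds i32, i32* %A, i64 %i                    ; A[i]
  %x  = load i32, i32* %pa
  %py = getelementptr inbounds %struct.S, %struct.S* %s, i32 0, i32 1  ; &s->Y (field index 1)
  %Y  = load i32*, i32** %py
  %pz = getelementptr inbounds i32, i32* %Y, i64 %j                    ; s->Y[j]
  %z  = load i32, i32* %pz
  %r  = add i32 %x, %z
  ret i32 %r
}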
2015 Apr 08
5
[LLVMdev] CUDA front-end (CUDA to LLVM IR)
Hi, I wanted to ask whether there is an ongoing effort (or an already established tool) that makes it possible to convert CUDA kernels (that use CUDA-specific intrinsics, e.g., threadIdx.x, __syncthreads(), ...) to LLVM IR. I am aware that I can do this for OpenCL with the help of libclc, but I cannot find something similar for CUDA. Thanks
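For what it's worth, clang's CUDA support (the gpucc work discussed elsewhere in this archive) compiles device code to NVVM-flavoured IR; a rough, hand-written sketch of what a kernel using threadIdx.x and __syncthreads() lowers to (illustrative, not exact clang output):

target triple = "nvptx64-nvidia-cuda"

define void @kernel(float* %out) {
entry:
  %tid = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()   ; threadIdx.x
  call void @llvm.nvvm.barrier0()                     ; __syncthreads()
  %idx = zext i32 %tid to i64
  %p   = getelementptr inbounds float, float* %out, i64 %idx
  store float 0.000000e+00, float* %p
  ret void
}

declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()
declare void @llvm.nvvm.barrier0()

!nvvm.annotations = !{!0}
!0 = !{void (float*)* @kernel, !"kernel", i32 1}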
2014 Sep 30
2
[LLVMdev] Behaviour of NVPTX intrinsic
Is there any guarantee that the nvptx intrinsic "llvm.nvvm.barrier0" will not be moved around by opt? In other words, can I expect all the instructions above "llvm.nvvm.barrier0" to remain above it and those below it to remain below, after all the opt passes are run? If that is not the case, is there a way to define such an intrinsic? Thanks.
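Roughly speaking, what the passes may do hinges on the attributes attached to the intrinsic's declaration: a call with no memory attributes is assumed to read and write memory, so memory operations are not moved across it, while side-effect-free arithmetic still can be. A hedged sketch of such a declaration and call (attribute set illustrative; convergent in particular only exists in releases newer than the one discussed here):

declare void @llvm.nvvm.barrier0() #0

define void @f(i32* %p, i32* %q) {
entry:
  %v = load i32, i32* %p
  store i32 %v, i32* %q
  call void @llvm.nvvm.barrier0()   ; assumed to read/write memory: loads/stores stay on their side
  %w = load i32, i32* %p
  store i32 %w, i32* %q
  ret void
}

attributes #0 = { convergent nounwind }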
2012 May 02
0
[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend
Justin Holewinski wrote:
2007 Jul 09
2
[LLVMdev] Proposal for atomic and synchronization instructions
On 7/9/07, John Criswell <criswell at cs.uiuc.edu> wrote: > 1) You may want to consider adding atomic load-<bitwise operation>-store > instructions in addition to load-<add/subtract> instructions. The Linux > kernel uses these for atomic bit setting/clearing, and on many systems > they can be implemented more efficiently using special assembly > instructions.
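Today's IR expresses these as atomicrmw; a minimal sketch of an atomic bit set/clear in that form (for illustration only; the 2007 proposal predates this instruction):

define i32 @set_bits(i32* %word, i32 %mask) {
entry:
  %old = atomicrmw or i32* %word, i32 %mask seq_cst    ; atomically OR in %mask, yields the old value
  ret i32 %old
}

define i32 @clear_bits(i32* %word, i32 %mask) {
entry:
  %inv = xor i32 %mask, -1
  %old = atomicrmw and i32* %word, i32 %inv seq_cst    ; atomically AND with ~%mask
  ret i32 %old
}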
2015 Jan 13
2
[LLVMdev] Emitting IR in older formats (for NVVM)
Thanks, all. I didn’t realize a 7.0 RC was public and had moved to 3.4; I will go down that road for now, though I’ll probably also look into integrating variants of the SPIR converter in the future. Another possibility is to skip libnvvm altogether and use LLVM's NVPTX target. This is of course harder since you have to configure the passes yourself instead of just calling a few C
2012 May 01
2
[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend
> -----Original Message----- > From: Dan Bailey [mailto:dan at dneg.com] > Sent: Sunday, April 29, 2012 8:46 AM > To: Justin Holewinski > Cc: Jim Grosbach; llvm-commits at cs.uiuc.edu; Vinod Grover; > llvmdev at cs.uiuc.edu > Subject: Re: [llvm-commits] [PATCH][RFC] NVPTX Backend > > Justin, > > Firstly, this is great! It seems to be so much further forward in
2013 Apr 14
2
[LLVMdev] C++AMP -> OpenCL (NVPTX) prototype
After reading about Intel's 'Shevlin Park' project to implement C++AMP in llvm/clang, and failing to find any code for it, I decided to try to implement something similar. I did it as an excuse to explore and hack on llvm/clang, which I hadn't done before, but it's now at the point where it will run the simplest matrix multiplication sample from MSDN, so I thought I might