similar to: [LLVMdev] [NVPTX] Eliminate common sub-expressions in a group of similar GEPs

[LLVMdev] [NVPTX] Eliminate common sub-expressions in a group of similar GEPs

2014 Apr 21

Hi Hal, Thanks for your comments! I'm inlining my responses below. Jingyue On Sat, Apr 19, 2014 at 6:38 AM, Hal Finkel <hfinkel at anl.gov> wrote: > Jingyue, > > I can't speak for the NVPTX backend, but I think this looks useful as an > (optional) target-independent pass. A few thoughts: > > - Running GVN tends to be pretty expensive; have you tried EarlyCSE
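Here is a rough source-level picture of what such a pass exploits. Several accesses into the same array have addresses that share one variable part and differ only by constants. The C++ sketch below is illustrative only. It assumes a hypothetical row-major `float` array with `kCols = 64` columns and models the index arithmetic, not the actual pass or the GEP IR it rewrites.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical row length of a row-major 2-D array (assumption for illustration).
constexpr std::size_t kCols = 64;

// Naive: every access recomputes i * kCols + j, like four independent
// GEPs whose variable indices are materialized separately.
float sum_naive(const float* a, std::size_t i, std::size_t j) {
  return a[i * kCols + j] + a[i * kCols + (j + 1)] +
         a[(i + 1) * kCols + j] + a[(i + 1) * kCols + (j + 1)];
}

// CSE'd: the shared variable part is computed once; the accesses then
// differ only by the constant offsets 0, 1, kCols and kCols + 1.
float sum_cse(const float* a, std::size_t i, std::size_t j) {
  const float* base = a + i * kCols + j;
  return base[0] + base[1] + base[kCols] + base[kCols + 1];
}
```

Once `base` is computed, only constant offsets remain. A target can usually fold these into a register-plus-immediate address instead of re-deriving `i * kCols + j` for every access.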

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

Hi Justin, Is a compiler allowed to inline a function that calls __syncthreads? I saw nvcc do that, but I'm not sure it's valid. For example, void foo() { __syncthreads(); } if (threadIdx.x % 2 == 0) { ... foo(); } else { ... foo(); } Before inlining, all threads meet at one __syncthreads(). After inlining if (threadIdx.x % 2 == 0) { ... __syncthreads(); } else { ...

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

I'm using 7.0. I am attaching the reduced example. nvcc sync.cu -arch=sm_35 -ptx gives // .globl _Z3foov .visible .entry _Z3foov( ) { .reg .pred %p<2>; .reg .s32 %r<3>; mov.u32 %r1, %tid.x; and.b32 %r2, %r1, 1; setp.eq.b32 %p1, %r2, 1; @!%p1 bra BB7_2; bra.uni

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 16

Hi, The range metadata can only be attached to LoadInst for now. I am considering extending its usage to IntrinsicInst so that the frontend can annotate the range of the return value of an intrinsic call. e.g., %a = call i32 @llvm.xxx(), !range !0 !0 = metadata !{ i32 0, i32 1024 } The motivation behind this extension is some optimizations we are working on for CUDA programs. Some special
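Range metadata such as `!{ i32 0, i32 1024 }` describes a half-open interval [0, 1024). The following is a minimal sketch of how a consumer like value tracking might use such a range to fold a comparison. It assumes non-wrapping ranges (`lo < hi`). `ValueRange` and `fold_ult` are hypothetical names, not LLVM's `ConstantRange` API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of a !range annotation: the value lies in the
// half-open interval [lo, hi). Assumes lo < hi (no wrapped ranges).
struct ValueRange {
  std::uint32_t lo;
  std::uint32_t hi;  // exclusive
};

// Try to fold the unsigned comparison "x < c" using only x's range.
// Returns the constant result when the range decides it, else nullopt.
std::optional<bool> fold_ult(ValueRange r, std::uint32_t c) {
  if (r.hi <= c) return true;   // every value in [lo, hi) is below c
  if (r.lo >= c) return false;  // every value is at least c
  return std::nullopt;          // the range straddles c
}
```

If `threadIdx.x` is known to lie in [0, 1024), a check like `tid < 1024` folds to true, while `tid < 512` stays a run-time comparison.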

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 17

Chandler Carruth wrote: > This seems fine to me, but I'd like to make sure it looks OK to Nick as > well. I strongly prefer baking in knowledge about the intrinsics themselves into the passes if possible. Metadata will always be secondary. Separately, should value tracking use range metadata when it's available? Absolutely. I think it should apply to all CallInst not just

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 17

On 17 June 2014 06:41, Eli Bendersky <eliben at google.com> wrote: > On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky <nicholas at mxc.ca> wrote: > >> Chandler Carruth wrote: >> >>> This seems fine to me, but I'd like to make sure it looks OK to Nick as >>> well. >>> >> >> I strongly prefer baking in knowledge about the

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 17

On Tue, Jun 17, 2014 at 2:33 PM, Jingyue Wu <jingyue at google.com> wrote: > Hi Eric, > > In the IR, besides "target datalayout" and "target triple", we have a > special "target cpu" string which is set by the Clang front-end according to > its -target-cpu flag. We also write a Module::getTargetCPU() method to > retrieve this string from the

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 17

Eh? How do you envision this? -eric On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at google.com> wrote: > Hi Nick, > > That makes sense. I think a main issue here is that the ranges of these PTX > special registers (e.g., threadIdx.x) depend on -target-cpu which is only > visible to clang and llc. Would you mind if we specify "target cpu" in the IR > similar

[LLVMdev] Proposal: pragma for branch divergence

2015 Jan 24

Hi, I am considering a language extension to Clang for optimizing GPU programs. This extension will allow the compiler to use different optimization strategies for divergent and non-divergent branches (to be explained below). We have observed significant performance gain by leveraging this proposed extension, so I want to discuss it here to see how the community likes/dislikes the idea. I will

[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence

2015 Jan 24

In our experience, as Owen also suggests, a pragma or a language extension can be avoided by a combination of static and dynamic analysis. We prefer this approach in our compiler ;) Regards, Vinod On Sat, Jan 24, 2015 at 12:09 AM, Owen Anderson <resistor at mac.com> wrote: > Hi Jingyue, > > Have you considered using dynamic uniformity checks? In my experience you > can
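A dynamic uniformity check asks at run time whether every lane of a warp agrees on a branch condition. When they all agree, execution can take a non-divergent code path. The sketch models that check on the host, using a 32-lane array of predicates. The names are hypothetical. On a GPU the vote would instead use a warp-vote primitive such as CUDA's `__all_sync` / `__any_sync`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;
using LaneMask = std::array<bool, kWarpSize>;  // one predicate per lane

enum class Uniformity { AllTrue, AllFalse, Divergent };

// Host-side model of a warp vote: classify a per-lane branch predicate.
Uniformity classify(const LaneMask& pred) {
  bool any = false, all = true;
  for (bool p : pred) {
    any = any || p;
    all = all && p;
  }
  if (all) return Uniformity::AllTrue;
  if (!any) return Uniformity::AllFalse;
  return Uniformity::Divergent;
}

// A uniform predicate lets the warp run a single side of the branch;
// a divergent one forces both sides to execute serially.
int sides_executed(const LaneMask& pred) {
  return classify(pred) == Uniformity::Divergent ? 2 : 1;
}
```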

how to run linear regression models at once

2011 Jan 07

hey, folks, I have two very simple questions. I am not quite sure if I can do this using R. so, I am analyzing a large data frame with thousands of variables. For example: Dependent variables: d1, d2, d3 (i.e., there are three dependent variables) Independent variables: s1, s2, s3, ......s1000 (i.e., there are 1000 independent variables) now I want to run simple linear regression analyses of
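One straightforward way to run many simple regressions is a loop that fits the response against each predictor in turn. In R, this is typically one `lm()` call per column. The same idea is shown below as a self-contained C++ sketch. It uses the closed-form least-squares estimates: slope = S_xy / S_xx, and intercept = mean(y) - slope * mean(x). The function names are illustrative, and the sketch assumes each predictor column has nonzero variance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Fit {
  double intercept;
  double slope;
};

// Ordinary least squares for y = intercept + slope * x (one predictor).
Fit simple_ols(const std::vector<double>& x, const std::vector<double>& y) {
  const std::size_t n = x.size();
  double mx = 0, my = 0;
  for (std::size_t k = 0; k < n; ++k) { mx += x[k]; my += y[k]; }
  mx /= n;
  my /= n;
  double sxy = 0, sxx = 0;
  for (std::size_t k = 0; k < n; ++k) {
    sxy += (x[k] - mx) * (y[k] - my);
    sxx += (x[k] - mx) * (x[k] - mx);
  }
  const double b = sxy / sxx;  // assumes sxx != 0
  return {my - b * mx, b};
}

// Fit one response against every predictor column separately.
std::vector<Fit> fit_each(const std::vector<std::vector<double>>& predictors,
                          const std::vector<double>& y) {
  std::vector<Fit> fits;
  fits.reserve(predictors.size());
  for (const auto& x : predictors) fits.push_back(simple_ols(x, y));
  return fits;
}
```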

LLVM backend -- Avoid base+index address mode for X86

2016 Oct 17

Hi All, I have a question regarding the LLVM backend. I would appreciate it a lot if anyone could provide some hints. My work here is to avoid the base+index address mode for the X86 target, allowing only base-register-only or index-register-only address modes. For example, "mov (%rsi), %rbx" is allowed, but "mov (%rsi, %rax), %rbx" is not allowed. I understand LLVM backend is a complex system. Can any
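One way to meet that constraint is sketched here on a toy operand model, not inside LLVM's X86 instruction selector. When an address has both a base and an index, compute `base + index * scale` into a scratch register, then use a base-only operand. `MemOperand`, `lower_load`, and the `%tmp` placeholder are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy model of an x86 memory operand: disp(base, index, scale).
struct MemOperand {
  std::optional<std::string> base;
  std::optional<std::string> index;
  int scale = 1;  // 1, 2, 4 or 8
  int disp = 0;
};

// log2 of a legal x86 scale factor.
int log2_scale(int s) { return s == 8 ? 3 : s == 4 ? 2 : s == 2 ? 1 : 0; }

// Lower "mov disp(base,index,scale), %dst" so that no instruction uses
// base+index addressing. The address is built in "%tmp" (a placeholder for
// a fresh virtual register) and then used as a base-only operand.
// Output is AT&T-style pseudo-assembly.
std::vector<std::string> lower_load(const MemOperand& m, const std::string& dst) {
  std::vector<std::string> out;
  const std::string d = m.disp != 0 ? std::to_string(m.disp) : "";
  if (m.base && m.index) {
    out.push_back("mov %" + *m.index + ", %tmp");
    if (m.scale > 1)
      out.push_back("shl $" + std::to_string(log2_scale(m.scale)) + ", %tmp");
    out.push_back("add %" + *m.base + ", %tmp");
    out.push_back("mov " + d + "(%tmp), %" + dst);
  } else if (m.base) {
    // Base-only: already allowed, emit unchanged.
    out.push_back("mov " + d + "(%" + *m.base + "), %" + dst);
  } else if (m.index) {
    // Index-only: also allowed, emit unchanged.
    out.push_back("mov " + d + "(,%" + *m.index + "," +
                  std::to_string(m.scale) + "), %" + dst);
  }
  return out;
}
```

Note that `shl` and `add` clobber EFLAGS, while the original `mov` does not. A real lowering therefore has to ensure that no live flags are overwritten. `lea` does not avoid the problem, because `lea (%rsi,%rax), ...` itself uses base+index addressing.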

[RFC] design doc for straight-line scalar optimizations

2015 Aug 25

Hi Escha, We certainly would love to generalize them as long as the performance doesn't suffer in general. If you have specific use cases that are regressed due to these optimizations, I am more than happy to take a look. On Mon, Aug 24, 2015 at 6:43 PM, escha <escha at apple.com> wrote: > > On Aug 24, 2015, at 11:10 AM, Jingyue Wu via llvm-dev < > llvm-dev at

Information about the number of indices in memory accesses

2020 Sep 23

Hi all, For loads and stores I want to extract information about the number of indices accessed. For instance: struct S {int X; int *Y;}; __global__ void kernel(int *A, int **B, struct S) { int x = A[..][..]; // -> L: A[..][..] int y = *B[2]; // -> L: B[0][2] int z = S.y[..]; // -> L: S.1[..] // etc.. } I am performing some preprocessing on IR to: 1. Move constant

[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence

2015 Jan 25

Hi Owen and Vinod, Thanks for sharing the paper! I like the idea a lot. Regarding the paper itself, Vinod, are the consensual branches (e.g., cbranch.ifnone) you mentioned in the paper publicly available in PTX ISA? Owen, could you explain more on the approach of using branch-if-none instructions in your mind? I believe you have lots of great insights, but I don't see how cbranch.ifnone

Information about the number of indices in memory accesses

2020 Oct 03

Hi Ees, SCEV Delinearization is the closest I know. But it has its problems. Well for one your expression should be SCEVable. But more importantly, SCEV Delinearization is trying to deduce something that is high-level (actually source-level) from a low-level IR in which a lot of this info has been lost. So, since there's not a 1-1 mapping from high-level code to LLVM IR, going backwards will
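Here is a toy illustration of why delinearization needs extra information. A row-major offset `o = i * n + j` maps back to a unique `(i, j)` only under two conditions: the inner extent `n` must be known, and `0 <= j < n` must be assumed. Without these, several subscript pairs produce the same offset. The names are hypothetical, and the sketch assumes `o >= 0` and `n > 0`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Recover (i, j) from a row-major offset o = i * n + j, assuming the inner
// extent n is known and the inner subscript is in bounds (0 <= j < n).
std::pair<std::int64_t, std::int64_t> delinearize(std::int64_t o, std::int64_t n) {
  return {o / n, o % n};
}

// Without the in-bounds assumption on j, many subscript pairs give the
// same offset: enumerate all of them with i >= 0 and j >= 0.
std::vector<std::pair<std::int64_t, std::int64_t>> candidates(std::int64_t o,
                                                              std::int64_t n) {
  std::vector<std::pair<std::int64_t, std::int64_t>> out;
  for (std::int64_t i = 0; i * n <= o; ++i) out.emplace_back(i, o - i * n);
  return out;
}
```

For `o = 23` and `n = 10`, the in-bounds answer is `(2, 3)`. However, `(0, 23)` and `(1, 13)` describe the same address equally well.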

[LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution

2015 Jun 30

Hi Adam, Indvar widening can sometimes be harmful for architectures (e.g. NVPTX and AMDGPU) where wider integer operations are more expensive ( https://llvm.org/bugs/show_bug.cgi?id=21148). For this reason, we disabled indvar widening in NVPTX in http://reviews.llvm.org/D6196. Hope it helps. Jingyue On Mon, Jun 29, 2015 at 11:59 AM Adam Nemet <anemet at apple.com> wrote: > > >
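Indvar widening rewrites `sext(i + 1)` as `sext(i) + 1`. This rewrite is only valid when the 32-bit add cannot signed-overflow, and the `nsw` flag is what guarantees that. The sketch checks the identity near the overflow boundary. It models 32-bit wraparound with unsigned arithmetic, because signed overflow is undefined in C++. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit add with two's-complement wraparound (the behavior of an LLVM
// `add` without nsw). Done in unsigned arithmetic to avoid C++ signed-
// overflow UB; the final narrowing cast is modular on mainstream compilers
// and guaranteed modular since C++20.
std::int32_t wrap_add(std::int32_t a, std::int32_t b) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                   static_cast<std::uint32_t>(b));
}

// Does sext(a + b) == sext(a) + sext(b)? This holds exactly when the
// 32-bit add does not signed-overflow, i.e. when it could carry `nsw`.
bool widening_is_exact(std::int32_t a, std::int32_t b) {
  const std::int64_t narrow_then_extend = wrap_add(a, b);
  const std::int64_t extend_then_add = static_cast<std::int64_t>(a) + b;
  return narrow_then_extend == extend_then_add;
}
```

When the identity holds, the widened loop performs its induction arithmetic in 64 bits. On targets like NVPTX, that extra cost is what motivated disabling widening.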

[RFC] design doc for straight-line scalar optimizations

2015 Aug 24

Hi, As you may have noticed, since last year, we (Google's CUDA compiler team) have contributed quite a lot to the effort of optimizing LLVM for CUDA programs. I think it's worthwhile to write some docs to wrap them up for two reasons. 1) Whoever wants to understand or work on these optimizations has some detailed docs instead of just source code to refer to. 2) RFC on how to improve
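One straight-line scalar optimization that LLVM provides is straight-line strength reduction (the `StraightLineStrengthReduce` pass). Within straight-line code, `b * (i + 1)` can be rewritten as `b * i + b` once `b * i` is available. The C++ sketch shows only the before/after arithmetic on a hypothetical triple of products; it is not the pass's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Triple = std::array<std::int64_t, 3>;

// Before: each product b * (i + k) needs its own multiply.
Triple products_naive(std::int64_t b, std::int64_t i) {
  return {b * i, b * (i + 1), b * (i + 2)};
}

// After strength reduction: since b * (i + 1) == b * i + b, only the first
// product needs a multiply; the others add the stride b to a prior result.
Triple products_reduced(std::int64_t b, std::int64_t i) {
  const std::int64_t x0 = b * i;
  const std::int64_t x1 = x0 + b;
  const std::int64_t x2 = x1 + b;
  return {x0, x1, x2};
}
```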

[RFC] BasicAA considers address spaces?

2015 Aug 07

+ the new llvm-dev On Fri, Aug 7, 2015 at 11:30 AM, Jingyue Wu <jingyue at google.com> wrote: > Hi folks, > > Unsurprisingly, leveraging the fact that certain address spaces don't > alias can significantly improve alias analysis precision and enhance > (observably 2x performance gain) load/store optimizations such as LICM and > DSE. > > This sounds to me an
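The idea can be sketched as a target-specific rule. Two pointers in different specific address spaces cannot alias, while a generic pointer may point into any of them. The numbering below follows NVPTX (generic = 0, global = 1, shared = 3, const = 4, local = 5). The enum and function names are hypothetical. Disjointness of address spaces is a per-target property; BasicAA cannot assume it for every target.

```cpp
#include <cassert>

// Hypothetical result type (not LLVM's AliasResult).
enum class AliasResult { NoAlias, MayAlias };

// NVPTX address-space numbers.
enum AddrSpace : unsigned { Generic = 0, Global = 1, Shared = 3, Const = 4, Local = 5 };

// Alias query that looks only at address spaces, assuming the target's
// specific address spaces are pairwise disjoint.
AliasResult alias_by_addrspace(unsigned as1, unsigned as2) {
  // A generic pointer may point into any space: nothing can be concluded.
  if (as1 == Generic || as2 == Generic) return AliasResult::MayAlias;
  // Distinct specific spaces never overlap.
  if (as1 != as2) return AliasResult::NoAlias;
  // Same space: other alias-analysis reasoning is needed.
  return AliasResult::MayAlias;
}
```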

[LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution

2015 Jul 01

----- Original Message ----- > From: "Bjarke Roune" <broune at google.com> > To: "Jingyue Wu" <jingyue at google.com> > Cc: llvmdev at cs.uiuc.edu > Sent: Tuesday, June 30, 2015 8:16:13 PM > Subject: Re: [LLVMdev] Deriving undefined behavior from nsw/inbounds/poison for scalar evolution > > Hi Adam, > > Jingyue is right. We need to keep