thr3ads.net - similar to: "[RFC] design doc for straight-line scalar optimizations"

Displaying 20 results from an estimated 400 matches similar to: "[RFC] design doc for straight-line scalar optimizations"

[RFC] design doc for straight-line scalar optimizations

2015 Aug 25

Hi Escha, We certainly would love to generalize them as long as the performance doesn't suffer in general. If you have specific use cases that are regressed due to these optimizations, I am more than happy to take a look. On Mon, Aug 24, 2015 at 6:43 PM, escha <escha at apple.com> wrote: > > On Aug 24, 2015, at 11:10 AM, Jingyue Wu via llvm-dev < > llvm-dev at

[LLVMdev] [LSR] hoisting loop invariants in reverse order

2015 May 09

Hi, I was tracking down a performance regression and noticed that LoopStrengthReduce hoists loop invariants (e.g., the initial formulae of indvars) in the reverse order of how they appear in the loop. This reverse order causes trouble for the StraightLineStrengthReduce pass I recently added. While I understand ultimately SLSR should be able to sort independent candidates in an optimal order,

[LLVMdev] [LSR] hoisting loop invariants in reverse order

2015 May 18

It's not caused by "the insertion point is set to the default after". I should mention the reason somewhere earlier. "Reversing the order of arg0~3 is not intentional. The user list of pixel_idx happens to have pixel_idx+3, pixel_idx+2, and pixel_idx+1 in this order, so LSR simply follows this order when collecting the LSRFixups." I'm not an expert on uselist orders,

[LLVMdev] Problems in installing LNT

2014 Aug 09

I got Python 2.7.3. Sounds right? On Fri Aug 08 2014 at 4:45:01 PM Yi Kong <kongy.dev at gmail.com> wrote: > Hi Jingyue, > > I've never seen this error before. It looks like something to do with > virtualenv. > > What do you get by running `~/mysandbox/bin/python --version`? > > -Yi > > On 8 August 2014 23:48, Jingyue Wu <jingyue at google.com>

instrumenting device code with gpucc

2016 Mar 15

Hi Jingyue, Sorry to ask again, but how exactly could I glue the fatbin with the instrumented host code? Or does it mean we actually cannot instrument both the host & device code at the same time? Thanks! yuanfeng On Tue, Mar 15, 2016 at 10:09 AM, Jingyue Wu <jingyue at google.com> wrote: > Including fatbin into host code should be done in frontend. > > On Mon, Mar 14, 2016

instrumenting device code with gpucc

2016 Mar 12

Hey Jingyue, Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect anchor didn't go away; ptxas is still complaining about the duplicate definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse the nvvm-reflect pass? Thanks! yuanfeng On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote: > According to the examples you

instrumenting device code with gpucc

2016 Mar 13

Hey Jingyue, Thanks for being so responsive! I finally figured out a way to resolve the issue: all I have to do is to use `-only-needed` when merging the device bitcodes with llvm-link. However, since we actually need to instrument the host code as well, I encountered another issue when I tried to glue the instrumented host code and fatbin together. When I only instrumented the device code, I

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 17

On Tue, Jun 17, 2014 at 2:33 PM, Jingyue Wu <jingyue at google.com> wrote: > Hi Eric, > > In the IR, besides "target datalayout" and "target triple", we have a > special "target cpu" string which is set by the Clang front-end according to > its -target-cpu flag. We also write a Module::getTargetCPU() method to > retrieve this string from the

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

I'm using 7.0. I am attaching the reduced example. nvcc sync.cu -arch=sm_35 -ptx gives // .globl _Z3foov .visible .entry _Z3foov( ) { .reg .pred %p<2>; .reg .s32 %r<3>; mov.u32 %r1, %tid.x; and.b32 %r2, %r1, 1; setp.eq.b32 %p1, %r2, 1; @!%p1 bra BB7_2; bra.uni

[LLVMdev] Problems in installing LNT

2014 Aug 08

Hi, I followed the instructions on http://llvm.org/docs/lnt/quickstart.html, and got stuck on installation Step 4. ~/mysandbox/bin/python ~/lnt/setup.py develop said running develop error: None and returned error code 1. The lnt binary wasn't installed anywhere. Any clue? Thanks much, Jingyue -------------- next part -------------- An HTML attachment was scrubbed... URL:

instrumenting device code with gpucc

2016 Mar 05

On Fri, Mar 4, 2016 at 5:50 PM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com> wrote: > Hi Jingyue, > > My name is Yuanfeng Peng, I'm a PhD student at UPenn. I'm sorry to bother > you, but I'm having trouble with gpucc in my project, and I would be really > grateful for your help! > > Currently we're trying to instrument CUDA code using LLVM 3.9, and

[RFC] BasicAA considers address spaces?

2015 Aug 07

+ the new llvm-dev On Fri, Aug 7, 2015 at 11:30 AM, Jingyue Wu <jingyue at google.com> wrote: > Hi folks, > > Unsurprisingly, leveraging the fact that certain address spaces don't > alias can significantly improve alias analysis precision and enhance > (observably 2x performance gain) load/store optimizations such as LICM and > DSE. > > This sounds to me an

[LLVMdev] Attaching range metadata to IntrinsicInst

2014 Jun 17

Eh? How do you envision this? -eric On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at google.com> wrote: > Hi Nick, > > That makes sense. I think a main issue here is that the ranges of these PTX > special registers (e.g., threadIdx.x) depend on -target-cpu which is only > visible to clang and llc. Would you mind we specify "target cpu" in the IR > similar

[RFC] BasicAA considers address spaces?

2015 Aug 07

On Fri, Aug 7, 2015 at 12:01 PM, Matt Arsenault <Matthew.Arsenault at amd.com> wrote: > On 08/07/2015 11:35 AM, Jingyue Wu wrote: > > + the new llvm-dev > > On Fri, Aug 7, 2015 at 11:30 AM, Jingyue Wu <jingyue at google.com> wrote: > >> Hi folks, >> >> Unsurprisingly, leveraging the fact that certain address spaces don't >> alias can

[CUDA/NVPTX] is inlining __syncthreads allowed?

2015 Aug 21

Hi Justin, Is a compiler allowed to inline a function that calls __syncthreads? I saw nvcc does that, but not sure it's valid though. For example, void foo() { __syncthreads(); } if (threadIdx.x % 2 == 0) { ... foo(); } else { ... foo(); } Before inlining, all threads meet at one __syncthreads(). After inlining if (threadIdx.x % 2 == 0) { ... __syncthreads(); } else { ...

[LLVMdev] Instruction Cost

2015 Jan 15

CostModule::getInstructionCost also consults TTI ( http://llvm.org/docs/doxygen/html/CostModel_8cpp_source.html#l00380). No? Jingyue On Wed, Jan 14, 2015 at 4:05 PM, Chandler Carruth <chandlerc at google.com> wrote: > > On Wed, Jan 14, 2015 at 3:54 PM, Jingyue Wu <jingyue at google.com> wrote: > >> I'm looking for APIs that compute instruction costs, and noticed

instrumenting device code with gpucc

2016 Mar 10

It's hard to tell what is wrong without a concrete example. E.g., what is the program you are instrumenting? What is the definition of the hook function? How did you link that definition with the binary? One thing suspicious to me is that you may have linked the definition of _Cool_MemRead_Hook as a host function instead of a device function. AFAIK, PTX assembly cannot be linked. So, if you

[LLVMdev] Failed to Unroll a Seemingly Simple Loop

2014 Jun 22

Hi, I found LLVM cannot unroll the loop in the example below, while gcc can. Before I dig further into this issue, is this behavior as designed? bool bar(int i); void foo(int *a, int x, int y) { for (int i = 0; i < 4; ++i) { if (bar(i)) { break; } a[i] = i; } } Btw, if s/break/continue, LLVM is able to unroll it. Thanks, Jingyue -------------- next part --------------
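To make the question in this post concrete, here is a minimal C++ sketch of what a fully unrolled version of that loop would look like when written by hand. It is not output from LLVM or gcc. The body of `bar`, the `stop_at` variable, and the names `foo_unrolled` and `check_equivalence` are assumptions added for illustration; the post only declares `bar(int)`.

```cpp
#include <cassert>

// Hypothetical stand-in: the post only declares bar(int). This definition
// (exit when i reaches a configurable index) is an assumption.
static int stop_at = 2;
bool bar(int i) { return i == stop_at; }

// The loop as posted; x and y are unused there as well.
void foo(int *a, int x, int y) {
  (void)x;
  (void)y;
  for (int i = 0; i < 4; ++i) {
    if (bar(i)) {
      break;
    }
    a[i] = i;
  }
}

// Hand-unrolled equivalent: the trip count is at most 4, so each iteration
// becomes straight-line code and every `break` becomes an early return.
void foo_unrolled(int *a) {
  if (bar(0)) return;
  a[0] = 0;
  if (bar(1)) return;
  a[1] = 1;
  if (bar(2)) return;
  a[2] = 2;
  if (bar(3)) return;
  a[3] = 3;
}

// Compares both versions for every possible exit point (4 = never exit).
bool check_equivalence() {
  for (int s = 0; s <= 4; ++s) {
    stop_at = s;
    int lhs[4] = {-1, -1, -1, -1};
    int rhs[4] = {-1, -1, -1, -1};
    foo(lhs, 0, 0);
    foo_unrolled(rhs);
    for (int i = 0; i < 4; ++i) {
      assert(lhs[i] == rhs[i]);
    }
  }
  return true;
}
```

Calling `check_equivalence()` runs both versions for each possible early-exit index and asserts that they write the same elements. The early exit adds a second way out of the loop, which is the feature the post contrasts with the `continue` variant.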

[LLVMdev] Best way for JIT to query whether llvm.fma.* is fast?

2014 Dec 10

Thanks! That’s probably close enough for practical purposes. I looked at the overrides on various targets, and they all return true if the FMA hardware exists. - Arch From: Jingyue Wu [mailto:jingyue at google.com] Sent: Wednesday, December 10, 2014 2:56 PM To: Robison, Arch Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Best way for JIT to query whether llvm.fma.* is fast? Does

[LLVMdev] Problems in installing LNT

2014 Aug 11

Thanks for your help! After I installed the sandbox to /tmp/mysandbox instead of ~/mysandbox, everything started working. It still looks weird though. My home folder is not symlinked, by the way. Jingyue On Sat Aug 09 2014 at 10:59:49 AM Renato Golin <renato.golin at linaro.org> wrote: > On 9 August 2014 02:56, Jingyue Wu <jingyue at google.com> wrote: > > I got > >
