similar to: Information about the number of indices in memory accesses

Displaying 20 results from an estimated 2000 matches similar to: "Information about the number of indices in memory accesses"

2020 Oct 03
2
Information about the number of indices in memory accesses
Hi Ees, SCEV Delinearization is the closest I know. But it has its problems. Well for one your expression should be SCEVable. But more importantly, SCEV Delinearization is trying to deduce something that is high-level (actually source-level) from a low-level IR in which a lot of this info has been lost. So, since there's not a 1-1 mapping from high-level code to LLVM IR, going backwards will
2020 Oct 03
2
Information about the number of indices in memory accesses
Michael makes a great point about aliasing here and different indexing that accesses the same element! Another note: x = A[0][2] is fundamentally different depending on the type of `A`. If e.g. A was declared: int A[10][20], there's only _one_ load. A is a (and is treated as) a linear buffer, and GEPs only pinpoint the specific position of A[0][2] in this buffer (i.e. 0*10 + 2). But if A was
2014 Jun 16
3
[LLVMdev] Attaching range metadata to IntrinsicInst
Hi, The range metadata can only be attached to LoadInst for now. I am considering extending its usage to IntrinsicInst so that the frontend can annotate the range of the return value of an intrinsic call. e.g., %a = call i32 @llvm.xxx(), !range !0 !0 = metadata !{ i32 0, i23 1024 } The motivation behind this extension is some optimizations we are working on for CUDA programs. Some special
2013 Mar 11
0
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
I tried to manually assign each of 3 arrays a unique TBAA node. But it does not seem to help: alias analysis still considers arrays as may-alias, which most likely prevents the desired optimization. Below is the sample code with TBAA metadata inserted. Could you please suggest what might be wrong with it? Many thanks, - D. marcusmae at M17xR4:~/forge/llvm$ opt -time-passes -enable-tbaa -tbaa
2014 Jun 17
5
[LLVMdev] Attaching range metadata to IntrinsicInst
Chandler Carruth wrote: > This seems fine to me, but I'd like to make sure it looks OK to Nick as > well. I strongly prefer baking in knowledge about the intrinsics themselves into the passes if possible. Metadata will always be secondary. Separately, should value tracking look use range metadata when it's available? Absolutely. I think it should apply to all CallInst not just
2014 Jun 17
4
[LLVMdev] Attaching range metadata to IntrinsicInst
On 17 June 2014 06:41, Eli Bendersky <eliben at google.com> wrote: > On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky <nicholas at mxc.ca> wrote: > >> Chandler Carruth wrote: >> >>> This seems fine to me, but I'd like to make sure it looks OK to Nick as >>> well. >>> >> >> I strongly prefer baking in knowledge about the
2014 Jun 17
2
[LLVMdev] Attaching range metadata to IntrinsicInst
Eh? How do you envision this? -eric On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at google.com> wrote: > Hi Nick, > > That makes sense. I think a main issue here is that the ranges of these PTX > special registers (e.g., threadIdx.x) depend on -target-cpu which is only > visible to clang and llc. Would you mind we specify "target cpu" in the IR > similar
2014 Jun 17
3
[LLVMdev] Attaching range metadata to IntrinsicInst
On Tue, Jun 17, 2014 at 2:33 PM, Jingyue Wu <jingyue at google.com> wrote: > Hi Eric, > > In the IR, besides "target datalayout" and "target triple", we have a > special "target cpu" string which is set by the Clang front-end according to > its -target-cpu flag. We also write a Module::getTargetCPU() method to > retrieve this string from the
2013 Mar 11
2
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
Dear all, Attached notunrolled.ll is a module containing reduction kernel. What I'm trying to do is to unroll it in such way, that partial reduction on unrolled iterations would be performed on register, and then stored to memory only once. Currently llvm's unroller together with all standard optimizations produce code, which stores value to memory after every unrolled iteration, which is
2014 Sep 30
2
[LLVMdev] Behaviour of NVPTX intrinsic
I have written test.ll as below and ran 'opt' on it as " opt -std-compile-opts test.ll -S -o -" . But the output shows that there is code motion around the barrier intrinsics. test.ll ------- ; ModuleID = 'test.bc' define void @test(i16* %I_0, i16* %I_1, i16* %I_2, i16* %I_3, i16* %O_0) { entry: %T_0 = load volatile i16* %I_0 %T_1 = load volatile i16* %I_1 %T_2 =
2013 Mar 01
1
[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU
The identifier INT_PTX_SREG_TID_X is the name of an instruction as the back-end sees it, and has very little to do with the name you should use in your IR. Your best bet is to look at the include/llvm/IR/IntrinsicsNVVM.td file and see the definitions for each intrinsic. Then, the name mapping is just: int_foo_bar -> llvm.foo.bar() int_ prefix becomes llvm., and all underscores turn into
2017 Jun 22
2
Legal names for Functions and other Identifiers
Thanks for the heads up Philip ! I did come across a strange case where LLVM allowed "%" to be a part of a function's name. This was in the context of my patch https://reviews.llvm.org/D33985, where I prefix the name of the source function and the Scop ( A special kind of Region that Polly can optimize, the name of the Scop is the name of the Region ) to the name of the PTX kernel
2013 Mar 01
0
[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU
Hi Timothy, I'm not sure what you mean by this working for other intrinsics, but in this case, I think you want the intrinsic name llvm.nvvm.read.ptx.sreg.tid.x. For me, this looks like: %x = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() Pete On Fri, Mar 1, 2013 at 11:51 AM, Timothy Baldridge <tbaldridge at gmail.com> wrote: > I'm building this with llvm-c, and accessing these
2014 Sep 30
2
[LLVMdev] Behaviour of NVPTX intrinsic
is there any guarantee that the nvptx intrinsic "llvm.nvvm.barrier0" will not be moved around by opt ? In other words, can I expect all the instructions above "llvm.nvvm.barrier0" to remain above it and those below it to remain below, after all the opt passes are run ? If that is not the case, is there a way to define such an intrinsic ? Thanks. -------------- next part
2015 Jan 24
2
[LLVMdev] Proposal: pragma for branch divergence
*Hi, I am considering a language extension to Clang for optimizing GPU programs. This extension will allow the compiler to use different optimization strategies for divergent and non-divergent branches (to be explained below). We have observed significant performance gain by leveraging this proposed extension, so I want to discuss it here to see how the community likes/dislikes the idea. I will
2015 Jan 24
2
[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence
In our experience, as Owen also suggests, a pragma or a language extension can be avoided by a combination of static and dynamic analysis. We prefer this approach in our compiler ;) Regards, Vinod On Sat, Jan 24, 2015 at 12:09 AM, Owen Anderson <resistor at mac.com> wrote: > Hi Jingyue, > > Have you considered using dynamic uniformity checks? In my experience you > can
2016 Mar 12
2
instrumenting device code with gpucc
Hey Jingyue, Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect anchor didn't go away; ptxas is still complaining about the duplicate definition of of function '_ZL21__nvvm_reflect_anchorv' . Did I misused the nvvm-reflect pass? Thanks! yuanfeng On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote: > According to the examples you
2016 Jul 01
2
Missing TargetPrefix for NVVM intrinsics
Justins: I noticed that the intrinsics in IntrinsicsNVVM don't specify a TargetPrefix. This seems like a simple omission, so I was going to simply throw a `let TargetPrefix = "nvvm" ` block around them, but this doesn't quite work. There seem to be three prefixes that are used in this file. About 900 are int_nvvm_*, 30 are int_ptx_*, and 1 is int_cuda. It isn't clear to me
2016 Mar 13
2
instrumenting device code with gpucc
Hey Jingyue, Thanks for being so responsive! I finally figured out a way to resolve the issue: all I have to do is to use `-only-needed` when merging the device bitcodes with llvm-link. However, since we actually need to instrument the host code as well, I encountered another issue when I tried to glue the instrumented host code and fatbin together. When I only instrumented the device code, I
2015 Jan 12
3
[LLVMdev] Emitting IR in older formats (for NVVM)
This question is specifically motivated by the practical constraints of NVVM, but I don't know anywhere better to ask (hopefully, e.g., @jholewinski is still following), and I believe it concerns general LLVM issues: NVIDIA's libNVVM is built on LLVM 3.2. This means its bitcode and LL text parsers are from that generation. It's interface calls for adding modules as either bitcode