Displaying 20 results from an estimated 2000 matches similar to: "Information about the number of indices in memory accesses"
2020 Oct 03
2
Information about the number of indices in memory accesses
Hi Ees,
SCEV Delinearization is the closest thing I know of, but it has its problems.
For one, your expression must be SCEVable.
But more importantly, SCEV Delinearization is trying to deduce something
that is high-level (actually source-level) from a low-level IR in which a
lot of this info has been lost. So, since there's not a 1-1 mapping from
high-level code to LLVM IR, going backwards will
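To illustrate why going backwards is lossy, here is a minimal C sketch. It assumes a row-major `int A[10][20]` whose inner dimension is known. A flat element offset can be split back into (row, col), but an out-of-bounds subscript such as A[0][22] produces the same offset as A[1][2]. The helper names `linearize` and `delinearize` are hypothetical. This is not how SCEV delinearization is implemented; it only shows the arithmetic it has to invert.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 10
#define COLS 20 /* inner dimension of the hypothetical int A[ROWS][COLS] */

/* Row-major linearization: element offset of A[i][j]. */
static size_t linearize(size_t i, size_t j) {
    return i * COLS + j;
}

/* Recover (i, j) from a flat element offset. The answer is only the
   "original" subscript if that subscript satisfied 0 <= j < COLS;
   the offset itself does not record that bound. */
static void delinearize(size_t off, size_t *i, size_t *j) {
    assert(off < (size_t)ROWS * COLS);
    *i = off / COLS;
    *j = off % COLS;
}
```

For example, `linearize(0, 22)` and `linearize(1, 2)` both yield 22, so `delinearize(22, ...)` can only ever report (1, 2).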
2020 Oct 03
2
Information about the number of indices in memory accesses
Michael makes a great point about aliasing here and different indexing that
accesses the same element!
Another note: x = A[0][2] is fundamentally different depending on the type
of `A`. If, e.g., A was declared as int A[10][20], there's only _one_ load. A
is (and is treated as) a linear buffer,
and GEPs only pinpoint the specific position of A[0][2] in this buffer
(i.e. 0*20 + 2). But if A was
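For a statically sized `int A[10][20]`, the row stride is 20 elements, so A[i][j] sits at element offset i*20 + j of one contiguous buffer. The C sketch below checks this with byte-address arithmetic. The global `A` and the helper `element_offset` are illustrative assumptions. The GEP in the comment is written in current opaque-pointer syntax, not the syntax from the original thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global matching the example: 10 rows of 20 ints,
   laid out as one contiguous buffer of 200 ints. */
static int A[10][20];

/* Element offset of A[i][j] from the start of A, measured on byte
   addresses. This is the position a single GEP such as
     getelementptr [10 x [20 x i32]], ptr @A, i64 0, i64 i, i64 j
   pinpoints; reading the element then takes just one load. */
static size_t element_offset(size_t i, size_t j) {
    assert(i < 10 && j < 20);
    const char *base = (const char *)A;
    const char *elem = (const char *)&A[i][j];
    return (size_t)(elem - base) / sizeof(int);
}
```

`element_offset(0, 2)` is 2 and `element_offset(3, 5)` is 65. Casting through `char *` keeps the subtraction within a single object, which is well defined in C.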
2014 Jun 16
3
[LLVMdev] Attaching range metadata to IntrinsicInst
Hi,
The range metadata can only be attached to LoadInst for now. I am
considering extending its usage to IntrinsicInst so that the frontend can
annotate the range of the return value of an intrinsic call. e.g.,
%a = call i32 @llvm.xxx(), !range !0
!0 = metadata !{ i32 0, i32 1024 }
The motivation behind this extension is some optimizations we are working
on for CUDA programs. Some special
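For reference, a `!range` pair `{ i32 Lo, i32 Hi }` describes the half-open interval [Lo, Hi), so `!{ i32 0, i32 1024 }` states that the value is at most 1023. The C sketch below assumes a single non-wrapping pair with Lo < Hi. It checks membership and derives the high bits that a pass could then treat as known zero. The type and function names are hypothetical and are not LLVM's ValueTracking API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One !range pair {Lo, Hi}: the value lies in [Lo, Hi).
   This sketch assumes Lo < Hi (no wrap-around). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} Range32;

static bool range_contains(Range32 r, uint32_t v) {
    assert(r.lo < r.hi);
    return v >= r.lo && v < r.hi;
}

/* Bits that are zero in every value of [Lo, Hi): every bit above
   the highest set bit of the largest member, Hi - 1. */
static uint32_t range_known_zero(Range32 r) {
    assert(r.lo < r.hi);
    uint32_t max = r.hi - 1;
    uint32_t mask = 0; /* grow to the smallest all-ones mask >= max */
    while (mask < max)
        mask = (mask << 1) | 1u;
    return ~mask;
}
```

With `{0, 1024}`, 1023 is in range, 1024 is not, and the known-zero mask is 0xFFFFFC00 (bits 10 through 31).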
2013 Mar 11
0
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
I tried to manually assign each of 3 arrays a unique TBAA node. But it does
not seem to help: alias analysis still considers arrays as may-alias, which
most likely prevents the desired optimization. Below is the sample code
with TBAA metadata inserted. Could you please suggest what might be wrong
with it?
Many thanks,
- D.
marcusmae at M17xR4:~/forge/llvm$ opt -time-passes -enable-tbaa -tbaa
2014 Jun 17
5
[LLVMdev] Attaching range metadata to IntrinsicInst
Chandler Carruth wrote:
> This seems fine to me, but I'd like to make sure it looks OK to Nick as
> well.
I strongly prefer baking in knowledge about the intrinsics themselves
into the passes if possible. Metadata will always be secondary.
Separately, should value tracking use range metadata when it's
available? Absolutely.
I think it should apply to all CallInst not just
2014 Jun 17
4
[LLVMdev] Attaching range metadata to IntrinsicInst
On 17 June 2014 06:41, Eli Bendersky <eliben at google.com> wrote:
> On Tue, Jun 17, 2014 at 1:38 AM, Nick Lewycky <nicholas at mxc.ca> wrote:
>
>> Chandler Carruth wrote:
>>
>>> This seems fine to me, but I'd like to make sure it looks OK to Nick as
>>> well.
>>>
>>
>> I strongly prefer baking in knowledge about the
2014 Jun 17
2
[LLVMdev] Attaching range metadata to IntrinsicInst
Eh? How do you envision this?
-eric
On Tue, Jun 17, 2014 at 2:09 PM, Jingyue Wu <jingyue at google.com> wrote:
> Hi Nick,
>
> That makes sense. I think a main issue here is that the ranges of these PTX
> special registers (e.g., threadIdx.x) depend on -target-cpu which is only
> visible to clang and llc. Would you mind if we specify "target cpu" in the IR
> similar
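As a concrete case of a range that depends on the target CPU: on CUDA devices, the maximum x dimension of a thread block is 512 for compute capability 1.x and 1024 from 2.0 onward, so the sound `!range` for threadIdx.x differs by target. The helper below is a hypothetical sketch of what a frontend might compute. It assumes CPU strings of the form "sm_XY" and is not a clang API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Half-open range [lo, hi) for threadIdx.x, as a frontend might emit
   it in !range metadata. Hypothetical helper, not a clang API. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} TidRange;

/* Assumes cpu strings like "sm_13" or "sm_35". Maximum x block
   dimension: 512 on compute capability 1.x, 1024 from 2.0 on.
   Returns hi == 0 for strings it does not recognize. */
static TidRange tid_x_range(const char *cpu) {
    assert(cpu != NULL);
    TidRange r = {0, 0};
    if (strncmp(cpu, "sm_", 3) != 0)
        return r;
    long version = strtol(cpu + 3, NULL, 10); /* "sm_35" -> 35 */
    if (version <= 0)
        return r;
    r.hi = version < 20 ? 512u : 1024u;
    return r;
}
```

Under this sketch, "sm_13" yields [0, 512) and "sm_35" yields [0, 1024). This is why the metadata cannot be chosen without knowing the target CPU.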
2014 Jun 17
3
[LLVMdev] Attaching range metadata to IntrinsicInst
On Tue, Jun 17, 2014 at 2:33 PM, Jingyue Wu <jingyue at google.com> wrote:
> Hi Eric,
>
> In the IR, besides "target datalayout" and "target triple", we have a
> special "target cpu" string which is set by the Clang front-end according to
> its -target-cpu flag. We also write a Module::getTargetCPU() method to
> retrieve this string from the
2013 Mar 11
2
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
Dear all,
Attached notunrolled.ll is a module containing a reduction kernel. What I'm
trying to do is unroll it in such a way that the partial reduction over the
unrolled iterations is performed in a register, and the result is stored to
memory only once. Currently, LLVM's unroller together with all standard
optimizations produces code that stores the value to memory after every
unrolled iteration, which is
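The transformation being asked for can be written by hand in C as a sketch. Note that it swaps in C99 `restrict` as the no-alias promise instead of the TBAA metadata from the thread. The naive loop keeps the running sum in `*out`. If `out` may alias `in`, the compiler must store it on every iteration. The second version loads the accumulator once, unrolls by 4 while keeping the sum in a local, and stores once. It uses a single accumulator, so the additions happen in the same order and no floating-point reassociation is needed. Function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Naive form: the running sum lives in memory. Without a no-alias
   guarantee, every iteration must store *out. */
static void reduce_naive(const float *in, size_t n, float *out) {
    for (size_t i = 0; i < n; ++i)
        *out += in[i];
}

/* Register form: restrict promises in and out do not overlap, so the
   sum can stay in a local (register) and be stored exactly once. */
static void reduce_register(const float *restrict in, size_t n,
                            float *restrict out) {
    assert(out != NULL);
    float acc = *out;            /* load the accumulator once */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) { /* unrolled by 4 */
        acc += in[i];
        acc += in[i + 1];
        acc += in[i + 2];
        acc += in[i + 3];
    }
    for (; i < n; ++i)           /* remainder iterations */
        acc += in[i];
    *out = acc;                  /* single store */
}
```

Because the summation order is unchanged, both functions produce bit-identical results for the same input.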
2014 Sep 30
2
[LLVMdev] Behaviour of NVPTX intrinsic
I wrote test.ll as below and ran 'opt' on it as
" opt -std-compile-opts test.ll -S -o -" . But the output shows that there
is code motion around the barrier intrinsics.
test.ll
-------
; ModuleID = 'test.bc'
define void @test(i16* %I_0, i16* %I_1, i16* %I_2, i16* %I_3, i16* %O_0) {
entry:
%T_0 = load volatile i16* %I_0
%T_1 = load volatile i16* %I_1
%T_2 =
2013 Mar 01
1
[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU
The identifier INT_PTX_SREG_TID_X is the name of an instruction as the
back-end sees it, and has very little to do with the name you should use in
your IR. Your best bet is to look at the include/llvm/IR/IntrinsicsNVVM.td
file and see the definitions for each intrinsic. Then, the name mapping is
just:
int_foo_bar -> llvm.foo.bar()
int_ prefix becomes llvm., and all underscores turn into
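The mapping in the example (int_foo_bar becomes llvm.foo.bar) is TableGen's default naming rule for intrinsics without an explicit name. It is also how the record behind `llvm.nvvm.read.ptx.sreg.tid.x` gets its IR name. The C function below is a sketch of that rule only. The function name is hypothetical, and records that override their name are not handled.

```c
#include <assert.h>
#include <string.h>

/* Default mapping from a TableGen intrinsic record name to its IR
   name: drop "int_", prepend "llvm.", turn every '_' into '.'.
   Writes the result into buf; returns 0 on success, -1 otherwise. */
static int intrinsic_ir_name(const char *record, char *buf, size_t bufsize) {
    assert(record != NULL && buf != NULL);
    const char *prefix = "int_";
    size_t plen = strlen(prefix);
    if (strncmp(record, prefix, plen) != 0)
        return -1;
    const char *rest = record + plen;
    if (strlen("llvm.") + strlen(rest) + 1 > bufsize)
        return -1;
    strcpy(buf, "llvm.");
    char *dst = buf + strlen("llvm.");
    for (; *rest != '\0'; ++rest, ++dst)
        *dst = (*rest == '_') ? '.' : *rest;
    *dst = '\0';
    return 0;
}
```

For example, "int_nvvm_read_ptx_sreg_tid_x" maps to "llvm.nvvm.read.ptx.sreg.tid.x".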
2017 Jun 22
2
Legal names for Functions and other Identifiers
Thanks for the heads up Philip !
I did come across a strange case where LLVM allowed "%" to be a part of a
function's name. This was in the context of my patch
https://reviews.llvm.org/D33985, where I prefix the name of the source
function and the Scop (a special kind of Region that Polly can optimize;
the name of the Scop is the name of the Region) to the name of the PTX
kernel
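For the IR side of this question, the LLVM Language Reference gives the pattern for unquoted named values as `[%@][-a-zA-Z$._][-a-zA-Z$._0-9]*`. Any other name, including one containing "%", is still legal but must be written quoted, e.g. `@"foo%bar"`. The C sketch below checks that rule for a name given without its sigil. The function names are hypothetical, and the sketch says nothing about what is legal in the emitted PTX.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* True if c may appear unquoted in an LLVM IR identifier:
   [-a-zA-Z$._] anywhere, digits only after the first character.
   Assumes the default "C" locale for isalpha/isdigit. */
static bool ident_char(char c, bool first) {
    unsigned char u = (unsigned char)c;
    if (isalpha(u) || c == '-' || c == '$' || c == '.' || c == '_')
        return true;
    return !first && isdigit(u);
}

/* Does this name (without its % or @ sigil) have to be printed in
   quotes? Empty names are treated as needing quotes. */
static bool needs_quotes(const char *name) {
    assert(name != NULL);
    if (name[0] == '\0')
        return true;
    for (size_t i = 0; name[i] != '\0'; ++i)
        if (!ident_char(name[i], i == 0))
            return true;
    return false;
}
```

So `kernel_0` and `a.b-c$d` can be printed bare, while `foo%bar` and `0abc` need quotes.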
2013 Mar 01
0
[LLVMdev] NVPTX CUDA_ERROR_NO_BINARY_FOR_GPU
Hi Timothy,
I'm not sure what you mean by this working for other intrinsics, but
in this case, I think you want the intrinsic name
llvm.nvvm.read.ptx.sreg.tid.x.
For me, this looks like:
%x = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
Pete
On Fri, Mar 1, 2013 at 11:51 AM, Timothy Baldridge <tbaldridge at gmail.com> wrote:
> I'm building this with llvm-c, and accessing these
2014 Sep 30
2
[LLVMdev] Behaviour of NVPTX intrinsic
Is there any guarantee that the NVPTX intrinsic "llvm.nvvm.barrier0" will
not be moved around by opt ?
In other words, can I expect all the instructions above
"llvm.nvvm.barrier0" to remain above it and those below it to remain below,
after all the opt passes are run ?
If that is not the case, is there a way to define such an intrinsic ?
Thanks.
2015 Jan 24
2
[LLVMdev] Proposal: pragma for branch divergence
Hi, I am considering a language extension to Clang for optimizing GPU
programs. This extension will allow the compiler to use different
optimization strategies for divergent and non-divergent branches (to be
explained below). We have observed significant performance gains by
leveraging this proposed extension, so I want to discuss it here to see how
the community likes/dislikes the idea. I will
2015 Jan 24
2
[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence
In our experience, as Owen also suggests, a pragma or a language extension
can be avoided by a combination of static and dynamic analysis. We prefer
this approach in our compiler ;)
Regards,
Vinod
On Sat, Jan 24, 2015 at 12:09 AM, Owen Anderson <resistor at mac.com> wrote:
> Hi Jingyue,
>
> Have you considered using dynamic uniformity checks? In my experience you
> can
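A dynamic uniformity check asks at run time whether all active lanes of a warp agree on a branch condition. The host-side C simulation below sketches that test on a 32-bit lane mask. It assumes one bit per lane, with bit k set when lane k's condition is true. On a GPU, such a mask would come from a warp vote such as CUDA's `__ballot_sync`. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per lane of a 32-wide warp: bit k describes lane k. */
typedef uint32_t LaneMask;

/* A branch is uniform if every active lane takes the same side:
   either no active lane's condition is true, or all of them are. */
static bool branch_is_uniform(LaneMask cond, LaneMask active) {
    assert(active != 0);
    LaneMask taken = cond & active;
    return taken == 0 || taken == active;
}
```

If the check succeeds, the warp can run a specialized non-divergent path. Otherwise it falls back to the general divergent code, which is the trade-off the quoted reply proposes instead of a pragma.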
2016 Mar 12
2
instrumenting device code with gpucc
Hey Jingyue,
Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect
anchor didn't go away; ptxas is still complaining about the duplicate
definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse
the nvvm-reflect pass?
Thanks!
yuanfeng
On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote:
> According to the examples you
2016 Jul 01
2
Missing TargetPrefix for NVVM intrinsics
Justins:
I noticed that the intrinsics in IntrinsicsNVVM don't specify a
TargetPrefix. This seems like a simple omission, so I was going to
simply throw a `let TargetPrefix = "nvvm" ` block around them, but this
doesn't quite work.
There seem to be three prefixes that are used in this file. About 900
are int_nvvm_*, 30 are int_ptx_*, and 1 is int_cuda. It isn't clear to
me
2016 Mar 13
2
instrumenting device code with gpucc
Hey Jingyue,
Thanks for being so responsive! I finally figured out a way to resolve the
issue: all I have to do is to use `-only-needed` when merging the device
bitcodes with llvm-link.
However, since we actually need to instrument the host code as well, I
encountered another issue when I tried to glue the instrumented host code
and fatbin together. When I only instrumented the device code, I
2015 Jan 12
3
[LLVMdev] Emitting IR in older formats (for NVVM)
This question is specifically motivated by the practical constraints of
NVVM, but I don't know anywhere better to ask (hopefully, e.g.,
@jholewinski is still following), and I believe it concerns general LLVM
issues:
NVIDIA's libNVVM is built on LLVM 3.2. This means its bitcode and LL text
parsers are from that generation. Its interface calls for adding modules
as either bitcode