thr3ads.net - llvm dev - [llvm-dev] Information about the number of indices in memory accesses [Oct 2020]

If this information is useful, please help other people find it:
Share via:

Stefanos Baziotis via llvm-dev

2020-Oct-03 10:43 UTC

[llvm-dev] Information about the number of indices in memory accesses

Michael makes a great point about aliasing here and different indexing that
accesses the same element!

Another note: x = A[0][2] is fundamentally different depending on the type
of `A`. If e.g. A was declared: int A[10][20], there's only _one_ load. A
is a (and is treated as) a linear buffer,
and GEPs only pinpoint the specific position of A[0][2] in this buffer
(i.e. 0*10 + 2). But if A was e.g. this: int **A, there _two_ loads. One
load to get the "pointer of the zeroth row" and another load to get
the
2nd element off of that row.

So, you see, all these things make any deduction method very very imprecise.

Best,
Stefanos

Στις Σάβ, 3 Οκτ 2020 στις 5:13 π.μ., ο/η Michael Kruse <
llvmdev at meinersbur.de> έγραψε:
> As Stefanos mentioned, LLVM-IR is generally too low-level for this. C/C++
> semantics also don't make it easier due to possible aliasing.
>
> The 3 cases are very different to analyze.
>
> 1. A[..][..] is syntactically invalid. A is a single pointer.
> 2. *B[2] Is not equivalent to  B[0][2], but to B[2][0]. This
> jagged/indirect/pointer-chasing arrays accesses are expressed as chains of
> getelementptr and load instructions. However, they cannot be usefully
> optimized because pointers on each level can actually point to the same
> thing. Eg:
>
> int data[] = { 41, 42, 43 };
> B[0] = data;
> B[1] = data;
> B[2] = data;
>
> therefore
>
> B[0][1] == B[1][1] == B[2][1] == 42;
>
> 3. S.y[..] can be one or multiple getelementptr instructions and is the
> most analyzable since all indices are static at compile-time. The number of
> subscripts and their values can directly be taken from the getelementptr
> instruction(s). The pointers-to-int can still alias with other ints in the
> program.
>
> 4. These case of VLA or manually linearized array:
>
> int *C = alloca(sizeof(int) * m * n);
> C[x + m*y];
>
> This is intended to have two dimensions, but appear as just one in the
> LLVM-IR. The following (C99 VLA) is compiled to approximately the same
> LLVM-IR
>
> int C[n][m];
> C[y][x];
>
> Delinearization as mentioned by Stefanos tries to recover the two
> subscripts x and y, but can only do so heuristically. Also keep in mind
> that
> C[1][-1] appears as the same IR as C[0][m-1], so there is no unique way to
> delinerarize. In particular, one cannot just assume that if all indices are
> different, that the memory locations being accessed are different (again, a
> pointer aliasing problem)
>
> Michael
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Am Fr., 2. Okt. 2020 um 19:25 Uhr schrieb Stefanos Baziotis via llvm-dev
<
> llvm-dev at lists.llvm.org>:
>
>> Hi Ees,
>>
>> SCEV Delinearization is the closest I know. But it has its problems.
Well
>> for one your expression should be SCEVable.
>>
>> But more importantly, SCEV Delinearization is trying to deduce
something
>> that is high-level (actually source-level) from a low-level IR in which
a
>> lot of this info has been lost. So, since there's not a 1-1 mapping
from
>> high-level code to LLVM IR, going backwards will always be imperfect.
>>
>> And so since you are too trying to deduce a high-level thing, I believe
>> that any solution will be imperfect.
>>
>> Best,
>> Stefanos
>>
>> On Sat, Oct 3, 2020, 02:55 Ees via llvm-dev <llvm-dev at
lists.llvm.org>
>> wrote:
>>
>>> Anyone? I'd really appreciate any hints to look up as i'm
somewhat stuck
>>> with this.
>>>
>>> Cheers.
>>>
>>> On 23-09-2020 12:27, Ees wrote:
>>> > Hi all,
>>> >
>>> > For loads and stores i want to extract information about the
number of
>>> > indices accessed. For instance:
>>> >
>>> > struct S {int X, int *Y};
>>> >
>>> > __global__ void kernel(int *A, int **B, struct S) {
>>> >   int x = A[..][..]; // -> L: A[..][..]
>>> >   int y = *B[2];   // -> L: B[0][2]
>>> >   int z = S.y[..];  // -> L: S.1[..]
>>> >
>>> >   // etc..
>>> > }
>>> >
>>> > I am performing some preprocessing on IR to:
>>> > 1. Move constant inline GEPs into instructions
>>> > 2. For loads and stores without a GEP operand, explicitly
create a
>>> > (trivial) GEP with index 0
>>> >
>>> > So now the operand of every load and store is a GEP
instruction.
>>> >
>>> > For simple stuff i am getting the right answer but when the
index
>>> > expression becomes more complex multiple GEPs are introduced.
For
>>> > instance:
>>> >
>>> > *(A+2*(blockDim.x*blockIdx.x+threadIdx.x+1)+2+3) = 5;
>>> >
>>> > produces:
>>> >
>>> >   %6 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
>>> >   %7 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
>>> >   %8 = mul i32 %6, %7,
>>> >   %9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
>>> >   %10 = add i32 %8, %9,
>>> >   %11 = add i32 %10, 1,
>>> >   %12 = mul i32 2, %11,
>>> >   %13 = zext i32 %12 to i64,
>>> >   %14 = getelementptr inbounds i32, i32* %0, i64 %13
>>> >   %15 = getelementptr inbounds i32, i32* %14, i64 2
>>> >   %16 = getelementptr inbounds i32, i32* %15, i64 3
>>> >   store i32 5, i32* %16, align 4,
>>> >
>>> > So i guess relying on the number of GEPs to figure the number
of
>>> > indices is only a heuristic. Is there a more robust way to go
on about
>>> > it? Or some example i can look at?
>>> >
>>> > PS: I'm only interested about CUDA kernels.
>>> >
>>> > Ees
>>> >
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20201003/4398357a/attachment.html>

Ees via llvm-dev

2020-Oct-03 12:26 UTC

head link

[llvm-dev] Information about the number of indices in memory accesses

Stefanos, Michael, thank you for your responses. I've been toying with 
this for some days now and i kind of figured/feared this might not be 
possible, i just thought, that maybe there is something. I'm new to LLVM 
so i'm not aware of most things. In any case i've decided to take 
another direction with this.

May i ask one more thing? Say i want for each load/store to get a name 
for the base pointer. For instance the "A", "B",
"S.1" in the example
above.

In this example:

   %14 = getelementptr inbounds i32, i32* %0, i64 %13
   %15 = getelementptr inbounds i32, i32* %14, i64 2
   %16 = getelementptr inbounds i32, i32* %15, i64 3
   store i32 5, i32* %16, align 4,

I can follow the GEP chain until i hit an argument/global/alloca. But 
what if there are temporaries in between them? Is there a more 
generic/systematic way to go on about it?
Maybe some project you are aware of that does something similar to have 
a look at?

Cheers,
Ees

On 03-10-2020 12:43, Stefanos Baziotis wrote:> Michael makes a great point about aliasing here and different indexing 
> that accesses the same element!
>
> Another note: x = A[0][2] is fundamentally different depending on the 
> type of `A`. If e.g. A was declared: int A[10][20], there's only _one_ 
> load. A is a (and is treated as) a linear buffer,
> and GEPs only pinpoint the specific position of A[0][2] in this buffer 
> (i.e. 0*10 + 2). But if A was e.g. this: int **A, there _two_ loads. 
> One load to get the "pointer of the zeroth row" and another load
to
> get the 2nd element off of that row.
>
> So, you see, all these things make any deduction method very very 
> imprecise.
>
> Best,
> Stefanos
>
> Στις Σάβ, 3 Οκτ 2020 στις 5:13 π.μ., ο/η Michael Kruse 
> <llvmdev at meinersbur.de <mailto:llvmdev at meinersbur.de>>
έγραψε:
>
>     As Stefanos mentioned, LLVM-IR is generally too low-level for
>     this. C/C++ semantics also don't make it easier due to possible
>     aliasing.
>
>     The 3 cases are very different to analyze.
>
>     1. A[..][..] is syntactically invalid. A is a single pointer.
>     2. *B[2] Is not equivalent to B[0][2], but to B[2][0]. This
>     jagged/indirect/pointer-chasing arrays accesses are expressed as
>     chains of getelementptr and load instructions. However, they
>     cannot be usefully optimized because pointers on each level can
>     actually point to the same thing. Eg:
>
>     int data[] = { 41, 42, 43 };
>     B[0] = data;
>     B[1] = data;
>     B[2] = data;
>
>     therefore
>
>     B[0][1] == B[1][1] == B[2][1] == 42;
>
>     3. S.y[..] can be one or multiple getelementptr instructions and
>     is the most analyzable since all indices are static at
>     compile-time. The number of subscripts and their values can
>     directly be taken from the getelementptr instruction(s). The
>     pointers-to-int can still alias with other ints in the program.
>
>     4. These case of VLA or manually linearized array:
>
>     int *C = alloca(sizeof(int) * m * n);
>     C[x + m*y];
>
>     This is intended to have two dimensions, but appear as just one in
>     the LLVM-IR. The following (C99 VLA) is compiled to approximately
>     the same LLVM-IR
>
>     int C[n][m];
>     C[y][x];
>
>     Delinearization as mentioned by Stefanos tries to recover the two
>     subscripts x and y, but can only do so heuristically. Also keep in
>     mind that
>     C[1][-1] appears as the same IR as C[0][m-1], so there is no
>     unique way to delinerarize. In particular, one cannot just assume
>     that if all indices are different, that the memory locations being
>     accessed are different (again, a pointer aliasing problem)
>
>     Michael
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>     Am Fr., 2. Okt. 2020 um 19:25 Uhr schrieb Stefanos Baziotis via
>     llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>>:
>
>         Hi Ees,
>
>         SCEV Delinearization is the closest I know. But it has its
>         problems. Well for one your expression should be SCEVable.
>
>         But more importantly, SCEV Delinearization is trying to deduce
>         something that is high-level (actually source-level) from a
>         low-level IR in which a lot of this info has been lost. So,
>         since there's not a 1-1 mapping from high-level code to LLVM
>         IR, going backwards will always be imperfect.
>
>         And so since you are too trying to deduce a high-level thing,
>         I believe that any solution will be imperfect.
>
>         Best,
>         Stefanos
>
>         On Sat, Oct 3, 2020, 02:55 Ees via llvm-dev
>         <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>
>             Anyone? I'd really appreciate any hints to look up as
i'm
>             somewhat stuck
>             with this.
>
>             Cheers.
>
>             On 23-09-2020 12:27, Ees wrote:
>             > Hi all,
>             >
>             > For loads and stores i want to extract information about
>             the number of
>             > indices accessed. For instance:
>             >
>             > struct S {int X, int *Y};
>             >
>             > __global__ void kernel(int *A, int **B, struct S) {
>             >   int x = A[..][..]; // -> L: A[..][..]
>             >   int y = *B[2];   // -> L: B[0][2]
>             >   int z = S.y[..];  // -> L: S.1[..]
>             >
>             >   // etc..
>             > }
>             >
>             > I am performing some preprocessing on IR to:
>             > 1. Move constant inline GEPs into instructions
>             > 2. For loads and stores without a GEP operand,
>             explicitly create a
>             > (trivial) GEP with index 0
>             >
>             > So now the operand of every load and store is a GEP
>             instruction.
>             >
>             > For simple stuff i am getting the right answer but when
>             the index
>             > expression becomes more complex multiple GEPs are
>             introduced. For
>             > instance:
>             >
>             > *(A+2*(blockDim.x*blockIdx.x+threadIdx.x+1)+2+3) = 5;
>             >
>             > produces:
>             >
>             >   %6 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
>             >   %7 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
>             >   %8 = mul i32 %6, %7,
>             >   %9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
>             >   %10 = add i32 %8, %9,
>             >   %11 = add i32 %10, 1,
>             >   %12 = mul i32 2, %11,
>             >   %13 = zext i32 %12 to i64,
>             >   %14 = getelementptr inbounds i32, i32* %0, i64 %13
>             >   %15 = getelementptr inbounds i32, i32* %14, i64 2
>             >   %16 = getelementptr inbounds i32, i32* %15, i64 3
>             >   store i32 5, i32* %16, align 4,
>             >
>             > So i guess relying on the number of GEPs to figure the
>             number of
>             > indices is only a heuristic. Is there a more robust way
>             to go on about
>             > it? Or some example i can look at?
>             >
>             > PS: I'm only interested about CUDA kernels.
>             >
>             > Ees
>             >
>             _______________________________________________
>             LLVM Developers mailing list
>             llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
>             https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>         _______________________________________________
>         LLVM Developers mailing list
>         llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
>         https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20201003/a922d9c9/attachment-0001.html>

Stefanos Baziotis via llvm-dev

2020-Oct-03 12:57 UTC

head link

[llvm-dev] Information about the number of indices in memory accesses

Hi Ees,

You may want to have a look at GetUnderlyingObjects (
https://llvm.org/doxygen/namespacellvm.html#ad3428471506e6c03e9395a697a897a83
).
The page takes some time to load. Alternatively, you can see the
implementation at ValueTracking.cpp (
https://llvm.org/doxygen/ValueTracking_8cpp_source.html)
and use it by including `llvm/Analysis/ValueTracking.h`.

Note that you probably don't want to call this frequently.

Best,
Stefanos

Στις Σάβ, 3 Οκτ 2020 στις 3:26 μ.μ., ο/η Ees <kayesg42 at gmail.com>
έγραψε:
> Stefanos, Michael, thank you for your responses. I've been toying with
> this for some days now and i kind of figured/feared this might not be
> possible, i just thought, that maybe there is something. I'm new to
LLVM so
> i'm not aware of most things. In any case i've decided to take
another
> direction with this.
>
> May i ask one more thing? Say i want for each load/store to get a name for
> the base pointer. For instance the "A", "B",
"S.1" in the example above.
>
> In this example:
>
>   %14 = getelementptr inbounds i32, i32* %0, i64 %13
>   %15 = getelementptr inbounds i32, i32* %14, i64 2
>   %16 = getelementptr inbounds i32, i32* %15, i64 3
>   store i32 5, i32* %16, align 4,
>
> I can follow the GEP chain until i hit an argument/global/alloca. But what
> if there are temporaries in between them? Is there a more
> generic/systematic way to go on about it?
> Maybe some project you are aware of that does something similar to have a
> look at?
>
> Cheers,
> Ees
> On 03-10-2020 12:43, Stefanos Baziotis wrote:
>
> Michael makes a great point about aliasing here and different indexing
> that accesses the same element!
>
> Another note: x = A[0][2] is fundamentally different depending on the type
> of `A`. If e.g. A was declared: int A[10][20], there's only _one_ load.
A
> is a (and is treated as) a linear buffer,
> and GEPs only pinpoint the specific position of A[0][2] in this buffer
> (i.e. 0*10 + 2). But if A was e.g. this: int **A, there _two_ loads. One
> load to get the "pointer of the zeroth row" and another load to
get the
> 2nd element off of that row.
>
> So, you see, all these things make any deduction method very very
> imprecise.
>
> Best,
> Stefanos
>
> Στις Σάβ, 3 Οκτ 2020 στις 5:13 π.μ., ο/η Michael Kruse <
> llvmdev at meinersbur.de> έγραψε:
>
>> As Stefanos mentioned, LLVM-IR is generally too low-level for this.
C/C++
>> semantics also don't make it easier due to possible aliasing.
>>
>> The 3 cases are very different to analyze.
>>
>> 1. A[..][..] is syntactically invalid. A is a single pointer.
>> 2. *B[2] Is not equivalent to  B[0][2], but to B[2][0]. This
>> jagged/indirect/pointer-chasing arrays accesses are expressed as chains
of
>> getelementptr and load instructions. However, they cannot be usefully
>> optimized because pointers on each level can actually point to the same
>> thing. Eg:
>>
>> int data[] = { 41, 42, 43 };
>> B[0] = data;
>> B[1] = data;
>> B[2] = data;
>>
>> therefore
>>
>> B[0][1] == B[1][1] == B[2][1] == 42;
>>
>> 3. S.y[..] can be one or multiple getelementptr instructions and is the
>> most analyzable since all indices are static at compile-time. The
number of
>> subscripts and their values can directly be taken from the
getelementptr
>> instruction(s). The pointers-to-int can still alias with other ints in
the
>> program.
>>
>> 4. These case of VLA or manually linearized array:
>>
>> int *C = alloca(sizeof(int) * m * n);
>> C[x + m*y];
>>
>> This is intended to have two dimensions, but appear as just one in the
>> LLVM-IR. The following (C99 VLA) is compiled to approximately the same
>> LLVM-IR
>>
>> int C[n][m];
>> C[y][x];
>>
>> Delinearization as mentioned by Stefanos tries to recover the two
>> subscripts x and y, but can only do so heuristically. Also keep in mind
>> that
>> C[1][-1] appears as the same IR as C[0][m-1], so there is no unique way
>> to delinerarize. In particular, one cannot just assume that if all
indices
>> are different, that the memory locations being accessed are different
>> (again, a pointer aliasing problem)
>>
>> Michael
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Am Fr., 2. Okt. 2020 um 19:25 Uhr schrieb Stefanos Baziotis via
llvm-dev <
>> llvm-dev at lists.llvm.org>:
>>
>>> Hi Ees,
>>>
>>> SCEV Delinearization is the closest I know. But it has its
problems.
>>> Well for one your expression should be SCEVable.
>>>
>>> But more importantly, SCEV Delinearization is trying to deduce
something
>>> that is high-level (actually source-level) from a low-level IR in
which a
>>> lot of this info has been lost. So, since there's not a 1-1
mapping from
>>> high-level code to LLVM IR, going backwards will always be
imperfect.
>>>
>>> And so since you are too trying to deduce a high-level thing, I
believe
>>> that any solution will be imperfect.
>>>
>>> Best,
>>> Stefanos
>>>
>>> On Sat, Oct 3, 2020, 02:55 Ees via llvm-dev <llvm-dev at
lists.llvm.org>
>>> wrote:
>>>
>>>> Anyone? I'd really appreciate any hints to look up as
i'm somewhat
>>>> stuck
>>>> with this.
>>>>
>>>> Cheers.
>>>>
>>>> On 23-09-2020 12:27, Ees wrote:
>>>> > Hi all,
>>>> >
>>>> > For loads and stores i want to extract information about
the number
>>>> of
>>>> > indices accessed. For instance:
>>>> >
>>>> > struct S {int X, int *Y};
>>>> >
>>>> > __global__ void kernel(int *A, int **B, struct S) {
>>>> >   int x = A[..][..]; // -> L: A[..][..]
>>>> >   int y = *B[2];   // -> L: B[0][2]
>>>> >   int z = S.y[..];  // -> L: S.1[..]
>>>> >
>>>> >   // etc..
>>>> > }
>>>> >
>>>> > I am performing some preprocessing on IR to:
>>>> > 1. Move constant inline GEPs into instructions
>>>> > 2. For loads and stores without a GEP operand, explicitly
create a
>>>> > (trivial) GEP with index 0
>>>> >
>>>> > So now the operand of every load and store is a GEP
instruction.
>>>> >
>>>> > For simple stuff i am getting the right answer but when
the index
>>>> > expression becomes more complex multiple GEPs are
introduced. For
>>>> > instance:
>>>> >
>>>> > *(A+2*(blockDim.x*blockIdx.x+threadIdx.x+1)+2+3) = 5;
>>>> >
>>>> > produces:
>>>> >
>>>> >   %6 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
>>>> >   %7 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
>>>> >   %8 = mul i32 %6, %7,
>>>> >   %9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
>>>> >   %10 = add i32 %8, %9,
>>>> >   %11 = add i32 %10, 1,
>>>> >   %12 = mul i32 2, %11,
>>>> >   %13 = zext i32 %12 to i64,
>>>> >   %14 = getelementptr inbounds i32, i32* %0, i64 %13
>>>> >   %15 = getelementptr inbounds i32, i32* %14, i64 2
>>>> >   %16 = getelementptr inbounds i32, i32* %15, i64 3
>>>> >   store i32 5, i32* %16, align 4,
>>>> >
>>>> > So i guess relying on the number of GEPs to figure the
number of
>>>> > indices is only a heuristic. Is there a more robust way to
go on
>>>> about
>>>> > it? Or some example i can look at?
>>>> >
>>>> > PS: I'm only interested about CUDA kernels.
>>>> >
>>>> > Ees
>>>> >
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20201003/4f4d9972/attachment.html>

llvm dev - Oct 2020 - Information about the number of indices in memory accesses

[llvm-dev] Information about the number of indices in memory accesses

[llvm-dev] Information about the number of indices in memory accesses

[llvm-dev] Information about the number of indices in memory accesses