thr3ads.net - llvm dev - [llvm-dev] Opaque pointers and i8 GEPs [Sep 2021]

If this information is useful, please help other people find it:
Share via:

Nikita Popov via llvm-dev

2021-Sep-04 09:17 UTC

[llvm-dev] Opaque pointers and i8 GEPs

Hi everyone,

During the transition to opaque pointers (see
https://llvm.org/docs/OpaquePointers.html for context) a somewhat recurring
issue is the choice of source element types for getelementptr instructions.
Consider these two GEPs:

getelementptr [[i32 x 10], x 10], ptr %p, i64 0, i64 1, i64 0
getelementptr i8, ptr %p, i64 40

If I counted correctly, these two GEPs perform the same address
calculation. With typed pointers, they differed in their source and result
pointer types. With opaque pointers, these GEPs are now strictly equivalent.

Various transforms in LLVM need to generate a GEP with a certain offset
from a base pointer. In order to minimize bitcasts, they will usually try
to use the source pointer element type as the GEP source element type and
try to find a sequence of GEP indices that encodes the desired offset. If
they can't do that, they fall back to bitcasts to i8*, doing an i8 GEP and
bitcasting to the result pointer type (or give up altogether).

The most sophisticated and complete implementation of this exists in SROA,
which will determine the optimal GEP to generate in some three hundred
lines of code:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/SROA.cpp#L1420-L1714

Now, with opaque pointers, doing this is entirely pointless. We can always
simply generate a "getelementptr i8, ptr %Base, i64 Offset" without
introducing bitcasts. With opaque pointers, the basic i8 GEP representation
is already the optimal one.

In addition to that, generating something other than an i8 GEP requires us
to pick an appropriate source element type. With typed pointers, that was
simply the pointer element type. With opaque pointers, doing this requires
some kind of heuristic guess. It would be possible to recurse through
operations (selects, phis, etc) until we hit a value with an associated
type, such as a GEP, alloca or global. However, doing this is not reliable,
because there might not be such a value (e.g. if it is based on a function
argument). There is no guarantee that this would arrive at the same GEP
type as typed pointers did, and I think doing this goes against the spirit
of opaque pointers.

My proposal here is that in opaque pointers mode, LLVM should consider i8
GEPs canonical for GEPs with constant offsets. We should not attempt to
"guess" a good GEP type to use, and we should not try to generate
complex
GEP structures if a simple one will do. I don't think there's really any
disadvantages to this, apart from the fact that it makes the discrepancy
between typed and opaque pointer mode larger.

Context for this mail is https://reviews.llvm.org/D109259, where Arthur
requested this to be discussed on llvm-dev.

Regards,
Nikita
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210904/c2ffbef1/attachment.html>

Nuno Lopes via llvm-dev

2021-Sep-04 11:34 UTC

head link

[llvm-dev] Opaque pointers and i8 GEPs

Sounds like a good idea. LLVM's memory model doesn't allow
field-sensitive AA, so the GEP typing doesn't matter.
The exception might be TBAA (unclear, as I'm not familiar). That case may
require multiple GEPs, one per field?  I've no clue; just wondering
what's the impact there.

Nuno

_________________________________________
From: Nikita Popov via llvm-dev
Sent: 04 September 2021 10:17
To: llvm-dev <llvm-dev at lists.llvm.org>; Arthur Eubanks <aeubanks at
google.com>
Subject: [llvm-dev] Opaque pointers and i8 GEPs

Hi everyone,

During the transition to opaque pointers (see
https://llvm.org/docs/OpaquePointers.html for context) a somewhat recurring
issue is the choice of source element types for getelementptr instructions.
Consider these two GEPs:

getelementptr [[i32 x 10], x 10], ptr %p, i64 0, i64 1, i64 0
getelementptr i8, ptr %p, i64 40

If I counted correctly, these two GEPs perform the same address calculation.
With typed pointers, they differed in their source and result pointer types.
With opaque pointers, these GEPs are now strictly equivalent.

Various transforms in LLVM need to generate a GEP with a certain offset from a
base pointer. In order to minimize bitcasts, they will usually try to use the
source pointer element type as the GEP source element type and try to find a
sequence of GEP indices that encodes the desired offset. If they can't do
that, they fall back to bitcasts to i8*, doing an i8 GEP and bitcasting to the
result pointer type (or give up altogether).

The most sophisticated and complete implementation of this exists in SROA, which
will determine the optimal GEP to generate in some three hundred lines of code:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/SROA.cpp#L1420-L1714

Now, with opaque pointers, doing this is entirely pointless. We can always
simply generate a "getelementptr i8, ptr %Base, i64 Offset" without
introducing bitcasts. With opaque pointers, the basic i8 GEP representation is
already the optimal one.

In addition to that, generating something other than an i8 GEP requires us to
pick an appropriate source element type. With typed pointers, that was simply
the pointer element type. With opaque pointers, doing this requires some kind of
heuristic guess. It would be possible to recurse through operations (selects,
phis, etc) until we hit a value with an associated type, such as a GEP, alloca
or global. However, doing this is not reliable, because there might not be such
a value (e.g. if it is based on a function argument). There is no guarantee that
this would arrive at the same GEP type as typed pointers did, and I think doing
this goes against the spirit of opaque pointers.

My proposal here is that in opaque pointers mode, LLVM should consider i8 GEPs
canonical for GEPs with constant offsets. We should not attempt to
"guess" a good GEP type to use, and we should not try to generate
complex GEP structures if a simple one will do. I don't think there's
really any disadvantages to this, apart from the fact that it makes the
discrepancy between typed and opaque pointer mode larger.

Context for this mail is https://reviews.llvm.org/D109259, where Arthur
requested this to be discussed on llvm-dev.

Regards,
Nikita

David Chisnall via llvm-dev

2021-Sep-04 13:08 UTC

head link

[llvm-dev] Opaque pointers and i8 GEPs

On 4 Sep 2021, at 10:17, Nikita Popov via llvm-dev <llvm-dev at
lists.llvm.org> wrote:> 
> My proposal here is that in opaque pointers mode, LLVM should consider i8
GEPs canonical for GEPs with constant offsets. We should not attempt to
"guess" a good GEP type to use, and we should not try to generate
complex GEP structures if a simple one will do. I don't think there's
really any disadvantages to this, apart from the fact that it makes the
discrepancy between typed and opaque pointer mode larger.
I completely agree.  Eventually, I’d like GEP to lose the ability to have
multiple type arguments and I think this is a good first step in that direction:

1. Always generate GEPs with i8 offsets with all address calculation in the
arithmetic leading up to the operation.
2. Canonicalise all other GEPs to this form.
3. Remove support for other kinds of GEP.

In the back end, a GEP is just a {PTR} + {expression that evaluates to an
integer} add operation.  That’s what it looks like in any target and it’s the
most that we guarantee about memory in the IR, so everything that looks as if it
supports more than this adds complexity for no benefit.  It also means that we
sometimes miss optimisation opportunities because GEPs can encode complex
arithmetic expressions that are not CSE’d because they’re a separate and special
kind of arithmetic.

David

llvm dev - Sep 2021 - Opaque pointers and i8 GEPs

[llvm-dev] Opaque pointers and i8 GEPs

[llvm-dev] Opaque pointers and i8 GEPs

[llvm-dev] Opaque pointers and i8 GEPs