thr3ads.net - llvm dev - [llvm-dev] getelementptr inbounds with offset 0 [Mar 2019]

Ralf Jung via llvm-dev

2019-Mar-26 17:20 UTC

[llvm-dev] getelementptr inbounds with offset 0

Hi Johannes,
>> So, the thinking here is: LLVM cannot exclude the possibility of an
>> object of size 0 existing at any given address.  The pointer returned
>> by "GEPi p 0" then would be one-past-the-end of such a 0-sized object.
>> Thus, "GEPi p 0" is the identity function for any p, it will not
>> return poison.
> 
> I don't see the problem. The behavior I hope we want and implement is:
> 
> Either LLVM knows that %p points to an invalid address (=non-object) or
> it doesn't. If it does, %p and all GEPs on it yield poison. If it
> doesn't, it has to assume %p points to a valid address and offset 0, 1,
> 2, ... might all yield valid pointers. The special case is when we know
> %p is valid and has an extent of (at most) S, then all offsets <= S,
> including 0, are potentially valid (negative extents are similar).
So you are basically saying whether the offset is 0 or not does not matter, but
whether the base is an object LLVM can know about or not does?  I see.  That
makes sense.

The reason I restricted myself to offset 0 is that we'd like to do this without
actually having any accessible objects anywhere, which works out if the objects
have size 0.

FWIW, in <https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf> we anyway had to
make "getelementptr inbounds" on integer pointers (pointers obtained by casting
an integer to a pointer) never yield poison directly and instead defer the
in-bounds check to the time when the actual access happens.  That nicely
accommodates all uses of getelementptr that just compute addresses without ever
using them for a memory access (using them only, e.g., to compute offsets or
compare pointers).  But this is not how the LLVM LangRef is written,
unfortunately.
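
As a rough illustration of the deferred check described above, here is a minimal
Rust sketch of a toy flat memory model.  It is not the formal semantics from the
paper; `Block`, `gep_inbounds_int` and `load_in_bounds` are hypothetical names
made up for this example, and only integer pointers are modelled.

```rust
/// A live allocation in a toy, flat memory model (hypothetical, for illustration).
#[derive(Clone, Copy, Debug)]
struct Block {
    base: u64,
    size: u64,
}

/// `getelementptr inbounds` on an integer pointer under the deferred-check
/// reading: plain address arithmetic that never produces poison by itself.
fn gep_inbounds_int(addr: u64, offset: i64) -> u64 {
    addr.wrapping_add(offset as u64)
}

/// The in-bounds check happens here, when `len` bytes are actually accessed:
/// the whole range must lie inside one live allocation.
fn load_in_bounds(live: &[Block], addr: u64, len: u64) -> bool {
    live.iter()
        .any(|b| addr >= b.base && len <= b.size && addr - b.base <= b.size - len)
}
```

With no allocations at all, `gep_inbounds_int(4, 0)` still returns `4`; only
`load_in_bounds` rejects the address, which is the point of deferring the check.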
>> # example1
>>
>> %P1 = int2ptr 4
>> %G1 = gep inbounds %P1 0
>>
>> # example2
>>
>> %P2 = call noalias i8* @malloc(i64 12)
>> call void @free(i8* %P2)
>> %G2 = gep inbounds %P2 0
>>
>> The first happens in Rust all the time, and we rely on not getting
>> poison.  The second doesn't occur in Rust (to my knowledge), but it
>> seems somewhat inconsistent to return poison in one case and not the
>> other.
> 
> Let's start with example2, note that I renamed the values above.
> 
> %P2 is dangling (and we know it) after the free. %P2 is therefore
> poison* and so is %G2.
> 
> * or undef I'm always confused which might be bad in this conversation.
Wait, I know that C has a rule that dangling pointers are "indeterminate", but
this is the first time I hear that LLVM has it as well.  Is that written down
anywhere?  Rust relies heavily on dangling pointers being well-behaved when used
only in comparisons and casts (no accesses), so this would be a big deal.
(Also, this rule in C is pretty much impossible to formalize and serves no
purpose that I know of, but that is a separate discussion.)
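
For concreteness, the kind of use meant here looks roughly like the following
Rust sketch (the function name is made up for this example).  The pointer is
never dereferenced after the free; it is only copied, compared, and cast to an
integer.

```rust
/// Returns (address unchanged after the free, copies compare equal).
fn dangling_compare_and_cast() -> (bool, bool) {
    let boxed = Box::new(12u8);
    let p: *const u8 = &*boxed;
    let addr_before = p as usize;
    drop(boxed); // `p` is dangling from here on

    let q = p; // copying a dangling raw pointer involves no memory access
    (p as usize == addr_before, p == q)
}
```

This is entirely safe Rust: no `unsafe` block is needed as long as the dangling
pointer is not dereferenced.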
> In example1, without further information, I'd say that there is no
> poison (statically). Address 4 could be an allocated object until proven
> otherwise.
> 
> 
> I am still a little confused about the problem you see. If what I wrote
> about the implemented behavior holds true (which I am not totally sure
> of), you should not have a problem with poison even if you would
> sprinkle GEP (inbounds) %p 0 all over the place. Either %p was known to
> be invalid and so is the GEP, or %p was not known to be invalid and
> neither is the GEP. Am I missing something here?
The thing is, I am not asking about the behavior implemented today but about the
behavior of the "abstract LLVM machine" that is described by the LangRef and
that the optimizer has to justify its transformations against.  Analyses become
smarter every day, so looking at what LLVM deduces from certain instructions is
but a snapshot.

But also, your response assumes "dangling pointers are undef/poison", which is
new to me.  I'd be rather shocked if this is something LLVM actually relies on
anywhere.

Kind regards,
Ralf
> 
> Cheers,
>   Johannes
> 
>>> A side-effect based on the GEP will however __locally__ introduce a
>>> dereferenceability assumption (in my opinion at least). Let's say the
>>> code looks like this:
>>>
>>>
>>>   %G = gep inbounds (int2ptr 4) 0
>>>   ; We don't know anything about the dereferenceability of
>>>   ; the memory at address 4 here.
>>>   br %cnd, %BB0, %BB1
>>>
>>> BB0:
>>>   ; We don't know anything about the dereferenceability of
>>>   ; the memory at address 4 here.
>>>   load %G
>>>   ; We know the memory at address 4 is dereferenceable here.
>>>   ; Though, that is due to the load and not the inbounds.
>>>   ...
>>>   br %BB1
>>>
>>> BB1:
>>>   ; We don't know anything about the dereferenceability of
>>>   ; the memory at address 4 here.
>>>
>>>
>>> It is a different story if you start to use the GEP in other
>>> operations, e.g., to alter control flow. Then the (potential)
>>> undefined value can propagate.
>>>
>>>
>>> Any thought on this? Did I at least get your problem description
>>> right?
>>>
>>> Cheers, Johannes
>>>
>>>
>>>
>>> P.S. Sorry if this breaks the thread and apologies that I had to
>>> remove Bruce from the CC. It turns out replying to an email you did
>>> not receive is complicated and getting on the LLVM-Dev list is
>>> nowadays as well...
>>>
>>>
>>> On 02/25, Ralf Jung via llvm-dev wrote:
>>>> Hi Bruce,
>>>>
>>>> On 25.02.19 13:10, Bruce Hoult wrote:
>>>>> LLVM has no idea whether the address computed by GEP is actually
>>>>> within a legal object. The "inbounds" keyword is just you, the
>>>>> programmer, promising LLVM that you know it's ok and that you
>>>>> don't care what happens if it is actually out of bounds.
>>>>>
>>>>> https://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds
>>>>
>>>> The LangRef says I get a poison value when I am violating the
>>>> bounds. What I am asking is what exactly this means when the offset
>>>> is 0 -- what *are* the conditions under which an offset-by-0 is
>>>> "out of bounds" and hence yields poison?  Of course LLVM cannot
>>>> always statically determine this, but it relies on (dynamically, on
>>>> the "LLVM abstract machine") such things not happening, and I am
>>>> asking what exactly these dynamic conditions are.
>>>>
>>>> Kind regards, Ralf
>>>>
>>>>>
>>>>> On Sun, Feb 24, 2019 at 9:05 AM Ralf Jung via llvm-dev
>>>>> <llvm... at lists.llvm.org> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> What exactly are the rules for `getelementptr inbounds` with
>>>>>> offset 0?
>>>>>>
>>>>>> In Rust, we are relying on the fact that if we use, for example,
>>>>>> `inttoptr` to turn `4` into a pointer, we can then do
>>>>>> `getelementptr inbounds` with offset 0 on that without LLVM
>>>>>> deducing that there actually is any dereferenceable memory at
>>>>>> location 4.  The argument is that we can think of there being a
>>>>>> zero-sized allocation. Is that a reasonable assumption?  Can
>>>>>> something like this be documented in the LangRef?
>>>>>>
>>>>>> Relatedly, how does the situation change if the pointer is not
>>>>>> created "out of thin air" from a fixed integer, but is actually a
>>>>>> dangling pointer obtained previously from `malloc` (or `alloca`
>>>>>> or whatever)?  Is `getelementptr inbounds` with offset 0 on such a
>>>>>> pointer a NOP, or does it result in `poison`?  And if that makes
>>>>>> a difference, how does that square with the fact that, e.g., the
>>>>>> integer `0x4000` could well be inside such an allocation, but
>>>>>> doing `getelementptr inbounds` with offset 0 on that would fall
>>>>>> under the first question above?
>>>>>>
>>>>>> Kind regards, Ralf
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm... at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>

Doerfert, Johannes via llvm-dev

2019-Mar-26 21:19 UTC

[llvm-dev] getelementptr inbounds with offset 0

Hi Ralf,

On 03/26, Ralf Jung wrote:
> >> So, the thinking here is: LLVM cannot exclude the possibility of an
> >> object of size 0 existing at any given address.  The pointer returned
> >> by "GEPi p 0" then would be one-past-the-end of such a 0-sized object.
> >> Thus, "GEPi p 0" is the identity function for any p, it will not
> >> return poison.
> >
> > I don't see the problem. The behavior I hope we want and implement is:
> >
> > Either LLVM knows that %p points to an invalid address (=non-object) or
> > it doesn't. If it does, %p and all GEPs on it yield poison. If it
> > doesn't, it has to assume %p points to a valid address and offset 0, 1,
> > 2, ... might all yield valid pointers. The special case is when we know
> > %p is valid and has an extent of (at most) S, then all offsets <= S,
> > including 0, are potentially valid (negative extents are similar).
>
> So you are basically saying whether the offset is 0 or not does not matter, but
> whether the base is an object LLVM can know about or not does?  I see.  That
> makes sense.
Yes, if we are not in the special case (object valid and extent is known).
> The reason I restricted myself to offset 0 is that we'd like to do this without
> actually having any accessible objects anywhere, which works out if the objects
> have size 0.
Now that reasoning works from a conceptual standpoint only for
non-inbounds GEPs, I think. From a practical standpoint my above
description will probably make sure everything works out just fine (see
also my rephrased answer down below!). I say this because I think the
following lang-ref passage makes sure everything, not only memory
accesses, involving a non-pointer-to-object* GEP is poison:
  "If the inbounds keyword is present, the result value of the
   getelementptr is a poison value if the base pointer is not an in
   bounds address of an allocated object"

* I would argue every object needs to have an extent, hence cannot be
  zero-sized.

> FWIW, in <https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf> we anyway had to
> make "getelementptr inbounds" on integer pointers (pointers obtained by casting
> an integer to a pointer) never yield poison directly and instead defer the
> in-bounds check to the time when the actual access happens.  That nicely
> accommodates all uses of getelementptr that just compute addresses without ever
> using them for a memory access (using them only, e.g., to compute offsets or
> compare pointers).  But this is not how the LLVM LangRef is written,
> unfortunately.
I see. Is there a quick answer to the question of why you need inbounds
GEPs in that case? Can't you just use non-inbounds GEPs if you know you
might not have a valid base ptr and "optimize" it to inbounds once that
is proven?
> >> # example1
> >>
> >> %P1 = int2ptr 4
> >> %G1 = gep inbounds %P1 0
> >>
> >> # example2
> >>
> >> %P2 = call noalias i8* @malloc(i64 12)
> >> call void @free(i8* %P2)
> >> %G2 = gep inbounds %P2 0
> >>
> >> The first happens in Rust all the time, and we rely on not getting
> >> poison.  The second doesn't occur in Rust (to my knowledge), but it
> >> seems somewhat inconsistent to return poison in one case and not the
> >> other.
> > 
> > Let's start with example2, note that I renamed the values above.
> > 
> > %P2 is dangling (and we know it) after the free. %P2 is therefore
> > poison* and so is %G2.
> > 
> > * or undef I'm always confused which might be bad in this conversation.
>
> Wait, I know that C has a rule that dangling pointers are "indeterminate", but
> this is the first time I hear that LLVM has it as well.  Is that written down
> anywhere?  Rust relies heavily on dangling pointers being well-behaved when used
> only in comparisons and casts (no accesses), so this would be a big deal.
> (Also, this rule in C is pretty much impossible to formalize and serves no
> purpose that I know of, but that is a separate discussion.)
I am not very formal in this thread and I realize that this might be a
problem, sorry. The above quote from the lang-ref [0] is why I think
"dangling" inbounds GEPs are poison, do you concur?

[0] https://llvm.org/docs/LangRef.html#getelementptr-instruction

> > In example1, without further information, I'd say that there is no
> > poison (statically). Address 4 could be an allocated object until proven
> > otherwise.
> >
> >
> > I am still a little confused about the problem you see. If what I wrote
> > about the implemented behavior holds true (which I am not totally sure
> > of), you should not have a problem with poison even if you would
> > sprinkle GEP (inbounds) %p 0 all over the place. Either %p was known to
> > be invalid and so is the GEP, or %p was not known to be invalid and
> > neither is the GEP. Am I missing something here?
>
> The thing is, I am not asking about the behavior implemented today but about the
> behavior of the "abstract LLVM machine" that is described by the LangRef and
> that the optimizer has to justify its transformations against.  Analyses become
> smarter every day, so looking at what LLVM deduces from certain instructions is
> but a snapshot.
I agree with your intent, but: My argument here was not to say we cannot
figure X out today so all is good. What I wanted to say/should have said
is something more along the lines of:
  Undefined behavior in C/LLVM-IR is often (runtime) value dependent and
  therefore statically not decidable. If it is not decidable, the code must
  be assumed to have defined (="the normal") behavior statically. This
  should be preserved by current and future LLVM passes. Your particular
  example (example1) seems to me like such a case in which the semantics
  is statically not decidable and therefore I do not see any problem.

Again, I might just be wrong about this. Please don't pin it on me at the end
of the day.
> But also, your response assumes "dangling pointers are undef/poison", which is
> new to me.  I'd be rather shocked if this is something LLVM actually relies on
> anywhere.
Again, that is how I read the quoted lang-ref wording above for
inbounds GEPs. I agree with you that non-inbounds GEPs have a "normal"
value that can be used for all non-access instructions in the usual way
without producing undef/poison.

Cheers,
  Johannes

-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov

Ralf Jung via llvm-dev

2019-Mar-27 08:09 UTC

[llvm-dev] getelementptr inbounds with offset 0

Hi Johannes,
> Now that reasoning works from a conceptual standpoint only for
> non-inbounds GEPs, I think. From a practical standpoint my above
> description will probably make sure everything works out just fine (see
> also my rephrased answer down below!). I say this because I think the
> following lang-ref passage makes sure everything, not only memory
> accesses, involving a non-pointer-to-object* GEP is poison:
>   "If the inbounds keyword is present, the result value of the
>    getelementptr is a poison value if the base pointer is not an in
>    bounds address of an allocated object"
> 
> * I would argue every object needs to have an extent, hence cannot be
>   zero-sized.
I would find that a rather surprising exception / special case.  There's nothing
wrong with objects of size 0.
>> FWIW, in <https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf> we anyway had to
>> make "getelementptr inbounds" on integer pointers (pointers obtained by casting
>> an integer to a pointer) never yield poison directly and instead defer the
>> in-bounds check to the time when the actual access happens.  That nicely
>> accommodates all uses of getelementptr that just compute addresses without ever
>> using them for a memory access (using them only, e.g., to compute offsets or
>> compare pointers).  But this is not how the LLVM LangRef is written,
>> unfortunately.
>
> I see. Is there a quick answer to the question of why you need inbounds
> GEPs in that case? Can't you just use non-inbounds GEPs if you know you
> might not have a valid base ptr and "optimize" it to inbounds once that
> is proven?
You mean on the Rust side?  We emit GEPi for field accesses and array indexing.
We cannot always statically determine if this is happening for a ZST (zero-sized
type) or not.  At the same time, given that no memory access ever happens for a
ZST, allocating a ZST (Box::new in Rust, think of it like new in C++) does not
actually allocate any memory, it just returns an integer (sufficiently aligned)
cast to a pointer.
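
A small Rust sketch of that situation; `Marker` and both helper functions are
hypothetical names for this example.  It only looks at the size, the alignment,
and the non-nullness of the address, since the exact value `Box::new` hands out
for a ZST is not something code should depend on.

```rust
use std::mem::{align_of, size_of};

/// A zero-sized type with a non-trivial alignment requirement.
#[repr(align(4))]
struct Marker;

/// Size and alignment of `Marker` in bytes.
fn marker_layout() -> (usize, usize) {
    (size_of::<Marker>(), align_of::<Marker>())
}

/// Returns the address that `Box::new` hands out for a `Marker`.
fn zst_box_address() -> usize {
    let raw: *mut Marker = Box::into_raw(Box::new(Marker));
    let addr = raw as usize;
    // Re-wrap the pointer so the Box is dropped normally; for a ZST this
    // does not free any memory.
    unsafe { drop(Box::from_raw(raw)) };
    addr
}
```

The returned address is non-null and a multiple of 4, yet there is no allocation
behind it, which is why a GEPi with offset 0 on such a pointer must not be
poison for this scheme to work.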
>>> Let's start with example2, note that I renamed the values above.
>>>
>>> %P2 is dangling (and we know it) after the free. %P2 is therefore
>>> poison* and so is %G2.
>>>
>>> * or undef I'm always confused which might be bad in this conversation.
>>
>> Wait, I know that C has a rule that dangling pointers are "indeterminate", but
>> this is the first time I hear that LLVM has it as well.  Is that written down
>> anywhere?  Rust relies heavily on dangling pointers being well-behaved when used
>> only in comparisons and casts (no accesses), so this would be a big deal.
>> (Also, this rule in C is pretty much impossible to formalize and serves no
>> purpose that I know of, but that is a separate discussion.)
>
> I am not very formal in this thread and I realize that this might be a
> problem, sorry. The above quote from the lang-ref [0] is why I think
> "dangling" inbounds GEPs are poison, do you concur?
>
> [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
You said above that even the *input* (%P2) would be poison.  That is the part I
am doubting.
If the input is not poison (just dangling), then we come back to the original
question behind example2 -- and yes, I can see a reading of the GEPi spec that
makes this poison.  On the other hand, this would make a dangling pointer
(formerly pointing to an object) behave differently than an integer pointer that
never pointed to any object, which seems odd.
> I agree with your intent, but: My argument here was not to say we cannot
> figure X out today so all is good. What I wanted to say/should have said
> is something more along the lines of:
>   Undefined behavior in C/LLVM-IR is often (runtime) value dependent and
>   therefore statically not decidable. If it is not decidable, the code must
>   be assumed to have defined (="the normal") behavior statically. This
>   should be preserved by current and future LLVM passes. Your particular
>   example (example1) seems to me like such a case in which the semantics
>   is statically not decidable and therefore I do not see any problem.
>
> Again, I might just be wrong about this. Please don't pin it on me at the end
> of the day.
Sure, UB is definitely *defined* in a runtime-value dependent way.  The problem
here is that it is not defined in a precise way -- something where one could
write an interpreter that tracks all the extra state that is needed (like
poison/undef and where allocations lie) and then says precisely under which
conditions we have UB and under which we do not.
What I am asking here for is the exact definition of GEPi if, *at run-time*, the
offset is 0, and the base pointer is (a) an integer, or (b) dangling.
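
To make the question concrete, here is a tiny Rust sketch of such an interpreter
step.  Everything in it is an assumption made for illustration: `Block`, `Rule`
and `gepi` are made-up names, the memory is flat (no provenance), and neither
rule is claimed to be what LLVM actually specifies.  `NeedsLiveObject` is one
literal reading of the LangRef sentence quoted above; `ZeroSizedAnywhere` is the
reading Rust would like to rely on.

```rust
/// A live allocation in a toy flat memory (hypothetical model).
#[derive(Clone, Copy, Debug)]
struct Block {
    base: u64,
    size: u64,
}

/// Two candidate run-time rules for `getelementptr inbounds`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Rule {
    /// The base must be in bounds of some live allocation
    /// (one-past-the-end included), and so must the result.
    NeedsLiveObject,
    /// Zero-sized objects may exist at any address, so offset 0 is always fine.
    ZeroSizedAnywhere,
}

/// `None` models poison; `Some(addr)` is the resulting address.
fn gepi(live: &[Block], rule: Rule, base: u64, offset: i64) -> Option<u64> {
    if rule == Rule::ZeroSizedAnywhere && offset == 0 {
        return Some(base);
    }
    let result = base.wrapping_add(offset as u64);
    // An address is "within" a block if it lies in [base, base + size].
    let within = |b: &Block, a: u64| a >= b.base && a - b.base <= b.size;
    let ok = live.iter().any(|b| within(b, base) && within(b, result));
    if ok { Some(result) } else { None }
}
```

For case (a), `gepi(&[], rule, 4, 0)` is poison under `NeedsLiveObject` and `4`
under `ZeroSizedAnywhere`.  Case (b) is the same call with the freed address as
`base`.  In this flat model a freed pointer and an integer pointer are
indistinguishable, so (a) and (b) necessarily get the same answer; a model with
provenance could separate them.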
>> But also, your response assumes "dangling pointers are undef/poison", which is
>> new to me.  I'd be rather shocked if this is something LLVM actually relies on
>> anywhere.
>
> Again, that is how I read the quoted lang-ref wording above for
> inbounds GEPs. I agree with you that non-inbounds GEPs have a "normal"
> value that can be used for all non-access instructions in the usual way
> without producing undef/poison.
I must be missing something here.  You said above "%P2 is dangling (and we know
it) after the free. %P2 is therefore poison" -- at this point, GEPi has not even
happened yet!  If GEPi does something, it will make the *output* poison (%G2),
but you are saying the *input* becomes poison (%P2), and that cannot be a
consequence of GEPi at all.

Kind regards,
Ralf
