thr3ads.net - llvm dev - [llvm-dev] getelementptr inbounds with offset 0 [Apr 2019]


Ralf Jung via llvm-dev

2019-Apr-10 15:14 UTC

[llvm-dev] getelementptr inbounds with offset 0

Hi,
>>> I see. Is there a quick answer to the question of why you need inbounds
>>> GEPs in that case? Can't you just use non-inbounds GEPs if you know you
>>> might not have a valid base ptr and "optimize" them to inbounds once that
>>> is proven?
>>
>> You mean on the Rust side?  We emit GEPi for field accesses and array indexing.
>> We cannot always statically determine if this is happening for a ZST or not.
>> At the same time, given that no memory access ever happens for a ZST, allocating
>> a ZST (Box::new in Rust, think of it like new in C++) does not actually allocate
>> any memory, it just returns an integer (sufficiently aligned) cast to a pointer.
> 
> OK, but why not emit non-inbounds GEPs instead? They do not come with
> the problems you have now, or maybe I misunderstand.
The problem is statically figuring out whether it should be inbounds or
non-inbounds.  When we have code like `&x[n]`, this might be an offset-by-0 in
an empty slice and hence fall into the scope of my question, or it might be a
"normal" array access where we definitely want inbounds.
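To make the empty-slice case concrete, here is a small Rust sketch (an illustration, not code from the thread). An empty `Vec<u32>` owns no allocation, yet subslicing it at index 0 passes the bounds check and yields a pointer equal to the vector's dangling-but-aligned base pointer. The helper `subslice_ptr` is hypothetical, and whether rustc lowers the offset to `getelementptr inbounds` is exactly the codegen question under discussion, so the code only checks observable pointer values.

```rust
// Hypothetical helper: the address at which `&x[n..]` starts. For an empty
// slice the only index that passes the bounds check is n == 0, so the offset
// computed here is 0 bytes from a base pointer that refers to no memory.
fn subslice_ptr(x: &[u32], n: usize) -> *const u32 {
    x[n..].as_ptr()
}

fn main() {
    let v: Vec<u32> = Vec::new(); // does not allocate
    let base = v.as_ptr();

    // The base pointer of an empty Vec is dangling, but non-null and aligned.
    assert!(!base.is_null());
    assert_eq!(base as usize % std::mem::align_of::<u32>(), 0);

    // Offset-by-0 on the empty slice lands on the same address as the base.
    let p = subslice_ptr(&v, 0);
    assert_eq!(p, base);
    println!("base = {:p}, &v[0..] starts at {:p}", base, p);
}
```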
>> Sure, UB is definitely *defined* in a runtime-value dependent way.  The problem
>> here is that it is not defined in a precise way -- something where one could
>> write an interpreter that tracks all the extra state that is needed (like
>> poison/undef and where allocations lie) and then says precisely under which
>> conditions we have UB and under which we do not.
>> What I am asking for here is the exact definition of GEPi if, *at run-time*, the
>> offset is 0, and the base pointer is (a) an integer, or (b) dangling.
> 
> That last part is given by the lang-ref (imo):
>   "If the inbounds keyword is present, the result value of the
>    getelementptr is a poison value if the base pointer is not an in
>    bounds address of an allocated object"
> 
> I read this as: If you have a GEPi, you get poison if the base pointer
> is not an allocated object. That is, a dangling pointer (b) causes the
> GEPi to be poison, and a pointer from integer (a) may, if the address
> denoted by the integer is not inside, or one past, an allocated object.
> Now any offset except 0 will add more possible ways to generate a poison
> value.
Thanks.  That makes sense from reading the docs (though I am not convinced that
it actually helps with optimizations to be this strict here).

For the (a) case, the question about "0-sized objects" remains, but it doesn't
seem like the answer could affect what LLVM does.

It would be really nice to have a reference interpreter for LLVM IR that can
explicitly check for all the UB.  Maybe, one day... ;)
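As a very small illustration of what such a check could look like, the sketch below models the lang-ref sentence quoted above under Johannes's reading: an inbounds GEP yields poison unless both the base and the result lie inside, or one past the end of, the same live allocation. Everything here (`Memory`, `Allocation`, `gep_inbounds`, the flat `u64` address space, byte offsets instead of typed indices) is a hypothetical toy that ignores provenance and address spaces; it is not LLVM's semantics or implementation, but it shows how cases (a) and (b) come out under that reading.

```rust
/// Result of evaluating a GEP in this toy model.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Ptr(u64),
    Poison,
}

/// A live allocation: addresses start..=start+size count as "in bounds"
/// (inside the object, or one past its end).
#[derive(Debug, Clone, Copy)]
struct Allocation {
    start: u64,
    size: u64,
}

impl Allocation {
    fn contains(&self, addr: u64) -> bool {
        addr >= self.start && addr - self.start <= self.size
    }
}

struct Memory {
    live: Vec<Allocation>,
}

impl Memory {
    /// Byte-offset GEP with the `inbounds` flag: poison unless the base and
    /// the result are both in bounds of the same live allocation.
    fn gep_inbounds(&self, base: u64, offset: i64) -> Value {
        // Infinitely precise arithmetic (i128 is wide enough here), so that
        // wrap-around is reported as poison instead of being ignored.
        let result = base as i128 + offset as i128;
        if result < 0 || result > u64::MAX as i128 {
            return Value::Poison;
        }
        let result = result as u64;
        if self.live.iter().any(|a| a.contains(base) && a.contains(result)) {
            Value::Ptr(result)
        } else {
            Value::Poison
        }
    }

    /// Freeing removes the allocation, so pointers into it become dangling.
    fn free(&mut self, start: u64) {
        self.live.retain(|a| a.start != start);
    }
}

fn main() {
    let mut mem = Memory {
        live: vec![
            Allocation { start: 0x1000, size: 16 },
            Allocation { start: 0x3000, size: 0 }, // e.g. a unique malloc(0) result
        ],
    };

    // Inside or one past the end is fine; further out is poison.
    assert_eq!(mem.gep_inbounds(0x1000, 16), Value::Ptr(0x1010));
    assert_eq!(mem.gep_inbounds(0x1000, 17), Value::Poison);

    // (a) An integer address that lies in no allocation: poison even at offset 0.
    assert_eq!(mem.gep_inbounds(0x2000, 0), Value::Poison);
    // A 0-sized allocation: its start is also its one-past-the-end address.
    assert_eq!(mem.gep_inbounds(0x3000, 0), Value::Ptr(0x3000));

    // (b) A dangling pointer: poison even at offset 0.
    mem.free(0x1000);
    assert_eq!(mem.gep_inbounds(0x1000, 0), Value::Poison);
    println!("all toy GEPi checks passed");
}
```

The 0-sized allocation is the interesting design point: under this reading, whether an address counts as a "0-sized object" decides whether offset 0 is poison, which is why the question for case (a) remains open.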

Kind regards,
Ralf

Doerfert, Johannes via llvm-dev

2019-Apr-12 19:44 UTC


Hi Ralf,

On 04/10, Ralf Jung wrote:
> >>> I see. Is there a quick answer to the question of why you need inbounds
> >>> GEPs in that case? Can't you just use non-inbounds GEPs if you know you
> >>> might not have a valid base ptr and "optimize" them to inbounds once that
> >>> is proven?
> >>
> >> You mean on the Rust side?  We emit GEPi for field accesses and array indexing.
> >> We cannot always statically determine if this is happening for a ZST or not.
> >> At the same time, given that no memory access ever happens for a ZST, allocating
> >> a ZST (Box::new in Rust, think of it like new in C++) does not actually allocate
> >> any memory, it just returns an integer (sufficiently aligned) cast to a pointer.
> > 
> > OK, but why not emit non-inbounds GEPs instead? They do not come with
> > the problems you have now, or maybe I misunderstand.
> 
> The problem is statically figuring out whether it should be inbounds or
> non-inbounds.  When we have code like `&x[n]`, this might be an offset-by-0 in
> an empty slice and hence fall into the scope of my question, or it might be a
> "normal" array access where we definitely want inbounds.
I'd argue, at least after all this discussion, that you should use non-inbounds
GEPs if you do not know you have a valid object (and want to avoid undef and all
that it entails). This might cause performance regressions; if you try it, it
would be interesting to know by how much. We could even look into an "inbounds"
detection in the "Attributor framework" [0] to get some of the performance back.

[0] https://reviews.llvm.org/D59919 (but see also the "Stack" tab that shows
    related commits)
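The policy suggested here (a plain GEP unless the frontend already knows the base is a valid object, with `inbounds` possibly recovered later by an analysis) could look roughly like the sketch below in a frontend that prints textual IR. `emit_index_gep` and its `base_known_valid` flag are hypothetical names for illustration, and the output uses today's opaque-pointer (`ptr`) syntax rather than the typed pointers of 2019.

```rust
/// Hypothetical frontend helper: print an element-indexing GEP, adding
/// `inbounds` only when the base is already known to point into a live
/// allocation. Otherwise a plain GEP is emitted, whose result does not become
/// poison just because the base is dangling or out of bounds; a later
/// analysis could still prove the access in bounds and add the flag back.
fn emit_index_gep(dst: &str, elem_ty: &str, base: &str, idx: &str, base_known_valid: bool) -> String {
    let opcode = if base_known_valid {
        "getelementptr inbounds"
    } else {
        "getelementptr"
    };
    format!("{dst} = {opcode} {elem_ty}, ptr {base}, i64 {idx}")
}

fn main() {
    // Access through a pointer known to refer to a real object.
    println!("{}", emit_index_gep("%p", "i32", "%arr", "%n", true));
    // Index into a slice that might be empty or have zero-sized elements.
    println!("{}", emit_index_gep("%q", "i32", "%slice_ptr", "%n", false));
}
```

The trade-off is the one mentioned above: every GEP emitted without `inbounds` gives up some optimization freedom until something proves the flag can be restored.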
> >> Sure, UB is definitely *defined* in a runtime-value dependent way.  The problem
> >> here is that it is not defined in a precise way -- something where one could
> >> write an interpreter that tracks all the extra state that is needed (like
> >> poison/undef and where allocations lie) and then says precisely under which
> >> conditions we have UB and under which we do not.
> >> What I am asking for here is the exact definition of GEPi if, *at run-time*, the
> >> offset is 0, and the base pointer is (a) an integer, or (b) dangling.
> > 
> > That last part is given by the lang-ref (imo):
> >   "If the inbounds keyword is present, the result value of the
> >    getelementptr is a poison value if the base pointer is not an in
> >    bounds address of an allocated object"
> > 
> > I read this as: If you have a GEPi, you get poison if the base pointer
> > is not an allocated object. That is, a dangling pointer (b) causes the
> > GEPi to be poison, and a pointer from integer (a) may, if the address
> > denoted by the integer is not inside, or one past, an allocated object.
> > Now any offset except 0 will add more possible ways to generate a poison
> > value.
> 
> Thanks.  That makes sense from reading the docs (though I am not convinced that
> it actually helps with optimizations to be this strict here).
I never argued it does "make sense" ;)

> For the (a) case, the question about "0-sized objects" remains, but it doesn't
> seem like the answer could affect what LLVM does.
I think I now see (maybe part of) your point.
Something like:

  x = malloc(0);
  // ... anything except free(x) or equivalent
  y = gep inbounds x, 0
  // ... anything except free(x) or equivalent
  use_but_not_dereference(y);

should be OK (= no undef/poison appears). Does that at least go in the
right direction? I think this should be OK according to the IR definition, or
something is broken. Obviously, there is always the possibility, or rather
the certainty, that the implementation is broken somewhere ;)

> It would be really nice to have a reference interpreter for LLVM IR that can
> explicitly check for all the UB.  Maybe, one day... ;)
Let me know once you start working on one, I'd be quite interested ;)

Cheers,
  Johannes


-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov

Ralf Jung via llvm-dev

2019-Apr-15 09:06 UTC


Hi,
>> For the (a) case, the question about "0-sized objects" remains, but it doesn't
>> seem like the answer could affect what LLVM does.
> 
> I think I now see (maybe part of) your point.
> Something like:
> 
>   x = malloc(0);
>   // ... anything except free(x) or equivalent
>   y = gep inbounds x, 0
>   // ... anything except free(x) or equivalent
>   use_but_not_dereference(y);
> 
> should be OK (= no undef/poison appears). Does that at least go in the
> right direction? I think this should be OK according to the IR definition, or
> something is broken. Obviously, there is always the possibility, or rather
> the certainty, that the implementation is broken somewhere ;)
I guess that is one way to look at it -- though malloc can return NULL, and that
may (or may not) change the rules here.
But yes, this is the closest you can get in C when trying to mirror what we do
in Rust.  C does not have 0-sized types, Rust does, so there is no more direct
equivalent.
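For comparison, a small Rust sketch of what C cannot express (an illustration, not from the thread). Boxing a zero-sized value does not allocate, per the `Box::new` documentation, yet still yields a non-null pointer aligned for the type; and every element of an array of zero-sized values sits at the same address, so any index is a 0-byte offset. The type `Marker` is hypothetical, and the code only compares addresses, never reading through them.

```rust
use std::mem::{align_of, size_of, size_of_val};

// Hypothetical zero-sized type with an 8-byte alignment requirement.
#[repr(align(8))]
struct Marker;

fn main() {
    assert_eq!(size_of::<Marker>(), 0);

    // Box::new does not allocate for a ZST, but the pointer it hands out is
    // still non-null and aligned for the type.
    let raw: *mut Marker = Box::into_raw(Box::new(Marker));
    assert!(!raw.is_null());
    assert_eq!(raw as usize % align_of::<Marker>(), 0);
    println!("boxed ZST at {:p}", raw);
    // Pair into_raw with from_raw; for a ZST there is no memory to free.
    drop(unsafe { Box::from_raw(raw) });

    // Four elements, zero bytes: every index is a 0-byte offset from the base.
    let arr = [Marker, Marker, Marker, Marker];
    assert_eq!(size_of_val(&arr), 0);
    let first: *const Marker = &arr[0];
    let last: *const Marker = &arr[3];
    assert_eq!(first, last);
    println!("&arr[0] == &arr[3] == {:p}", first);
}
```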

Kind regards,
Ralf
