thr3ads.net - llvm dev - [llvm-dev] RFC: Allowing @llvm.objectsize to be more conservative with null. [Jan 2017]

If this information is useful, please help other people find it:
Share via:

Sanjoy Das via llvm-dev

2017-Jan-02 08:01 UTC

[llvm-dev] RFC: Allowing @llvm.objectsize to be more conservative with null.

Hi George,

Have you considered changing our existing behavior to match GCC's
builtin_object_size instead of adding a new parameter?  That may be
simpler overall.  There's also a clear upgrade strategy -- fold every
old style call to "<min> ? 0 : 1".

You probably already know this, but GCC folds
builtin_object_size(0, 0) to -1 and builtin_object_size(0, 2) to 0.
We'll have to have some <min>-awareness either in clang (to decide if
the <null-is-unknown> bit needs to be set) or in the middle end.  What
is your plan here?

I also found gcc's choice of folding builtin_object_size(0, 2) to 0 and
builtin_object_size(0, 0) to -1 somewhat odd; I'd have expected the
inverse.  This follows from the following "intuitive" rules

ObjSizeMax(X) = UMAX(ObjSizeMax(A), ObjSizeMax(B))
ObjSizeMin(X) = UMIN(ObjSizeMin(A), ObjSizeMin(B))

(where X is a value that can either be A or B at runtime)

and that we want to fold

ObjSizeMax(Malloc(N)) = ObjSizeMin(Malloc(N)) = N

However, since Malloc can return null:

ObjSizeMax(Malloc(N)) = UMAX(N, ObjSizeMax(NULL)) = N
ObjSizeMin(Malloc(N)) = UMIN(N, ObjSizeMin(NULL)) = N

and for that to be true ObjSizeMax(NULL) builtin_object_size(NULL, 0) needs to
be 0 and ObjSizeMin(NULL) builtin_object_size(NULL, 2) needs to be (unsigned)-1.

However, I'm not sure if it is up to us to change that; given the very
motivation of this thread is GCC compatibility.

-- Sanjoy

On Sun, Jan 1, 2017 at 10:03 PM, George Burgess IV via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> Happy New Year ping. :)
>
> Will ping again on Wednesday. If I don't get comments by EOD Thursday,
I'll
> assume everyone's OK with this and put together a patch.
>
> On Wed, Dec 21, 2016 at 11:44 AM, George Burgess IV
> <george.burgess.iv at gmail.com> wrote:
>>
>> tl;dr: We'd like to add a bit to @llvm.objectsize to make it
optionally be
>> conservative when it's handed a null pointer.
>>
>> Happy Wednesday!
>>
>> We're trying to fix PR23277, which is a bug about how clang+LLVM
treat
>> __builtin_object_size when it's given a null pointer. For
compatibility with
>> GCC, clang would like to be able to hand back a conservative result
(e.g.
>> (size_t)-1), but the LLVM intrinsic that clang lowers
>> __builtin_object_size(p, N) to, @llvm.objectsize, only hands back 0
when
>> given a null pointer. Because it's often assumed that
__builtin_object_size
>> always folds to a constant, handling this only in clang may be tricky:
if we
>> emit the IR equivalent to ((p) ? __builtin_object_size(p, N) : -1ULL)
and
>> LLVM can't fold away the null check, we've failed to emit a
constant.
>>
>> So, the best path forward that I can see is to add a "null is
unknown
>> size" bit to @llvm.objectsize, much like the "min" bit
it already has. If
>> this bit is true, null would be treated like an opaque pointer.
Otherwise,
>> @llvm.objectsize would act no differently than it does today.
>>
>> If we like this approach, I'm assuming it would be best to have
this bit
>> as a third argument to @llvm.objectsize, rather than making the second
>> argument an i8 and using it as a bag of bits.
>>
>> All thoughts/comments/alternative approaches/etc. highly appreciated.
:)
>>
>> Thanks (and Happy Holidays)!
>> George
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

-- 
Sanjoy Das
http://playingwithpointers.com

George Burgess IV via llvm-dev

2017-Jan-02 18:41 UTC

head link

[llvm-dev] RFC: Allowing @llvm.objectsize to be more conservative with null.

Thanks for the comments!
> Have you considered changing our existing behavior to match GCC's
> builtin_object_size instead of adding a new parameter
Yup! My issue with turning `i1 %min` into `i8 %flags` is that
__builtin_object_size would get lowered like:

__builtin_object_size(p, 0) -> @llvm.objectsize(i8* %p, i8 2)
__builtin_object_size(p, 1) -> @llvm.objectsize(i8* %p, i8 2)
__builtin_object_size(p, 2) -> @llvm.objectsize(i8* %p, i8 3)
(__builtin_object_size(p, 3) doesn't actually get lowered)

...Which might be a bit surprising to some people. If we think that's a
non-issue, I'm happy to take the simpler approach and use an i8.
> We'll have to have some <min>-awareness either in clang (to
decide if
> the <null-is-unknown> bit needs to be set) or in the middle end. 
What
> is your plan here?
My plan was just to always set the `null-is-unknown` bit when lowering a
call to __builtin_object_size in clang. If `min` is true, we treat unknown
and null values identically in @llvm.objectsize, so whether
`null-is-unknown` is set in that case shouldn't matter.
> However, since Malloc can return null:
I think I was unclear: this behavior would only exist if @llvm.objectsize
was actually handed null. I wasn't planning on changing how we'd handle
memory allocation functions that may return null (GCC gives back 2 for
__builtin_object_size(malloc(2), 0)). In other words, this
`null-is-unknown` bit only makes the objectsize evaluator see `T* null` as
though it was `call T* @opaque_user_defined_function()`, nothing else.

This is also consistent with how @llvm.objectsize already acts: if the
pointer it's given is the result of a call to `malloc(N)`, it'll return
N
regardless of the value of `min`.

But yeah, I agree that some of the features of __builtin_object_size aren't
entirely intuitive. :)

On Mon, Jan 2, 2017 at 12:01 AM, Sanjoy Das <sanjoy at
playingwithpointers.com>
wrote:
> Hi George,
>
> Have you considered changing our existing behavior to match GCC's
> builtin_object_size instead of adding a new parameter?  That may be
> simpler overall.  There's also a clear upgrade strategy -- fold every
> old style call to "<min> ? 0 : 1".
>
> You probably already know this, but GCC folds
> builtin_object_size(0, 0) to -1 and builtin_object_size(0, 2) to 0.
> We'll have to have some <min>-awareness either in clang (to
decide if
> the <null-is-unknown> bit needs to be set) or in the middle end. 
What
> is your plan here?
>
> I also found gcc's choice of folding builtin_object_size(0, 2) to 0 and
> builtin_object_size(0, 0) to -1 somewhat odd; I'd have expected the
> inverse.  This follows from the following "intuitive" rules
>
> ObjSizeMax(X) = UMAX(ObjSizeMax(A), ObjSizeMax(B))
> ObjSizeMin(X) = UMIN(ObjSizeMin(A), ObjSizeMin(B))
>
> (where X is a value that can either be A or B at runtime)
>
> and that we want to fold
>
> ObjSizeMax(Malloc(N)) = ObjSizeMin(Malloc(N)) = N
>
> However, since Malloc can return null:
>
> ObjSizeMax(Malloc(N)) = UMAX(N, ObjSizeMax(NULL)) = N
> ObjSizeMin(Malloc(N)) = UMIN(N, ObjSizeMin(NULL)) = N
>
> and for that to be true ObjSizeMax(NULL) > builtin_object_size(NULL, 0)
needs to be 0 and ObjSizeMin(NULL) > builtin_object_size(NULL, 2) needs to be
(unsigned)-1.
>
> However, I'm not sure if it is up to us to change that; given the very
> motivation of this thread is GCC compatibility.
>
> -- Sanjoy
>
> On Sun, Jan 1, 2017 at 10:03 PM, George Burgess IV via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Happy New Year ping. :)
> >
> > Will ping again on Wednesday. If I don't get comments by EOD
Thursday,
> I'll
> > assume everyone's OK with this and put together a patch.
> >
> > On Wed, Dec 21, 2016 at 11:44 AM, George Burgess IV
> > <george.burgess.iv at gmail.com> wrote:
> >>
> >> tl;dr: We'd like to add a bit to @llvm.objectsize to make it
optionally
> be
> >> conservative when it's handed a null pointer.
> >>
> >> Happy Wednesday!
> >>
> >> We're trying to fix PR23277, which is a bug about how
clang+LLVM treat
> >> __builtin_object_size when it's given a null pointer. For
compatibility
> with
> >> GCC, clang would like to be able to hand back a conservative
result
> (e.g.
> >> (size_t)-1), but the LLVM intrinsic that clang lowers
> >> __builtin_object_size(p, N) to, @llvm.objectsize, only hands back
0 when
> >> given a null pointer. Because it's often assumed that
> __builtin_object_size
> >> always folds to a constant, handling this only in clang may be
tricky:
> if we
> >> emit the IR equivalent to ((p) ? __builtin_object_size(p, N) :
-1ULL)
> and
> >> LLVM can't fold away the null check, we've failed to emit
a constant.
> >>
> >> So, the best path forward that I can see is to add a "null is
unknown
> >> size" bit to @llvm.objectsize, much like the "min"
bit it already has.
> If
> >> this bit is true, null would be treated like an opaque pointer.
> Otherwise,
> >> @llvm.objectsize would act no differently than it does today.
> >>
> >> If we like this approach, I'm assuming it would be best to
have this bit
> >> as a third argument to @llvm.objectsize, rather than making the
second
> >> argument an i8 and using it as a bag of bits.
> >>
> >> All thoughts/comments/alternative approaches/etc. highly
appreciated. :)
> >>
> >> Thanks (and Happy Holidays)!
> >> George
> >
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
>
>
>
> --
> Sanjoy Das
> http://playingwithpointers.com
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170102/99eb1bb1/attachment.html>

Sanjoy Das via llvm-dev

2017-Jan-02 19:45 UTC

head link

[llvm-dev] RFC: Allowing @llvm.objectsize to be more conservative with null.

Hi George,

On Mon, Jan 2, 2017 at 10:41 AM, George Burgess IV
<george.burgess.iv at gmail.com> wrote:> Thanks for the comments!
>
>> Have you considered changing our existing behavior to match GCC's
>> builtin_object_size instead of adding a new parameter
>
> Yup! My issue with turning `i1 %min` into `i8 %flags` is that
> __builtin_object_size would get lowered like:
>
> __builtin_object_size(p, 0) -> @llvm.objectsize(i8* %p, i8 2)
> __builtin_object_size(p, 1) -> @llvm.objectsize(i8* %p, i8 2)
> __builtin_object_size(p, 2) -> @llvm.objectsize(i8* %p, i8 3)
> (__builtin_object_size(p, 3) doesn't actually get lowered)
>
> ...Which might be a bit surprising to some people. If we think that's a
> non-issue, I'm happy to take the simpler approach and use an i8.
What I had in mind was:

 __builtin_object_size(p, 0) -> @llvm.objectsize(i8* %p, i1 0)
 __builtin_object_size(p, 1) -> @llvm.objectsize(i8* %p, i1 0)
 __builtin_object_size(p, 2) -> @llvm.objectsize(i8* %p, i1 1)
 __builtin_object_size(p, 3) -> @llvm.objectsize(i8* %p, i1 1)

and changing the specification of @llvm.objectsize to match.
>> We'll have to have some <min>-awareness either in clang (to
decide if
>> the <null-is-unknown> bit needs to be set) or in the middle end. 
What
>> is your plan here?
>
> My plan was just to always set the `null-is-unknown` bit when lowering a
> call to __builtin_object_size in clang. If `min` is true, we treat unknown
> and null values identically in @llvm.objectsize, so whether
> `null-is-unknown` is set in that case shouldn't matter.
Continuing from above: in other words, since your use case is always
setting the `null-is-unknown` bit, can you re-define @llvm.objectsize
to have that semantic without adding a new parameter?

That's not backward compatible, but there's a simple but conservative
update strategy.
>> However, since Malloc can return null:
>
> I think I was unclear: this behavior would only exist if @llvm.objectsize
> was actually handed null. I wasn't planning on changing how we'd
handle
> memory allocation functions that may return null (GCC gives back 2 for
> __builtin_object_size(malloc(2), 0)). In other words, this
`null-is-unknown`
> bit only makes the objectsize evaluator see `T* null` as though it was
`call
> T* @opaque_user_defined_function()`, nothing else.
I meant to say that it looks like the semantics of ObjectSizeMin(X) is
that "return the conservative minimum object size for all values that
X may take at runtime" (resp. ObjectSizeMax(X)).

For instance this:

void i(int c, volatile int* sink) {
  void* mem1 = malloc(20);
  void* mem2 = malloc(40);

  *sink = __builtin_object_size(c ? mem1 : mem2, 0);
  *sink = __builtin_object_size(c ? mem1 : mem2, 2);
}

is lowered to

        movl    $40, (%rsi)
        movl    $20, (%rsi)
        ret

by GCC.

Applying the same logic to malloc(N), since it returns a location that
has N dereferenceable bytes or NULL, it follows that
ObjectSizeMin(malloc(N)) should return the smaller of
ObjectSizeMin(NULL) and ObjectSizeMin(MemoryLocOfNBytes) == the
smaller of ObjectSizeMin(NULL) and N.  Given that we want the result
of ObjectSizeMin(malloc(N)) == UMIN(ObjectSizeMin(NULL), N) to be N,
we'd want ObjectSizeMin(NULL) to be (unsigned)-1 for consistency.

However, none of this matters if we're not in a position to change the
specification.

-- Sanjoy

> This is also consistent with how @llvm.objectsize already acts: if the
> pointer it's given is the result of a call to `malloc(N)`, it'll
return N
> regardless of the value of `min`.
>
> But yeah, I agree that some of the features of __builtin_object_size
aren't
> entirely intuitive. :)
>
> On Mon, Jan 2, 2017 at 12:01 AM, Sanjoy Das <sanjoy at
playingwithpointers.com>
> wrote:
>>
>> Hi George,
>>
>> Have you considered changing our existing behavior to match GCC's
>> builtin_object_size instead of adding a new parameter?  That may be
>> simpler overall.  There's also a clear upgrade strategy -- fold
every
>> old style call to "<min> ? 0 : 1".
>>
>> You probably already know this, but GCC folds
>> builtin_object_size(0, 0) to -1 and builtin_object_size(0, 2) to 0.
>> We'll have to have some <min>-awareness either in clang (to
decide if
>> the <null-is-unknown> bit needs to be set) or in the middle end. 
What
>> is your plan here?
>>
>> I also found gcc's choice of folding builtin_object_size(0, 2) to 0
and
>> builtin_object_size(0, 0) to -1 somewhat odd; I'd have expected the
>> inverse.  This follows from the following "intuitive" rules
>>
>> ObjSizeMax(X) = UMAX(ObjSizeMax(A), ObjSizeMax(B))
>> ObjSizeMin(X) = UMIN(ObjSizeMin(A), ObjSizeMin(B))
>>
>> (where X is a value that can either be A or B at runtime)
>>
>> and that we want to fold
>>
>> ObjSizeMax(Malloc(N)) = ObjSizeMin(Malloc(N)) = N
>>
>> However, since Malloc can return null:
>>
>> ObjSizeMax(Malloc(N)) = UMAX(N, ObjSizeMax(NULL)) = N
>> ObjSizeMin(Malloc(N)) = UMIN(N, ObjSizeMin(NULL)) = N
>>
>> and for that to be true ObjSizeMax(NULL) >>
builtin_object_size(NULL, 0) needs to be 0 and ObjSizeMin(NULL) >>
builtin_object_size(NULL, 2) needs to be (unsigned)-1.
>>
>> However, I'm not sure if it is up to us to change that; given the
very
>> motivation of this thread is GCC compatibility.
>>
>> -- Sanjoy
>>
>> On Sun, Jan 1, 2017 at 10:03 PM, George Burgess IV via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> > Happy New Year ping. :)
>> >
>> > Will ping again on Wednesday. If I don't get comments by EOD
Thursday,
>> > I'll
>> > assume everyone's OK with this and put together a patch.
>> >
>> > On Wed, Dec 21, 2016 at 11:44 AM, George Burgess IV
>> > <george.burgess.iv at gmail.com> wrote:
>> >>
>> >> tl;dr: We'd like to add a bit to @llvm.objectsize to make
it optionally
>> >> be
>> >> conservative when it's handed a null pointer.
>> >>
>> >> Happy Wednesday!
>> >>
>> >> We're trying to fix PR23277, which is a bug about how
clang+LLVM treat
>> >> __builtin_object_size when it's given a null pointer. For
compatibility
>> >> with
>> >> GCC, clang would like to be able to hand back a conservative
result
>> >> (e.g.
>> >> (size_t)-1), but the LLVM intrinsic that clang lowers
>> >> __builtin_object_size(p, N) to, @llvm.objectsize, only hands
back 0
>> >> when
>> >> given a null pointer. Because it's often assumed that
>> >> __builtin_object_size
>> >> always folds to a constant, handling this only in clang may be
tricky:
>> >> if we
>> >> emit the IR equivalent to ((p) ? __builtin_object_size(p, N) :
-1ULL)
>> >> and
>> >> LLVM can't fold away the null check, we've failed to
emit a constant.
>> >>
>> >> So, the best path forward that I can see is to add a
"null is unknown
>> >> size" bit to @llvm.objectsize, much like the
"min" bit it already has.
>> >> If
>> >> this bit is true, null would be treated like an opaque
pointer.
>> >> Otherwise,
>> >> @llvm.objectsize would act no differently than it does today.
>> >>
>> >> If we like this approach, I'm assuming it would be best to
have this
>> >> bit
>> >> as a third argument to @llvm.objectsize, rather than making
the second
>> >> argument an i8 and using it as a bag of bits.
>> >>
>> >> All thoughts/comments/alternative approaches/etc. highly
appreciated.
>> >> :)
>> >>
>> >> Thanks (and Happy Holidays)!
>> >> George
>> >
>> >
>> >
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > llvm-dev at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >
>>
>>
>>
>> --
>> Sanjoy Das
>> http://playingwithpointers.com
>
>

llvm dev - Jan 2017 - RFC: Allowing @llvm.objectsize to be more conservative with null.

[llvm-dev] RFC: Allowing @llvm.objectsize to be more conservative with null.

[llvm-dev] RFC: Allowing @llvm.objectsize to be more conservative with null.

[llvm-dev] RFC: Allowing @llvm.objectsize to be more conservative with null.