Sanjoy Das via llvm-dev
2017-Jan-02 08:01 UTC
[llvm-dev] RFC: Allowing @llvm.objectsize to be more conservative with null.
Hi George, Have you considered changing our existing behavior to match GCC's builtin_object_size instead of adding a new parameter? That may be simpler overall. There's also a clear upgrade strategy -- fold every old style call to "<min> ? 0 : 1". You probably already know this, but GCC folds builtin_object_size(0, 0) to -1 and builtin_object_size(0, 2) to 0. We'll have to have some <min>-awareness either in clang (to decide if the <null-is-unknown> bit needs to be set) or in the middle end. What is your plan here? I also found gcc's choice of folding builtin_object_size(0, 2) to 0 and builtin_object_size(0, 0) to -1 somewhat odd; I'd have expected the inverse. This follows from the following "intuitive" rules ObjSizeMax(X) = UMAX(ObjSizeMax(A), ObjSizeMax(B)) ObjSizeMin(X) = UMIN(ObjSizeMin(A), ObjSizeMin(B)) (where X is a value that can either be A or B at runtime) and that we want to fold ObjSizeMax(Malloc(N)) = ObjSizeMin(Malloc(N)) = N However, since Malloc can return null: ObjSizeMax(Malloc(N)) = UMAX(N, ObjSizeMax(NULL)) = N ObjSizeMin(Malloc(N)) = UMIN(N, ObjSizeMin(NULL)) = N and for that to be true ObjSizeMax(NULL) builtin_object_size(NULL, 0) needs to be 0 and ObjSizeMin(NULL) builtin_object_size(NULL, 2) needs to be (unsigned)-1. However, I'm not sure if it is up to us to change that; given the very motivation of this thread is GCC compatibility. -- Sanjoy On Sun, Jan 1, 2017 at 10:03 PM, George Burgess IV via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Happy New Year ping. :) > > Will ping again on Wednesday. If I don't get comments by EOD Thursday, I'll > assume everyone's OK with this and put together a patch. > > On Wed, Dec 21, 2016 at 11:44 AM, George Burgess IV > <george.burgess.iv at gmail.com> wrote: >> >> tl;dr: We'd like to add a bit to @llvm.objectsize to make it optionally be >> conservative when it's handed a null pointer. >> >> Happy Wednesday! >> >> We're trying to fix PR23277, which is a bug about how clang+LLVM treat >> __builtin_object_size when it's given a null pointer. For compatibility with >> GCC, clang would like to be able to hand back a conservative result (e.g. >> (size_t)-1), but the LLVM intrinsic that clang lowers >> __builtin_object_size(p, N) to, @llvm.objectsize, only hands back 0 when >> given a null pointer. Because it's often assumed that __builtin_object_size >> always folds to a constant, handling this only in clang may be tricky: if we >> emit the IR equivalent to ((p) ? __builtin_object_size(p, N) : -1ULL) and >> LLVM can't fold away the null check, we've failed to emit a constant. >> >> So, the best path forward that I can see is to add a "null is unknown >> size" bit to @llvm.objectsize, much like the "min" bit it already has. If >> this bit is true, null would be treated like an opaque pointer. Otherwise, >> @llvm.objectsize would act no differently than it does today. >> >> If we like this approach, I'm assuming it would be best to have this bit >> as a third argument to @llvm.objectsize, rather than making the second >> argument an i8 and using it as a bag of bits. >> >> All thoughts/comments/alternative approaches/etc. highly appreciated. :) >> >> Thanks (and Happy Holidays)! >> George > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- Sanjoy Das http://playingwithpointers.com
George Burgess IV via llvm-dev
2017-Jan-02 18:41 UTC
[llvm-dev] RFC: Allowing @llvm.objectsize to be more conservative with null.
Thanks for the comments!> Have you considered changing our existing behavior to match GCC's > builtin_object_size instead of adding a new parameterYup! My issue with turning `i1 %min` into `i8 %flags` is that __builtin_object_size would get lowered like: __builtin_object_size(p, 0) -> @llvm.objectsize(i8* %p, i8 2) __builtin_object_size(p, 1) -> @llvm.objectsize(i8* %p, i8 2) __builtin_object_size(p, 2) -> @llvm.objectsize(i8* %p, i8 3) (__builtin_object_size(p, 3) doesn't actually get lowered) ...Which might be a bit surprising to some people. If we think that's a non-issue, I'm happy to take the simpler approach and use an i8.> We'll have to have some <min>-awareness either in clang (to decide if > the <null-is-unknown> bit needs to be set) or in the middle end. What > is your plan here?My plan was just to always set the `null-is-unknown` bit when lowering a call to __builtin_object_size in clang. If `min` is true, we treat unknown and null values identically in @llvm.objectsize, so whether `null-is-unknown` is set in that case shouldn't matter.> However, since Malloc can return null:I think I was unclear: this behavior would only exist if @llvm.objectsize was actually handed null. I wasn't planning on changing how we'd handle memory allocation functions that may return null (GCC gives back 2 for __builtin_object_size(malloc(2), 0)). In other words, this `null-is-unknown` bit only makes the objectsize evaluator see `T* null` as though it was `call T* @opaque_user_defined_function()`, nothing else. This is also consistent with how @llvm.objectsize already acts: if the pointer it's given is the result of a call to `malloc(N)`, it'll return N regardless of the value of `min`. But yeah, I agree that some of the features of __builtin_object_size aren't entirely intuitive. :) On Mon, Jan 2, 2017 at 12:01 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:> Hi George, > > Have you considered changing our existing behavior to match GCC's > builtin_object_size instead of adding a new parameter? That may be > simpler overall. There's also a clear upgrade strategy -- fold every > old style call to "<min> ? 0 : 1". > > You probably already know this, but GCC folds > builtin_object_size(0, 0) to -1 and builtin_object_size(0, 2) to 0. > We'll have to have some <min>-awareness either in clang (to decide if > the <null-is-unknown> bit needs to be set) or in the middle end. What > is your plan here? > > I also found gcc's choice of folding builtin_object_size(0, 2) to 0 and > builtin_object_size(0, 0) to -1 somewhat odd; I'd have expected the > inverse. This follows from the following "intuitive" rules > > ObjSizeMax(X) = UMAX(ObjSizeMax(A), ObjSizeMax(B)) > ObjSizeMin(X) = UMIN(ObjSizeMin(A), ObjSizeMin(B)) > > (where X is a value that can either be A or B at runtime) > > and that we want to fold > > ObjSizeMax(Malloc(N)) = ObjSizeMin(Malloc(N)) = N > > However, since Malloc can return null: > > ObjSizeMax(Malloc(N)) = UMAX(N, ObjSizeMax(NULL)) = N > ObjSizeMin(Malloc(N)) = UMIN(N, ObjSizeMin(NULL)) = N > > and for that to be true ObjSizeMax(NULL) > builtin_object_size(NULL, 0) needs to be 0 and ObjSizeMin(NULL) > builtin_object_size(NULL, 2) needs to be (unsigned)-1. > > However, I'm not sure if it is up to us to change that; given the very > motivation of this thread is GCC compatibility. > > -- Sanjoy > > On Sun, Jan 1, 2017 at 10:03 PM, George Burgess IV via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > Happy New Year ping. :) > > > > Will ping again on Wednesday. If I don't get comments by EOD Thursday, > I'll > > assume everyone's OK with this and put together a patch. > > > > On Wed, Dec 21, 2016 at 11:44 AM, George Burgess IV > > <george.burgess.iv at gmail.com> wrote: > >> > >> tl;dr: We'd like to add a bit to @llvm.objectsize to make it optionally > be > >> conservative when it's handed a null pointer. > >> > >> Happy Wednesday! > >> > >> We're trying to fix PR23277, which is a bug about how clang+LLVM treat > >> __builtin_object_size when it's given a null pointer. For compatibility > with > >> GCC, clang would like to be able to hand back a conservative result > (e.g. > >> (size_t)-1), but the LLVM intrinsic that clang lowers > >> __builtin_object_size(p, N) to, @llvm.objectsize, only hands back 0 when > >> given a null pointer. Because it's often assumed that > __builtin_object_size > >> always folds to a constant, handling this only in clang may be tricky: > if we > >> emit the IR equivalent to ((p) ? __builtin_object_size(p, N) : -1ULL) > and > >> LLVM can't fold away the null check, we've failed to emit a constant. > >> > >> So, the best path forward that I can see is to add a "null is unknown > >> size" bit to @llvm.objectsize, much like the "min" bit it already has. > If > >> this bit is true, null would be treated like an opaque pointer. > Otherwise, > >> @llvm.objectsize would act no differently than it does today. > >> > >> If we like this approach, I'm assuming it would be best to have this bit > >> as a third argument to @llvm.objectsize, rather than making the second > >> argument an i8 and using it as a bag of bits. > >> > >> All thoughts/comments/alternative approaches/etc. highly appreciated. :) > >> > >> Thanks (and Happy Holidays)! > >> George > > > > > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > > > -- > Sanjoy Das > http://playingwithpointers.com >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170102/99eb1bb1/attachment.html>
Sanjoy Das via llvm-dev
2017-Jan-02 19:45 UTC
[llvm-dev] RFC: Allowing @llvm.objectsize to be more conservative with null.
Hi George, On Mon, Jan 2, 2017 at 10:41 AM, George Burgess IV <george.burgess.iv at gmail.com> wrote:> Thanks for the comments! > >> Have you considered changing our existing behavior to match GCC's >> builtin_object_size instead of adding a new parameter > > Yup! My issue with turning `i1 %min` into `i8 %flags` is that > __builtin_object_size would get lowered like: > > __builtin_object_size(p, 0) -> @llvm.objectsize(i8* %p, i8 2) > __builtin_object_size(p, 1) -> @llvm.objectsize(i8* %p, i8 2) > __builtin_object_size(p, 2) -> @llvm.objectsize(i8* %p, i8 3) > (__builtin_object_size(p, 3) doesn't actually get lowered) > > ...Which might be a bit surprising to some people. If we think that's a > non-issue, I'm happy to take the simpler approach and use an i8.What I had in mind was: __builtin_object_size(p, 0) -> @llvm.objectsize(i8* %p, i1 0) __builtin_object_size(p, 1) -> @llvm.objectsize(i8* %p, i1 0) __builtin_object_size(p, 2) -> @llvm.objectsize(i8* %p, i1 1) __builtin_object_size(p, 3) -> @llvm.objectsize(i8* %p, i1 1) and changing the specification of @llvm.objectsize to match.>> We'll have to have some <min>-awareness either in clang (to decide if >> the <null-is-unknown> bit needs to be set) or in the middle end. What >> is your plan here? > > My plan was just to always set the `null-is-unknown` bit when lowering a > call to __builtin_object_size in clang. If `min` is true, we treat unknown > and null values identically in @llvm.objectsize, so whether > `null-is-unknown` is set in that case shouldn't matter.Continuing from above: in other words, since your use case is always setting the `null-is-unknown` bit, can you re-define @llvm.objectsize to have that semantic without adding a new parameter? That's not backward compatible, but there's a simple but conservative update strategy.>> However, since Malloc can return null: > > I think I was unclear: this behavior would only exist if @llvm.objectsize > was actually handed null. I wasn't planning on changing how we'd handle > memory allocation functions that may return null (GCC gives back 2 for > __builtin_object_size(malloc(2), 0)). In other words, this `null-is-unknown` > bit only makes the objectsize evaluator see `T* null` as though it was `call > T* @opaque_user_defined_function()`, nothing else.I meant to say that it looks like the semantics of ObjectSizeMin(X) is that "return the conservative minimum object size for all values that X may take at runtime" (resp. ObjectSizeMax(X)). For instance this: void i(int c, volatile int* sink) { void* mem1 = malloc(20); void* mem2 = malloc(40); *sink = __builtin_object_size(c ? mem1 : mem2, 0); *sink = __builtin_object_size(c ? mem1 : mem2, 2); } is lowered to movl $40, (%rsi) movl $20, (%rsi) ret by GCC. Applying the same logic to malloc(N), since it returns a location that has N dereferenceable bytes or NULL, it follows that ObjectSizeMin(malloc(N)) should return the smaller of ObjectSizeMin(NULL) and ObjectSizeMin(MemoryLocOfNBytes) == the smaller of ObjectSizeMin(NULL) and N. Given that we want the result of ObjectSizeMin(malloc(N)) == UMIN(ObjectSizeMin(NULL), N) to be N, we'd want ObjectSizeMin(NULL) to be (unsigned)-1 for consistency. However, none of this matters if we're not in a position to change the specification. -- Sanjoy> This is also consistent with how @llvm.objectsize already acts: if the > pointer it's given is the result of a call to `malloc(N)`, it'll return N > regardless of the value of `min`. > > But yeah, I agree that some of the features of __builtin_object_size aren't > entirely intuitive. :) > > On Mon, Jan 2, 2017 at 12:01 AM, Sanjoy Das <sanjoy at playingwithpointers.com> > wrote: >> >> Hi George, >> >> Have you considered changing our existing behavior to match GCC's >> builtin_object_size instead of adding a new parameter? That may be >> simpler overall. There's also a clear upgrade strategy -- fold every >> old style call to "<min> ? 0 : 1". >> >> You probably already know this, but GCC folds >> builtin_object_size(0, 0) to -1 and builtin_object_size(0, 2) to 0. >> We'll have to have some <min>-awareness either in clang (to decide if >> the <null-is-unknown> bit needs to be set) or in the middle end. What >> is your plan here? >> >> I also found gcc's choice of folding builtin_object_size(0, 2) to 0 and >> builtin_object_size(0, 0) to -1 somewhat odd; I'd have expected the >> inverse. This follows from the following "intuitive" rules >> >> ObjSizeMax(X) = UMAX(ObjSizeMax(A), ObjSizeMax(B)) >> ObjSizeMin(X) = UMIN(ObjSizeMin(A), ObjSizeMin(B)) >> >> (where X is a value that can either be A or B at runtime) >> >> and that we want to fold >> >> ObjSizeMax(Malloc(N)) = ObjSizeMin(Malloc(N)) = N >> >> However, since Malloc can return null: >> >> ObjSizeMax(Malloc(N)) = UMAX(N, ObjSizeMax(NULL)) = N >> ObjSizeMin(Malloc(N)) = UMIN(N, ObjSizeMin(NULL)) = N >> >> and for that to be true ObjSizeMax(NULL) >> builtin_object_size(NULL, 0) needs to be 0 and ObjSizeMin(NULL) >> builtin_object_size(NULL, 2) needs to be (unsigned)-1. >> >> However, I'm not sure if it is up to us to change that; given the very >> motivation of this thread is GCC compatibility. >> >> -- Sanjoy >> >> On Sun, Jan 1, 2017 at 10:03 PM, George Burgess IV via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >> > Happy New Year ping. :) >> > >> > Will ping again on Wednesday. If I don't get comments by EOD Thursday, >> > I'll >> > assume everyone's OK with this and put together a patch. >> > >> > On Wed, Dec 21, 2016 at 11:44 AM, George Burgess IV >> > <george.burgess.iv at gmail.com> wrote: >> >> >> >> tl;dr: We'd like to add a bit to @llvm.objectsize to make it optionally >> >> be >> >> conservative when it's handed a null pointer. >> >> >> >> Happy Wednesday! >> >> >> >> We're trying to fix PR23277, which is a bug about how clang+LLVM treat >> >> __builtin_object_size when it's given a null pointer. For compatibility >> >> with >> >> GCC, clang would like to be able to hand back a conservative result >> >> (e.g. >> >> (size_t)-1), but the LLVM intrinsic that clang lowers >> >> __builtin_object_size(p, N) to, @llvm.objectsize, only hands back 0 >> >> when >> >> given a null pointer. Because it's often assumed that >> >> __builtin_object_size >> >> always folds to a constant, handling this only in clang may be tricky: >> >> if we >> >> emit the IR equivalent to ((p) ? __builtin_object_size(p, N) : -1ULL) >> >> and >> >> LLVM can't fold away the null check, we've failed to emit a constant. >> >> >> >> So, the best path forward that I can see is to add a "null is unknown >> >> size" bit to @llvm.objectsize, much like the "min" bit it already has. >> >> If >> >> this bit is true, null would be treated like an opaque pointer. >> >> Otherwise, >> >> @llvm.objectsize would act no differently than it does today. >> >> >> >> If we like this approach, I'm assuming it would be best to have this >> >> bit >> >> as a third argument to @llvm.objectsize, rather than making the second >> >> argument an i8 and using it as a bag of bits. >> >> >> >> All thoughts/comments/alternative approaches/etc. highly appreciated. >> >> :) >> >> >> >> Thanks (and Happy Holidays)! >> >> George >> > >> > >> > >> > _______________________________________________ >> > LLVM Developers mailing list >> > llvm-dev at lists.llvm.org >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > >> >> >> >> -- >> Sanjoy Das >> http://playingwithpointers.com > >