Daniel Berlin via llvm-dev
2017-Apr-12 01:16 UTC
[llvm-dev] Potential issue with noalias @malloc and @realloc
The corresponding line does not exist, and it is in fact, wrong :) C11 says nothing like that. C++14 says: "The lifetime of an object of type T begins when: — storage with the proper alignment and size for type T is obtained, and — if the object has non-trivial initialization, its initialization is complete. The lifetime of an object of type T ends when: — if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or — the storage which the object occupies is reused or released." it also says: "If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if: ... a bunch of conditions". Which makes it even worse because they become aliasing again. On Tue, Apr 11, 2017 at 5:09 PM, Flamedoge via llvm-dev < llvm-dev at lists.llvm.org> wrote:> I don't know when this was added on cppreference but > > > The behavior is undefined if after free() returns, an access is made > through the pointer ptr (unless another allocation function happened to > result in a pointer value equal to ptr) > > This seems to suggest that there is no UB... However, I couldn't find the > corresponding line or relevant part on latest C std, > http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf > > Regards, > Kevin > > On Tue, Apr 11, 2017 at 4:27 PM, Sanjoy Das <sanjoy at playingwithpointers. > com> wrote: > >> Hi Kevin, >> >> On April 11, 2017 at 4:14:14 PM, Flamedoge (code.kchoi at gmail.com) wrote: >> > So only "non-freed" malloc pointers are No-Alias which makes it >> > flow-sensitive. There is no reason why malloc couldn't return previously >> > freed location. >> >> Yes. >> >> Talking to Nick Lewycky on IRC, I figured out a shorter way of saying >> what I wanted to say. We know that programs like this are UB in C: >> >> p0 = malloc(); >> free(p0); >> p1 = malloc(); >> if (p0 == p1) { >> int v = *p0; // Semantically free'ed but bitwise equal to an allocated >> value >> } >> >> and we relied on them having UB when marking malloc's return value as >> noalias. >> >> However, we can end up in cases like the above by applying >> loop-unswitch + GVN to well defined C programs. >> >> -- Sanjoy >> > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170411/e2e8728c/attachment-0001.html>
Sanjoy Das via llvm-dev
2017-Apr-12 01:45 UTC
[llvm-dev] Potential issue with noalias @malloc and @realloc
Hi Daniel, On April 11, 2017 at 6:16:32 PM, Daniel Berlin (dberlin at dberlin.org) wrote:> The corresponding line does not exist, and it is in fact, wrong :) > > C11 says nothing like that. > C++14 says: > "The lifetime of an object of type T begins when: > — storage with the proper alignment and size for type T is obtained, and — > if the object has non-trivial initialization, its initialization is > complete. > The lifetime of an object of type T ends when: > — if T is a class type with a non-trivial destructor (12.4), the destructor > call starts, or > — the storage which the object occupies is reused or released." > > it also says: > "If, after the lifetime of an object has ended and before the storage which > the object occupied is reused or released, a new object is created at the > storage location which the original object occupied, a pointer that pointed > to the original object, a reference that referred to the original object, > or the name of the original object will automatically refer to the new > object and, once the lifetime of the new object has started, can be used to > manipulate the new object, if: > ... > a bunch of conditions".Quickly scanning the spec, it looks like it tries to differentiate between the "lifetime" of an object (starts on construction, ends on destruction) and the time during which the storage in which the object resides has been acquired and not been released. In particular, it says "If, after the lifetime of an object has ended and **before** the storage which the object occupied is reused or released", so I suspect that wording is there to allow: X* val = new X; val->~X(); new(val) X; val->foo(); // valid use Also, basic.stc.dynamic.deallocation says: If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value (4.10), the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers referring to any part of the deallocated storage. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. *Any other use of an invalid pointer value has implementation-defined behavior.* -- Sanjoy
Richard Smith via llvm-dev
2017-Apr-12 19:06 UTC
[llvm-dev] Potential issue with noalias @malloc and @realloc
On 11 April 2017 at 18:16, Daniel Berlin via llvm-dev < llvm-dev at lists.llvm.org> wrote:> The corresponding line does not exist, and it is in fact, wrong :) > > C11 says nothing like that. > C++14 says: > "The lifetime of an object of type T begins when: > — storage with the proper alignment and size for type T is obtained, and — > if the object has non-trivial initialization, its initialization is > complete. > The lifetime of an object of type T ends when: > — if T is a class type with a non-trivial destructor (12.4), the > destructor call starts, or > — the storage which the object occupies is reused or released." > > it also says: > "If, after the lifetime of an object has ended and before the storage > which the object occupied is reused or released, a new object is created at > the storage location which the original object occupied, a pointer that > pointed to the original object, a reference that referred to the original > object, or the name of the original object will automatically refer to the > new object and, once the lifetime of the new object has started, can be > used to manipulate the new object, if: >This wording describes what happens if you, for instance, placement new over an existing object. It is not intended to cover the case where you happen to reallocate storage you previously freed; see below for the rule on that case.> ... > a bunch of conditions". > > Which makes it even worse because they become aliasing again. >I believe this is the portion of the C++ standard you're looking for: [basic.stc]/4: "When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values (6.9.2). Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior." (In C++14 and before, this was [basic.dynamic.deallocation]p4 and only applied to the effects of invoking 'operator delete', but we added the above wording as a defect resolution so it's intended to apply retroactively.) So the comparison in the source program has UB, and the loop unswitching transformation is therefore invalid as an operation on C++ programs. (I don't think this observation helps, though, since we -- presumably -- want the transformation to be valid as an operation on LLVM IR...) It seems to me that there are two ways of thinking about this: either the value of a pointer in IR is richer than its bit sequence, in which case replacing p1 with p0 in a block predicated by p0 == p1 is an incorrect transformation if you cannot prove that one pointer was based on the other, or the value of a pointer in IR is exactly its bit sequence, in which case the code performing the transformation incorrectly updated the IR and a correct transformation would need to somehow remove the noalias from the malloc calls. The C++ object model formally takes the former standpoint; its pointers notionally point to objects, which are abstract entities occupying storage, rather than pointing to the storage itself. On Tue, Apr 11, 2017 at 5:09 PM, Flamedoge via llvm-dev <> llvm-dev at lists.llvm.org> wrote: > >> I don't know when this was added on cppreference but >> >> > The behavior is undefined if after free() returns, an access is made >> through the pointer ptr (unless another allocation function happened to >> result in a pointer value equal to ptr) >> >> This seems to suggest that there is no UB... However, I couldn't find the >> corresponding line or relevant part on latest C std, >> http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf >> >> Regards, >> Kevin >> >> On Tue, Apr 11, 2017 at 4:27 PM, Sanjoy Das < >> sanjoy at playingwithpointers.com> wrote: >> >>> Hi Kevin, >>> >>> On April 11, 2017 at 4:14:14 PM, Flamedoge (code.kchoi at gmail.com) wrote: >>> > So only "non-freed" malloc pointers are No-Alias which makes it >>> > flow-sensitive. There is no reason why malloc couldn't return >>> previously >>> > freed location. >>> >>> Yes. >>> >>> Talking to Nick Lewycky on IRC, I figured out a shorter way of saying >>> what I wanted to say. We know that programs like this are UB in C: >>> >>> p0 = malloc(); >>> free(p0); >>> p1 = malloc(); >>> if (p0 == p1) { >>> int v = *p0; // Semantically free'ed but bitwise equal to an allocated >>> value >>> } >>> >>> and we relied on them having UB when marking malloc's return value as >>> noalias. >>> >>> However, we can end up in cases like the above by applying >>> loop-unswitch + GVN to well defined C programs. >>> >>> -- Sanjoy >>> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170412/bc646f76/attachment.html>
Daniel Berlin via llvm-dev
2017-Apr-12 19:19 UTC
[llvm-dev] Potential issue with noalias @malloc and @realloc
> > > It seems to me that there are two ways of thinking about this: either the > value of a pointer in IR is richer than its bit sequence, in which case > replacing p1 with p0 in a block predicated by p0 == p1 is an incorrect > transformation if you cannot prove that one pointer was based on the other, >Which would be a non-starter just from the cost of doing so, not to mention the allowable optimization you lose :)> or the value of a pointer in IR is exactly its bit sequence, in which case > the code performing the transformation incorrectly updated the IR and a > correct transformation would need to somehow remove the noalias from the > malloc calls. >Sure, but noalias is just a symptom here. You can make the same thing occur in other ways. It's fundamentally an issue of being able to express when the abstract identity of a pointer has changed even when the bit-value has not. IE there are enough transformation updates that need to occur that we probably need to do something different than try to band-aid/patch all the places that will have this issue. The C++ object model formally takes the former standpoint; its pointers> notionally point to objects, which are abstract entities occupying storage, > rather than pointing to the storage itself. >Which, i get why they do (in fact i would do the same), but saying the abstract objects have an identity outside of the bit values of the pointers means that the IR's need to be able to represent identity changes separately from value changes (this is what i meant by "add support for describing lifetimes that has semantic meaning"). I'm not aware of any compiler that does this effectively, and it's a fairly large semantic change. They all pretty much hack it and hope they don't break too much shit. Separately, changing noalias would just band-aid it. You can make the same thing occur with TBAA, placement new, or really, any way we have where the abstract identity may change but llvm doesn't express it. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170412/b39afb4c/attachment.html>