Keno Fischer via llvm-dev
2017-Apr-06 21:50 UTC
[llvm-dev] Dereferenceable load semantics & LICM
I left a comment on https://reviews.llvm.org/D18738 earlier which I feel is related to this discussion, so I'm reproducing it here:

I wonder though if what we want to express isn't some sort of "type-based dereferenceability annotations". For example, the semantics I care about are essentially: "if you know you have a dereferenceable pointer, you can go and dereference any other valid (managed) pointers that the pointed-to data references (recursively)". This has to be true for me, because the GC walks all pointers recursively that way. Of course, the problem with this is that the compiler doesn't know which parts of the referenced data are valid pointers for this purpose (and it can't just be based on the struct type, because we do store pointers to unmanaged data). So if we had a way to express to the compiler "here are the pointers in this struct that you may assume dereferenceable", that would work very well for me.

On Thu, Apr 6, 2017 at 5:47 PM, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi Piotr,
>
> On April 6, 2017 at 9:28:58 AM, Piotr Padlewski
> (piotr.padlewski at gmail.com) wrote:
>> Hi Sanjoy,
>> My point is that this is exactly the same way that normal !dereferenceable
>> introduces UB.
>>
>> ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8}
>> if (false) {
>>   int val = *ptr;
>> }
>
> These are two different things.
>
> In the above example, your original program executes a load that was
> supposed to produce a dereferenceable value, but it did not. This
> means the original program is undefined, so we can optimize it to do
> whatever we want.
>
> OTOH, look at this program:
>
> void main() {
>   if (false) {
>     // ptrptr does not contain a dereferenceable pointer, but is
>     // itself dereferenceable
>     ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8}, !global
>     int val = *ptr;
>   }
> }
>
> What is the behavior of the above program? It better be well defined,
> since it does nothing!
>
> However, if we follow your rules, we can first xform it to
>
> void main() {
>   ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8}, !global
>   if (false) {
>     int val = *ptr;
>   }
> }
>
> and then to
>
> void main() {
>   ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8}, !global
>   int val = *ptr;
>   if (false) {
>   }
> }
>
> which introduces UB into a program that was well defined to begin
> with.
>
>> If the frontend says that something is dereferenceable, which is not actually
>> dereferenceable, then it is UB and anything can happen - like the
>> execution of a dead instruction.
>> This is exactly the same with the global properties - we are giving a
>> guarantee that the pointer is dereferenceable even if we would hoist or sink
>> it, and if it is not true then it is UB.
>
> But then you're saying dead code (code that would not actually execute
> in the original program) can affect program behavior, and: "if (false)
> X;" is not a no-op for some values of X. It is in this respect that
> your proposal is similar to the speculatable proposal.
>
> -- Sanjoy
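The two-step transformation in the quoted message can be played out in a small model. This is only a sketch: `Trap`, `load`, `MEMORY`, and `PTRPTR` are hypothetical names invented for illustration, and raising an exception is merely a stand-in for undefined behavior (real UB need not trap). The model assumes `%ptrptr` is itself dereferenceable but holds a pointer that is not.

```python
class Trap(Exception):
    """Stand-in for undefined behavior caused by an invalid memory access."""


def load(memory, addr):
    # In this model an address is dereferenceable iff it is a key of `memory`.
    if addr not in memory:
        raise Trap(f"load from non-dereferenceable address {addr:#x}")
    return memory[addr]


PTRPTR = 0x1000
# ptrptr is dereferenceable, but the pointer stored there (0xdead) is not.
MEMORY = {PTRPTR: 0xDEAD}


def original(cond):
    # Both loads sit in the guarded block, so nothing runs when cond is False.
    if cond:
        ptr = load(MEMORY, PTRPTR)
        val = load(MEMORY, ptr)
    return "ok"


def load_hoisted(cond):
    # Step 1: hoist the first load. It is harmless here, since ptrptr is valid.
    ptr = load(MEMORY, PTRPTR)
    if cond:
        val = load(MEMORY, ptr)
    return "ok"


def deref_hoisted(cond):
    # Step 2: hoist the dereference too, trusting the claim attached to the
    # first load. The claim is false for this memory, so the access is invalid.
    ptr = load(MEMORY, PTRPTR)
    val = load(MEMORY, ptr)
    if cond:
        pass
    return "ok"
```

With `cond` set to `False`, `original` and `load_hoisted` return normally, while `deref_hoisted` raises `Trap`: the guarded block never ran in the source program, yet the transformed program performs the invalid access.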
Sanjoy Das via llvm-dev
2017-Apr-06 21:54 UTC
[llvm-dev] Dereferenceable load semantics & LICM
+CC Artur

On April 6, 2017 at 2:51:17 PM, Keno Fischer (keno at juliacomputing.com) wrote:
> I left a comment on https://reviews.llvm.org/D18738 earlier which I
> feel is related to this discussion, so I'm reproducing it here:
>
> I wonder though if what we want to express isn't some sort of
> "type-based dereferenceability annotations". For example, the semantics
> I care about are essentially: "if you know you have a dereferenceable
> pointer, you can go and dereference any other valid (managed) pointers
> that the pointed-to data references (recursively)". This has to be true for
> me, because the GC walks all pointers recursively that way. Of course,
> the problem with this is that the compiler doesn't know which parts of
> the referenced data are valid pointers for this purpose (and it can't
> just be based on the struct type, because we do store pointers to
> unmanaged data). So if we had a way to express to the compiler "here
> are the pointers in this struct that you may assume dereferenceable",
> that would work very well for me.

That is very close to what we (Azul) do internally (I think Artur covered some of this in his EuroLLVM talk).

If we have:

  if (x instanceof java.lang.String) {
    if (arbitraryCondition) {
      char[] v = x.value;
    }
  }

we know we can hoist the load of x.value outside the arbitraryCondition check (modulo aliasing).

-- Sanjoy
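The hoisting argument in the instanceof example can be sketched in a few lines. This is an illustrative model only, not Azul's implementation: `JString` and `JInteger` are hypothetical stand-ins for Java classes, and the sketch ignores aliasing (no store to the field can occur between the hoisted load and its use).

```python
from dataclasses import dataclass


@dataclass
class JString:
    value: list  # stands in for the char[] field of java.lang.String


@dataclass
class JInteger:
    n: int


def original(x, arbitrary_condition):
    v = None
    if isinstance(x, JString):
        if arbitrary_condition:
            v = x.value
    return v


def hoisted(x, arbitrary_condition):
    v = None
    if isinstance(x, JString):
        # Speculative load: the type check already proves the field exists,
        # so reading it before testing arbitrary_condition cannot fault.
        tmp = x.value
        if arbitrary_condition:
            v = tmp
    return v


def hoisted_above_type_check(x, arbitrary_condition):
    # Not justified: nothing has yet established that x has a `value` field.
    tmp = x.value
    v = None
    if isinstance(x, JString):
        if arbitrary_condition:
            v = tmp
    return v
```

`original` and `hoisted` agree for every input, whereas `hoisted_above_type_check` fails with `AttributeError` on a `JInteger`. In this model the type check, not the inner condition, is what licenses the speculative load.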
Philip Reames via llvm-dev
2017-Apr-07 16:38 UTC
[llvm-dev] Dereferenceable load semantics & LICM
On 04/06/2017 02:54 PM, Sanjoy Das via llvm-dev wrote:
> +CC Artur
>
> On April 6, 2017 at 2:51:17 PM, Keno Fischer (keno at juliacomputing.com) wrote:
>> I left a comment on https://reviews.llvm.org/D18738 earlier which I
>> feel is related to this discussion, so I'm reproducing it here:
>>
>> I wonder though if what we want to express isn't some sort of
>> "type-based dereferenceability annotations". For example, the semantics
>> I care about are essentially: "if you know you have a dereferenceable
>> pointer, you can go and dereference any other valid (managed) pointers
>> that the pointed-to data references (recursively)". This has to be true for
>> me, because the GC walks all pointers recursively that way. Of course,
>> the problem with this is that the compiler doesn't know which parts of
>> the referenced data are valid pointers for this purpose (and it can't
>> just be based on the struct type, because we do store pointers to
>> unmanaged data). So if we had a way to express to the compiler "here
>> are the pointers in this struct that you may assume dereferenceable",
>> that would work very well for me.
>
> That is very close to what we (Azul) do internally (I think Artur
> covered some of this in his EuroLLVM talk).
>
> If we have:
>
>   if (x instanceof java.lang.String) {
>     if (arbitraryCondition) {
>       char[] v = x.value;
>     }
>   }
>
> we know we can hoist the load of x.value outside the
> arbitraryCondition check (modulo aliasing).

As mentioned in the talk and the discussion afterwards, we're open to trying to upstream the logic we've developed around expressing Java types over IR. We'd have to generalize it a bit, but if others would find it helpful, we'd be perfectly willing to share this code.
Artur Pilipenko via llvm-dev
2017-Apr-07 17:51 UTC
[llvm-dev] Dereferenceable load semantics & LICM
On 7 Apr 2017, at 00:54, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> +CC Artur
>
> On April 6, 2017 at 2:51:17 PM, Keno Fischer (keno at juliacomputing.com) wrote:
>> I left a comment on https://reviews.llvm.org/D18738 earlier which I
>> feel is related to this discussion, so I'm reproducing it here:
>>
>> I wonder though if what we want to express isn't some sort of
>> "type-based dereferenceability annotations". For example, the semantics
>> I care about are essentially: "if you know you have a dereferenceable
>> pointer, you can go and dereference any other valid (managed) pointers
>> that the pointed-to data references (recursively)". This has to be true for
>> me, because the GC walks all pointers recursively that way. Of course,
>> the problem with this is that the compiler doesn't know which parts of
>> the referenced data are valid pointers for this purpose (and it can't
>> just be based on the struct type, because we do store pointers to
>> unmanaged data). So if we had a way to express to the compiler "here
>> are the pointers in this struct that you may assume dereferenceable",
>> that would work very well for me.
>
> That is very close to what we (Azul) do internally (I think Artur
> covered some of this in his EuroLLVM talk).

The slides and videos from the conference have not yet been published, but I posted my slides on slideshare:
https://www.slideshare.net/secret/Bj7JJPimlxjwGb

In a few words, we introduced the notion of Java types, which are expressed in the IR using attributes and metadata, and we derive various facts from the Java types, including dereferenceability.

Artur
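One way to picture the type-derived dereferenceability discussed in this thread is a per-type table naming which fields hold managed pointers, from which the set of pointers that may be assumed dereferenceable is computed by walking those fields recursively, much as a GC would. This is a hypothetical sketch, not the Azul or Julia implementation: `MANAGED_FIELDS`, `HEAP`, the type names, and the treatment of `None` as a null pointer are all assumptions made for illustration.

```python
# Hypothetical per-type table: the fields that hold managed pointers.
# "raw" is deliberately absent from "Node": it holds an unmanaged pointer,
# so the struct type alone is not enough to decide what may be followed.
MANAGED_FIELDS = {
    "Node": ["next", "payload"],
    "Box": [],
}

# Toy heap: address -> object. 0xBAD is an unmanaged address with no entry.
HEAP = {
    1: {"type": "Node", "next": 2, "payload": 3, "raw": 0xBAD},
    2: {"type": "Node", "next": None, "payload": None, "raw": None},
    3: {"type": "Box"},
}


def assumed_dereferenceable(root):
    """Return every address reachable from `root` through managed fields only."""
    seen = set()
    stack = [root]
    while stack:
        addr = stack.pop()
        if addr is None or addr in seen:
            continue  # skip null pointers and objects already visited
        seen.add(addr)
        obj = HEAP[addr]
        # Follow only the fields the table marks as managed pointers.
        for field in MANAGED_FIELDS[obj["type"]]:
            stack.append(obj[field])
    return seen
```

Starting from address 1, the walk reaches addresses 1, 2, and 3 but never touches `0xBAD`, because the `raw` field is not listed as managed.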