Ralf Jung via llvm-dev
2021-Jun-22 17:21 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
Hi,> Your proposal generally relies on certain optimizations not applying > to pointers because they mess up provenance as represented in > transitive use-dependencies. If those optimizations can be applied > to integers, you can lose use-dependencies in exactly the same way as > you can with pointers. Not doing the inttoptr(ptrtoint p)) -> p > reduction doesn’t matter at all, because in either case that value > has a use-dependency on p, whereas the actual problem is that there > is no longer a use-dependency on some other value.Note that "provenance" as we use it in this discussion is an *explicit operational artifact* -- it exists as a concrete piece of state in the Abstract Machine. That is very different from something that might just be used internally in some kind of analysis. There is no problem with "resetting" that provenance on a "inttoptr", and basically forgetting about where the int comes from. Note that this is a statement about an operation in the Abstract Machine, not merely a statement about some analysis: this is not "forgetting" as in "safely overapproximating the real set of possible behaviors", it is "forgetting" by *forcing* the provenance to be some kind of wildcard/default provenance. All analyses then have to correctly account for that.> For example, you have compellingly argued that it’s problematic to > do the reduction |a == b ? a : b| to |b| for pointer types. Suppose > I instead do this optimization on integers where |a = ptrtoint A|. > The result is now simply |b|. If I |inttoptr| that result and access > the memory, there will be no record that that access may validly > be to |A|. It does not help that the access may be represented > as |inttoptr (ptrtoint B)| for some |B| rather than just directly > to |B|, because there is no use-dependence on |A|. All there is > is an apparently unrelated and unused |ptrtoint A|.So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B" being replaced by "ptrtoint B"? I don't see any problem with that. Do you have a concrete example?> Obviously we can avoid doing this optimization locally when we > see that the inputs result from |ptrtoint|, but that’s no more > than best-effort: we can do this optimization in a function which > we later inline in a caller that performs all the |ptrtoint| and > |inttoptr| casts.I agree "avoiding something locally" is way too fragile to be considered a solution. I do not think it is needed, though. Kind regards, Ralf
John McCall via llvm-dev
2021-Jun-22 18:52 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
On 22 Jun 2021, at 13:21, Ralf Jung wrote:>> Your proposal generally relies on certain optimizations not applying >> to pointers because they mess up provenance as represented in >> transitive use-dependencies. If those optimizations can be applied >> to integers, you can lose use-dependencies in exactly the same way as >> you can with pointers. Not doing the inttoptr(ptrtoint p)) -> p >> reduction doesn’t matter at all, because in either case that value >> has a use-dependency on p, whereas the actual problem is that there >> is no longer a use-dependency on some other value. > > Note that "provenance" as we use it in this discussion is an *explicit > operational artifact* -- it exists as a concrete piece of state in the > Abstract Machine. That is very different from something that might > just be used internally in some kind of analysis. > > There is no problem with "resetting" that provenance on a "inttoptr", > and basically forgetting about where the int comes from. Note that > this is a statement about an operation in the Abstract Machine, not > merely a statement about some analysis: this is not "forgetting" as in > "safely overapproximating the real set of possible behaviors", it is > "forgetting" by *forcing* the provenance to be some kind of > wildcard/default provenance. All analyses then have to correctly > account for that.But it’s not a truly wildcard provenance. At the very least, it’s restricted to the set of provenances that have been exposed, and my understanding is that Juneyoung’s non-determinism rule attempts to readmit a use-dependency analysis and turn `ptrtoint` back into a simple scalar operation.>> For example, you have compellingly argued that it’s problematic to >> do the reduction |a == b ? a : b| to |b| for pointer types. Suppose >> I instead do this optimization on integers where |a = ptrtoint A|. >> The result is now simply |b|. If I |inttoptr| that result and access >> the memory, there will be no record that that access may validly >> be to |A|. It does not help that the access may be represented >> as |inttoptr (ptrtoint B)| for some |B| rather than just directly >> to |B|, because there is no use-dependence on |A|. All there is >> is an apparently unrelated and unused |ptrtoint A|. > > So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B" > being replaced by "ptrtoint B"? I don't see any problem with that. Do > you have a concrete example?I think you can take any example where pointer-type restrictions would be necessary to protect against miscompilation and turn it into an example here by just inserting `inttoptr` and `ptrtoint` appropriately. Quick example: ``` int A = 0x1; int B = 0x2; long a = (long) (A+1); long b = (long) B; long result = (a == b ? a : b); if (a == b) *(((int*) result) - 1) = 0x4; else *((int*) result) = 0x8; printf(“%d %d\n”, A, B); ``` I submit that this program has unspecified but not undefined behavior, with printing “1 8” and “4 2” being the only two valid outcomes. But I think an optimizer which changes the fifth line to `long result = b;` without leaving any trace behind could easily compile this to print “1 2” because there would be nothing to prevent the initialization of `A` from being forwarded to the final load. You can prevent this by noting that the provenance of `A` has been xposed and allowing the `inttoptr` of `result` to alias `A`, but that seems inconsistent with treating `ptrtoint` as a simple scalar operation and allowing a use-analysis of a `ptrtoint` to restrict which `inttoptr` casts are allowed to recreate provenance for the `ptrtoint` operand. If you want to keep treating `ptrtoint` as a scalar operation and doing use-analyses on it, I think the most palatable option is to recognize whenever you’re cutting a use-dependency and conservatively record in IR that the original value has now been exposed. So if you start with this: ``` %eq = icmp eq i32 %a, %b %result = select i1 %eq, i32 %a, i32 %b ``` You have to transform it like this: ``` %result = %b call void @llvm.expose.i32(i32 %a) ``` You should be able to remove these exposure events in a lot of situations, but conservatively they’ll have to be treated as escapes. Most optimizations never cut use-dependencies on opaque values like this and so won’t be affected. John. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210622/e44e1d7c/attachment.html>