Ralf Jung via llvm-dev
2021-Jun-22 19:40 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
Hi John,> Note that "provenance" as we use it in this discussion is an *explicit > operational artifact* -- it exists as a concrete piece of state in the > Abstract Machine. That is very different from something that might just be > used internally in some kind of analysis. > > There is no problem with "resetting" that provenance on a "inttoptr", and > basically forgetting about where the int comes from. Note that this is a > statement about an operation in the Abstract Machine, not merely a statement > about some analysis: this is not "forgetting" as in "safely > overapproximating the real set of possible behaviors", it is "forgetting" by > *forcing* the provenance to be some kind of wildcard/default provenance. All > analyses then have to correctly account for that. > > But it’s not a truly wildcard provenance. At the very least, it’s > restricted to the set of provenances that have been exposed, and > my understanding is that Juneyoung’s non-determinism rule attempts > to readmit a use-dependency analysis and turn |ptrtoint| back into > a simple scalar operation.So, at least in the paper Juneyoung and me and a few other people wrote (the one referenced a couple times in this thread already: https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf), it is a wildcard provenance. "exposed" is not a thing in the operational semantics, it is a theorem we can prove.> > For example, you have compellingly argued that it’s problematic to > do the reduction |a == b ? a : b| to |b| for pointer types. Suppose > I instead do this optimization on integers where |a = ptrtoint A|. > The result is now simply |b|. If I |inttoptr| that result and access > the memory, there will be no record that that access may validly > be to |A|. It does not help that the access may be represented > as |inttoptr (ptrtoint B)| for some |B| rather than just directly > to |B|, because there is no use-dependence on |A|. All there is > is an apparently unrelated and unused |ptrtoint A|. > > So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B" being > replaced by "ptrtoint B"? I don't see any problem with that. Do you have a > concrete example? > > I think you can take any example where pointer-type restrictions > would be necessary to protect against miscompilation and turn it > into an example here by just inserting |inttoptr| and |ptrtoint| > appropriately. Quick example: > > |int A = 0x1; int B = 0x2; long a = (long) (A+1); long b = (long) B; long result > = (a == b ? a : b); if (a == b) *(((int*) result) - 1) = 0x4; else *((int*) > result) = 0x8; printf(“%d %d\n”, A, B); |(Sorry about the bad quote, not sure what Thunderbird does here.) I assume you mean "&A" and "&B" in lines 3 and 4?> I submit that this program has unspecified but not undefined behavior, > with printing “1 8” and “4 2” being the only two valid outcomes.Okay, I can follow that.> But I think an optimizer which changes the fifth line to > |long result = b;| without leaving any trace behindso then we are at int A = 0x1; int B = 0x2; long a = (long) (&A+1); long b = (long) &B; long result = b; if (a == b) *(((int*) result) - 1) = 0x4; else *((int*) result) = 0x8; printf(“%d %d\n”, A, B);> could easily > compile this to print “1 2” because there would be nothing to > prevent the initialization of |A| from being forwarded to the > final load.I don't think that's right, since "a" is still used in the "if", so a bit of information about the address is leaked and you can't assume the address was not guessed. The furthest you can go is int A = 0x1; int B = 0x2; long a = (long) (&A+1); long b = (long) &B; if (a == b) *(((int*) (long) &B) - 1) = 0x4; else *((int*) (long) &B) = 0x8; printf(“%d %d\n”, A, B); No need for an explicit 'exposed', I think. That said, things get more tricky once you want to account for C 'restrict' / LLVM 'noalias', and then it might be necessary to have explicit 'exposed'-like operations. I haven't seen a properly worked-out model for this, but the candidate models I saw would need something like this. So maybe it'd not be a bad idea to have such an operation anyway... I just don't think it should have any effect for the kind of programs we have been discussing here so far. Kind regards, Ralf> > You can prevent this by noting that the provenance of |A| has been > xposed and allowing the |inttoptr| of |result| to alias |A|, but > that seems inconsistent with treating |ptrtoint| as a simple scalar > operation and allowing a use-analysis of a |ptrtoint| to restrict > which |inttoptr| casts are allowed to recreate provenance for the > |ptrtoint| operand. > > If you want to keep treating |ptrtoint| as a scalar operation and > doing use-analyses on it, I think the most palatable option is to > recognize whenever you’re cutting a use-dependency and > conservatively record in IR that the original value has now been > exposed. So if you start with this: > > |%eq = icmp eq i32 %a, %b %result = select i1 %eq, i32 %a, i32 %b | > > You have to transform it like this: > > |%result = %b call void @llvm.expose.i32(i32 %a) | > > You should be able to remove these exposure events in a lot of > situations, but conservatively they’ll have to be treated as > escapes. > > Most optimizations never cut use-dependencies on opaque values > like this and so won’t be affected. > > John. >-- Website: https://people.mpi-sws.org/~jung/
John McCall via llvm-dev
2021-Jun-22 20:52 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
On 22 Jun 2021, at 15:40, Ralf Jung wrote:>> Note that "provenance" as we use it in this discussion is an >> *explicit >> operational artifact* -- it exists as a concrete piece of state >> in the >> Abstract Machine. That is very different from something that >> might just be >> used internally in some kind of analysis. >> >> There is no problem with "resetting" that provenance on a >> "inttoptr", and >> basically forgetting about where the int comes from. Note that >> this is a >> statement about an operation in the Abstract Machine, not merely >> a statement >> about some analysis: this is not "forgetting" as in "safely >> overapproximating the real set of possible behaviors", it is >> "forgetting" by >> *forcing* the provenance to be some kind of wildcard/default >> provenance. All >> analyses then have to correctly account for that. >> >> But it’s not a truly wildcard provenance. At the very least, it’s >> restricted to the set of provenances that have been exposed, and >> my understanding is that Juneyoung’s non-determinism rule attempts >> to readmit a use-dependency analysis and turn |ptrtoint| back into >> a simple scalar operation. > > So, at least in the paper Juneyoung and me and a few other people > wrote (the one referenced a couple times in this thread already: > https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf), it is a > wildcard provenance. > "exposed" is not a thing in the operational semantics, it is a theorem > we can prove.Okay, fine, I won’t say “exposure”. :) You intend to still be able to prove that the reconstructed provenance cannot be that of certain memory locations based on a comprehensive use analysis of those locations.>> For example, you have compellingly argued that it’s >> problematic to >> do the reduction |a == b ? a : b| to |b| for pointer types. >> Suppose >> I instead do this optimization on integers where |a = >> ptrtoint A|. >> The result is now simply |b|. If I |inttoptr| that result and >> access >> the memory, there will be no record that that access may >> validly >> be to |A|. It does not help that the access may be >> represented >> as |inttoptr (ptrtoint B)| for some |B| rather than just >> directly >> to |B|, because there is no use-dependence on |A|. All there >> is >> is an apparently unrelated and unused |ptrtoint A|. >> >> So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : >> ptrtoint B" being >> replaced by "ptrtoint B"? I don't see any problem with that. Do >> you have a >> concrete example? >> >> I think you can take any example where pointer-type restrictions >> would be necessary to protect against miscompilation and turn it >> into an example here by just inserting |inttoptr| and |ptrtoint| >> appropriately. Quick example: >> >> |int A = 0x1; int B = 0x2; long a = (long) (A+1); long b = (long) B; >> long result = (a == b ? a : b); if (a == b) *(((int*) result) - 1) = >> 0x4; else *((int*) result) = 0x8; printf(“%d %d\n”, A, B); | > > (Sorry about the bad quote, not sure what Thunderbird does here.) > I assume you mean "&A" and "&B" in lines 3 and 4?Yes, sorry.>> I submit that this program has unspecified but not undefined >> behavior, >> with printing “1 8” and “4 2” being the only two valid >> outcomes. > > Okay, I can follow that. > >> But I think an optimizer which changes the fifth line to >> |long result = b;| without leaving any trace behind > > so then we are at > > int A = 0x1; > int B = 0x2; > long a = (long) (&A+1); > long b = (long) &B; > long result = b; > if (a == b) > *(((int*) result) - 1) = 0x4; > else > *((int*) result) = 0x8; > printf(“%d %d\n”, A, B); > > >> could easily >> compile this to print “1 2” because there would be nothing to >> prevent the initialization of |A| from being forwarded to the >> final load. > > I don't think that's right, since "a" is still used in the "if", so a > bit of information about the address is leaked and you can't assume > the address was not guessed.Okay, so you have a baseline expectation that pointer comparisons will be treated as escapes. The reason I added a second comparison in my example is just because otherwise in the `a != b` case the access is out of bounds. So I guess you’re suggesting that it’s impossible to construct an example that avoids that problem that wouldn’t ultimately do something uneliminatable that would count as an escape?> That said, things get more tricky once you want to account for C > 'restrict' / LLVM 'noalias', and then it might be necessary to have > explicit 'exposed'-like operations. I haven't seen a properly > worked-out model for this, but the candidate models I saw would need > something like this. So maybe it'd not be a bad idea to have such an > operation anyway... I just don't think it should have any effect for > the kind of programs we have been discussing here so far.Honestly, part of why I like this approach is that I think the basic idea — recognizing when we’re doing a dependency-breaking value replacement and using a different replacement sequence — also moves us towards a solution for `consume` dependencies. John. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210622/e057e169/attachment.html>