thr3ads.net - llvm dev - [llvm-dev] [RFC] Introducing a byte type to LLVM [Jun 2021]

If this information is useful, please help other people find it:
Share via:

Ralf Jung via llvm-dev

2021-Jun-22 17:21 UTC

[llvm-dev] [RFC] Introducing a byte type to LLVM

Hi,
> Your proposal generally relies on certain optimizations not applying
> to pointers because they mess up provenance as represented in
> transitive use-dependencies. If those optimizations can be applied
> to integers, you can lose use-dependencies in exactly the same way as
> you can with pointers. Not doing the inttoptr(ptrtoint p)) -> p
> reduction doesn’t matter at all, because in either case that value
> has a use-dependency on p, whereas the actual problem is that there
> is no longer a use-dependency on some other value.
Note that "provenance" as we use it in this discussion is an *explicit
operational artifact* -- it exists as a concrete piece of state in the Abstract 
Machine. That is very different from something that might just be used 
internally in some kind of analysis.

There is no problem with "resetting" that provenance on a
"inttoptr", and
basically forgetting about where the int comes from. Note that this is a 
statement about an operation in the Abstract Machine, not merely a statement 
about some analysis: this is not "forgetting" as in "safely
overapproximating
the real set of possible behaviors", it is "forgetting" by
*forcing* the
provenance to be some kind of wildcard/default provenance. All analyses then 
have to correctly account for that.
> For example, you have compellingly argued that it’s problematic to
> do the reduction |a == b ? a : b| to |b| for pointer types. Suppose
> I instead do this optimization on integers where |a = ptrtoint A|.
> The result is now simply |b|. If I |inttoptr| that result and access
> the memory, there will be no record that that access may validly
> be to |A|. It does not help that the access may be represented
> as |inttoptr (ptrtoint B)| for some |B| rather than just directly
> to |B|, because there is no use-dependence on |A|. All there is
> is an apparently unrelated and unused |ptrtoint A|.
So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B"
being
replaced by "ptrtoint B"? I don't see any problem with that. Do
you have a
concrete example?
> Obviously we can avoid doing this optimization locally when we
> see that the inputs result from |ptrtoint|, but that’s no more
> than best-effort: we can do this optimization in a function which
> we later inline in a caller that performs all the |ptrtoint| and
> |inttoptr| casts.
I agree "avoiding something locally" is way too fragile to be
considered a
solution. I do not think it is needed, though.

Kind regards,
Ralf

John McCall via llvm-dev

2021-Jun-22 18:52 UTC

head link

[llvm-dev] [RFC] Introducing a byte type to LLVM

On 22 Jun 2021, at 13:21, Ralf Jung wrote:>> Your proposal generally relies on certain optimizations not applying
>> to pointers because they mess up provenance as represented in
>> transitive use-dependencies. If those optimizations can be applied
>> to integers, you can lose use-dependencies in exactly the same way as
>> you can with pointers. Not doing the inttoptr(ptrtoint p)) -> p
>> reduction doesn’t matter at all, because in either case that value
>> has a use-dependency on p, whereas the actual problem is that there
>> is no longer a use-dependency on some other value.
>
> Note that "provenance" as we use it in this discussion is an
*explicit
> operational artifact* -- it exists as a concrete piece of state in the 
> Abstract Machine. That is very different from something that might 
> just be used internally in some kind of analysis.
>
> There is no problem with "resetting" that provenance on a
"inttoptr",
> and basically forgetting about where the int comes from. Note that 
> this is a statement about an operation in the Abstract Machine, not 
> merely a statement about some analysis: this is not "forgetting"
as in
> "safely overapproximating the real set of possible behaviors", it
is
> "forgetting" by *forcing* the provenance to be some kind of 
> wildcard/default provenance. All analyses then have to correctly 
> account for that.
But it’s not a truly wildcard provenance.  At the very least, it’s
restricted to the set of provenances that have been exposed, and
my understanding is that Juneyoung’s non-determinism rule attempts
to readmit a use-dependency analysis and turn `ptrtoint` back into
a simple scalar operation.
>> For example, you have compellingly argued that it’s problematic to
>> do the reduction |a == b ? a : b| to |b| for pointer types. Suppose
>> I instead do this optimization on integers where |a = ptrtoint A|.
>> The result is now simply |b|. If I |inttoptr| that result and access
>> the memory, there will be no record that that access may validly
>> be to |A|. It does not help that the access may be represented
>> as |inttoptr (ptrtoint B)| for some |B| rather than just directly
>> to |B|, because there is no use-dependence on |A|. All there is
>> is an apparently unrelated and unused |ptrtoint A|.
>
> So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint
B"
> being replaced by "ptrtoint B"? I don't see any problem with
that. Do
> you have a concrete example?
I think you can take any example where pointer-type restrictions
would be necessary to protect against miscompilation and turn it
into an example here by just inserting `inttoptr` and `ptrtoint`
appropriately.  Quick example:

```
int A = 0x1;
int B = 0x2;
long a = (long) (A+1);
long b = (long) B;
long result = (a == b ? a : b);
if (a == b)
   *(((int*) result) - 1) = 0x4;
else
   *((int*) result) = 0x8;
printf(“%d %d\n”, A, B);
```

I submit that this program has unspecified but not undefined behavior,
with printing “1 8” and “4 2” being the only two valid outcomes.
But I think an optimizer which changes the fifth line to
`long result = b;` without leaving any trace behind could easily
compile this to print “1 2” because there would be nothing to
prevent the initialization of `A` from being forwarded to the
final load.

You can prevent this by noting that the provenance of `A` has been
xposed and allowing the `inttoptr` of `result` to alias `A`, but
that seems inconsistent with treating `ptrtoint` as a simple scalar
operation and allowing a use-analysis of a `ptrtoint` to restrict
which `inttoptr` casts are allowed to recreate provenance for the
`ptrtoint` operand.

If you want to keep treating `ptrtoint` as a scalar operation and
doing use-analyses on it, I think the most palatable option is to
recognize whenever you’re cutting a use-dependency and
conservatively record in IR that the original value has now been
exposed.  So if you start with this:

```
   %eq = icmp eq i32 %a, %b
   %result = select i1 %eq, i32 %a, i32 %b
```

You have to transform it like this:

```
   %result = %b
   call void @llvm.expose.i32(i32 %a)
```

You should be able to remove these exposure events in a lot of
situations, but conservatively they’ll have to be treated as
escapes.

Most optimizations never cut use-dependencies on opaque values
like this and so won’t be affected.

John.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20210622/e44e1d7c/attachment.html>

llvm dev - Jun 2021 - [RFC] Introducing a byte type to LLVM

[llvm-dev] [RFC] Introducing a byte type to LLVM

[llvm-dev] [RFC] Introducing a byte type to LLVM