thr3ads.net - llvm dev - [llvm-dev] [RFC] Introducing a byte type to LLVM [Jun 2021]


Ralf Jung via llvm-dev

2021-Jun-22 09:58 UTC

[llvm-dev] [RFC] Introducing a byte type to LLVM

Hi John,
> Unfortunately, though, I think this non-determinism still doesn’t allow LLVM
> to be anywhere near as naive about pointer-to-int casts as it is today.
Definitely. There are limits to how naive one can be; beyond those limits, 
miscompilations lurk.
<https://www.ralfj.de/blog/2020/12/14/provenance.html>
explains this by showing such a miscompilation arising from three naive 
optimizations being chained together.
> The rule is intended to allow the compiler to start doing use-analysis
> of exposures; let’s assume that this analysis doesn’t see any
> un-analyzable uses, since of course it would need to conservatively
> treat them as escapes. But if we can optimize uses of integers as if
> they didn’t carry pointer data — say, in a function that takes integer
> parameters — and then we can apply those optimized uses to integers
> that concretely result from pointer-to-int casts — say, by inlining
> that function into one of its callers — can’t we end up with a use
> pattern for one or more of those pointer-to-int casts that no longer
> reflects the fact that it’s been exposed? It seems to me that either
> (1) we cannot do those optimizations on opaque integers or (2) we
> need to record that we did them in a way that, if it turns out that
> they were created by pointer-to-int casts, forces other code to
> treat that pointer as opaquely exposed.
There is a third option: don't optimize away ptr-int-ptr roundtrips. Then you
can still do all the same optimizations on integers that LLVM does today,
completely naively -- the integer world remains "sane". Only the pointer world
has to be "strange".
(You also cannot do things like GVN replacement of *pointer-typed* values, but
for values of integer types this remains unproblematic.)

I don't think it makes sense for LLVM to adopt an explicit "exposed" flag in
its semantics. Reasoning based on non-determinism works fine, and has the advantage
of keeping ptr-to-int casts a pure, side-effect-free operation. This is the 
model we explored in
<https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf>, and
we were able to show quite a few of LLVM's standard optimizations correct 
formally. Some changes are still needed as you noted, but those changes will be 
required anyway even if LLVM were to adopt PNVI-ae:
- No removal of ptr-int-ptr roundtrips. 
(https://bugs.llvm.org/show_bug.cgi?id=34548)
- No GVN replacement of pointer-typed values. 
(https://bugs.llvm.org/show_bug.cgi?id=35229)
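
To make the second item concrete, here is a minimal Python sketch of why replacing one pointer with another that merely compares equal is a problem once pointers carry provenance. The `Ptr` class, the `ALLOCS` table, the helper names, and the addresses are all hypothetical; this is a toy model, not LLVM's API or the paper's formal semantics. It assumes that pointer comparison looks only at addresses, while memory accesses are checked against the allocation named by the pointer's provenance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    addr: int
    prov: str  # name of the allocation this pointer was derived from


# Two hypothetical, adjacent 8-byte allocations.
ALLOCS = {
    "a": range(0x1000, 0x1008),
    "b": range(0x1008, 0x1010),
}


def addr_eq(x: Ptr, y: Ptr) -> bool:
    """Model of a pointer comparison: only the addresses are compared."""
    return x.addr == y.addr


def access_ok(p: Ptr) -> bool:
    """An access is defined only inside the allocation named by the provenance."""
    return p.addr in ALLOCS[p.prov]


p = Ptr(0x1008, "a")  # one past the end of "a"
q = Ptr(0x1008, "b")  # first byte of "b"

if addr_eq(p, q):
    # The source program accesses memory through q, which is fine.
    print("access via q ok:", access_ok(q))
    # A GVN-style rewrite that replaces q with p (because p == q held)
    # would make the same access go through p instead.
    print("access via p ok:", access_ok(p))
```

In this model the comparison succeeds, yet only `q` may be used for the access, so the substitution turns a defined access into an undefined one. For integers there is no hidden component, which is why the same replacement stays unproblematic there.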
>     (I'm not sure whether this is a good place to introduce this, but) we
>     actually have semantics for pointer castings tailored to LLVM (link
>     <https://sf.snu.ac.kr/publications/llvmtwin.pdf>).
>     In this proposal, ptrtoint does not have an escaping side effect; ptrtoint
>     and inttoptr are scalar operations.
>     inttoptr simply returns a pointer which can access any object.
> 
> Skimming your paper, I can see how this works /except/ that I don’t
> see any way not to treat |ptrtoint| as an escape. And really I think
> you’re already partially acknowledging that, because that’s the only
> real sense of saying that |inttoptr(ptrtoint p)| can’t be reduced to
> |p|. If those are really just scalar operations that don’t expose
> |p| in ways that might be disconnected from the uses of the |inttoptr|
> then that reduction ought to be safe.
They are indeed just scalar operations, but the reduction is not safe.
The reason is that pointer-typed variables have values of the form "(addr,
provenance)". There is essentially an 'invisible' component in each pointer
value that tracks some additional information -- the "provenance" of the
pointer. Casting a ptr to an int removes that provenance. Casting an int to a
ptr picks a "default" provenance. So the overall effect of inttoptr(ptrtoint p)
is to turn "(addr, provenance)" into "(addr, DEFAULT_PROVENANCE)".
Clearly that is *not* a NOP, and hence performing the reduction actually changes
the result of this operation. Before the reduction, the resulting pointer had 
DEFAULT_PROVENANCE; after the reduction, it maintains the original provenance of
"p". This can introduce UB into previously UB-free programs.
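
The argument above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `Ptr`, `ptrtoint`, `inttoptr`, and the string used for `DEFAULT_PROVENANCE` are hypothetical stand-ins for the "(addr, provenance)" pairs described in the text, not LLVM's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical marker for the provenance chosen by an int-to-ptr cast.
DEFAULT_PROVENANCE = "default"


@dataclass(frozen=True)
class Ptr:
    addr: int
    prov: str  # the 'invisible' component of a pointer value


def ptrtoint(p: Ptr) -> int:
    """Casting a pointer to an integer drops the provenance."""
    return p.addr


def inttoptr(i: int) -> Ptr:
    """Casting an integer to a pointer picks the default provenance."""
    return Ptr(i, DEFAULT_PROVENANCE)


p = Ptr(0x1000, "alloc_a")
q = inttoptr(ptrtoint(p))

print(q.addr == p.addr)  # True: the address survives the roundtrip
print(q == p)            # False: the provenance does not
```

Both casts are pure functions here, with no side effect and no "exposed" flag, and the roundtrip is still not the identity. Rewriting `inttoptr(ptrtoint(p))` to `p` would therefore change the value the program computes.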

Kind regards,
Ralf

Hal Finkel via llvm-dev

2021-Jun-22 16:07 UTC

On 6/22/21 05:58, Ralf Jung via llvm-dev wrote:
> Hi John,
>
>> Unfortunately, though, I think this non-determinism still doesn’t allow LLVM
>> to be anywhere near as naive about pointer-to-int casts as it is today.
>
> Definitely. There are limits to how naive one can be; beyond those 
> limits, miscompilations lurk. 
> <https://www.ralfj.de/blog/2020/12/14/provenance.html> explains this 
> by showing such a miscompilation arising from three naive 
> optimizations being chained together.
>
>> The rule is intended to allow the compiler to start doing use-analysis
>> of exposures; let’s assume that this analysis doesn’t see any
>> un-analyzable uses, since of course it would need to conservatively
>> treat them as escapes. But if we can optimize uses of integers as if
>> they didn’t carry pointer data — say, in a function that takes integer
>> parameters — and then we can apply those optimized uses to integers
>> that concretely result from pointer-to-int casts — say, by inlining
>> that function into one of its callers — can’t we end up with a use
>> pattern for one or more of those pointer-to-int casts that no longer
>> reflects the fact that it’s been exposed? It seems to me that either
>> (1) we cannot do those optimizations on opaque integers or (2) we
>> need to record that we did them in a way that, if it turns out that
>> they were created by pointer-to-int casts, forces other code to
>> treat that pointer as opaquely exposed.
>
> There is a third option: don't optimize away ptr-int-ptr roundtrips. 
> Then you can still do all the same optimizations on integers that LLVM 
> does today, completely naively -- the integer world remains "sane".
> Only the pointer world has to be "strange".
> (You also cannot do things like GVN replacement of *pointer-typed*
> values, but for values of integer types this remains unproblematic.)

Do we have any idea how large an effect this might have if we disable
GVN for all pointer-typed values? And is it really all of GVN, or just
cases where you unify the equivalence classes based on some dominating
comparison operation? We should be careful here, perhaps, because LLVM's
GVN does a lot of plain-old CSE, store-to-load forwarding, etc., and we
should say specifically what would need to be disabled and in what contexts.

  -Hal

Nicolai Hähnle via llvm-dev

2021-Jun-23 05:27 UTC

On Tue, Jun 22, 2021 at 11:59 AM Ralf Jung via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I don't think it makes sense for LLVM to adopt an explicit "exposed" flag
> in its semantics. Reasoning based on non-determinism works fine, and has the
> advantage of keeping ptr-to-int casts a pure, side-effect-free operation.
> This is the model we explored in
> <https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf>, and we were able to
> show quite a few of LLVM's standard optimizations correct formally. Some
> changes are still needed as you noted, but those changes will be required
> anyway even if LLVM were to adopt PNVI-ae:
> - No removal of ptr-int-ptr roundtrips.
> (https://bugs.llvm.org/show_bug.cgi?id=34548)
> - No GVN replacement of pointer-typed values.
> (https://bugs.llvm.org/show_bug.cgi?id=35229)
>
I've read this paper now, and it makes good sense to me as something to
adopt in LLVM.

I do have one question about a point that doesn't seem sufficiently
justified, though. In the semantics of the paper,
store-pointer-then-load-as-integer results in poison. This seems to be the
root cause for being forced to introduce a "byte" type for correctness, but
it is only really justified by an optimization that eliminates a store that
writes back a previously loaded value. That optimization doesn't seem all
that important (but feel free to point out why it is...), while introducing
a "byte" type is a massive change. On the face of it, that doesn't seem
like a good trade-off to me.
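
One way to see the tension between type punning and that store elimination is the following Python sketch. It is a toy model under loud assumptions: the names `Ptr`, `load_int_punning`, `load_ptr`, and `DEFAULT_PROVENANCE` are hypothetical, and the "punning" rule (loading a stored pointer as an integer behaves like ptrtoint, instead of yielding poison as in the paper) is the alternative being asked about, not the semantics the paper adopts.

```python
from dataclasses import dataclass
from typing import Union

DEFAULT_PROVENANCE = "default"  # hypothetical marker, as in the (addr, provenance) model


@dataclass(frozen=True)
class Ptr:
    addr: int
    prov: str


# A memory cell holds either a pointer (with provenance) or a plain integer.
Cell = Union[Ptr, int]


def load_int_punning(cell: Cell) -> int:
    """Assumed alternative rule: loading a pointer as an integer acts like ptrtoint."""
    return cell.addr if isinstance(cell, Ptr) else cell


def load_ptr(cell: Cell) -> Ptr:
    """Loading an integer as a pointer acts like inttoptr."""
    return cell if isinstance(cell, Ptr) else Ptr(cell, DEFAULT_PROVENANCE)


def with_store(cell: Cell) -> Ptr:
    v = load_int_punning(cell)  # v = load as integer
    cell = v                    # store v back to the same location
    return load_ptr(cell)       # later: load the location as a pointer


def store_eliminated(cell: Cell) -> Ptr:
    # Same program after removing the "redundant" store of the loaded value.
    return load_ptr(cell)


original = Ptr(0x2000, "alloc_a")
print(with_store(original))
print(store_eliminated(original))
```

In this model the store is not redundant: it overwrites a pointer with a provenance-free integer, so removing it changes what a later pointer load observes. Making the integer load return poison avoids that conflict, at the cost of needing some other type, such as the proposed "byte" type, for code that copies memory without interpreting it.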

Has the alternative of allowing type punning through memory at the cost of
removing that optimization been studied sufficiently elsewhere?

Cheers,
Nicolai

-- 
Learn how the world really is,
but never forget how it should be.