thr3ads.net - llvm dev - [llvm-dev] [RFC] Introducing a byte type to LLVM [Jun 2021]


Jeroen Dobbelaere via llvm-dev

2021-Jun-24 07:47 UTC

[llvm-dev] [RFC] Introducing a byte type to LLVM

Hi Ralf,

My interpretation (and not just mine; we did have discussions about this in
our group) with respect to restrict handling is that the use of
decrypt/encrypt triggers undefined behavior. In other words, we are not forced
to try to track the restrict dependency for this case. That is important if
you want to optimize restrict-annotated accesses against non-annotated
accesses.

At the time that I came up with the implementation, that was also a convenient
fallback to avoid some of the pitfalls. It made thinking about the solution
'easier'. For our customers, getting the pointer-based use cases working also
had the highest priority.

Now that we are going over the different pieces of the implementation and see
how we can use them in a broader context, the situation is different: instead
of just tracking the 'restrict/noalias' provenance, we now want to use that
part of the infrastructure to track provenance in general. Because of that, it
also makes sense to reconsider what 'policy' we want to use. In that context,
mapping an 'int2ptr' to an 'add_provenance(int2ptr(%Decrypt), null)',
indicating that it can point to anything, makes sense, but is still orthogonal
to the infrastructure.

For this particular example, it would also be nice if we could somehow indicate
that the 'decrypt(encrypt(%P))' can only depend on %P. But that is another
discussion.

Greetings,

Jeroen

> -----Original Message-----
> From: Ralf Jung <jung at mpi-sws.org>
> Sent: Thursday, June 24, 2021 09:02
> To: Jeroen Dobbelaere <dobbel at synopsys.com>; Juneyoung Lee
> <juneyoung.lee at sf.snu.ac.kr>; Nicolai Hähnle <nhaehnle at gmail.com>;
> llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] [RFC] Introducing a byte type to LLVM
> 
> Hi again Jeroen,
> 
> >> However, I am a bit worried about what happens when we eventually add
> >> proper support for 'restrict'/'noalias': the only models I know for that
> >> one actually make 'ptrtoint' have side-effects on the memory state
> >> (similar to setting the 'exposed' flag in the C provenance TS). I can't
> >> (currently) demonstrate that
> >
> > For the 'c standard', it is undefined behavior to convert a restrict
> > pointer to an integer and back to a pointer type.
> >
> > (At least, that is my interpretation of n2573 6.7.3.1 para 3:
> >     Note that "based" is defined only for expressions with pointer types.
> > )
> 
> After sleeping over it, I think I want to push back against this
> interpretation a bit more strongly. Consider a program snippet like
>
> int *out = (int*) decrypt(encrypt( (uintptr_t)in  ));
>
> It doesn't matter what "encrypt" and "decrypt" do, as long as they are
> inverses of each other.
> "out" is definitely of pointer type. And by the dependency-based definition
> of the standard, it is the case that modifying "in" to point elsewhere would
> also make "out" point elsewhere. Thus "out" is 'based on' "in". And hence it
> is okay to use "out" to access the object "in" points to, even in the
> presence of 'restrict'.
> 
> Kind regards,
> Ralf
> 
> >
> > For the full restrict patches, we do not track restrict provenance across
> > a ptr2int, except for the 'int2ptr(ptr2int %P)' (which we do, as llvm
> > sometimes introduced these pairs; not sure if this is still valid).
> >
> > Greetings,
> >
> > Jeroen Dobbelaere
> >
> >> this is *required*, but I also don't know an alternative. So if this
> >> remains the case, and if we say "load i64" performs a ptrtoint when
> >> needed, then that would mean we could not do dead load elimination any
> >> more as that would remove the ptrtoint side-effect.
> >>
> >> There also is the somewhat conceptual concern that LLVM ought to have a
> >> type that can losslessly hold all kinds of data that exist in LLVM.
> >> Currently, that is not the case -- 'iN' cannot hold data with provenance.
> >>
> >> Kind regards,
> >> Ralf
> >
> 
> --
> Website: https://people.mpi-sws.org/~jung/

Ralf Jung via llvm-dev

2021-Jun-24 08:26 UTC


[llvm-dev] [RFC] Introducing a byte type to LLVM

Hi Jeroen,
> My interpretation (well not just mine, we did have discussions about this
> in our group) with respect to restrict handling, is that the use of
> decrypt/encrypt triggers undefined behavior.
Yes, that is exactly what I am pushing back against. :)  I cannot see a reading
of the standard where this is UB.  I also don't think it is the intention of
the standard to make this UB.  Note that the line I showed could be very far
away from the 'restrict' annotation. Basically, if this is UB, then a
'restrict' pointer cannot be passed to other functions unless we know exactly
that they do not do ptr-to-int casts.
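
For concreteness, the snippet under discussion can be sketched as a small C function. This is only an illustration under stated assumptions: toy_encrypt, toy_decrypt, TOY_KEY and store_via_roundtrip are hypothetical names that do not appear in the thread, and XOR with a fixed key merely stands in for "any pair of functions that are inverses of each other". Whether the final read through 'in' is permitted in the presence of 'restrict' is exactly the point in dispute; the sketch assumes the reading argued for above, where "out" is based on "in".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical key: XOR with a fixed value is its own inverse. */
#define TOY_KEY ((uintptr_t)0x5A5A5A5Au)

/* Hypothetical stand-ins for "encrypt" and "decrypt". */
static uintptr_t toy_encrypt(uintptr_t v) { return v ^ TOY_KEY; }
static uintptr_t toy_decrypt(uintptr_t v) { return v ^ TOY_KEY; }

/* Stores 'value' through a pointer rebuilt from the integer round trip of
   'in', then reads it back through 'in' itself. */
int store_via_roundtrip(int *restrict in, int value) {
    int *out = (int *)toy_decrypt(toy_encrypt((uintptr_t)in));
    assert(out == in);  /* same address; changing 'in' would change 'out' */
    *out = value;       /* access via the pointer derived from the integer */
    return *in;         /* assumed legal under the "based on" reading */
}
```

Calling store_via_roundtrip(&v, 5) on an int v is expected to leave 5 in v. Note that nothing in the body of the function tells a compiler that 'out' was derived from 'in' other than the data flow through the integer operations.
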
> Now that we are going over the different pieces of the implementation and
> see how we can use them in a broader context, the situation is different:
> instead of just tracking the 'restrict/noalias' provenance, we now want to
> use that part of the infrastructure to track provenance in general. Because
> of that, it also makes sense to reconsider what 'policy' we want to use. In
> that context, mapping an 'int2ptr' to an
> 'add_provenance(int2ptr(%Decrypt), null)' indicating that it can point to
> anything makes sense, but is still orthogonal to the infrastructure.
That is not sufficient though. You also need to know that the provenance of the
'restrict'ed pointer can now be acquired by other pointers created literally
anywhere via int2ptr. *That* is what makes this so tricky, I think.

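#include <assert.h>
#include <stdint.h>

void unk1(void);           // unknown external function
void unk2(uintptr_t addr); // unknown external function; receives x's address

int foo(int *restrict x) {
   *x = 0;
   unk1();
   assert(*x == 0); // can be optimized to 'true'
   unk2((uintptr_t)x);
   assert(*x == 0); // can *not* be optimized to 'true'
   return 0;
}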
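
To make the second case tangible, here is a minimal sketch with one possible body for the unknown function. The bodies of unk1 and unk2 and the name foo_observed are assumptions made for illustration only; the thread leaves both functions unspecified. The sketch also assumes the reading argued for in this message, namely that a pointer rebuilt from the exposed address may legitimately access the object.

```c
#include <stdint.h>

/* Hypothetical body: does nothing, and never saw the address of *x. */
void unk1(void) {}

/* Hypothetical body: rebuilds a pointer from the integer it was handed and
   writes through it. Nothing here mentions 'x' from the caller. */
void unk2(uintptr_t addr) {
    int *p = (int *)addr;   /* int2ptr on an address the caller exposed */
    *p = 42;
}

/* Same shape as the function above, but returning the value read after the
   call to unk2 so the effect can be observed. */
int foo_observed(int *restrict x) {
    *x = 0;
    unk1();                 /* address not exposed yet: *x is still 0 */
    unk2((uintptr_t)x);     /* the ptr-to-int cast exposes the address */
    return *x;              /* must be reloaded; folding to 0 is wrong here */
}
```

With this body for unk2, foo_observed(&v) is expected to return 42 rather than 0, which is why the second assert cannot be folded to 'true' without knowing what unk2 does.
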
> For this particular example, it would also be nice if we could somehow
> indicate that the 'decrypt(encrypt(%P))' can only depend on %P. But that is
> another discussion.
It would be nice if one could express this in the surface language (C/Rust), but
I don't think we should allow LLVM to infer this -- that would basically require
tracking provenance through integers, which is not a good idea.
Put differently: as the various examples in this thread show, integers can
easily acquire "provenance" of other values simply by comparing them for
equality -- so in a sense, after "x == y" evaluates to true, now 'x' also has
the "provenance" of 'y'. I don't think we want obscure effects like this in the
semantics of the Abstract Machine. (I am not even convinced this can be done
consistently.)
So then what we are left with are those transformations that are correct without
extra support from the abstract machine. And since these dependencies can
entirely disappear from the source code through optimizations like GVN replacing
'x' by 'y', there are strong limits to what can be done here.
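
The kind of replacement meant here can be sketched by hand in C. The two functions below are hypothetical and written purely for illustration; store_after is a manual rewrite of store_before, not actual output of LLVM's GVN pass.

```c
#include <stdint.h>

/* As written in the source: the store goes through 'x'. */
void store_before(uintptr_t x, uintptr_t y) {
    if (x == y)
        *(int *)x = 1;
}

/* After a hand-applied equality substitution: inside the branch the two
   integers are equal, so 'x' has been replaced by 'y'. The store no longer
   mentions 'x' at all. */
void store_after(uintptr_t x, uintptr_t y) {
    if (x == y)
        *(int *)y = 1;
}
```

For plain integers the two functions are interchangeable, and when both arguments hold the address of the same array element they store to the same location. The trouble discussed above starts when 'x' and 'y' were obtained from different pointers: after the rewrite, the dependency of the store on whatever 'x' came from is no longer visible in the code.
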

Kind regards,
Ralf
-- 
Website: https://people.mpi-sws.org/~jung/
