thr3ads.net - llvm dev - [llvm-dev] [RFC] Introducing a byte type to LLVM [Jun 2021]

Jeroen Dobbelaere via llvm-dev

2021-Jun-23 14:17 UTC

[llvm-dev] [RFC] Introducing a byte type to LLVM

Hi Ralf,

[..]
> To add to what Juneyoung said:
> I don't think that experiment has been made. From what I can see, the
> alternative you propose leads to an internally consistent model -- one "just"
> has to account for the fact that a "load i64" might do some transformation on
> the data to actually obtain an integer result (namely, it might do ptrtoint).
> 
> However, I am a bit worried about what happens when we eventually add proper
> support for 'restrict'/'noalias': the only models I know for that one actually
> make 'ptrtoint' have side-effects on the memory state (similar to setting the
> 'exposed' flag in the C provenance TS). I can't (currently) demonstrate that

For the 'c standard', it is undefined behavior to convert a restrict pointer to
an integer and back to a pointer type.

(At least, that is my interpretation of n2573 6.7.3.1 para 3:
   Note that "based" is defined only for expressions with pointer types.
)

For the full restrict patches, we do not track restrict provenance across a
ptr2int, except for the 'int2ptr(ptr2int %P)' (which we do, as llvm sometimes
introduced these pairs; not sure if this is still valid).

Greetings,

Jeroen Dobbelaere
> this is *required*, but I also don't know an alternative. So if this remains
> the case, and if we say "load i64" performs a ptrtoint when needed, then that
> would mean we could not do dead load elimination any more as that would remove
> the ptrtoint side-effect.
> 
> There also is the somewhat conceptual concern that LLVM ought to have a type
> that can losslessly hold all kinds of data that exist in LLVM. Currently, that
> is not the case -- 'iN' cannot hold data with provenance.
> 
> Kind regards,
> Ralf

Ralf Jung via llvm-dev

2021-Jun-23 19:09 UTC

[llvm-dev] [RFC] Introducing a byte type to LLVM

Hi Jeroen,
>> To add to what Juneyoung said:
>> I don't think that experiment has been made. From what I can see, the
>> alternative you propose leads to an internally consistent model -- one "just"
>> has to account for the fact that a "load i64" might do some transformation on
>> the data to actually obtain an integer result (namely, it might do ptrtoint).
>>
>> However, I am a bit worried about what happens when we eventually add proper
>> support for 'restrict'/'noalias': the only models I know for that one actually
>> make 'ptrtoint' have side-effects on the memory state (similar to setting the
>> 'exposed' flag in the C provenance TS). I can't (currently) demonstrate that
> 
> For the 'c standard', it is undefined behavior to convert a restrict pointer to
> an integer and back to a pointer type.
> 
> (At least, that is my interpretation of n2573 6.7.3.1 para 3:
>     Note that "based" is defined only for expressions with pointer types.
> )
> 
> For the full restrict patches, we do not track restrict provenance across a
> ptr2int, except for the 'int2ptr(ptr2int %P)' (which we do, as llvm sometimes
> introduced these pairs; not sure if this is still valid).

Interesting. I assumed that doing ptr2int, then doing whatever you want with
that value (say, AES encrypt and then decrypt it), and then turning the same
value back into a pointer, must always produce a pointer that is "at least as
usable" as the one that we started with. I would interpret the parts of the
standard that talk about integer-pointer casts that way.
(That's the problem with axiomatic standards: it is very easy to have mutually
contradicting axioms...)

FWIW, Rust's use of LLVM 'noalias' pretty much relies on this. It would be
rather disastrous for Rust if 'noalias' pointers cannot be cast to integers,
cast back (potentially in a different function), and used.


The C standard definition of 'restrict' is based on hypothetical alternative
executions of the program with different inputs. I can't even imagine any
reasonable way to interpret that unambiguously, so honestly I don't see how that
is even a starting point for a precise formal definition that one could prove
theorems about.^^
The ideas my colleagues and I discussed for this revolved more around having
more than one "provenance" for an allocation (so when a pointer is passed to a
function as 'restrict' argument, it gets a fresh "ID" into its provenance), and
then ensuring that the different provenances on one allocation are used
consistently. But then when you cast a ptr to an int you basically have to mark
that particular provenance as 'exposed' (losing all 'restrict' advantages) to
have any chance of handling the case of casting the int back to a ptr. That
seems fair to me honestly: if you cast a ptr to an int you cannot reasonably
expect alias analysis to make heads or tails of what you are doing. But then
'ptrtoint' has a side-effect and cannot be removed even if the result is unused.

Kind regards,
Ralf

> 
> Greetings,
> 
> Jeroen Dobbelaere
> 
>> this is *required*, but I also don't know an alternative. So if this remains
>> the case, and if we say "load i64" performs a ptrtoint when needed, then that
>> would mean we could not do dead load elimination any more as that would remove
>> the ptrtoint side-effect.
>>
>> There also is the somewhat conceptual concern that LLVM ought to have a type
>> that can losslessly hold all kinds of data that exist in LLVM. Currently, that
>> is not the case -- 'iN' cannot hold data with provenance.
>>
>> Kind regards,
>> Ralf
> 
-- 
Website: https://people.mpi-sws.org/~jung/

Ralf Jung via llvm-dev

2021-Jun-24 07:01 UTC

[llvm-dev] [RFC] Introducing a byte type to LLVM

Hi again Jeroen,
>> However, I am a bit worried about what happens when we eventually add proper
>> support for 'restrict'/'noalias': the only models I know for that one actually
>> make 'ptrtoint' have side-effects on the memory state (similar to setting the
>> 'exposed' flag in the C provenance TS). I can't (currently) demonstrate that
> 
> For the 'c standard', it is undefined behavior to convert a restrict pointer to
> an integer and back to a pointer type.
> 
> (At least, that is my interpretation of n2573 6.7.3.1 para 3:
>     Note that "based" is defined only for expressions with pointer types.
> )

After sleeping on it, I think I want to push back against this interpretation
a bit more strongly. Consider a program snippet like

int *out = (int *) decrypt(encrypt((uintptr_t) in));

It doesn't matter what "encrypt" and "decrypt" do, as long as they are inverses
of each other.
"out" is definitely of pointer type. And by the dependency-based definition of
the standard, it is the case that modifying "in" to point elsewhere would also
make "out" point elsewhere. Thus "out" is 'based on' "in". And hence it is okay
to use "out" to access the object "in" points to, even in the presence of
'restrict'.

Kind regards,
Ralf
> 
> For the full restrict patches, we do not track restrict provenance across a
> ptr2int, except for the 'int2ptr(ptr2int %P)' (which we do, as llvm sometimes
> introduced these pairs; not sure if this is still valid).
> 
> Greetings,
> 
> Jeroen Dobbelaere
> 
>> this is *required*, but I also don't know an alternative. So if this remains
>> the case, and if we say "load i64" performs a ptrtoint when needed, then that
>> would mean we could not do dead load elimination any more as that would remove
>> the ptrtoint side-effect.
>>
>> There also is the somewhat conceptual concern that LLVM ought to have a type
>> that can losslessly hold all kinds of data that exist in LLVM. Currently, that
>> is not the case -- 'iN' cannot hold data with provenance.
>>
>> Kind regards,
>> Ralf
> 
-- 
Website: https://people.mpi-sws.org/~jung/
