Jeroen Dobbelaere via llvm-dev
2021-Jun-23 14:17 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
Hi Ralf,

[..]
>
> To add to what Juneyoung said:
> I don't think that experiment has been made. From what I can see, the
> alternative you propose leads to an internally consistent model -- one "just"
> has to account for the fact that a "load i64" might do some transformation on
> the data to actually obtain an integer result (namely, it might do ptrtoint).
>
> However, I am a bit worried about what happens when we eventually add proper
> support for 'restrict'/'noalias': the only models I know for that one actually
> make 'ptrtoint' have side-effects on the memory state (similar to setting the
> 'exposed' flag in the C provenance TS). I can't (currently) demonstrate that

For the 'c standard', it is undefined behavior to convert a restrict pointer to
an integer and back to a pointer type.

(At least, that is my interpretation of n2573 6.7.3.1 para 3:
    Note that "based" is defined only for expressions with pointer types.
)

For the full restrict patches, we do not track restrict provenance across a
ptr2int, except for the 'int2ptr(ptr2int %P)' (which we do, as llvm sometimes
introduced these pairs; not sure if this is still valid).

Greetings,

Jeroen Dobbelaere

> this is *required*, but I also don't know an alternative. So if this remains
> the case, and if we say "load i64" performs a ptrtoint when needed, then that
> would mean we could not do dead load elimination any more as that would
> remove the ptrtoint side-effect.
>
> There also is the somewhat conceptual concern that LLVM ought to have a type
> that can losslessly hold all kinds of data that exist in LLVM. Currently, that
> is not the case -- 'iN' cannot hold data with provenance.
>
> Kind regards,
> Ralf
Ralf Jung via llvm-dev
2021-Jun-23 19:09 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
Hi Jeroen,

>> To add to what Juneyoung said:
>> I don't think that experiment has been made. From what I can see, the
>> alternative you propose leads to an internally consistent model -- one "just"
>> has to account for the fact that a "load i64" might do some transformation on
>> the data to actually obtain an integer result (namely, it might do ptrtoint).
>>
>> However, I am a bit worried about what happens when we eventually add proper
>> support for 'restrict'/'noalias': the only models I know for that one actually
>> make 'ptrtoint' have side-effects on the memory state (similar to setting the
>> 'exposed' flag in the C provenance TS). I can't (currently) demonstrate that
>
> For the 'c standard', it is undefined behavior to convert a restrict pointer to
> an integer and back to a pointer type.
>
> (At least, that is my interpretation of n2573 6.7.3.1 para 3:
>     Note that "based" is defined only for expressions with pointer types.
> )
>
> For the full restrict patches, we do not track restrict provenance across a
> ptr2int, except for the 'int2ptr(ptr2int %P)' (which we do, as llvm sometimes
> introduced these pairs; not sure if this is still valid).

Interesting. I assumed that doing ptr2int, then doing whatever you want with
that value (say, AES encrypt and then decrypt it), and then turning the same
value back into a pointer, must always produce a pointer that is "at least as
usable" as the one that we started with. I would interpret the parts of the
standard that talk about integer-pointer casts that way. (That's the problem
with axiomatic standards: it is very easy to have mutually contradicting
axioms...)

FWIW, Rust's use of LLVM 'noalias' pretty much relies on this. It would be
rather disastrous for Rust if 'noalias' pointers cannot be cast to integers,
cast back (potentially in a different function), and used.

The C standard definition of 'restrict' is based on hypothetical alternative
executions of the program with different inputs. I can't even imagine any
reasonable way to interpret that unambiguously, so honestly I don't see how
that is even a starting point for a precise formal definition that one could
prove theorems about.^^

The ideas my colleagues and I discussed for this revolved more around the idea
of having more than one "provenance" for an allocation (so when a pointer is
passed to a function as a 'restrict' argument, it gets a fresh "ID" in its
provenance), and then ensuring that the different provenances on one
allocation are used consistently. But then when you cast a ptr to an int you
basically have to mark that particular provenance as 'exposed' (losing all
'restrict' advantages) to have any chance of handling the case of casting the
int back to a ptr. That seems fair to me honestly: if you cast a ptr to an int
you cannot reasonably expect alias analysis to make heads or tails of what you
are doing. But then 'ptrtoint' has a side-effect and cannot be removed even if
the result is unused.

Kind regards,
Ralf

>
> Greetings,
>
> Jeroen Dobbelaere
>
>> this is *required*, but I also don't know an alternative. So if this remains
>> the case, and if we say "load i64" performs a ptrtoint when needed, then that
>> would mean we could not do dead load elimination any more as that would
>> remove the ptrtoint side-effect.
>>
>> There also is the somewhat conceptual concern that LLVM ought to have a type
>> that can losslessly hold all kinds of data that exist in LLVM. Currently, that
>> is not the case -- 'iN' cannot hold data with provenance.
>>
>> Kind regards,
>> Ralf
>

--
Website: https://people.mpi-sws.org/~jung/
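
[Editorial sketch] The multi-provenance idea in the message above can be illustrated with a toy abstract machine in C. This is only a sketch under stated assumptions, not the model discussed by the authors and not anything LLVM implements: the names `Allocation`, `AbstractPtr`, `retag_restrict`, `model_ptrtoint` and `model_inttoptr` are hypothetical, and the rule that an integer-to-pointer cast picks the most recently created exposed ID is an arbitrary choice made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PROVENANCES 16

/* Toy state for ONE allocation that can carry several provenance IDs. */
typedef struct {
    bool exposed[MAX_PROVENANCES]; /* exposed[id]: was that ID cast to an int? */
    int next_id;                   /* next unused provenance ID */
} Allocation;

/* An abstract pointer: an address plus a provenance ID (-1 means none). */
typedef struct {
    uintptr_t addr;
    int prov;
} AbstractPtr;

/* Passing a pointer as a 'restrict' argument: same address, fresh ID. */
AbstractPtr retag_restrict(Allocation *a, AbstractPtr p) {
    assert(a->next_id < MAX_PROVENANCES);
    AbstractPtr q = { p.addr, a->next_id++ };
    return q;
}

/* ptrtoint in this toy model: returns the address AND, as a side effect,
   marks the pointer's provenance as exposed. */
uintptr_t model_ptrtoint(Allocation *a, AbstractPtr p) {
    if (p.prov >= 0) {
        a->exposed[p.prov] = true;
    }
    return p.addr;
}

/* inttoptr in this toy model: may only pick up an exposed provenance.
   Assumption: choose the most recently created exposed ID. */
AbstractPtr model_inttoptr(const Allocation *a, uintptr_t addr) {
    AbstractPtr r = { addr, -1 };
    for (int id = a->next_id - 1; id >= 0; --id) {
        if (a->exposed[id]) {
            r.prov = id;
            break;
        }
    }
    return r;
}
```

In this sketch, deleting a call to `model_ptrtoint` whose result is unused changes what a later `model_inttoptr` returns, which is the sense in which the cast "has a side-effect and cannot be removed".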
Ralf Jung via llvm-dev
2021-Jun-24 07:01 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
Hi again Jeroen,

>> However, I am a bit worried about what happens when we eventually add proper
>> support for 'restrict'/'noalias': the only models I know for that one actually
>> make 'ptrtoint' have side-effects on the memory state (similar to setting the
>> 'exposed' flag in the C provenance TS). I can't (currently) demonstrate that
>
> For the 'c standard', it is undefined behavior to convert a restrict pointer to
> an integer and back to a pointer type.
>
> (At least, that is my interpretation of n2573 6.7.3.1 para 3:
>     Note that "based" is defined only for expressions with pointer types.
> )

After sleeping on it, I think I want to push back against this interpretation
a bit more strongly. Consider a program snippet like

    int *out = (int*) decrypt(encrypt( (uintptr_t)in ));

It doesn't matter what "encrypt" and "decrypt" do, as long as they are inverses
of each other. "out" is definitely of pointer type. And by the
dependency-based definition of the standard, it is the case that modifying
"in" to point elsewhere would also make "out" point elsewhere. Thus "out" is
'based on' "in". And hence it is okay to use "out" to access the object "in"
points to, even in the presence of 'restrict'.

Kind regards,
Ralf

>
> For the full restrict patches, we do not track restrict provenance across a
> ptr2int, except for the 'int2ptr(ptr2int %P)' (which we do, as llvm sometimes
> introduced these pairs; not sure if this is still valid).
>
> Greetings,
>
> Jeroen Dobbelaere
>
>> this is *required*, but I also don't know an alternative. So if this remains
>> the case, and if we say "load i64" performs a ptrtoint when needed, then that
>> would mean we could not do dead load elimination any more as that would
>> remove the ptrtoint side-effect.
>>
>> There also is the somewhat conceptual concern that LLVM ought to have a type
>> that can losslessly hold all kinds of data that exist in LLVM. Currently, that
>> is not the case -- 'iN' cannot hold data with provenance.
>>
>> Kind regards,
>> Ralf
>

--
Website: https://people.mpi-sws.org/~jung/
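
[Editorial sketch] The snippet in the last message can be expanded into a small compilable C fragment. The helpers `toy_encrypt` and `toy_decrypt`, the constant `TOY_KEY`, and the function `store_via_roundtrip` are hypothetical names introduced here; XOR with a fixed key merely stands in for "any pair of inverses". Whether the write through `out` is well-defined for a 'restrict' parameter is exactly the point disputed in this thread, so the fragment shows the shape of the argument and does not settle it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for "encrypt"/"decrypt": XOR with a fixed key
   is its own inverse, so toy_decrypt(toy_encrypt(x)) == x for every x. */
#define TOY_KEY ((uintptr_t)0x5A5A5A5Au)

static uintptr_t toy_encrypt(uintptr_t bits) { return bits ^ TOY_KEY; }
static uintptr_t toy_decrypt(uintptr_t bits) { return bits ^ TOY_KEY; }

/* Stores `value` through a pointer rebuilt from an integer round trip of
   the 'restrict' parameter `in`. Returns 1 if the rebuilt pointer compares
   equal to `in`, 0 otherwise. */
int store_via_roundtrip(int *restrict in, int value) {
    uintptr_t bits = (uintptr_t)in;

    /* The two helpers really are inverses on this input. */
    assert(toy_decrypt(toy_encrypt(bits)) == bits);

    /* The pattern from the message: "out" has pointer type and its value
       depends on "in" only through integer operations. */
    int *out = (int *)toy_decrypt(toy_encrypt(bits));

    *out = value; /* the access whose status under 'restrict' is debated */
    return out == in;
}
```

Calling `store_via_roundtrip(&x, 42)` on an `int x` is the concrete case: changing which object `in` points to changes which object `out` points to, which is the dependency the message appeals to when it calls `out` 'based on' `in`.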