Harald van Dijk via llvm-dev
2021-Jun-07 17:39 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
Hi, On 04/06/2021 16:24, George Mitenkov via llvm-dev wrote:> > 2. > > When copying pointers as bytes implicit pointer-to-integer casts > are avoided.The current setting of performing a memcpy using i8s > leads to miscompilations (example: bug report 37469 > <https://bugs.llvm.org/show_bug.cgi?id=37469>) and is bad for > alias analysis. >While trying to understand what exactly is going wrong in this bug I think I stumbled upon incomplete documentation. At https://llvm.org/docs/LangRef.html#pointeraliasing it is documented that * A memory access must be done through a pointer associated with the address range. * A pointer value is associated with the addresses associated with any value it is based on. * A pointer value formed by an instruction is based on another pointer value the instruction is getelementptr, bitcast, or inttoptr. What is not mentioned is what a pointer value formed by a load instruction is based on. If a pointer value formed by a load instruction is not associated with any address range, then it cannot be used to load or store /any/ value. Clearly that is not wanted. If memory is completely untyped, then a load of a pointer value needs to be equivalent to a load of an integer value followed by inttoptr. In that model, it is not possible to replace a load of a pointer value by a previously stored value, it is only possible to replace it by inttoptr(ptrtoint(previously stored value)). In that model, the problem is that LLVM does replace a load of a pointer value by a previously stored value, and the bug can be fixed (at a cost of reduced optimisations of other code) by making sure to insert ptrtoint and inttoptr as needed, including for any pointer members of structs etc. Your proposal is based on an interpretation of the problem using a memory model where memory is not completely untyped. In your memory model, what address range is a pointer value that came from a load instruction associated with? Cheers, Harald van Dijk -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210607/997e13fa/attachment.html>
Nicolai Hähnle via llvm-dev
2021-Jun-08 23:01 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
Hi, On Mon, Jun 7, 2021 at 7:40 PM Harald van Dijk via llvm-dev < llvm-dev at lists.llvm.org> wrote:> On 04/06/2021 16:24, George Mitenkov via llvm-dev wrote: > > > 1. > > When copying pointers as bytes implicit pointer-to-integer casts are > avoided.The current setting of performing a memcpy using i8s leads to > miscompilations (example: bug report 37469 > <https://bugs.llvm.org/show_bug.cgi?id=37469>) and is bad for alias > analysis. > > While trying to understand what exactly is going wrong in this bug I think > I stumbled upon incomplete documentation. At > https://llvm.org/docs/LangRef.html#pointeraliasing it is documented that > > - A memory access must be done through a pointer associated with the > address range. > - A pointer value is associated with the addresses associated with any > value it is based on. > - A pointer value formed by an instruction is based on another pointer > value the instruction is getelementptr, bitcast, or inttoptr. > > What is not mentioned is what a pointer value formed by a load instruction > is based on. If a pointer value formed by a load instruction is not > associated with any address range, then it cannot be used to load or store > *any* value. Clearly that is not wanted. > > If memory is completely untyped, then a load of a pointer value needs to > be equivalent to a load of an integer value followed by inttoptr. In that > model, it is not possible to replace a load of a pointer value by a > previously stored value, it is only possible to replace it by > inttoptr(ptrtoint(previously stored value)). In that model, the problem is > that LLVM does replace a load of a pointer value by a previously stored > value, and the bug can be fixed (at a cost of reduced optimisations of > other code) by making sure to insert ptrtoint and inttoptr as needed, > including for any pointer members of structs etc. > > Your proposal is based on an interpretation of the problem using a memory > model where memory is not completely untyped. In your memory model, what > address range is a pointer value that came from a load instruction > associated with? >I second this. I feel like the entire discussion is extremely confused and confusing because there is lack of clarity about what values are in fact carried by the types i<N>, ptr (assuming untyped pointers already), and the proposed b<N>. There are statements about "whether a b<N> contains a pointer or an integer" without a clear definition of what any of that really means. By the way: the discussion of GVN states that i<N> do not carry provenance information, but LangRef appears to contradict this, because it states (in the Pointer Aliasing section) that "A pointer value formed by an inttoptr is *based* on all pointer values that contribute (directly or indirectly) to the computation of the pointer’s value." The only way to make this statement operational is for the operand of the inttoptr to carry provenance information; this definitely needs to be addressed by this proposal. My educated guess is that what the proposal really wants to say is that an i<N> value is poison or an integer without pointer provenance (i.e., an i<N> value is never "based on" a pointer) while a b<N> value is poison or an integer that may have provenance (i.e., it may be "based on" one or more pointers). Everything else should follow from this core definition. With this phrasing, I can see a coherent picture emerging that can be discussed without a lot of the confusion in this thread (though still _some_ confusion, as the topic is inherently subtle). @George, Nuno, and Juneyoung: is that a good lens through which to see the proposal? If no, why not? If yes, I believe there are some immediate implications and questions, such as: 1. Forbidding arithmetic and bitwise operations in b<N> seems pointless. Just define them as the corresponding i<N> op plus the union of provenance of the operands. This allows consistent implementation of char/unsigned char as b8, without having to jump back and forth between b8 and i8 all the time. 2. What's the provenance of a b<N> `select`? 3. A b<N>-to-i<N> conversion necessarily loses all provenance information. 4. What's the provenance of the result of an i<N>-to-b<N> conversion? The only possible choices are "empty" or "full" provenance, where "full" provenance means that the value is "based on" _all_ pointers. Depending on the details of allowed instruction signatures, we may want to allow both choices. "Full" provenance gives us something that's useful as a step towards inttoptr. "Empty" provenance allows us to lift i<N> to b<N> for arithmetic operations without being overly conservative (e.g., consider the somewhat hypothetical problem of lowering a GEP to a sequence of arithmetic on b<N> types in IR -- we want the result to have the provenance of the base pointer, which enters the arithmetic as a b<N> with provenance; adding the various i<N> offset operands should have no effect on provenance). 5. Memory is untyped but can carry provenance; i.e., it consists of b8. Load/store of i<N> performs implicit i-to-b/b-to-i conversions, respectively. Which kind of i-to-b conversion is used during a store? The "empty" one or the "full" one? There are several reasonable options for how to address this, and a choice must be made. 6. (How) are pointer types fundamentally different from b<N> types of the correct size? (By this I mean: is there any interesting difference in the values that these types can carry? Ignore surface differences like the fact that GEP traditionally goes with pointers while `add` goes with integer types -- we could have a GEP instruction on a correctly sized b<N>) It seems pretty clear that there is still a lot of work to be done on this proposal before we can be sufficiently confident that it won't just add new correctness problems elsewhere. Besides, the cost of adding a new family of types is very high, and I am not (yet?) convinced that it pays for itself. But with the above I can at least see better where you're coming from. Cheers, Nicolai Cheers,> Harald van Dijk > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- Lerne, wie die Welt wirklich ist, aber vergiss niemals, wie sie sein sollte. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210609/27f378b4/attachment.html>