Ralf Jung via llvm-dev
2021-Jun-15 19:15 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
Hi,

> The semantics you seem to want are that LLVM’s integer types cannot carry
> information from pointers. But I can cast a pointer to an integer in C and
> vice-versa, and compilers have de facto defined the behavior of subsequent
> operations like breaking the integer up (and then putting it back together),
> adding numbers to it, and so on. So no, as a C compiler writer, I do not have a
> choice; I will have to use a type that can validly carry pointer information for
> integers in C.

Integers demonstrably do not carry provenance; see
<https://www.ralfj.de/blog/2020/12/14/provenance.html> for a detailed
explanation of why.
As a consequence of this, ptr-int-ptr roundtrips are lossy: some of the
original provenance information is lost. This means that optimizing away such
roundtrips is incorrect, and indeed doing so leads to miscompilations
(https://bugs.llvm.org/show_bug.cgi?id=34548).

The key difference between int and byte is that ptr-byte-ptr roundtrips are
*lossless*: all the provenance is preserved. This enables some extra
optimizations (such as removing these roundtrips, which implicitly happens when
a redundant store-after-load is removed), but it also rules out some others.
Most notably, "x == y" no longer means that x and y are equal in all respects;
their provenance might still differ, so it is incorrect for GVN to replace one
by the other.

It's a classic tradeoff: we can *either* have lossless roundtrips *or* "x == y"
implies full equality of the abstract values. Having both together leads to
contradictions, which manifest as miscompilations. "byte" and "int" represent
the two possible choices here; therefore, by adding "byte", LLVM would close a
gap in the expressive power of its IR.

Kind regards,
Ralf
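To make the two kinds of roundtrip concrete, here is a minimal C sketch. It is not from the thread: the function names (copy_bytes, read_via_byte_copy, read_via_int_roundtrip) are hypothetical, and how a front end would lower each function to LLVM IR is deliberately left open. At the C source level both functions are well-defined; the discussion above is about which IR type can represent the intermediate values without losing provenance.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies n bytes one at a time, the way a
 * hand-written memcpy would. The bytes it moves may be fragments of a
 * pointer, which is the case a "byte" type is meant to model. */
void copy_bytes(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
}

/* ptr -> bytes -> ptr: the pointer's representation is copied bytewise
 * and the copy is then dereferenced. */
int read_via_byte_copy(void) {
    int x = 42;
    int *p = &x;
    int *q;
    copy_bytes(&q, &p, sizeof p);
    return *q;
}

/* ptr -> int -> ptr: the pointer is cast to an integer and back. */
int read_via_int_roundtrip(void) {
    int x = 42;
    uintptr_t i = (uintptr_t)&x;
    int *q = (int *)i;
    return *q;
}
```

Both functions read x through the reconstructed pointer. The open question in the thread is whether an optimizer may treat q as interchangeable with the original pointer in each case, not whether the C code itself is valid.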
James Courtier-Dutton via llvm-dev
2021-Jun-15 20:10 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
On Tue, 15 Jun 2021 at 20:16, Ralf Jung via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Integers demonstrably do not carry provenance; see
> <https://www.ralfj.de/blog/2020/12/14/provenance.html> for a detailed
> explanation of why.

That article does a nice job of describing what the problem is.
Extract from the article:

"How can we fix this?
To fix the problem, we will have to declare one of the three optimizations
incorrect and stop performing it. Speaking in terms of the LLVM IR semantics,
this corresponds to deciding whether pointers and/or integers have provenance:
1) We could say both pointers and integers have provenance, which invalidates
   the first optimization.
2) We could say pointers have provenance but integers do not, which invalidates
   the second optimization.
3) We could say nothing has provenance, which invalidates the third
   optimization."

This is really the part that I disagree with. There are more possible
alternatives than just 1, 2 and 3.

Kind Regards

James
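For readers without the article at hand, the "three optimizations" in the extract refer to a small program along the following lines. This is a sketch reconstructed in the spirit of the linked post, not a verbatim copy; the function name provenance_example is hypothetical, and the post itself remains the authoritative version.

```c
#include <stdint.h>

/* Returns q[0] after a store that only happens if q is laid out
 * directly after p. By the source semantics the result is 10 when the
 * branch is taken and 0 otherwise. */
int provenance_example(void) {
    char p[1] = {0};
    char q[1] = {0};
    uintptr_t ip = (uintptr_t)(p + 1); /* one-past-the-end address of p */
    uintptr_t iq = (uintptr_t)q;       /* address of q */
    if (iq == ip) {
        /* Optimization 1: since iq == ip holds here, rewrite iq to ip,
         *   giving *(char *)ip = 10.
         * Optimization 2: remove the ptr-int-ptr roundtrip,
         *   giving *(p + 1) = 10.
         * Optimization 3: a store through a pointer derived from p
         *   cannot modify q, so q[0] below is folded to 0. */
        *(char *)iq = 10;
    }
    return q[0];
}
```

Each rewrite looks harmless in isolation, but applied in sequence they turn a program that returns 10 whenever the branch is taken into one that always returns 0. That is why the extract says one of the three has to be declared incorrect.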
David Blaikie via llvm-dev
2021-Jun-15 21:38 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
On Tue, Jun 15, 2021 at 12:16 PM Ralf Jung via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hi,
>
> > The semantics you seem to want are that LLVM’s integer types cannot carry
> > information from pointers. But I can cast a pointer to an integer in C and
> > vice-versa, and compilers have de facto defined the behavior of subsequent
> > operations like breaking the integer up (and then putting it back together),
> > adding numbers to it, and so on. So no, as a C compiler writer, I do not have a
> > choice; I will have to use a type that can validly carry pointer information for
> > integers in C.
>
> Integers demonstrably do not carry provenance; see
> <https://www.ralfj.de/blog/2020/12/14/provenance.html> for a detailed
> explanation of why.
> As a consequence of this, ptr-int-ptr roundtrips are lossy: some of the
> original provenance information is lost. This means that optimizing away such
> roundtrips is incorrect, and indeed doing so leads to miscompilations
> (https://bugs.llvm.org/show_bug.cgi?id=34548).
>
> The key difference between int and byte is that ptr-byte-ptr roundtrips are
> *lossless*, all the provenance is preserved. This means some extra
> optimizations (such as removing these roundtrips -- which implicitly happens
> when a redundant-store-after-load is removed), but also some lost
> optimizations (most notably, "x == y" does not mean x and y are equal in all
> respects; their provenance might still differ, so it is incorrect for GVN to
> replace one by the other).
>
> It's a classic tradeoff: we can *either* have lossless roundtrips

I think an important part of explaining the motivation for "byte" would be an
explanation/demonstration of what the cost of losing "lossless roundtrips"
would be.

> *or* "x == y" implies full equality of the abstract values. Having both
> together leads to contradictions, which manifest as miscompilations.
> "byte" and "int" represent the two possible choices here; therefore, by
> adding "byte", LLVM would close a gap in the expressive power of its IR.
>
> Kind regards,
> Ralf