Juneyoung Lee via llvm-dev
2021-Jun-15 05:49 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
On Tue, Jun 15, 2021 at 1:08 AM John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> The semantics you seem to want are that LLVM’s integer types cannot carry
> information from pointers. But I can cast a pointer to an integer in C and
> vice-versa, and compilers have de facto defined the behavior of subsequent
> operations like breaking the integer up (and then putting it back
> together), adding numbers to it, and so on. So no, as a C compiler writer,
> I do not have a choice; I will have to use a type that can validly carry
> pointer information for integers in C.

An int->ptr cast can reconstruct the pointer information, so making integer
types not carry pointer information does not necessarily mean that
dereferencing a pointer cast from an integer is UB. For example, the
definition of cast_ival_to_ptrval in the n2676 proposal shows that a
pointer's provenance is reconstructed from an integer. (Whether n2676's
cast_ival_to_ptrval can also be used for LLVM's inttoptr semantics is a
different question, though.)

> Since you seem to find this sort of thing compelling, please note that
> even a simple assignment like char c2 = c1 technically promotes through
> int in C, and so int must be able to carry pointer information if char
> can.

IIUC, integer promotion is done when a value is used as an operand of an
arithmetic op or as a switch's condition, so I think the assignment
operation is okay.

Juneyoung

On Tue, Jun 15, 2021 at 1:08 AM John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On 14 Jun 2021, at 7:04, Ralf Jung wrote:
>
> Hi,
>
> I don't dispute that but I am still not understanding the need for bytes.
> None of the examples I have seen so far
> clearly made the point that it is the byte types that provide a
> substantial benefit. The AA example below does neither.
>
> I hope <https://lists.llvm.org/pipermail/llvm-dev/2021-June/151110.html>
> makes a convincing case that under the current semantics, when one does an
> "i64" load of a value that was stored at pointer type, we have to say that
> this load returns poison. In particular, saying that this implicitly
> performs a "ptrtoint" is inconsistent with optimizations that are probably
> too important to be changed to accommodate this implicit "ptrtoint".
>
> I think it is in fact rather obvious that, if this optimization as currently
> written is indeed in conflict with the current semantics, it is the
> optimization that will have to give. If the optimization is too important
> for performance to give up entirely, we will simply have to find some more
> restricted pattern that we can still soundly optimize.
>
> That is certainly a reasonable approach.
> However, judging from how reluctant LLVM is to remove optimizations that
> are much more convincingly wrong [1], my impression was that it is easier
> to complicate the semantics than to remove an optimization that LLVM
> already performs.
>
> [1]: https://bugs.llvm.org/show_bug.cgi?id=34548,
> https://bugs.llvm.org/show_bug.cgi?id=35229;
> see https://www.ralfj.de/blog/2020/12/14/provenance.html for a
> more detailed explanation
>
> Perhaps the clearest reason is that, if we did declare that integer types
> cannot carry pointers and so introduced byte types that could, C frontends
> would have to switch to byte types for their integer types, and so we would
> immediately lose this supposedly important optimization for C-like
> languages, and so, since optimizing C is very important, we would
> immediately need to find some restricted pattern under which we could
> soundly apply this optimization to byte types. That’s assuming that this
> optimization is actually significant, of course.
>
> At least C with strict aliasing enabled (i.e., standard C) only needs to
> use the byte type for "(un)signed char".
> The other integer types remain
> unaffected. There is no arithmetic on these types ("char + char" is subject
> to integer promotion), so the IR overhead would consist in a few "bytecast"
> instructions next to / replacing the existing sign extensions that convert
> "char" to "int" before performing the arithmetic.
>
> The semantics you seem to want are that LLVM’s integer types cannot carry
> information from pointers. But I can cast a pointer to an integer in C and
> vice-versa, and compilers have de facto defined the behavior of subsequent
> operations like breaking the integer up (and then putting it back
> together), adding numbers to it, and so on. So no, as a C compiler writer,
> I do not have a choice; I will have to use a type that can validly carry
> pointer information for integers in C.
>
> Since you seem to find this sort of thing compelling, please note that
> even a simple assignment like char c2 = c1 technically promotes through
> int in C, and so int must be able to carry pointer information if char
> can.
>
> John.

--
Juneyoung Lee
Software Foundation Lab, Seoul National University
John McCall via llvm-dev
2021-Jun-15 07:06 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
On 15 Jun 2021, at 1:49, Juneyoung Lee wrote:

> On Tue, Jun 15, 2021 at 1:08 AM John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> The semantics you seem to want are that LLVM’s integer types cannot carry
>> information from pointers. But I can cast a pointer to an integer in C and
>> vice-versa, and compilers have de facto defined the behavior of subsequent
>> operations like breaking the integer up (and then putting it back
>> together), adding numbers to it, and so on. So no, as a C compiler writer,
>> I do not have a choice; I will have to use a type that can validly carry
>> pointer information for integers in C.
>>
> int->ptr cast can reconstruct the pointer information, so making integer
> types not carry pointer information does not necessarily mean that
> dereferencing a pointer cast from an integer is UB.

What exactly is the claimed formal property of byte types, then, that
integer types will lack? Because it seems to me that converting from an
integer gives us valid provenance in strictly more situations than
converting from bytes, since it reconstructs provenance if there’s any
object at that address (under still-debated restrictions), while converting
from bytes always preserves the original provenance (if any). I don’t
understand how that can possibly give us *more* flexibility to optimize
integers.

>> Since you seem to find this sort of thing compelling, please note that
>> even a simple assignment like char c2 = c1 technically promotes through
>> int in C, and so int must be able to carry pointer information if char
>> can.
>>
> IIUC integer promotion is done when it is used as an operand of arithmetic
> ops or switch's condition, so I think assignment operation is okay.

Hmm, I was misremembering the rule, you’re right.

John.