John McCall via llvm-dev
2021-Jun-15 07:06 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
On 15 Jun 2021, at 1:49, Juneyoung Lee wrote:

> On Tue, Jun 15, 2021 at 1:08 AM John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> The semantics you seem to want are that LLVM’s integer types cannot
>> carry information from pointers. But I can cast a pointer to an
>> integer in C and vice-versa, and compilers have de facto defined the
>> behavior of subsequent operations like breaking the integer up (and
>> then putting it back together), adding numbers to it, and so on. So
>> no, as a C compiler writer, I do not have a choice; I will have to
>> use a type that can validly carry pointer information for integers
>> in C.
>>
> int->ptr cast can reconstruct the pointer information, so making
> integer types not carry pointer information does not necessarily mean
> that dereferencing a pointer casted from integer is UB.

What exactly is the claimed formal property of byte types, then, that integer types will lack? Because it seems to me that converting from an integer gives us valid provenance in strictly more situations than converting from bytes, since it reconstructs provenance if there’s any object at that address (under still-debated restrictions), while converting from bytes always preserves the original provenance (if any). I don’t understand how that can possibly give us *more* flexibility to optimize integers.

>> Since you seem to find this sort of thing compelling, please note
>> that even a simple assignment like char c2 = c1 technically promotes
>> through int in C, and so int must be able to carry pointer
>> information if char can.
>>
> IIUC integer promotion is done when it is used as an operand of
> arithmetic ops or switch's condition, so I think assignment operation
> is okay.

Hmm, I was misremembering the rule, you’re right.

John.
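The de facto behavior John describes in the quoted text (casting a pointer to an integer, breaking the integer up and putting it back together, adding numbers to it) looks roughly like the following in C. This is a minimal sketch: the helper names `roundtrip_via_halves` and `next_elem_via_int` are hypothetical, and it assumes a platform that provides `uintptr_t`. ISO C only guarantees that a valid `void *` converted to `uintptr_t` and back compares equal to the original; whether provenance survives the splitting and the integer arithmetic in between is exactly what the thread is debating.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: break a pointer's integer value into two halves,
   then put it back together and convert it back to a pointer. */
static int *roundtrip_via_halves(int *p) {
    uintptr_t bits = (uintptr_t)(void *)p;
    unsigned half = (unsigned)(sizeof(uintptr_t) * CHAR_BIT / 2);

    uintptr_t lo = bits & (((uintptr_t)1 << half) - 1);  /* low half */
    uintptr_t hi = bits >> half;                          /* high half */

    uintptr_t rebuilt = (hi << half) | lo;  /* reassemble the integer */
    assert(rebuilt == bits);                /* same integer value again */
    return (int *)(void *)rebuilt;
}

/* Hypothetical helper: "add a number" to the integer form of a pointer
   to step to the next int element. */
static int *next_elem_via_int(int *p) {
    uintptr_t bits = (uintptr_t)(void *)p + sizeof(int);
    return (int *)(void *)bits;
}
```

In practice, mainstream compilers treat both results as usable pointers into the original array; a semantics in which integers cannot carry pointer information has to justify that through provenance reconstruction at the int->ptr cast instead.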
Juneyoung Lee via llvm-dev
2021-Jun-15 17:29 UTC
On Tue, Jun 15, 2021 at 4:07 PM John McCall <rjmccall at apple.com> wrote:

> What exactly is the claimed formal property of byte types, then,
> that integer types will lack? Because it seems to me that converting
> from an integer gives us valid provenance in strictly more situations
> than converting from bytes, since it reconstructs provenance if there’s
> any object at that address (under still-debated restrictions),
> while converting from bytes always preserves the original provenance
> (if any). I don’t understand how that can possibly give us *more*
> flexibility to optimize integers.

When two objects are adjacent and an integer points exactly at the boundary between them, its provenance cannot be properly recovered:

    int x[1], y[1];
    llvm.assume((intptr_t)&x[0] == 0x100 && (intptr_t)&y[0] == 0x104);
    int *p = (int*)(intptr_t)&x[1];
    // Q: Is p's provenance x or y?

If '*(p-1)' is expected to be equivalent to *x, p's provenance should be x. However, given the llvm.assume, optimizations on integers can replace (intptr_t)&x[1] with (intptr_t)&y[0] (which is what happened in the bug report). Then '*(p-1)' suddenly becomes an out-of-bounds access, which is UB. So p's provenance isn't simply x or y; it would have to be something that can access both x and y.

This implies that, unless all allocated objects are guaranteed to be at least one byte apart, no integer type can perfectly store a pointer byte. memcpy(x, y, 8) isn't equivalent to 'v = load i64 y; store i64 v, x' because v has already lost the pointer information.

A byte type, by contrast, stores the pointer information perfectly. But then arithmetic-property-based optimizations such as the one above are no longer correct on it. Here is the byte-type version of the example:

    int x[1], y[1];
    // byte_8 is a 64-bit byte type
    llvm.assume((byte_8)&x[0] == 0x100 && (byte_8)&y[0] == 0x104);
    int *p = (int*)(byte_8)&x[1];
    // p's provenance is always x.

For a byte type, an equality comparison returning true does not mean that the two values are precisely equal. Since (byte_8)&x[1] and (byte_8)&y[0] have different provenances, replacing one with the other must be avoided. In return, we can guarantee that p is precisely equivalent to &x[1].

Another benefit is that optimizations on integers no longer need to worry about pointer provenance; e.g., the llvm.assume-based optimization above survives, and it does not need to check whether an integer variable is derived from a pointer value.

--
Juneyoung Lee
Software Foundation Lab, Seoul National University
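Juneyoung's adjacent-objects argument can be made concrete with a small toy model in C. This sketch is not LLVM's actual semantics, and every name in it (`Ptr`, `Byte8`, `int_to_ptr`, `byte_eq`, `can_load`, and so on) is hypothetical. It assumes, as in the example, that x and y are 4-byte objects at 0x100 and 0x104, models provenance as the index of the object a pointer may access, and lets int->ptr guess provenance from the address alone, while byte conversions carry provenance through unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: provenance is the index of the allocation a pointer may
   access, or NO_PROV. Hypothetical, not LLVM's real semantics. */
#define NO_PROV (-1)

typedef struct { uint64_t base, size; } Alloc;   /* one allocated object */
typedef struct { uint64_t addr; int prov; } Ptr; /* address + provenance */
typedef Ptr Byte8;                               /* byte value keeps both */

/* x and y from the example: two 4-byte ints placed back to back. */
enum { X = 0, Y = 1, NALLOCS = 2 };
static const Alloc allocs[NALLOCS] = { {0x100, 4}, {0x104, 4} };

/* ptr -> int drops provenance; only the address survives. */
static uint64_t ptr_to_int(Ptr p) { return p.addr; }

/* int -> ptr guesses provenance from the address. 0x104 is both
   one-past-the-end of x and the start of y; x matches first, so this
   model always picks x there. Any single choice breaks one of the uses. */
static Ptr int_to_ptr(uint64_t addr) {
    for (int i = 0; i < NALLOCS; i++)
        if (addr >= allocs[i].base && addr <= allocs[i].base + allocs[i].size)
            return (Ptr){ addr, i };
    return (Ptr){ addr, NO_PROV };
}

/* ptr <-> byte conversions keep provenance intact. */
static Byte8 ptr_to_byte(Ptr p) { return p; }
static Ptr byte_to_ptr(Byte8 b) { return b; }

/* Byte equality compares addresses only, so "equal" bytes may still
   differ in provenance and must not be substituted for one another. */
static bool byte_eq(Byte8 a, Byte8 b) { return a.addr == b.addr; }

/* Pointer arithmetic keeps provenance (unsigned wraparound handles off < 0). */
static Ptr ptr_add(Ptr p, int64_t off) {
    return (Ptr){ p.addr + (uint64_t)off, p.prov };
}

/* An n-byte load is defined only if it stays inside p's own object. */
static bool can_load(Ptr p, uint64_t n) {
    if (p.prov == NO_PROV) return false;
    const Alloc *a = &allocs[p.prov];
    return p.addr >= a->base && p.addr + n <= a->base + a->size;
}

static void demo_adjacent_objects(void) {
    Ptr x1 = { 0x104, X };  /* &x[1] */
    Ptr y0 = { 0x104, Y };  /* &y[0] */

    /* Integers: indistinguishable, so an optimizer may swap them. */
    assert(ptr_to_int(x1) == ptr_to_int(y0));
    Ptr from_x = int_to_ptr(ptr_to_int(x1));
    Ptr from_y = int_to_ptr(ptr_to_int(y0));
    assert(can_load(ptr_add(from_x, -4), 4)); /* *(p-1) reads x: fine */
    assert(!can_load(from_y, 4));             /* but y is no longer reachable */

    /* Bytes: compare equal, yet each round trip keeps its own provenance. */
    Byte8 bx1 = ptr_to_byte(x1), by0 = ptr_to_byte(y0);
    assert(byte_eq(bx1, by0) && bx1.prov != by0.prov);
    assert(can_load(ptr_add(byte_to_ptr(bx1), -4), 4)); /* still reaches x */
    assert(can_load(byte_to_ptr(by0), 4));              /* still reaches y */
}
```

Because `int_to_ptr` sees only the address 0x104, it has to return the same provenance for both round trips. This sketch picks x, which breaks the access through y; picking y instead would make `*(p-1)` out of bounds, as described in the email. The byte conversions avoid the dilemma, and `byte_eq` ignoring provenance is exactly why a byte value that compares equal to another may not be substituted for it.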