Ralf Jung via llvm-dev
2021-Jun-13 15:26 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
Hi Johannes,>> I think Joshua gave a very nice motivation already. > > I don't dispute that but I am still not understanding the need for bytes. None > of the examples I have seen so far > clearly made the point that it is the byte types that provide a substantial > benefit. The AA example below does neither.I hope <https://lists.llvm.org/pipermail/llvm-dev/2021-June/151110.html> makes a convincing case that under the current semantics, when one does an "i64" load of a value that was stored at pointer type, we have to say that this load returns poison. In particular, saying that this implicitly performs a "ptrtoint" is inconsistent with optimizations that are probably too important to be changed to accommodate this implicit "ptrtoint". If/once we agree on that, "byte" is a fairly natural proposal IMO: we need some type at which to perform a load that does *not* turn some kinds of data into poison. "byte" is that type. Kind regards, Ralf> > >> I'll just complement with an alias analysis example. (plus see my reply to Chris) >> >> define @f() { >> %p = alloca i8 >> %q = alloca i8** >> store i8 42, %p >> store i8* %p, i8** %q >> >> %pp = some ptr arithmetic; full may-alias >> %v = load i64, %pp >> call @g(%v) >> >> %w = load i8, %p ; call we store-forward & RAUW 42? >> } >> >> So, we pass an integer to some function. If we don't distinguish between >> pointers and integers (current memory model), we need to assume that %v may >> carry the address of %p, as the load may have done an implicit >> pointer-to-integer cast. Therefore, pointer %p is escaped. Thus, in this model >> we cannot do store forwarding of 42 to the load of %w. > > The conclusion that we cannot forward 42 is true if %pp has access to %p or %q, > which is not clear in the above code. > If no use of %p or %q somehow makes it's way into %pp, nor a load of %q, %pp can > be shown to be independent of %q already, regardless if we already perform this > reasoning or not. > > > >> Except that LLVM doesn't consider integer loads as escaping pointers. This >> is wrong, and that's one of the reasons why we miscompile integer/pointer >> casts. The other reason is what I explained in the reply to John & Chris, >> which is that some integer optimizations aren't correct for pointers, so we >> cannot represent pointers as integers (without appropriate p2i/i2p casts). > > At least me (and probably others) are arguing for the appropriate and explicit > p2i/i2p casts instead of a entirely new type that does the same job. > > >> Another model is to say that integers can only carry integers. Above there's >> no pointer-to-integer cast of %p, hence %v cannot have the address of %p, and >> thus store-forwarding can happen. >> This model requires a "universal holder" different than (ab)using integers. > > Storing %p into %q causes %p to be escaped by default, IMHO. With the newest > definition of escapes, we allow analyses to prove %q is not used in "a weird > way" and therefore we can pretend %p is not escaped. This effectively requires > us to look at all the loads of %q and assume they are potential copies of %p. If > such a load is (potentially) an integer, you'd need to decide if the (potential) > integer use can escape %p, same as you would decide any other load of %q would. > I fail to see the need for byte types here. > > >> Introducing a byte type means we only need to careful with char loads, but not >> with short, int, long, etc loads. Clang would lower char loads to byte loads, >> and the remaining integer types straight to i<n> loads. So we can go full >> speed on integers, and throttle down just on chars, which is all we can do >> because the semantics of chars in C. > > This will cause a lot of side effects, in addition to the work required to make > this happen in the first place. > Even if I assume we really want byte types for AA, which I am not sure of given > the lack of a convincing example, we should determine what the impact would be > here, e.g., restrict some optimizations in the presence of i8 operation and get > some idea of the cost. > > ~ Johannes > > >> >> Nuno >> >> >> -----Original Message----- >> From: Johannes Doerfert <johannesdoerfert at gmail.com> >> Sent: 04 June 2021 22:12 >> To: Nuno Lopes <nunoplopes at sapo.pt>; 'George Mitenkov' <georgemitenk0v at gmail.com> >> Cc: 'Ralf Jung' <jung at mpi-sws.org>; cfe-dev at lists.llvm.org; >> llvm-dev at lists.llvm.org >> Subject: Re: [RFC] Introducing a byte type to LLVM >> >> I think the lack of a end-to-end example is making this harder (for me) to >> understand. >> >> >> On 6/4/21 2:17 PM, Nuno Lopes wrote: >>> It's true: an alternative to introducing a new byte type is to make alias >>> analysis more conservative. But I think it's way worse than you think. >>> Making AA conservative means that every single integer load must be >>> considered to be escaping a pointer. >> What pointer is escaping if I have an integer load. I'm sorry for my >> questions but I assume I'm not the only one that might >> want to understand this properly. So far, I've assumed if I store a >> pointer away w/o knowing what happens to the memory the >> pointer escapes. This is our current model and it's not that bad. >> >> >>> Unless you know the last stored value was an integer, you would have to >>> conservatively assume that the load may be escaping some pointer (even if >>> only partially). This sounds to me like terrible. Type punning through memory >>> is probably very uncommon, and it seems we would be slowing down all programs >>> because of such uncommon case. >>> >>> Therefore I feel it's best if we introduce a way to explicitly mark potential >>> type punning situations, such that AA (and optimizers) only need to be >>> conservative when needed. It's not the case that right now we can detect the >>> few potential type punning cases and be conservative just there. We would >>> need to be conservative in e.g. every function that loads an argument pointer. >>> I agree it's scary to introduce a new type in the IR; we don't do that very >>> often. But we need to fix the bugs in LLVM. >> What doesn't add up for me is that one the one side the argument >> "everything needs to be conservative" w/o byte types, but at the same >> time we can somehow determine when to use byte types locally? If the >> latter would not be the case, how would we know when we need a byte type? >> >> Why would we not be conservative/explicit about type punning whenever we >> would introduce byte types with this proposal? I unfortunately don't >> understand what the byte types give us. My best guess is that they >> explicitly mark byte operations on pointers, which we could do otherwise >> I assume (explicit p2i, i2p). If there is more to it, I think we need an >> end to end example that shows how we cannot make type punning explicit >> with current means and a new type is conceptually better. >> >> ~ Johannes >> >> >>> Nuno >>> >>> -----Original Message----- >>> From: Johannes Doerfert via llvm-dev >>> Sent: 04 June 2021 17:56 >>> Subject: Re: [llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM >>> >>> >>> On 6/4/21 11:35 AM, George Mitenkov wrote: >>>> Hi Johannes, >>>> >>>> Sure! The underlying problem is that raw-memory access handlers are >>>> treated as integers, while they are not really integers. Especially >>>> std::byte that specifically states that it has raw-memory access >>>> semantics. This semantic mismatch can make AA wrong and a pointer to >>>> escape. >>> I would assume being conservative is a possibility here, right? >>> So if someone wants to type-pun a pointer we would do very little >>> optimizations around it. >>> >>> >>>> Consider the following LLVM IR that copies a pointer: >>>> >>>> %src8 = bitcast i8** %src to i8* >>>> >>>> %dst8 = bitcast i8** %dst to i8* >>>> >>>> call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst8, i8* %src8, i32 8, i1 >>>> false) >>>> >>>> %load = load i8*, i8** %dst >>>> >>>> %addr = ptrtoint i8* %load to i64 >>>> >>>> ret i64 %addr >>>> >>>> >>>> If we optimize the call to memcpy, then the IR becomes >>>> >>>> >>>> %src64 = bitcast i8** %src to i64* >>>> >>>> %dst64 = bitcast i8** %dst to i64* >>>> >>>> %addr = load i64, i64* %src64, align 1 >>>> >>>> store i64 %addr, i64* %dst64, align 1 >>>> >>>> ret i64 %addr >>>> >>>> >>>> Since there is no "data" type in LLVM like a byte, the ptrtoint is >>>> optimized out and the pointer escapes. If we have used a pair of byte >>>> load/store that would not happen. >>> The pointer (load %src) does escape in this example, doesn't it? >>> >>> I don't get why "that would not happen" with byte load/store and why that >>> would be correct at all. >>> >>> >>>> The mentioned bug is just an example, but this pattern can be >>>> >>>> replicated in other cases that optimize memcpy as integer load/stores and >>>> drop ptrtoint, >>>> >>>> undermining alias analysis. To illustrate further, consider loading a >>>> pointer like: >>>> >>>> >>>> %v1 = load i8*, %p >>>> >>>> %v2 = load i8*, %p >>>> >>>> >>>> Instcombine optimizes this into >>>> >>>> >>>> %v1 = load i64, %p >>>> >>>> %v2 = inttoptr %v1 >>>> >>>> >>>> changing the underlying provenance (again, not good for AA). >>> This example looks somehow broken (I'll assuming %v1 has integer type). >>> >>> I agree this is not a great optimization, but is it wrong? We should have >>> done the opposite (ptr2int for v1) but even then I'm not sure what AA could >>> derive here. We load a pointer and then we escape it into an integer. If the >>> ptr2int is not removed, I'm not sure what AA should do about this. I'm also >>> not sure how practical this is. Do we have some idea how often such >>> cases exist? >>> >>> >>>> If we had a >>>> byte type instead, >>>> >>>> then we could >>>> >>>> 1. preserve provenance >>>> >>>> 2. differentiate between integers and the raw data copied >>>> >>>> >>>> This might be beneficial for some backends (e.g CHERI) that want to >>>> differentiate between >>>> >>>> true integers and bytes that can be integers. The optimizations for known >>>> integers >>>> >>>> can become more aggressive in this case. >>>> >>> It sounds to me like the problem is our optimization in the presence of >>> int2ptr and ptr2int are (still) unsound. We should at least not remove >>> ptr2int, >>> is that it? >>> >>> That said, I still don't get how a byte type helps for these examples. >>> More concretely: >>> >>> ref 1) How does a byte type preserve provenance? If it is somehow >>> implicitly attached >>> would that not mean we need to now be careful doing >>> optimizations on bytes (as we >>> should now be careful doing optimizations around p2i and i2p)? >>> Literally shifting >>> the problem from looking for p2i and i2p to looking for bytecast? >>> >>> ref 2) What optimization can I do if I have the differentiation of >>> integers and raw data? >>> Can we somehow get an idea of the optimization potential? >>> >>> >>> ~ Johannes >>> >>> >>> >>>> Thanks, >>>> >>>> George >>>> >>>> On Fri, Jun 4, 2021 at 7:03 PM Johannes Doerfert <johannesdoerfert at gmail.com> >>>> wrote: >>>> >>>>> Hi George, >>>>> >>>>> On 6/4/21 10:24 AM, George Mitenkov via cfe-dev wrote: >>>>>> Hi all, >>>>>> >>>>>> Together with Nuno Lopes and Juneyoung Lee we propose to add a new byte >>>>>> type to LLVM to fix miscompilations due to load type punning. Please see >>>>>> the proposal below. It would be great to hear the >>>>>> feedback/comments/suggestions! >>>>>> >>>>>> >>>>>> Motivation >>>>>> =========>>>>>> >>>>>> char and unsigned char are considered to be universal holders in C. They >>>>>> can access raw memory and are used to implement memcpy. i8 is the LLVM’s >>>>>> counterpart but it does not have such semantics, which is also not >>>>>> desirable as it would disable many optimizations. >>>>>> >>>>>> We therefore propose to introduce a new byte type that would have a >>>>> raw-memory >>>>>> access semantics and can be differentiated from i8. This can help to fix >>>>>> unsound optimizations and make the lowering of char, unsigned char or >>>>>> std::byte correct. Currently, we denote the byte type as b<N>, where N is >>>>>> the number of bits. >>>>>> >>>>>> In our semantics, byte type carries provenance and copying bytes does not >>>>>> escape pointers, thereby benefiting alias analysis. If the memory >>>>> contains >>>>>> poison bits, then the result of loading a byte from it is bitwise poison. >>>>> Could you elaborate on the motivation a bit. I'm not sure I follow. >>>>> Maybe examples would be good. The only one I found is in >>>>> https://bugs.llvm.org/show_bug.cgi?id=37469. For that one I'm >>>>> wondering how much we would in reality loose if we make the required >>>>> ptr2int explicit when people cast a pointer to an integer. In general, >>>>> explicit encoding seems to me preferable and overall less work (new >>>>> types kinda scare me TBH). >>>>> >>>>> ~ Johannes >>>>> >>>>> >>>>>> Benefits >>>>>> =======>>>>>> >>>>>> The small calls to memcpy and memmove that are transformed by Instcombine >>>>>> into integer load/store pairs would be correctly transformed into the >>>>>> loads/stores of the new byte type no matter what underlying value is >>>>> being >>>>>> copied (integral value or a pointer value). >>>>>> >>>>>> 1. >>>>>> >>>>>> The new byte type solves the problem of copying the padded data, >>>>> with no >>>>>> contamination of the loaded value due to poison bits of the padding. >>>>>> 2. >>>>>> >>>>>> When copying pointers as bytes implicit pointer-to-integer casts are >>>>>> avoided.The current setting of performing a memcpy using i8s leads to >>>>>> miscompilations (example: bug report 37469 >>>>>> <https://bugs.llvm.org/show_bug.cgi?id=37469>) and is bad for alias >>>>>> analysis. >>>>>> >>>>>> >>>>>> >>>>>> Lowering of char, unsigned char and std::byte >>>>>> >>>>>> =====================================>>>>>> >>>>>> For any function that takes char, unsigned char or std::byte as >>>>> arguments, >>>>>> we lower these to the b8 instead of i8. The motivation comes from the >>>>> fact >>>>>> that any pointer can be explicitly copied in bytes, and each byte can be >>>>>> passed as an argument to the copying function. Since a copy like that can >>>>>> escape a pointer, we need to generate the byte type to avoid the issue. >>>>>> >>>>>> Example >>>>>> >>>>>> void foo(unsigned char arg1, char arg2) {...} >>>>>> >>>>>> => >>>>>> >>>>>> void @foo(zeroext b8 %arg1, signext b8 %arg2) {...} >>>>>> >>>>>> >>>>>> >>>>>> std::byte is defined as enum class byte : unsigned char {} and therefore >>>>>> all bitwise operations over it can be lowered equivalently to operations >>>>> on >>>>>> unsigned char (operation is performed over zero-extended to i32 operands, >>>>>> and the result is truncated back). >>>>>> >>>>>> >>>>>> =============================>>>>>> >>>>>> Byte Type Semantics (Version 1) >>>>>> >>>>>> =============================>>>>>> >>>>>> >>>>>> [allowed] alloca/load/store >>>>>> >>>>>> ====================>>>>>> >>>>>> The byte type is allowed to be allocated, stored and loaded. No semantic >>>>>> changes needed. >>>>>> >>>>>> [allowed] zext/sext/trunc >>>>>> >>>>>> ===================>>>>>> >>>>>> In order to easily adapt the current replacement of i8 with b8, we want >>>>> to >>>>>> extend zext/sext/trunc instructions’ semantics to allow the first operand >>>>>> as a byte type as well. This will reduce the number of instructions >>>>> needed >>>>>> for the cast. >>>>>> >>>>>> Example >>>>>> >>>>>> trunc i32 %int to b8 >>>>>> >>>>>> zext i8 %char to b32 >>>>>> >>>>>> [modified] bitcast >>>>>> >>>>>> =============>>>>>> >>>>>> We modify the semantics of the bitcast to allow casts from and to the >>>>> byte >>>>>> types. If the byte has a value of the same kind (integral or a pointer) >>>>> as >>>>>> the other type, then the bitcast is a noop. Otherwise, the result is >>>>> poison. >>>>>> Example >>>>>> >>>>>> bitcast i<N> %val to b<N> >>>>>> >>>>>> bitcast i<N>* %val to b64 >>>>>> >>>>>> bitcast b<N> %byte to i<N> [byte has an integral value => noop] >>>>>> >>>>>> >>>>>> >>>>>> [new] bytecast >>>>>> >>>>>> ===========>>>>>> >>>>>> During IR generation, we cannot deduce whether the byte value contains a >>>>>> pointer or an integral value, and therefore we cannot create a bitcast. >>>>>> >>>>>> Example >>>>>> >>>>>> int @cast(char c) { return (int) c; } >>>>>> >>>>>> Current >>>>>> >>>>>> Proposed >>>>>> >>>>>> i32 @cast(i8 signext %c) { >>>>>> >>>>>> %1 = alloca i8, align 1 >>>>>> >>>>>> store i8 %c, i8* %1, align 1 >>>>>> >>>>>> %2 = load i8, i8* %1, align 1 >>>>>> >>>>>> %3 = sext i8 %2 to i32 >>>>>> >>>>>> ret i32 %3 >>>>>> >>>>>> } >>>>>> >>>>>> i32 @cast(b8 %c) { >>>>>> >>>>>> %1 = alloca b8 >>>>>> >>>>>> store b8 %c, b8* %11 >>>>>> >>>>>> %2 = load b8, b8* %1 >>>>>> >>>>>> %3 = ? b8 %2 to i32 >>>>>> >>>>>> ret i32 %3 >>>>>> >>>>>> } >>>>>> >>>>>> In this example, the operation ? can be sext (b8 is i8) or can be >>>>>> ptrtoint if the byte has a pointer value. >>>>>> >>>>>> We therefore introduce a new bytecast instruction. The frontend will >>>>> always >>>>>> produce a bytecast, which can then be optimized into a more specific cast >>>>>> if necessary. Bytecast operands must have the same bitwidth (or size). >>>>>> >>>>>> The semantics of bytecast instruction is: >>>>>> >>>>>> bytecast b<N> %byte to T >>>>>> >>>>>> The kind of the value of the byte >>>>>> >>>>>> T >>>>>> >>>>>> Semantics >>>>>> >>>>>> Pointer >>>>>> >>>>>> Integral >>>>>> >>>>>> ptrtoint >>>>>> >>>>>> Pointer >>>>>> >>>>>> Pointer >>>>>> >>>>>> bitcast >>>>>> >>>>>> Integral >>>>>> >>>>>> Integral >>>>>> >>>>>> integral casts (zext, sext, etc.) >>>>>> >>>>>> Integral >>>>>> >>>>>> Pointer >>>>>> >>>>>> inttoptr >>>>>> >>>>>> >>>>>> >>>>>> Essentially, this instruction is a selection of the right casts. Once the >>>>>> compiler has been able to prove the kind of the byte’s value, the >>>>> bytecast >>>>>> can be replaced with appropriate cast or become a noop. >>>>>> >>>>>> Example >>>>>> >>>>>> bytecast b64 %bytes to i32* >>>>>> >>>>>> bytecast b8 %byte to i8 >>>>>> >>>>>> >>>>>> >>>>>> [disabled] and/or/xor >>>>>> >>>>>> ================>>>>>> >>>>>> We do not allow bitwise operations over byte types because it’s not >>>>> trivial >>>>>> to define all cases, particularly the ones involving pointers. If these >>>>>> operations are useful for optimizations and/or some frontends, semantics >>>>>> for these can be considered separately. >>>>>> >>>>>> If the operation is desired on the frontend level, then the default >>>>>> generated code can always cast the byte type to an integer to operate on >>>>>> integer values. >>>>>> >>>>>> >>>>>> >>>>>> [disabled] arithmetic >>>>>> >>>>>> ================>>>>>> >>>>>> Performing arithmetic operations over the byte type is similarly not >>>>>> allowed (A valid question is what does it mean to add to bytes of raw >>>>>> memory?). If we want to perform arithmetic, we need to cast a byte to an >>>>>> integer (via sext/zext/trunc explicitly). >>>>>> >>>>>> >>>>>> >>>>>> [modified] comparison >>>>>> >>>>>> =================>>>>>> >>>>>> We allow performing comparisons, as we may potentially want to compare >>>>> the >>>>>> ordering of the memory instances, check for null, etc. Comparison is also >>>>>> needed since char types are commonly compared. We define the following >>>>>> semantic of the byte type comparison. >>>>>> >>>>>> Case 1: The values of the bytes have the same kinds: compare the values. >>>>>> >>>>>> Case 2: The values of the bytes have the different kinds: do a bytecast >>>>> of >>>>>> the non-integral type to the other type and compare the integral values. >>>>>> >>>>>> >>>>>> Example >>>>>> >>>>>> ======>>>>>> >>>>>> unsigned char sum(unsigned char a, unsigned char b) { return a + b; } >>>>>> >>>>>> Current >>>>>> >>>>>> Proposed >>>>>> >>>>>> zeroext i8 @sum(i8 zeroext %a, i8 zeroext %b) { >>>>>> >>>>>> %1 = alloca i8 >>>>>> >>>>>> %2 = alloca i8 >>>>>> >>>>>> store i8 %a, i8* %1 >>>>>> >>>>>> store i8 %b, i8* %2 >>>>>> >>>>>> %3 = load i8, i8* %1 >>>>>> >>>>>> %4 = zext i8 %3 to i32 >>>>>> >>>>>> %5 = load i8, i8* %2 >>>>>> >>>>>> %6 = zext i8 %5 to i32 >>>>>> >>>>>> %7 = add nsw i32 %4, %6 >>>>>> >>>>>> %8 = trunc i32 %7 to i8 >>>>>> >>>>>> ret i8 %8 >>>>>> >>>>>> } >>>>>> >>>>>> zeroext b8 @sum(b8 zeroext %a, b8 zeroext %b) { >>>>>> >>>>>> %1 = alloca b8 >>>>>> >>>>>> %2 = alloca b8 >>>>>> >>>>>> store b8 %a, b8* %1 >>>>>> >>>>>> store b8 %b, b8* %2 >>>>>> >>>>>> %3 = load b8, b8* %1 >>>>>> >>>>>> %4 = bytecast b8 %3 to i8 >>>>>> >>>>>> %5 = zext i8 %4 to i32 >>>>>> >>>>>> %6 = load b8, b8* %2 >>>>>> >>>>>> %7 = bytecast b8 %6 to i8 >>>>>> >>>>>> %8 = zext i8 %7 to i32 >>>>>> >>>>>> %9 = add nsw i32 %5, %8 >>>>>> >>>>>> %10 = trunc i32 %9 to b8 >>>>>> >>>>>> ret b8 %10 >>>>>> >>>>>> } >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> George >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> cfe-dev mailing list >>>>>> cfe-dev at lists.llvm.org >>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>-- Website: https://people.mpi-sws.org/~jung/
John McCall via llvm-dev
2021-Jun-14 05:33 UTC
[llvm-dev] [RFC] Introducing a byte type to LLVM
On 13 Jun 2021, at 11:26, Ralf Jung wrote:> Hi Johannes, > >>> I think Joshua gave a very nice motivation already. >> >> I don't dispute that but I am still not understanding the need for >> bytes. None of the examples I have seen so far >> clearly made the point that it is the byte types that provide a >> substantial benefit. The AA example below does neither. > > I hope > <https://lists.llvm.org/pipermail/llvm-dev/2021-June/151110.html> > makes a convincing case that under the current semantics, when one > does an "i64" load of a value that was stored at pointer type, we have > to say that this load returns poison. In particular, saying that this > implicitly performs a "ptrtoint" is inconsistent with optimizations > that are probably too important to be changed to accommodate this > implicit "ptrtoint".I think it is fact rather obvious that, if this optimization as currently written is indeed in conflict with the current semantics, it is the optimization that will have to give. If the optimization is too important for performance to give up entirely, we will simply have to find some more restricted pattern that wee can still soundly optimize. Perhaps the clearest reason is that, if we did declare that integer types cannot carry pointers and so introduced byte types that could, C frontends would have to switch to byte types for their integer types, and so we would immediately lose this supposedly important optimization for C-like languages, and so, since optimizing C is very important, we would immediately need to find some restricted pattern under which we could soundly apply this optimization to byte types. That’s assuming that this optimization is actually significant, of course. John.