The old and popular tradition of "int" being the default all-purpose type in C hit a snag with the arrival of 64-bit general-purpose architectures, when, for reasons of memory usage and compatibility, "int" was fixed at 32 bits. This meant it was no longer the same size as a pointer, the size of an array index, or the widest native integer type. (Sequence of events simplified for brevity.)

The C standard tries to help with a bunch of typedefs: int_fast32_t, int_least32_t, intptr_t, int32_t, size_t, ptrdiff_t, and more. With all these great typedefs, one could imagine that there's no reason anyone should ever use plain "int" again. Unfortunately, juggling all these typedefs can be impractical, for several reasons. C doesn't have function overloading, so whenever one talks to the standard library (e.g. to call "abs"), one must know what the underlying types are (e.g. which of "abs", "labs", or "llabs" is needed?). printf also has a problem, and although the C standard actually tries to help there, "%" PRIdLEAST32 and "%" PRIdFAST32 have yet to seriously challenge the mindshare of "%d". And there are other issues. "int" remains widely popular, even though it isn't quite the close fit to hardware that it used to be.

Consider this simple testcase, which is representative of a broader issue, containing an innocent use of int:

  void my_copy(char *dst, const char *src, int n) {
    for (int i = 0; i != n; ++i)
      dst[i] = src[i];
  }

On LP64 platforms, this code has implicit sign-extensions to convert the 32-bit int "i" to 64 bits in the array index expressions. Sign extensions are relatively fast on most machines, but there's also the cost of delaying the computation of the addresses for the memory references, which is significant. And small loops turn small problems into big problems. The result is around a 12% slowdown on the machine I happen to be typing this on. (I'm ignoring loop vectorization and memcpy pattern matching here, to keep this example simple without losing too much generality.)

However, C compilers have a way to fix this problem, by repurposing an ancient hack. Long ago, the C standard was designed to accommodate machines that didn't use two's complement representations for signed integers. Different representations lead to different behaviors on overflow, and the C committee decided to make signed integer arithmetic portable by declaring that overflow in a signed integer type is undefined behavior. Since then, two's complement has largely taken over, but the C standard still has that rule. Today, compilers use this rule to promote 32-bit "int" variables to 64 bits and eliminate these troublesome sign extensions. This is valid because any time the promotion would change the behavior of the program, there would be an overflow in the narrower type, which the rule says is undefined behavior.

Now consider LLVM. In LLVM, "int" is lowered to "i32" (on LP64 targets), and signedness is a property of operators, rather than types. For many years, LLVM was unable to promote "int" variables, because it lacked the necessary information. Then, I was assigned to fix this. I added a flag named "nsw", which stands for No Signed Wrap, to be attached to add instructions and others. It's not a very satisfying name, but it's sufficiently unique and easy to find in LangRef. The purpose of the nsw flag is to mark instructions which are known to have no overflow. The nsw flag is simple on the surface, but it has complexities.
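To make the discussion concrete, here is roughly how my_copy's loop looks in IR on an LP64 target, before promotion (a hand-written sketch in the IR syntax of the era, with invented names; not actual compiler output):

  define void @my_copy(i8* %dst, i8* %src, i32 %n) {
  entry:
    %empty = icmp eq i32 %n, 0
    br i1 %empty, label %exit, label %loop
  loop:
    %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
    %idx = sext i32 %i to i64              ; the sign extension in question
    %src.addr = getelementptr i8* %src, i64 %idx
    %c = load i8* %src.addr
    %dst.addr = getelementptr i8* %dst, i64 %idx
    store i8 %c, i8* %dst.addr
    %i.next = add nsw i32 %i, 1            ; nsw: known not to overflow
    %done = icmp eq i32 %i.next, %n
    br i1 %done, label %exit, label %loop
  exit:
    ret void
  }

Because the increment carries nsw, an optimizer may rewrite %i and %i.next as i64 values and delete the sext; any execution in which that changes the program's behavior would have to overflow the 32-bit add, which nsw rules out.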
One complexity is that nsw add is not an associative operation (unlike mathematical integers, two's complement wrapping signed integers, and unsigned integers, which are all associative). With nsw, ((a + b) + c) is not equivalent to (a + (b + c)), because (b + c) might overflow even when ((a + b) + c) has no overflow. This is inconvenient, but it's manageable.

A much bigger complication is that an add with undefined behavior on overflow is an instruction that potentially has side effects. Side effects mean the compiler can't freely move such instructions around. This is jarring, because add instructions are the kinds of things that otherwise can be easily moved around -- they are cheap and safe. An example of this is short-circuit elimination -- turning && into &. If the right operand of the && contains an add, it will need to be speculatively executed, and that isn't safe if it might have side effects.

The first solution for this was to define nsw add to return undef on overflow, instead of invoking immediate undefined behavior. This way, overflow can happen and it doesn't cause any harm unless the value is actually used. With the code motion problem fixed, I proceeded to implement nsw using this approach, and a sign-extension elimination optimization on top of it.

However, I later realized that undef isn't actually expressive enough for this purpose. Undef can produce any value, but it will still be some specific value of its type (with fluctuation, but that's not important here). When an integer add is promoted to a wider type, an expression in the narrower type which would overflow is converted to an expression in the wider type which returns a value which can't be expressed in the narrower type. There is no possible value in the narrower type that undef could take on which would produce the same behavior.

When I asked Chris what to do about this problem, he didn't understand why I was worrying about it. Admittedly, it is a fairly obscure problem. But at the time, I was of the opinion that we shouldn't sweep known semantic problems under the rug. We don't have formal semantics, and we don't do correctness proofs, but it still seemed that we should respond when we do notice inconsistencies. I wanted to get to the bottom of it.

I set out to find some way to say that an nsw add which overflows doesn't invoke immediate undefined behavior, but invokes undefined behavior only when its result is used. It can't be just any use though, because I didn't want to rule out the possibility of hoisting chains of instructions, like a+b+c+d+e. LLVM doesn't usually speculate this aggressively today, but there are plausible scenarios where it might want to in the future.

This led me to think about making nsw overflow return some kind of poisoned value, like undef, only more powerful. The poison would propagate down the def-use graph until it reaches an instruction that is capable of triggering the undefined behavior.

I called the poison value "trap", because of some superficial similarity to C's "trap value" concept, although I now believe this to be mistaken. It really is quite different. Also, it's unrelated to the llvm.trap intrinsic. It should probably be renamed.

The presence of this poison value means that undefined behavior has been invoked on a potentially speculative code path. The consequences of the undefined behavior are deferred until that code path is actually used on some non-speculative path. This concept seemed to fit all the requirements.
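The short-circuit case described above looks like this in IR (again a hand-written sketch with invented names, under the deferred-poison semantics just described):

  ; Before: the add nsw executes only when %p is true.
  define i1 @before(i1 %p, i32 %a, i32 %b) {
  entry:
    br i1 %p, label %rhs, label %join
  rhs:
    %sum = add nsw i32 %a, %b
    %q = icmp sgt i32 %sum, 0
    br label %join
  join:
    %r = phi i1 [ false, %entry ], [ %q, %rhs ]
    ret i1 %r
  }

  ; After turning && into &: the add nsw executes unconditionally. If
  ; %a + %b overflows on what used to be the guarded path, %sum is merely
  ; poison, and the undefined behavior is deferred until something capable
  ; of triggering it actually consumes the value. Under an
  ; immediate-undefined-behavior reading of nsw, this transformation
  ; would be invalid.
  define i1 @after(i1 %p, i32 %a, i32 %b) {
  entry:
    %sum = add nsw i32 %a, %b
    %q = icmp sgt i32 %sum, 0
    %r = and i1 %p, %q
    ret i1 %r
  }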
This poison concept allows for arbitrary speculation, it's sufficient for sign-extension elimination, and it's useful for a handful of other neat tricks as well. I wrote up a description of it, and it's been in LangRef ever since. It sticks out though, because it got pretty big and complex, especially in view of its relative obscurity. Realistically speaking, it's probably not fully watertight yet. But I'm not aware of any fundamental flaws.

Perhaps the best way to understand the complexity is to imagine writing a tool to detect this kind of undefined behavior. Detecting signed overflow in C source is pretty easy: just insert checking code at each signed operation. But as we've just seen, LLVM IR has different rules from C. The undefined behavior of nsw doesn't matter until it's actually observed. So initially, one might imagine detecting trouble by adding a bit to every Value to track whether it is a poison value. That's pretty heavy-weight, but it's doable. It gets worse though, because poison values can also be transmitted through memory (see the sketch after this message). So one would also have to add a flag to bytes of memory which are poisoned. But it gets even worse, because poison values can be used as branch conditions, so one would have to do advanced CFG analysis to prove whether branch decisions based on poisoned values actually matter. And it gets worse yet, because the control structure of the program may be dynamic, so what one would really need to do is something like non-deterministic execution. This can easily take exponential amounts of time and memory.

A natural reaction to this problem is to think that LLVM IR is so nice and pretty that naturally there must be a nice and pretty solution. Here are some alternatives that have been considered:

- Go back to using undef for overflow. There were no known real-world
  bugs with this. It's just inconsistent.

- Define add nsw as a fully side-effecting operation, and accept the
  limits on code motion that this implies. However, as LLVM starts doing
  profile-guided optimizations, and starts thinking about more diverse
  architectures, code speculation will likely become more important.

- Define add nsw as a fully side-effecting operation, and teach
  optimization passes to strip nsw when moving code past control
  boundaries. This is seen as suboptimal because it prevents subsequent
  passes from making use of the nsw information. And, it's extra work
  for optimization passes.

- Instead of trying to define dependence in LangRef, just say that if
  changing the value returned from an overflowing add nsw would
  affect the observable behavior of the program, then the behavior of
  the program is undefined. This would reduce the amount of text in
  LangRef, but it would be a weaker definition, and it would require
  sign-extension optimizers and others to do significantly more work
  to establish that their transformations are safe.

- Give up on nsw and have compilers emit warnings when they are unable
  to perform some optimization due to their inability to exclude the
  possibility of overflow. Obviously the warnings would not be on by
  default, or even in -Wall, or probably even -Wextra, so -Werror users
  need not revolt. But many people are faced with code that they cannot
  modify, for any number of reasons, and would not be able to make use
  of such warnings. It's an interesting tradeoff, but it's unpopular.

Unfortunately, none of these are completely nice and pretty. Because of this, and because most people don't care, nsw with all its conceptual complexity has survived.

Dan
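As a concrete illustration of the memory case mentioned above (a hand-written sketch in the IR syntax of the era, with invented names), a plain store/load round trip is enough to carry poison out of the def-use graph:

  ; If %a + %b overflows, %sum is poison. The store writes that poison to
  ; the stack slot and the load reads it back, so %sum2 is poison as well,
  ; even though no def-use edge connects it to the overflowing add. A
  ; checker that only shadowed SSA values would lose track of it here.
  define i32 @roundtrip(i32 %a, i32 %b) {
  entry:
    %slot = alloca i32
    %sum = add nsw i32 %a, %b
    store i32 %sum, i32* %slot
    %sum2 = load i32* %slot
    ret i32 %sum2
  }

This is also the shape the reg2mem pass produces, a point that comes up again later in the thread.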
A few thoughts from someone on the sidelines; hope they're helpful.

On Tue, Nov 29, 2011 at 3:21 PM, Dan Gohman <gohman at apple.com> wrote:

> The first solution for this was to define nsw add to return undef on
> overflow, instead of invoking immediate undefined behavior. This way,
> overflow can happen and it doesn't cause any harm unless the value
> is actually used. With the code motion problem fixed, I proceeded to
> implement nsw using this approach, and a sign-extension elimination
> optimization on top of it.
>
> However, I later realized that undef isn't actually expressive enough
> for this purpose. Undef can produce any value, but it will still be some
> specific value of its type (with fluctuation, but that's not important
> here). When an integer add is promoted to a wider type, an expression in
> the narrower type which would overflow is converted to an expression in
> the wider type which returns a value which can't be expressed in the
> narrower type. There is no possible value in the narrower type that
> undef could take on which would produce the same behavior.

Part of the confusion seems to come from overloading "undefined." The VAX architecture spec nicely dealt with the distinction between unspecified results and unspecified behavior by consistently using distinct terms: _unpredictable_ results, and _undefined_ behavior. An operation producing an unpredictable result could produce any bit-pattern that fit in the output operand; but it would not trap or do anything else unfriendly. An operation causing undefined behavior means anything could happen, from no-effect to machine meltdown.

I always thought these were usefully distinct concepts, and the terminology convention really clarified that.

> I set out to find some way to say that an nsw add which overflows doesn't
> invoke immediate undefined behavior, but invokes undefined behavior only
> when its result is used. It can't be just any use though, because I didn't
> want to rule out the possibility of hoisting chains of instructions,
> like a+b+c+d+e. LLVM doesn't usually speculate this aggressively today,
> but there are plausible scenarios where it might want to in the future.
>
> This led me to think about making nsw overflow return some kind
> of poisoned value, like undef, only more powerful. The poison would
> propagate down the def-use graph until it reaches an instruction that
> is capable of triggering the undefined behavior.

Reminds me of how Itanium does speculation, although the details are getting fuzzy (it has been a while). Registers have an associated "NaT" (Not A Thing) flag, which gets set when something bad happens. (Usually loading from nonexistent or protected memory. Can't remember whether integer overflows can set this.)

Some operations are NaT-tolerant; any input NaT is simply propagated to the output. Arithmetic, sign-extension, bit fiddling, and the like are NaT-tolerant. Some operations are NaT-intolerant; any input NaT causes a trap. Storing to memory, pointer derefs, and I think compares are all NaT-intolerant.

There's also a "check" instruction that exists solely to check for NaT flags. That might be specific to memory traps; when you trap on a load, you want to know what address trapped, and the check instruction has a memory-address operand to provide that info.

> So initially, one might imagine detecting trouble by adding a bit
> to every Value to track whether it is a poison value. That's pretty
> heavy-weight, but it's doable.
> It gets worse though, because poison
> values can also be transmitted through memory. So one would also
> have to add a flag to bytes of memory which are poisoned. But it gets
> even worse, because poison values can be used as branch conditions,
> so one would have to do advanced CFG analysis to prove whether branch
> decisions based on poisoned values actually matter. And it gets worse
> yet, because the control structure of the program may be dynamic, so
> what one would really need to do is something like non-deterministic
> execution. This can easily take exponential amounts of time and memory.

I suggest that storing to memory, and conditionals, and anything else that seems to lead toward the really hairy situations, should be cases where the nsw/poison/trap is "actually observed."

- Values in memory can be observable by other parts of the program, or
  by other threads/processes.
- After you've made a control-flow decision, memory references in the
  successor blocks are part of the observable behavior of the program.

This is taking a broader view of "observable" than what's defined in the C or C++ standards, which steadfastly ignore a variety of practical realities. But I think it's a reasonable and defensible position, that still allows you some scope for operating on these things in a sensible way.

Pogo
On Nov 30, 2011, at 12:21 PM, Paul Robinson wrote:

> Part of the confusion seems to come from overloading "undefined."
> The VAX architecture spec nicely dealt with the distinction between
> unspecified results and unspecified behavior by consistently using
> distinct terms: _unpredictable_ results, and _undefined_ behavior.
> An operation producing an unpredictable result could produce any
> bit-pattern that fit in the output operand; but it would not trap
> or do anything else unfriendly.
> An operation causing undefined behavior means anything could happen,
> from no-effect to machine meltdown.
>
> I always thought these were usefully distinct concepts, and the
> terminology convention really clarified that.

LLVM IR does make these distinctions, though the language is a bit subtle.

"Undefined behavior" and "behavior is undefined" mean anything could happen. Demons may fly out your nose.

"undef" is an arbitrary fluctuating bit pattern. Also it causes operations that use it to exhibit constrained fluctuation.

"Result is undefined" means an arbitrary bit pattern may be returned. Sometimes this means "undef" and sometimes this means a fixed bit pattern, though this inconsistency may be unintentional.

> I suggest that storing to memory, and conditionals, and anything
> else that seems to lead toward the really hairy situations, should
> be cases where the nsw/poison/trap is "actually observed."
>
> - Values in memory can be observable by other parts of the program, or
>   by other threads/processes.
> - After you've made a control-flow decision, memory references in the
>   successor blocks are part of the observable behavior of the program.
>
> This is taking a broader view of "observable" than what's defined in the
> C or C++ standards, which steadfastly ignore a variety of practical realities.
> But I think it's a reasonable and defensible position, that still allows you
> some scope for operating on these things in a sensible way.

Prohibiting poison values from propagating through memory would mean that the reg2mem pass would no longer be a semantics-preserving pass. Prohibiting it from propagating through control flow would mean that a select instruction wouldn't be equivalent to a conditional branch and a phi. These are problems that aren't really important for hardware ISAs, but are important for LLVM IR.

Dan
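The select equivalence is easy to see side by side (a hand-written sketch with invented names):

  ; These two functions are meant to be interchangeable, and LLVM rewrites
  ; between the two forms freely. If branching on poison were defined to
  ; "observe" it, then with a poison %c the branch form would invoke
  ; undefined behavior while the select form would merely produce a
  ; questionable value -- so the rewrite would no longer preserve semantics.
  define i32 @with_select(i1 %c, i32 %x, i32 %y) {
  entry:
    %r = select i1 %c, i32 %x, i32 %y
    ret i32 %r
  }

  define i32 @with_branch(i1 %c, i32 %x, i32 %y) {
  entry:
    br i1 %c, label %t, label %f
  t:
    br label %join
  f:
    br label %join
  join:
    %r = phi i32 [ %x, %t ], [ %y, %f ]
    ret i32 %r
  }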
On Nov 29, 2011, at 3:21 PM, Dan Gohman wrote:

> A natural reaction to this problem is to think that LLVM IR is so nice
> and pretty that naturally there must be a nice and pretty solution. Here
> are some alternatives that have been considered:
>
> - Go back to using undef for overflow. There were no known real-world
>   bugs with this. It's just inconsistent.

+1 for the simple and obvious answer.

-Chris
On Nov 30, 2011, at 10:49 PM, Chris Lattner wrote:

> On Nov 29, 2011, at 3:21 PM, Dan Gohman wrote:
>
>> A natural reaction to this problem is to think that LLVM IR is so nice
>> and pretty that naturally there must be a nice and pretty solution. Here
>> are some alternatives that have been considered:
>>
>> - Go back to using undef for overflow. There were no known real-world
>>   bugs with this. It's just inconsistent.
>
> +1 for the simple and obvious answer.

To be clear, the main sign-extension elimination optimization is not valid under the simple and obvious answer. Are you proposing to do it anyway?

Dan
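The invalidity can be checked with concrete numbers (a hand-written sketch with invented names and values):

  ; Under "overflow yields undef", %sum below is still some particular
  ; 32-bit value, so %wide is always in [-2^31, 2^31).
  define i64 @narrow(i32 %a, i32 %b) {
  entry:
    %sum = add nsw i32 %a, %b
    %wide = sext i32 %sum to i64
    ret i64 %wide
  }

  ; After sign-extension elimination. With %a = %b = 2000000000 this
  ; returns exactly 4000000000, a value no sign-extended i32 can equal --
  ; so under the undef reading, the transformation changes observable
  ; behavior in the overflowing case. Poison (deferred undefined behavior)
  ; is what makes it justifiable.
  define i64 @promoted(i32 %a, i32 %b) {
  entry:
    %a64 = sext i32 %a to i64
    %b64 = sext i32 %b to i64
    %wide = add i64 %a64, %b64
    ret i64 %wide
  }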
Dan Gohman <gohman at apple.com> writes:

> The presence of this poison value means that undefined behavior has been
> invoked on a potentially speculative code path. The consequences of the
> undefined behavior are deferred until that code path is actually used
> on some non-speculative path.

This is exactly what was done on Itanium and its follow-ons to handle speculative loads. It's a perfectly reasonable approach IMHO in its basic formulation. The complication you introduced is to go beyond the basic formulation and consider certain kinds of uses "special" in that they don't trigger undefined behavior. This leads to the analysis problems you described.

> - Go back to using undef for overflow. There were no known real-world
>   bugs with this. It's just inconsistent.

I have always considered "undef" to mean "any possible value" so I don't see a real problem with promoted integers containing a value that can't fit in the original type size. The value in the original type size is undetermined by the definition of "undef" so who cares what we choose as the "real" value?

That said, there are a couple of times I ran into situations where LLVM assumed "undef" somehow restricted the possible set of values it could represent. As I recall I ran into this in instcombine and dagcombine. I will have to go back and search for the details. In any event, I think these things are bugs in LLVM.

So "undef" seems like a reasonable approach to me.

> - Define add nsw as a fully side-effecting operation, and accept the
>   limits on code motion that this implies. However, as LLVM starts doing
>   profile-guided optimizations, and starts thinking about more diverse
>   architectures, code speculation will likely become more important.

It will indeed. I don't like this approach.

> - Define add nsw as a fully side-effecting operation, and teach
>   optimization passes to strip nsw when moving code past control
>   boundaries. This is seen as suboptimal because it prevents subsequent
>   passes from making use of the nsw information. And, it's extra work
>   for optimization passes.

Agreed.

> - Instead of trying to define dependence in LangRef, just say that if
>   changing the value returned from an overflowing add nsw would
>   affect the observable behavior of the program, then the behavior of
>   the program is undefined.

Isn't that exactly what the C standard says? Now, LLVM handles more than C so we need to consider that. Personally, I agree that this is a less than satisfying solution. Are you mainly worried about the promoted value being cast back to the original type?

> - Give up on nsw and have compilers emit warnings when they are unable
>   to perform some optimization due to their inability to exclude the
>   possibility of overflow.

This would be very bad. We groan every time we see "unsigned" used as a loop index because it seriously degrades dependence analysis and vectorization for exactly this reason.

-Dave
Paul Robinson <pogo.work at gmail.com> writes:

> Some operations are NaT-tolerant; any input NaT is simply propagated
> to the output. Arithmetic, sign-extension, bit fiddling, and the like
> are NaT-tolerant.

Oh yeah, I forgot about that. Forget all the stuff I said about "base formulation" and complicating that. :)

> I suggest that storing to memory, and conditionals, and anything
> else that seems to lead toward the really hairy situations, should
> be cases where the nsw/poison/trap is "actually observed."

Agreed.

> This is taking a broader view of "observable" than what's defined in the
> C or C++ standards, which steadfastly ignore a variety of practical realities.
> But I think it's a reasonable and defensible position, that still allows you
> some scope for operating on these things in a sensible way.

I like this approach a lot.

-Dave