Chandler Carruth via llvm-dev
2019-Jun-13 01:20 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
FWIW, the talks linked by Mehdi really do talk about these things and why I don't think they really are the correct trade-off.

Even if you imagine an unsigned type that doesn't allow wrapping, I think this is a really bad type. The problem is that you have made the most common value of the type (zero in every study I'm aware of) be a boundary condition. Today, it wraps to a huge value if you cross it. Afterward, it would trap. Both are super surprising.

Another way of looking at the same problem: do you subtract these values? Should `a + (b - c)` be the same as `(a + b) - c`? You either need a signed type or wrapping to have reasonable answers here.

And if you solve this with wrapping, then it makes any attempt to write assertions or other checks in the same type system very difficult. The fact that you write an assert to check for "did I accidentally go past zero?" by conjuring some "it's probably too large" value and then comparing if it is *greater* than that is ... extraordinarily confusing.

Meanwhile, with signed types, it is quite easy to write asserts that check for non-negative values in the correct places. They are easy to read and produce easily understood errors. The boundary conditions are uncommon.

Even on the C++ standards committee, there is remarkably strong consensus that in the *absence* of unsigned types coming back from `.size()` methods and such, we should be using signed types for the reasons above.

The fact that we have unsigned `size_t` in a bunch of places is, IMO, a concern and it is important to have good ways of avoiding warnings. But I think we have so very many ways that don't require us to just use unsigned types everywhere and deal with the above issues:

- Change the return types of our containers' `size()` methods.
- Add a `ssize()` method. (This is the direction the committee is moving AFAICT, but they are constrained by a powerful desire to break zero code, whereas LLVM's containers have much more API freedom.)
- Use idioms like the one I suggested with `llvm::seq`.

Any or all of these seem significantly preferable to the readability concerns I outline above, at least to me. This is why I am still *strongly* in favor of signed types and assertions around values at known points where the value should obey that assertion.

-Chandler

On Wed, Jun 12, 2019 at 1:01 AM Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> +1 to both points here.
>
> On Wed, 12 Jun 2019, 07:55 Aaron Ballman via llvm-dev, <llvm-dev at lists.llvm.org> wrote:
>
>> On Tue, Jun 11, 2019, 9:59 PM Zachary Turner <zturner at roblox.com> wrote:
>>
>>> On Tue, Jun 11, 2019 at 12:24 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>>>
>>>> I agree that readability, maintainability, and ability to debug/find issues are key.
>>>> I haven't found myself in a situation where unsigned was helping my readability: on the contrary, I am always wondering where the expected wrap-around behavior is, and that is one more thing I have to keep in mind when I read code that manipulates unsigned. So YMMV but using unsigned *increases* my mental load when reading code.
>>>
>>> I'm on the other end. I'm always reading the code wondering "is this going to warn?" "Why could a container ever have a negative number of elements?" "The maximum value representable by the return type (unsigned) is larger than that of the value I'm storing it in (signed), so an overflow could happen even if there were no error. What then?"
>>
>> Strong +1 to this.
>>
>> ~Aaron
>>
>>> On Tue, Jun 11, 2019 at 12:26 PM Michael Kruse <llvmdev at meinersbur.de> wrote:
>>>
>>>> On Tue, Jun 11, 2019 at 11:45 AM, Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>> >
>>>> > I'm personally against changing everything to signed integers. To me, this is an example of making code strictly less readable and more confusing in order to fight deficiencies in the language standard. I get the problem that it's solving, but I view this as mostly a theoretical problem, whereas being able to read the code and have it make sense is a practical problem that we must face on a daily basis. If you change everything to signed integers, you may catch a real problem with it a couple of times a year. And by "real problem" here, I'm talking about a miscompile or an actual bug that surfaces in production somewhere, rather than a "yes, it seems theoretically possible for this to overflow".
>>>>
>>>> Doesn't it make it already worth it?
>>>
>>> vector.size() returns a size_t, which on 64-bit platforms can represent values larger than those that can fit into an int64_t. So to turn your argument around, since it's theoretically possible to have a vector with more items than an int64_t can represent, isn't it already worth it to use size_t, which is an unsigned type?
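A minimal sketch of the assertion asymmetry described in the message above, assuming 32-bit index types. All helper names here (`stepBackUnsigned`, `looksWrapped`, `checkedIndex`, `signedSize`) are hypothetical and are not LLVM APIs; `signedSize` is only a stand-in for the kind of `ssize()` accessor the message proposes, and the "half the range" threshold is an arbitrary choice made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: step an unsigned index backwards. Crossing zero does
// not fail; the result wraps modulo 2^32.
inline std::uint32_t stepBackUnsigned(std::uint32_t Index,
                                      std::uint32_t Delta) {
  return Index - Delta;
}

// The "it's probably too large" check: an unsigned value is never negative,
// so the only available test is against an arbitrary upper threshold (half
// the range is an assumption of this sketch).
inline bool looksWrapped(std::uint32_t Index) {
  return Index > std::numeric_limits<std::uint32_t>::max() / 2;
}

// Signed counterpart: crossing zero yields an ordinary negative value.
inline std::int32_t stepBackSigned(std::int32_t Index, std::int32_t Delta) {
  return Index - Delta;
}

// With a signed index the check states the intent directly.
inline std::int32_t checkedIndex(std::int32_t Index) {
  assert(Index >= 0 && "index went below zero");
  return Index;
}

// Hypothetical stand-in for an ssize()-style accessor: report a container's
// size as a signed value so that subtraction can go below zero visibly.
template <typename Container>
std::ptrdiff_t signedSize(const Container &C) {
  return static_cast<std::ptrdiff_t>(C.size());
}
```

Stepping back from zero gives a huge value in the unsigned version and `-1` in the signed one, so only the signed version can be guarded with a plain `>= 0` assertion. Likewise, `signedSize(V) - 4` on a three-element vector is `-1` rather than a value near the top of the `size_t` range.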
Stefan Teleman via llvm-dev
2019-Jun-13 16:58 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
On Wed, Jun 12, 2019 at 9:21 PM Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> FWIW, the talks linked by Mehdi really do talk about these things and why I don't think they really are the correct trade-off.
>
> Even if you imagine an unsigned type that doesn't allow wrapping, I think this is a really bad type. The problem is that you have made the most common value of the type (zero in every study I'm aware of) be a boundary condition. Today, it wraps to a huge value if you cross it. Afterward, it would trap. Both are super surprising.

[ ... ]

> Any or all of these seem significantly preferable to the readability concerns I outline above, at least to me. This is why I am still *strongly* in favor of signed types and assertions around value at known points where the value should obey that assertion.

Have there been any documented cases in LLVM where a for() loop with an unsigned int induction variable has wrapped around to 0?

In other words, is there any container - either LLVM or C++ Standard Library - that ended up storing more than UINT_MAX or ULLONG_MAX elements?

I'm looking at these values in <limits.h>:

    #define UINT_MAX 4294967295U
    #define ULLONG_MAX 18446744073709551615ULL

and I am having a really hard time imagining an llvm::SmallVector<Foo> storing 18446744073709551615ULL + 1ULL Foo elements. But I'm happy to be proven wrong.

As far as the C++ Standard Library is concerned, all the containers implement std::<container-type>::max_size(), which is of type std::size_t and is always - and intentionally - smaller than either UINT_MAX or ULLONG_MAX.

So I'm not even sure how an unsigned induction variable testing for std::vector<Foo>::size() or a std::string::size() - for example - would ever end up wrapping around to 0. The container will blow up when its number of elements attempts to exceed its max_size().
Plus, it's not that hard to write

    std::vector<Foo> FooVector;

    for (unsigned I = 0; I < ${SOMETHING} && I < FooVector.max_size(); ++I) {
    }

if unsigned's wrap-around is a material concern.

Maybe the compiler should just warn when sizeof(unsigned) < sizeof(std::container<Foo>::max_size()). I think that would be enough of a hint, and in the vast majority of cases it will be moot anyway.

Just my 0.02.

--
Stefan Teleman
stefan.teleman at gmail.com
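The bounded loop and the size comparison suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `IndexNarrowerThanSize`, `fitsInUnsigned`, and `countVisited` are hypothetical names, `Limit` stands in for the `${SOMETHING}` placeholder, the extra `I < V.size()` bound is added for the sketch, and the `sizeof` comparison approximates the proposed compiler warning rather than reproducing any existing diagnostic.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// True when `unsigned` cannot represent every std::size_t value, which is the
// situation the proposed warning would flag (typical on 64-bit targets).
constexpr bool IndexNarrowerThanSize = sizeof(unsigned) < sizeof(std::size_t);

// Runtime counterpart: does this particular element count fit in `unsigned`?
inline bool fitsInUnsigned(std::size_t N) {
  return N <= std::numeric_limits<unsigned>::max();
}

// Loop with an unsigned induction variable, bounded by a caller-supplied
// limit, by the container's size, and by max_size() as in the message above.
inline std::size_t countVisited(const std::vector<int> &V, unsigned Limit) {
  assert(fitsInUnsigned(V.size()) &&
         "unsigned index cannot reach every element");
  std::size_t Visited = 0;
  for (unsigned I = 0; I < Limit && I < V.size() && I < V.max_size(); ++I)
    ++Visited;
  return Visited;
}
```

For a three-element vector, `countVisited(V, 10)` visits three elements and `countVisited(V, 2)` visits two; the assertion only fires for containers whose size exceeds what `unsigned` can represent.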
Jameson Nash via llvm-dev
2019-Jun-13 17:17 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
> Should `a + (b - c)` be the same as `(a + b) - c`? You either need a signed type or wrapping to have reasonable answers here

Depending on what "reasonable" means here, only wrapping (unsigned in C) gets you this associative property. For a signed value with C, it's possible for one of these to be undefined behavior, while the other returns a reasonable value. For instance, `a == b == c == std::numeric_limits<typeof(a)>::min()` is probably unusual as a value, but could be used as a sentinel (perhaps to represent an infinite or empty set). Of course, the unsigned result might just be nonsense.

Anyways, I don't have a strong opinion either way, since I think they both can have surprises.

One other occasional benefit to using unsigned that can be surprising is that power-of-two division is slightly cheaper (since it doesn't need to handle negative numbers):

    (ssize_t)x / 2
        shrq $63, %rax
        leaq (%rax,%rdi), %rax
        sarq %rax

    (size_t)x / 2
        shrq %rdi

> is there any container

I'd posit that UINT_MAX is uncommon, but pretty easy to exceed (although it needs a 64-bit machine to represent it). For example, anything that might need to handle the return value of `MemoryBuffer::getFile` could come across a file that's larger than 2GB.
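Both points in the message above can be sketched in a few lines. The function names are hypothetical, and `halveWithFixup` assumes that right-shifting a negative value is an arithmetic shift (guaranteed from C++20, implementation-defined before that, and the behavior of mainstream compilers); it mirrors the sign fix-up in the quoted assembly rather than claiming to be what any compiler emits.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned arithmetic wraps modulo 2^32, so both groupings always agree even
// when the intermediate `B - C` goes "below zero".
inline std::uint32_t groupRight(std::uint32_t A, std::uint32_t B,
                                std::uint32_t C) {
  return A + (B - C);
}

inline std::uint32_t groupLeft(std::uint32_t A, std::uint32_t B,
                               std::uint32_t C) {
  return (A + B) - C;
}

// Signed division by two rounds toward zero, so a bare shift is not enough.
inline std::int64_t halveSigned(std::int64_t X) { return X / 2; }

// Same result spelled out: add 1 to negative inputs, then shift.
// Assumes an arithmetic right shift for negative values (see lead-in).
inline std::int64_t halveWithFixup(std::int64_t X) {
  // Bias is 1 for negative X and 0 otherwise (the sign bit moved to bit 0).
  std::int64_t Bias =
      static_cast<std::int64_t>(static_cast<std::uint64_t>(X) >> 63);
  std::int64_t Result = (X + Bias) >> 1;
  assert(Result == X / 2 && "fix-up must match round-toward-zero division");
  return Result;
}

// Unsigned division by two needs no fix-up: a single logical shift.
inline std::uint64_t halveUnsigned(std::uint64_t X) { return X >> 1; }
```

With `A = 5`, `B = 1`, `C = 2`, both groupings return `4` even though `B - C` wraps to `4294967295` along the way. With signed types the same expression needs no wrapping for these inputs, but other inputs can overflow in one grouping and not the other, which is undefined behavior and therefore not demonstrated here.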