Mehdi AMINI via llvm-dev
2019-Jun-08 17:17 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
Hi, The LLVM coding style does not specify anything about the use of signed/unsigned integer, and the codebase is inconsistent (there is a majority of code that is using unsigned index in loops today though). I'd like to suggest that we specify to prefer `int` when possible and use `unsigned` only for bitmask and when you intend to rely on wrapping behavior, see: https://reviews.llvm.org/D63049 A lot has been written about this, good references are [unsigned: A Guideline for Better Code] https://www.youtube.com/watch?v=wvtFGa6XJDU) and [Garbage In, Garbage Out: Arguing about Undefined Behavior...]( https://www.youtube.com/watch?v=yG1OZ69H_-o), as well as this panel discussion: - https://www.youtube.com/watch?v=Puio5dly9N8#t=12m12s - https://www.youtube.com/watch?v=Puio5dly9N8#t=42m40s Other coding guidelines already embrace this: - http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-signed - http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-unsigned - https://google.github.io/styleguide/cppguide.html#Integer_Types It is rare that overflowing (and wrapping) an unsigned integer won't trigger a program bug when the overflow was not intentionally handled. Using signed arithmetic means that you can actually trap on over/underflow and catch these bugs (when using fuzzing for instance). Chandler explained this nicely in his CPPCon 2016 talk "Garbage In, Garbage Out: Arguing about Undefined Behavior...". I encourage to watch the full talk but here is one relevant example: https://youtu.be/yG1OZ69H_-o?t=2006 , and here he mentions how Google experimented with this internally: https://youtu.be/yG1OZ69H_-o?t=2249 Unsigned integer also have a discontinuity right to the left of zero. Suppose A, B and C are small positive integers close to zero, say all less than a hundred or so. Then given: A + B > C and knowing elementary school algebra, one can rewrite that as: A > B - C But C might be greater than B, and the subtraction would produce some huge number. This happens even when working with seemingly harmless numbers like A=2, B=2, and C=3. -- Mehdi -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190608/18a819b2/attachment.html>
Tim Northover via llvm-dev
2019-Jun-08 18:12 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
On Sat, 8 Jun 2019 at 18:18, Mehdi AMINI via llvm-dev <llvm-dev at lists.llvm.org> wrote:> The LLVM coding style does not specify anything about the use of signed/unsigned integer, and the codebase is inconsistent (there is a majority of code that is using unsigned index in loops today though). > > I'd like to suggest that we specify to prefer `int` when possible and use `unsigned` only for bitmask and when you intend to rely on wrapping behavior, see: https://reviews.llvm.org/D63049I'd prefer us to have something neater than static_cast<int> for the loop problem before we made that change. Perhaps add an ssize (or equivalent) method to all of our internal data structures? They're a lot more common than std::* containers. But it's not something that would keep me up at night. Cheers. Tim.
Chandler Carruth via llvm-dev
2019-Jun-09 06:04 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
On Sat, Jun 8, 2019 at 11:12 AM Tim Northover via llvm-dev < llvm-dev at lists.llvm.org> wrote:> On Sat, 8 Jun 2019 at 18:18, Mehdi AMINI via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > The LLVM coding style does not specify anything about the use of > signed/unsigned integer, and the codebase is inconsistent (there is a > majority of code that is using unsigned index in loops today though). > > > > I'd like to suggest that we specify to prefer `int` when possible and > use `unsigned` only for bitmask and when you intend to rely on wrapping > behavior, see: https://reviews.llvm.org/D63049 > > I'd prefer us to have something neater than static_cast<int> for the > loop problem before we made that change.Not sure this is much of a problem in LLVM-land... We already encourage a loop style that avoids the classical problem: ``` for (int Index = 0, Size = MyVector.size(); Index < Size; ++Index) { // ... } ``` Alternatively, LLVM supports a form I like even more: ``` for (int Index : llvm::seq<int>(0, MyVector.size())) { // ... } ``` If passing the explicit type is an issue, we could fix that? FWIW, I'd also be fine changing the return value ef `size()` for LLVM containers, and given the variability of standard library container performance, I'm generally a fan of LLVM code consistently using its own containers. -Chnadler> Perhaps add an ssize (or > equivalent) method to all of our internal data structures? They're a > lot more common than std::* containers. > > But it's not something that would keep me up at night. > > Cheers. > > Tim. > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190608/5b56b2b4/attachment.html>
Michael Kruse via llvm-dev
2019-Jun-10 16:24 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
Am Sa., 8. Juni 2019 um 13:12 Uhr schrieb Tim Northover via llvm-dev <llvm-dev at lists.llvm.org>:> I'd prefer us to have something neater than static_cast<int> for the > loop problem before we made that change. Perhaps add an ssize (or > equivalent) method to all of our internal data structures? They're a > lot more common than std::* containers.+1 Since C++20 is also introducing ssize [1] members, this makes a lot of sense to me. Using it would help avoiding an unsigned comparison as in if (IndexOfInterestingElement >= Container.size()) ... to sneak in from the start. Michael [1] http://wg21.link/p1227r1
Adrien Guinet via llvm-dev
2019-Jun-18 11:48 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
My 2 cents on this. I watched Chandler's talk, and it looks like there are two statements that are a bit contradictory one to another: https://www.youtube.com/watch?v=yG1OZ69H_-o&t=2820s "it's fairly easy to get enough bits w/o using the sign bits" versus https://www.youtube.com/watch?v=yG1OZ69H_-o&t=2846s "using size_t takes more space in data structures" Why not using size_t for indexes/length, while storing uint32_t integers and cast them back to size_t when necessary (i.e. for space reasons)? About checking for overflows, let's take the various examples here: https://godbolt.org/z/TIGiVe, which are various ways to load a byte at index "n+10" where n is user-controlled. In this case, we need to check that "n+10" does not overflow. The first thing is that, IMHO, overflow checks are tricky to get right for both the signed and unsigned case (and has led to numerous vulnerabilities). But that's a personal opinion. If we talk about code size (and performance), the load3 function confirms what Chandler showed in his talk (using uint32_t can be a performance issue on some 64-bit systems, do to the enforced wrapping that has to be done on 32bits integers). The other load* functions all emit the intended "mov eax, DWORD PTR [rdi+40+rsi*4]". Notable differences arise in load2, which needs to sign extend esi to rsi. In load1, there is also an extra instruction to set the value of (2**63-12). One conclusion is that, at least on x86-64, using size_t for these kind of overflow checks is the most efficient. But I might miss something / do something wrong in this analysis, and I'd like another POV on this! One argument for the usage of signed integers (actually integer types that don't allow wrapping) is indeed the ability to automatically insert these overflow checks at compile time. On 6/8/19 7:17 PM, Mehdi AMINI via llvm-dev wrote:> Hi, > > The LLVM coding style does not specify anything about the use of signed/unsigned integer, > and the codebase is inconsistent (there is a majority of code that is using unsigned index > in loops today though). > > I'd like to suggest that we specify to prefer `int` when possible and use `unsigned` only > for bitmask and when you intend to rely on wrapping behavior, > see: https://reviews.llvm.org/D63049 > > A lot has been written about this, good references are [unsigned: A Guideline for Better > Code] https://www.youtube.com/watch?v=wvtFGa6XJDU) and [Garbage In, Garbage Out: Arguing > about Undefined Behavior...](https://www.youtube.com/watch?v=yG1OZ69H_-o), as well as this > panel discussion: > - https://www.youtube.com/watch?v=Puio5dly9N8#t=12m12s > - https://www.youtube.com/watch?v=Puio5dly9N8#t=42m40s > > Other coding guidelines already embrace this: > > - http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-signed > - http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-unsigned > - https://google.github.io/styleguide/cppguide.html#Integer_Types > > It is rare that overflowing (and wrapping) an unsigned integer won't trigger a program bug > when the overflow was not intentionally handled. Using signed arithmetic means that you > can actually trap on over/underflow and catch these bugs (when using fuzzing for instance). > > Chandler explained this nicely in his CPPCon 2016 talk "Garbage In, Garbage Out: Arguing > about Undefined Behavior...". I encourage to watch the full talk but here is one relevant > example: https://youtu.be/yG1OZ69H_-o?t=2006 , and here he mentions how Google > experimented with this internally: https://youtu.be/yG1OZ69H_-o?t=2249 > > Unsigned integer also have a discontinuity right to the left of zero. Suppose A, B and C > are small positive integers close to zero, say all less than a hundred or so. Then given: > A + B > C > and knowing elementary school algebra, one can rewrite that as: > A > B - C > But C might be greater than B, and the subtraction would produce some huge number. This > happens even when working with seemingly harmless numbers like A=2, B=2, and C=3. > > -- > Mehdi > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
Apparently Analagous Threads
- [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
- [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
- [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
- [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
- [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."