thr3ads.net - llvm dev - [llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior." [Jun 2019]

If this information is useful, please help other people find it:
Share via:

Mehdi AMINI via llvm-dev

2019-Jun-08 17:17 UTC

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

Hi,

The LLVM coding style does not specify anything about the use of
signed/unsigned integer, and the codebase is inconsistent (there is a
majority of code that is using unsigned index in loops today though).

I'd like to suggest that we specify to prefer `int` when possible and use
`unsigned` only for bitmask and when you intend to rely on wrapping
behavior, see: https://reviews.llvm.org/D63049

A lot has been written about this, good references are [unsigned: A
Guideline for Better Code] https://www.youtube.com/watch?v=wvtFGa6XJDU) and
[Garbage In, Garbage Out: Arguing about Undefined Behavior...](
https://www.youtube.com/watch?v=yG1OZ69H_-o), as well as this panel
discussion:
- https://www.youtube.com/watch?v=Puio5dly9N8#t=12m12s
- https://www.youtube.com/watch?v=Puio5dly9N8#t=42m40s

Other coding guidelines already embrace this:

- http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-signed
- http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-unsigned
- https://google.github.io/styleguide/cppguide.html#Integer_Types

It is rare that overflowing (and wrapping) an unsigned integer won't
trigger a program bug when the overflow was not intentionally handled.
Using signed arithmetic means that you can actually trap on over/underflow
and catch these bugs (when using fuzzing for instance).

Chandler explained this nicely in his CPPCon 2016 talk "Garbage In, Garbage
Out: Arguing about Undefined Behavior...". I encourage to watch the full
talk but here is one relevant example: https://youtu.be/yG1OZ69H_-o?t=2006
, and here he mentions how Google experimented with this internally:
https://youtu.be/yG1OZ69H_-o?t=2249

Unsigned integer also have a discontinuity right to the left of zero.
Suppose A, B and C are small positive integers close to zero, say all less
than a hundred or so. Then given:
A + B > C
and knowing elementary school algebra, one can rewrite that as:
A > B - C
But C might be greater than B, and the subtraction would produce some huge
number. This happens even when working with seemingly harmless numbers like
A=2, B=2, and C=3.

-- 
Mehdi
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190608/18a819b2/attachment.html>

Tim Northover via llvm-dev

2019-Jun-08 18:12 UTC

head link

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

On Sat, 8 Jun 2019 at 18:18, Mehdi AMINI via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> The LLVM coding style does not specify anything about the use of
signed/unsigned integer, and the codebase is inconsistent (there is a majority
of code that is using unsigned index in loops today though).
>
> I'd like to suggest that we specify to prefer `int` when possible and
use `unsigned` only for bitmask and when you intend to rely on wrapping
behavior, see: https://reviews.llvm.org/D63049
I'd prefer us to have something neater than static_cast<int> for the
loop problem before we made that change. Perhaps add an ssize (or
equivalent) method to all of our internal data structures? They're a
lot more common than std::* containers.

But it's not something that would keep me up at night.

Cheers.

Tim.

Chandler Carruth via llvm-dev

2019-Jun-09 06:04 UTC

head link

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

On Sat, Jun 8, 2019 at 11:12 AM Tim Northover via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Sat, 8 Jun 2019 at 18:18, Mehdi AMINI via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > The LLVM coding style does not specify anything about the use of
> signed/unsigned integer, and the codebase is inconsistent (there is a
> majority of code that is using unsigned index in loops today though).
> >
> > I'd like to suggest that we specify to prefer `int` when possible
and
> use `unsigned` only for bitmask and when you intend to rely on wrapping
> behavior, see: https://reviews.llvm.org/D63049
>
> I'd prefer us to have something neater than static_cast<int> for
the
> loop problem before we made that change.

Not sure this is much of a problem in LLVM-land... We already encourage a
loop style that avoids the classical problem:

```
for (int Index = 0, Size = MyVector.size(); Index < Size; ++Index) {
  // ...
}
```

Alternatively, LLVM supports a form I like even more:
```
for (int Index : llvm::seq<int>(0, MyVector.size())) {
  // ...
}
```

If passing the explicit type is an issue, we could fix that?

FWIW, I'd also be fine changing the return value ef `size()` for LLVM
containers, and given the variability of standard library container
performance, I'm generally a fan of LLVM code consistently using its own
containers.

-Chnadler

> Perhaps add an ssize (or
> equivalent) method to all of our internal data structures? They're a
> lot more common than std::* containers.
>
> But it's not something that would keep me up at night.
>
> Cheers.
>
> Tim.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190608/5b56b2b4/attachment.html>

Michael Kruse via llvm-dev

2019-Jun-10 16:24 UTC

head link

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

Am Sa., 8. Juni 2019 um 13:12 Uhr schrieb Tim Northover via llvm-dev
<llvm-dev at lists.llvm.org>:> I'd prefer us to have something neater than static_cast<int> for
the
> loop problem before we made that change. Perhaps add an ssize (or
> equivalent) method to all of our internal data structures? They're a
> lot more common than std::* containers.
+1

Since C++20 is also introducing ssize [1] members, this makes a lot of
sense to me. Using it would help avoiding an unsigned comparison as in

    if (IndexOfInterestingElement >= Container.size())
      ...

to sneak in from the start.

Michael

[1] http://wg21.link/p1227r1

Adrien Guinet via llvm-dev

2019-Jun-18 11:48 UTC

head link

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

My 2 cents on this.

I watched Chandler's talk, and it looks like there are two statements that
are a bit
contradictory one to another:

https://www.youtube.com/watch?v=yG1OZ69H_-o&t=2820s "it's fairly
easy to get enough bits
w/o using the sign bits"
versus
https://www.youtube.com/watch?v=yG1OZ69H_-o&t=2846s "using size_t takes
more space in data
structures"

Why not using size_t for indexes/length, while storing uint32_t integers and
cast them
back to size_t when necessary (i.e. for space reasons)?

About checking for overflows, let's take the various examples here:
https://godbolt.org/z/TIGiVe, which are various ways to load a byte at index
"n+10" where
n is user-controlled. In this case, we need to check that "n+10" does
not overflow.
The first thing is that, IMHO, overflow checks are tricky to get right for both
the signed
and unsigned case (and has led to numerous vulnerabilities). But that's a
personal
opinion. If we talk about code size (and performance), the load3 function
confirms what
Chandler showed in his talk (using uint32_t can be a performance issue on some
64-bit
systems, do to the enforced wrapping that has to be done on 32bits integers).
The other
load* functions all emit the intended "mov eax, DWORD PTR
[rdi+40+rsi*4]". Notable
differences arise in load2, which needs to sign extend esi to rsi. In load1,
there is also
an extra instruction to set the value of (2**63-12).

One conclusion is that, at least on x86-64, using size_t for these kind of
overflow checks
is the most efficient. But I might miss something / do something wrong in this
analysis,
and I'd like another POV on this!

One argument for the usage of signed integers (actually integer types that
don't allow
wrapping) is indeed the ability to automatically insert these overflow checks at
compile time.

On 6/8/19 7:17 PM, Mehdi AMINI via llvm-dev wrote:> Hi,
> 
> The LLVM coding style does not specify anything about the use of
signed/unsigned integer,
> and the codebase is inconsistent (there is a majority of code that is using
unsigned index
> in loops today though).
> 
> I'd like to suggest that we specify to prefer `int` when possible and
use `unsigned` only
> for bitmask and when you intend to rely on wrapping behavior,
> see: https://reviews.llvm.org/D63049
> 
> A lot has been written about this, good references are [unsigned: A
Guideline for Better
> Code] https://www.youtube.com/watch?v=wvtFGa6XJDU) and [Garbage In, Garbage
Out: Arguing
> about Undefined Behavior...](https://www.youtube.com/watch?v=yG1OZ69H_-o),
as well as this
> panel discussion:
> - https://www.youtube.com/watch?v=Puio5dly9N8#t=12m12s
> - https://www.youtube.com/watch?v=Puio5dly9N8#t=42m40s
> 
> Other coding guidelines already embrace this:
> 
> - http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-signed
> - http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-unsigned
> - https://google.github.io/styleguide/cppguide.html#Integer_Types
> 
> It is rare that overflowing (and wrapping) an unsigned integer won't
trigger a program bug
> when the overflow was not intentionally handled. Using signed arithmetic
means that you
> can actually trap on over/underflow and catch these bugs (when using
fuzzing for instance).
> 
> Chandler explained this nicely in his CPPCon 2016 talk "Garbage In,
Garbage Out: Arguing
> about Undefined Behavior...". I encourage to watch the full talk but
here is one relevant
> example: https://youtu.be/yG1OZ69H_-o?t=2006 , and here he mentions how
Google
> experimented with this internally: https://youtu.be/yG1OZ69H_-o?t=2249
> 
> Unsigned integer also have a discontinuity right to the left of zero.
Suppose A, B and C
> are small positive integers close to zero, say all less than a hundred or
so. Then given:
> A + B > C
> and knowing elementary school algebra, one can rewrite that as:
> A > B - C
> But C might be greater than B, and the subtraction would produce some huge
number. This
> happens even when working with seemingly harmless numbers like A=2, B=2,
and C=3.
> 
> -- 
> Mehdi
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

Maybe Matching Threads

Search for more possibly parallel threads

llvm dev - Jun 2019 - [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

Maybe Matching Threads