thr3ads.net - llvm dev - [llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior." [Jun 2019]


Chandler Carruth via llvm-dev

2019-Jun-13 01:20 UTC

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

FWIW, the talks linked by Mehdi really do talk about these things and why I
don't think they really are the correct trade-off.

Even if you imagine an unsigned type that doesn't allow wrapping, I think
this is a really bad type. The problem is that you have made the most
common value of the type (zero in every study I'm aware of) be a boundary
condition. Today, it wraps to a huge value if you cross it. Afterward, it
would trap. Both are super surprising.

Another way of looking at it through the same lens: do you subtract these
values? Should `a + (b - c)` be the same as `(a + b) - c`? You either need a
signed type or wrapping to have reasonable answers here. And if you solve
this with wrapping, then it makes any attempt to write assertions or other
checks in the same type system very difficult. The fact that you write an
assert to check for "did I accidentally go past zero?" by conjuring some
"it's probably too large" value and then comparing if it is *greater* than
that is ... extraordinarily confusing.

Meanwhile, with signed types, it is quite easy to write asserts that check
for non-negative values in the correct places. They are easy to read and
produce easily understood errors. The boundary conditions are uncommon.
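
To make the contrast concrete, here is a minimal sketch of the two assertion
styles. The helper names and the "half of the range" threshold are
hypothetical, chosen only for illustration; they are not from LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: elements left after consuming `used` out of `total`.
// Unsigned version: if used > total, the subtraction wraps to a huge value,
// so the only available check is against an arbitrary "too large" threshold.
inline std::uint32_t remaining_unsigned(std::uint32_t total,
                                        std::uint32_t used) {
  std::uint32_t left = total - used;
  assert(left <= UINT32_MAX / 2 && "probably wrapped past zero");
  return left;
}

// Signed version: the same mistake produces a negative number, and the
// assertion states the actual invariant.
inline std::int32_t remaining_signed(std::int32_t total, std::int32_t used) {
  std::int32_t left = total - used;
  assert(left >= 0 && "used more elements than available");
  return left;
}
```

With `total = 3` and `used = 10`, the unsigned difference is 4294967289,
while the signed difference is simply -7.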

Even on the C++ standards committee, there is remarkably strong consensus
that in the *absence* of unsigned types coming back from `.size()` methods
and such, we should be using signed types for the reasons above.

The fact that we have unsigned `size_t` in a bunch of places is, IMO, a
concern and it is important to have good ways of avoiding warnings. But I
think we have so very many ways that don't require us to just use unsigned
types everywhere and deal with the above issues:

- Change the return types of our containers' `size()` methods.
- Add a `ssize()` method. (This is the direction the committee is moving
AFAICT, but they are constrained by a powerful desire to break zero code,
whereas LLVM's containers have much more API freedom.)
- Use idioms like the one I suggested with `llvm::seq`.

Any or all of these seem significantly preferable to the readability
concerns I outline above, at least to me. This is why I am still *strongly*
in favor of signed types and assertions around values at known points where
the value should obey that assertion.
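
As a rough sketch of the `ssize()` direction: the free function below is a
hypothetical stand-in (it is not an LLVM API; C++20 provides `std::ssize` for
the same purpose). It shows a backwards loop where a signed index keeps the
`I >= 0` condition meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an LLVM API: a container's size as a signed value.
template <typename Container>
std::ptrdiff_t signed_size(const Container &C) {
  return static_cast<std::ptrdiff_t>(C.size());
}

// Returns the index of the last positive element, or -1 if there is none.
// With an unsigned index, `I >= 0` would always be true and `size() - 1`
// would wrap for an empty vector.
inline std::ptrdiff_t last_positive_index(const std::vector<int> &V) {
  for (std::ptrdiff_t I = signed_size(V) - 1; I >= 0; --I) {
    assert(I < signed_size(V) && "index out of range");
    if (V[static_cast<std::size_t>(I)] > 0)
      return I;
  }
  return -1;
}
```

The single cast back to `std::size_t` sits at the point of indexing, which is
also where the non-negativity of `I` is already established by the loop
condition.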

-Chandler

On Wed, Jun 12, 2019 at 1:01 AM Renato Golin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> +1 to both points here.
>
> On Wed, 12 Jun 2019, 07:55 Aaron Ballman via llvm-dev, <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Tue, Jun 11, 2019, 9:59 PM Zachary Turner <zturner at roblox.com> wrote:
>>
>>> On Tue, Jun 11, 2019 at 12:24 PM Mehdi AMINI <joker.eph at gmail.com>
>>> wrote:
>>>
>>>> I agree that readability, maintainability, and ability to debug/find
>>>> issues are key.
>>>> I haven't found myself in a situation where unsigned was helping my
>>>> readability: on the opposite actually I am always wondering where is the
>>>> expecting wrap-around behavior and that is one more thing I have to keep in
>>>> mind when I read code that manipulate unsigned. So YMMV but using unsigned
>>>> *increases* my mental load when reading code.
>>>>
>>> I'm on the other end.  I'm always reading the code wondering "is this
>>> going to warn?"  "Why could a container ever have a negative number of
>>> elements?"  "The maximum value representable by the return type (unsigned)
>>> is larger than that of the value i'm storing it in (signed), so an overflow
>>> could happen even if there were no error.  What then?"
>>>
>>
>> Strong +1 to this.
>>
>> ~Aaron
>>
>>
>>>
>>> On Tue, Jun 11, 2019 at 12:26 PM Michael Kruse <llvmdev at meinersbur.de>
>>> wrote:
>>>
>>>> On Tue, Jun 11, 2019 at 11:45 AM, Zachary Turner via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>> >
>>>> > I'm personally against changing everything to signed integers.  To
>>>> > me, this is an example of making code strictly less readable and more
>>>> > confusing in order to fight deficiencies in the language standard.  I get
>>>> > the problem that it's solving, but I view this as mostly a theoretical
>>>> > problem, whereas being able to read the code and have it make sense is a
>>>> > practical problem that we must face on a daily basis.  If you change
>>>> > everything to signed integers, you may catch a real problem with it a
>>>> > couple of times a year.  And by "real problem" here, I'm talking about a
>>>> > miscompile or an actual bug that surfaces in production somewhere, rather
>>>> > than a "yes, it seems theoretically possible for this to overflow".
>>>>
>>>> Doesn't it make it already worth it?
>>>>
>>> vector.size() returns a size_t, which on 64-bit platforms can represent
>>> values larger than those that can fit into an int64_t.  So to turn
>>> your argument around, since it's theoretically possible to have a vector
>>> with more items than an int64_t can represent, isn't it already worth it to
>>> use size_t, which is an unsigned type?
>>>

Stefan Teleman via llvm-dev

2019-Jun-13 16:58 UTC


On Wed, Jun 12, 2019 at 9:21 PM Chandler Carruth via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> FWIW, the talks linked by Mehdi really do talk about these things and why I
> don't think they really are the correct trade-off.
>
> Even if you imagine an unsigned type that doesn't allow wrapping, I
> think this is a really bad type. The problem is that you have made the most
> common value of the type (zero in every study I'm aware of) be a boundary
> condition. Today, it wraps to a huge value if you cross it. Afterward, it
> would trap. Both are super surprising.
> [ ... ]
> Any or all of these seem significantly preferable to the readability
> concerns I outline above, at least to me. This is why I am still *strongly*
> in favor of signed types and assertions around value at known points where
> the value should obey that assertion.

Have there been any documented cases in LLVM where a for() loop with
an unsigned int induction variable has wrapped around to 0? In other
words, is there any container - either LLVM or C++ Standard Library -
that ended up storing more than UINT_MAX or ULLONG_MAX elements?

I'm looking at these values in <limits.h>:

#define UINT_MAX      4294967295U
#define ULLONG_MAX   18446744073709551615ULL

and I am having a really hard time imagining a llvm::SmallVector<Foo>
storing 18446744073709551615ULL + 1ULL Foo elements. But I'm happy to
be proven wrong.

As far as the C++ Standard Library is concerned, all the containers
implement std::<container-type>::max_size(), which is of type
std::size_t and is always - and intentionally - smaller than either
UINT_MAX or ULLONG_MAX.

So I'm not even sure how an unsigned induction variable testing for
std::vector<Foo>::size() or a std::string::size() - for example -
would ever end up wrapping around to 0. The container will blow up
when its number of elements attempts to exceed its max_size().

Plus, it's not that hard to write

std::vector<Foo> FooVector;
for (unsigned I = 0; I < ${SOMETHING} && I < FooVector.max_size(); ++I) {
}

if unsigned's wrap-around is a material concern.

Maybe the compiler should just warn when sizeof(unsigned) <
sizeof(std::container<Foo>::max_size()). I think that would be enough
of a hint, and in the vast majority of cases it will be moot anyway.

Just my 0.02.

-- 
Stefan Teleman
stefan.teleman at gmail.com

Jameson Nash via llvm-dev

2019-Jun-13 17:17 UTC

head link

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

> Should `a + (b - c)` be the same as `(a + b) - c`? You either need a signed type or wrapping to have reasonable answers here

Depending on what "reasonable" means here, only wrapping (unsigned in C)
gets you this associative property. For a signed value in C, it's
possible for one of these to be undefined behavior, while the other returns
a reasonable value. For instance,
`a == b == c == std::numeric_limits<typeof(a)>::min()`
is probably unusual as a value, but could be used as a sentinel (perhaps to
represent an infinite or empty set). Of course, the unsigned result might
just be nonsense.
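
A small sketch of that point, with hypothetical helper names. The unsigned
functions rely on wrapping, so both groupings always agree. The signed checks
do the arithmetic in 64 bits purely to report whether a 32-bit `int`
computation would have overflowed, without actually triggering undefined
behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Unsigned: both groupings wrap modulo 2^32 and always give the same result.
inline std::uint32_t left_grouped(std::uint32_t a, std::uint32_t b,
                                  std::uint32_t c) {
  return (a + b) - c;
}
inline std::uint32_t right_grouped(std::uint32_t a, std::uint32_t b,
                                   std::uint32_t c) {
  return a + (b - c);
}

inline bool fits_in_int32(std::int64_t v) {
  return v >= std::numeric_limits<std::int32_t>::min() &&
         v <= std::numeric_limits<std::int32_t>::max();
}

// Signed: report whether any intermediate of (a + b) - c leaves int32 range.
inline bool left_grouping_overflows(std::int32_t a, std::int32_t b,
                                    std::int32_t c) {
  std::int64_t sum = std::int64_t{a} + b;
  return !fits_in_int32(sum) || !fits_in_int32(sum - c);
}

// Signed: report whether any intermediate of a + (b - c) leaves int32 range.
inline bool right_grouping_overflows(std::int32_t a, std::int32_t b,
                                     std::int32_t c) {
  std::int64_t diff = std::int64_t{b} - c;
  return !fits_in_int32(diff) || !fits_in_int32(std::int64_t{a} + diff);
}
```

For `a == b == c == min()`, the left grouping overflows in `a + b` while the
right grouping stays in range, which is the asymmetry described above.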

Anyways, I don't have a strong opinion either way, since I think they both
can have surprises.

One other occasional benefit to using unsigned that can be surprising is
that power-of-two division is slightly cheaper (since it doesn't need to
handle negative numbers):

(ssize_t)x / 2
shrq $63, %rax
leaq (%rax,%rdi), %rax
sarq %rax

(size_t)x / 2
shrq %rdi
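
The extra instructions in the signed case exist because signed division
rounds toward zero while an arithmetic right shift rounds toward negative
infinity. The sketch below spells out the same adjustment in C++; the
function names are hypothetical, and it assumes `>>` on a negative signed
value is an arithmetic shift (guaranteed since C++20, implementation-defined
before that, though mainstream compilers behave this way).

```cpp
#include <cassert>
#include <cstdint>

// Signed x / 2 via shifts: add the sign bit (0 or 1) first so that negative
// odd values round toward zero, as integer division requires.
inline std::int64_t signed_div2(std::int64_t x) {
  std::int64_t bias =
      static_cast<std::int64_t>(static_cast<std::uint64_t>(x) >> 63);
  return (x + bias) >> 1;  // assumes arithmetic shift for negative values
}

// Unsigned x / 2 needs no adjustment: a single logical shift.
inline std::uint64_t unsigned_div2(std::uint64_t x) { return x >> 1; }
```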

> > is there any container

I'd posit that UINT_MAX is uncommon, but pretty easy to exceed (although it
needs a 64-bit machine to represent it). For example, anything that might
need to handle the return value of `MemoryBuffer::getFile` could come
across a file that's larger than 2GB.
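
As an illustration of what goes wrong, the sketch below uses a made-up 5 GiB
size held in a 64-bit integer (no real file or LLVM API is involved) and
narrows it to a 32-bit unsigned value.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Narrowing a 64-bit size to 32 bits silently keeps only the low 32 bits.
inline std::uint32_t truncate_to_u32(std::uint64_t n) {
  return static_cast<std::uint32_t>(n);
}

// Explicit range check that a caller could assert on before narrowing.
inline bool fits_in_u32(std::uint64_t n) {
  return n <= std::numeric_limits<std::uint32_t>::max();
}

// Hypothetical size of a large input: 5 GiB.
constexpr std::uint64_t FiveGiB = 5ull * 1024 * 1024 * 1024;
```

Here `truncate_to_u32(FiveGiB)` yields 1073741824 (1 GiB), a plausible-looking
but wrong size.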