thr3ads.net - llvm dev - [llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior." [Jun 2019]

Michael Kruse via llvm-dev

2019-Jun-11 19:25 UTC

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

On Tue, Jun 11, 2019 at 11:45 AM, Zachary Turner via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> I'm personally against changing everything to signed integers.  To me,
> this is an example of making code strictly less readable and more
> confusing in order to fight deficiencies in the language standard.  I get
> the problem that it's solving, but I view this as mostly a theoretical
> problem, whereas being able to read the code and have it make sense is a
> practical problem that we must face on a daily basis.  If you change
> everything to signed integers, you may catch a real problem with it a
> couple of times a year.  And by "real problem" here, I'm talking about a
> miscompile or an actual bug that surfaces in production somewhere, rather
> than a "yes, it seems theoretically possible for this to overflow".

Doesn't it make it already worth it?

> On the other hand, a large number of people need to work in this codebase
> every day, and multiplied over the same time period, my belief is that
> having the code make sense and be simple has a higher net value.
>
> It simply doesn't make sense (conceptually) to use a signed type for
> domains that are inherently unsigned, like the size of an object.  IMO, we
> should revisit this if and when the deficiencies in the C++ Standard are
> addressed.

The underlying problem is that the C family of languages mixes two
orthogonal properties: value range and overflow behavior. There is no
unsigned type with undefined wraparound. So the question becomes: What
property is more important to reflect? Do we want to catch unintended
wraparound behavior using a sanitizer/make optimizations based on it?
Do we need the additional range provided by an unsigned type? As
Chandler says in one of his talks linked earlier: "If you need more
bits, use more bits" (such as int64_t).
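
As a rough illustration of the two properties being mixed, the sketch
below shows that unsigned arithmetic wraps by definition, while widening
to int64_t gives extra range without relying on wraparound. The function
names are made up for this example and do not come from the thread.

```cpp
#include <cstdint>
#include <limits>

// Unsigned subtraction is defined to wrap modulo 2^32, so this never
// counts as undefined behavior, even when x is 0.
inline std::uint32_t wrapping_decrement(std::uint32_t x) {
  return x - 1u;
}

// "If you need more bits, use more bits": widen both operands first so
// the sum of two 32-bit values always fits in the 64-bit result.
inline std::int64_t widened_sum(std::int32_t a, std::int32_t b) {
  return static_cast<std::int64_t>(a) + static_cast<std::int64_t>(b);
}
```

Here wrapping_decrement(0) silently yields the largest 32-bit value,
whereas widened_sum stays exact for any pair of 32-bit inputs.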

Michael

Zachary Turner via llvm-dev

2019-Jun-11 19:59 UTC

On Tue, Jun 11, 2019 at 12:24 PM Mehdi AMINI <joker.eph at gmail.com>
wrote:
> I agree that readability, maintainability, and ability to debug/find
> issues are key.
> I haven't found myself in a situation where unsigned was helping my
> readability: on the opposite actually I am always wondering where is the
> expecting wrap-around behavior and that is one more thing I have to keep in
> mind when I read code that manipulate unsigned. So YMMV but using unsigned
> *increases* my mental load when reading code.

I'm on the other end.  I'm always reading the code wondering "is this going
to warn?"  "Why could a container ever have a negative number of
elements?"  "The maximum value representable by the return type (unsigned)
is larger than that of the value I'm storing it in (signed), so an overflow
could happen even if there were no error.  What then?"
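
The last concern can be made concrete with a small sketch of the range
check needed when an unsigned size is stored in an int. The helper
checked_size_to_int is hypothetical, written for this example only; it is
not an LLVM API.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: store a size_t in an int only if the value fits.
// Returns false and leaves `out` untouched when the value is too large.
inline bool checked_size_to_int(std::size_t size, int &out) {
  if (size > static_cast<std::size_t>(std::numeric_limits<int>::max()))
    return false;
  out = static_cast<int>(size);
  return true;
}
```

Without such a check, the conversion compiles (possibly with a warning)
and the caller has no way to tell that the value was out of range.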


On Tue, Jun 11, 2019 at 12:26 PM Michael Kruse <llvmdev at meinersbur.de>
wrote:
> On Tue, Jun 11, 2019 at 11:45 AM, Zachary Turner via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > I'm personally against changing everything to signed integers.  To me,
> > this is an example of making code strictly less readable and more
> > confusing in order to fight deficiencies in the language standard.  I
> > get the problem that it's solving, but I view this as mostly a
> > theoretical problem, whereas being able to read the code and have it
> > make sense is a practical problem that we must face on a daily basis.
> > If you change everything to signed integers, you may catch a real
> > problem with it a couple of times a year.  And by "real problem" here,
> > I'm talking about a miscompile or an actual bug that surfaces in
> > production somewhere, rather than a "yes, it seems theoretically
> > possible for this to overflow".
>
> Doesn't it make it already worth it?

vector.size() returns a size_t, which on 64-bit platforms can represent
values larger than those that can fit into an int64_t.  So to turn
your argument around, since it's theoretically possible to have a vector
with more items than an int64_t can represent, isn't it already worth it to
use size_t, which is an unsigned type?

Andrew Kelley via llvm-dev

2019-Jun-11 20:10 UTC

On 6/11/19 3:59 PM, Zachary Turner via llvm-dev wrote:
> On Tue, Jun 11, 2019 at 12:24 PM Mehdi AMINI <joker.eph at gmail.com>
> wrote:
>
>     I agree that readability, maintainability, and ability to debug/find
>     issues are key.
>     I haven't found myself in a situation where unsigned was helping my
>     readability: on the opposite actually I am always wondering where is
>     the expecting wrap-around behavior and that is one more thing I have
>     to keep in mind when I read code that manipulate unsigned. So YMMV
>     but using unsigned *increases* my mental load when reading code.
>
> I'm on the other end.  I'm always reading the code wondering "is this
> going to warn?"  "Why could a container ever have a negative number of
> elements?"  "The maximum value representable by the return type
> (unsigned) is larger than that of the value I'm storing it in (signed),
> so an overflow could happen even if there were no error.  What then?"

This is why the Zig frontend has the arithmetic operators +,/,*,- assert
that overflow does not occur for *both* signed and unsigned integers.
The wrapping operators +%,/%,*%,-% are available when one needs a
guaranteed two's complement wraparound.

This gives the best of both worlds, where you can use proper types
matching the range of values, yet have the desired performance
optimizations (and sanitizer checks) that only signed integers have in
C/C++.

In fact, in Zig, unsigned integers are actually more optimizable than
signed integers! https://godbolt.org/z/QICZyy
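
C++ has no built-in equivalent of these checked operators, but the idea
can be approximated by hand. The following is a minimal sketch in C++ (not
Zig, and not how the Zig compiler implements it); checked_add and
wrapping_add are names invented for this example.

```cpp
#include <cstdint>
#include <limits>

// Rough analogue of a checked `+` on an unsigned type: reports overflow
// instead of wrapping. Returns true and stores the sum in `out` only when
// the sum is representable.
inline bool checked_add(std::uint32_t a, std::uint32_t b, std::uint32_t &out) {
  if (a > std::numeric_limits<std::uint32_t>::max() - b)
    return false; // the sum would not fit in 32 bits
  out = a + b;
  return true;
}

// Rough analogue of a wrapping `+%`: wraparound is requested explicitly.
// In C++ this is simply what unsigned `+` already does.
inline std::uint32_t wrapping_add(std::uint32_t a, std::uint32_t b) {
  return a + b;
}
```

The difference from plain C++ is that the caller chooses between the two
behaviors by name, rather than by picking a signed or unsigned type.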

Andrew


Jake Ehrlich via llvm-dev

2019-Jun-11 20:22 UTC

This whole debate seems kind of odd to me. I don't know that cases where it
isn't clear what type to use come up that often. If a value can truly never
be negative you should use an unsigned value. If a value can be negative,
you should use a signed value. Anecdotal evidence in my case is that the
vast majority of values are unsigned by this rule.

Is there a reason to use a signed value when you know a value will never be
negative? Trapping on overflow doesn't seem motivated to me, since I'm
not aware of anything that does that. UBSan also checks for overflow in
unsigned types by default, so you can still check for that issue.

I'm not going to go watch the YouTube videos but the ES.102 lacks merit. On
systems I work with the bug they mention wouldn't be caught the way they
say. They also use subtraction (a rare operation IMO) as a motivating
example and arbitrarily declare large values to be less obvious bugs than
negative values without evidence to this.

ES.101 is valid but is not a reason to prefer signed to unsigned values in
any context. I've also run into a number of instances of signed shifts
being used and the interplay between negation and bitwise operators being
used. Not that those are common but it's just to say that exceptions exist
even to that rule.


Michael Spencer via llvm-dev

2019-Jun-11 20:44 UTC

On Tue, Jun 11, 2019 at 1:00 PM Zachary Turner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>> Doesn't it make it already worth it?
>>
> vector.size() returns a size_t, which on 64-bit platforms can represent
> values larger than those that can fit into an int64_t.  So to turn
> your argument around, since it's theoretically possible to have a vector
> with more items than an int64_t can represent, isn't it already worth it
> to use size_t, which is an unsigned type?

Sequence containers (like vector) cannot hold more items than a ptrdiff_t
(a signed type) can represent due to requirements based on std::distance.
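
A small sketch of the std::distance point: the element count of a
std::vector is also reachable through the container's signed
difference_type. The helper name signed_size is an assumption made for
this example.

```cpp
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <vector>

// std::distance returns the iterator's difference_type, which for
// std::vector is a signed integer type (typically std::ptrdiff_t).
static_assert(std::is_signed<std::vector<int>::difference_type>::value,
              "vector's difference_type is signed");

// Hypothetical helper: the number of elements as a signed value.
inline std::ptrdiff_t signed_size(const std::vector<int> &v) {
  return std::distance(v.begin(), v.end());
}
```

Because end() - begin() has to be representable in that signed type, a
count obtained this way never needs the upper half of the size_t range.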

I don't see a use for unsigned types outside of bit manipulation.

- Michael Spencer

Quentin Colombet via llvm-dev

2019-Jun-11 21:33 UTC

> On Jun 11, 2019, at 12:59 PM, Zachary Turner via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> On Tue, Jun 11, 2019 at 12:24 PM Mehdi AMINI <joker.eph at gmail.com>
> wrote:
>> I agree that readability, maintainability, and ability to debug/find
>> issues are key.
>> I haven't found myself in a situation where unsigned was helping my
>> readability: on the opposite actually I am always wondering where is the
>> expecting wrap-around behavior and that is one more thing I have to keep
>> in mind when I read code that manipulate unsigned. So YMMV but using
>> unsigned *increases* my mental load when reading code.
>
> I'm on the other end.  I'm always reading the code wondering "is this
> going to warn?"  "Why could a container ever have a negative number of
> elements?"  "The maximum value representable by the return type
> (unsigned) is larger than that of the value I'm storing it in (signed),
> so an overflow could happen even if there were no error.  What then?"

+1

Aaron Ballman via llvm-dev

2019-Jun-12 06:55 UTC

On Tue, Jun 11, 2019, 9:59 PM Zachary Turner <zturner at roblox.com>
wrote:
> On Tue, Jun 11, 2019 at 12:24 PM Mehdi AMINI <joker.eph at gmail.com>
> wrote:
>
>> I agree that readability, maintainability, and ability to debug/find
>> issues are key.
>> I haven't found myself in a situation where unsigned was helping my
>> readability: on the opposite actually I am always wondering where is the
>> expecting wrap-around behavior and that is one more thing I have to keep
>> in mind when I read code that manipulate unsigned. So YMMV but using
>> unsigned *increases* my mental load when reading code.
>>
> I'm on the other end.  I'm always reading the code wondering "is this
> going to warn?"  "Why could a container ever have a negative number of
> elements?"  "The maximum value representable by the return type
> (unsigned) is larger than that of the value I'm storing it in (signed),
> so an overflow could happen even if there were no error.  What then?"
>
Strong +1 to this.

~Aaron


Krzysztof Parzyszek via llvm-dev

2019-Jun-12 15:04 UTC

> The underlying problem is that the C family of languages mixes two
> orthogonal properties: value range and overflow behavior. There is no
> unsigned type with undefined wraparound. So the question becomes: What
> property is more important to reflect? Do we want to catch unintended
> wraparound behavior using a sanitizer/make optimizations based on it?

That's a valid argument, but I suspect that the vast majority of loops
using an unsigned induction variable start at an explicit 0 and go up by
1.  Such loops cannot overflow, so there is nothing to catch there.


-- 
Krzysztof Parzyszek  kparzysz at quicinc.com   LLVM compiler development


Mehdi AMINI via llvm-dev

2019-Jun-12 15:50 UTC

On Wed, Jun 12, 2019 at 8:05 AM Krzysztof Parzyszek via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> > The underlying problem is that the C family of languages mixes two
> orthogonal properties: value range and overflow behavior. There is no
> unsigned type with undefined wraparound. So the question becomes: What
> property is more important to reflect? Do we want catch unintended
> wraparound behavior using a sanitizer/make optimizations based on it?
>
> That's a valid argument, but I suspect that the vast majority of loops
> using an unsigned induction variable, start at an explicit 0 and go up by
> 1.  Such loops cannot overflow, so there is nothing to catch there.
>
This isn't entirely true: a comparison such as `if (idx - 1 < N) { ... }`
can exist in the code, and it does not give the same result when `idx` is
unsigned and zero.
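
To illustrate, the two hypothetical functions below evaluate the same
expression with `idx` as `unsigned` and as `int`. They are a sketch for
this discussion, not code from LLVM.

```cpp
// Both functions compute `idx - 1 < n`; only the operand types differ.
inline bool in_range_unsigned(unsigned idx, unsigned n) {
  return idx - 1 < n; // idx == 0 wraps to UINT_MAX, so this is false
}

inline bool in_range_signed(int idx, int n) {
  return idx - 1 < n; // idx == 0 gives -1, so this is true for n >= 0
}
```

For every nonzero `idx` in the shared range the two agree; they only
diverge at `idx == 0`, which is what makes the bug easy to miss.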

Just like Michael mentioned: there are two aspects of unsigned, and the
wrap-around behavior is the one causing bugs. After being bitten by
unsigned wrap-around twice a year, and because debugging these isn't fun
(and correctness is important and hard), my code reading (and others, as I
sourced initially) has adjusted to be suspicious and careful around
unsigned: if I see unsigned I need to check more invariants when I read the
code.
Obviously this is a tradeoff: losing the "this can't be negative" property
enforced in the type system is sad, but I haven't encountered any bugs or
issues caused by using int in place of unsigned (while the opposite is true).
(Any bug caused by "negative indexing of a container" would be as much of a
bug with unsigned; please watch Chandler's talk.)

-- 
Mehdi





David Greene via llvm-dev

2019-Jun-12 16:56 UTC

Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> writes:
>> The underlying problem is that the C family of languages mixes two
>> orthogonal properties: value range and overflow behavior. There is
>> no unsigned type with undefined wraparound. So the question becomes:
>> What property is more important to reflect? Do we want catch
>> unintended wraparound behavior using a sanitizer/make optimizations
>> based on it?
>
> That's a valid argument, but I suspect that the vast majority of loops
> using an unsigned induction variable, start at an explicit 0 and go up
> by 1.  Such loops cannot overflow, so there is nothing to catch there.
I have in fact seen loops that rely on overflow of induction variables,
as crazy as that sounds.

But again, for loops it's not (directly) about overflow as much as it is
about optimization.  You're leaving a lot of potential performance on
the floor with unsigned, precisely because compilers have to guard
against overflowing induction variables.
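
A minimal sketch of what the compiler has to guard against, using an
8-bit counter as a stand-in for a 32-bit `unsigned` so that the wraparound
is cheap to reach. The function count_iterations and its `cap` parameter
are assumptions made for this example.

```cpp
#include <cstdint>

// Runs `for (i = 0; i <= n; ++i)` with an 8-bit unsigned counter and
// returns how many iterations executed, giving up after `cap` iterations.
inline int count_iterations(std::uint8_t n, int cap) {
  int iterations = 0;
  for (std::uint8_t i = 0; i <= n; ++i) {
    if (++iterations == cap)
      break; // safety cap so the n == 255 case terminates
  }
  return iterations;
}
```

For most values of `n` the loop runs n + 1 times, but for n == 255 the
counter wraps to 0 and `i <= n` never becomes false. Since that wraparound
is defined behavior, the compiler cannot assume a trip count of n + 1 for
an unsigned counter. With a signed `int` counter, overflow would be
undefined, so it may assume the wrap does not happen.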

                          -David

Michael Kruse via llvm-dev

2019-Jun-12 22:07 UTC

On Tue, Jun 11, 2019 at 2:59 PM, Zachary Turner <zturner at roblox.com>
wrote:
>> > I'm personally against changing everything to signed integers.  To me,
>> > this is an example of making code strictly less readable and more
>> > confusing in order to fight deficiencies in the language standard.  I
>> > get the problem that it's solving, but I view this as mostly a
>> > theoretical problem, whereas being able to read the code and have it
>> > make sense is a practical problem that we must face on a daily basis.
>> > If you change everything to signed integers, you may catch a real
>> > problem with it a couple of times a year.  And by "real problem" here,
>> > I'm talking about a miscompile or an actual bug that surfaces in
>> > production somewhere, rather than a "yes, it seems theoretically
>> > possible for this to overflow".
>>
>> Doesn't it make it already worth it?
>
> vector.size() returns a size_t, which on 64-bit platforms can represent
> values larger than those that can fit into an int64_t.  So to turn your
> argument around, since it's theoretically possible to have a vector with
> more items than an int64_t can represent, isn't it already worth it to
> use size_t, which is an unsigned type?

According to your definition above, this is not a "real problem".
64-bit CPUs do not support using the full 64-bit virtual address
space. Even if one does in the future, PTRDIFF_MAX (and as a
consequence std::distance that a std::vector must support) effectively
limits the max allocation size that can be worked with without
undefined behavior.

To summarize, with signed sizes we could fix a few production bugs a
year, while unsigned sizes have only a theoretical advantage in
production.

Michael
