thr3ads.net - llvm dev - [llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior." [Jun 2019]


Jake Ehrlich via llvm-dev

2019-Jun-11 20:22 UTC

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

This whole debate seems kind of odd to me. I don't know that cases where it
isn't clear what type to use come up that often. If a value can truly never
be negative you should use an unsigned value. If a value can be negative,
you should use a signed value. Anecdotal evidence in my case is that the
vast majority of values are unsigned by this rule.

Is there a reason to use a signed value when you know a value will never be
negative? Trapping on overflow doesn't seem like a strong motivation to me,
since I'm not aware of anything that does that. UBSan can also check for
overflow in unsigned types (opt-in, via -fsanitize=unsigned-integer-overflow),
so you can still check for that issue.
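As a minimal sketch of checking unsigned wraparound explicitly, without relying on a sanitizer, the GCC/Clang builtin `__builtin_add_overflow` reports whether a sum wrapped. The helper name `checked_add` is hypothetical; building with Clang's `-fsanitize=unsigned-integer-overflow` would instead flag a plain `a + b` wrap at run time.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: stores a + b in *out and returns true if the sum fits
// in an unsigned; returns false if the addition wrapped modulo 2^N.
// __builtin_add_overflow (GCC/Clang) returns true on overflow and still
// stores the wrapped result in *out.
bool checked_add(unsigned a, unsigned b, unsigned* out) {
  return !__builtin_add_overflow(a, b, out);
}

// Usage:
//   unsigned r = 0;
//   assert(checked_add(2u, 3u, &r) && r == 5u);
//   assert(!checked_add(UINT_MAX, 1u, &r));  // wrapped; r == 0
```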

I'm not going to go watch the YouTube videos, but ES.102 in the C++ Core
Guidelines lacks merit. On systems I work with, the bug they mention wouldn't
be caught the way they say. They also use subtraction (a rare operation, IMO)
as a motivating example and arbitrarily declare large values to be less
obvious bugs than negative values, without evidence for this.
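The subtraction case at issue can be sketched as follows; the values and function names are illustrative assumptions, not code from the guideline. With unsigned operands, `2u - 3u` wraps to `UINT_MAX`, while the signed version yields -1. Whether the large value or the negative one is the more obvious bug is exactly what is disputed here.

```cpp
#include <cassert>
#include <climits>

// Unsigned subtraction wraps modulo 2^N: 2u - 3u is UINT_MAX, a large
// positive value rather than -1.
unsigned unsigned_diff(unsigned a, unsigned b) { return a - b; }

// Signed subtraction of small values yields the expected negative result.
int signed_diff(int a, int b) { return a - b; }

// Hypothetical guarded form that never wraps: the absolute difference.
unsigned distance(unsigned a, unsigned b) { return a > b ? a - b : b - a; }

// Usage: assert(unsigned_diff(2u, 3u) == UINT_MAX);
```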

ES.101 (use unsigned types for bit manipulation) is valid, but it is not a
reason to prefer signed to unsigned values in any context. I've also run into
a number of instances where signed shifts, or the interplay between negation
and bitwise operators, were used. Those aren't common, but it's just to say
that exceptions exist even to that rule.
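One familiar case of negation interacting with bitwise operators is isolating the lowest set bit with `x & -x`. The function names below are hypothetical; the sketch shows that the idiom is fully defined for unsigned operands, while the signed form works on two's-complement values but must exclude the one input whose negation overflows.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned version: ~x + 1u is two's-complement negation, which is well
// defined for unsigned types, so x & (~x + 1u) keeps only the lowest 1 bit.
// For x == 0 the result is 0.
std::uint32_t lowest_set_bit(std::uint32_t x) {
  return x & (~x + 1u);
}

// Signed version: -x overflows (undefined behavior) for INT32_MIN, so that
// value has to be ruled out before negating.
std::int32_t lowest_set_bit_signed(std::int32_t x) {
  assert(x != INT32_MIN);
  return x & -x;
}
```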

On Tue, Jun 11, 2019, 12:59 PM Zachary Turner via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On Tue, Jun 11, 2019 at 12:24 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>
>> I agree that readability, maintainability, and ability to debug/find
>> issues are key.
>> I haven't found myself in a situation where unsigned was helping my
>> readability: on the contrary, I am always wondering where the expected
>> wrap-around behavior is, and that is one more thing I have to keep in
>> mind when I read code that manipulates unsigned. So YMMV, but using
>> unsigned *increases* my mental load when reading code.
>>
> I'm on the other end.  I'm always reading the code wondering "is this
> going to warn?"  "Why could a container ever have a negative number of
> elements?"  "The maximum value representable by the return type (unsigned)
> is larger than that of the value I'm storing it in (signed), so an overflow
> could happen even if there were no error.  What then?"
>
>
> On Tue, Jun 11, 2019 at 12:26 PM Michael Kruse <llvmdev at meinersbur.de>
> wrote:
>
>> On Tue, Jun 11, 2019 at 11:45 AM Zachary Turner via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> >
>> > I'm personally against changing everything to signed integers.  To me,
>> > this is an example of making code strictly less readable and more
>> > confusing in order to fight deficiencies in the language standard.  I get
>> > the problem that it's solving, but I view this as mostly a theoretical
>> > problem, whereas being able to read the code and have it make sense is a
>> > practical problem that we must face on a daily basis.  If you change
>> > everything to signed integers, you may catch a real problem with it a
>> > couple of times a year.  And by "real problem" here, I'm talking about a
>> > miscompile or an actual bug that surfaces in production somewhere, rather
>> > than a "yes, it seems theoretically possible for this to overflow".
>>
>> Doesn't that already make it worth it?
>>
> vector.size() returns a size_t, which on 64-bit platforms can represent
> values larger than those that can fit into an int64_t.  So to turn your
> argument around, since it's theoretically possible to have a vector with
> more items than an int64_t can represent, isn't it already worth it to use
> size_t, which is an unsigned type?
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
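The two positions above can be illustrated with the loop shapes that usually trigger this debate. This is a sketch with hypothetical function names: an unsigned index matches `size()` and avoids `-Wsign-compare` warnings; a signed index needs one explicit conversion and assumes the size fits in `std::ptrdiff_t` (the case Zachary raises is when it might not); and the reverse loop shows the wrap-around trap that makes unsigned indices harder for some readers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unsigned index: same type as size(), so the comparison does not warn.
long long sum_unsigned(const std::vector<int>& v) {
  long long total = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    total += v[i];
  return total;
}

// Signed index: convert size() once, explicitly. Assumes v.size() fits in
// std::ptrdiff_t.
long long sum_signed(const std::vector<int>& v) {
  long long total = 0;
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(v.size());
  for (std::ptrdiff_t i = 0; i < n; ++i)
    total += v[static_cast<std::size_t>(i)];
  return total;
}

// Reverse loop with an unsigned index. The naive form
//   for (std::size_t i = v.size() - 1; i >= 0; --i)
// never terminates, because i >= 0 is always true for an unsigned type.
// Testing and decrementing in the condition avoids the wrap.
std::vector<int> reversed(const std::vector<int>& v) {
  std::vector<int> out;
  out.reserve(v.size());
  for (std::size_t i = v.size(); i-- > 0;)
    out.push_back(v[i]);
  return out;
}
```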

David Greene via llvm-dev

2019-Jun-12 16:54 UTC


[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

Jake Ehrlich via llvm-dev <llvm-dev at lists.llvm.org> writes:
> This whole debate seems kind of odd to me. I don't know that cases
> where it isn't clear what type to use come up that often. If a value
> can truly never be negative you should use an unsigned value. If a
> value can be negative, you should use a signed value. Anecdotal
> evidence in my case is that the vast majority of values are unsigned
> by this rule.
>
> Is there a reason to use a signed value when you know a value will
> never be negative?

Since this thread is really long, I want to make sure to address this
specific point even though it's been covered elsewhere.

One reason to prefer signed is optimization.  The compiler simply cannot
optimize code with unsigned as well as it can with signed, because of
unsigned's breaking of standard integer algebra.  This affects
everything from simple expression simplification to vectorization and
parallelization.  Using unsigned can have serious performance
consequences.  Because of the nature of the work I do, I see it all the
time.
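A minimal illustration of the "standard integer algebra" point, assuming an optimizing compiler such as Clang or GCC at `-O2`: the identity x + 1 > x may be assumed for `int` only because signed overflow is undefined, while for `unsigned` it genuinely fails at the maximum value. The function names are hypothetical. The same reasoning is part of what lets a compiler compute loop trip counts and widen signed induction variables, which is where effects on vectorization come in.

```cpp
#include <cassert>
#include <climits>

// Signed overflow is undefined behavior, so the compiler may assume x + 1
// never overflows and fold this comparison to `true`.
bool next_is_greater_signed(int x) { return x + 1 > x; }

// Unsigned arithmetic wraps modulo 2^N, so the comparison must be kept:
// for x == UINT_MAX, x + 1 is 0 and the result is false.
bool next_is_greater_unsigned(unsigned x) { return x + 1 > x; }

// Usage: assert(!next_is_greater_unsigned(UINT_MAX));
```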

Some have said this is premature optimization, but to me there is no
additional mental load with signed.  In fact, it's less for me than
unsigned because of the mental gymnastics I have to go through to verify
code that uses unsigned.

                             -David

Quentin Colombet via llvm-dev

2019-Jun-12 17:11 UTC


[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

> On Jun 12, 2019, at 9:54 AM, David Greene via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Jake Ehrlich via llvm-dev <llvm-dev at lists.llvm.org> writes:
> 
>> This whole debate seems kind of odd to me. I don't know that cases
>> where it isn't clear what type to use come up that often. If a value
>> can truly never be negative you should use an unsigned value. If a
>> value can be negative, you should use a signed value. Anecdotal
>> evidence in my case is that the vast majority of values are unsigned
>> by this rule.
>> 
>> Is there a reason to use a signed value when you know a value will
>> never be negative?
> 
> Since this thread is really long, I want to make sure to address this
> specific point even though it's been covered elsewhere.
> 
> One reason to prefer signed is optimization.

FWIW, if you care about optimization, a signed type as wide as size_t (e.g.
ptrdiff_t) is probably the way to go in general.

The int type incurs a sign extension for every address computation on 64-bit
platforms (32-bit to 64-bit extension).
Unsigned, on the other hand, needs a zero extension, which is free most of
the time.

Thus, unsigned is sometimes better for codegen than signed, in particular in a
compiler code base where vectorization is not really a thing.
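A sketch for inspecting the extensions described above, assuming a 64-bit target: compiling these hypothetical functions with something like `clang++ -O2 -S` and reading the assembly shows what each index type costs with a given compiler. The exact instructions vary by target and compiler, so the comments are expectations to verify, not guarantees.

```cpp
#include <cassert>
#include <cstddef>

// The address computation p + i needs a 64-bit offset, so:
//   - an int index must be sign-extended to 64 bits;
//   - an unsigned index must be zero-extended, which is often free on
//     x86-64 because writing a 32-bit register clears the upper half;
//   - a ptrdiff_t index is already pointer-width and needs no extension.
int load_int(const int* p, int i) { return p[i]; }
int load_unsigned(const int* p, unsigned i) { return p[i]; }
int load_ptrdiff(const int* p, std::ptrdiff_t i) { return p[i]; }

// Usage: int a[] = {10, 20, 30}; assert(load_unsigned(a, 1u) == 20);
```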

Anyway, it seems to me that there are enough people on both sides of the fence
that this shouldn’t be in the coding standard.

My 2c.

Quentin

> The compiler simply cannot
> optimize code with unsigned as well as it can with signed, because of
> unsigned's breaking of standard integer algebra.  This affects
> everything from simple expression simplification to vectorization and
> parallelization.  Using unsigned can have serious performance
> consequences.  Because of the nature of the work I do, I see it all the
> time.
> 
> Some have said this is premature optimization, but to me there is no
> additional mental load with signed.  In fact, it's less for me than
> unsigned because of the mental gymnastics I have to go through to verify
> code that uses unsigned.
> 
>                             -David
