thr3ads.net - llvm dev - [llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior." [Jun 2019]

If this information is useful, please help other people find it:
Share via:

Mehdi AMINI via llvm-dev

2019-Jun-11 08:10 UTC

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

On Tue, Jun 11, 2019 at 1:01 AM Aaron Ballman <aaron.ballman at gmail.com>
wrote:
>
>
> On Tue, Jun 11, 2019, 9:51 AM Mehdi AMINI <joker.eph at gmail.com>
wrote:
>
>>
>>
>> On Tue, Jun 11, 2019 at 12:37 AM Aaron Ballman <aaron.ballman at
gmail.com>
>> wrote:
>>
>>> Sorry for the brevity, I am currently travelling and responding on
a
>>> cell phone. I won't be able to give you a full accounting until
later,
>>>
>>
>> There is no hurry right now!
>>
>>
>>> but 1) I don't see a motivating problem this churn solves, 2)
signed int
>>> does not represent the full size of an object like size_t does and
is
>>> inappropriate to use for addressing into objects or arrays, which
means we
>>> won't use this convention consistently anyway.
>>>
>>
>> As far as I can tell, the same arguments applies to
"unsigned" I believe:
>> it does not represent the full extent of size_t on every platform
either!
>> Yet we're still using it as an index, including to index into
objects/array.
>>
>
>>
>>
>>> I have yet to be convinced by the c++ community's very recent
desire to
>>> switch everything to signed integers and would be very unhappy to
see us
>>> switch without considerably more motivating rarionale.
>>>
>>
>> Note that this is not about "switching" anything: there is no
standard in
>> LLVM right now (as far as I know) and the codebase is inconsistent (I
am
>> using `int` in general for a while I believe).
>>
>
> Glad to hear there's no plan to change the codebase
>
Just like every guideline, new code should follow unless there is a good
reason not to (the wording is "prefer").

> , but then I fail to see the purpose of the coding rule. Won't it boil
> down to "use the correct datatype for what you're doing?"
>
How do you define what is the correct datatype?

>
> What I mostly want to avoid is code that uses signed values for inherently
> unsigned operations like comparing against the size of an object.
>
I believe `if (index < v.size() - 1)` can be very surprising when v.empty().

-- 
Mehdi


-- >> Mehdi
>>
>>
>>
>>>
>>> On Mon, Jun 10, 2019, 11:04 PM Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Jun 10, 2019 at 10:32 AM Aaron Ballman via llvm-dev
<
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Jun 10, 2019, 7:16 PM Jake Ehrlich via llvm-dev
<
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> I'm in the same situation James is in and thus have
the same bias but
>>>>>> I'll +1 that comment nevertheless. I think I prefer
using size_t or the
>>>>>> uintX_t types where applicable. Only when I need a
signed value do I use
>>>>>> one.
>>>>>>
>>>>>
>>>>> +1 to prefering unsigned types.
>>>>>
>>>>
>>>> I'd appreciate if you guys could provide rational that
address the
>>>> extensive arguments and opinion provided in the C++ community
that I tried
>>>> to summarize in the link above.
>>>> Otherwise I don't know what to take out of unmotivated
"+1".
>>>>
>>>> --
>>>> Mehdi
>>>>
>>>>
>>>>
>>>>>
>>>>>> On Mon, Jun 10, 2019, 9:59 AM James Henderson via
llvm-dev <
>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>> Maybe it's just because I work in code around
the binary file
>>>>>>> formats almost exclusively, but unsigned (or more
often uint64_t) is FAR
>>>>>>> more common than int everywhere I go. I don't
have time right now to read
>>>>>>> up on the different links you provided, and I
expect this is covered in
>>>>>>> them, but it also seems odd to me to use int in a
loop when indexing in a
>>>>>>> container (something that can't always be
avoided), given the types of
>>>>>>> size() etc.
>>>>>>>
>>>>>>> On Mon, 10 Jun 2019 at 17:26, Michael Kruse via
llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> Am Sa., 8. Juni 2019 um 13:12 Uhr schrieb Tim
Northover via llvm-dev
>>>>>>>> <llvm-dev at lists.llvm.org>:
>>>>>>>> > I'd prefer us to have something neater
than static_cast<int> for
>>>>>>>> the
>>>>>>>> > loop problem before we made that change.
Perhaps add an ssize (or
>>>>>>>> > equivalent) method to all of our internal
data structures?
>>>>>>>> They're a
>>>>>>>> > lot more common than std::* containers.
>>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Since C++20 is also introducing ssize [1]
members, this makes a lot
>>>>>>>> of
>>>>>>>> sense to me. Using it would help avoiding an
unsigned comparison as
>>>>>>>> in
>>>>>>>>
>>>>>>>>     if (IndexOfInterestingElement >=
Container.size())
>>>>>>>>       ...
>>>>>>>>
>>>>>>>> to sneak in from the start.
>>>>>>>>
>>>>>>>> Michael
>>>>>>>>
>>>>>>>> [1] http://wg21.link/p1227r1
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190611/fdc2f950/attachment.html>

Zachary Turner via llvm-dev

2019-Jun-11 16:45 UTC

head link

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

I'm personally against changing everything to signed integers.  To me, this
is an example of making code strictly less readable and more confusing in
order to fight deficiencies in the language standard.  I get the problem
that it's solving, but I view this as mostly a theoretical problem, whereas
being able to read the code and have it make sense is a practical problem
that we must face on a daily basis.  If you change everything to signed
integers, you may catch a real problem with it a couple of times a year.
And by "real problem" here, I'm talking about a miscompile or an
actual bug
that surfaces in production somewhere, rather than a "yes, it seems
theoretically possible for this to overflow".

On the other hand, a large number of people need to work in this codebase
every day, and multiplied over the same time period, my belief is that
having the code make sense and be simple has a higher net value.

It simply doesn't make sense (conceptually) to use a signed type for
domains that are inherently unsigned, like the size of an object.  IMO, we
should revisit this if and when the deficiencies in the C++ Standard are
addressed.

On Tue, Jun 11, 2019 at 1:10 AM Mehdi AMINI via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
>
> On Tue, Jun 11, 2019 at 1:01 AM Aaron Ballman <aaron.ballman at
gmail.com>
> wrote:
>
>>
>>
>> On Tue, Jun 11, 2019, 9:51 AM Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Jun 11, 2019 at 12:37 AM Aaron Ballman <aaron.ballman at
gmail.com>
>>> wrote:
>>>
>>>> Sorry for the brevity, I am currently travelling and responding
on a
>>>> cell phone. I won't be able to give you a full accounting
until later,
>>>>
>>>
>>> There is no hurry right now!
>>>
>>>
>>>> but 1) I don't see a motivating problem this churn solves,
2) signed
>>>> int does not represent the full size of an object like size_t
does and is
>>>> inappropriate to use for addressing into objects or arrays,
which means we
>>>> won't use this convention consistently anyway.
>>>>
>>>
>>> As far as I can tell, the same arguments applies to
"unsigned" I
>>> believe: it does not represent the full extent of size_t on every
platform
>>> either! Yet we're still using it as an index, including to
index into
>>> objects/array.
>>>
>>
>>>
>>>
>>>> I have yet to be convinced by the c++ community's very
recent desire to
>>>> switch everything to signed integers and would be very unhappy
to see us
>>>> switch without considerably more motivating rarionale.
>>>>
>>>
>>> Note that this is not about "switching" anything: there
is no standard
>>> in LLVM right now (as far as I know) and the codebase is
inconsistent (I am
>>> using `int` in general for a while I believe).
>>>
>>
>> Glad to hear there's no plan to change the codebase
>>
>
> Just like every guideline, new code should follow unless there is a good
> reason not to (the wording is "prefer").
>
>
>> , but then I fail to see the purpose of the coding rule. Won't it
boil
>> down to "use the correct datatype for what you're doing?"
>>
>
> How do you define what is the correct datatype?
>
>
>>
>> What I mostly want to avoid is code that uses signed values for
>> inherently unsigned operations like comparing against the size of an
object.
>>
>
> I believe `if (index < v.size() - 1)` can be very surprising when
> v.empty().
>
> --
> Mehdi
>
>
> --
>>> Mehdi
>>>
>>>
>>>
>>>>
>>>> On Mon, Jun 10, 2019, 11:04 PM Mehdi AMINI <joker.eph at
gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Jun 10, 2019 at 10:32 AM Aaron Ballman via llvm-dev
<
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 10, 2019, 7:16 PM Jake Ehrlich via llvm-dev
<
>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>> I'm in the same situation James is in and thus
have the same bias
>>>>>>> but I'll +1 that comment nevertheless. I think
I prefer using size_t or the
>>>>>>> uintX_t types where applicable. Only when I need a
signed value do I use
>>>>>>> one.
>>>>>>>
>>>>>>
>>>>>> +1 to prefering unsigned types.
>>>>>>
>>>>>
>>>>> I'd appreciate if you guys could provide rational that
address the
>>>>> extensive arguments and opinion provided in the C++
community that I tried
>>>>> to summarize in the link above.
>>>>> Otherwise I don't know what to take out of unmotivated
"+1".
>>>>>
>>>>> --
>>>>> Mehdi
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> On Mon, Jun 10, 2019, 9:59 AM James Henderson via
llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> Maybe it's just because I work in code
around the binary file
>>>>>>>> formats almost exclusively, but unsigned (or
more often uint64_t) is FAR
>>>>>>>> more common than int everywhere I go. I
don't have time right now to read
>>>>>>>> up on the different links you provided, and I
expect this is covered in
>>>>>>>> them, but it also seems odd to me to use int in
a loop when indexing in a
>>>>>>>> container (something that can't always be
avoided), given the types of
>>>>>>>> size() etc.
>>>>>>>>
>>>>>>>> On Mon, 10 Jun 2019 at 17:26, Michael Kruse via
llvm-dev <
>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>>> Am Sa., 8. Juni 2019 um 13:12 Uhr schrieb
Tim Northover via
>>>>>>>>> llvm-dev
>>>>>>>>> <llvm-dev at lists.llvm.org>:
>>>>>>>>> > I'd prefer us to have something
neater than static_cast<int> for
>>>>>>>>> the
>>>>>>>>> > loop problem before we made that
change. Perhaps add an ssize (or
>>>>>>>>> > equivalent) method to all of our
internal data structures?
>>>>>>>>> They're a
>>>>>>>>> > lot more common than std::*
containers.
>>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Since C++20 is also introducing ssize [1]
members, this makes a
>>>>>>>>> lot of
>>>>>>>>> sense to me. Using it would help avoiding
an unsigned comparison
>>>>>>>>> as in
>>>>>>>>>
>>>>>>>>>     if (IndexOfInterestingElement >=
Container.size())
>>>>>>>>>       ...
>>>>>>>>>
>>>>>>>>> to sneak in from the start.
>>>>>>>>>
>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>> [1] http://wg21.link/p1227r1
>>>>>>>>>
_______________________________________________
>>>>>>>>> LLVM Developers mailing list
>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190611/5ef56789/attachment.html>

Michael Kruse via llvm-dev

2019-Jun-11 19:25 UTC

head link

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

Am Di., 11. Juni 2019 um 11:45 Uhr schrieb Zachary Turner via llvm-dev
<llvm-dev at lists.llvm.org>:>
> I'm personally against changing everything to signed integers.  To me,
this is an example of making code strictly less readable and more confusing in
order to fight deficiencies in the language standard.  I get the problem that
it's solving, but I view this as mostly a theoretical problem, whereas being
able to read the code and have it make sense is a practical problem that we must
face on a daily basis.  If you change everything to signed integers, you may
catch a real problem with it a couple of times a year.  And by "real
problem" here, I'm talking about a miscompile or an actual bug that
surfaces in production somewhere, rather than a "yes, it seems
theoretically possible for this to overflow".
Doesn't it make it already worth it?

> On the other hand, a large number of people need to work in this codebase
every day, and multiplied over the same time period, my belief is that having
the code make sense and be simple has a higher net value.
>
> It simply doesn't make sense (conceptually) to use a signed type for
domains that are inherently unsigned, like the size of an object.  IMO, we
should revisit this if and when the deficiencies in the C++ Standard are
addressed.
The underlying problem is that the C family of languages mixes two
orthogonal properties: value range and overflow behavior. There is no
unsigned type with undefined wraparound. So the question becomes: What
property is more important to reflect? Do we want catch unintended
wraparound behavior using a sanitizer/make optimizations based on it?
Do we need the additional range provided by an unsigned type? As
Chandler says in one of his talks linked earlier: "If you need more
bits, use more bits" (such as int64_t).

Michael

Maybe Matching Threads

Search for more seemingly similar threads

llvm dev - Jun 2019 - [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

Maybe Matching Threads