Mehdi AMINI via llvm-dev
2019-Jun-11 08:10 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
On Tue, Jun 11, 2019 at 1:01 AM Aaron Ballman <aaron.ballman at gmail.com> wrote:> > > On Tue, Jun 11, 2019, 9:51 AM Mehdi AMINI <joker.eph at gmail.com> wrote: > >> >> >> On Tue, Jun 11, 2019 at 12:37 AM Aaron Ballman <aaron.ballman at gmail.com> >> wrote: >> >>> Sorry for the brevity, I am currently travelling and responding on a >>> cell phone. I won't be able to give you a full accounting until later, >>> >> >> There is no hurry right now! >> >> >>> but 1) I don't see a motivating problem this churn solves, 2) signed int >>> does not represent the full size of an object like size_t does and is >>> inappropriate to use for addressing into objects or arrays, which means we >>> won't use this convention consistently anyway. >>> >> >> As far as I can tell, the same arguments applies to "unsigned" I believe: >> it does not represent the full extent of size_t on every platform either! >> Yet we're still using it as an index, including to index into objects/array. >> > >> >> >>> I have yet to be convinced by the c++ community's very recent desire to >>> switch everything to signed integers and would be very unhappy to see us >>> switch without considerably more motivating rarionale. >>> >> >> Note that this is not about "switching" anything: there is no standard in >> LLVM right now (as far as I know) and the codebase is inconsistent (I am >> using `int` in general for a while I believe). >> > > Glad to hear there's no plan to change the codebase >Just like every guideline, new code should follow unless there is a good reason not to (the wording is "prefer").> , but then I fail to see the purpose of the coding rule. Won't it boil > down to "use the correct datatype for what you're doing?" >How do you define what is the correct datatype?> > What I mostly want to avoid is code that uses signed values for inherently > unsigned operations like comparing against the size of an object. >I believe `if (index < v.size() - 1)` can be very surprising when v.empty(). -- Mehdi -->> Mehdi >> >> >> >>> >>> On Mon, Jun 10, 2019, 11:04 PM Mehdi AMINI <joker.eph at gmail.com> wrote: >>> >>>> >>>> >>>> On Mon, Jun 10, 2019 at 10:32 AM Aaron Ballman via llvm-dev < >>>> llvm-dev at lists.llvm.org> wrote: >>>> >>>>> >>>>> >>>>> On Mon, Jun 10, 2019, 7:16 PM Jake Ehrlich via llvm-dev < >>>>> llvm-dev at lists.llvm.org> wrote: >>>>> >>>>>> I'm in the same situation James is in and thus have the same bias but >>>>>> I'll +1 that comment nevertheless. I think I prefer using size_t or the >>>>>> uintX_t types where applicable. Only when I need a signed value do I use >>>>>> one. >>>>>> >>>>> >>>>> +1 to prefering unsigned types. >>>>> >>>> >>>> I'd appreciate if you guys could provide rational that address the >>>> extensive arguments and opinion provided in the C++ community that I tried >>>> to summarize in the link above. >>>> Otherwise I don't know what to take out of unmotivated "+1". >>>> >>>> -- >>>> Mehdi >>>> >>>> >>>> >>>>> >>>>>> On Mon, Jun 10, 2019, 9:59 AM James Henderson via llvm-dev < >>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>> >>>>>>> Maybe it's just because I work in code around the binary file >>>>>>> formats almost exclusively, but unsigned (or more often uint64_t) is FAR >>>>>>> more common than int everywhere I go. I don't have time right now to read >>>>>>> up on the different links you provided, and I expect this is covered in >>>>>>> them, but it also seems odd to me to use int in a loop when indexing in a >>>>>>> container (something that can't always be avoided), given the types of >>>>>>> size() etc. >>>>>>> >>>>>>> On Mon, 10 Jun 2019 at 17:26, Michael Kruse via llvm-dev < >>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>> >>>>>>>> Am Sa., 8. Juni 2019 um 13:12 Uhr schrieb Tim Northover via llvm-dev >>>>>>>> <llvm-dev at lists.llvm.org>: >>>>>>>> > I'd prefer us to have something neater than static_cast<int> for >>>>>>>> the >>>>>>>> > loop problem before we made that change. Perhaps add an ssize (or >>>>>>>> > equivalent) method to all of our internal data structures? >>>>>>>> They're a >>>>>>>> > lot more common than std::* containers. >>>>>>>> >>>>>>>> +1 >>>>>>>> >>>>>>>> Since C++20 is also introducing ssize [1] members, this makes a lot >>>>>>>> of >>>>>>>> sense to me. Using it would help avoiding an unsigned comparison as >>>>>>>> in >>>>>>>> >>>>>>>> if (IndexOfInterestingElement >= Container.size()) >>>>>>>> ... >>>>>>>> >>>>>>>> to sneak in from the start. >>>>>>>> >>>>>>>> Michael >>>>>>>> >>>>>>>> [1] http://wg21.link/p1227r1 >>>>>>>> _______________________________________________ >>>>>>>> LLVM Developers mailing list >>>>>>>> llvm-dev at lists.llvm.org >>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> LLVM Developers mailing list >>>>>>> llvm-dev at lists.llvm.org >>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>> >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> llvm-dev at lists.llvm.org >>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>> >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> llvm-dev at lists.llvm.org >>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>> >>>>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190611/fdc2f950/attachment.html>
Zachary Turner via llvm-dev
2019-Jun-11 16:45 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
I'm personally against changing everything to signed integers. To me, this is an example of making code strictly less readable and more confusing in order to fight deficiencies in the language standard. I get the problem that it's solving, but I view this as mostly a theoretical problem, whereas being able to read the code and have it make sense is a practical problem that we must face on a daily basis. If you change everything to signed integers, you may catch a real problem with it a couple of times a year. And by "real problem" here, I'm talking about a miscompile or an actual bug that surfaces in production somewhere, rather than a "yes, it seems theoretically possible for this to overflow". On the other hand, a large number of people need to work in this codebase every day, and multiplied over the same time period, my belief is that having the code make sense and be simple has a higher net value. It simply doesn't make sense (conceptually) to use a signed type for domains that are inherently unsigned, like the size of an object. IMO, we should revisit this if and when the deficiencies in the C++ Standard are addressed. On Tue, Jun 11, 2019 at 1:10 AM Mehdi AMINI via llvm-dev < llvm-dev at lists.llvm.org> wrote:> > > On Tue, Jun 11, 2019 at 1:01 AM Aaron Ballman <aaron.ballman at gmail.com> > wrote: > >> >> >> On Tue, Jun 11, 2019, 9:51 AM Mehdi AMINI <joker.eph at gmail.com> wrote: >> >>> >>> >>> On Tue, Jun 11, 2019 at 12:37 AM Aaron Ballman <aaron.ballman at gmail.com> >>> wrote: >>> >>>> Sorry for the brevity, I am currently travelling and responding on a >>>> cell phone. I won't be able to give you a full accounting until later, >>>> >>> >>> There is no hurry right now! >>> >>> >>>> but 1) I don't see a motivating problem this churn solves, 2) signed >>>> int does not represent the full size of an object like size_t does and is >>>> inappropriate to use for addressing into objects or arrays, which means we >>>> won't use this convention consistently anyway. >>>> >>> >>> As far as I can tell, the same arguments applies to "unsigned" I >>> believe: it does not represent the full extent of size_t on every platform >>> either! Yet we're still using it as an index, including to index into >>> objects/array. >>> >> >>> >>> >>>> I have yet to be convinced by the c++ community's very recent desire to >>>> switch everything to signed integers and would be very unhappy to see us >>>> switch without considerably more motivating rarionale. >>>> >>> >>> Note that this is not about "switching" anything: there is no standard >>> in LLVM right now (as far as I know) and the codebase is inconsistent (I am >>> using `int` in general for a while I believe). >>> >> >> Glad to hear there's no plan to change the codebase >> > > Just like every guideline, new code should follow unless there is a good > reason not to (the wording is "prefer"). > > >> , but then I fail to see the purpose of the coding rule. Won't it boil >> down to "use the correct datatype for what you're doing?" >> > > How do you define what is the correct datatype? > > >> >> What I mostly want to avoid is code that uses signed values for >> inherently unsigned operations like comparing against the size of an object. >> > > I believe `if (index < v.size() - 1)` can be very surprising when > v.empty(). > > -- > Mehdi > > > -- >>> Mehdi >>> >>> >>> >>>> >>>> On Mon, Jun 10, 2019, 11:04 PM Mehdi AMINI <joker.eph at gmail.com> wrote: >>>> >>>>> >>>>> >>>>> On Mon, Jun 10, 2019 at 10:32 AM Aaron Ballman via llvm-dev < >>>>> llvm-dev at lists.llvm.org> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Mon, Jun 10, 2019, 7:16 PM Jake Ehrlich via llvm-dev < >>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>> >>>>>>> I'm in the same situation James is in and thus have the same bias >>>>>>> but I'll +1 that comment nevertheless. I think I prefer using size_t or the >>>>>>> uintX_t types where applicable. Only when I need a signed value do I use >>>>>>> one. >>>>>>> >>>>>> >>>>>> +1 to prefering unsigned types. >>>>>> >>>>> >>>>> I'd appreciate if you guys could provide rational that address the >>>>> extensive arguments and opinion provided in the C++ community that I tried >>>>> to summarize in the link above. >>>>> Otherwise I don't know what to take out of unmotivated "+1". >>>>> >>>>> -- >>>>> Mehdi >>>>> >>>>> >>>>> >>>>>> >>>>>>> On Mon, Jun 10, 2019, 9:59 AM James Henderson via llvm-dev < >>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>> >>>>>>>> Maybe it's just because I work in code around the binary file >>>>>>>> formats almost exclusively, but unsigned (or more often uint64_t) is FAR >>>>>>>> more common than int everywhere I go. I don't have time right now to read >>>>>>>> up on the different links you provided, and I expect this is covered in >>>>>>>> them, but it also seems odd to me to use int in a loop when indexing in a >>>>>>>> container (something that can't always be avoided), given the types of >>>>>>>> size() etc. >>>>>>>> >>>>>>>> On Mon, 10 Jun 2019 at 17:26, Michael Kruse via llvm-dev < >>>>>>>> llvm-dev at lists.llvm.org> wrote: >>>>>>>> >>>>>>>>> Am Sa., 8. Juni 2019 um 13:12 Uhr schrieb Tim Northover via >>>>>>>>> llvm-dev >>>>>>>>> <llvm-dev at lists.llvm.org>: >>>>>>>>> > I'd prefer us to have something neater than static_cast<int> for >>>>>>>>> the >>>>>>>>> > loop problem before we made that change. Perhaps add an ssize (or >>>>>>>>> > equivalent) method to all of our internal data structures? >>>>>>>>> They're a >>>>>>>>> > lot more common than std::* containers. >>>>>>>>> >>>>>>>>> +1 >>>>>>>>> >>>>>>>>> Since C++20 is also introducing ssize [1] members, this makes a >>>>>>>>> lot of >>>>>>>>> sense to me. Using it would help avoiding an unsigned comparison >>>>>>>>> as in >>>>>>>>> >>>>>>>>> if (IndexOfInterestingElement >= Container.size()) >>>>>>>>> ... >>>>>>>>> >>>>>>>>> to sneak in from the start. >>>>>>>>> >>>>>>>>> Michael >>>>>>>>> >>>>>>>>> [1] http://wg21.link/p1227r1 >>>>>>>>> _______________________________________________ >>>>>>>>> LLVM Developers mailing list >>>>>>>>> llvm-dev at lists.llvm.org >>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> LLVM Developers mailing list >>>>>>>> llvm-dev at lists.llvm.org >>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> LLVM Developers mailing list >>>>>>> llvm-dev at lists.llvm.org >>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>>> >>>>>> _______________________________________________ >>>>>> LLVM Developers mailing list >>>>>> llvm-dev at lists.llvm.org >>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>>> >>>>> _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190611/5ef56789/attachment.html>
Michael Kruse via llvm-dev
2019-Jun-11 19:25 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
Am Di., 11. Juni 2019 um 11:45 Uhr schrieb Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org>:> > I'm personally against changing everything to signed integers. To me, this is an example of making code strictly less readable and more confusing in order to fight deficiencies in the language standard. I get the problem that it's solving, but I view this as mostly a theoretical problem, whereas being able to read the code and have it make sense is a practical problem that we must face on a daily basis. If you change everything to signed integers, you may catch a real problem with it a couple of times a year. And by "real problem" here, I'm talking about a miscompile or an actual bug that surfaces in production somewhere, rather than a "yes, it seems theoretically possible for this to overflow".Doesn't it make it already worth it?> On the other hand, a large number of people need to work in this codebase every day, and multiplied over the same time period, my belief is that having the code make sense and be simple has a higher net value. > > It simply doesn't make sense (conceptually) to use a signed type for domains that are inherently unsigned, like the size of an object. IMO, we should revisit this if and when the deficiencies in the C++ Standard are addressed.The underlying problem is that the C family of languages mixes two orthogonal properties: value range and overflow behavior. There is no unsigned type with undefined wraparound. So the question becomes: What property is more important to reflect? Do we want catch unintended wraparound behavior using a sanitizer/make optimizations based on it? Do we need the additional range provided by an unsigned type? As Chandler says in one of his talks linked earlier: "If you need more bits, use more bits" (such as int64_t). Michael
Reasonably Related Threads
- [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
- [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
- [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
- [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
- [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."