Jake Ehrlich via llvm-dev
2019-Jun-11 20:22 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
This whole debate seems kind of odd to me. I don't know that cases where it isn't clear what type to use come up that often. If a value can truly never be negative you should use an unsigned value. If a value can be negative, you should use a signed value. Anecdotal evidence in my case is that the vast majority of values are unsigned by this rule.

Is there a reason to use a signed value when you know a value will never be negative? Trapping on overflow doesn't seem motivated to me since I'm not aware of anything that does that. UBSan also checks for overflow in unsigned types by default, so you can still check for that issue.

I'm not going to go watch the YouTube videos, but ES.102 lacks merit. On systems I work with, the bug they mention wouldn't be caught the way they say. They also use subtraction (a rare operation IMO) as a motivating example and arbitrarily declare large values to be less obvious bugs than negative values, without evidence for this.

ES.101 is valid but is not a reason to prefer signed to unsigned values in any context. I've also run into a number of instances of signed shifts being used, and of the interplay between negation and bitwise operators being used. Not that those are common, but it's just to say that exceptions exist even to that rule.

On Tue, Jun 11, 2019, 12:59 PM Zachary Turner via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Tue, Jun 11, 2019 at 12:24 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>
>> I agree that readability, maintainability, and ability to debug/find
>> issues are key.
>> I haven't found myself in a situation where unsigned was helping my
>> readability: on the contrary, I am always wondering where the expected
>> wrap-around behavior is, and that is one more thing I have to keep in
>> mind when I read code that manipulates unsigned. So YMMV, but using
>> unsigned *increases* my mental load when reading code.
>
> I'm on the other end. I'm always reading the code wondering "is this
> going to warn?" "Why could a container ever have a negative number of
> elements?" "The maximum value representable by the return type (unsigned)
> is larger than that of the value I'm storing it in (signed), so an overflow
> could happen even if there were no error. What then?"
>
> On Tue, Jun 11, 2019 at 12:26 PM Michael Kruse <llvmdev at meinersbur.de> wrote:
>
>> On Tue, Jun 11, 2019 at 11:45 AM, Zachary Turner via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> >
>> > I'm personally against changing everything to signed integers. To me,
>> > this is an example of making code strictly less readable and more
>> > confusing in order to fight deficiencies in the language standard. I get
>> > the problem that it's solving, but I view this as mostly a theoretical
>> > problem, whereas being able to read the code and have it make sense is a
>> > practical problem that we must face on a daily basis. If you change
>> > everything to signed integers, you may catch a real problem with it a
>> > couple of times a year. And by "real problem" here, I'm talking about a
>> > miscompile or an actual bug that surfaces in production somewhere, rather
>> > than a "yes, it seems theoretically possible for this to overflow".
>>
>> Doesn't it make it already worth it?
>
> vector.size() returns a size_t, which on 64-bit platforms can represent
> values larger than those that can fit into an int64_t. So to turn your
> argument around, since it's theoretically possible to have a vector with
> more items than an int64_t can represent, isn't it already worth it to
> use size_t, which is an unsigned type?
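The two behaviours argued over in this message, the subtraction example attributed to ES.102 and the mismatch between `vector.size()` and a signed variable, can be illustrated with a minimal sketch. The helper names (`unsigned_difference`, `signed_difference`, `size_fits_in_int`) are hypothetical and appear neither in the thread nor in LLVM; the sketch only shows what the language does, not which side of the debate is right.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: subtraction carried out in unsigned arithmetic.
// When b > a the result wraps modulo 2^N and becomes a large positive value.
constexpr unsigned unsigned_difference(unsigned a, unsigned b) { return a - b; }

// Hypothetical helper: the same subtraction in signed arithmetic.
// When b > a the result is an ordinary negative number.
constexpr int signed_difference(int a, int b) { return a - b; }

// Hypothetical helper: reports whether a container size (a std::size_t, as
// returned by std::vector::size()) can be stored in an int without loss.
constexpr bool size_fits_in_int(std::size_t n) {
    return n <= static_cast<std::size_t>(std::numeric_limits<int>::max());
}
```

With these definitions, `unsigned_difference(2u, 3u)` yields the largest `unsigned` value while `signed_difference(2, 3)` yields `-1`; which of the two is the "less obvious bug" is exactly what the participants disagree on.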
David Greene via llvm-dev
2019-Jun-12 16:54 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
Jake Ehrlich via llvm-dev <llvm-dev at lists.llvm.org> writes:

> This whole debate seems kind of odd to me. I don't know that cases
> where it isn't clear what type to use come up that often. If a value
> can truly never be negative you should use an unsigned value. If a
> value can be negative, you should use a signed value. Anecdotal
> evidence in my case is that the vast majority of values are unsigned
> by this rule.
>
> Is there a reason to use a signed value when you know a value will
> never be negative?

Since this thread is really long, I want to make sure to address this specific point even though it's been covered elsewhere.

One reason to prefer signed is optimization. The compiler simply cannot optimize code with unsigned as well as it can with signed, because of unsigned's breaking of standard integer algebra. This affects everything from simple expression simplification to vectorization and parallelization. Using unsigned can have serious performance consequences. Because of the nature of the work I do, I see it all the time.

Some have said this is premature optimization but to me there is no additional mental load with signed. In fact it's less for me than unsigned because of the mental gymnastics I have to go through to verify code that uses unsigned.

-David
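What "breaking of standard integer algebra" means can be shown with a small sketch, under the assumption that it refers to identities such as `x + 1 > x` and `(x * 2) / 2 == x`. The function names are hypothetical. Because unsigned arithmetic is defined to wrap, these identities fail for some inputs, so a compiler has to preserve the wrapping case; for signed types overflow is undefined behavior, so a compiler is permitted to assume the identities hold. Whether a particular compiler exploits that in a given piece of code is not something this sketch demonstrates.

```cpp
#include <limits>

// Hypothetical check of the identity "x + 1 > x" in unsigned arithmetic.
// At the maximum value, x + 1 wraps to 0, so the identity fails.
constexpr bool successor_is_greater(unsigned x) { return x + 1 > x; }

// Hypothetical check of the identity "(x * 2) / 2 == x" in unsigned
// arithmetic. Once x * 2 wraps, the division no longer recovers x.
constexpr bool double_then_halve_roundtrips(unsigned x) {
    return (x * 2) / 2 == x;
}

// Both identities hold for small values and fail at the top of the range.
static_assert(successor_is_greater(41u), "holds away from the maximum");
static_assert(!successor_is_greater(std::numeric_limits<unsigned>::max()),
              "fails at the maximum because of wrap-around");
```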
Quentin Colombet via llvm-dev
2019-Jun-12 17:11 UTC
[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."
> On Jun 12, 2019, at 9:54 AM, David Greene via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Jake Ehrlich via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
>> This whole debate seems kind of odd to me. I don't know that cases
>> where it isn't clear what type to use come up that often. If a value
>> can truly never be negative you should use an unsigned value. If a
>> value can be negative, you should use a signed value. Anecdotal
>> evidence in my case is that the vast majority of values are unsigned
>> by this rule.
>>
>> Is there a reason to use a signed value when you know a value will
>> never be negative?
>
> Since this thread is really long, I want to make sure to address this
> specific point even though it's been covered elsewhere.
>
> One reason to prefer signed is optimization.

FWIW, if you care about optimization, signed size_t is probably the way to go in general. An int will incur a sign extension for any address access on a 64-bit platform (32-bit to 64-bit extension). Unsigned, on the other hand, creates zero extensions, which are free most of the time. Thus, unsigned is sometimes better for codegen than signed, in particular in a compiler code base where vectorization is not really a thing.

Anyway, it seems to me that there are enough people on both sides of the fence that this shouldn't be in the coding standard.

My 2c.

Quentin

> The compiler simply cannot
> optimize code with unsigned as well as it can with signed, because of
> unsigned's breaking of standard integer algebra. This affects
> everything from simple expression simplification to vectorization and
> parallelization. Using unsigned can have serious performance
> consequences. Because of the nature of the work I do, I see it all the
> time.
>
> Some have said this is premature optimization but to me there is no
> additional mental load with signed. In fact it's less for me than
> unsigned because of the mental gymnastics I have to go through to verify
> code that uses unsigned.
>
> -David
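The sign-extension versus zero-extension distinction raised in this reply can be sketched at the source level. All names below are hypothetical, and the sketch assumes a typical 64-bit target with a 32-bit `int`. It shows only what the conversions mean; which instructions a compiler emits, and whether a zero extension is in fact free, depends on the target and optimization level and is the poster's claim rather than something the code proves.

```cpp
#include <cstddef>
#include <cstdint>

// Widening a 32-bit signed value to 64 bits replicates the sign bit into the
// upper half (sign extension), so -1 stays -1.
constexpr std::int64_t widen_signed(std::int32_t i) {
    return static_cast<std::int64_t>(i);
}

// Widening a 32-bit unsigned value to 64 bits fills the upper half with
// zeros (zero extension), so the numeric value is unchanged.
constexpr std::uint64_t widen_unsigned(std::uint32_t i) {
    return static_cast<std::uint64_t>(i);
}

// With a 32-bit int index, the index must be widened to pointer width before
// the address is formed.
inline int load_with_int_index(const int *p, int i) { return p[i]; }

// With a pointer-sized index, no widening is needed.
inline int load_with_size_index(const int *p, std::size_t i) { return p[i]; }
```

Comparing the generated assembly for the two `load_with_*` helpers on a given target is one way to check the claim for that target.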