thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] Modernizing LLVM Coding Style Guide and enforcing Clang-tidy [Dec 2016]

If this information is useful, please help other people find it:
Share via:

Mehdi Amini via llvm-dev

2016-Dec-29 19:46 UTC

[llvm-dev] [cfe-dev] Modernizing LLVM Coding Style Guide and enforcing Clang-tidy

> On Dec 29, 2016, at 11:20 AM, David Blaikie <dblaikie at gmail.com>
wrote:
> 
> 
> 
> On Thu, Dec 29, 2016 at 10:04 AM Mehdi Amini via cfe-dev <cfe-dev at
lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>> On Dec 29, 2016, at 5:54 AM, Daniel Berlin <dberlin at dberlin.org
<mailto:dberlin at dberlin.org>> wrote:
>> 
>> 
>> From yesterday discussion, Daniel Berlin proposed using emplace_back
everywhere to make code easier to maintain. I think it is valid argument, but I
believe it would reduce readability.
>> 
>> Just to be clear; I proposed not trying to switch uses back and forth
without real data, and to come to some agreement about what should be written in
the first place, preferably based on real data, and then, switching in some
organized fashion, not just randomly :)
>> 
>> IE Either have a clear coding standard and enforce it, or leave uses
alone one way or the other without some demonstration that it actually matters.
>> 
>> I would *personally* prefer to just use emplace_back everywhere, and
not try to switch between the two, without demonstration of real benefit.
>> 
>> (i'd also accept the other way, use push_back everywhere, and not
try to switch between the two, without demonstration of real benefit).
>> 
>> This preference is based on a simple truth: People suck at deciding
when to use one or the other. Yes, i know how to use it and when to use it. 
Lots of our types are cheaply movable, etc, so you probably won't see any
performance difference even when using them "right".  People have
enough style/etc issues without having to try to think hard about this.
> 
> I agree that “people suck at deciding”, me in the first place in the line,
and that’s why I like clear and good guideline and stick to it unless
"demonstration of real benefit”.
> I also think we can have clear guideline that are different from “always
use emplace_back” or “always use push_back”, like Piotr is suggesting.
> 
> I don't think Piotr is suggesting "always use push_back" but
"always use push_back when it's valid" & I'd second this.
Define “valid”? Just that it will compile?
> To show a simpler example:
> 
> std::unique_ptr<T> u(v);
> 
> std::unique_ptr<T> u = v;
> 
> I'd generally advocate for using copy init (the second form - it
doesn't copy in this case, it moves) over direct init (the first form)
because it's more legible - copy init can only execute implicit ctor
operations, whereas direct init can invoke implicit and explicit operations. So
from a readability perspective seeing the latter is easier than seing the former
- I know the operation is further constrained to simpler/less interesting
operations (eg: 'v' can't be a raw pointer in the second form, it
might be (& then I have to worry about whether that's an ownership
transfer that's intended), etc)
> 
> & push_back V emplace_back is the same choice - push_back means only
implicit ops are used and so it's more readable
I don’t see what’s more readable about “only implicit ctor”. Emplace is very
explicit that we intended to construct an object, I don’t understand what hurts
readability here?

> , emplace_back requires more careful attention.
> 
> I think this is a reasonably good stylistic rule - but I'm happy
leaving an open to judgment for sure - there might be readability reasons that
an emplace_back may work better in some cases.
> 
> I'd think of this like "else after return" - we don't
have any enforcement, sometimes we make judgment about it being better (for
consistency - sometimes there's a pattern such that else makes the code more
readable), we haven't gone back to do any cleanup of the codebase, but
it's pretty consistently applied/desired in code review.
I have a different impression: we are actively cleaning the codebase with
tidy-checks. For instance look for `git log --author=Zelenko`.
Another example is that a few months ago the entire LLDB codebase has been
formatted with clang-format, after marking the specific places where it was not
desirable (tables for instance) with comment to disable clang-format.

— 
Mehdi

> 
> All that said - sure, I'd like to have tools to help me get this right
(else after return and "prefer copy init style over direct init")
& make that really easy to do.
> 
> - Dave
>  
> 
> That said, I’m not convinced the relative “ugliness" of emplace_back 
(I’m not sure what’s ugly there, I only see it as a habit to take)  vs push_back
is enough to justify a set of rules to decide between the two, I’d be fine with
“always using emplace_back” in the name of “simple rule is better when
possible".
> 
> 
>> 
>> So that isn't what I would necessarily propose for LLVM.
>> 
> 
>> For LLVM, my answer would be "either make tool do what we want,
force use of tool, or be consistent and obvious with what we want and then fix
it if it leads to performance issue”
> 
> +1
> Ultimately whatever we do is not practical if it can’t be done by a tool.
> 
> Not sure I follow this - we have lots of things we don't currently do
with a tool that's still part of our style & stuff we try to catch with
code review, etc. (else after return is my best example)
>  
> 
> 
>> 
>> 
>>  
>> There is also other reason - emplace_back can't take arguments of
some types like bitfields, NULL, static member. Using emplace can also lead to
memory leak in case of smart ptrs
>> "
>> auto *ptr = new int(1);
>> v.push_back(std::unique_ptr<int>(ptr));
>> This is because replacing it with emplace_back could cause a leak of
this pointer if emplace_back would throw exception before emplacement (e.g. not
enough memory to add new element).".
>> 
>> 
>> This seems, IMHO, not a useful argument for LLVM since it does not try
to use exception based error handling to recover.
> 
> 
> This, and also we should not write code like that (naked pointers, calling
new), using make_unique<> for instance would prevent any such situation.
Passing a raw pointer to a container of unique_ptr can be flagged by a tool.
> 
> Sure - but it generalizes further than that, it's just a nice simple
example.
> 
> std::vector<int> v(i);
> 
> 'i' could be an int & this would make a vector of that many
ints, or it could be another vector & this is making a copy, etc.
> 
> std::vector<int> v = i;
> 
> I know taht's not creating a vector of 'i' many ints.
> 
> Yeah, not a great example either - but the general idea applies, the latter
form is less powerful & so is easier to read because you don't have to
worry/think about it as much.
> 
> I think we mostly do this sort of thing out of habit anyway - but codifying
it (& if possible, tools to help) could be nice.
> 
>  
> 
> 
> — 
> Mehdi
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161229/13104706/attachment.html>

David Blaikie via llvm-dev

2016-Dec-29 20:03 UTC

head link

[llvm-dev] [cfe-dev] Modernizing LLVM Coding Style Guide and enforcing Clang-tidy

On Thu, Dec 29, 2016 at 11:46 AM Mehdi Amini <mehdi.amini at apple.com>
wrote:
> On Dec 29, 2016, at 11:20 AM, David Blaikie <dblaikie at gmail.com>
wrote:
>
>
>
> On Thu, Dec 29, 2016 at 10:04 AM Mehdi Amini via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> On Dec 29, 2016, at 5:54 AM, Daniel Berlin <dberlin at dberlin.org>
wrote:
>
>
> From yesterday discussion, Daniel Berlin proposed using emplace_back
> everywhere to make code easier to maintain. I think it is valid argument,
> but I believe it would reduce readability.
>
>
> Just to be clear; I proposed not trying to switch uses back and forth
> without real data, and to come to some agreement about what should be
> written in the first place, preferably based on real data, and then,
> switching in some organized fashion, not just randomly :)
>
> IE Either have a clear coding standard and enforce it, or leave uses alone
> one way or the other without some demonstration that it actually matters.
>
> I would *personally* prefer to just use emplace_back everywhere, and not
> try to switch between the two, without demonstration of real benefit.
>
> (i'd also accept the other way, use push_back everywhere, and not try
to
> switch between the two, without demonstration of real benefit).
>
> This preference is based on a simple truth: People suck at deciding when
> to use one or the other. Yes, i know how to use it and when to use it.
> Lots of our types are cheaply movable, etc, so you probably won't see
any
> performance difference even when using them "right".  People have
enough
> style/etc issues without having to try to think hard about this.
>
>
> I agree that “people suck at deciding”, me in the first place in the line,
> and that’s why I like clear and good guideline and stick to it unless
> "demonstration of real benefit”.
> I also think we can have clear guideline that are different from “always
> use emplace_back” or “always use push_back”, like Piotr is suggesting.
>
>
> I don't think Piotr is suggesting "always use push_back" but
"always use
> push_back when it's valid" & I'd second this.
>
>
> Define “valid”? Just that it will compile?
>
Yep

>
> To show a simpler example:
>
> std::unique_ptr<T> u(v);
>
> std::unique_ptr<T> u = v;
>
> I'd generally advocate for using copy init (the second form - it
doesn't
> copy in this case, it moves) over direct init (the first form) because
it's
> more legible - copy init can only execute implicit ctor operations, whereas
> direct init can invoke implicit and explicit operations. So from a
> readability perspective seeing the latter is easier than seing the former -
> I know the operation is further constrained to simpler/less interesting
> operations (eg: 'v' can't be a raw pointer in the second form,
it might be
> (& then I have to worry about whether that's an ownership transfer
that's
> intended), etc)
>
> & push_back V emplace_back is the same choice - push_back means only
> implicit ops are used and so it's more readable
>
>
> I don’t see what’s more readable about “only implicit ctor”.
>
It limits the set of operations that can be performed by the code. So when
I read it there's less I have to think about/consider. (& specifically,
the
implicit operations tend to be simpler/safer/more obvious - copy/move or
operations that are similar - not complex/interesting things like "taking
ownership from a raw pointer" or "creating a vector of N
elements")

> Emplace is very explicit that we intended to construct an object, I don’t
> understand what hurts readability here?
>
Going back to the example above, given the following two lines:

std::unique_ptr<T> u(foo());
std::unique_ptr<T> u = foo();

(& the equivalent: emplace_back(foo()) V push_back(foo()) for a vector of
unique_ptr)

the copy init/push_back are simpler to read because they aren't as powerful
- I don't have to wonder if something is taking ownership there (or if
I'm
creating a vector of N ints, etc). I know it's a simple/obvious operation,
generally (because others shouldn't be implicit).

>
>
> , emplace_back requires more careful attention.
>
> I think this is a reasonably good stylistic rule - but I'm happy
leaving
> an open to judgment for sure - there might be readability reasons that an
> emplace_back may work better in some cases.
>
> I'd think of this like "else after return" - we don't
have any
> enforcement, sometimes we make judgment about it being better (for
> consistency - sometimes there's a pattern such that else makes the code
> more readable), we haven't gone back to do any cleanup of the codebase,
but
> it's pretty consistently applied/desired in code review.
>
>
> I have a different impression: we are actively cleaning the codebase with
> tidy-checks.
>
I think that's where Danny is pushing back - I'm on the fence about
that.
So I'd push back on the automated cleanup (& have done so on several
occasions as these sort of patches have been sent) & encourage efforts to
provide the tools (like clang-tidy integration for patches and editors) to
avoid mistakes going forward in an advisory (rather than mandatory) way.

> For instance look for `git log --author=Zelenko`.
> Another example is that a few months ago the entire LLDB codebase has been
> formatted with clang-format, after marking the specific places where it was
> not desirable (tables for instance) with comment to disable clang-format.
>
*nod* Only because it was so far out of LLVM's style (by design
originally). LLVM's generally "close enough" that the cleanup
isn't
desired/intended.

I'd say the same for this particular instance - implicit V explicit ctor
calls. Mostly we write them using copy init, not direct init. I suspect
push_back/emplace_back is similar (if only due to history - and perhaps due
to my/other code review feedback encouraging push_back when possible in
reviews that might otherwise be going to more emplace_back) so I'm not sure
I care too much about the cleanup.

I think the issue Danny was getting at - which I agree - is unless we have
good tools in place to not make these mistakes again going forward (I think
clang-format has sort of reached that point - it's got easy integration in
editors and source control systems) then there's limited merit in doing the
cleanup. +1 to that.

If we have good integration for clang-tidy, then I'd be more OK with doing
cleanup.

Either way (good or not integration) - if we have some people who have ways
of using clang-tidy in their development process, I'd say it's worth
having
a project-wide clang-tidy config. I'd vote for putting "use
implicit-only
ops (like copy init and push_back) over explicit ops (like direct init and
emplace_back) where both compile" in that list.

- Dave

>
> —
> Mehdi
>
>
>
> All that said - sure, I'd like to have tools to help me get this right
> (else after return and "prefer copy init style over direct init")
& make
> that really easy to do.
>
> - Dave
>
>
>
> That said, I’m not convinced the relative “ugliness" of emplace_back 
(I’m
> not sure what’s ugly there, I only see it as a habit to take)  vs push_back
> is enough to justify a set of rules to decide between the two, I’d be fine
> with “always using emplace_back” in the name of “simple rule is better when
> possible".
>
>
> So that isn't what I would necessarily propose for LLVM.
>
> For LLVM, my answer would be "either make tool do what we want, force
use
> of tool, or be consistent and obvious with what we want and then fix it if
> it leads to performance issue”
>
>
> +1
> Ultimately whatever we do is not practical if it can’t be done by a tool.
>
>
> Not sure I follow this - we have lots of things we don't currently do
with
> a tool that's still part of our style & stuff we try to catch with
code
> review, etc. (else after return is my best example)
>
>
>
>
>
>
>
>
> There is also other reason - emplace_back can't take arguments of some
> types like bitfields, NULL, static member. Using emplace can also lead to
> memory leak in case of smart ptrs
> "
>
> auto *ptr = new int(1);v.push_back(std::unique_ptr<int>(ptr));
>
> This is because replacing it with emplace_back could cause a leak of this
> pointer if emplace_back would throw exception before emplacement (e.g.
> not enough memory to add new element).".
>
>
> This seems, IMHO, not a useful argument for LLVM since it does not try to
> use exception based error handling to recover.
>
>
> This, and also we should not write code like that (naked pointers, calling
> new), using make_unique<> for instance would prevent any such
situation.
> Passing a raw pointer to a container of unique_ptr can be flagged by a
tool.
>
>
> Sure - but it generalizes further than that, it's just a nice simple
> example.
>
> std::vector<int> v(i);
>
> 'i' could be an int & this would make a vector of that many
ints, or it
> could be another vector & this is making a copy, etc.
>
> std::vector<int> v = i;
>
> I know taht's not creating a vector of 'i' many ints.
>
> Yeah, not a great example either - but the general idea applies, the
> latter form is less powerful & so is easier to read because you
don't have
> to worry/think about it as much.
>
> I think we mostly do this sort of thing out of habit anyway - but
> codifying it (& if possible, tools to help) could be nice.
>
>
>
>
>
> —
> Mehdi
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161229/0a49cd10/attachment-0001.html>

Chandler Carruth via llvm-dev

2016-Dec-29 22:04 UTC

head link

[llvm-dev] [cfe-dev] Modernizing LLVM Coding Style Guide and enforcing Clang-tidy

Somewhat unfortunately, we have two discussions here:

- Clang-tidy has checks that might improve code -- should we deploy them?
If so which?

I'll address this in a separate email, and focus this one on:

- Should we have coding standards around 'push_back' and
'emplace_back',
and if so, what should they say?

I think the amount of confusion makes a coding standard useful.

As for what it should say, I tend to agree with Dave here. In particular:

On Thu, Dec 29, 2016 at 12:03 PM David Blaikie via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> On Thu, Dec 29, 2016 at 11:46 AM Mehdi Amini <mehdi.amini at
apple.com>
> wrote:
>
> On Dec 29, 2016, at 11:20 AM, David Blaikie <dblaikie at gmail.com>
wrote:
>
>
>
> On Thu, Dec 29, 2016 at 10:04 AM Mehdi Amini via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> On Dec 29, 2016, at 5:54 AM, Daniel Berlin <dberlin at dberlin.org>
wrote:
>
>
> From yesterday discussion, Daniel Berlin proposed using emplace_back
> everywhere to make code easier to maintain. I think it is valid argument,
> but I believe it would reduce readability.
>
>
> Just to be clear; I proposed not trying to switch uses back and forth
> without real data, and to come to some agreement about what should be
> written in the first place, preferably based on real data, and then,
> switching in some organized fashion, not just randomly :)
>
> IE Either have a clear coding standard and enforce it, or leave uses alone
> one way or the other without some demonstration that it actually matters.
>
> I would *personally* prefer to just use emplace_back everywhere, and not
> try to switch between the two, without demonstration of real benefit.
>
> (i'd also accept the other way, use push_back everywhere, and not try
to
> switch between the two, without demonstration of real benefit).
>
> This preference is based on a simple truth: People suck at deciding when
> to use one or the other. Yes, i know how to use it and when to use it.
> Lots of our types are cheaply movable, etc, so you probably won't see
any
> performance difference even when using them "right".  People have
enough
> style/etc issues without having to try to think hard about this.
>
>
> I agree that “people suck at deciding”, me in the first place in the line,
> and that’s why I like clear and good guideline and stick to it unless
> "demonstration of real benefit”.
> I also think we can have clear guideline that are different from “always
> use emplace_back” or “always use push_back”, like Piotr is suggesting.
>
>
> I don't think Piotr is suggesting "always use push_back" but
"always use
> push_back when it's valid" & I'd second this.
>
>
> Define “valid”? Just that it will compile?
>
>
> Yep
>
>
>
> To show a simpler example:
>
> std::unique_ptr<T> u(v);
>
> std::unique_ptr<T> u = v;
>
> I'd generally advocate for using copy init (the second form - it
doesn't
> copy in this case, it moves) over direct init (the first form) because
it's
> more legible - copy init can only execute implicit ctor operations, whereas
> direct init can invoke implicit and explicit operations. So from a
> readability perspective seeing the latter is easier than seing the former -
> I know the operation is further constrained to simpler/less interesting
> operations (eg: 'v' can't be a raw pointer in the second form,
it might be
> (& then I have to worry about whether that's an ownership transfer
that's
> intended), etc)
>
> & push_back V emplace_back is the same choice - push_back means only
> implicit ops are used and so it's more readable
>
>
> I don’t see what’s more readable about “only implicit ctor”.
>
>
> It limits the set of operations that can be performed by the code. So when
> I read it there's less I have to think about/consider. (&
specifically, the
> implicit operations tend to be simpler/safer/more obvious - copy/move or
> operations that are similar - not complex/interesting things like
"taking
> ownership from a raw pointer" or "creating a vector of N
elements")
>
>
> Emplace is very explicit that we intended to construct an object, I don’t
> understand what hurts readability here?
>
>
> Going back to the example above, given the following two lines:
>
> std::unique_ptr<T> u(foo());
> std::unique_ptr<T> u = foo();
>
> (& the equivalent: emplace_back(foo()) V push_back(foo()) for a vector
of
> unique_ptr)
>
> the copy init/push_back are simpler to read because they aren't as
> powerful - I don't have to wonder if something is taking ownership
there
> (or if I'm creating a vector of N ints, etc). I know it's a
simple/obvious
> operation, generally (because others shouldn't be implicit).
>
This is exactly where I come down as well.

Another useful way to think about it is "what do I need to understand to
understand the semantics of this operation".

In order to understand push_back, I need only read its documentation. The
type going in will have to have value semantics (potentially with moves).
Otherwise it will be an error. And push_back's documentation says what it
does.

In order to understand a given emplace_back call I have to *both* read its
documentation and the documentation for all of the constructors on the type.


Still another way to see the consequence of this is to look at the nature
of compiler errors when a programmer makes a mistake.

With emplace_back, if you fail to call the constructor correctly, all of
the error messages come with a layer (or many layers) of template
instantiation. With push_back(T(...)), the constructor call is direct and
the error messages are as well.

With emplace_back, if you are converting from one type to another and the
conversion fails, you again get template backtraces. With push_back, all
the conversions happen before the push_back method and so the error is
local to your code.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161229/c0277b4a/attachment.html>

Reasonably Related Threads

Search for more maybe matching threads

llvm dev - Dec 2016 - [cfe-dev] Modernizing LLVM Coding Style Guide and enforcing Clang-tidy

[llvm-dev] [cfe-dev] Modernizing LLVM Coding Style Guide and enforcing Clang-tidy

[llvm-dev] [cfe-dev] Modernizing LLVM Coding Style Guide and enforcing Clang-tidy

[llvm-dev] [cfe-dev] Modernizing LLVM Coding Style Guide and enforcing Clang-tidy

Reasonably Related Threads