thr3ads.net - llvm dev - [llvm-dev] [RFC] Semi-Automatic clang-format of files with low frequency [Jul 2020]

If this information is useful, please help other people find it:
Share via:
Chris Tetreault via llvm-dev
2020-Jul-01 16:07 UTC
[llvm-dev] [RFC] Semi-Automatic clang-format of files with low frequency

I am 100% in agreement that all clang-format patches should get actual code
review. Clang-format falls over on lots of things, and I believe that judicious
use of clang-format on/off pairs is the way to go to have a readable
clang-format'ed codebase.
> I don't ever recall such a debate over formatting in a review thread
(and while I don't any longer, for years, I read every review on the mailing
list).
   Speaking from my experience, what usually happens is a newcomer touches a
file, and a person who feels that they have code ownership of it tells them to
make some formatting change. Generally, the newcomer has enough sense not to
fight it and just makes the change. Very rarely does a full blown argument break
out in code review, but even 1 round trip is unfortunate. Another bad thing that
can happen is that a code reviewer *wants* to raise an issue in formatting, but
they don't want to be confrontational so they let it go and/or fix it
themselves later.
> Code is read more than it is written, and optimizing to make the code
easier to read is very important.
   I agree. Personally, I find having the code be consistent is a very important
factor in this equation. What looks nice is largely a matter of opinion, and
with such a diverse set of people working on this codebase you get N different
styles. Frankly, there are several things about the LLVM style that I hate
(opening brace not on its own line, `auto *Foo` instead of `auto* Foo` or `auto
* Foo` among others), that it's already hard for me to read. Part of being a
professional is learning to deal with this sort of stuff, so clang-format making
suboptimal formatting choices is just another thing. But at least it would be
consistent, which is a huge boon to readability in my opinion.

   There are advantages to having complete clang-format coverage:

1) consistency!
2) developers can use git hooks to clang-format to their preferred style
3) nobody has to think about how to format

   There are advantages to not using clang-format as well. Mainly, the fact that
you can have fine-grained control over the formatting. However, I don't
believe that there are any real advantages to being where we are with partial
clang-format coverage. It seems to me that the direction the project is going is
to have full coverage. Why else would Phabricator complain when your code
isn't formatted? Frankly, I think 1 year is easily enough time to get
complete coverage without being overly disruptive to downstreams and developers
with long-lived patches, so I think 1 year being the threshold for "this
file is stable, let's format it" is unreasonably conservative.

Thanks,
   Christopher Tetreault

-----Original Message-----
From: Hal Finkel <hfinkel at anl.gov>
Sent: Tuesday, June 30, 2020 3:37 PM
To: Chris Tetreault <ctetreau at quicinc.com>
Subject: [EXT] Re: [llvm-dev] [RFC] Semi-Automatic clang-format of files with
low frequency


On 6/30/20 5:03 PM, Chris Tetreault via llvm-dev wrote:>> My comment there was more targeted against those who would take your
idea and push it to unnecessary and counterproductive extremes.
> There's nothing counterproductive about just formatting everything and
being done with it. The goal of doing clang-format-diff as we commit code is to
minimize churn so that there isn't one big patch that conflicts with
everything. But if you only format the lines that get touched by a commit, the
work will never be done. At some point you need to finish the work, or the tool
might as well be called clang-make-the-code-inconsistent.

I agree. But I think that we should view this as an iterative process.
We should generate patches for review, and realize that we might need to update
clang-format in some cases instead of just always updating the code.
clang-format has evolved over time, and I doubt that evolution has stopped now.

>
>> Since you're interested in thoughts about clang-format's
>> capabilities, I agree with Matt that the strictness of the approach
>> is a weakness that can frequently make code *less* readable. In
>> addition to what he mentions, here's an example of a bad change
that
>> clang-format wants to make that I found after half a minute of
>> scanning our AMDGPU backend
>> code:
>>
>>    // We only support LOAD/STORE and vector manipulation ops for
vectors
>>    // with > 4 elements.
>> -  for (MVT VT : { MVT::v8i32, MVT::v8f32, MVT::v16i32, MVT::v16f32,
>> -                  MVT::v2i64, MVT::v2f64, MVT::v4i16, MVT::v4f16,
>> -                  MVT::v4i64, MVT::v4f64, MVT::v8i64, MVT::v8f64,
>> -                  MVT::v16i64, MVT::v16f64, MVT::v32i32, MVT::v32f32
}) {
>> +  for (MVT VT :
>> +       {MVT::v8i32, MVT::v8f32, MVT::v16i32, MVT::v16f32, MVT::v2i64,
>> +        MVT::v2f64, MVT::v4i16, MVT::v4f16, MVT::v4i64, MVT::v4f64,
MVT::v8i64,
>> +        MVT::v8f64, MVT::v16i64, MVT::v16f64, MVT::v32i32,
>> + MVT::v32f32}) {
>>      for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op) {
>>        switch (Op) {
>>
>> I don't particularly mind that clang-format puts that initializer
>> list onto a new line, or that it changes the whitespace around
>> braces. What I do mind: the original code lays the initializer list
>> out carefully such that integer and floating point types always come
>> in pairs on the same line: v8[if]32 and v16[if]32 on the first line,
>> v2[if]64 and
>> v4[if]64 on the second line, and so on. clang-format mindlessly mushes
the initializer list together in a way that makes it harder to see at a glance
what is going on.
>>
>> (Note that this isn't code that I wrote, so I'm not emotionally
>> attached to it or anything. But I've run into this kind of problem
>> many times during development.)
>>
>> I believe the common theme here is that clang-format ought to have a
mode in which it is more accepting of different layouts of lists co-existing
within the same code base. If that feature was available, I would be a strong
proponent for adopting it in LLVM itself.
>>
>> Cheers,
>> Nicolai
> The problem is that while you may find the way it's formatted to be
easier to read, not everybody does. If I were formatting it manually, I probably
would have had 8 lines of pairs, with one pair per line, or I would have had all
the integer types then all the float types if the order is not important. Now we
have a disagreement, and a needless code review issue in the future when
somebody adds a new MVT that needs to be handled and fails to figure out the
pattern. That's the point of the tool. It can't be reasoned with. It
can't be bargained with. It doesn't feel pity of remorse or fear. And it
absolutely will not stop. Ever. Until your code is formatted correctly.

Okay, wait. Let's not solve problems that we don't have. I don't
ever recall such a debate over formatting in a review thread (and while I
don't any longer, for years, I read every review on the mailing list). I
agree that your formatting suggestion is better than the current one, but if
both the current formatting and your suggested improvement are better than what
clang-format provides, so using the clang-format version seems like a
regression. We should improve things so that using clang-format would be an
improvement, or we should figure out how to exempt this part of the code.

>
> Arguing about code format in code review is a productivity drain. Time
spent figuring out the style of this one random file is a productivity drain.
Time spent inserting a bunch of spaces in a 43 line switch because you added a
new case that is just 1 character too long is a productivity drain. We have a
tool that can do this for us. Maybe it doesn't format everything exactly how
we'd like, but at least you won't have to justify it in code review.
Sure, maybe it will miss things, but at least you don't have to push a new
patch because a line is too long or you put the * next to the type rather than
the variable name. The tool will evolve and improve. Frankly, I'm surprised
that it doesn't already perfectly implement the LLVM coding style, but
surely this is the goal.

No one I know likes spending their time formatting code. We do it when we feel
it is important. Code is read more than it is written, and optimizing to make
the code easier to read is very important. Formatting lists of things, etc.
nicely has helped me catch bugs in the past, and even if the price for that is
having to comment in a review thread about the formatting, or spend some time
formatting, I'll take it. When things are nicely arranged, it's
sometimes more obvious when something just isn't right. Tracking down bugs
in compilers takes far more time than all of that.

Thanks again,

Hal

>
>  From your previous email:
>
>> Hard no on the shorter period. I have changes I'm working on for
several months at a time. 1 year is a minimum for this in my opinion.
> Your patch should already be formatted using clang-format-diff, right?
Phabricator is configured to complain on your commit if this is not the case, so
I don't see how a clang-format commit could possibly cause a conflict for
you.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
--
Hal Finkel
Lead, Compiler Technology and Programming Languages Leadership Computing
Facility Argonne National Laboratory
llvm dev - Jul 2020 - [RFC] Semi-Automatic clang-format of files with low frequency

[llvm-dev] [RFC] Semi-Automatic clang-format of files with low frequency