thr3ads.net - llvm dev - [LLVMdev] RFC: variable names [Oct 2014]

Richard Smith

2014-Oct-13 23:08 UTC

[LLVMdev] RFC: variable names

On Mon, Oct 13, 2014 at 3:19 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Mon, Oct 13, 2014 at 3:04 PM, Nick Kledzik <kledzik at apple.com> wrote:
>
>> I’d like to discuss revising the LLVM coding conventions to change the
>> naming of variables to start with a lowercase letter.
>>
>
> Almost all of your negatives of the current conventions also apply to your
> proposed convention.
>
> Type names: CamelCase
> Function names: camelCase
> Variable names: ???
>
> If we name variables in camelCase then variable names and function names
> collide.
>
> If we are going to change how we name variables, I very much want them to
> not collide with either type names or function names. My suggestion would
> be "lower_case" names.
>
I think this would be bad:

  function();
  lambda();
  longFunction();
  long_lambda();

... but possibly not in practice, since function names rarely have only one
word.

A partial-camel-case, partly-underscores convention sounds strange to me.
(I don't find this to be problematic for BIG_SCARY_MACROS and for
ABCK_EnumNamespaces because the former are rare and in the latter case the
underscore isn't a word separator, it's a namespace separator.) We have a
few people here who are used to such a style (since it's what the Google
style guide and derivatives use); any useful feedback from that experience?


Some arguments against the change as proposed:

1. Initialisms. It's common in Clang code (also in LLVM?) to use
initialisms as variable names. This doesn't really seem to work for names
that start with a lower case letter.

2. The ambiguity introduced might be worse than the one removed. It's
usually easy to see if a name is a type or variable from the context of the
use. It's not so easy to see if a name is a function or a variable,
especially as more variables become callable due to the prevalence of
lambdas.
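Point 2 turns on the fact that a lambda held in a variable is called with exactly the same syntax as a function. A minimal C++ sketch, with hypothetical names `countWords` (a free function, camelCase per the current function rule) and `count_vowels` (a lambda variable spelled in the lower_case style suggested above), puts the two call sites side by side. Under a camelCase-variable rule the lambda would be spelled `countVowels`, and the two calls would be indistinguishable by spelling alone.

```cpp
#include <cassert>
#include <string>

// A free function, spelled camelCase per the current LLVM function rule.
// (Hypothetical helper for illustration.)
int countWords(const std::string &text) {
  int count = 0;
  bool in_word = false;
  for (char c : text) {
    if (c == ' ') {
      in_word = false;
    } else if (!in_word) {
      in_word = true;
      ++count;
    }
  }
  return count;
}

int combinedCount(const std::string &text) {
  // A callable *variable*: a lambda held in a local, spelled in the proposed
  // lower_case style. With camelCase variables it would be countVowels.
  auto count_vowels = [](const std::string &s) {
    int n = 0;
    for (char c : s)
      if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
        ++n;
    return n;
  };
  // Both calls have the same shape, name(args); only the spelling of the
  // name tells the reader which callee is a variable.
  return countWords(text) + count_vowels(text);
}
```

For example, `combinedCount("hello world")` is 2 words plus 3 vowels, i.e. 5.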

> This also happens to be the vastly most common pattern across all C++
> coding styles and C-based language coding styles I have seen.
>
>
>>  This should not be a discussion on the pain of such a transition, or how
>> to get from here to there, but rather, if there is a better place to be.
>>
>> My arguments for the change are:
>>
>> 1. No other popular C++ coding style uses capitalized variable names.
>> For instance here are other popular C++ conventions that use camelCase:
>>
>>    http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
>>
This does not use camelCase for variable names.

>>    http://www.c-xx.com/ccc/ccc.php
>>    http://geosoft.no/development/cppstyle.html
>>
>> And, of course, the all-lower-case conventions (e.g. C++ ARM) don’t
>> capitalize variable names.  In addition, all the common C derived languages
>> don’t use capitalized variable names (e.g. Java, C#, Objective-C).
>>
Some or all of those other conventions don't capitalize *any* names (other
than perhaps macros), so we're not going to become consistent with them by
making this change.

>> 2. Ambiguity.  Capitalizing type names is common across most C++
>> conventions.  But in LLVM, variables are also capitalized, which conflates
>> types and variables.  Starting variable names with a lowercase letter
>> disambiguates variables from types. For instance, the following are
>> ambiguous using LLVM’s conventions:
>>
>> Xxx Yyy(Zzz);  // function prototype or local object construction?
>> Aaa(Bbb);      // function call or cast?
>>
>>
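The first example above is C++'s "most vexing parse": whether `Xxx Yyy(Zzz);` declares a function or constructs an object depends only on whether `Zzz` names a type or a variable, and the current LLVM convention spells both the same way. The sketch below reuses the placeholder names from the example (`Www`, `Vvv` and `parseDemo` are additional hypothetical names) and uses `static_assert` to confirm how the compiler parsed each line.

```cpp
#include <cassert>
#include <type_traits>

// Placeholder types mirroring the `Xxx Yyy(Zzz);` example.
struct Zzz {};  // Zzz names a type here.
struct Xxx {
  explicit Xxx(int V) : Value(V) {}
  int Value;
};

int parseDemo() {
  // Zzz is a type, so this declares a local *function* Yyy taking a Zzz and
  // returning an Xxx. No object is created.
  Xxx Yyy(Zzz);
  static_assert(std::is_function<decltype(Yyy)>::value,
                "Yyy was parsed as a function declaration");

  // Www is a variable, so the same shape of line constructs an object.
  int Www = 7;
  Xxx Vvv(Www);
  static_assert(std::is_same<decltype(Vvv), Xxx>::value,
                "Vvv was parsed as an object of type Xxx");
  return Vvv.Value;
}
```

With lower-case variables, the second line would read `Xxx vvv(www);`, and the reader could tell the two cases apart without looking up `Zzz`.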
>> 3. Allows name re-use. Since types and variables are both nouns, using
>> different capitalization allows you to use the same simple name for types
>> and variables, for instance:
>>
>> Stream  stream;
>>
>>
>> 4. Dubious history.  Years ago the LLVM coding convention did not specify
>> if variables were capitalized or not.  Different contributors used
>> different styles.  Then in an effort to make the style more uniform,
>> someone flipped a coin and updated the convention doc to say variables
>> should be capitalized.  I never saw any on-list discussion about this.
>>
FWIW, I thought the argument for the current convention was: capitalize
proper nouns (classes and variables), do not capitalize verbs (functions),
as in English. Though maybe that's just folklore.

>> 5. Momentum only.  When I’ve talked with various contributors privately, I
>> have found no one who says they like capitalized variables.  It seems like
>> everyone thinks the conventions are carved in stone...
>>
Momentum is an argument against the change, not in favour of it: this
change has a re-learning cost for everyone who hacks on LLVM projects.
(Your point that no-one seems to like capitalized variables is valid, but
generally people are opposed to change too.)

I would add:

6. Lower barrier to entry. Our current convention is different from almost
all other C++ code, and new developers *very* frequently get it wrong.

>> My proposal is that we modify the LLVM Coding Conventions to have variable
>> names start with a lowercase letter.
>>
>> Index: CodingStandards.rst
>>
>> ==================================================================
>> --- CodingStandards.rst (revision 219065)
>> +++ CodingStandards.rst (working copy)
>> @@ -1073,8 +1073,8 @@
>>    nouns and start with an upper-case letter (e.g. ``TextFileReader``).
>>
>>  * **Variable names** should be nouns (as they represent state).  The name should
>> -  be camel case, and start with an upper case letter (e.g. ``Leader`` or
>> -  ``Boats``).
>> +  be camel case, and start with a lower case letter (e.g. ``leader`` or
>> +  ``boats``).
>>
>>  * **Function names** should be verb phrases (as they represent actions), and
>>    command-like function should be imperative.  The name should be camel case,
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141013/906b1292/attachment.html>

Chandler Carruth

2014-Oct-13 23:14 UTC

[LLVMdev] RFC: variable names

On Mon, Oct 13, 2014 at 4:08 PM, Richard Smith <richard at metafoo.co.uk> wrote:
> I think this would be bad:
>
>   function();
>   lambda();
>   longFunction();
>   long_lambda();
>
> ... but possibly not in practice, since function names rarely have only
> one word.
>
> A partial-camel-case, partly-underscores convention sounds strange to me.
> (I don't find this to be problematic for BIG_SCARY_MACROS and for
> ABCK_EnumNamespaces because the former are rare and in the latter case the
> underscore isn't a word separator, it's a namespace separator.) We have a
> few people here who are used to such a style (since it's what the Google
> style guide and derivatives use); any useful feedback from that experience?
>
This has never come up as a practical problem in my time at Google. Or at
least, if it has, it was so rare and long ago that I can't remember it. I
don't expect it to be a problem in practice. Mostly that is because all of
the problematic cases have two words in them, with one of the words often
being "is" or a related obvious verb like "get", "create", etc.

>
> Some arguments against the change as proposed:
>
> 1. Initialisms. It's common in Clang code (also in LLVM?) to use
> initialisms as variable names. This doesn't really seem to work for names
> that start with a lower case letter.
>
I think we at least need a good answer to this.

>
> 2. The ambiguity introduced might be worse than the one removed. It's
> usually easy to see if a name is a type or variable from the context of the
> use. It's not so easy to see if a name is a function or a variable,
> especially as more variables become callable due to the prevalence of
> lambdas.

Richard Smith

2014-Oct-13 23:30 UTC

[LLVMdev] RFC: variable names

On Mon, Oct 13, 2014 at 4:14 PM, Chandler Carruth <chandlerc at google.com> wrote:
>
> On Mon, Oct 13, 2014 at 4:08 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
>> I think this would be bad:
>>
>>   function();
>>   lambda();
>>   longFunction();
>>   long_lambda();
>>
>> ... but possibly not in practice, since function names rarely have only
>> one word.
>>
>> A partial-camel-case, partly-underscores convention sounds strange to me.
>> (I don't find this to be problematic for BIG_SCARY_MACROS and for
>> ABCK_EnumNamespaces because the former are rare and in the latter case the
>> underscore isn't a word separator, it's a namespace separator.) We have a
>> few people here who are used to such a style (since it's what the Google
>> style guide and derivatives use); any useful feedback from that experience?
>>
>
> This has never come up as a practical problem in my time at Google. Or at
> least, if it has, it was so rare and long ago that I can't remember it. I
> don't expect it to be a problem in practice. Mostly that is because all of
> the problematic cases have two words in them, with one of the words often
> being "is" or a related obvious verb like "get", "create", etc.
>
Thanks, that's really helpful to know.

>> Some arguments against the change as proposed:
>>
>> 1. Initialisms. It's common in Clang code (also in LLVM?) to use
>> initialisms as variable names. This doesn't really seem to work for names
>> that start with a lower case letter.
>>
>
> I think we at least need a good answer to this.
>
OK; I think if we have a good answer to this, then either variableName or
variable_name works for me (though I still weakly prefer the former).

Chandler Carruth

2014-Oct-13 23:31 UTC

[LLVMdev] RFC: variable names

On Mon, Oct 13, 2014 at 4:14 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> 1. Initialisms. It's common in Clang code (also in LLVM?) to use
>> initialisms as variable names. This doesn't really seem to work for names
>> that start with a lower case letter.
>>
>
> I think we at least need a good answer to this.
>
As I really suspect this is the most important point to address, let me
make an attempt:

Variable names are *either* initialisms, written as all caps, or terms
written in lower case and separated by underscores. For the purposes of
variable naming, "terms" can include words but also extremely common and
recognizable abbreviations within LLVM such as "rhs", "lhs", or "gep".
These types of terms should not be written as initialisms but as words. For
example, you might write "LE" or "lhs_expr" for the Left-hand Expression,
but not "LHSE" or "LHS_expr".

While I'm trying to avoid it, this has the advantage of leaving a large
number of initialisms in the existing code base as "stylish".

I'm not really happy with this rule, but it is the least disruptive and
most consistent I can come up with. I would also be happy encouraging
people to not use initialisms excessively or if confusing. I think the
current codebase uses them more than is helpful.
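The proposed rule is mechanical enough to sketch as a checker. The C++ below is only an illustration: it assumes a hypothetical hard-coded list of "terms" (`lhs`, `rhs`, `gep`), the function names are made up, and deciding what counts as a term would remain a human judgment call in practice.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical list of "terms": common abbreviations the rule says must be
// written as lower-case words, never folded into an initialism.
static const std::vector<std::string> KnownTerms = {"lhs", "rhs", "gep"};

// True if Name is an initialism: one or more upper-case ASCII letters only.
static bool isInitialism(const std::string &Name) {
  if (Name.empty())
    return false;
  for (char C : Name)
    if (!std::isupper(static_cast<unsigned char>(C)))
      return false;
  return true;
}

// True if Name is lower-case terms separated by single underscores, e.g.
// "lhs_expr" (no leading/trailing/doubled underscores, no leading digit).
static bool isLowerCaseTerms(const std::string &Name) {
  if (Name.empty() || Name.front() == '_' || Name.back() == '_')
    return false;
  if (std::isdigit(static_cast<unsigned char>(Name.front())))
    return false;
  char Prev = '\0';
  for (char C : Name) {
    unsigned char U = static_cast<unsigned char>(C);
    if (C == '_') {
      if (Prev == '_')
        return false;
    } else if (!std::islower(U) && !std::isdigit(U)) {
      return false;
    }
    Prev = C;
  }
  return true;
}

// Sketch of the proposed rule: a variable name is either lower_case terms or
// an initialism, and an initialism may not swallow a known term.
bool followsProposedRule(const std::string &Name) {
  if (isLowerCaseTerms(Name))
    return true;
  if (!isInitialism(Name))
    return false;
  for (const std::string &Term : KnownTerms) {
    std::string Upper;
    for (char C : Term)
      Upper += static_cast<char>(std::toupper(static_cast<unsigned char>(C)));
    if (Name.find(Upper) != std::string::npos)
      return false;
  }
  return true;
}
```

Under this sketch, "LE" and "lhs_expr" pass, while "LHSE" (contains the term LHS) and "LHS_expr" (mixes the two forms) are rejected, matching the examples in the message.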

Joshua Cranmer 🐧

2014-Oct-14 00:31 UTC

[LLVMdev] RFC: variable names

On 10/13/2014 6:08 PM, Richard Smith wrote:
> 1. Initialisms. It's common in Clang code (also in LLVM?) to use
> initialisms as variable names. This doesn't really seem to work for 
> names that start with a lower case letter.
In my local use of LLVM code, I tend to follow a lowercase variable
naming convention. However, I have taken to using Module *M,
Function *F, Instruction *I, etc. For longer abbreviations... well, I
use gep, lhs, rhs, but BB, GV, SI, LI, CI. I suppose my convention ends
up being that I use upper-case letters if it's referring to the current
thing being processed (and there is no ambiguity as to what is meant).
Either that, or I do it only for 1- or 2-character initialisms. :-)

This does have the added benefit of making a distinction between i (the 
integer loop index count) and I (the current instruction being 
processed) exceptionally clear.
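The `i` versus `I` distinction reads naturally in a typical loop. This sketch uses a hypothetical stand-in `Instruction` struct (only an opcode field) rather than the real `llvm::Instruction`, so it compiles without LLVM headers; `countOpcode` is likewise a made-up helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llvm::Instruction: just an opcode number.
struct Instruction {
  unsigned Opcode;
};

// Counts instructions with the given opcode. Lower-case `i` is the integer
// loop index; capital `I` is the instruction currently being processed.
unsigned countOpcode(const std::vector<Instruction> &Insts, unsigned Opcode) {
  unsigned Count = 0;
  for (std::size_t i = 0, e = Insts.size(); i != e; ++i) {
    const Instruction &I = Insts[i];
    if (I.Opcode == Opcode)
      ++Count;
  }
  return Count;
}
```

Because C++ is case-sensitive, `i` and `I` coexist in the same scope, and the case alone signals which is the counter and which is the object.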

-- 
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

Florian Merz

2014-Oct-14 06:59 UTC

[LLVMdev] RFC: variable names

On 14.10.2014 at 02:31, Joshua Cranmer 🐧 wrote:
> On 10/13/2014 6:08 PM, Richard Smith wrote:
>> 1. Initialisms. It's common in Clang code (also in LLVM?) to use
>> initialisms as variable names. This doesn't really seem to work for
>> names that start with a lower case letter.
> In my local use of LLVM code, I tend to follow a lowercase variable
> naming convention. However, I have taken to using Module *M, Function
> *F, Instruction *I, etc. At longer abbreviations... well, I use gep,
> lhs, rhs, but BB, GV, SI, LI, CI. I suppose my convention ends up being
> that I use upper-case letters if it's referring to the current thing
> being processed (and there is no ambiguity as to what is meant). Either
> that, or I do it only for 1- or 2-character initialisms. :-)
>
> This does have the added benefit of making a distinction between i (the
> integer loop index count) and I (the current instruction being
> processed) exceptionally clear.

I always had the impression there is an implicit naming convention along
the following lines:

In general, variable names are in lower case, even when abbreviated. 
But, if a variable's name refers to its type and the type is in the llvm 
or clang namespaces, then use the initials of the type in capitals. So 
it would be an llvm::BasicBlock called BB, an llvm::Instruction called 
I, and an llvm::GlobalValue called GV, but it would be rhs, because 
there is no class llvm::RightHandSide.

This way, there is an implicit list of terms to abbreviate in capitals 
in the code base, namely all types in the llvm namespace.

I can also imagine a short list of reserved variable names in the coding
style, e.g. I may only be used for Instructions, BB only for basic
blocks, ...
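The implicit rule described above (capitals only when the name is the initials of an llvm or clang type) can be approximated mechanically. The helpers `initialsOf` and `namingLooksConsistent` below are hypothetical and work on type names passed as plain strings; they do not consult the real llvm namespace, and a reserved-name list like the one suggested would be a separate lookup.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Collects the upper-case letters of a CamelCase type name, e.g.
// "BasicBlock" -> "BB", "GlobalValue" -> "GV", "llvm::Instruction" -> "I".
// A lower-case namespace prefix contributes nothing.
std::string initialsOf(const std::string &TypeName) {
  std::string Result;
  for (char C : TypeName)
    if (std::isupper(static_cast<unsigned char>(C)))
      Result += C;
  return Result;
}

// Sketch of the implicit rule: a variable name containing capitals must be
// exactly the initials of its type; fully lower-case names such as "rhs"
// are always accepted.
bool namingLooksConsistent(const std::string &VarName,
                           const std::string &TypeName) {
  bool HasUpper = false;
  for (char C : VarName)
    if (std::isupper(static_cast<unsigned char>(C)))
      HasUpper = true;
  if (!HasUpper)
    return true;
  return VarName == initialsOf(TypeName);
}
```

Under this sketch an `llvm::BasicBlock` called `BB` or a `StoreInst` called `SI` passes, while a `GV` that is actually an `Instruction` is flagged.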
