Hi!

The rewrite of the attributes class is well underway. The next step is to add support for the expanded rôle of attributes in the language and IR. This is the final proposal for the language changes. There isn't a lot of new information except for the syntax changes for the new feature.

Executive Summary:

The new syntax is:

  #0 = attributes { noinline align=4 "cpu"="cortex-a8" }
  #1 = attributes { attr = (val1 val2 val3) }
  #bork = attributes { sspreq noredzone }

  define void @foo() #0 #bork { ret void }

The 'align' and "cpu" attributes both have a value associated with them. The 'attr' attribute in '#1' has multiple values associated with it. (The BNF is below in the 'IR Changes' section.) Attribute groups with the same attributes in them, but in a different order, are identical. So '@bar' and '@qux' have the same attributes in this example:

  #0 = attributes { align=4 }
  #1 = attributes { sspreq noredzone }
  #2 = attributes { noredzone sspreq align=4 noinline }

  define void @bar() noinline #0 #1 { ret void }
  define void @qux() #2 { ret void }

When run through 'llvm-as', the disassembly will look similar to this:

  #0 = attributes { noredzone sspreq align=4 noinline }

  define void @bar() #0 { ret void }
  define void @qux() #0 { ret void }

Thanks!
-bw

Passing Options to Different Parts of the Compiler

Problem
=======

There is a growing need to pass information from the front-end to different parts of the compiler, especially code generation. LTO, for instance, needs to encode within the .o files the options it was compiled with. Otherwise, the code generator could generate code that is unexpected -- e.g., generating SSE instructions when the programmer used the `-mno-sse' flag to compile that module. After considering several different options, we decided it was best to extend the Attributes class to support *all* code generation options, even target-specific ones.

Proposal
========

We will expand the Attributes class to support all of the attributes that the compiler may care about. Anything that affects code transformations and code generation will be specified inside of the Attributes class. This allows for a cleaner interface for the front-ends, since they won't have to fill in a target-specific structure to pass along this information. It also allows for LTO to merge files that were compiled with different options. LTO can then determine whether it's possible to inline one function into another based upon the options with which each was compiled. And finally, it's necessary for correctness. LTO currently ignores the command line options with which a file was compiled.

There are two classes of attributes: those that are target-independent (e.g., 'noinline'), and those that are target-dependent (e.g., 'thumb' and 'cpu=cortex-a8'). The target-dependent options are stored as strings inside of the Attributes class. The target's back-end is responsible for interpreting target-dependent attributes.

Attributes should be documented in the language reference document.

IR Changes
----------

The attributes will be specified within the IR. This allows us to generate code that the user wants. This also has the advantage that it will no longer be necessary to specify all of the command line options when compiling the bit code (via 'llc' or 'clang'). E.g., '-mcpu=cortex-a8' will be an attribute and won't be required on llc's command line. However, explicit flags (like `-mcpu') on the llc command line will override flags specified in the module.

The core of this proposal is the idea of an "attribute group". As the name implies, it's a group of attributes that are then referenced by objects within the IR. An attribute group is a module-level object.
The BNF of the syntax is:

  attribute_group := <attrgroup_id> '=' attributes '{' <attribute_list> '}'
  attrgroup_id    := #<id>
  attribute_list  := <attribute> <attribute>*
  attribute       := <name> ('=' <list_of_values>)?
  list_of_values  := <value> | '(' <value> <value>* ')'
  id              := <number> | <name>

To use an attribute group, an object references the attribute group's ID:

  attribute_group_ref := attrgroup(<attrgroup_id>)

This is an example of an attribute group for a function that should always be inlined, has stack alignment of 4, and doesn't unwind:

  #1 = attributes { alwaysinline nounwind alignstack=4 }

  define void @foo() #1 { ret void }

An object may refer to more than one attribute group. In that situation, the attributes are merged.

Attribute groups are important for keeping `.ll' files readable, because a lot of functions will use the same attributes. In the degenerative case of a `.ll' file that corresponds to a single `.c' file, the single `attrgroup' will capture the command line flags used to build that file.

Target-Dependent Attributes in IR
---------------------------------

The front-end is responsible for knowing which target-dependent options are interesting to the target. Target-dependent attributes are specified as strings, which are understood by the target's back-end. E.g.:

  #0 = attributes { "long-calls", "cpu=cortex-a8", "thumb" }

  define void @func() #0 { ret void }

The ARM back-end is the only target that knows about these options and what to do with them.

Some of the `cl::opt' options in the backend could move into attribute groups. This will clean up the compiler.

Updating IR
-----------

The current attributes that are specified on functions will be moved into an attribute group. The LLVM assembly reader will still honor those, but when the assembly file is emitted, those attributes will be output as an attribute group by the assembly writer. As usual, LLVM 3.3 will be able to read and auto-upgrade previous bitcode and `.ll' files.
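
The merging and order-independence rules described above (and shown by the '@bar' / '@qux' example in the Executive Summary) can be sketched in a few lines of standalone C++. This is only an illustration, not LLVM code: `AttrSet', `mergeAttrs', `groupIdFor', and `barAndQuxShareGroup' are hypothetical names, and the proposal does not say what happens when two groups give the same attribute different values, so the sketch simply assumes the first value seen wins.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: attribute name -> value ("" when there is no value).
// std::map keeps keys sorted, so the order of insertion does not matter.
using AttrSet = std::map<std::string, std::string>;

// Merge the attributes written directly on an object with every group it
// references. Assumption: on a name clash the first value seen is kept.
AttrSet mergeAttrs(const std::vector<AttrSet> &Groups, const AttrSet &Inline) {
  AttrSet Merged = Inline;
  for (const AttrSet &G : Groups)
    Merged.insert(G.begin(), G.end()); // insert() never overwrites a key.
  return Merged;
}

// Hand out group IDs so that equal attribute sets share a single ID.
unsigned groupIdFor(const AttrSet &S, std::map<AttrSet, unsigned> &Table) {
  unsigned NextId = static_cast<unsigned>(Table.size());
  return Table.emplace(S, NextId).first->second;
}

// Rebuilds the @bar / @qux example and reports whether both end up
// referring to the same group.
bool barAndQuxShareGroup() {
  AttrSet G0 = {{"align", "4"}};
  AttrSet G1 = {{"sspreq", ""}, {"noredzone", ""}};
  AttrSet G2 = {{"noredzone", ""}, {"sspreq", ""},
                {"align", "4"},    {"noinline", ""}};
  AttrSet InlineBar = {{"noinline", ""}};
  AttrSet InlineQux; // @qux has no attributes written inline.

  std::map<AttrSet, unsigned> Table;
  unsigned Bar =
      groupIdFor(mergeAttrs(std::vector<AttrSet>{G0, G1}, InlineBar), Table);
  unsigned Qux =
      groupIdFor(mergeAttrs(std::vector<AttrSet>{G2}, InlineQux), Table);
  assert(!Table.empty() && "registering a set always creates an entry");
  return Bar == Qux;
}
```

Keying the table on the whole sorted set is what makes two groups with the same attributes in a different order compare equal; a real implementation would presumably use uniqued, hashed sets instead.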

Querying
--------

The attributes are attached to the function. It's therefore trivial to access the attributes within the middle- and the back-ends. Here's an example of how attributes are queried:

  Attributes A = F.getAttributes();

  // Target-independent attribute query.
  A.hasAttribute(Attributes::NoInline);

  // Target-dependent attribute query.
  A.hasAttribute("no-sse");

  // Retrieving value of a target-independent attribute.
  int Alignment = A.getIntValue(Attributes::Alignment);

  // Retrieving value of a target-dependent attribute.
  StringRef CPU = A.getStringValue("cpu");
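
For readers who want to experiment with the query pattern above outside of LLVM, here is a small self-contained toy model. It is not the real Attributes class: `ToyAttributes', `exampleAttrs', and `effectiveCPU' are hypothetical, everything is keyed by string (the real target-independent attributes use enum values), and the override helper is only one possible reading of the rule in 'IR Changes' that explicit llc flags win over attributes in the module.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the proposed Attributes class; string keys only.
class ToyAttributes {
  std::map<std::string, std::string> Attrs;

public:
  // Adds an attribute; Value stays empty for attributes such as 'noinline'.
  void add(const std::string &Name, const std::string &Value = "") {
    Attrs[Name] = Value;
  }

  bool hasAttribute(const std::string &Name) const {
    return Attrs.count(Name) != 0;
  }

  // Returns the stored string, or "" when the attribute is absent.
  std::string getStringValue(const std::string &Name) const {
    auto It = Attrs.find(Name);
    return It == Attrs.end() ? std::string() : It->second;
  }

  // Parses the stored value as an integer; the attribute must be present.
  int getIntValue(const std::string &Name) const {
    assert(hasAttribute(Name) && "attribute must be present");
    return std::stoi(Attrs.at(Name));
  }
};

// Models: #0 = attributes { noinline align=4 "cpu"="cortex-a8" }
ToyAttributes exampleAttrs() {
  ToyAttributes A;
  A.add("noinline");
  A.add("align", "4");
  A.add("cpu", "cortex-a8");
  return A;
}

// An explicit command-line CPU wins; otherwise use the module's attribute.
std::string effectiveCPU(const ToyAttributes &A,
                         const std::optional<std::string> &CmdLineCPU) {
  return CmdLineCPU ? *CmdLineCPU : A.getStringValue("cpu");
}
```

With the example set, `effectiveCPU' falls back to the "cpu" attribute when no command-line value is given, and returns the command-line value when one is supplied.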

On Jan 29, 2013, at 2:42 PM, Bill Wendling <isanbard at gmail.com> wrote:

> Executive Summary:
>
> The new syntax is:
>
>   #0 = attributes { noinline align=4 "cpu"="cortex-a8" }
>   #1 = attributes { attr = (val1 val2 val3) }
>   #bork = attributes { sspreq noredzone }
>
>   define void @foo() #0 #bork { ret void }

The general syntax LGTM. It seems clean and fits well with what we have.

> The 'align' and "cpu" attributes both have a value associated with them. The 'attr' attribute in '#1' has multiple values associated with it. (The BNF is below in the 'IR Changes' section.) Attribute groups with the same attributes in them, but in a different order, are identical. So '@bar' and '@qux' have the same attributes in this example:

What is the use case for the multi-value attribute? Perhaps obvious, but it makes sense to stage this out to add one thing at a time. Also, your BNF doesn't make it clear what is allowed for <value>: I would assume it is a set of hardcoded keywords (like align, sspreq, etc) plus the string form. If so, how does "val1" fit into that?

> Proposal
> ========
>
> We will expand the Attributes class to support all of the attributes that the compiler may care about. Anything that affects code transformations and code generation will be specified inside of the Attributes class. This allows for a cleaner interface for the front-ends, since they won't have to fill in a target-specific structure to pass along this information.

It will also hopefully allow us to eliminate most or all of the "global" variables crammed into the TargetOptions class.

> There are two classes of attributes: those that are target-independent (e.g., 'noinline'), and those that are target-dependent (e.g., 'thumb' and 'cpu=cortex-a8'). The target-dependent options are stored as strings inside of the Attributes class. The target's back-end is responsible for interpreting target-dependent attributes.
>
> Attributes should be documented in the language reference document.

Target-specific attributes should probably be listed in the target-specific section of docs/CodeGenerator.rst.

> The core of this proposal is the idea of an "attribute group". As the name implies, it's a group of attributes that are then referenced by objects within the IR. An attribute group is a module-level object. The BNF of the syntax is:
>
>   attribute_group := <attrgroup_id> '=' attributes '{' <attribute_list> '}'
>   attrgroup_id    := #<id>
>   attribute_list  := <attribute> <attribute>*
>   attribute       := <name> ('=' <list_of_values>)?
>   list_of_values  := <value> | '(' <value> <value>* ')'
>   id              := <number> | <name>

As mentioned above, it is unclear what "value" is. It makes sense to me to make it one of a hard-coded list of well-known stuff we know about (like align) plus double-quoted strings, used for target-specific stuff.

> Attribute groups are important for keeping `.ll' files readable, because a lot of functions will use the same attributes. In the degenerative case of a `.ll' file that corresponds to a single `.c' file, the single `attrgroup' will capture the command line flags used to build that file.

It's worth noting that the structure of attribute groups is an .ll file syntax thing; they aren't reflected in the IR once parsed.

> Target-Dependent Attributes in IR
> ---------------------------------
>
> The front-end is responsible for knowing which target-dependent options are interesting to the target. Target-dependent attributes are specified as strings, which are understood by the target's back-end. E.g.:
>
>   #0 = attributes { "long-calls", "cpu=cortex-a8", "thumb" }

"cpu=cortex-a8" or "cpu"="cortex-a8"?

-Chris

On Feb 3, 2013, at 10:45 AM, Chris Lattner <clattner at apple.com> wrote:

> On Jan 29, 2013, at 2:42 PM, Bill Wendling <isanbard at gmail.com> wrote:
>> Executive Summary:
>>
>> The new syntax is:
>>
>>   #0 = attributes { noinline align=4 "cpu"="cortex-a8" }
>>   #1 = attributes { attr = (val1 val2 val3) }
>>   #bork = attributes { sspreq noredzone }
>>
>>   define void @foo() #0 #bork { ret void }
>
> The general syntax LGTM. It seems clean and fits well with what we have.

Thanks! :-)

>> The 'align' and "cpu" attributes both have a value associated with them. The 'attr' attribute in '#1' has multiple values associated with it. (The BNF is below in the 'IR Changes' section.) Attribute groups with the same attributes in them, but in a different order, are identical. So '@bar' and '@qux' have the same attributes in this example:
>
> What is the use case for the multi-value attribute? Perhaps obvious, but it makes sense to stage this out to add one thing at a time. Also, your BNF doesn't make it clear what is allowed for <value>: I would assume it is a set of hardcoded keywords (like align, sspreq, etc) plus the string form. If so, how does "val1" fit into that?

I added it because I wanted to limit possible future changes to the IR. But that may be premature. It would essentially act the way the '-mattr' command line option works today. Of course the -mattr functionality is covered by this proposal. I will omit that part and leave it for a future expansion if necessary. It won't be a major change to the IR at that point.

My first thought about what <value> can be is something that can be represented by a Constant object. So a keyword, string, or numerical value. I expect string values to be used mainly for target-dependent attributes. The other two forms would be used for target-independent attributes defined in the LangRef.

>> Proposal
>> ========
>>
>> We will expand the Attributes class to support all of the attributes that the compiler may care about. Anything that affects code transformations and code generation will be specified inside of the Attributes class. This allows for a cleaner interface for the front-ends, since they won't have to fill in a target-specific structure to pass along this information.
>
> It will also hopefully allow us to eliminate most or all of the "global" variables crammed into the TargetOptions class.

God willing, yes! :-)

>> There are two classes of attributes: those that are target-independent (e.g., 'noinline'), and those that are target-dependent (e.g., 'thumb' and 'cpu=cortex-a8'). The target-dependent options are stored as strings inside of the Attributes class. The target's back-end is responsible for interpreting target-dependent attributes.
>>
>> Attributes should be documented in the language reference document.
>
> Target specific attributes should probably be listed in the target-specific section of docs/CodeGenerator.rst.

Okay, that makes sense.

>> The core of this proposal is the idea of an "attribute group". As the name implies, it's a group of attributes that are then referenced by objects within the IR. An attribute group is a module-level object. The BNF of the syntax is:
>>
>>   attribute_group := <attrgroup_id> '=' attributes '{' <attribute_list> '}'
>>   attrgroup_id    := #<id>
>>   attribute_list  := <attribute> <attribute>*
>>   attribute       := <name> ('=' <list_of_values>)?
>>   list_of_values  := <value> | '(' <value> <value>* ')'
>>   id              := <number> | <name>
>
> As mentioned above, it is unclear what "value" is. It makes sense to me to make it one of a hard-coded list of well-known stuff we know about (like align) plus double-quoted strings, used for target-specific stuff.

That sounds reasonable.

>> Attribute groups are important for keeping `.ll' files readable, because a lot of functions will use the same attributes. In the degenerative case of a `.ll' file that corresponds to a single `.c' file, the single `attrgroup' will capture the command line flags used to build that file.
>
> It's worth noting that the structure of attribute groups is an .ll file syntax thing, they aren't reflected in the IR once parsed.

Yes. In particular, we won't be trying to preserve the .ll syntax when it's run through llvm-as and llvm-dis.

>> Target-Dependent Attributes in IR
>> ---------------------------------
>>
>> The front-end is responsible for knowing which target-dependent options are interesting to the target. Target-dependent attributes are specified as strings, which are understood by the target's back-end. E.g.:
>>
>>   #0 = attributes { "long-calls", "cpu=cortex-a8", "thumb" }
>
> "cpu=cortex-a8" or "cpu"="cortex-a8"?

The latter.

-bw

> To use an attribute group, an object references the attribute group's ID:
>
>   attribute_group_ref := attrgroup(<attrgroup_id>)

Is this unused now? I don't see it anywhere else in the proposal.

-- Sean Silva

This was replaced by having a #<num> referenced by the object. Kind of like how metadata is referenced.

-bw

On Feb 3, 2013, at 4:50 PM, Sean Silva <silvas at purdue.edu> wrote:

>> To use an attribute group, an object references the attribute group's ID:
>>
>>   attribute_group_ref := attrgroup(<attrgroup_id>)
>
> Is this unused now? I don't see it anywhere else in the proposal.
>
> -- Sean Silva