Michael Spencer
2012-Aug-01 21:23 UTC
[LLVMdev] [RFC] New command line parsing/generating framework for clang and lld.
LLVM Command Line Library

I'm proposing a heavyweight command line parsing and generating library for
LLVM to replace Clang's parser and provide one for lld and any future tools
that may need it.

The scope of this library is slightly larger than what Clang has now, but not
much.

It is centered around the concept of a Tool. A Tool has a set of Options which
can be parsed to Arguments or rendered from Arguments. It also has a set of
Transformations that convert Arguments from another Tool to Arguments for
itself.

An Argument is an Option with bound values.

Scope:

* Parse argv/argc into an ArgumentList according to a TableGen file describing
  the Options for a given Tool.

* Provide typo correction for Options.

* Provide a way to print help text.

* Render an ArgumentList to a string suitable for invoking another Tool.

* Transform an ArgumentList from one Tool to another.

The major addition this has over Clang is Transformations. A Transformation is a
mapping from one pattern in an ArgumentList to another. These replace the
hand-written code in a driver that reads arguments and generates a command line
to call another tool.

An example of this from Clang would be going from Clang to Clang -cc1 options.
Quite a few of these are trivial forwards, while others are more complicated
and may depend on the values of other arguments.

Transformations not only make this simpler, they also allow other drivers to
more easily target Clang -cc1. A cl.exe style driver would get its own Tool,
Options, and Transformation set.

This also makes it easier to call out to a single type of tool, such as the
linker, that has several implementations (gnu-ld, ld64, link.exe). You simply
select which Tool to use for transformation, and render the resulting
ArgumentList to a string to pass to the program.

The TableGen Option definitions provide enough information to both parse and
render command lines. This allows us to have a single definition of as and ld
options and to reuse them both in Clang to call the tool, and in the LLVM
implementation of the tool itself to parse the command line.

Here's a mockup of a TableGen file for part of Clang:

Option.td:

class Tool {
  // The list of all possible prefixes. Not every option in the tool has all
  // prefixes. Any string that does not begin with one of these prefixes and is
  // not an argument to a previous option is considered an input Argument. A
  // string that does begin with a prefix but is not a known option is eligible
  // for typo-correction.
}

def joined;
def separate;
def or;
def str;

class Option<list<string> prefixes, string name, Tool tool, dag strparse,
             string render, dag rendermatch> {
  // The tool this Option belongs to.
  Tool Tool_ = tool;

  // How to parse the Option from argc+argv.
  dag StringParse = strparse;

  // How to render the Option to a string. RenderMatch is used to capture
  // values and assign them identifiers. When Render is printed, these values
  // are inserted into it in the marked locations.
  string Render = render;
  dag RenderMatch = rendermatch;

  // The meta-variable name of each value.
  list<string> ValueMetavars;

  // The list of valid prefixes for this Option. The parser will check if
  // Prefixes[i] + Name is a prefix of a potential Option for each prefix in
  // Prefixes.
  list<string> Prefixes = prefixes;

  // The name of this Option without any prefixes or postfixes. This is what
  // typo correction is checked against.
  string Name = name;

  // Is Name case insensitive.
  bit IsCaseInsensitive = 0;

  // Should this Option be hidden from the default help.
  bit IsHidden = 0;

  // Used as a tiebreaker when multiple Options share the same prefix. Higher
  // values are picked first.
  int Priority = 0;

  // The single Option that this Option is an alias of.
  Option Alias = ?;

  // The help text for this Option.
  string HelpText = ?;
}

class Alias<Option alias> {
  Option Alias = alias;
}

class MetaVars<list<string> mv> {
  list<string> ValueMetavars = mv;
}

class CaseInsensitive {
  bit IsCaseInsensitive = 1;
}

Clang.td:

include "Option.td"

def clang : Tool;

class ClangOption< list<string> prefixes
                 , string name
                 , dag strparse
                 , string render
                 , dag rendermatch>
  : Option<prefixes, name, clang, strparse, render, rendermatch>;

class ClangFlag<string name>
  : ClangOption<["-"], name, ?, "-"#name, ?>;

class ClangSingleLetterOption<string name>
  : ClangOption< ["-"], name
               , (or (joined (str ""), (str:$v0)), (separate (str:$v0)))
               , "-"#name#"$v0", (str:$v0)> {
  int Priority = 1;
}

def clang_f_strict_enums : ClangFlag<"fstrict-enums">;
def clang_f_no_strict_enums : ClangFlag<"fno-strict-enums">;
def clang_f_fast_math : ClangFlag<"ffast-math">;
def clang_o : ClangSingleLetterOption<"o">, MetaVars<["<file>"]>;

// And now for a semi-strange one. -ftemplate-depth.
def clang_f_template_depth
  : ClangOption< ["-"], "ftemplate-depth"
               , (or (joined (str "="), (str:$v0)),
                     (joined (str "-"), (str:$v0)))
               , "-ftemplate-depth=$v0", (str:$v0)>;
// Note that we don't need to also have a clang_f_template_depth_EQ.

// One with a limited set of values.
class ClangSeparateValues<string name, list<string> values>
  : ClangOption< ["-"], name
               , (joined (str "="), (str:$v0 values))
               , "-"#name#"=$v0", (str:$v0)>;

// This won't match unless the value is one of the ones in the list. We can
// generate a very good error message with the information we have that
// includes the list of valid values.
def clang_f_fp_contract : ClangSeparateValues<"ffp-contract",
                                              ["fast", "on", "off"]>;

ClangCC1.td:

include "Option.td"

def clang_cc1 : Tool;

class ClangCC1Option< list<string> prefixes
                    , string name
                    , dag strparse
                    , string render
                    , dag rendermatch>
  : Option<prefixes, name, clang_cc1, strparse, render, rendermatch>;

class ClangCC1Flag<string name>
  : ClangCC1Option<["-"], name, ?, "-"#name, ?>;
class ClangCC1Separate<string name>
  : ClangCC1Option<["-"], name, (separate (str:$v0)), "-"#name#" $v0",
                   (str:$v0)>;
class ClangCC1SeparateValues<string name, list<string> values>
  : ClangCC1Option< ["-"], name
                  , (joined (str "="), (str:$v0 values))
                  , "-"#name#"=$v0", (str:$v0)>;

def clang_cc1_f_strict_enums : ClangCC1Flag<"fstrict-enums">;
def clang_cc1_f_template_depth : ClangCC1Separate<"ftemplate-depth">;
def clang_cc1_f_fp_contract : ClangCC1SeparateValues<"ffp-contract",
                                                     ["fast", "on", "off"]>;

You may wonder why the parsing info is a dag instead of just being essentially
an enum value as it is in Clang's current implementation. The main reason for
this is that there exist tools with option formats that do not fit nicely into
that model, and that in fact have many different ways of representing
arguments.

These are actually very simple to convert to C++ code from TableGen. It is also
trivial to merge identical parsers before generating them, which means there's
no code size explosion. Here's an example of what this would generate.

ArgParseResult parseJoinedOrSeperate(const ArgParseState APS) {
  return parseOr(parseJoined("", parseStr(0)),
                 parseSeperate(parseStr(0)))(APS);
}

Each parse* function is a template function which creates a function object that
implements that parser with the given arguments. The integer argument for
parseStr tells it which Argument value slot to put the value in. This is based
on v0 from above.
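To make the "function object" idea concrete, here is a minimal C++ sketch of how such combinators could compose. It is not taken from the attached patch: the layouts of ArgParseState and ArgParseResult are assumptions made up for illustration, and only the function names follow the snippet above (spelled parseSeparate here). The state is assumed to hold whatever text is left in the current argv element after the prefix and option name, plus the argv elements that follow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical state: what is left to parse for one candidate Option.
struct ArgParseState {
  std::string Rest;                   // text after the prefix and option name
  std::vector<std::string> Following; // argv elements after the current one
};

// Hypothetical result: whether the parser matched and what it bound.
struct ArgParseResult {
  bool Matched = false;
  std::size_t Consumed = 0;        // number of argv elements used
  std::vector<std::string> Values; // bound values, indexed by slot
};

// Value parser: stores a non-empty string into the given value slot.
inline auto parseStr(std::size_t Slot) {
  return [Slot](const std::string &Text, ArgParseResult &R) {
    if (Text.empty())
      return false;
    if (R.Values.size() <= Slot)
      R.Values.resize(Slot + 1);
    assert(Slot < R.Values.size() && "slot must exist after resize");
    R.Values[Slot] = Text;
    return true;
  };
}

// Matches "<name><Sep><value>" inside a single argv element.
template <typename ValueParser>
auto parseJoined(std::string Sep, ValueParser Value) {
  return [Sep, Value](const ArgParseState &S) {
    ArgParseResult R;
    if (S.Rest.compare(0, Sep.size(), Sep) != 0)
      return R;
    if (!Value(S.Rest.substr(Sep.size()), R))
      return R;
    R.Matched = true;
    R.Consumed = 1;
    return R;
  };
}

// Matches "<name>" followed by the value in the next argv element.
template <typename ValueParser>
auto parseSeparate(ValueParser Value) {
  return [Value](const ArgParseState &S) {
    ArgParseResult R;
    if (!S.Rest.empty() || S.Following.empty())
      return R;
    if (!Value(S.Following.front(), R))
      return R;
    R.Matched = true;
    R.Consumed = 2;
    return R;
  };
}

// Tries the first parser and falls back to the second.
template <typename A, typename B>
auto parseOr(A First, B Second) {
  return [First, Second](const ArgParseState &S) {
    ArgParseResult R = First(S);
    return R.Matched ? R : Second(S);
  };
}

// The composed parser for options such as -o<file> / -o <file>.
inline ArgParseResult parseJoinedOrSeparate(const ArgParseState &APS) {
  return parseOr(parseJoined("", parseStr(0)),
                 parseSeparate(parseStr(0)))(APS);
}
```

Under these assumptions, "-oout.o" reaches the parser with Rest set to "out.o" and is handled by the joined branch, while "-o out.o" reaches it with an empty Rest and falls through to the separate branch. Because each combinator is an ordinary value, two Options whose dags are identical could share one generated function.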
This is an idea of what transforms would look like:

def not;

class Transform<list<dag> match, list<dag> produce> {
  list<dag> M = match;
  list<dag> P = produce;
}

include "Clang.td"
include "ClangCC1.td"

def : Transform< [(clang_f_strict_enums), (not clang_f_no_strict_enums)]
               , [(clang_cc1_f_strict_enums)]>;

def : Transform< [(clang_f_template_depth (str:$v0))]
               , [(clang_cc1_f_template_depth (str:$v0))]>;
// Since this case is common, there would probably be a:
def : Forward<clang_f_template_depth, clang_cc1_f_template_depth>;
// This would simply copy the Argument values.

def : Forward<clang_f_fp_contract, clang_cc1_f_fp_contract>;

def : Transform< [(clang_f_fast_math), (not clang_f_fp_contract)]
               , [(clang_cc1_f_fp_contract (str "fast"))]>;

For each Transform, each dag in M is matched against the ArgumentList in order.
Once a dag matches an Argument, the process continues with the next Argument in
the list. Values are extracted using :$<name>. If all dags in M are satisfied,
each dag in P has its :$<name> values substituted, is converted to an Argument,
and is then added to the output ArgumentList.

Not all transforms can be represented in this manner, but you can still
hand-write the code for these cases.

Attached is a patch that adds tools/llvm-cltest. This currently contains code
that should be in a library and will not exist in the final version. This is a
proof of concept for what TableGen would actually generate. It does not contain
the actual TableGen implementation.

- Michael Spencer

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OptionParsing.patch
Type: application/octet-stream
Size: 43723 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120801/e9d31f0c/attachment.obj>
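The Transform matching described in the message above can be sketched in C++ roughly as follows. This is a simplified illustration, not code from the attached patch. Every type and function name here (Argument, Pattern, Production, applyTransform, findArgument) is hypothetical, and Options are identified by plain strings. It also simplifies the matching rule: a positive pattern only checks that some Argument for that Option is present, a `not` pattern checks that none is, and values are captured from the first positive pattern only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical Argument: an Option (identified by its def name) plus values.
struct Argument {
  std::string Option;
  std::vector<std::string> Values;
};
using ArgumentList = std::vector<Argument>;

// One dag in M: an Option that must be present, or absent when Negated.
struct Pattern {
  std::string Option;
  bool Negated;
};

// One dag in P: emits Option with literal values, or with the values
// captured from the first positive pattern when ForwardCaptured is set.
struct Production {
  std::string Option;
  std::vector<std::string> Literal;
  bool ForwardCaptured;
};

struct Transform {
  std::vector<Pattern> M;
  std::vector<Production> P;
};

// Returns the first Argument for Option, or nullptr if there is none.
inline const Argument *findArgument(const ArgumentList &Args,
                                    const std::string &Option) {
  for (const Argument &A : Args)
    if (A.Option == Option)
      return &A;
  return nullptr;
}

// Appends the productions of T to Out if every pattern in T.M is satisfied.
inline void applyTransform(const Transform &T, const ArgumentList &In,
                           ArgumentList &Out) {
  std::vector<std::string> Captured;
  bool HaveCapture = false;
  for (const Pattern &Pat : T.M) {
    const Argument *A = findArgument(In, Pat.Option);
    if (Pat.Negated) {
      if (A)
        return; // a forbidden Argument is present
      continue;
    }
    if (!A)
      return; // a required Argument is missing
    if (!HaveCapture) {
      Captured = A->Values;
      HaveCapture = true;
    }
  }
  for (const Production &Prod : T.P)
    Out.push_back(
        Argument{Prod.Option, Prod.ForwardCaptured ? Captured : Prod.Literal});
}
```

With this sketch, the -ffast-math rule becomes a Transform whose M holds a positive pattern for clang_f_fast_math and a negated one for clang_f_fp_contract, and whose P holds a single Production with the literal value "fast". A Forward is then just a Transform with one positive pattern and one Production that has ForwardCaptured set.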
Michael Spencer
2012-Aug-09 20:26 UTC
[LLVMdev] [RFC] New command line parsing/generating framework for clang and lld.
On Wed, Aug 1, 2012 at 2:23 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> LLVM Command Line Library
> [...]

ping.

- Michael Spencer
Chris Lattner
2012-Aug-11 01:28 UTC
[LLVMdev] [RFC] New command line parsing/generating framework for clang and lld.
On Aug 9, 2012, at 1:26 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
>> Attached is a patch that adds tools/llvm-cltest. This currently contains code
>> that should be in a library and will not exist in the final version. This is a
>> proof of concept for what TableGen would actually generate. It does not contain
>> the actual TableGen implementation.
>>
>> - Michael Spencer
>
> ping.

Are you proposing that clang switch over to this, or that this be used solely by lld?

-Chris