thr3ads.net - llvm dev - [llvm-dev] RFC: general machine-parsable backend for TableGen (e.g. JSON) [Apr 2018]

Simon Tatham via llvm-dev

2018-Apr-23 12:08 UTC

[llvm-dev] RFC: general machine-parsable backend for TableGen (e.g. JSON)

Hello llvm-dev,

Would there be any interest in adding a back end to TableGen to produce output
in a general-purpose but machine-parsable format?

At the moment, TableGen has two kinds of output option. The -gen-foo options are
each tailored to a particular use case; the -print-records option is fully
general, but it's difficult to machine-parse, since its output is in the
same syntax as the TableGen input language, so any tool that wants to analyse it
and pick out some particular class of fact has to start by doing half of
TableGen's work all over again.

I've often thought it would be useful to have an output mode which produces
all the same information as -print-records, but in a format that's easily
parsed by existing standard library facilities in typical scripting languages
such as Python. (My opening bid would be JSON.) This would make it convenient to
take a large TableGen input such as an entire target description, and run
automated processing over it.
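
As a rough illustration of that kind of automated processing, the Python sketch below loads a dump with nothing but the standard `json` module and groups the records by class. The JSON layout (record name mapped to an object of fields, plus a `!superclasses` list), the record names and the `Size` values are all hypothetical assumptions made for this sketch. They are not a description of the proof-of-concept back end's output.

```python
import json
from collections import defaultdict

# Hypothetical sample dump. The layout (record name -> object of fields, plus
# a "!superclasses" list) is an assumption for illustration only.
SAMPLE = """
{
  "ADDrr": {"!superclasses": ["Instruction"], "Size": 4},
  "SUBrr": {"!superclasses": ["Instruction"], "Size": 4},
  "GPR":   {"!superclasses": ["RegisterClass"]}
}
"""


def index_by_class(records):
    """Map each class name to the sorted names of the records derived from it."""
    by_class = defaultdict(list)
    for name, rec in records.items():
        for cls in rec.get("!superclasses", []):
            by_class[cls].append(name)
    return {cls: sorted(names) for cls, names in by_class.items()}


# With a real dump on disk this would be:
#     with open(path) as f:
#         records = json.load(f)
records = json.loads(SAMPLE)
by_class = index_by_class(records)
for name in by_class["Instruction"]:
    print(name, records[name]["Size"])
```

Building the class index once turns a query such as "every instance of Instruction" into a dictionary lookup instead of a re-parse of TableGen syntax.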

Here are a few examples of things I've wanted to do in the past, and would
rather have done this way than by resorting to fragile regex-based matching
on the -print-records output:

* Iterate over all instances of the Instruction class, and output the fixed bits
of each one's bits<> vector. (Useful to collect a set of starting
points for disassembly testing.)

* List all subtarget features on which at least one instance of the Instruction
class is conditional. (Useful to collect a set of modes to run testing in.)

* Extract the number, names and types of the output and input operands (the
oops and iops) of each Instruction, in a form that's easy to use to annotate
post-isel LLVM diagnostic output. (Useful if you can never remember which way
round all the operands go!)
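
The three queries above might look roughly like the following in Python. This is only a sketch against a hypothetical layout: `bits<>` values are assumed to be lists of 0, 1 or null (null meaning "not fixed"), least significant bit first; references to other records are assumed to be `{"def": name}` objects; and dags are assumed to be `{"operator": ..., "args": [[value, name], ...]}`. The sample records are invented. The second query is approximated by the records named in each instruction's `Predicates` list, because mapping a predicate back to its subtarget feature is target-specific and is left out here.

```python
import json

# Hypothetical sample dump; every key and value layout here is an assumption.
SAMPLE = """
{
  "ADDrr": {
    "!superclasses": ["Instruction"],
    "Inst": [null, null, 1, 0, 1, 1, 0, 0],
    "Predicates": [{"def": "HasFoo"}],
    "OutOperandList": {"operator": {"def": "outs"},
                       "args": [[{"def": "GPR"}, "dst"]]},
    "InOperandList": {"operator": {"def": "ins"},
                      "args": [[{"def": "GPR"}, "a"], [{"def": "GPR"}, "b"]]}
  },
  "NOP": {
    "!superclasses": ["Instruction"],
    "Inst": [0, 0, 0, 0, 0, 0, 0, 0],
    "Predicates": [],
    "OutOperandList": {"operator": {"def": "outs"}, "args": []},
    "InOperandList": {"operator": {"def": "ins"}, "args": []}
  },
  "HasFoo": {"!superclasses": ["Predicate"]}
}
"""


def instructions(records):
    """Yield (name, record) for every record derived from Instruction."""
    for name, rec in sorted(records.items()):
        if "Instruction" in rec.get("!superclasses", []):
            yield name, rec


def fixed_bits(bits):
    """Return (mask, value); mask has a 1 at every bit position that is fixed."""
    mask = value = 0
    for i, b in enumerate(bits):  # assumed: element i is bit i (LSB first)
        if b is not None:  # assumed: null marks a bit that is not fixed
            mask |= 1 << i
            value |= b << i
    return mask, value


def predicates_used(records):
    """Names of all predicates that at least one instruction depends on."""
    return {pred["def"] for _, rec in instructions(records)
            for pred in rec.get("Predicates", [])}


def operands(dag):
    """Turn a dag value into a list of (operand type, operand name) pairs."""
    return [(arg["def"], name) for arg, name in dag["args"]]


records = json.loads(SAMPLE)
for name, rec in instructions(records):
    mask, value = fixed_bits(rec["Inst"])
    print(f"{name}: mask={mask:#x} value={value:#x}",
          "outs", operands(rec["OutOperandList"]),
          "ins", operands(rec["InOperandList"]))
print("predicates:", sorted(predicates_used(records)))
```

Returning the fixed bits as a (mask, value) pair keeps them in a form that a disassembly test generator can combine directly with random values for the unfixed bits.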

I've written a proof-of-concept back end that outputs JSON, and produces
complete enough data to let me implement any of the above examples in a few
lines of Python, and (I hope) also the next few queries along these lines that I
might happen to think of.

I chose JSON because I wanted it to be supported by the Python core standard
library without needing to install any third-party modules. (XML would have been
OK as well from that perspective, but JSON is considerably simpler: the Python
JSON reader can be called in one line of code without having to set up lots of
machinery like a custom parser subclass, and it delivers output in a data
structure better suited to the kinds of query I list above.)
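
To make the JSON-versus-XML comparison concrete, here is a small Python sketch that uses only the standard library. Both input strings are hypothetical, and the XML shape in particular is invented for illustration. `json.loads` returns ordinary dicts and lists directly, whereas the `xml.sax` route needs a `ContentHandler` subclass to rebuild the same dictionary.

```python
import json
import xml.sax

# JSON: a single call yields plain dicts and lists, ready to query.
JSON_TEXT = '{"ADDrr": {"Size": 4}, "NOP": {"Size": 2}}'
sizes_from_json = {name: rec["Size"] for name, rec in json.loads(JSON_TEXT).items()}

# XML via SAX: a handler subclass has to rebuild the same mapping by hand.
XML_TEXT = '<records><def name="ADDrr" Size="4"/><def name="NOP" Size="2"/></records>'


class SizeHandler(xml.sax.ContentHandler):
    """Collect the Size attribute of every <def> element, keyed by name."""

    def __init__(self):
        super().__init__()
        self.sizes = {}

    def startElement(self, name, attrs):
        if name == "def":
            self.sizes[attrs["name"]] = int(attrs["Size"])


handler = SizeHandler()
xml.sax.parseString(XML_TEXT.encode("utf-8"), handler)
print(sizes_from_json == handler.sizes)
```

`xml.etree.ElementTree` would avoid the subclass, but it still returns a tree of elements that has to be mapped onto dicts by hand before queries like the ones above become one-liners.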

Would there be any interest in me finishing this up (polishing the code,
documenting the output data representation, etc.) and sharing it?

(Of course, another class of use case that this would make easier is the use of
TableGen for things that have nothing to do with LLVM, like that blog post a few
years back from someone who was using it to manage a set of related SSH
configuration files. But I have no idea whether that counts as a pro or a con
:-)

Cheers,
Simon

Chris Lattner via llvm-dev

2018-Apr-24 03:20 UTC

> On Apr 23, 2018, at 5:08 AM, Simon Tatham via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Would there be any interest in me finishing this up (polishing the code,
> documenting the output data representation, etc.) and sharing it?

This makes sense to me; it seems like general goodness and fits with the spirit
of tblgen.

-Chris

> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Nicolai Hähnle via llvm-dev

2018-Apr-24 07:29 UTC

Hi Simon,

that makes sense to me. Please add me on any reviews when you're done.

Cheers,
Nicolai

On 23.04.2018 14:08, Simon Tatham via llvm-dev wrote:
> Would there be any interest in adding a back end to TableGen to produce
> output in a general-purpose but machine-parsable format?

-- 
Learn what the world is really like,
but never forget how it ought to be.

Simon Tatham via llvm-dev

2018-Apr-25 14:56 UTC

> From: Nicolai Hähnle [mailto:nhaehnle at gmail.com]
> Sent: 24 April 2018 08:29
> 
> that makes sense to me. Please add me on any reviews when you're done.

Thanks! https://reviews.llvm.org/D46054 is a first draft, with a big list in the
log message of all the things I know I haven't done yet.

Cheers,
Simon
