thr3ads.net - llvm dev - [llvm-dev] [RFC][binutils] Machine-readable output from Binutils

James Henderson via llvm-dev

2020-Jan-10 11:55 UTC

[llvm-dev] [RFC][binutils] Machine-readable output from Binutils - possible GSOC project?

Hi all,



I was giving some thought as to possible project ideas I could propose for
this year’s Google Summer of Code, with regards to the LLVM Binutils. One
idea that I had was something discussed at last year’s Euro LLVM developer
meeting, namely machine-readable output from the LLVM Binutils. Before I
actually start advertising this as an open project, I wanted to ask a few
questions:



   1. Are people still interested in this? If so, what is the typical use
   case you’d use the result of this project for? Why would this be better
   than the existing llvm-readobj output (if applicable)?
   2. Which tool(s) and feature(s) would you most want this for? I
   personally think this should just be another output style for llvm-readobj.
   Does anybody have any different opinion there?
   3. Is there any additional tooling in relation to this project that you
   think would be important to be a part of this project, e.g. a lit function
   to query the output?
   4. How might this interact with obj2yaml? Could the new output
   ultimately be used to replace it?
   5. Is there a priority for a specific format (e.g. ELF, DWARF, COFF)?
   6. Would anybody be interested in co-mentoring such a project?



Thanks in advance for the comments!



James

Greg Bedwell via llvm-dev

2020-Jan-10 14:07 UTC


Disclaimer:  I'm sat a few desks away from James in a related team,
although I don't think that we've actually ever discussed this topic at
all.

> Are people still interested in this? If so, what is the typical use case you’d use the result of this project for?

Yes.  We have a test framework that extracts a load of metrics from various
large codebases (generally games) built with different toolchain revisions
and stores them in a database for analysis and visualization.  For example,
we get section sizes from llvm-readelf output via regular expression
parsing into json format for database submission.

> Why would this be better than the existing llvm-readobj output (if applicable)?

Because it makes me sad to see things like this in my test framework:

^\[\s*(?P<id>\d+)\]\s(?P<section>.+?)\s+(\w+)\s+(\w+)\s+(\w+)\s+(?P<size_hex>\w+)\s.+$
It's far from resilient.
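
As an illustration of that fragility, here is a minimal Python sketch that applies the regex above to one section-header line and contrasts it with reading the same value from JSON. The sample line is an assumption modelled on GNU-style section-header output, and the JSON shape (`sections`, `name`, `size`) is purely hypothetical; it is not the output of any existing tool.

```python
import json
import re

# The regex quoted above, unchanged. It depends on column order and spacing.
SECTION_RE = re.compile(
    r"^\[\s*(?P<id>\d+)\]\s(?P<section>.+?)\s+(\w+)\s+(\w+)\s+(\w+)\s+(?P<size_hex>\w+)\s.+$"
)


def size_from_text(line):
    """Return (section, size) parsed from one text line, or None if it does not match."""
    m = SECTION_RE.match(line.strip())
    if m is None:
        return None
    return m.group("section"), int(m.group("size_hex"), 16)


def sizes_from_json(doc):
    """Return {section: size} from a hypothetical JSON document."""
    return {s["name"]: s["size"] for s in json.loads(doc)["sections"]}


# Assumed sample inputs, for illustration only.
SAMPLE_LINE = "  [ 1] .text             PROGBITS        0000000000000000 000040 00001a 00  AX  0   0 16"
SAMPLE_JSON = '{"sections": [{"name": ".text", "size": 26}]}'

print(size_from_text(SAMPLE_LINE))
print(sizes_from_json(SAMPLE_JSON))
```

The text path silently depends on the size being the sixth whitespace-separated field; any added, removed, or reordered column changes the result without raising an error, whereas the JSON path looks the value up by name.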

As a comparison we also get metrics from running "llvm-dwarfdump
--statistics" which outputs json so no need for any custom parsing and is
quite lovely.

> Is there a priority for a specific format (e.g. ELF, DWARF, COFF)?

In my case ELF and DWARF are the focus.

> Would anybody be interested in co-mentoring such a project?

I'm very happy to provide input as a potential consumer of the data.
Whether that extends as far as co-mentoring I don't mind either way.

-Greg




Eric Christopher via llvm-dev

2020-Jan-13 06:56 UTC


FWIW I think a --statistics functionality could be useful for object
reading. I think we'd want to see it factored out into a library as part of
the project so that all of the various readers could use it.

-eric


Jordan Rupprecht via llvm-dev

2020-Jan-13 18:00 UTC


On Fri, Jan 10, 2020 at 3:56 AM James Henderson <
jh7370.2008 at my.bristol.ac.uk> wrote:
> Are people still interested in this? If so, what is the typical use case
> you’d use the result of this project for? Why would this be better than the
> existing llvm-readobj output (if applicable)?

llvm-readobj produces human-readable output, which is not always great for
machine parsing. JSON comes to mind as a more machine-parse-friendly
format, although it may not be the best one.
I'm not sure we have a need *right now* -- we have tools that parse
binutils output and work fine -- but if this tool existed, we would surely
tell people to use it for any new uses instead of trying to parse a
human-readable tool. Or if they scratch their heads trying to update a
regex that parses a binutil, tell them to switch.

> Which tool(s) and feature(s) would you most want this for? I personally
> think this should just be another output style for llvm-readobj. Does
> anybody have any different opinion there?

I mean... almost every single binutil this might replace (readelf, objdump,
nm, size, strings) could be described as "a program to read object files".
So a different output style for llvm-readobj sounds fine. "llvm-readobj
--machine"?

> How might this interact with obj2yaml? Could the new output ultimately
> be used to replace it?

I see this as very separate from obj2yaml -- I view that and yaml2obj as a
1:1 mapping between object files and text; in theory you can pipe back and
forth forever and always get the same result (modulo unimportant bit
differences), whereas llvm-readobj --machine should be an inspection tool
that can be filtered/adjusted accordingly (only query certain types of
sections, only print relocations, etc.)

> Is there a priority for a specific format (e.g. ELF, DWARF, COFF)?

Is DWARF support necessary since llvm-dwarfdump already exists?

James Henderson via llvm-dev

2020-Jan-14 09:59 UTC


Thanks for the comments Jordan!

On Mon, 13 Jan 2020 at 18:01, Jordan Rupprecht <rupprecht at google.com>
wrote:
>> Which tool(s) and feature(s) would you most want this for? I personally
>> think this should just be another output style for llvm-readobj. Does
>> anybody have any different opinion there?
>
> I mean... almost every single binutil this might replace (readelf,
> objdump, nm, size, strings) could be described as "a program to read
> object files". So a different output style for llvm-readobj sounds fine.
> "llvm-readobj --machine"?

Yeah, those were my thoughts mostly, although I'd probably go with
"--output-style=json" or whatever, and deprecate "--elf-output-style". That
would a) avoid any weird interactions between the --elf-output-style switch
and the --machine (or whatever) switch, and b) allow other formats to be
added later if the need arose.
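
To make the design argument concrete, here is a small Python `argparse` sketch of a single enumerated style switch. Everything in it is hypothetical: the program name `readobj-sketch`, the `--sections` flag, and the list of styles are assumptions for illustration, not the actual llvm-readobj command line.

```python
import argparse

# Hypothetical set of styles; a new format is added by extending this list.
OUTPUT_STYLES = ["llvm", "gnu", "json"]


def build_parser():
    """Build a parser for an imaginary tool with one output-style switch."""
    parser = argparse.ArgumentParser(prog="readobj-sketch")
    # A single option with mutually exclusive values: there is no second
    # switch whose combination with this one needs defining.
    parser.add_argument("--output-style", choices=OUTPUT_STYLES, default="llvm")
    parser.add_argument("--sections", action="store_true")
    return parser


def parse(argv):
    """Parse an argument list and return the resulting namespace."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    print(parse(["--output-style=json", "--sections"]))
```

With one option, an unsupported style is rejected at parse time, and there is no need to decide what a machine-readable flag means when combined with a separate per-format style flag.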

>> How might this interact with obj2yaml? Could the new output ultimately be
>> used to replace it?
>
> I see this as very separate from obj2yaml -- I view that and yaml2obj as a
> 1:1 mapping between object files and text; in theory you can pipe back and
> forth forever and always get the same result (modulo unimportant bit
> differences), whereas llvm-readobj --machine should be an inspection tool
> that can be filtered/adjusted accordingly (only query certain types of
> sections, only print relocations, etc.)

If we adopted YAML as the machine-readable format, should there ultimately
be any difference between obj2yaml and llvm-readobj --output-style=yaml
--all? It seems to me not really, hence the thought. But maybe others have
a different overall thought on what the latter might produce.

>> Is there a priority for a specific format (e.g. ELF, DWARF, COFF)?
>
> Is DWARF support necessary since llvm-dwarfdump already exists?

Indeed, llvm-dwarfdump already exists, but like llvm-readobj, its output
isn't always that amenable to machine-reading (see for example --debug-line
output). Hence my question.

David Chisnall via llvm-dev

2020-Jan-14 11:24 UTC


I wonder if machine-readable output from the tools is actually the 
correct approach.  When I have needed something similar, for example 
when parsing traces from a CPU debug interface and mapping them to 
places in the object code, I have used the same underlying libraries 
that these tools use in LLVM to get much richer output.

When I have done so, I have found that there is a huge amount of 
boilerplate involved.  I would be much more interested in moving a lot 
of the logic in these tools into some higher-level (API-stable) library 
abstractions (with scripting-language bindings) and then reimplementing 
the tools in terms of those libraries.

If at all possible, I'd rather not use these via a serialisation format.

For example, consider the disassembly bit.  There are three steps:

1. The binary encoding of the instruction.
2. The semantic decoding of the operation, the input and output 
operands, including information about the kind of instruction (e.g. 
branch, load, store).
3. The text representation.

A lot of the things where I've wanted machine-readable objdump output, 
I've wanted part of 2.  Consider this line from objdump:

16bed:       48 83 c3 01             add    $0x1,%rbx

It has an address in the binary, the hex of the instruction, and the 
formatted assembly for the instruction.  The first two are pretty easy 
to encode in something like YAML, but would the last bit be just a 
string?  A format string with some more explicit values?  Would that be 
sufficient to know that this is an operation that reads and writes %rbx, 
uses a constant as another operand, and does not modify memory or 
control flow?
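
One way to picture the question is a record that keeps the three layers separate. The Python sketch below is only an illustration: the `Instruction` class and all of its field names are hypothetical, and the semantic fields are filled in by hand from the description above, because nothing in the text line itself provides them. A real decoder would likely report more (for example, flag updates).

```python
import json
import re
from dataclasses import asdict, dataclass, field


@dataclass
class Instruction:
    # Layer 1: where the instruction is and how it is encoded.
    address: int
    encoding: bytes
    # Layer 3: the formatted text.
    text: str
    # Layer 2: semantic information (hypothetical fields, not from any tool).
    kind: str = "unknown"
    reads: list = field(default_factory=list)
    writes: list = field(default_factory=list)
    immediates: list = field(default_factory=list)
    touches_memory: bool = False
    changes_control_flow: bool = False


def split_objdump_line(line):
    """Recover address, encoding and text from one objdump-style line (heuristic)."""
    addr, rest = line.split(":", 1)
    tokens = rest.split()
    n = 0
    # Leading two-digit hex tokens are treated as the encoding bytes.
    while n < len(tokens) and re.fullmatch(r"[0-9a-f]{2}", tokens[n]):
        n += 1
    return Instruction(int(addr, 16), bytes.fromhex("".join(tokens[:n])), " ".join(tokens[n:]))


def to_json(insn):
    """Serialise the record, writing the encoding as a hex string."""
    record = asdict(insn)
    record["encoding"] = insn.encoding.hex()
    return json.dumps(record)


insn = split_objdump_line("16bed:       48 83 c3 01             add    $0x1,%rbx")
# The semantic layer is supplied manually here; the text alone cannot provide it.
insn.kind = "arithmetic"
insn.reads = ["rbx"]
insn.writes = ["rbx"]
insn.immediates = [1]
print(to_json(insn))
```

The address and encoding fall out of simple string handling, which matches the point that the first two parts are easy to serialise; the semantic fields are exactly the part that a text-only format leaves to the consumer.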

David

Steven Wu via llvm-dev

2020-Jan-20 19:39 UTC


I think a machine-readable format can be helpful to many users. I have seen many
regexes in build systems and binary analysis tools trying to parse the output of
`otool` (llvm-objdump). That is really error-prone and makes it impossible to
change the output of the binutils tools.

Bindings for scripting languages sound nice, but they are often not enough to stop
people from taking the shortcut (to avoid building LLVM or adding extra
dependencies, or to use the language of their choice).

If you want to start working on this, I would suggest starting with something
like architectures, symbol tables, and section info. Those are more commonly
parsed by regex than other information such as disassembly.

Steven
