On Oct 30, 2012, at 7:12 AM, Shankar Easwaran wrote:

> Hi Nick,
>
> I had a few questions:
>
> 1) Is there a way to validate that the input file is of a valid format, that's defined by the YAML Reader?

Do you mean something different from whether the yaml reader accepts it? Tons of files will be syntactically valid yaml. It is the semantic-level checking that is hard, and that is what YAML I/O does.

> 2) How are you planning to represent section groups in the YAML?

You mean the ELF concept of section groups in YAML-encoded ELF? The YAML encoding of ELF (or COFF or mach-o) does not know anything deeper about the meaning of the files. It is just the bytes from each section and the entries in the symbol table. If a section group is a section of bytes which are interpreted as an array of symbol/section indexes, then the ELF-encoded YAML just has the raw bytes for the section.

> 3) How are you planning to support Atom-specific flags? Is there a way already?
> (This would be needed to group similar atoms together)

It is still an open question how to support platform-specific Atom attributes. As much as possible we'd like to expand the Atom model to be a superset of all the platform-specific flags. But there are some attributes that are very much tied to one platform. One idea is to just add a new Reference which has no target, but whose kind (and maybe addend) encodes the platform-specific attributes. The Reference kind is already platform specific.

> 4) Are you planning to support representing shared libraries too in this model?

Yes, we already support shared library atoms in yaml.

> 5) Are you planning to support dwarf information too?

Debugging information is another big open question. The dwarf format is very much tied to the section model. Not only is the debug information put in sections with special names, but the dwarf debug info references code by its address in the .o files (the Atom model does not model addresses).
I'm sure the lldb guys have some ideas on the direction they would like debug information to go. It may be that the Atom model has a different representation for debug info. And when generating a final linked image you can choose the debug format you want. A Writer could convert the debug info to dwarf if requested.

-Nick

>
> On 10/29/2012 9:26 PM, Nick Kledzik wrote:
>> Michael,
>>
>> To validate the refactor of the YAML Reader/Writer using YAML I/O, I updated all the test cases to be compatible with YAML I/O. One issue that was gnarly was how to handle the test cases with archives. Currently, we have test cases like:
>>
>> ---
>> atoms:
>>   - name: foo
>>     # more stuff
>> ---
>> archive:
>>   - name: bar.o
>>     atoms:
>>       - name: bar
>>         # more stuff
>>
>>
>> This sort of weak/dynamic typing is hard to use with YAML I/O, which enforces stronger typing that helps it do better error checking. The core of the problem is that when a new document is started, we don't know what kind of file it is going to be, so we don't know what keys are legal. I first looked into using tags to specify the document type. For example:
>>
>> --- !archive
>> members:
>>   - name: bar.o
>>     # more stuff
>>
>> But after modifying YAMLParser to make the tag available, then trying to figure out how to make the tag actionable in the trait, I realized that for maps, the tag is just like another key. So, if every client agreed that the first key/value was a particular key name (e.g. tag: type), which YAML I/O already supports, then there is no need for tags and no need for an additional mechanism in YAML I/O.
>>
>> So, I now have the traits set up to support archives, assuming the first (optional) key of each document type read by lld will be "kind:". The archive-basic.objctxt case now looks like:
>>
>> # RUN: lld-core %s | FileCheck %s
>>
>> #
>> # Tests archives in YAML. Tests that an undefined in a regular file will load
>> # all atoms in select archive members.
>> #
>>
>> ---
>> defined-atoms:
>>   - name: foo
>>     type: code
>>
>> undefined-atoms:
>>   - name: bar
>>
>> ---
>> kind: archive
>> members:
>>   - name: bar.o
>>     content:
>>       defined-atoms:
>>         - name: bar
>>           scope: global
>>           type: code
>>
>>         - name: bar2
>>           type: code
>>
>>   - name: baz.o
>>     content:
>>       defined-atoms:
>>         - name: baz
>>           scope: global
>>           type: code
>>
>>         - name: baz2
>>           type: code
>> ...
>>
>> # CHECK: name: foo
>> # CHECK-NOT: undefined-atoms:
>> # CHECK: name: bar
>> # CHECK: name: bar2
>> # CHECK-NOT: name: baz
>> # CHECK: ...
>>
>> My thinking is that we can extend this to support embedded COFF/ELF/MachO in yaml by using new kind values. For example:
>>
>> ---
>> kind: object-coff
>> header:
>>   # stuff
>> sections:
>>   # stuff
>> symbols:
>>   # stuff
>> ...
>>
>> The MappingTrait<const ld::File*> will look at the kind value and switch off it. We just need an external function (per file format) which can be called with the same mapping() parameters, which will do the io.map*() calls and have traits for platform-specific types. That turns the yaml into an in-memory binary object, then runs the Reader to return a File*. I'll be prototyping this approach for mach-o.
>>
>> -Nick
>>
>>
>> On Oct 25, 2012, at 9:59 AM, Sean Silva wrote:
>>>> To better understand how a client would use YAML I/O, I've completely rewritten the ReaderYAML and WriterYAML in lld to use YAML I/O. The source code is now about half the size. But more importantly, the error checking is much, much better, and any time an attribute (e.g. of an Atom) is changed or added, there is just one place to update the yaml code instead of two places (the reader and writer).
>>>
>>> Fantastic!
>>
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation
Thanks for the reply Nick.

>> 1) Is there a way to validate that the input file is of a valid format, that's defined by the YAML Reader?
> Do you mean something different from whether the yaml reader accepts it? Tons of
> files will be syntactically valid yaml. It is the semantic-level
> checking that is hard, and that is what YAML I/O does.
>
Yes: the case where the YAML reader accepts it but then figures out that it's not the format that ReaderYAML needs.

>> 2) How are you planning to represent section groups in the YAML?
> You mean the ELF concept of section groups in YAML-encoded ELF? The
> YAML encoding of ELF (or COFF or mach-o) does not know anything deeper
> about the meaning of the files. It is just the bytes from each
> section and the entries in the symbol table. If a section group is a
> section of bytes which are interpreted as an array of symbol/section
> indexes, then the ELF-encoded YAML just has the raw bytes for the section.
>
Ok.

>> 3) How are you planning to support Atom-specific flags? Is there a
>> way already?
>> (This would be needed to group similar atoms together)
> It is still an open question how to support platform-specific Atom
> attributes. As much as possible we'd like to expand the Atom model to
> be a superset of all the platform-specific flags. But there are some
> attributes that are very much tied to one platform. One idea is to
> just add a new Reference which has no target, but whose kind (and maybe
> addend) encodes the platform-specific attributes. The Reference kind
> is already platform specific.

How about if the atom flags could be overridden? The Atom flag could have a MIN/MAX, and anything above the MAX or below the MIN would be platform specific, like how it's dealt with for section indexes?

>> 4) Are you planning to support representing shared libraries too in
>> this model?
> Yes, we already support shared library atoms in yaml.
>
Sorry, didn't check that.

>
>> 5) Are you planning to support dwarf information too?
> Debugging information is another big open question. The dwarf format
> is very much tied to the section model. Not only is the debug
> information put in sections with special names, but the dwarf debug
> info references code by its address in the .o files (the Atom model
> does not model addresses). I'm sure the lldb guys have some ideas
> on the direction they would like debug information to go. It may
> be that the Atom model has a different representation for debug info.
> And when generating a final linked image you can choose the debug
> format you want. A Writer could convert the debug info to dwarf if
> requested.

Wouldn't it be hard to get the source/line information right if the linker tries to write the debug information?

Thanks

Shankar Easwaran

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation
On Oct 30, 2012, at 11:10 AM, Shankar Easwaran wrote:

>>> 3) How are you planning to support Atom-specific flags? Is there a way already?
>>> (This would be needed to group similar atoms together)
>> It is still an open question how to support platform-specific Atom attributes. As much as possible we'd like to expand the Atom model to be a superset of all the platform-specific flags. But there are some attributes that are very much tied to one platform. One idea is to just add a new Reference which has no target, but whose kind (and maybe addend) encodes the platform-specific attributes. The Reference kind is already platform specific.
>
> How about if the atom flags could be overridden? The Atom flag could have a MIN/MAX, and anything above the MAX or below the MIN would be platform specific, like how it's dealt with for section indexes?

I know the ELF file format has some ranges for various values that are specifically reserved for processor- or "user"-defined functionality. It serves the needs of ELF well. It allows processor and software-tools teams to use ELF but work independently (and/or in secret) on new functionality without needing to coordinate with a central ELF owner.

But lld is different. It is not a file format. It is an API. If a particular processor needs to express something not captured in the Atom model, we should discuss what that functionality is and see if we can grow the Atom model. There may well be another processor that needs some similar functionality. If we added a generic uint32_t DefinedAtom::flags() method, I would be concerned that lld porters would be quick to just use the bits for whatever they need and not check whether the Atom model needs expanding. An example of something I added (but am not happy with) is DefinedAtom::isThumb(). This is something only applicable to ARM (and only if you care about interop of thumb and arm code).
Given that the Reference::Kind field is already platform specific, I'm leaning towards saying that the way to add platform-specific atom attributes is to add a Reference with no target to the Atom, with a Kind field that for that platform means whatever attribute you need.

>
>>
>>> 5) Are you planning to support dwarf information too?
>> Debugging information is another big open question. The dwarf format is very much tied to the section model. Not only is the debug information put in sections with special names, but the dwarf debug info references code by its address in the .o files (the Atom model does not model addresses). I'm sure the lldb guys have some ideas on the direction they would like debug information to go. It may be that the Atom model has a different representation for debug info. And when generating a final linked image you can choose the debug format you want. A Writer could convert the debug info to dwarf if requested.
>
> Wouldn't it be hard to get the source/line information right if the linker tries to write the debug information?

Just as hard as reading and writing dwarf debug information in general ;-)

Let me also mention why the debug information is not an issue for MacOS/iOS. Dwarf is designed to work with "dumb" linkers or "smart" linkers. A dumb linker just copies all the dwarf sections from all input files to the output file and applies any relocations. This is simple, but the resulting dwarf is huge, with tons of "dead" dwarf in it (because of coalescing by the linker). A smart linker knows how to parse dwarf and optimize the combining of sections. The resulting dwarf is much smaller, but it takes a lot of computation to do the merge.

When we (Apple/darwin) switched from stabs to dwarf years ago, we decided to take a different approach. We realized a dumb linker would be slow because of all the I/O copying dwarf. A smart linker would be slow because of all the computation needed.
So, instead, the darwin linker just ignores all dwarf in .o files! Instead it writes "debug notes" to the final linked image that list the paths to all the .o files used to create the image. This approach makes linking fast. Next, if you happen to run the program in the debugger, the debugger sees the debug notes and goes and reads the .o files' dwarf information. Lastly, if you are making a release build, you run a tool called dsymutil on the final linked image. dsymutil finds the debug notes, parses the .o files' dwarf information, then does all the computation to produce an optimal dwarf output file (we use a .dSYM extension). Later, if you need to debug a release build, you point the debugger at the .dSYM file.

Perhaps the initial approach you should take for ELF is to go the dumb linker route. Have the ELF Reader produce one Atom for each dwarf section, with all the fixups/References needed. Then the ELF Writer will just concatenate those sections into the output file and apply the fixups.

-Nick