thr3ads.net - llvm dev - [LLVMdev] [lld] Representation of lld::Reference with a fake target [Feb 2015]

Rui Ueyama

2015-Feb-07 03:28 UTC

[LLVMdev] [lld] Representation of lld::Reference with a fake target

Not all input files have to be representable in YAML/Native format.
There are many unrealistic use cases there. No one wants to write an
executable file in Native because there's no operating system that can run
that file. The same goes for YAML, and for the combination of a .so file and
Native/YAML, unless we have an operating system whose loader is able to load
a YAML .so file.

We might want to write a Native/YAML file as a re-linkable object file (the
-r option in GNU ld), but that's an object file.

So it's totally okay if some input file type is not representable in
YAML/Native. Some use cases are not real. We can't force all developers to
spend their time supporting unrealistic use cases.

On Fri, Feb 6, 2015 at 7:04 PM, Shankar Easwaran <shankarke at gmail.com>
wrote:
> The intermediate result is what is really written to disk when
> --output-filetype=yaml or native is chosen too.
>
>
> Writing to YAML/Reading back YAML is not doable when you convert input
> files to atoms because some of the input files are not representable in
> YAML format.
>
> On Fri, Feb 6, 2015 at 8:48 PM, Rui Ueyama <ruiu at google.com> wrote:
>
>> I think no one is opposing the idea of reading and writing YAML.
>>
>> The problem here is why we need to force all developers to write
>> code to serialize intermediate data in the middle of a link, which no one
>> except the round-trip passes needs.
>>
>> On Fri, Feb 6, 2015 at 6:41 PM, Shankar Easwaram <shankarke at gmail.com>
>> wrote:
>>
>>> Doing it for every input file is not useful, as some of the input files
>>> are not representable in YAML form. Shared libraries are an example.
>>>
>>> The reason I made the YAML pass run before the writer was that the
>>> intermediate result is more complete at that point: all atoms have been
>>> resolved and their state is much saner.
>>>
>>> It was also easy to use the pass manager. Very little code was needed to
>>> achieve what we are trying to do, namely that all the information for the
>>> writer is passed through references or atom properties.
>>>
>>> Shankar Easwaran
>>>
>>>
>>>
>>> On Feb 6, 2015, at 19:54, Rui Ueyama <ruiu at google.com> wrote:
>>>
>>> On Fri, Feb 6, 2015 at 5:42 PM, Michael Spencer <bigcheesegs at gmail.com>
>>> wrote:
>>>
>>>> On Fri, Feb 6, 2015 at 5:31 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>> > There are two questions.
>>>> >
>>>> > Firstly, do you think the on-disk format needs to be compatible with a
>>>> > C++ struct so that we can cast that memory buffer to the struct? That
>>>> > may be super-fast, but it also comes with many limitations. It's hard
>>>> > to extend, for example. Every time we want to store variable-length
>>>> > objects we need to define a string-table-like data structure. And I'm
>>>> > not sure that it's the fastest -- because mmap'able objects are not
>>>> > very compact on disk, slow disk IO could be a bottleneck compared with
>>>> > a more compact file format. I believe Protobufs or Thrift are fast
>>>> > enough, or might even be faster.
>>>>
>>>> I'm not sure here. Although I do question if the object files will
>>>> even need to be read from disk in your standard edit/compile/debug
>>>> loop or on a build server. I believe we'll need real data to determine
>>>> this.
>>>>
>>>> >
>>>> > Secondly, do you know why we are dumping the post-linked object file to
>>>> > Native format? If we want to have a different kind of *object* file
>>>> > format, we would want to have a tool to convert an object file in an
>>>> > existing file format (say, ELF) to "native", and teach LLD how to read
>>>> > from the file. Currently we are writing a file in the middle of the
>>>> > linking process, which doesn't make sense to me.
>>>>
>>>> This is an artifact of having the native format before we had any
>>>> readers. I agree that it's weird and not terribly useful to write to
>>>> native format in the middle of the link, although I have found it
>>>> helpful to output yaml. There's no need to be able to read it back in
>>>> and resume though.
>>>>
>>>
>>> Even for YAML it doesn't make much sense to write it to a file and read
>>> it back from the file in the middle of the link, does it? I found that
>>> being able to output YAML is useful too, but round-trip is a different
>>> thing. In the middle of the process, we have a bunch of additional
>>> information that doesn't exist in input files and doesn't have to be
>>> output to the link result. The ability to serialize that intermediate
>>> result is not useful.
>>>
>>> Shankar, you added these round-trip tests. Do you have any opinion?
>>>
>>>> Ideally lld -r would be the tool we use to convert COFF/ELF/MachO to
>>>> the native format.
>>>>
>>>> - Michael Spencer
>>>>
>>>> >
>>>> > On Fri, Feb 6, 2015 at 5:02 PM, Michael Spencer
>>>> > <bigcheesegs at gmail.com> wrote:
>>>> >>
>>>> >> On Fri, Feb 6, 2015 at 2:54 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>> >> > Can we remove Native format support? I'd like to get input from
>>>> >> > anyone who wants to keep the current Native format in LLD.
>>>> >>
>>>> >> One of the original goals for LLD was to provide a new object file
>>>> >> format for performance. The reason it is not used currently is because
>>>> >> we've yet to teach llvm to generate it, and we haven't done that
>>>> >> because it hasn't been finalized yet. The value it currently provides
>>>> >> is catching stuff like this, so we can fix it now instead of down the
>>>> >> road when we actually productize the native format.
>>>> >>
>>>> >> As for the specific implementation of the native format, I'm open to
>>>> >> an extensible format, but only if the performance cost is low.
>>>> >>
>>>> >> - Michael Spencer
>>>> >>
>>>> >> >
>>>> >> > On Thu, Feb 5, 2015 at 2:03 PM, Shankar Easwaran
>>>> >> > <shankare at codeaurora.org>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> The only way currently is to create a new reference, unless we can
>>>> >> >> think of adding some target-specific metadata information to the
>>>> >> >> Atom model.
>>>> >> >>
>>>> >> >> This has come up over and over again: we need something in the Atom
>>>> >> >> model to store information that is target specific.
>>>> >> >>
>>>> >> >> Shankar Easwaran
>>>> >> >>
>>>> >> >>
>>>> >> >> On 2/5/2015 2:22 PM, Simon Atanasyan wrote:
>>>> >> >>>
>>>> >> >>> Hi,
>>>> >> >>>
>>>> >> >>> I need advice on the implementation of a very specific kind of
>>>> >> >>> relocation used by the MIPS N64 ABI. As usual, the main problem
>>>> >> >>> is how to pass target-specific data over the Native/YAML
>>>> >> >>> conversion barrier.
>>>> >> >>>
>>>> >> >>> In this ABI the r_info field of a relocation record in fact
>>>> >> >>> consists of five subfields:
>>>> >> >>> * r_sym   - symbol index
>>>> >> >>> * r_ssym  - special symbol
>>>> >> >>> * r_type3 - third relocation type
>>>> >> >>> * r_type2 - second relocation type
>>>> >> >>> * r_type  - first relocation type
>>>> >> >>>
>>>> >> >>> Up to three of these relocations are applied one by one. The first
>>>> >> >>> relocation uses the addend from the relocation record. Each
>>>> >> >>> subsequent relocation takes as its addend the result of the
>>>> >> >>> previous operation. Only the final operation actually modifies the
>>>> >> >>> location being relocated. The first relocation uses as its
>>>> >> >>> reference the symbol specified by the r_sym field. The third
>>>> >> >>> relocation assumes a NULL symbol.
>>>> >> >>>
>>>> >> >>> The most interesting case is the second relocation. It uses the
>>>> >> >>> special symbol value given by the r_ssym field. This field can
>>>> >> >>> contain four predefined values:
>>>> >> >>> * RSS_UNDEF - zero value
>>>> >> >>> * RSS_GP    - value of the gp symbol
>>>> >> >>> * RSS_GP0   - gp0 value taken from the .MIPS.options or .reginfo
>>>> >> >>>               section
>>>> >> >>> * RSS_LOC   - address of the location being relocated
>>>> >> >>>
>>>> >> >>> So the problem is how to store these four constants in the
>>>> >> >>> lld::Reference object. RSS_UNDEF is obviously not a problem. To
>>>> >> >>> represent the RSS_GP value I can set an AbsoluteAtom created for
>>>> >> >>> "_gp" as the reference's target. But what about RSS_GP0 and
>>>> >> >>> RSS_LOC? I am considering the following approaches but cannot
>>>> >> >>> select the best one:
>>>> >> >>>
>>>> >> >>> a) Create an AbsoluteAtom for each of these cases and set it as
>>>> >> >>>    the reference's target.
>>>> >> >>>    The problem is that these atoms are fake and should not go
>>>> >> >>>    into the symbol table.
>>>> >> >>>    One more problem is selecting unique names for these atoms.
>>>> >> >>> b) Use the two high bits of the lld::Reference::_kindValue field
>>>> >> >>>    to encode the RSS_xxx value.
>>>> >> >>>    Then decode these bits in the RelocationHandler to calculate
>>>> >> >>>    the result of the relocation.
>>>> >> >>>    In that case the problem is how to represent a relocation kind
>>>> >> >>>    value in YAML format.
>>>> >> >>>    The simple xxxRelocationStringTable::kindStrings[] array will
>>>> >> >>>    not satisfy us.
>>>> >> >>> c) Add one more field to the lld::Reference class, something like
>>>> >> >>>    the DefinedAtom::CodeModel field.
>>>> >> >>>
>>>> >> >>> Any advice, ideas, and/or objections are much appreciated.
>>>> >> >>>
>>>> >> >>
>>>> >> >>
>>>> >> >> --
>>>> >> >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>> >> >> hosted by the Linux Foundation
>>>> >> >>
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>
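
For readers following option (b) in the original question quoted above: the idea of stashing the RSS_xxx value in the two high bits of a 16-bit kind value can be sketched roughly as follows. This is only an illustration under assumptions, not lld code. The KindValue alias and its 16-bit width, the SpecialSym enum and its numbering, and the helpers packKind, relTypeOf, and specialSymOf are all hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// The four r_ssym cases from the question. The numeric values are an
// assumption made for this sketch.
enum class SpecialSym : uint8_t { Undef = 0, GP = 1, GP0 = 2, Loc = 3 };

// Assumed 16-bit kind value, standing in for lld::Reference::_kindValue.
using KindValue = uint16_t;

constexpr unsigned kSpecialShift = 14;                      // top two of 16 bits
constexpr KindValue kTypeMask = (1u << kSpecialShift) - 1;  // low 14 bits

// Pack a relocation type and a special-symbol tag into one kind value.
inline KindValue packKind(uint16_t relType, SpecialSym ssym) {
  assert(relType <= kTypeMask && "relocation type must fit in the low 14 bits");
  return static_cast<KindValue>(
      (static_cast<unsigned>(ssym) << kSpecialShift) | relType);
}

// Recover the plain relocation type, e.g. inside a relocation handler.
inline uint16_t relTypeOf(KindValue kind) {
  return static_cast<uint16_t>(kind & kTypeMask);
}

// Recover the special-symbol tag from the two high bits.
inline SpecialSym specialSymOf(KindValue kind) {
  return static_cast<SpecialSym>(kind >> kSpecialShift);
}
```

With this packing, packKind(5, SpecialSym::Loc) yields 0xC005. That also shows the YAML concern raised in the question: a table that maps one kind value to one name no longer suffices, because a writer would have to split the value back into a type name and a special-symbol tag before printing it.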

Simon Atanasyan

2015-Feb-07 08:36 UTC

My 2c: maybe we should not try to fit all target-specific object file
formats into a single YAML/Native representation. Let's define a
universal file "header" format for the YAML/Native representation, and
probably some top-level structures common to all targets, and allow
target-specific code to extend these formats arbitrarily. For example,
code in ReaderWriter/ELF will know how to convert ELF object files
into the YAML/Native form. In that case we in fact get several mutually
incompatible YAML/Native formats for ELF, PECOFF, MachO, etc. But I
think that is not a problem.

-- 
Simon Atanasyan
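
As background for the discussion, the composed-relocation rule from the original question (up to three relocations, each taking the previous result as its addend, with the symbols being r_sym, the special symbol, and NULL in that order) can be sketched as below. This is a toy model under stated assumptions: ToyKind, computeStep, and applyChain are hypothetical names, the Add and Sub formulas are placeholders rather than real R_MIPS_* calculations, and a None type is assumed to end the chain.

```cpp
#include <array>
#include <cstdint>

// Toy relocation kinds (hypothetical). Real R_MIPS_* formulas are not
// modeled here.
enum ToyKind : uint8_t { None = 0, Add = 1, Sub = 2 };

// Value of one relocation step for symbol value S and addend A.
inline int64_t computeStep(uint8_t kind, int64_t S, int64_t A) {
  switch (kind) {
  case Add:
    return S + A;
  case Sub:
    return S - A;
  default:
    return A; // unknown kind: pass the addend through unchanged
  }
}

// Apply up to three relocation types in order. Step 1 uses the r_sym
// symbol value, step 2 the special symbol value, step 3 a NULL symbol (0).
// Each step's result becomes the next step's addend. Only the returned
// value would be written to the relocated location.
inline int64_t applyChain(const std::array<uint8_t, 3> &types,
                          int64_t symValue, int64_t specialValue,
                          int64_t addend) {
  const int64_t symbols[3] = {symValue, specialValue, 0};
  int64_t result = addend;
  for (int i = 0; i < 3; ++i) {
    if (types[i] == None)
      break; // assumed: a None type terminates the chain
    result = computeStep(types[i], symbols[i], result);
  }
  return result;
}
```

For example, applyChain({Add, Sub, None}, 0x1000, 0, 0x10) first computes 0x1000 + 0x10, then negates it through the zero-valued special symbol. The second step is where the RSS_xxx value enters, which is why the reference has to carry it somehow.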

Shankar Easwaram

2015-Feb-07 15:52 UTC

We are modeling target-specific functionality using references. Doesn't your
idea defeat the purpose of the atom model? Atoms are mostly target neutral, and
the YAML/native format represents just an atom. Having a derived class for
atoms will have an impact on the testing method with lld, IMO.

In my opinion we could continue to model with references, and add some metadata
information to the atom where references are not able to model something.

