thr3ads.net - llvm dev - [llvm-dev] RFC: LLD symbol table redesign [Apr 2016]

If this information is useful, please help other people find it:
Share via:

Peter Collingbourne via llvm-dev

2016-Apr-27 18:49 UTC

[llvm-dev] RFC: LLD symbol table redesign

Hi all,

This proposes a redesign of LLD’s symbol table in order to improve memory
locality by minimizing indirection when resolving relocations. The key idea
is that we perform symbol resolution by overwriting SymbolBodies, rather
than by updating pointers. This is based on some ideas mentioned by Rafael
on IRC.

Conceptually, we split Symbol into a non-polymorphic part and a polymorphic
part (a SymbolBody). The non-polymorphic part contains the flags that are
currently stored in Symbol, while the polymorphic part would be a derived
class of SymbolBody stored directly in the Symbol object without a level of
indirection. The class definitions would roughly look like this:

struct Symbol {
  uint8_t Binding;
  unsigned Visibility : 2;
  // ...
  char Body[Max]; // Max is the maximum size of a SymbolBody
  SymbolBody *body() { return reinterpret_cast<SymbolBody *>(Body); }
};

class SymbolBody {
  const unsigned SymbolKind : 8;
  // …

  Symbol &symbol() {
    return *reinterpret_cast<Symbol *>(reinterpret_cast<char
*>(this) -
offsetof(Symbol, Body));
  }
};

class Defined : public SymbolBody {
 ...
};

As we load symbols from input files, the input file calls methods on the
symbol table that add symbols of specific types. These methods would look
up the symbol name and make a decision about whether to replace the
existing symbol. This would be similar to the decision currently being made
by SymbolBody::compare, but it would be made directly using symbol
information, without needing to create a SymbolBody for the symbol. If we
decide to replace the symbol, we placement-new a SymbolBody for the symbol
into Symbol::Body.

A sketch of how the symbol table might implement adding an undefined symbol:

Symbol *SymbolTable<ELFT>::addUndefined(StringRef Name, uint8_t Binding,
…)
{
  std::pair<Symbol *, bool> P = insert(Name);
  if (!P.second) { // symbol already in symbol table
    if (auto *L = dyn_cast<Lazy>(P.first->body()))
      addFile(L);
    return P.first;
  }
  // symbol previously unseen
  new (P.first->Body) Undefined(Name, Binding, ...);
  return P.first;
}

Input files would have an associated symbol list for use in resolving
relocations. This symbol list would be a std::vector<Symbol *>. Symbols
retrieved from the symbol table are added to the symbol list. Relocations
would be resolved by following two fewer levels of indirection, from the
vector to the Symbol, rather than the current vector -> SymbolBody ->
Symbol -> SymbolBody.

This design has a disadvantage over the current design: input file loading
would no longer be thread safe. However, input file loading is already a
highly serialized process since the set of files we need to parse is
unknown until we have examined all input files, so I think that doesn’t
matter.

Next steps: I intend to implement a prototype of this idea and get
benchmark numbers. If the numbers look good, I'll send out a patch.

Thanks,
-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160427/0013d2c3/attachment.html>

Rui Ueyama via llvm-dev

2016-Apr-27 19:01 UTC

head link

[llvm-dev] RFC: LLD symbol table redesign

This proposal seems overall legit. It's trickier than the current
implementation, but not that much. As long as it is described how it works
in the README, it should be fine.

This is a divergence from the original design and widen the gap between
COFF and ELF. I'd like to keep them consistent as possible so that one can
easily understand the COFF linker using the knowledge of the ELF linker
(and vice versa.) So we probably want to do the same thing if the
experiment is successful.

On Wed, Apr 27, 2016 at 11:49 AM, Peter Collingbourne <peter at pcc.me.uk>
wrote:
> Hi all,
>
> This proposes a redesign of LLD’s symbol table in order to improve memory
> locality by minimizing indirection when resolving relocations. The key idea
> is that we perform symbol resolution by overwriting SymbolBodies, rather
> than by updating pointers. This is based on some ideas mentioned by Rafael
> on IRC.
>
> Conceptually, we split Symbol into a non-polymorphic part and a
> polymorphic part (a SymbolBody). The non-polymorphic part contains the
> flags that are currently stored in Symbol, while the polymorphic part would
> be a derived class of SymbolBody stored directly in the Symbol object
> without a level of indirection. The class definitions would roughly look
> like this:
>
> struct Symbol {
>   uint8_t Binding;
>   unsigned Visibility : 2;
>   // ...
>   char Body[Max]; // Max is the maximum size of a SymbolBody
>   SymbolBody *body() { return reinterpret_cast<SymbolBody *>(Body); }
> };
>
> class SymbolBody {
>   const unsigned SymbolKind : 8;
>   // …
>
>   Symbol &symbol() {
>     return *reinterpret_cast<Symbol *>(reinterpret_cast<char
*>(this) -
> offsetof(Symbol, Body));
>   }
> };
>
> class Defined : public SymbolBody {
>  ...
> };
>
> As we load symbols from input files, the input file calls methods on the
> symbol table that add symbols of specific types. These methods would look
> up the symbol name and make a decision about whether to replace the
> existing symbol. This would be similar to the decision currently being made
> by SymbolBody::compare, but it would be made directly using symbol
> information, without needing to create a SymbolBody for the symbol. If we
> decide to replace the symbol, we placement-new a SymbolBody for the symbol
> into Symbol::Body.
>
> A sketch of how the symbol table might implement adding an undefined
> symbol:
>
> Symbol *SymbolTable<ELFT>::addUndefined(StringRef Name, uint8_t
Binding,
> …) {
>   std::pair<Symbol *, bool> P = insert(Name);
>   if (!P.second) { // symbol already in symbol table
>     if (auto *L = dyn_cast<Lazy>(P.first->body()))
>       addFile(L);
>     return P.first;
>   }
>   // symbol previously unseen
>   new (P.first->Body) Undefined(Name, Binding, ...);
>   return P.first;
> }
>
> Input files would have an associated symbol list for use in resolving
> relocations. This symbol list would be a std::vector<Symbol *>.
Symbols
> retrieved from the symbol table are added to the symbol list. Relocations
> would be resolved by following two fewer levels of indirection, from the
> vector to the Symbol, rather than the current vector -> SymbolBody ->
> Symbol -> SymbolBody.
>
> This design has a disadvantage over the current design: input file loading
> would no longer be thread safe. However, input file loading is already a
> highly serialized process since the set of files we need to parse is
> unknown until we have examined all input files, so I think that doesn’t
> matter.
>
> Next steps: I intend to implement a prototype of this idea and get
> benchmark numbers. If the numbers look good, I'll send out a patch.
>
> Thanks,
> --
> --
> Peter
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160427/f027a451/attachment.html>

Peter Collingbourne via llvm-dev

2016-Apr-27 19:08 UTC

head link

[llvm-dev] RFC: LLD symbol table redesign

Yes, I think it makes sense for the designs to be consistent, and I'd be
wiling to do the same thing in the COFF linker if this works out.

This design might even be slightly easier to implement in the COFF linker
due to lack of templating and the fact that we aren't storing anything else
in the Symbol.

Peter

On Wed, Apr 27, 2016 at 12:01 PM, Rui Ueyama <ruiu at google.com> wrote:
> This proposal seems overall legit. It's trickier than the current
> implementation, but not that much. As long as it is described how it works
> in the README, it should be fine.
>
> This is a divergence from the original design and widen the gap between
> COFF and ELF. I'd like to keep them consistent as possible so that one
can
> easily understand the COFF linker using the knowledge of the ELF linker
> (and vice versa.) So we probably want to do the same thing if the
> experiment is successful.
>
> On Wed, Apr 27, 2016 at 11:49 AM, Peter Collingbourne <peter at
pcc.me.uk>
> wrote:
>
>> Hi all,
>>
>> This proposes a redesign of LLD’s symbol table in order to improve
memory
>> locality by minimizing indirection when resolving relocations. The key
idea
>> is that we perform symbol resolution by overwriting SymbolBodies,
rather
>> than by updating pointers. This is based on some ideas mentioned by
Rafael
>> on IRC.
>>
>> Conceptually, we split Symbol into a non-polymorphic part and a
>> polymorphic part (a SymbolBody). The non-polymorphic part contains the
>> flags that are currently stored in Symbol, while the polymorphic part
would
>> be a derived class of SymbolBody stored directly in the Symbol object
>> without a level of indirection. The class definitions would roughly
look
>> like this:
>>
>> struct Symbol {
>>   uint8_t Binding;
>>   unsigned Visibility : 2;
>>   // ...
>>   char Body[Max]; // Max is the maximum size of a SymbolBody
>>   SymbolBody *body() { return reinterpret_cast<SymbolBody
*>(Body); }
>> };
>>
>> class SymbolBody {
>>   const unsigned SymbolKind : 8;
>>   // …
>>
>>   Symbol &symbol() {
>>     return *reinterpret_cast<Symbol *>(reinterpret_cast<char
*>(this) -
>> offsetof(Symbol, Body));
>>   }
>> };
>>
>> class Defined : public SymbolBody {
>>  ...
>> };
>>
>> As we load symbols from input files, the input file calls methods on
the
>> symbol table that add symbols of specific types. These methods would
look
>> up the symbol name and make a decision about whether to replace the
>> existing symbol. This would be similar to the decision currently being
made
>> by SymbolBody::compare, but it would be made directly using symbol
>> information, without needing to create a SymbolBody for the symbol. If
we
>> decide to replace the symbol, we placement-new a SymbolBody for the
symbol
>> into Symbol::Body.
>>
>> A sketch of how the symbol table might implement adding an undefined
>> symbol:
>>
>> Symbol *SymbolTable<ELFT>::addUndefined(StringRef Name, uint8_t
Binding,
>> …) {
>>   std::pair<Symbol *, bool> P = insert(Name);
>>   if (!P.second) { // symbol already in symbol table
>>     if (auto *L = dyn_cast<Lazy>(P.first->body()))
>>       addFile(L);
>>     return P.first;
>>   }
>>   // symbol previously unseen
>>   new (P.first->Body) Undefined(Name, Binding, ...);
>>   return P.first;
>> }
>>
>> Input files would have an associated symbol list for use in resolving
>> relocations. This symbol list would be a std::vector<Symbol *>.
Symbols
>> retrieved from the symbol table are added to the symbol list.
Relocations
>> would be resolved by following two fewer levels of indirection, from
the
>> vector to the Symbol, rather than the current vector -> SymbolBody
->
>> Symbol -> SymbolBody.
>>
>> This design has a disadvantage over the current design: input file
>> loading would no longer be thread safe. However, input file loading is
>> already a highly serialized process since the set of files we need to
parse
>> is unknown until we have examined all input files, so I think that
doesn’t
>> matter.
>>
>> Next steps: I intend to implement a prototype of this idea and get
>> benchmark numbers. If the numbers look good, I'll send out a patch.
>>
>> Thanks,
>> --
>> --
>> Peter
>>
>
>

-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160427/bc11b6e9/attachment.html>

Rafael Espíndola via llvm-dev

2016-Apr-27 22:58 UTC

head link

[llvm-dev] RFC: LLD symbol table redesign

> A sketch of how the symbol table might implement adding an undefined
symbol:
>
> Symbol *SymbolTable<ELFT>::addUndefined(StringRef Name, uint8_t
Binding, …)
> {
>   std::pair<Symbol *, bool> P = insert(Name);
>   if (!P.second) { // symbol already in symbol table
>     if (auto *L = dyn_cast<Lazy>(P.first->body()))
>       addFile(L);
>     return P.first;
>   }
>   // symbol previously unseen
>   new (P.first->Body) Undefined(Name, Binding, ...);
>   return P.first;
> }
>
> Input files would have an associated symbol list for use in resolving
> relocations. This symbol list would be a std::vector<Symbol *>.
Symbols
> retrieved from the symbol table are added to the symbol list. Relocations
> would be resolved by following two fewer levels of indirection, from the
> vector to the Symbol, rather than the current vector -> SymbolBody ->
Symbol
> -> SymbolBody.
I do like the design. What are you doing with local symbols? Are they
just an special symbol type?

Another thing that can be tried that might be simpler and already
provide some performance advantage:

* Replace the current map vector. We have a map to an symbol number
that indexes an array. We could map directly to a Symbol if:
  * We stored the number in the Symbol to allow sorting.
  * Or used a hash that produces the same value everywhere.

Cheers,
Rafael

Peter Collingbourne via llvm-dev

2016-Apr-27 23:29 UTC

head link

[llvm-dev] RFC: LLD symbol table redesign

On Wed, Apr 27, 2016 at 3:58 PM, Rafael Espíndola <
rafael.espindola at gmail.com> wrote:
> > A sketch of how the symbol table might implement adding an undefined
> symbol:
> >
> > Symbol *SymbolTable<ELFT>::addUndefined(StringRef Name, uint8_t
Binding,
> …)
> > {
> >   std::pair<Symbol *, bool> P = insert(Name);
> >   if (!P.second) { // symbol already in symbol table
> >     if (auto *L = dyn_cast<Lazy>(P.first->body()))
> >       addFile(L);
> >     return P.first;
> >   }
> >   // symbol previously unseen
> >   new (P.first->Body) Undefined(Name, Binding, ...);
> >   return P.first;
> > }
> >
> > Input files would have an associated symbol list for use in resolving
> > relocations. This symbol list would be a std::vector<Symbol *>.
Symbols
> > retrieved from the symbol table are added to the symbol list.
Relocations
> > would be resolved by following two fewer levels of indirection, from
the
> > vector to the Symbol, rather than the current vector -> SymbolBody
->
> Symbol
> > -> SymbolBody.
>
> I do like the design. What are you doing with local symbols? Are they
> just an special symbol type?
>
I'm not sure, but I think the best approach may be to process them directly
in the Writer without creating Symbols for them. This may help us save time
in file parsing because (according to Rui) most symbols are not global
symbols.

Another thing that can be tried that might be simpler and
already> provide some performance advantage:
>
> * Replace the current map vector. We have a map to an symbol number
> that indexes an array. We could map directly to a Symbol if:
>   * We stored the number in the Symbol to allow sorting.
>   * Or used a hash that produces the same value everywhere.
>
Thanks, I might try that as well. The second one should be quite easy to
benchmark without needing a stable hash.

Thanks,
-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160427/1f2da7fc/attachment.html>

llvm dev - Apr 2016 - RFC: LLD symbol table redesign

[llvm-dev] RFC: LLD symbol table redesign

[llvm-dev] RFC: LLD symbol table redesign

[llvm-dev] RFC: LLD symbol table redesign

[llvm-dev] RFC: LLD symbol table redesign

[llvm-dev] RFC: LLD symbol table redesign