thr3ads.net - llvm dev - [llvm-dev] [RFC] Thoughts on a bitcode symbol table [May 2016]


Rafael Espíndola via llvm-dev

2016-May-27 15:48 UTC

[llvm-dev] [RFC] Thoughts on a bitcode symbol table

This is about https://llvm.org/bugs/show_bug.cgi?id=27551.

Currently there is no easy way to get symbol information out of
bitcode files. One has to read the module and mangle the names. This
has a few problems:

* During lto we have to create the Module earlier.
* There is no convenient spot to store flags/summary.
* Simpler tools like llvm-nm have massive dependencies because Object
depends on MC to find asm defined symbols.

To fix this I think we need a symbol table. The desired properties are

* Include the *final* name of symbols (_foo, not foo).
* Not be compressed, so that we can keep StringRefs to the names.
* Be easy to parse without a LLVMContext.
* Include names created by inline assembly.
* Include other information a linker or nm would want: linkage,
visibility, comdat.
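
To illustrate the first property, here is a minimal sketch of turning an IR-level name into a final one. It is deliberately simplified: the function name finalName and the GlobalPrefix parameter are made up for this example, and real mangling has more cases (private prefixes, calling-convention decoration) that are ignored here. It assumes C++17.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper, not an LLVM API: produce the final symbol name from
// an IR-level name. GlobalPrefix is the target's symbol prefix, e.g. '_'
// for Mach-O-style targets and '\0' (none) for typical ELF targets.
inline std::string finalName(std::string_view IRName, char GlobalPrefix) {
  assert(!IRName.empty() && "a symbol must have a name");
  // A leading '\1' byte means "use the rest of the name verbatim".
  if (IRName.front() == '\1')
    return std::string(IRName.substr(1));
  std::string Out;
  if (GlobalPrefix != '\0')
    Out.push_back(GlobalPrefix);
  Out.append(IRName);
  return Out;
}
```

A symbol table that stores the result of this step means a reader never has to redo it, and so never needs the target information required to do it.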

The first question is: where should we store it? Some options I thought about:

* Use the existing support for putting bitcode in a section of a
native file and use the file's symbol table.
* Use a custom wrapper over the .bc
* Encode it with records/blocks in the .bc

The first option would be a bit annoying as we are sure to want to
represent more than the native files have. It is also a bit odd for
cross compiling. Do we create a MachO when the bitcode is for darwin
and an ELF when it is for Linux? It would also mean that llvm-as would
depend on a library to create these files.

The second option is tempting for parsing simplicity, but introduces
duplication as the names for regular global values would be stored
twice (once mangled, once not). The symbol table would also use a
string table, which is a concept I think would improve the .bc format.

So my current preference is for the last one. Encode the symbol table
in the .bc. This means that lib/Object will depend on BitReader, but
not more than that.

The next issue is what to do with .ll files. One option is to change
nothing and have llvm-as parse module-level inline asm to create symbol
entries. That would work, but sounds odd. I think we need directives
in the .ll so that symbols created or used by inline asm can be
declared.

Yet another issue is how to handle a string table in .bc. The problem
is not with the format, it is with StreamingMemoryObject. We have to
keep the string table alive while the rest of the file is read, and
the StreamingMemoryObject can reallocate the buffer.

I can think of two solutions

* Drop it. The one known user is PNaCl and it is moving to subzero, so
it is not clear if this is still needed.

* Change the representation so that each read is required to be
contiguous and not be freed. It would basically store a vector of
std::pair<offset, char*> and we would make sure the string table is
read as a blob in a single read.
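
A rough sketch of what the second solution could look like follows. The class name ChunkedBuffer and its methods are invented for illustration and are not the StreamingMemoryObject interface; the only point is that each read lives in its own allocation, so a pointer into the string table stays valid as more data arrives. It assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical buffer: every read is stored as one contiguous chunk that is
// never moved or freed while the object is alive.
class ChunkedBuffer {
  // (file offset of the chunk, owned bytes), appended in stream order.
  std::vector<std::pair<uint64_t, std::unique_ptr<char[]>>> Chunks;
  std::vector<size_t> Sizes; // Sizes[I] is the length of Chunks[I]
  uint64_t End = 0;          // file offset one past the last byte read

public:
  // Stores one read as a single chunk and returns a stable view of it.
  std::string_view append(const char *Data, size_t Size) {
    assert((Data || Size == 0) && "null data with non-zero size");
    auto Mem = std::make_unique<char[]>(Size);
    if (Size != 0)
      std::memcpy(Mem.get(), Data, Size);
    std::string_view View(Mem.get(), Size);
    Chunks.emplace_back(End, std::move(Mem));
    Sizes.push_back(Size);
    End += Size;
    return View;
  }

  // Returns a view of [Offset, Offset + Size) if it lies inside one chunk,
  // or an empty view otherwise.
  std::string_view get(uint64_t Offset, size_t Size) const {
    for (size_t I = 0; I < Chunks.size(); ++I) {
      uint64_t Begin = Chunks[I].first;
      if (Offset >= Begin && Offset - Begin <= Sizes[I] &&
          Size <= Sizes[I] - (Offset - Begin))
        return {Chunks[I].second.get() + (Offset - Begin), Size};
    }
    return {};
  }
};
```

Growing the outer vector only moves the owning pointers, not the bytes they point to, which is why the string table has to arrive in a single read: a name that straddled two chunks could not be handed out as one contiguous reference.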

With all that sorted, I think the representation can be fairly simple:

* a top level record stores the string table as a single blob. This
can be used for any string in the .bc, not just the symbol table.
* a sub block contains the symbol table with one record per symbol. It
would include an offset in the string table, the name size, the
linkage, etc. Being a record makes it easy to extend.
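
As a sketch only, the in-memory view of that representation could look something like the following. All type names, field names and field widths here are assumptions made for the example, not a proposed encoding; std::string_view stands in for StringRef so the snippet is self-contained (C++17).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Illustrative enums; the real set of linkages and visibilities is larger.
enum class Linkage : uint8_t { External, Weak, Internal };
enum class Visibility : uint8_t { Default, Hidden, Protected };

// One record per symbol, pointing into the shared string table.
struct SymbolRecord {
  uint32_t NameOffset; // offset of the name in the string table blob
  uint32_t NameSize;   // length of the name in bytes, no terminator needed
  Linkage Link;
  Visibility Vis;
};

struct SymbolTable {
  std::string_view Strings;          // the uncompressed string table blob
  std::vector<SymbolRecord> Symbols; // the contents of the symbol sub block

  // Returns a view into the blob: no copy is made and no context is needed.
  std::string_view name(const SymbolRecord &S) const {
    assert(S.NameOffset <= Strings.size() &&
           S.NameSize <= Strings.size() - S.NameOffset &&
           "symbol name out of string table bounds");
    return Strings.substr(S.NameOffset, S.NameSize);
  }
};

// Writer-side helper used only by this example: appends a name to the blob
// and returns a record that refers to it.
inline SymbolRecord addSymbol(std::string &Blob, std::string_view Name,
                              Linkage L, Visibility V) {
  SymbolRecord R{static_cast<uint32_t>(Blob.size()),
                 static_cast<uint32_t>(Name.size()), L, V};
  Blob.append(Name);
  return R;
}
```

Storing an offset and a size rather than a terminated string is what lets any other record in the file share the same blob.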

Cheers,
Rafael

Pete Cooper via llvm-dev

2016-May-28 02:31 UTC

Hi Rafael

Thanks for bringing this up.  libObject linking libCore is something I’ve been
hoping someone could find a way to fix.

The plan as you’ve described sounds good to me.

One thing I had considered when I looked at the code was whether it would make
sense to have a base class in BitReader which can just read a SymbolicIRFile. 
In libObject, IRObjectFile inherits from SymbolFile as we only really want the
symbols from it.  It would be interesting to see if BitReader could mirror this.
Then we could use the IR-less Symbolic BitReader from libObject to just crack
the symbol table.

Anyway, not something we necessarily need immediately, but would be interesting
to see if one day we can do more in BitReader without creating IR.  I think this
is what you were alluding to when you said you shouldn’t need an LLVMContext.

Cheers,
Pete

Teresa Johnson via llvm-dev

2016-May-31 14:27 UTC

On Fri, May 27, 2016 at 8:48 AM, Rafael Espíndola <llvm-dev at lists.llvm.org> wrote:
> This is about https://llvm.org/bugs/show_bug.cgi?id=27551.
>
> Currently there is no easy way to get symbol information out of
> bitcode files. One has to read the module and mangle the names. This
> has a few problems:
>
This would be great for ThinLTO as well:

>
> * During lto we have to create the Module earlier.
>
During the ThinLink step we could avoid creating the Module altogether;
only the parallel backends would need the Module.

> * There is no convenient spot to store flags/summary.
>
Right now we are duplicating some info like the linkage type into the
summary since it isn't available in the ValueSymbolTable (which I assume
this would subsume?)

Thanks,
Teresa




Rafael Espíndola via llvm-dev

2016-May-31 17:21 UTC

On 31 May 2016 at 07:27, Teresa Johnson <tejohnson at google.com> wrote:
>> * There is no convenient spot to store flags/summary.
>
> Right now we are duplicating some info like the linkage type into the
> summary since it isn't available in the ValueSymbolTable (which I assume
> this would subsume?)

It should, yes. The general idea is for it to include any symbol info a
linker might want during resolution.
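
As a rough illustration of resolution done from symbol table entries alone, without ever creating a Module, consider the sketch below. The types and the resolve function are hypothetical, and the rules are deliberately minimal (a strong definition beats a weak one, the first weak definition wins among weak ones, two strong definitions are an error); a real linker has many more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Ordered so that a larger value overrides a smaller one.
enum class Kind : uint8_t { Undefined, Weak, Strong };

// What the linker reads from each file's symbol table.
struct Symbol {
  std::string Name; // final (mangled) name
  Kind K;
};

struct Resolution {
  Kind K = Kind::Undefined;
  int File = -1; // index of the file providing the definition, -1 if none
};

// Resolves symbols across files using only symbol table entries.
// Returns false if two files provide a strong definition of the same name.
inline bool resolve(const std::vector<std::vector<Symbol>> &Files,
                    std::map<std::string, Resolution> &Out) {
  for (size_t F = 0; F < Files.size(); ++F) {
    for (const Symbol &S : Files[F]) {
      assert(!S.Name.empty() && "a symbol must have a name");
      Resolution &R = Out[S.Name]; // creates an undefined entry if missing
      if (S.K == Kind::Undefined)
        continue;
      if (S.K == Kind::Strong && R.K == Kind::Strong)
        return false; // duplicate strong definition
      if (S.K > R.K) {
        R.K = S.K;
        R.File = static_cast<int>(F);
      }
    }
  }
  return true;
}
```

Only the files that end up providing a definition would then need to be parsed into a Module.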

Cheers,
Rafael
