Rafael Espíndola via llvm-dev
2016-May-27 15:48 UTC
[llvm-dev] [RFC] Thoughts on a bitcode symbol table
This is about https://llvm.org/bugs/show_bug.cgi?id=27551.

Currently there is no easy way to get symbol information out of bitcode files. One has to read the module and mangle the names. This has a few problems:

* During LTO we have to create the Module earlier.
* There is no convenient spot to store flags/summary.
* Simpler tools like llvm-nm have massive dependencies because Object depends on MC to find asm-defined symbols.

To fix this I think we need a symbol table. The desired properties are:

* Include the *final* name of symbols (_foo, not foo).
* Not be compressed, so that we can keep StringRefs to the names.
* Be easy to parse without an LLVMContext.
* Include names created by inline assembly.
* Include other information a linker or nm would want: linkage, visibility, comdat.

The first question is: where should we store it? Some options I thought about:

* Use the existing support for putting bitcode in a section of a native file and use the file's symbol table.
* Use a custom wrapper over the .bc.
* Encode it with records/blocks in the .bc.

The first option would be a bit annoying as we are sure to want to represent more than the native files have. It is also a bit odd for cross compiling. Do we create a MachO when the bitcode is for Darwin and an ELF when it is for Linux? It would also mean that llvm-as would depend on a library to create these files.

The second option is tempting for parsing simplicity, but introduces duplication as the names for regular global values would be stored twice (once mangled, once not). The symbol table would also use a string table, which is a concept I think would improve the .bc format.

So my current preference is for the last one: encode the symbol table in the .bc. This means that lib/Object will depend on BitReader, but not more than that.

The next issue is what to do with .ll files. One option is to change nothing and have llvm-as parse module-level inline asm to create symbol entries. That would work, but sounds odd. I think we need directives in the .ll so that symbols created or used by inline asm can be declared.

Yet another issue is how to handle a string table in .bc. The problem is not with the format, it is with StreamingMemoryObject. We have to keep the string table alive while the rest of the file is read, and the StreamingMemoryObject can reallocate the buffer.

I can think of two solutions:

* Drop it. The one known user is PNaCl and it is moving to subzero, so it is not clear if this is still needed.
* Change the representation so that each read is required to be contiguous and not be freed. It would basically store a vector of std::pair<offset, char*> and we would make sure the string table is read as a blob in a single read.

With all that sorted, I think the representation can be fairly simple:

* A top-level record stores the string table as a single blob. This can be used for any string in the .bc, not just the symbol table.
* A sub-block contains the symbol table with one record per symbol. It would include an offset in the string table, the name size, the linkage, etc. Being a record makes it easy to extend.

Cheers,
Rafael
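For readers following the thread, the representation proposed above (one uncompressed string-table blob plus one record per symbol holding an offset, a name size and a linkage) might look roughly like the sketch below. This is only an illustration under stated assumptions: the names SymbolRecord, SymbolTable, Linkage and buildExample are hypothetical and not part of LLVM, the linkage values are made up, and std::string_view stands in for llvm::StringRef so that the example is self-contained.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical linkage values for illustration; not LLVM's real encoding.
enum class Linkage : uint8_t { External, Internal, Weak };

// One record per symbol. The name is referenced by (offset, size) into the
// shared string table instead of being stored inline in the record.
struct SymbolRecord {
  uint32_t NameOffset;
  uint32_t NameSize;
  Linkage Link;
};

// Holds the uncompressed string-table blob and the per-symbol records.
struct SymbolTable {
  std::string Strtab;                // single contiguous blob
  std::vector<SymbolRecord> Symbols; // one entry per symbol

  // Appends a final (already mangled) name and records where it landed.
  void add(std::string_view FinalName, Linkage L) {
    Symbols.push_back({static_cast<uint32_t>(Strtab.size()),
                       static_cast<uint32_t>(FinalName.size()), L});
    Strtab.append(FinalName);
  }

  // Returns a view into the blob: no copy and no decompression. The view is
  // valid only while Strtab is neither modified nor destroyed.
  std::optional<std::string_view> name(const SymbolRecord &S) const {
    if (uint64_t(S.NameOffset) + S.NameSize > Strtab.size())
      return std::nullopt; // malformed record pointing outside the blob
    return std::string_view(Strtab).substr(S.NameOffset, S.NameSize);
  }
};

// Builds a tiny table with two final names, as a Darwin-style mangler
// might produce them (leading underscore).
SymbolTable buildExample() {
  SymbolTable T;
  T.add("_foo", Linkage::External);
  T.add("_bar", Linkage::Internal);
  return T;
}
```

Because each name is just a view into the blob, the blob has to stay at a fixed address for as long as any view is in use. That is the same constraint the message raises about StreamingMemoryObject reallocating its buffer.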
Pete Cooper via llvm-dev
2016-May-28 02:31 UTC
[llvm-dev] [RFC] Thoughts on a bitcode symbol table
Hi Rafael,

Thanks for bringing this up. libObject linking libCore is something I’ve been hoping someone could find a way to fix.

The plan as you’ve described sounds good to me.

One thing I had considered when I looked at the code was whether it would make sense to have a base class in BitReader which can just read a SymbolicIRFile. In libObject, IRObjectFile inherits from SymbolicFile as we only really want the symbols from it. It would be interesting to see if BitReader could mirror this. Then we could use the IR-less Symbolic BitReader from libObject to just crack the symbol table.

Anyway, this is not something we necessarily need immediately, but it would be interesting to see if one day we can do more in BitReader without creating IR. I think this is what you were alluding to when you said you shouldn’t need an LLVMContext.

Cheers,
Pete
Teresa Johnson via llvm-dev
2016-May-31 14:27 UTC
[llvm-dev] [RFC] Thoughts on a bitcode symbol table
On Fri, May 27, 2016 at 8:48 AM, Rafael Espíndola <llvm-dev at lists.llvm.org> wrote:

> Currently there is no easy way to get symbol information out of
> bitcode files. One has to read the module and mangle the names. This
> has a few problems

This would be great for ThinLTO as well:

> * During lto we have to create the Module earlier.

During the ThinLink step we could avoid creating the Module altogether; only the parallel backends would need the Module.

> * There is no convenient spot to store flags/summary.

Right now we are duplicating some info like the linkage type into the summary since it isn't available in the ValueSymbolTable (which I assume this would subsume?)

Thanks,
Teresa
Rafael Espíndola via llvm-dev
2016-May-31 17:21 UTC
[llvm-dev] [RFC] Thoughts on a bitcode symbol table
On 31 May 2016 at 07:27, Teresa Johnson <tejohnson at google.com> wrote:

> Right now we are duplicating some info like the linkage type into the
> summary since it isn't available in the ValueSymbolTable (which I assume
> this would subsume?)

It should, yes. The general idea is for it to include any symbol info a linker might want during resolution.

Cheers,
Rafael