Peter Collingbourne
2014-Jul-07 01:57 UTC
[LLVMdev] Proposal: support object file-based on-disk module format
Hi,

Over in [1] we've been discussing adding support in LTO for an object file-based on-disk module format. Rafael suggested that I send a proposal to this list; this is that proposal.

As motivation, consider a compiler that needs to store metadata in the LTO object file that may need to be read by future compilation steps, such as the "export data" used by some Go compilers [2]. Such metadata might also need to be read by external tools which do not know about LLVM, so a good choice of file format would be something relatively stable and well understood, readable without depending on LLVM and compatible with the non-LTO scenario. This lends itself to the platform's native object file format being a good candidate for the outermost file format, such that the metadata and IR are stored in separate sections.

The basic proposal is that as an alternative on-disk representation for IR, we also support native object files (i.e. ELF/COFF/Mach-O) with a section named '.llvmbc' containing the bitcode in the same format that we are using now, and no other (allocatable) sections. The actual support needed in LLVM would be limited to consumers, i.e. LTO infrastructure: linker plugins, llvm-ar, llvm-nm etc. We would not necessarily need to teach other bitcode consumers (e.g. llvm-dis) about this format or add any producers to the tree, but it may be useful as a matter of convenience to do so.

We can also consider extending this format by generating code into the object file, such as for functions which we believe at compile time to be cold, or for all functions if we want the decisions to be made at link time. This may be beneficial for C/C++ compilation as it may allow us to parallelize/deduplicate the code generation work for at least some functions.

Thanks,
--
Peter

[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/224359.html
[2] http://golang.org/doc/install/gccgo#Imports
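To make the "readable without depending on LLVM" point concrete, here is a rough sketch of how an external tool might pull the bitcode out of such a file using nothing but the ELF layout. It is only an illustration, not part of the proposal or of any LLVM API: the function names find_section and extract_bitcode are made up for this example, and it assumes a 64-bit little-endian ELF container (COFF and Mach-O would need their own parsers).

```python
import struct

ELF_MAGIC = b"\x7fELF"
BITCODE_MAGIC = b"BC\xc0\xde"  # raw LLVM bitcode starts with 'B' 'C' 0xC0 0xDE


def find_section(data, wanted):
    """Return the bytes of the named section in a 64-bit little-endian ELF
    image, or None if no section has that name. (Hypothetical helper.)"""
    if data[:4] != ELF_MAGIC or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    # ELF64 header: e_shoff is at offset 0x28; e_shentsize, e_shnum and
    # e_shstrndx are three consecutive 16-bit fields starting at 0x3A.
    (shoff,) = struct.unpack_from("<Q", data, 0x28)
    shentsize, shnum, shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def header(i):
        # First six Elf64_Shdr fields:
        # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
        return struct.unpack_from("<IIQQQQ", data, shoff + i * shentsize)

    strtab_off = header(shstrndx)[4]  # file offset of the section name table
    for i in range(shnum):
        name_off, _type, _flags, _addr, offset, size = header(i)
        start = strtab_off + name_off
        name = data[start:data.index(b"\0", start)]  # NUL-terminated name
        if name == wanted:
            return data[offset:offset + size]
    return None


def extract_bitcode(data):
    """Accept either a raw bitcode file or an ELF wrapper around it."""
    if data[:4] == BITCODE_MAGIC:
        return data
    return find_section(data, b".llvmbc")
```

A consumer written this way handles both on-disk representations: it returns a raw bitcode file unchanged and otherwise looks for the '.llvmbc' section. The same find_section call could fetch a sibling metadata section (for example Go export data) by name.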
Reid Kleckner
2014-Jul-07 05:35 UTC
[LLVMdev] Proposal: support object file-based on-disk module format
FWIW I always thought it was a little silly that Clang produces .o files for LTO that aren't the native object file format.
Dan Liew
2014-Jul-08 18:21 UTC
[LLVMdev] Proposal: support object file-based on-disk module format
Hi Peter,

This sounds sensible to me.

There is one thing that does concern me though. IIRC when you create object files with additional sections the GNU ld linker (possibly others too) will concatenate sections it doesn't recognise into the final executable. There is actually a hacky tool called whole-program-llvm [1] which actually uses this to get a list of paths to LLVM bitcode files that make up the final executable.

If I've understood your proposal correctly then when compiling and using the GNU ld linker you would end up with all the bitcode files embedded in the final executable. Is this intentional?

[1] https://github.com/travitch/whole-program-llvm

Thanks,
Dan Liew.
Peter Collingbourne
2014-Jul-08 18:43 UTC
[LLVMdev] Proposal: support object file-based on-disk module format
On Tue, Jul 08, 2014 at 07:21:00PM +0100, Dan Liew wrote:
> If I've understood your proposal correctly then when compiling and
> using the GNU ld linker you would end up with all the bitcode files
> embedded in the final executable. Is this intentional?

If the linker never sees the intermediate object files, this will not happen. This is the case under the current proposal.

However, if we codegen into the object files, we might want to make those object files visible to the linker. In that case, the compiler can use an object-format-specific exclude flag [1] to exclude those sections from the executable or DSO.

Thanks,
--
Peter

[1] https://sourceware.org/binutils/docs/as/Section.html
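As a rough illustration of the exclude flag for ELF: the GNU assembler's .section directive accepts an "e" flag, which sets SHF_EXCLUDE in the section header's sh_flags field. The sketch below only classifies a raw sh_flags value; the function name kept_in_output is made up for this example, and what a particular linker actually does with unflagged sections is exactly the behaviour Dan describes, not something this code verifies.

```python
# ELF section flag bits (values from the ELF headers).
SHF_ALLOC = 0x2             # section occupies memory at run time
SHF_EXCLUDE = 0x80000000    # section is dropped from executables and DSOs

# In GNU as syntax the compiler would emit something like:
#     .section .llvmbc,"e"
# to request SHF_EXCLUDE on the bitcode section.


def kept_in_output(sh_flags):
    """Hypothetical helper: True if a section with these flags is a
    candidate for being copied into the linked executable or DSO."""
    return not (sh_flags & SHF_EXCLUDE)


# A '.llvmbc' section marked with "e" would be left out of the link...
print(kept_in_output(SHF_EXCLUDE))
# ...while one with no flags at all could be concatenated into the output.
print(kept_in_output(0))
```

COFF and Mach-O have their own mechanisms for this, which is why the flag is described as object-format-specific.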
Rafael Espíndola
2014-Aug-25 22:19 UTC
[LLVMdev] Proposal: support object file-based on-disk module format
Sorry for the delay in replying.

I agree that this looks like an interesting feature to have. Having worked with gcc's lto in the past, the one thing I would like to avoid is having a failure mode where a non-LTO build is done, which is the case with a complete fat native object. I understand that is not the case in your current proposal since the object files only have auxiliary metadata, not text sections. This is just something to keep in mind as things evolve.

If I remember correctly one of the issues with the original patches was the handling of MemoryBuffer ownership. Trunk has switched to object::Binary holding just a reference to the memory, so this should be easier to implement now.

So, it looks like everyone is OK at least with the idea of supporting IR-in-Object, so would you mind rebasing your patches on top of current trunk?

Thanks,
Rafael