Steven Wu via llvm-dev
2016-Feb-03 18:25 UTC
[llvm-dev] [RFC] Embedding Bitcode in Object Files
Apple has an internal implementation for embedding bitcode in object files that we would like to upstream. It involves a few changes to the clang frontend, including new clang options, clang driver changes, and utilities to embed bitcode inside the object file. We believe upstreaming these implementations will benefit people who would like to develop software for Apple platforms using open source LLVM. It also helps driver compatibility, and it aligns with ongoing efforts like Thin-LTO, which also has an object wrapper for bitcode.

Embedded Bitcode Design:
Embedded bitcode is designed to enable bitcode distribution without disturbing the normal development flow. When a program is compiled with bitcode, clang embeds the optimized bitcode in a special section in the object file, together with the options that were used during the compilation. The object file still has the normal TEXT and DATA sections for normal linking. During linking, the linker checks that all the input object files have embedded bitcode and collects the bitcode into an archive, which is embedded in the output. The archive also contains all the information needed to rebuild the linked binary. All compilation and linking stages can be replayed to generate the final binary.

There are mainly two parts we would like to upstream first:

1. Clang Driver:
Adding the -fembed-bitcode option. When this new option is used, it splits the compilation into 2 stages. The first stage runs the frontend and all the optimization passes, and the second stage embeds the bitcode from the first stage and then runs the CodeGen passes. There is also a -fembed-bitcode-marker option that doesn't split the compilation into 2 stages and only puts a 1-byte marker into the object file. This is used to speed up debug builds, because bitcode serialization and verification make -fembed-bitcode slower, especially with -O0 -g. The linker can still check for the presence of the section to provide feedback if any of the object files participating in the link is missing bitcode in a full bitcode build.

2. Bitcode Embedding:
Several special sections are used to mark the presence of bitcode in the MachO file.
"__LLVM, __bitcode" is used to store the optimized bitcode in the object file. It can have a 1-byte size as a marker to provide diagnostics in debug builds.
"__LLVM, __cmdline" is used to store the clang command-line options. There are a few options that are not reflected in the bitcode that we would like to replay in the rebuild. For example, the '-O0' option makes us run FastISel during rebuild.

Thanks

Steven
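As a rough illustration of the section check described above, here is a minimal Python sketch. It is not taken from Apple's linker. The `sections` mapping of (segment, section) to size in bytes is a hypothetical stand-in for whatever a real Mach-O reader would report, and the function names are invented for this example. Only the section name and the 1-byte marker size come from the RFC.

```python
from enum import Enum

# Section name as given in the RFC: segment "__LLVM", section "__bitcode".
BITCODE_SECTION = ("__LLVM", "__bitcode")


class BitcodeStatus(Enum):
    FULL = "full bitcode"
    MARKER = "marker only"
    MISSING = "missing"


def classify(sections):
    """Classify one object file from a {(segment, section): size} mapping.

    The mapping is a hypothetical input format, not an existing API.
    """
    size = sections.get(BITCODE_SECTION)
    if size is None:
        return BitcodeStatus.MISSING
    # The RFC describes the marker as a 1-byte section. Treating an empty
    # section the same way is an assumption of this sketch.
    if size <= 1:
        return BitcodeStatus.MARKER
    return BitcodeStatus.FULL


def objects_without_bitcode_section(objects):
    """Return the names of objects that lack the section entirely.

    These are the inputs a linker could report as missing bitcode.
    """
    return [name for name, sections in objects.items()
            if classify(sections) is BitcodeStatus.MISSING]
```

With this model, an object built with -fembed-bitcode-marker still passes the presence check, which matches the RFC's point that the marker is enough for the linker to give feedback in debug builds.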
Peter Collingbourne via llvm-dev
2016-Feb-03 18:48 UTC
[llvm-dev] [cfe-dev] [RFC] Embedding Bitcode in Object Files
Hi Steven,

Can you please explain how this relates to the existing .llvmbc section feature?

Peter
Steven Wu via llvm-dev
2016-Feb-03 19:01 UTC
[llvm-dev] [cfe-dev] [RFC] Embedding Bitcode in Object Files
Hi Peter,

It is not currently related, because we started the implementation before Thin-LTO was proposed in the community, but our "__LLVM, __bitcode" section is pretty much the same as the ".llvmbc" section. Note that ".llvmbc" doesn't really follow the section naming convention for MachO objects. I am hoping to unify them while upstreaming the implementation.

Thanks

Steven

> On Feb 3, 2016, at 10:48 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> Hi Steven,
>
> Can you please explain how this relates to the existing .llvmbc section
> feature?
>
> Peter
James Y Knight via llvm-dev
2016-Feb-05 22:14 UTC
[llvm-dev] [RFC] Embedding Bitcode in Object Files
On Wed, Feb 3, 2016 at 1:25 PM, Steven Wu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> "__LLVM, __cmdline" is used to store the clang command-line options. There are
> few options that are not reflected in the bitcode that we would like to replay in
> the rebuild. For example, '-O0' option makes us run FastISel during rebuild.

Without knowing more details of your implementation, I'd be concerned about how this might impact deterministic/reproducible builds.

Source paths are recorded in a number of places, but you can typically fix that by using -fdebug-prefix-map. But if the entire command-line including the -fdebug-prefix-map argument gets stored in the output too, then you still have a problem.
Steven Wu via llvm-dev
2016-Feb-05 23:06 UTC
[llvm-dev] [RFC] Embedding Bitcode in Object Files
> On Feb 5, 2016, at 2:14 PM, James Y Knight <jyknight at google.com> wrote:
>
> On Wed, Feb 3, 2016 at 1:25 PM, Steven Wu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> "__LLVM, __cmdline" is used to store the clang command-line options. There are
> few options that are not reflected in the bitcode that we would like to replay in
> the rebuild. For example, '-O0' option makes us run FastISel during rebuild.
>
> Without knowing more details of your implementation, I'd be concerned about how this might impact deterministic/reproducible builds.
>
> Source paths are recorded in a number of places, but you can typically fix that by using -fdebug-prefix-map. But if the entire command-line including the -fdebug-prefix-map argument gets stored in the output too, then you still have a problem.

I don't think we need any paths in the command-line section. We only record the command-line options that affect CodeGen. See the example from one of my previous replies:

> $ clang -fembed-bitcode -O0 test.c -c -###
> "clang" "-cc1" (...lots of options...) "-o" "test.bc" "-x" "c" "test.c" <--- First stage
> "clang" "-cc1" "-triple" "x86_64-apple-macosx10.11.0" "-emit-obj" "-fembed-bitcode" "-O0" "-disable-llvm-optzns" "-o" "test.o" "-x" "ir" "test.bc" <--- Second stage

I can't think of any source path that can affect CodeGen. No paths other than the bitcode input path and the binary output path should exist in the second stage, and those two are excluded from the command-line section as well.

-fdebug-prefix-map is consumed by the front-end, and the prefixed paths are part of the debug info in the metadata. You don't need to encode -fdebug-prefix-map in the bitcode section to reproduce the object file with the same debug info.

Does that address your concern?

Thanks

Steven
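To make the path-exclusion argument concrete, here is a small Python sketch that filters the second-stage arguments from the example above. It is only an illustration under stated assumptions. The thread does not spell out exactly which options the real implementation records, so this sketch drops just the two paths Steven mentions (the "-o" output and the positional bitcode input). The `TAKES_VALUE` set and the `recorded_options` name are hypothetical.

```python
# Options that consume the following argument. Hypothetical and incomplete:
# it covers only the options that appear in the example from the thread.
TAKES_VALUE = {"-triple", "-o", "-x"}


def recorded_options(cc1_args):
    """Return cc1_args without the output path and the positional input."""
    kept = []
    i = 0
    while i < len(cc1_args):
        arg = cc1_args[i]
        if arg in TAKES_VALUE:
            # Drop "-o <path>" entirely; keep other option/value pairs.
            if arg != "-o":
                kept.extend(cc1_args[i:i + 2])
            i += 2
        elif arg.startswith("-"):
            # A flag without a separate value, such as "-O0".
            kept.append(arg)
            i += 1
        else:
            # A bare argument is the input file path; do not record it.
            i += 1
    return kept
```

Applied to the second-stage arguments shown above, the result contains no file paths, so nothing in it depends on where the build ran.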