thr3ads.net - llvm dev - [llvm-dev] [RFC] Embedding Bitcode in Object Files [Feb 2016]

Steven Wu via llvm-dev

2016-Feb-03 18:25 UTC

[llvm-dev] [RFC] Embedding Bitcode in Object Files

Apple has an internal implementation for embedding bitcode in object files
that we would like to upstream. It includes a few changes to the clang frontend:
new clang options, clang driver changes, and utilities to embed bitcode inside
the object file. We believe upstreaming this implementation will benefit
people who would like to develop software for Apple platforms using open source
LLVM. It also helps driver compatibility, and it aligns with some ongoing
efforts such as Thin-LTO, which also has an object wrapper for bitcode.

Embedded Bitcode Design:
Embedded bitcode is designed to enable bitcode distribution without disturbing
the normal development flow. When a program is compiled with bitcode, clang will
embed the optimized bitcode in a special section in the object file, together
with the options that were used during the compilation. The object file will still
have the normal TEXT and DATA sections for normal linking. During linking, the
linker will check that all the input object files have embedded bitcode and collect
the bitcode into an archive which is embedded in the output. The archive also
contains all the information that is needed to rebuild the linked binary. All
compilation and linking stages can be replayed to generate the final binary.

There are mainly two parts we would like to upstream first:
1. Clang Driver:
Adding the -fembed-bitcode option. When this new option is used, it splits the
compilation into 2 stages. The first stage runs the frontend and all the
optimization passes, and the second stage embeds the bitcode from the first
stage and then runs the CodeGen passes. There is also a -fembed-bitcode-marker
option that doesn't split the compilation into 2 stages and only puts a 1-byte
marker into the object file. This is used to speed up debug builds,
because bitcode serialization and verification make -fembed-bitcode slower,
especially with -O0 -g. The linker can still check for the presence of the section to
provide feedback if any of the object files participating in the link is
missing bitcode in a full bitcode build.
2. Bitcode Embedding:
Several special sections are used to mark the presence of bitcode
in the MachO file.
"__LLVM, __bitcode" is used to store the optimized bitcode in the object file.
It can have a 1-byte size as a marker to provide diagnostics in debug builds.
"__LLVM, __cmdline" is used to store the clang command-line options. There are
a few options that are not reflected in the bitcode that we would like to replay
in the rebuild. For example, the '-O0' option makes us run FastISel during the
rebuild.


Thanks

Steven
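
The linker-side presence check described in the proposal above can be sketched
roughly as follows. This is only an illustration in Python, not the actual
linker logic: the function names, the (segment, section, size) tuple layout, and
the rule that a section of at most one byte counts as a marker are assumptions
made for this example. Only the names "__LLVM" and "__bitcode" come from the
proposal.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation of one section: (segment name, section name, size in bytes).
Section = Tuple[str, str, int]


def classify_bitcode(sections: Iterable[Section]) -> str:
    """Return 'full', 'marker', or 'missing' for a single object file."""
    for segment, name, size in sections:
        if segment == "__LLVM" and name == "__bitcode":
            # Assumption: a section of at most 1 byte is the -fembed-bitcode-marker case.
            return "marker" if size <= 1 else "full"
    return "missing"


def objects_missing_bitcode(objects: Dict[str, List[Section]]) -> List[str]:
    """List the objects that have no bitcode section at all."""
    return [obj for obj, sections in objects.items()
            if classify_bitcode(sections) == "missing"]
```

With this shape, an object built with -fembed-bitcode-marker still passes the
presence check, and only objects with no "__LLVM, __bitcode" section at all are
reported.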

Peter Collingbourne via llvm-dev

2016-Feb-03 18:48 UTC

[llvm-dev] [cfe-dev] [RFC] Embedding Bitcode in Object Files

Hi Steven,

Can you please explain how this relates to the existing .llvmbc section
feature?

Peter


Steven Wu via llvm-dev

2016-Feb-03 19:01 UTC

[llvm-dev] [cfe-dev] [RFC] Embedding Bitcode in Object Files

Hi Peter

It is not currently related, because we started the implementation before Thin-LTO
was proposed in the community, but our "__LLVM, __bitcode" section is pretty much
the same as the ".llvmbc" section. Note that ".llvmbc" doesn't really follow the
section naming convention for MachO objects. I am hoping to unify them while
upstreaming the implementation.

Thanks

Steven

James Y Knight via llvm-dev

2016-Feb-05 22:14 UTC

[llvm-dev] [RFC] Embedding Bitcode in Object Files

On Wed, Feb 3, 2016 at 1:25 PM, Steven Wu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> "__LLVM, __cmdline" is used to store the clang command-line options. There are
> few options that are not reflected in the bitcode that we would like to replay in
> the rebuild. For example, '-O0' option makes us run FastISel during rebuild.

Without knowing more details of your implementation, I'd be concerned about
how this might impact deterministic/reproducible builds.

Source paths are recorded in a number of places, but you can typically fix
that by using -fdebug-prefix-map. But if the entire command-line including
the -fdebug-prefix-map argument gets stored in the output too, then you
still have a problem.

Steven Wu via llvm-dev

2016-Feb-05 23:06 UTC

[llvm-dev] [RFC] Embedding Bitcode in Object Files


I don't think we need any paths in the command line section. We only record
the command-line options that will affect CodeGen. See my example in one of the
previous replies:

> $ clang -fembed-bitcode -O0 test.c -c -###
> "clang" "-cc1" (...lots of options...) "-o" "test.bc" "-x" "c" "test.c"   <--- First stage
> "clang" "-cc1" "-triple" "x86_64-apple-macosx10.11.0" "-emit-obj" "-fembed-bitcode" "-O0" "-disable-llvm-optzns" "-o" "test.o" "-x" "ir" "test.bc"   <--- Second stage

I can't think of any source path that can affect CodeGen. No paths other than
the bitcode input path and the binary output path should exist in the second
stage, and those are excluded from the command line section as well.
-fdebug-prefix-map is consumed by the front-end, and the prefixed paths are part of
the debug info in the metadata. You don't need to encode -fdebug-prefix-map
in the bitcode section to reproduce the object file with the same debug info.
Did that answer your concern?

Thanks

Steven
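
As a rough illustration of the point about paths, the exclusion of the input and
output paths from the second-stage arguments could look like the Python sketch
below. It is not the actual clang code: the function name is hypothetical, and
it assumes that the output path always follows "-o" and that the bitcode input
is the last argument, as in the example invocation. What the real
implementation records beyond dropping these two paths is not specified in this
thread.

```python
from typing import List


def strip_paths(second_stage_args: List[str]) -> List[str]:
    """Drop the output path and the input path from second-stage arguments."""
    # Assumption: the bitcode input file is the final argument.
    args = list(second_stage_args[:-1])
    kept = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the value that followed "-o" (the output path).
            skip_next = False
            continue
        if arg == "-o":
            skip_next = True
            continue
        kept.append(arg)
    return kept


# Arguments from the second-stage invocation quoted in this thread.
second_stage = [
    "-cc1", "-triple", "x86_64-apple-macosx10.11.0", "-emit-obj",
    "-fembed-bitcode", "-O0", "-disable-llvm-optzns",
    "-o", "test.o", "-x", "ir", "test.bc",
]
recorded = strip_paths(second_stage)
```

Neither "test.o" nor "test.bc" remains in `recorded`, so nothing that depends on
the build directory is left in the list.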
