thr3ads.net - llvm dev - [llvm-dev] [RFC] Embedded bitcode and related upstream (Part II) [Jul 2016]

Steven Wu via llvm-dev

2016-Jun-03 18:36 UTC

[llvm-dev] [RFC] Embedded bitcode and related upstream (Part II)

Hi everyone

I am still in the process of upstreaming some improvements to the embed bitcode
option. If you want more background, you can read the previous RFC
(http://lists.llvm.org/pipermail/llvm-dev/2016-February/094851.html).
This is part II of the discussion.

Current Status:
A basic version of the -fembed-bitcode option is upstreamed and functioning.
You can use -fembed-bitcode={off, all, bitcode, marker} to control what
gets embedded in the final object file output:
off: default, nothing gets embedded.
all: optimized bitcode and command line options get embedded in the object
file.
bitcode: only optimized bitcode is embedded.
marker: only a marker is put in the object file.
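
For illustration only, here is a minimal Python sketch that builds a compile command line for each of the four modes listed above. The helper name `clang_compile_command` and the file names are hypothetical; only the `-fembed-bitcode=<mode>` spelling comes from the list above.

```python
import shlex

# The four modes listed above for -fembed-bitcode=<mode>.
EMBED_MODES = {
    "off": "nothing is embedded (default)",
    "all": "optimized bitcode and command line options",
    "bitcode": "optimized bitcode only",
    "marker": "a marker only",
}


def clang_compile_command(source, output, mode="off", compiler="clang"):
    """Return an argv list compiling `source` to `output` with the given mode.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if mode not in EMBED_MODES:
        raise ValueError(f"unknown -fembed-bitcode mode: {mode!r}")
    return [compiler, "-c", source, "-o", output, f"-fembed-bitcode={mode}"]


if __name__ == "__main__":
    # Print one command per mode so the differences are easy to compare.
    for mode in EMBED_MODES:
        argv = clang_compile_command("foo.c", f"foo.{mode}.o", mode)
        print(shlex.join(argv))
```

Keeping the mode set in one table means an unsupported value is rejected before any compiler is invoked.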

What needs to be improved:
1. Whitelist for command line options that can be used with bitcode:
The current trunk implementation embeds all the cc1 command line options
(including header include paths, warning flags and other front-end options) in
the command line section. That is a lot of redundant information: to re-create
the object file from the embedded optimized bitcode, most of these options are
useless. Worse, they can leak information about the source code. One solution
is to keep a list of all the options that can affect code generation but are
not encoded in the bitcode. I have internally prototyped this by disallowing
these options explicitly and allowing only the remainder of the options to be
embedded (http://reviews.llvm.org/D17394). A better solution might be to
encode that information in "Options.td" as a specific group.
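
As a rough sketch of the filtering idea (not the code in D17394), the Python below drops front-end-only options from a cc1-style argument list before it would be embedded. The deny lists are assumptions chosen for illustration and are not the lists used in the patch; the function name `filter_embedded_options` is hypothetical.

```python
# Hypothetical deny lists, for illustration only. They are NOT taken from
# D17394; a real implementation would derive them from Options.td.
DENY_WITH_VALUE = {"-I", "-isystem", "-include", "-D", "-U"}  # value is the next arg
DENY_PREFIXES = ("-W", "-I", "-D", "-U")  # joined forms such as -Wall or -Ifoo


def filter_embedded_options(cc1_args):
    """Return the arguments that would still be embedded after filtering."""
    kept = []
    skip_next = False
    for arg in cc1_args:
        if skip_next:
            # This argument is the value of a denied option; drop it too.
            skip_next = False
            continue
        if arg in DENY_WITH_VALUE:
            skip_next = True
            continue
        if arg.startswith(DENY_PREFIXES):
            continue
        kept.append(arg)
    return kept


if __name__ == "__main__":
    args = [
        "-triple", "x86_64-apple-macosx10.11.0",
        "-I", "/Users/me/secret-project/include",
        "-DINTERNAL_BUILD=1", "-Wall", "-O2",
    ]
    print(filter_embedded_options(args))
```

The include path and the macro definition are exactly the kind of source-tree detail the filtering is meant to keep out of the object file.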

2. Assembly input handling:
This is a workaround to allow source code written in assembly to work with
the "-fembed-bitcode" option. When compiling assembly source code with
"-fembed-bitcode", clang-as creates an empty section "__LLVM, __asm" in the
object file. That is just a way to distinguish object files compiled from
assembly source from those compiled from higher-level source code where
"-fembed-bitcode" was forgotten. The linker can use this section to diagnose
whether "-fembed-bitcode" is consistently used on all the object files
participating in the link.
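
The linker-side check described above could look roughly like the following Python sketch. It assumes the linker already has each object's section names as strings; the bitcode section name `__LLVM,__bitcode` is an assumption not stated in this mail, and the function names are hypothetical.

```python
# Section names written as "segment,section". The marker section comes from
# the description above; the bitcode section name is an assumption.
BITCODE_SECTION = "__LLVM,__bitcode"
ASM_MARKER_SECTION = "__LLVM,__asm"


def classify_object(sections):
    """Classify one object file by the sections it contains."""
    if BITCODE_SECTION in sections:
        return "bitcode"    # compiled from source with -fembed-bitcode
    if ASM_MARKER_SECTION in sections:
        return "assembly"   # assembled with -fembed-bitcode, marker only
    return "missing"        # built without -fembed-bitcode


def objects_missing_bitcode(objects):
    """Return the sorted names of objects that carry neither section.

    `objects` maps an object file name to the set of its section names.
    """
    return sorted(
        name for name, sections in objects.items()
        if classify_object(sections) == "missing"
    )


if __name__ == "__main__":
    link_inputs = {
        "main.o": {"__TEXT,__text", "__LLVM,__bitcode"},
        "memcpy_asm.o": {"__TEXT,__text", "__LLVM,__asm"},
        "legacy.o": {"__TEXT,__text"},
    }
    print(objects_missing_bitcode(link_inputs))
```

Without the empty marker section, `memcpy_asm.o` and `legacy.o` would be indistinguishable to the linker.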

3. Bitcode symbol hiding:
There were some concerns about leaking source code information when using the
bitcode feature. One approach to avoid the leak is to add a pass which renames
all the globals and metadata strings. The pass also keeps a reverse map in case
the original names need to be recovered. The final bitcode should contain no
more symbols or debug info than a stripped binary. To make sure the modified
bitcode can still be linked correctly, the renaming needs to be consistent
across all the bitcode participating in the link, and everything that is
external to the linkage unit needs to be preserved. This means the pass can
only be run during linking and requires some LTO API.

4. Debug info strip to line-tables pass:
As the name suggests, this pass strips the full debug info down to line tables
only. This is also one of the steps we took to prevent the leak of source code
information in bitcode.

Please let me know what you think about the pieces above or if you have any
concerns about the methodology. I will put up patches for review soon.

Thanks

Steven

Eric Christopher via llvm-dev

2016-Jun-13 06:44 UTC

[llvm-dev] [RFC] Embedded bitcode and related upstream (Part II)

Hi Steven,

Great to see the commentary and updates here. I've got a few questions
about some of this work. It might be nice to see some separate RFCs for a
couple of things, but we'll figure that out after you send out patches
probably :)

> What needs to be improved:
> 1. Whitelist for command line options that can be used with bitcode:
> Current trunk implementation embeds all the cc1 command line options (that
> includes header include paths, warning flags and other front-end options)
> in the command line section. That is lot of redundant information. To
> re-create the object file from the embedded optimized bitcode, most of
> these options are useless. On the other hand, they can leak information of
> the source code. One solution will be keeping a list of all the options
> that can affect code generation but not encoded in the bitcode. I have
> internally prototyped with disallowing these options explicitly and allowed
> only the reminder of the  options to be embedded (
> http://reviews.llvm.org/D17394). A better solution might be encoding that
> information in "Options.td" as specific group.
>
This is really interesting. I'm not a particularly security-minded person
so I don't have a lot of commentary there. An explicit whitelist sounds a
bit painful to keep maintained; explicitly having a group in Options.td
sounds pretty nice. You'll need to add them to multiple groups, but it
seems pretty nice.

> 2. Assembly input handling:
> This is a workaround to allow source code written in assembly to work with
> "-fembed-bitcode" options. When compiling assembly source code with
> "-fembed-bitcode", clang-as creates an empty section "__LLVM, __asm" in the
> object file. That is just a way to distinguish object files compiled from
> assembly source from those compiled from higher level source code but
> forgot to use "-fembed-bitcode" options. Linker can use this section to
> diagnose if "-fembed-bitcode" is consistently used on all the object files
> participated in the linking.
>
I'm surprised you want a separate and empty section and not a header flag,
as those are easier to keep around and won't take up a precious mach-o
section. There are probably other options or concerns that someone shipping
bitcode might have here as well, but I'm sure those are being talked about -
it doesn't have too much effect on the community though.

> 3. Bitcode symbol hiding:
> There was some concerns for leaking source code information when using
> bitcode feature. One approach to avoid the leak is to add a pass which
> renames all the globals and metadata strings. The also keeps a reverse map
> in case the original name needs to be recovered. The final bitcode should
> contain no more symbols or debug info than a stripped binary. To make sure
> modified bitcode can still be linked correctly, the renaming need to be
> consistent across all bitcode participated in the linking and everything
> that is external of the linkage unit need to be preserved. This means the
> pass can only be run during the linking and requires some LTO api.
>
How are you planning to ensure the safety of the reverse map? Seems that
requiring linking is a bit icky, but might work. Are you mostly worried
about function names that could be stripped out? What LTO api are you
envisioning here?

> 4. Debug info strip to line-tables pass:
> As the name suggested, this pass strip down the full debug info to
> line-tables only. This is also one of the steps we took to prevent the leak
> of source code information in bitcode.
>
I'm very curious about what's going on here. Could you elaborate? :)

Thanks a ton for the update - glad to see this being worked on!

-eric


Steven Wu via llvm-dev

2016-Jun-13 16:37 UTC

[llvm-dev] [RFC] Embedded bitcode and related upstream (Part II)

Thanks for the feedback! Replies inline. 
> On Jun 12, 2016, at 11:44 PM, Eric Christopher <echristo at gmail.com> wrote:
>
> Hi Steven,
>
> Great to see the commentary and updates here. I've got a few questions
> about some of this work. It might be nice to see some separate RFCs for a
> couple of things, but we'll figure that out after you send out patches
> probably :)
>
>> What needs to be improved:
>> 1. Whitelist for command line options that can be used with bitcode:
>> Current trunk implementation embeds all the cc1 command line options (that
>> includes header include paths, warning flags and other front-end options)
>> in the command line section. That is lot of redundant information. To
>> re-create the object file from the embedded optimized bitcode, most of
>> these options are useless. On the other hand, they can leak information of
>> the source code. One solution will be keeping a list of all the options
>> that can affect code generation but not encoded in the bitcode. I have
>> internally prototyped with disallowing these options explicitly and allowed
>> only the reminder of the  options to be embedded
>> (http://reviews.llvm.org/D17394). A better solution might be encoding that
>> information in "Options.td" as specific group.
>
> This is really interesting. I'm not a particularly security minded person
> so I don't have a lot of commentary there. An explicit whitelist sounds a
> bit painful to keep maintained, explicitly having a group in Options.td
> sounds pretty nice. You'll need to add them to multiple groups, but it
> seems pretty nice.

I have already implemented the new approach in http://reviews.llvm.org/D21230.
It creates a new group for all the cc1 options that can affect codegen but do
not have a corresponding attribute in the bitcode. While writing this patch, I
realized it is also a good idea to extend the group to driver flags, so the
clang driver can issue warnings when these flags are used with LTO, because
they are likely to be dropped in the process. That is the next thing I will do
if someone reviews my patch and agrees that it is the right thing to do.
>
>> 2. Assembly input handling:
>> This is a workaround to allow source code written in assembly to work with
>> "-fembed-bitcode" options. When compiling assembly source code with
>> "-fembed-bitcode", clang-as creates an empty section "__LLVM, __asm" in the
>> object file. That is just a way to distinguish object files compiled from
>> assembly source from those compiled from higher level source code but
>> forgot to use "-fembed-bitcode" options. Linker can use this section to
>> diagnose if "-fembed-bitcode" is consistently used on all the object files
>> participated in the linking.
>
> I'm surprised you want a separate and empty section and not a header flag
> as those are easier to keep around and won't take up a precious mach-o
> section. There are probably other options here as well. There are probably
> other options or concerns that someone shipping bitcode might have here as
> well, but I'm sure those are being talked about - doesn't have too much
> affect on the community though.

I suppose you mean the alternative is to burn a Mach-O command for that. Well,
that is a limited resource and we don't have much left. Plus, using an empty
section makes this accessible to other binary formats, not only Mach-O files.
I also had an interesting thought about handling the assembly, which is to
wrap it in module assembly in a bitcode file. I am not sure it would preserve
all the semantics of the original assembly, and it would mean I need to
somehow teach the assembler about bitcode (which might make this not very
attractive). Yes, you might be right that this doesn't affect the community.
If no one else is interested in a solution for the problem we have, then this
might not be suitable for contributing. I am happy to keep it downstream.
>
>> 3. Bitcode symbol hiding:
>> There was some concerns for leaking source code information when using
>> bitcode feature. One approach to avoid the leak is to add a pass which
>> renames all the globals and metadata strings. The also keeps a reverse map
>> in case the original name needs to be recovered. The final bitcode should
>> contain no more symbols or debug info than a stripped binary. To make sure
>> modified bitcode can still be linked correctly, the renaming need to be
>> consistent across all bitcode participated in the linking and everything
>> that is external of the linkage unit need to be preserved. This means the
>> pass can only be run during the linking and requires some LTO api.
>
> How are you planning to ensure the safety of the reverse map? Seems that
> requiring linking is a bit icky, but might work. Are you mostly worried
> about function names that could be stripped out? What LTO api are you
> envisioning here?

The reverse map is emitted as a file separate from the output binary/bitcode.
It should not be shipped together with the binary output, just like a dSYM
bundle. The reason the pass needs to run after linking is a limitation of the
symbol hiding technique: it requires that the symbols be resolved. Think about
the following case:
a.o:
T export_symbol
T global_symbol
t local_symbol
b.o:
U global_symbol

To make sure the bitcode after the symbol hiding pass can still link and
produce the same output, the pass needs to rename them:
a.o:
T export_symbol  --> export_symbol (preserve)
T global_symbol  --> hidden_symbol_1 (rename, but it needs to have the same name as the one in b.o)
t local_symbol   --> hidden_symbol_2 (rename, but don't care what it becomes)
b.o:
U global_symbol  --> hidden_symbol_1
The pass needs to know which symbols to keep, and it needs a global renaming
table so that the names after renaming are consistent across all the modules.
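
To make the renaming rules in the a.o/b.o example concrete, here is a small Python sketch. It is not the actual pass: it models each module as a list of nm-style (kind, symbol) pairs, and the function name `hide_symbols` and the data layout are assumptions for illustration.

```python
def hide_symbols(modules, exported):
    """Rename symbols consistently across all modules of one linkage unit.

    modules:  dict of module name -> list of (kind, symbol) pairs, where kind
              is 'T' (defined global), 't' (defined local) or 'U' (undefined).
    exported: set of symbol names that must be preserved.
    Returns (renamed_modules, reverse_map).
    """
    # Globals defined somewhere inside the linkage unit. An undefined symbol
    # that is not in this set is external to the unit and must be preserved.
    defined_globals = {
        sym for symbols in modules.values()
        for kind, sym in symbols if kind == "T"
    }
    global_table = {}  # original global name -> hidden name (shared table)
    reverse_map = {}   # hidden name -> original name (kept out of the binary)
    counter = 0

    def fresh(original):
        nonlocal counter
        counter += 1
        hidden = f"hidden_symbol_{counter}"
        reverse_map[hidden] = original
        return hidden

    renamed = {}
    for name, symbols in modules.items():
        out = []
        for kind, sym in symbols:
            if sym in exported or (kind == "U" and sym not in defined_globals):
                new = sym                   # preserve
            elif kind == "t":
                new = fresh(sym)            # local: any unique name will do
            else:
                if sym not in global_table:
                    global_table[sym] = fresh(sym)
                new = global_table[sym]     # same name in every module
            out.append((kind, new))
        renamed[name] = out
    return renamed, reverse_map


if __name__ == "__main__":
    modules = {
        "a.o": [("T", "export_symbol"), ("T", "global_symbol"), ("t", "local_symbol")],
        "b.o": [("U", "global_symbol")],
    }
    renamed, reverse_map = hide_symbols(modules, exported={"export_symbol"})
    print(renamed)
    print(reverse_map)
```

The shared `global_table` is the reason the renaming cannot be done one module at a time: b.o only learns the hidden name of `global_symbol` from the table filled in while processing a.o.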
>
>> 4. Debug info strip to line-tables pass:
>> As the name suggested, this pass strip down the full debug info to
>> line-tables only. This is also one of the steps we took to prevent the leak
>> of source code information in bitcode.
>
> I'm very curious about what's going on here. Could you elaborate? :)

Cc Adrian
He would know more about it. I only know that it can reconstruct
-gline-tables-only debug info from full debug info. We use it as a part of the
bitcode pipeline because we don't want the bitcode file to be exceedingly
large, but I can see this pass being useful in other circumstances.

Steven

Jonas Devlieghere via llvm-dev

2016-Jul-25 10:24 UTC

[llvm-dev] [RFC] Embedded bitcode and related upstream (Part II)

Hi,

I hope I'm not breaking any mailing list etiquette by replying to this
mail, but if I am then please accept my apologies.

On Fri, Jun 3, 2016 at 8:36 PM, Steven Wu via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi everyone
>
> 3. Bitcode symbol hiding:
> There was some concerns for leaking source code information when using
> bitcode feature. One approach to avoid the leak is to add a pass which
> renames all the globals and metadata strings. The also keeps a reverse map
> in case the original name needs to be recovered. The final bitcode should
> contain no more symbols or debug info than a stripped binary. To make sure
> modified bitcode can still be linked correctly, the renaming need to be
> consistent across all bitcode participated in the linking and everything
> that is external of the linkage unit need to be preserved. This means the
> pass can only be run during the linking and requires some LTO api.
Regarding the symbol map, are you planning to upstream a pass that
restores the symbols? I have been trying to do this myself in order to
reverse the "BCSymbolMap". However this turned out to be less
straightforward than I'd hoped. Any info on this would be greatly
appreciated!
Cheers,
Jonas

Steven Wu via llvm-dev

2016-Jul-25 16:01 UTC

[llvm-dev] [RFC] Embedded bitcode and related upstream (Part II)

> On Jul 25, 2016, at 3:24 AM, Jonas Devlieghere <jonas at devlieghere.com> wrote:
> 
> Hi,
> 
> I hope I'm not breaking any mailing list etiquette by replying to this
> mail, but if I am then please accept my apologies.
> 
> On Fri, Jun 3, 2016 at 8:36 PM, Steven Wu via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Hi everyone
>> 
>> 3. Bitcode symbol hiding:
>> There was some concerns for leaking source code information when using
>> bitcode feature. One approach to avoid the leak is to add a pass which
>> renames all the globals and metadata strings. The also keeps a reverse map
>> in case the original name needs to be recovered. The final bitcode should
>> contain no more symbols or debug info than a stripped binary. To make sure
>> modified bitcode can still be linked correctly, the renaming need to be
>> consistent across all bitcode participated in the linking and everything
>> that is external of the linkage unit need to be preserved. This means the
>> pass can only be run during the linking and requires some LTO api.
> 
> Regarding the symbol map, are you planning to upstream a pass that
> restores the symbols? I have been trying to do this myself in order to
> reverse the "BCSymbolMap". However this turned out to be less
> straightforward than I'd hoped. Any info on this would be greatly
> appreciated!
We have tools to restore symbols in the dSYM bundle (check the dsymutil
-symbol-map option in the Apple toolchain).
I don't think we have a pass to restore the symbols in the bitcode now, but
that should be very straightforward and I am happy to implement one as a part
of item 3.
Of course, that will only happen if the community thinks this feature is
beneficial to them. In the meantime, if you need assistance, please file a
radar with Apple at https://bugreport.apple.com.

Steven


Nico Weber via llvm-dev

2016-Nov-30 15:46 UTC

[llvm-dev] [RFC] Embedded bitcode and related upstream (Part II)

Hi Steven,

On Fri, Jun 3, 2016 at 2:36 PM, Steven Wu via cfe-commits <
cfe-commits at lists.llvm.org> wrote:
> Hi everyone
>
> 2. Assembly input handling:
> This is a workaround to allow source code written in assembly to work with
> "-fembed-bitcode" options. When compiling assembly source code with
> "-fembed-bitcode", clang-as creates an empty section "__LLVM, __asm" in the
> object file. That is just a way to distinguish object files compiled from
> assembly source from those compiled from higher level source code but
> forgot to use "-fembed-bitcode" options. Linker can use this section to
> diagnose if "-fembed-bitcode" is consistently used on all the object files
> participated in the linking.
>
It looks like shipping Xcode's clang has this behavior, but open-source
clang still doesn't. Can you contribute it? It's very useful to us if
open-source clang has the same features as the clang shipping in Xcode.
(That last sentence is true in general, not just for this specific feature.)


Alex L via llvm-dev

2016-Nov-30 17:37 UTC

[llvm-dev] [RFC] Embedded bitcode and related upstream (Part II)

On 30 November 2016 at 15:46, Nico Weber via cfe-commits <
cfe-commits at lists.llvm.org> wrote:
> Hi Steven,
>
> On Fri, Jun 3, 2016 at 2:36 PM, Steven Wu via cfe-commits <
> cfe-commits at lists.llvm.org> wrote:
>
>> Hi everyone
>>
>> 2. Assembly input handling:
>> This is a workaround to allow source code written in assembly to work
>> with "-fembed-bitcode" options. When compiling assembly source code with
>> "-fembed-bitcode", clang-as creates an empty section "__LLVM, __asm" in the
>> object file. That is just a way to distinguish object files compiled from
>> assembly source from those compiled from higher level source code but
>> forgot to use "-fembed-bitcode" options. Linker can use this section to
>> diagnose if "-fembed-bitcode" is consistently used on all the object files
>> participated in the linking.
>>
>
> It looks like shipping Xcode's clang has this behavior, but open-source
> clang still doesn't. Can you contribute it? It's very useful to us if
> open-source clang has the same features as the clang shipping in Xcode.
> (That last sentence is true in general, not just for this specific feature.)
>
Just FYI, Steven is away on vacation for a month. I think he should be back
in January.

