thr3ads.net - llvm dev - [llvm-dev] End-to-end -fembed-bitcode .llvmbc and .llvmcmd [Aug 2020]

David Blaikie via llvm-dev

2020-Aug-28 18:22 UTC

[llvm-dev] End-to-end -fembed-bitcode .llvmbc and .llvmcmd

You should probably pull in some folks who implemented/maintain the feature
for Darwin.

I guess they aren't linking this info, but only communicating in the object
file between tools - maybe they flag these sections (either in the object,
or by the linker) as ignored/dropped during linking. That semantic could be
implemented in ELF too by marking the sections SHF_IGNORED or something
(same-file split DWARF uses this technique).

So maybe the goal/desire is to have a different semantic, rather than the
equivalent semantic being different on ELF compared to MachO.

So if it's a different semantic - yeah, I'd guess a flag that prefixes the
module metadata with a length would make sense, then it can be linked
naturally on any platform. (if the "don't link these sections" support on
Darwin is done by the linker hardcoding the section name - then maybe this
flag would also put the data in a different section that isn't linker
stripped on Darwin, so users interested in getting everything linked
together can do so on any platform)

But if this data is linked, then it'd be hard to know which command line
goes with which module, yes? So maybe it'd make sense then to have the
command line as a header before the module, in the same section. So they're
kept together.
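
As a rough illustration of the "length prefix, with the command line as a header before the module" idea, here is a minimal sketch in plain C++. Everything in it is hypothetical: the record layout, the names EmbeddedRecord, encodeRecord and decodeSection, and the choice of 32-bit little-endian lengths are assumptions made for the example, not anything clang or LLVM emits. It also assumes the linker adds no padding between the per-object pieces.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: one per object file, so that a linker's plain
// concatenation of the section stays self-describing.
struct EmbeddedRecord {
  std::string CommandLine; // flags used to compile this module
  std::string Bitcode;     // raw bitcode bytes
};

// Append a 32-bit little-endian value to Out.
static void appendU32(std::string &Out, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<char>((V >> (8 * I)) & 0xFF));
}

// Read a 32-bit little-endian value at Pos; advance Pos on success.
static bool readU32(const std::string &In, size_t &Pos, uint32_t &V) {
  if (In.size() - Pos < 4)
    return false;
  V = 0;
  for (int I = 0; I < 4; ++I)
    V |= static_cast<uint32_t>(static_cast<unsigned char>(In[Pos + I]))
         << (8 * I);
  Pos += 4;
  return true;
}

// Assumed layout: [cmd length][cmd bytes][bitcode length][bitcode bytes].
std::string encodeRecord(const EmbeddedRecord &R) {
  std::string Out;
  appendU32(Out, static_cast<uint32_t>(R.CommandLine.size()));
  Out += R.CommandLine;
  appendU32(Out, static_cast<uint32_t>(R.Bitcode.size()));
  Out += R.Bitcode;
  return Out;
}

// Split a concatenated section back into records; nullopt if malformed.
std::optional<std::vector<EmbeddedRecord>>
decodeSection(const std::string &Section) {
  std::vector<EmbeddedRecord> Records;
  size_t Pos = 0;
  while (Pos < Section.size()) {
    EmbeddedRecord R;
    uint32_t Len = 0;
    if (!readU32(Section, Pos, Len) || Section.size() - Pos < Len)
      return std::nullopt;
    R.CommandLine = Section.substr(Pos, Len);
    Pos += Len;
    if (!readU32(Section, Pos, Len) || Section.size() - Pos < Len)
      return std::nullopt;
    R.Bitcode = Section.substr(Pos, Len);
    Pos += Len;
    Records.push_back(std::move(R));
  }
  return Records;
}
```

Because each command line travels right in front of its module, the pairing survives concatenation, which is the point of keeping both in one section.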

On Thu, Aug 27, 2020 at 10:26 PM Mircea Trofin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Thanks, Sean, Steven,
>
> to explore this a bit further, are there currently users for non-Darwin
> cases? I wonder if it would be an issue if we inserted markers in the
> section (maybe as an opt-in, if there were users), such that, when
> concatenated, the resulting section would be self-describing, for a
> specialized reader, of course - basically, achieve what Sean described, but
> "by design".
>
> For instance, each .o file could have a size, followed by the payload
> (maybe include in the payload the name of the module, too; maybe compress
> it, too). Same for the .llvmcmd case.
>
> On Thu, Aug 27, 2020 at 6:57 PM Sean Bartell <smbarte2 at illinois.edu>
> wrote:
>
>> Hi Mircea,
>>
>> If you use an ordinary linker that concatenates .llvmbc sections, you can
>> use this code to get the size of each bitcode module. As far as I know,
>> there's no clean way to separate the .llvmcmd sections without making
>> assumptions about what options were used.
>>
>> // Given a bitcode file followed by garbage, get the size of the actual
>> // bitcode. This only works correctly with some kinds of garbage (in
>> // particular, it will work if the bitcode file is followed by zeros, or if
>> // it's followed by another bitcode file).
>> size_t GetBitcodeSize(MemoryBufferRef Buffer) {
>>   const unsigned char *BufPtr =
>>       reinterpret_cast<const unsigned char *>(Buffer.getBufferStart());
>>   const unsigned char *EndBufPtr =
>>       reinterpret_cast<const unsigned char *>(Buffer.getBufferEnd());
>>   if (isBitcodeWrapper(BufPtr, EndBufPtr)) {
>>     const unsigned char *FixedBufPtr = BufPtr;
>>     if (SkipBitcodeWrapperHeader(FixedBufPtr, EndBufPtr, true))
>>       report_fatal_error("Invalid bitcode wrapper");
>>     return EndBufPtr - BufPtr;
>>   }
>>
>>   if (!isRawBitcode(BufPtr, EndBufPtr))
>>     report_fatal_error("Invalid magic bytes; not a bitcode file?");
>>
>>   BitstreamCursor Reader(Buffer);
>>   Reader.Read(32); // skip signature
>>   while (true) {
>>     size_t EntryStart = Reader.getCurrentByteNo();
>>     BitstreamEntry Entry =
>>         Reader.advance(BitstreamCursor::AF_DontAutoprocessAbbrevs);
>>     if (Entry.Kind == BitstreamEntry::SubBlock) {
>>       if (Reader.SkipBlock())
>>         report_fatal_error("Invalid bitcode file");
>>     } else {
>>       // We must have reached the end of the module.
>>       return EntryStart;
>>     }
>>   }
>> }
>>
>> Sean
>>
>> On Thu, Aug 27, 2020, at 13:17, Steven Wu via llvm-dev wrote:
>>
>> Hi Mircea
>>
>> From the RFC you mentioned, that is a Darwin specific implementation,
>> which later got extended to support other targets. The main use case for
>> the embed bitcode option is to allow the compiler to pass intermediate IR
>> and command flags in the object file it produces for later use. For Darwin,
>> it is used for bitcode recompilation, and some might use it to achieve other
>> goals.
>>
>> In order to use this information properly, you need to have tools that
>> understand the layout and sections for embedded bitcode. You can't just use
>> an ordinary linker, because like you said, an ELF linker will just append
>> the bitcode. Depending on what you are trying to achieve, you need to
>> implement the downstream tools, like linker, binary analysis tools, etc. to
>> understand this concept.
>>
>> Steven
>>
>> On Aug 24, 2020, at 7:10 PM, Mircea Trofin via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Hello,
>>
>> I'm trying to understand how .llvmbc and .llvmcmd fit into an end-to-end
>> story. From the RFC
>> <http://lists.llvm.org/pipermail/llvm-dev/2016-February/094851.html>,
>> and reading through the implementation, I'm piecing together that the goal
>> was to enable capturing IR right after clang and before passing it to
>> LLVM's optimization passes, as well as the command line options needed for
>> later compiling that IR to the same native object it was compiled to
>> originally (with the same compiler).
>>
>> Here's what I don't understand: say you have a.o and b.o compiled with
>> -fembed-bitcode=all. They are linked into a binary called my_binary. How do
>> you re-create the corresponding IR for modules a and b (let's call them
>> a.bc and b.bc), and their corresponding command lines? From what I can
>> tell, the linker just concatenates the IR for a and b in my_binary's
>> .llvmbc, and the same for the command line in .llvmcmd. Is there a
>> separator maybe I missed? For .llvmcmd, I could see how *maybe* -cc1 could
>> be that separator, what about the .llvmbc part? The magic number?
>>
>> Thanks!
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Mircea Trofin via llvm-dev

2020-Aug-28 18:57 UTC

[llvm-dev] End-to-end -fembed-bitcode .llvmbc and .llvmcmd

On Fri, Aug 28, 2020 at 11:22 AM David Blaikie <dblaikie at gmail.com> wrote:
> You should probably pull in some folks who implemented/maintain the
> feature for Darwin.
>
> I guess they aren't linking this info, but only communicating in the
> object file between tools - maybe they flag these sections (either in the
> object, or by the linker) as ignored/dropped during linking. That semantic
> could be implemented in ELF too by marking the sections SHF_IGNORED or
> something (same-file split DWARF uses this technique).
>
> So maybe the goal/desire is to have a different semantic, rather than the
> equivalent semantic being different on ELF compared to MachO.
>
> So if it's a different semantic - yeah, I'd guess a flag that prefixes the
> module metadata with a length would make sense, then it can be linked
> naturally on any platform. (if the "don't link these sections" support on
> Darwin is done by the linker hardcoding the section name - then maybe this
> flag would also put the data in a different section that isn't linker
> stripped on Darwin, so users interested in getting everything linked
> together can do so on any platform)
>
> But if this data is linked, then it'd be hard to know which command line
> goes with which module, yes? So maybe it'd make sense then to have the
> command line as a header before the module, in the same section. So they're
> kept together.

This last point was my follow-up :)


Fangrui Song via llvm-dev

2020-Aug-28 21:16 UTC

[llvm-dev] End-to-end -fembed-bitcode .llvmbc and .llvmcmd

On 2020-08-28, Mircea Trofin via llvm-dev wrote:
>On Fri, Aug 28, 2020 at 11:22 AM David Blaikie <dblaikie at gmail.com> wrote:
>
>> You should probably pull in some folks who implemented/maintain the
>> feature for Darwin.
>>
>> I guess they aren't linking this info, but only communicating in the
>> object file between tools - maybe they flag these sections (either in the
>> object, or by the linker) as ignored/dropped during linking. That semantic
>> could be implemented in ELF too by marking the sections SHF_IGNORED or
>> something (same-file split DWARF uses this technique).

The .llvmbc / .llvmcmd section does not have the SHF_EXCLUDE flag. It
will be retained in the linked image.

>> So maybe the goal/desire is to have a different semantic, rather than the
>> equivalent semantic being different on ELF compared to MachO.
>>
>> So if it's a different semantic - yeah, I'd guess a flag that prefixes the
>> module metadata with a length would make sense, then it can be linked
>> naturally on any platform. (if the "don't link these sections" support on
>> Darwin is done by the linker hardcoding the section name - then maybe this
>> flag would also put the data in a different section that isn't linker
>> stripped on Darwin, so users interested in getting everything linked
>> together can do so on any platform)
>>
>> But if this data is linked, then it'd be hard to know which command line
>> goes with which module, yes? So maybe it'd make sense then to have the
>> command line as a header before the module, in the same section. So they're
>> kept together.
>>
>This last point was my follow-up :)

A module has a source_filename field.

clang -fembed-bitcode=all -c d/a.c
llvm-objcopy --dump-section=.llvmbc=a.bc a.o /dev/null
llvm-dis < a.bc => source_filename = "d/a.c"

The missing piece is a mechanism to extract a module from concatenated
bitcode (llvm-dis supports multi-module bitcode but not concatenated
bitcode https://reviews.llvm.org/D70153). I'll be happy to look into it:)
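
To make the "concatenated bitcode" problem concrete, here is a small sketch in plain C++ that scans a dumped .llvmbc section for the raw bitcode magic bytes ('B', 'C', 0xC0, 0xDE). The function name findBitcodeMagicOffsets is made up for this example, and the 4-byte alignment of module starts is an assumption. The offsets it returns are only candidates: the same four bytes can occur inside a module's data, so a real splitter still needs to walk the bitstream (as GetBitcodeSize earlier in the thread does) rather than trust the magic alone.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Return every 4-byte-aligned offset at which the raw bitcode magic
// 'B' 'C' 0xC0 0xDE appears. These are candidate module starts only.
std::vector<size_t>
findBitcodeMagicOffsets(const std::vector<uint8_t> &Section) {
  static const uint8_t Magic[4] = {0x42, 0x43, 0xC0, 0xDE};
  std::vector<size_t> Offsets;
  // Assumption: modules start on 4-byte boundaries within the section.
  for (size_t I = 0; I + 4 <= Section.size(); I += 4) {
    bool Match = true;
    for (size_t J = 0; J < 4; ++J)
      Match = Match && Section[I + J] == Magic[J];
    if (Match)
      Offsets.push_back(I);
  }
  return Offsets;
}
```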

---

.llvmcmd may need the source file to be more useful.

Sean Bartell via llvm-dev

2020-Aug-30 02:22 UTC

[llvm-dev] End-to-end -fembed-bitcode .llvmbc and .llvmcmd

On Fri, Aug 28, 2020, at 16:31, Mircea Trofin via llvm-dev wrote:
>
> On Fri, Aug 28, 2020 at 2:16 PM Fangrui Song <maskray at google.com> wrote:
>> On 2020-08-28, Mircea Trofin via llvm-dev wrote:
>> >On Fri, Aug 28, 2020 at 11:22 AM David Blaikie <dblaikie at gmail.com> wrote:
>> >
>> >> So maybe the goal/desire is to have a different semantic, rather than the
>> >> equivalent semantic being different on ELF compared to MachO.
>> >>
>> >> So if it's a different semantic - yeah, I'd guess a flag that prefixes the
>> >> module metadata with a length would make sense, then it can be linked
>> >> naturally on any platform. (if the "don't link these sections" support on
>> >> Darwin is done by the linker hardcoding the section name - then maybe this
>> >> flag would also put the data in a different section that isn't linker
>> >> stripped on Darwin, so users interested in getting everything linked
>> >> together can do so on any platform)
>> >>
>> >> But if this data is linked, then it'd be hard to know which command line
>> >> goes with which module, yes? So maybe it'd make sense then to have the
>> >> command line as a header before the module, in the same section. So they're
>> >> kept together.
>> >>
>> >This last point was my follow-up :)
>>
>> A module has a source_filename field.
>>
>> clang -fembed-bitcode=all -c d/a.c
>> llvm-objcopy --dump-section=.llvmbc=a.bc a.o /dev/null
>> llvm-dis < a.bc => source_filename = "d/a.c"
>>
>> The missing piece is a mechanism to extract a module from concatenated
>> bitcode (llvm-dis supports multi-module bitcode but not concatenated
>> bitcode https://reviews.llvm.org/D70153). I'll be happy to look into it:)
>>
>> ---
>>
>> .llvmcmd may need the source file to be more useful.
>
> Right - I think, for the non-Darwin concatenated case, all three of us
> (David, you, and I) are thinking along the lines of keeping together: the
> module name, the bitcode, and the command line - effectively not using
> .llvmcmd, and being able to correctly extract, by design, the rest of the
> information.

Here's the format I would suggest:

1. Put command-line flags in the module metadata instead of .llvmcmd.
2. Put each module in the bitcode wrapper supported by SkipBitcodeWrapperHeader,
which includes a length field. I think LLVM only generates the wrapper for
Darwin, but it can read the wrapper correctly on all platforms.
3. Change the .llvmbc section alignment so that no extra zeros are added between
modules.

My use case: I'm using -fembed-bitcode on Linux as an alternative to the
wllvm/whole-program-llvm tool. For my purposes, it'd be nice to also keep
track of linker flags and other linker input files, but I can get most of what I
need from the modules alone.

Sean

Fāng-ruì Sòng via llvm-dev

2020-Aug-30 04:48 UTC

[llvm-dev] End-to-end -fembed-bitcode .llvmbc and .llvmcmd

On Sat, Aug 29, 2020 at 7:22 PM Sean Bartell via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Here's the format I would suggest:
>
> 1. Put command-line flags in the module metadata instead of .llvmcmd.
> 2. Put each module in the bitcode wrapper supported by
> SkipBitcodeWrapperHeader, which includes a length field. I think LLVM only
> generates the wrapper for Darwin, but it can read the wrapper correctly on
> all platforms.
> 3. Change the .llvmbc section alignment so that no extra zeros are added
> between modules.
>
> My use case: I'm using -fembed-bitcode on Linux as an alternative to the
> wllvm/whole-program-llvm tool. For my purposes, it'd be nice to also keep
> track of linker flags and other linker input files, but I can get most of
> what I need from the modules alone.
>
> Sean

I looked into the bitcode file format a bit today. The bitcode is
streaming style, and I think an optional size field may be useful.
https://reviews.llvm.org/D86847 proposes to add a BITCODE_SIZE_BLOCK_ID
block. We actually don't need a container because the
MODULE_CODE_SOURCE_FILENAME record encodes the source filename. We
can do a lightweight parse and obtain the field.
This should be fast because there are typically very few
blocks/records preceding MODULE_CODE_SOURCE_FILENAME.

For .llvmcmd, I am on the fence about moving it into the bitcode. Downside:
retrieving the command line will be more difficult...
I'd like to mention that the functionality duplicates the existing
-frecord-command-line a bit...

% readelf -p .GCC.command.line a.o

String dump of section '.GCC.command.line':
  [     1]  /tmp/clang-12 -c -frecord-command-line a.c

GCC's -frecord-gcc-switches uses a different format (some folks
consider it inferior to clang's format; worse, the section is
SHF_MERGE|SHF_STRINGS):
% readelf -p .GCC.command.line a.o

String dump of section '.GCC.command.line':
  [     0]  -imultiarch x86_64-linux-gnu
  [    1d]  a.c
  [    21]  -mtune=generic
  [    30]  -march=x86-64
  [    3e]  -frecord-gcc-switches
  [    54]  -fasynchronous-unwind-tables
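
The dumps above are lists of NUL-terminated strings, which is also the easiest thing to pull back out of a section once it has been dumped to a file. The sketch below is a plain C++ helper with a made-up name, splitNulSeparated, that splits such a byte blob into its strings and skips empty entries such as padding. Treating .llvmcmd the same way is an assumption of this example (that each argument is stored followed by a NUL byte), and it does nothing to recover which object file a given string came from.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a blob of NUL-terminated strings into its non-empty entries.
std::vector<std::string> splitNulSeparated(const std::string &Section) {
  std::vector<std::string> Out;
  size_t Start = 0;
  while (Start < Section.size()) {
    size_t End = Section.find('\0', Start);
    if (End == std::string::npos)
      End = Section.size(); // last entry without a terminator
    if (End > Start)        // skip empty strings (e.g. padding NULs)
      Out.push_back(Section.substr(Start, End - Start));
    Start = End + 1;
  }
  return Out;
}
```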
