[llvm-dev] RFC: APIs for bitcode files containing multiple modules [Oct 2016]

Mehdi Amini via llvm-dev

2016-Oct-28 21:21 UTC

[llvm-dev] RFC: APIs for bitcode files containing multiple modules

> On Oct 28, 2016, at 2:16 PM, Will Dietz <willdtz at gmail.com> wrote:
> 
> On Fri, Oct 28, 2016 at 2:06 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>> On Fri, Oct 28, 2016 at 6:11 AM, Will Dietz <willdtz at gmail.com> wrote:
>>> 
>>> On Wed, Oct 26, 2016 at 2:04 PM, Peter Collingbourne via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> On Tue, Oct 25, 2016 at 8:36 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>>>> 
>>>>> On Oct 25, 2016, at 6:28 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> As mentioned in my recent RFC entitled "RFC: a more detailed design for
>>>>> ThinLTO + vcall CFI", I would like to introduce the ability for bitcode
>>>>> files to contain multiple modules. In https://reviews.llvm.org/D24786 I
>>>>> took a step towards that by proposing a change to the module format so
>>>>> that the block info block is stored at the top level. The next step is to
>>>>> think about what the API would look like for reading and writing multiple
>>>>> modules.
>>>>> 
>>>>> Here's what I have in mind. To create a multi-module bitcode file, you
>>>>> would create a BitcodeWriter object and add modules to it:
>>>>> 
>>>>> BitcodeWriter W(OS);
>>>>> W.addModule(M1);
>>>>> W.addModule(M2);
>>>>> W.write();
>>>>> 
>>>>> 
>>>>> That requires the two modules to outlive the call to write(); instead, the
>>>>> API could be:
>>>>> 
>>>>> BitcodeWriter W(OS);
>>>>> W.writeModule(M1);
>>>>> // delete M1
>>>>> // ...
>>>>> // create M2
>>>>> W.writeModule(M2);
>>>>> 
>>>>> (Maybe you had this in mind, but the API naming didn’t reflect it so I’m
>>>>> not sure).
>>>> 
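
A minimal sketch of the two writer lifetimes contrasted above, assuming hypothetical `ToyModule`, `BatchingWriter` and `StreamingWriter` types that stand in for `llvm::Module` and the proposed `BitcodeWriter` (this is not LLVM's actual API). The batching writer only records pointers, so every module must outlive `write()`; the streaming writer serializes each module immediately, so it can be deleted right after `writeModule()`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Module: a name plus some payload.
struct ToyModule {
  std::string Name;
  std::string Payload;
};

// Serialize one module into the output buffer (toy format: "[name:payload]").
void serialize(const ToyModule &M, std::string &Out) {
  Out += "[" + M.Name + ":" + M.Payload + "]";
}

// Batching model (addModule + write): only pointers are stored, so every
// module must stay alive until write() is called.
class BatchingWriter {
  std::string &Out;
  std::vector<const ToyModule *> Pending;

public:
  explicit BatchingWriter(std::string &Buffer) : Out(Buffer) {}
  void addModule(const ToyModule &M) { Pending.push_back(&M); }
  void write() {
    for (const ToyModule *M : Pending)
      serialize(*M, Out);
    Pending.clear();
  }
};

// Streaming model (writeModule): each module is serialized immediately,
// so the caller may destroy it as soon as writeModule() returns.
class StreamingWriter {
  std::string &Out;
  unsigned NumWritten = 0;

public:
  explicit StreamingWriter(std::string &Buffer) : Out(Buffer) {}
  void writeModule(const ToyModule &M) {
    serialize(M, Out);
    ++NumWritten;
  }
  unsigned numWritten() const { return NumWritten; }
};

// M1 is destroyed before M2 exists, which only the streaming model allows.
std::string writeTwoModulesStreaming() {
  std::string Buffer;
  StreamingWriter W(Buffer);
  {
    auto M1 = std::make_unique<ToyModule>(ToyModule{"m1", "a"});
    W.writeModule(*M1);
  } // M1 is deleted here; its bytes are already in Buffer.
  auto M2 = std::make_unique<ToyModule>(ToyModule{"m2", "b"});
  W.writeModule(*M2);
  assert(W.numWritten() == 2);
  return Buffer;
}
```

The trade-off is the one discussed in the thread: batching lets the writer inspect all modules before emitting anything shared, while streaming keeps only one module alive at a time.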
>>>> 
>>>> In the API I prototyped, I took the maximum BitsRequiredForTypeIndices
>>>> value from all the modules, and used it to produce the abbreviations for
>>>> the top-level block info block (without this I was seeing "Unexpected
>>>> abbrev ordering!" errors in the bitcode writer as a result of emitting
>>>> the "same" abbreviation multiple times). That would have required us to
>>>> keep the modules around until the call to write(). However, let me
>>>> revisit this, because it does not seem necessary (i.e. we can just
>>>> continue to emit block info blocks within the module block except with
>>>> different abbreviation numbers for each module).
>>>>> 
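
To illustrate why the prototype needed every module before `write()`: a single top-level block info block implies one abbreviation width for type indices, so that width must be the maximum over all modules. This sketch assumes the per-module width is ceil(log2(NumTypes + 1)), which is our understanding of how LLVM's `ValueEnumerator` computes `BitsRequiredForTypeIndices`; the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// ceil(log2(X)) for X >= 1, computed without floating point.
unsigned log2Ceil(uint32_t X) {
  assert(X >= 1 && "log2Ceil is undefined for 0");
  unsigned Bits = 0;
  while ((uint64_t(1) << Bits) < X)
    ++Bits;
  return Bits;
}

// Width of a type-index field for one module.
// Formula assumed from LLVM's ValueEnumerator: ceil(log2(NumTypes + 1)).
unsigned bitsRequiredForTypeIndices(uint32_t NumTypes) {
  return log2Ceil(NumTypes + 1);
}

// A shared, top-level block info block can only use one width, so it must
// be wide enough for every module: the maximum over all of them. Computing
// this requires seeing every module before anything is written.
unsigned sharedTypeIndexWidth(const std::vector<uint32_t> &TypesPerModule) {
  unsigned Width = 0;
  for (uint32_t N : TypesPerModule)
    Width = std::max(Width, bitsRequiredForTypeIndices(N));
  return Width;
}
```

Under the alternative Peter describes (block info emitted inside each module block), each module would simply use `bitsRequiredForTypeIndices` of its own type count, and no module would need to be kept alive across writes.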
>>>>> Reading a multi-module bitcode file would be supported with a
>>>>> BitcodeReader class. Each of the functional reader APIs in ReaderWriter.h
>>>>> would have a member function on BitcodeReader. We would also have a
>>>>> next() member function which would move to the next module in the file.
>>>>> For example:
>>>>> 
>>>>> BitcodeReader R(MBRef);
>>>>> Expected<bool> B = R.hasGlobalValueSummary();
>>> 
>>> What's this used for?
>> 
>> 
>> This would be the equivalent to the existing llvm::hasGlobalValueSummary()
>> function, which currently controls whether we compile a module with
>> regular LTO or with ThinLTO.
>> 
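
A sketch of the routing decision described above, assuming a hypothetical `ModuleInfo` record whose `HasGlobalValueSummary` flag stands in for the result of `hasGlobalValueSummary()` on the current module. Each module in a multi-module file is routed independently, so a single file can feed both the regular LTO and the ThinLTO pipeline.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-module metadata a multi-module reader might expose.
struct ModuleInfo {
  std::string Name;
  bool HasGlobalValueSummary; // stand-in for R.hasGlobalValueSummary()
};

enum class LTOKind { Regular, Thin };

// A module with a summary goes through ThinLTO; one without goes through
// regular LTO.
LTOKind chooseLTOKind(const ModuleInfo &M) {
  return M.HasGlobalValueSummary ? LTOKind::Thin : LTOKind::Regular;
}

// Module names grouped by the pipeline they are destined for.
struct LTOPlan {
  std::vector<std::string> Regular;
  std::vector<std::string> Thin;
};

// Route every module of one file independently.
LTOPlan planLTO(const std::vector<ModuleInfo> &Modules) {
  LTOPlan Plan;
  for (const ModuleInfo &M : Modules) {
    if (chooseLTOKind(M) == LTOKind::Thin)
      Plan.Thin.push_back(M.Name);
    else
      Plan.Regular.push_back(M.Name);
  }
  assert(Plan.Regular.size() + Plan.Thin.size() == Modules.size());
  return Plan;
}
```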
>>> Would there be a "readGlobalValueSummary()"
>>> similar to function summaries?
>> 
>> 
>> There would be a getModuleSummaryIndex() which again would be similar to
>> llvm::getModuleSummaryIndex(). Note that the module summary already covers
>> all global values, not just functions.
>> 
>>>>> std::unique_ptr<Module> M1 = R.getLazyModule(Ctx); // lazily load the first module
>>>>> R.next();
>>>>> std::unique_ptr<Module> M2 = R.parseBitcodeFile(Ctx); // eagerly load the second module
>>> 
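
A toy model of the proposed cursor-style reader, in which per-module queries apply to the "current" module and `next()` advances to the following one. `ToyMultiModuleReader` and `ModuleRecord` are hypothetical; the sketch skips bitstream parsing and the lazy/eager loading distinction entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory view of one module block in a multi-module file.
struct ModuleRecord {
  std::string Name;
  bool HasSummary;
};

// Cursor over the modules of one file, modeled on the proposed BitcodeReader.
class ToyMultiModuleReader {
  const std::vector<ModuleRecord> &Modules;
  std::size_t Cur = 0;

public:
  explicit ToyMultiModuleReader(const std::vector<ModuleRecord> &Ms)
      : Modules(Ms) {}

  bool atEnd() const { return Cur >= Modules.size(); }

  // Per-module query, analogous to R.hasGlobalValueSummary().
  bool hasGlobalValueSummary() const {
    assert(!atEnd() && "no current module");
    return Modules[Cur].HasSummary;
  }

  const std::string &currentName() const {
    assert(!atEnd() && "no current module");
    return Modules[Cur].Name;
  }

  // Move to the next module; returns false once the file is exhausted.
  bool next() {
    if (!atEnd())
      ++Cur;
    return !atEnd();
  }
};

// Usage: collect the names of all modules that carry a summary.
std::vector<std::string>
modulesWithSummary(const std::vector<ModuleRecord> &Ms) {
  std::vector<std::string> Result;
  ToyMultiModuleReader R(Ms);
  for (bool More = !R.atEnd(); More; More = R.next())
    if (R.hasGlobalValueSummary())
      Result.push_back(R.currentName());
  return Result;
}
```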
>>> I'm very excited about the idea of storing multiple modules in a bitcode
>>> file, and the (thin)LTO and CFI goodness you're building using it.
>>> 
>>> I have a few questions about where you're going, if you don't mind--and
>>> it's related to the API in that it's awfully hard to judge an API without
>>> knowing what it's expected to be used for or what the underlying data
>>> represents.
>>> 
>>> On that-- I'm sorry if I've missed this information, but reading through
>>> your RFCs and posts I'm not finding the answer.
>>> Is there a definition/explanation of what it means to have a bitcode file
>>> containing multiple modules?
>>> 
>>> Is this a storage optimization where each module is what today is an
>>> "llvm::Module" but we're encoding them into a single file for
>>> efficiency/convenience reasons?
>> 
>> 
>> Yes, each module would be an llvm::Module. This is more for convenience
>> reasons -- it's the simplest way to split modules that use CFI into a
>> regular LTO part and a ThinLTO part (as described in the RFC entitled
>> "RFC: a more detailed design for ThinLTO + vcall CFI") while storing the
>> entire compiled translation unit in a single file.
>> 
> 
> Hmm, interesting.  Thank you for the explanation.
> 
> This seems to be closer to partitioning a single Module than
> supporting multiple modules (at least not yet).
> Does that seem accurate?
The use case is partitioning a single module. We should have any other assumption
at this level (bitcode).
If you want to stack multiple versions of the same module for various
architectures, that’s fine. You can have your own tooling to load the right
module for a given architecture.
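
A minimal sketch of the kind of client-side tooling mentioned above for per-architecture variants, assuming a hypothetical `ModuleVariant` list that records each module's target triple and its index in the file. The selection policy (first exact triple match) is an illustrative assumption, not something the bitcode layer would define.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of one module in a file that bundles several
// per-architecture variants of the same code.
struct ModuleVariant {
  std::string TargetTriple; // e.g. "aarch64-unknown-linux-gnu"
  std::size_t Index;        // position of the module within the file
};

// Client-side policy, kept outside the bitcode layer: return the index of
// the first variant whose triple matches the host exactly, if any.
std::optional<std::size_t>
selectVariant(const std::vector<ModuleVariant> &Variants,
              const std::string &HostTriple) {
  assert(!HostTriple.empty() && "host triple must be known");
  for (const ModuleVariant &V : Variants)
    if (V.TargetTriple == HostTriple)
      return V.Index;
  return std::nullopt;
}
```

Exact matching is the simplest possible policy; real tooling might need to normalize triples or fall back to a generic variant, which this sketch does not attempt.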

> If so maybe the API should be geared towards that--allow
> "partition-aware" clients to read the pieces individually while
> transparently treating the overall file as a single Module for
> existing clients.
> Just a thought, perhaps this wouldn't work for your use case?
While this could work for this use case, it would make either the bitcode itself
very complex, or loading everything as a single module very inefficient.
> 
> Anyway I actually am very interested in support for multiple modules,
> my use case being for use in shipping software in IR form as part of
> the ALLVM project.  Hence questions about things like linker semantics
> and such.
Right, I’m interested in this as well, and my general vision is to build
`basic blocks` that are as neutral as possible, so that it is easier to reuse
them for cases such as ALLVM.

Hope this helps.

— 
Mehdi

> 
> Don't mean to burden you with accommodating the use-cases of everyone
> else (like myself),
> I guess I was just surprised to see the bitcode format extended in
> this way without an explicit discussion of the bigger picture--
> what this was intended to be used for or why it was necessary, where
> it was going... :).  Mostly because as you say it seems rather useful
> for other parties (heterogeneous, for example) but I suppose we/they
> can chime in and help refine the details later on once these bits are
> committed :).
> 
> Thank you for your explanation, very much appreciated :).
> 
>>> If so, can these modules have different triples?
>> 
>> 
>> That would certainly be possible in principle, but it's not part of my use
>> case. I'd imagine that another potential use case for this could be to
>> allow for LTO when targeting heterogeneous architectures (e.g.
>> CUDA/OpenMP), but I'm not sure about the specifics of how that could work.
>> 
>>> 
>>> Different ("conflicting") definitions for a global?
>> 
>> 
>> In principle such inputs would be rejected by the linker with a duplicate
>> symbol error. That might not be the appropriate thing to do in the
>> heterogeneous case though.
> 
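
A sketch of the conventional linker behaviour described above, assuming each module is reduced to a hypothetical list of the names it defines: any name defined by more than one module is reported as a duplicate. Linkage is ignored here, so weak or linkonce definitions, which real linkers accept in multiple copies, are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical: the names of the global values one module *defines*.
// Declarations are omitted because they never conflict.
using DefinedSymbols = std::vector<std::string>;

// Conventional linker rule applied across all modules of one file: every
// name defined by more than one module is a duplicate-symbol error.
std::vector<std::string>
findDuplicateDefinitions(const std::vector<DefinedSymbols> &Modules) {
  std::map<std::string, unsigned> DefCount;
  for (const DefinedSymbols &M : Modules)
    for (const std::string &Name : M)
      ++DefCount[Name];

  std::vector<std::string> Dups;
  for (const auto &[Name, Count] : DefCount)
    if (Count > 1)
      Dups.push_back(Name);
  // std::map iterates in key order, so the report comes out sorted.
  assert(std::is_sorted(Dups.begin(), Dups.end()));
  return Dups;
}
```

A heterogeneous-target client could replace this rule with its own, for example by only comparing modules that share a target triple.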
> Yeah, it seemed unclear what this would "mean" and I suppose for now it
> is simply something folks can interpret/handle however makes sense for
> their use case :).
> 
>> 
>>> There are also multiple tools that take bitcode as input, and
>>> currently expect a single module.
>>> Will these be made to reject multiple-module bitcode, and if not is
>>> the plan to extend tools to handle multiple-module files?
>> 
>> 
>> For testing purposes I was planning to extend llvm-dis (and possibly opt) to
>> take a flag specifying a module index, and introduce an llvm-join tool which
>> could be used to create a bitcode file from multiple inputs.
>> 
> 
> Awesome! I'm not sure how important it is but it seems that it should
> be made an error to ignore part of a bitcode file?
> (Shouldn't llvm-nm print vtable bits?)
> 
>> The other tools probably don't need to know about this and could just read
>> the first module.
>> 
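
A sketch of how a test tool with a module-index flag might behave, in the spirit of the proposed llvm-dis change and of the concern about silently ignoring part of a file. `selectModule` and `ignoredModuleCount` are hypothetical helpers over an in-memory list of modules, not existing LLVM tool code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper for a tool that takes a module index. An
// out-of-range index is an error rather than a silent fallback to the
// first module.
const std::string &selectModule(const std::vector<std::string> &Modules,
                                std::size_t Index) {
  assert(!Modules.empty() && "sketch assumes at least one module");
  if (Index >= Modules.size())
    throw std::out_of_range("module index " + std::to_string(Index) +
                            " is out of range; the file has " +
                            std::to_string(Modules.size()) + " module(s)");
  return Modules[Index];
}

// A tool that only reads the first module can still report how many
// modules it skipped, so that ignoring part of a file is never silent.
std::size_t ignoredModuleCount(const std::vector<std::string> &Modules) {
  return Modules.empty() ? 0 : Modules.size() - 1;
}
```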
>>> Beyond the random access suggestion (+1) and lifetime comments, it seems
>>> like there should be a way to reason about the contents of these
>>> modules--names, identifiers, flags, *something* so that "load the first
>>> module lazily and the second eagerly" can become "load the module
>>> containing my CFI information eagerly but the rest lazily" or something,
>>> or at least to check that this file was created using -fsanitize=cfi and
>>> not something else.
>> 
>> 
>> Right, this is the sort of functionality that would be provided by
>> functions such as hasGlobalValueSummary().
> 
> Ah, neat.  I'll look into that, since apparently it answers many of my
> questions :D.  Sorry for the trouble :).
> 
> Thanks again, happy LLVM'ing...
> 
> ~Will
> 
>> 
>> Thanks,
>> --
>> Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161028/2c13f342/attachment-0001.html>

Mehdi Amini via llvm-dev

2016-Oct-28 21:25 UTC

> On Oct 28, 2016, at 2:21 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> On Oct 28, 2016, at 2:16 PM, Will Dietz <willdtz at gmail.com> wrote:
>> 
>> This seems to be closer to partitioning a single Module than
>> supporting multiple modules (at least not yet).
>> Does that seem accurate?
> 
> The use case is partitioning a single module. We should have any other
> assumption at this level (bitcode).
I think my sentence is not well written, let me retry: “The CFI use case here is
partitioning a single module in two. But at this level (bitcode), we should not
bake such assumptions.”

Will Dietz via llvm-dev

2016-Oct-28 21:32 UTC

On Fri, Oct 28, 2016 at 4:25 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> I think my sentence is not well written, let me retry: “The CFI use case
> here is partitioning a single module in two. But at this level (bitcode), we
> should not bake such assumptions.”
Ah, awesome.  Thanks, this makes great sense, sounds great to me :D.  Thanks!
